Commit Graph

367 Commits

Author SHA1 Message Date
Magnus Karlsson
b4b8faa1de samples/bpf: sample application and documentation for AF_XDP sockets
This is a sample application for AF_XDP sockets. The application
supports three different modes of operation: rxdrop, txonly and l2fwd.

To show-case a simple round-robin load-balancing between a set of
sockets in an xskmap, set the RR_LB compile time define option to 1 in
"xdpsock.h".

v2: The entries variable was calculated twice in {umem,xq}_nb_avail.

Co-authored-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-03 15:55:25 -07:00
Yonghong Song
34745aed51 samples/bpf: fix kprobe attachment issue on x64
Commit d5a00528b5 ("syscalls/core, syscalls/x86: Rename
struct pt_regs-based sys_*() to __x64_sys_*()") renamed a lot
of syscall function sys_*() to __x64_sys_*().
This caused several kprobe based samples/bpf tests failing.

This patch fixed the problem in bpf_load.c.
For x86_64 architecture, function name __x64_sys_*() will be
first used for kprobe event creation. If the creation is successful,
it will be used. Otherwise, function name sys_*() will be used
for kprobe event creation.

Fixes: d5a00528b5 ("syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29 20:36:53 -07:00
Yonghong Song
28dbf861de samples/bpf: move common-purpose trace functions to selftests
There is no functionality change in this patch. The common-purpose
trace functions, including perf_event polling and ksym lookup,
are moved from trace_output_user.c and bpf_load.c to
selftests/bpf/trace_helpers.c so that these function can
be reused later in selftests.

Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-04-29 08:45:54 -07:00
Wang Sheng-Hui
c0885f61bb samples, bpf: remove redundant ret assignment in bpf_load_program()
2 redundant ret assignments removed:

* 'ret = 1' before the logic 'if (data_maps)', and if any errors jump to
  label 'done'. No 'ret = 1' needed before the error jump.

* After the '/* load programs */' part, if everything goes well, then
  the BPF code will be loaded and 'ret' set to 0 by load_and_attach().
  If something goes wrong, 'ret' set to none-O, the redundant 'ret = 0'
  after the for clause will make the error skipped.

  For example, if some BPF code cannot provide supported program types
  in ELF SEC("unknown"), the for clause will not call load_and_attach()
  to load the BPF code. 1 should be returned to callees instead of 0.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-27 00:47:06 +02:00
William Tu
b05cd74043 samples/bpf: remove the bpf tunnel testsuite.
Move the testsuite to
selftests/bpf/{test_tunnel_kern.c, test_tunnel.sh}

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-27 00:11:15 +02:00
Eyal Birger
29a36f9eef samples/bpf: extend test_tunnel_bpf.sh with xfrm state test
Add a test for fetching xfrm state parameters from a tc program running
on ingress.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-24 22:26:58 +02:00
Nikita V. Shirokov
c6ffd1ff78 bpf: add bpf_xdp_adjust_tail sample prog
adding bpf's sample program which is using bpf_xdp_adjust_tail helper
by generating ICMPv4 "packet to big" message if ingress packet's size is
bigger then 600 bytes

Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18 23:34:17 +02:00
Jesper Dangaard Brouer
8de0e8ba97 samples/bpf: fix xdp_monitor user output for tracepoint exception
The variable rec_i contains an XDP action code not an error.
Thus, using err2str() was wrong, it should have been action2str().

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18 14:48:06 +02:00
Wang Sheng-Hui
97c33610ac samples/bpf: correct comment in sock_example.c
The program run against loopback interace "lo", not "eth0".
Correct the comment.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-04-18 14:45:26 +02:00
Alexei Starovoitov
4662a4e538 samples/bpf: raw tracepoint test
add empty raw_tracepoint bpf program to test overhead similar
to kprobe and traditional tracepoint tests

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28 22:55:19 +02:00
Colin Ian King
20cfb7a04f samples/bpf: fix spelling mistake: "revieve" -> "receive"
Trivial fix to spelling mistake in error message text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28 15:30:25 +02:00
John Fastabend
4c4c3c276c bpf: sockmap sample, add option to attach SK_MSG program
Add sockmap option to use SK_MSG program types.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-19 21:14:40 +01:00
Teng Qin
12fe12253c samples/bpf: add example to test reading address
This commit adds additional test in the trace_event example, by
attaching the bpf program to MEM_UOPS_RETIRED.LOCK_LOADS event with
PERF_SAMPLE_ADDR requested, and print the lock address value read from
the bpf program to trace_pipe.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-08 02:22:34 +01:00
William Tu
5f280b60d2 samples/bpf: add gre sequence number test.
The patch adds tests for GRE sequence number
support for metadata mode tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-04 18:35:02 -05:00
Prashant Bhole
c8745e07d5 samples/bpf: detach prog from cgroup
test_cgrp2_sock.sh and test_cgrp2_sock2.sh tests keep the program
attached to cgroup even after completion.
Using detach functionality of test_cgrp2_sock in both scripts.

Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-02 00:16:36 +01:00
Leo Yan
c535077789 samples/bpf: Add program for CPU state statistics
CPU is active when have running tasks on it and CPUFreq governor can
select different operating points (OPP) according to different workload;
we use 'pstate' to present CPU state which have running tasks with one
specific OPP.  On the other hand, CPU is idle which only idle task on
it, CPUIdle governor can select one specific idle state to power off
hardware logics; we use 'cstate' to present CPU idle state.

Based on trace events 'cpu_idle' and 'cpu_frequency' we can accomplish
the duration statistics for every state.  Every time when CPU enters
into or exits from idle states, the trace event 'cpu_idle' is recorded;
trace event 'cpu_frequency' records the event for CPU OPP changing, so
it's easily to know how long time the CPU stays in the specified OPP,
and the CPU must be not in any idle state.

This patch is to utilize the mentioned trace events for pstate and
cstate statistics.  To achieve more accurate profiling data, the program
uses below sequence to insure CPU running/idle time aren't missed:

- Before profiling the user space program wakes up all CPUs for once, so
  can avoid to missing account time for CPU staying in idle state for
  long time; the program forces to set 'scaling_max_freq' to lowest
  frequency and then restore 'scaling_max_freq' to highest frequency,
  this can ensure the frequency to be set to lowest frequency and later
  after start to run workload the frequency can be easily to be changed
  to higher frequency;

- User space program reads map data and update statistics for every 5s,
  so this is same with other sample bpf programs for avoiding big
  overload introduced by bpf program self;

- When send signal to terminate program, the signal handler wakes up
  all CPUs, set lowest frequency and restore highest frequency to
  'scaling_max_freq'; this is exactly same with the first step so
  avoid to missing account CPU pstate and cstate time during last
  stage.  Finally it reports the latest statistics.

The program has been tested on Hikey board with octa CA53 CPUs, below
is one example for statistics result, the format mainly follows up
Jesper Dangaard Brouer suggestion.

Jesper reminds to 'get printf to pretty print with thousands separators
use %' and setlocale(LC_NUMERIC, "en_US")', tried three different arm64
GCC toolchains (5.4.0 20160609, 6.2.1 20161016, 6.3.0 20170516) but all
of them cannot support printf flag character %' on arm64 platform, so go
back print number without grouping mode.

CPU states statistics:
state(ms)  cstate-0    cstate-1    cstate-2    pstate-0    pstate-1    pstate-2    pstate-3    pstate-4
CPU-0      767         6111        111863      561         31          756         853         190
CPU-1      241         10606       107956      484         125         646         990         85
CPU-2      413         19721       98735       636         84          696         757         89
CPU-3      84          11711       79989       17516       909         4811        5773        341
CPU-4      152         19610       98229       444         53          649         708         1283
CPU-5      185         8781        108697      666         91          671         677         1365
CPU-6      157         21964       95825       581         67          566         684         1284
CPU-7      125         15238       102704      398         20          665         786         1197

Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-02-26 10:54:02 +01:00
Tushar Dave
8ac2441e0b samples/bpf: adjust rlimit RLIMIT_MEMLOCK for xdp_redirect
Default rlimit RLIMIT_MEMLOCK is 64KB, causes bpf map failure.
e.g.
[root@labbpf]# ./xdp_redirect $(</sys/class/net/eth2/ifindex) \
> $(</sys/class/net/eth3/ifindex)
failed to create a map: 1 Operation not permitted

The failure is seen when executing xdp_redirect while xdp_monitor
is already runnig.

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-13 17:42:01 -08:00
William Tu
9c33ca4317 sample/bpf: fix erspan metadata
The commit c69de58ba8 ("net: erspan: use bitfield instead of
mask and offset") changes the erspan header to use bitfield, and
commit d350a82302 ("net: erspan: create erspan metadata uapi header")
creates a uapi header file.  The above two commit breaks the current
erspan test.  This patch fixes it by adapting the above two changes.

Fixes: ac80c2a165 ("samples/bpf: add erspan v2 sample code")
Fixes: ef88f89c83 ("samples/bpf: extend test_tunnel_bpf.sh with ERSPAN")
Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-06 11:32:49 -05:00
Eric Leblond
b259c2ffd9 samples/bpf: use bpf_set_link_xdp_fd
Use bpf_set_link_xdp_fd instead of set_link_xdp_fd to remove some
code duplication and benefit of netlink ext ack errors message.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-02 17:53:48 -08:00
Eric Leblond
bbf48c18ee libbpf: add error reporting in XDP
Parse netlink ext attribute to get the error message returned by
the card. Code is partially take from libnl.

We add netlink.h to the uapi include of tools. And we need to
avoid include of userspace netlink header to have a successful
build of sample so nlattr.h has a define to avoid
the inclusion. Using a direct define could have been an issue
as NLMSGERR_ATTR_MAX can change in the future.

We also define SOL_NETLINK if not defined to avoid to have to
copy socket.h for a fixed value.

Signed-off-by: Eric Leblond <eric@regit.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-02-02 17:53:48 -08:00
Mickaël Salaün
c25ef6a5e6 samples/bpf: Partially fixes the bpf.o build
Do not build lib/bpf/bpf.o with this Makefile but use the one from the
library directory.  This avoid making a buggy bpf.o file (e.g. missing
symbols).

This patch is useful if some code (e.g. Landlock tests) needs both the
bpf.o (from tools/lib/bpf) and the bpf_load.o (from samples/bpf).

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-26 23:57:10 +01:00
Jesper Dangaard Brouer
417f1d9f21 samples/bpf: xdp_monitor include cpumap tracepoints in monitoring
The xdp_redirect_cpu sample have some "builtin" monitoring of the
tracepoints for xdp_cpumap_*, but it is practical to have an external
tool that can monitor these transpoint as an easy way to troubleshoot
an application using XDP + cpumap.

Specifically I need such external tool when working on Suricata and
XDP cpumap redirect. Extend the xdp_monitor tool sample with
monitoring of these xdp_cpumap_* tracepoints.  Model the output format
like xdp_redirect_cpu.

Given I needed to handle per CPU decoding for cpumap, this patch also
add per CPU info on the existing monitor events.  This resembles part
of the builtin monitoring output from sample xdp_rxq_info.  Thus, also
covering part of that sample in an external monitoring tool.

Performance wise, the cpumap tracepoints uses bulking, which cause
them to have very little overhead.  Thus, they are enabled by default.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-20 02:10:55 +01:00
Jesper Dangaard Brouer
e2e3224122 samples/bpf: xdp2skb_meta comment explain why pkt-data pointers are invalidated
Improve the 'unknown reason' comment, with an actual explaination of why
the ctx pkt-data pointers need to be loaded after the helper function
bpf_xdp_adjust_meta().  Based on the explaination Daniel gave.

Fixes: 36e04a2d78 ("samples/bpf: xdp2skb_meta shows transferring info from XDP to SKB")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-18 01:49:09 +01:00
Luis de Bethencourt
4c38f74c91 samples/bpf: Fix trailing semicolon
The trailing semicolon is an empty statement that does no operation.
Removing it since it doesn't do anything.

Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-17 00:23:58 +01:00
Jesper Dangaard Brouer
36e04a2d78 samples/bpf: xdp2skb_meta shows transferring info from XDP to SKB
Creating a bpf sample that shows howto use the XDP 'data_meta'
infrastructure, created by Daniel Borkmann.  Very few drivers support
this feature, but I wanted a functional sample to begin with, when
working on adding driver support.

XDP data_meta is about creating a communication channel between BPF
programs.  This can be XDP tail-progs, but also other SKB based BPF
hooks, like in this case the TC clsact hook. In this sample I show
that XDP can store info named "mark", and TC/clsact chooses to use
this info and store it into the skb->mark.

It is a bit annoying that XDP and TC samples uses different tools/libs
when attaching their BPF hooks.  As the XDP and TC programs need to
cooperate and agree on a struct-layout, it is best/easiest if the two
programs can be contained within the same BPF restricted-C file.

As the bpf-loader, I choose to not use bpf_load.c (or libbpf), but
instead wrote a bash shell scripted named xdp2skb_meta.sh, which
demonstrate howto use the iproute cmdline tools 'tc' and 'ip' for
loading BPF programs.  To make it easy for first time users, the shell
script have command line parsing, and support --verbose and --dry-run
mode, if you just want to see/learn the tc+ip command syntax:

 # ./xdp2skb_meta.sh --dev ixgbe2 --dry-run
 # Dry-run mode: enable VERBOSE and don't call TC+IP
 tc qdisc del dev ixgbe2 clsact
 tc qdisc add dev ixgbe2 clsact
 tc filter add dev ixgbe2 ingress prio 1 handle 1 bpf da obj ./xdp2skb_meta_kern.o sec tc_mark
 # Flush XDP on device: ixgbe2
 ip link set dev ixgbe2 xdp off
 ip link set dev ixgbe2 xdp obj ./xdp2skb_meta_kern.o sec xdp_mark

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-01-11 01:02:25 +01:00
Jesper Dangaard Brouer
0fca931a6f samples/bpf: program demonstrating access to xdp_rxq_info
This sample program can be used for monitoring and reporting how many
packets per sec (pps) are received per NIC RX queue index and which
CPU processed the packet. In itself it is a useful tool for quickly
identifying RSS imbalance issues, see below.

The default XDP action is XDP_PASS in-order to provide a monitor
mode. For benchmarking purposes it is possible to specify other XDP
actions on the cmdline --action.

Output below shows an imbalance RSS case where most RXQ's deliver to
CPU-0 while CPU-2 only get packets from a single RXQ.  Looking at
things from a CPU level the two CPUs are processing approx the same
amount, BUT looking at the rx_queue_index levels it is clear that
RXQ-2 receive much better service, than other RXQs which all share CPU-0.

Running XDP on dev:i40e1 (ifindex:3) action:XDP_PASS
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      0       900,473     0
XDP-RX CPU      2       906,921     0
XDP-RX CPU      total   1,807,395

RXQ stats       RXQ:CPU pps         issue-pps
rx_queue_index    0:0   180,098     0
rx_queue_index    0:sum 180,098
rx_queue_index    1:0   180,098     0
rx_queue_index    1:sum 180,098
rx_queue_index    2:2   906,921     0
rx_queue_index    2:sum 906,921
rx_queue_index    3:0   180,098     0
rx_queue_index    3:sum 180,098
rx_queue_index    4:0   180,082     0
rx_queue_index    4:sum 180,082
rx_queue_index    5:0   180,093     0
rx_queue_index    5:sum 180,093

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 15:21:22 -08:00
David S. Miller
59436c9ee1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2017-12-18

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Allow arbitrary function calls from one BPF function to another BPF function.
   As of today when writing BPF programs, __always_inline had to be used in
   the BPF C programs for all functions, unnecessarily causing LLVM to inflate
   code size. Handle this more naturally with support for BPF to BPF calls
   such that this __always_inline restriction can be overcome. As a result,
   it allows for better optimized code and finally enables to introduce core
   BPF libraries in the future that can be reused out of different projects.
   x86 and arm64 JIT support was added as well, from Alexei.

2) Add infrastructure for tagging functions as error injectable and allow for
   BPF to return arbitrary error values when BPF is attached via kprobes on
   those. This way of injecting errors generically eases testing and debugging
   without having to recompile or restart the kernel. Tags for opting-in for
   this facility are added with BPF_ALLOW_ERROR_INJECTION(), from Josef.

3) For BPF offload via nfp JIT, add support for bpf_xdp_adjust_head() helper
   call for XDP programs. First part of this work adds handling of BPF
   capabilities included in the firmware, and the later patches add support
   to the nfp verifier part and JIT as well as some small optimizations,
   from Jakub.

4) The bpftool now also gets support for basic cgroup BPF operations such
   as attaching, detaching and listing current BPF programs. As a requirement
   for the attach part, bpftool can now also load object files through
   'bpftool prog load'. This reuses libbpf which we have in the kernel tree
   as well. bpftool-cgroup man page is added along with it, from Roman.

5) Back then commit e87c6bc385 ("bpf: permit multiple bpf attachments for
   a single perf event") added support for attaching multiple BPF programs
   to a single perf event. Given they are configured through perf's ioctl()
   interface, the interface has been extended with a PERF_EVENT_IOC_QUERY_BPF
   command in this work in order to return an array of one or multiple BPF
   prog ids that are currently attached, from Yonghong.

6) Various minor fixes and cleanups to the bpftool's Makefile as well
   as a new 'uninstall' and 'doc-uninstall' target for removing bpftool
   itself or prior installed documentation related to it, from Quentin.

7) Add CONFIG_CGROUP_BPF=y to the BPF kernel selftest config file which is
   required for the test_dev_cgroup test case to run, from Naresh.

8) Fix reporting of XDP prog_flags for nfp driver, from Jakub.

9) Fix libbpf's exit code from the Makefile when libelf was not found in
   the system, also from Jakub.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-18 10:51:06 -05:00
William Tu
ac80c2a165 samples/bpf: add erspan v2 sample code
Extend the existing tests for ipv4 ipv6 erspan version 2.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-15 12:34:00 -05:00
Josef Bacik
965de87e54 samples/bpf: add a test for bpf_override_return
This adds a basic test for bpf_override_return to verify it works.  We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12 09:02:40 -08:00
William Tu
d37e3bb774 samples/bpf: add ip6erspan sample code
Extend the existing tests for ip6erspan.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-06 14:45:29 -05:00
David S. Miller
7cda4cee13 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Small overlapping change conflict ('net' changed a line,
'net-next' added a line right afterwards) in flexcan.c

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-05 10:44:19 -05:00
David S. Miller
d671965b54 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2017-12-03

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Addition of a software model for BPF offloads in order to ease
   testing code changes in that area and make semantics more clear.
   This is implemented in a new driver called netdevsim, which can
   later also be extended for other offloads. SR-IOV support is added
   as well to netdevsim. BPF kernel selftests for offloading are
   added so we can track basic functionality as well as exercising
   all corner cases around BPF offloading, from Jakub.

2) Today drivers have to drop the reference on BPF progs they hold
   due to XDP on device teardown themselves. Change this in order
   to make XDP handling inside the drivers less error prone, and
   move disabling XDP to the core instead, also from Jakub.

3) Misc set of BPF verifier improvements and cleanups as preparatory
   work for upcoming BPF-to-BPF calls. Among others, this set also
   improves liveness marking such that pruning can be slightly more
   effective. Register and stack liveness information is now included
   in the verifier log as well, from Alexei.

4) nfp JIT improvements in order to identify load/store sequences in
   the BPF prog e.g. coming from memcpy lowering and optimizing them
   through the NPU's command push pull (CPP) instruction, from Jiong.

5) Cleanups to test_cgrp2_attach2.c BPF sample code in oder to remove
   bpf_prog_attach() magic values and replacing them with actual proper
   attach flag instead, from David.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 12:07:10 -05:00
William Tu
56ddd30280 samples/bpf: extend test_tunnel_bpf.sh with ip6gre
Extend existing tests for vxlan, gre, geneve, ipip, erspan,
to include ip6 gre and gretap tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-04 11:04:19 -05:00
David S. Miller
c2eb6d07a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2017-12-02

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a compilation warning in xdp redirect tracepoint due to
   missing bpf.h include that pulls in struct bpf_map, from Xie.

2) Limit the maximum number of attachable BPF progs for a given
   perf event as long as uabi is not frozen yet. The hard upper
   limit is now 64 and therefore the same as with BPF multi-prog
   for cgroups. Also add related error checking for the sample
   BPF loader when enabling and attaching to the perf event, from
   Yonghong.

3) Specifically set the RLIMIT_MEMLOCK for the test_verifier_log
   case, so that the test case can always pass and not fail in
   some environments due to too low default limit, also from
   Yonghong.

4) Fix up a missing license header comment for kernel/bpf/offload.c,
   from Jakub.

5) Several fixes for bpftool, among others a crash on incorrect
   arguments when json output is used, error message handling
   fixes on unknown options and proper destruction of json writer
   for some exit cases, all from Quentin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-03 13:08:30 -05:00
Yonghong Song
0ec9552b43 samples/bpf: add error checking for perf ioctl calls in bpf loader
load_bpf_file() should fail if ioctl with command
PERF_EVENT_IOC_ENABLE and PERF_EVENT_IOC_SET_BPF fails.
When they do fail, proper error messages are printed.

With this change, the below "syscall_tp" run shows that
the maximum number of bpf progs attaching to the same
perf tracepoint is indeed enforced.
  $ ./syscall_tp -i 64
  prog #0: map ids 4 5
  ...
  prog #63: map ids 382 383
  $ ./syscall_tp -i 65
  prog #0: map ids 4 5
  ...
  prog #64: map ids 388 389
  ioctl PERF_EVENT_IOC_SET_BPF failed err Argument list too long

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-01 02:59:21 +01:00
David Ahern
51de082570 samples/bpf: Convert magic numbers to names in multi-prog cgroup test case
Attach flag 1 == BPF_F_ALLOW_OVERRIDE; attach flag 2 == BPF_F_ALLOW_MULTI.
Update the calls to bpf_prog_attach() in test_cgrp2_attach2.c to use the
names over the magic numbers.

Fixes: 39323e788c ("samples/bpf: add multi-prog cgroup test case")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-11-30 19:58:11 +01:00
Masahiro Yamada
bf070bb0e6 kbuild: remove all dummy assignments to obj-
Now kbuild core scripts create empty built-in.o where necessary.
Remove "obj- := dummy.o" tricks.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-11-18 11:46:06 +09:00
Dan Carpenter
fae45363ae xdp: sample: Missing curly braces in read_route()
The assert statement is supposed to be part of the else branch but the
curly braces were accidentally left off.

Fixes: 3e29cd0e65 ("xdp: Sample xdp program implementing ip forward")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-14 22:00:13 +09:00
David S. Miller
f3edacbd69 bpf: Revert bpf_overrid_function() helper changes.
NACK'd by x86 maintainer.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 18:24:55 +09:00
Lawrence Brakmo
03e982eed4 bpf: Fix tcp_clamp_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo
e1853319fc bpf: Fix tcp_iw_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo
2ff969fbe2 bpf: Fix tcp_cong_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo
a4174f0560 bpf: Fix tcp_bufs_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo
016e661bb0 bpf: Fix tcp_rwnd_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Lawrence Brakmo
7863f46bac bpf: Fix tcp_synrto_kern.c sample program
The program was returning -1 in some cases which is not allowed
by the verifier any longer.

Fixes: 390ee7e29f ("bpf: enforce return code for cgroup-bpf programs")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 15:52:41 +09:00
Josef Bacik
eafb3401fa samples/bpf: add a test for bpf_override_return
This adds a basic test for bpf_override_return to verify it works.  We
override the main function for mounting a btrfs fs so it'll return
-ENOMEM and then make sure that trying to mount a btrfs fs will fail.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-11 12:18:06 +09:00
Lawrence Brakmo
aaf151b9e6 bpf: Rename tcp_bbf.readme to tcp_bpf.readme
The original patch had the wrong filename.

Fixes: bfdf756938 ("bpf: create samples/bpf/tcp_bpf.readme")
Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 12:13:41 +09:00
Christina Jacob
3e29cd0e65 xdp: Sample xdp program implementing ip forward
Implements port to port forwarding with route table and arp table
lookup for ipv4 packets using bpf_redirect helper function and
lpm_trie  map.

Signed-off-by: Christina Jacob <Christina.Jacob@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-08 10:39:41 +09:00
Roman Gushchin
9d1f159419 bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/
The purpose of this move is to use these files in bpf tests.

Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-05 23:26:51 +09:00
David S. Miller
2a171788ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Files removed in 'net-next' had their license header updated
in 'net'.  We take the remove from 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-04 09:26:51 +09:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Tushar Dave
21d72af7dc samples/bpf: adjust rlimit RLIMIT_MEMLOCK for xdp_redirect_map
Default rlimit RLIMIT_MEMLOCK is 64KB, causes bpf map failure.
e.g.
[root@labbpf]# ./xdp_redirect_map $(</sys/class/net/eth2/ifindex) \
> $(</sys/class/net/eth3/ifindex)
failed to create a map: 1 Operation not permitted

The failure is 100% when multiple xdp programs are running. Fix it.

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 12:17:32 +09:00
Tushar Dave
6dfca831c0 samples/bpf: adjust rlimit RLIMIT_MEMLOCK for xdp1
Default rlimit RLIMIT_MEMLOCK is 64KB, causes bpf map failure.
e.g.
[root@lab bpf]#./xdp1 -N $(</sys/class/net/eth2/ifindex)
failed to create a map: 1 Operation not permitted

Fix it.

Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-29 12:17:13 +09:00
Yonghong Song
a678be5cc7 bpf: add a test case to test single tp multiple bpf attachment
The bpf sample program syscall_tp is modified to
show attachment of more than bpf programs
for a particular kernel tracepoint.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-25 10:47:47 +09:00
Lawrence Brakmo
bfdf756938 bpf: create samples/bpf/tcp_bpf.readme
Readme file explaining how to create a cgroupv2 and attach one
of the tcp_*_kern.o socket_ops BPF program.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked_by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:12:05 +01:00
Lawrence Brakmo
c890063e44 bpf: sample BPF_SOCKET_OPS_BASE_RTT program
Sample socket_ops BPF program to test the BPF helper function
bpf_getsocketops and the new socket_ops op BPF_SOCKET_OPS_BASE_RTT.

The program provides a base RTT of 80us when the calling flow is
within a DC (as determined by the IPV6 prefix) and the congestion
algorithm is "nv".

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked_by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 03:12:05 +01:00
Jesper Dangaard Brouer
fad3917e36 samples/bpf: add cpumap sample program xdp_redirect_cpu
This sample program show how to use cpumap and the associated
tracepoints.

It provides command line stats, which shows how the XDP-RX process,
cpumap-enqueue and cpumap kthread dequeue is cooperating on a per CPU
basis.  It also utilize the xdp_exception and xdp_redirect_err
transpoints to allow users quickly to identify setup issues.

One issue with ixgbe driver is that the driver reset the link when
loading XDP.  This reset the procfs smp_affinity settings.  Thus,
after loading the program, these must be reconfigured.  The easiest
workaround it to reduce the RX-queue to e.g. two via:

 # ethtool --set-channels ixgbe1 combined 2

And then add CPUs above 0 and 1, like:

 # xdp_redirect_cpu --dev ixgbe1 --prog 2 --cpu 2 --cpu 3 --cpu 4

Another issue with ixgbe is that the page recycle mechanism is tied to
the RX-ring size.  And the default setting of 512 elements is too
small.  This is the same issue with regular devmap XDP_REDIRECT.
To overcome this I've been using 1024 rx-ring size:

 # ethtool -G ixgbe1 rx 1024 tx 1024

V3:
 - whitespace cleanups
 - bpf tracepoint cannot access top part of struct

V4:
 - report on kthread sched events, according to tracepoint change
 - report average bulk enqueue size

V5:
 - bpf_map_lookup_elem on cpumap not allowed from bpf_prog
   use separate map to mark CPUs not available

V6:
 - correct kthread sched summary output

V7:
 - Added a --stress-mode for concurrently changing underlying cpumap

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-18 12:12:18 +01:00
Abhijit Ayarekar
9db9583839 bpf: Add -target to clang switch while cross compiling.
Update to llvm excludes assembly instructions.
llvm git revision is below

commit 65fad7c26569 ("bpf: add inline-asm support")

This change will be part of llvm  release 6.0

__ASM_SYSREG_H define is not required for native compile.
-target switch includes appropriate target specific files
while cross compiling

Tested on x86 and arm64.

Signed-off-by: Abhijit Ayarekar <abhijit.ayarekar@caviumnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-16 21:04:32 +01:00
Yonghong Song
81b9cf8028 bpf: add a test case for helper bpf_perf_prog_read_value
The bpf sample program trace_event is enhanced to use the new
helper to print out enabled/running time.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-07 23:05:57 +01:00
Yonghong Song
020a32d958 bpf: add a test case for helper bpf_perf_event_read_value
The bpf sample program tracex6 is enhanced to use the new
helper to read enabled/running time as well.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-07 23:05:57 +01:00
Jesper Dangaard Brouer
c4eb7f4643 samples/bpf: xdp_monitor increase memory rlimit
Other concurrent running programs, like perf or the XDP program what
needed to be monitored, might take up part of the max locked memory
limit.  Thus, the xdp_monitor tool have to set the RLIMIT_MEMLOCK to
RLIM_INFINITY, as it cannot determine a more sane limit.

Using the man exit(3) specified EXIT_FAILURE return exit code, and
correct other users too.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-06 10:04:36 -07:00
Jesper Dangaard Brouer
280b058d48 samples/bpf: xdp_monitor also record xdp_exception tracepoint
Also monitor the tracepoint xdp_exception.  This tracepoint is usually
invoked by the drivers.  Programs themselves can activate this by
returning XDP_ABORTED, which will drop the packet but also trigger the
tracepoint.  This is useful for distinguishing intentional (XDP_DROP)
vs. ebpf-program error cases that cased a drop (XDP_ABORTED).

Drivers also use this tracepoint for reporting on XDP actions that are
unknown to the specific driver.  This can help the user to detect if a
driver e.g. doesn't implement XDP_REDIRECT yet.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-06 10:04:35 -07:00
Jesper Dangaard Brouer
f4ce0a0116 samples/bpf: xdp_monitor first 8 bytes are not accessible by bpf
The first 8 bytes of the tracepoint context struct are not accessible
by the bpf code.  This is a choice that dates back to the original
inclusion of this code.

See explaination in:
 commit 98b5c2c65c ("perf, bpf: allow bpf programs attach to tracepoints")

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-06 10:04:35 -07:00
Alexei Starovoitov
dfc069998e samples/bpf: use bpf_prog_query() interface
use BPF_PROG_QUERY command to strengthen test coverage

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 16:05:06 -07:00
Alexei Starovoitov
39323e788c samples/bpf: add multi-prog cgroup test case
create 5 cgroups, attach 6 progs and check that progs are executed as:
cgrp1 (MULTI progs A, B) ->
   cgrp2 (OVERRIDE prog C) ->
     cgrp3 (MULTI prog D) ->
       cgrp4 (OVERRIDE prog E) ->
         cgrp5 (NONE prog F)
the event in cgrp5 triggers execution of F,D,A,B in that order.
if prog F is detached, the execution is E,D,A,B
if prog F and D are detached, the execution is E,A,B
if prog F, E and D are detached, the execution is C,A,B

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-04 16:05:05 -07:00
Stephen Hemminger
0929567a7a samples/bpf: fix warnings in xdp_monitor_user
Make local functions static to fix

  HOSTCC  samples/bpf/xdp_monitor_user.o
samples/bpf/xdp_monitor_user.c:64:7: warning: no previous prototype for ‘gettime’ [-Wmissing-prototypes]
 __u64 gettime(void)
       ^~~~~~~
samples/bpf/xdp_monitor_user.c:209:6: warning: no previous prototype for ‘print_bpf_prog_info’ [-Wmissing-prototypes]
 void print_bpf_prog_info(void)
      ^~~~~~~~~~~~~~~~~~~
Fixes: 3ffab54602 ("samples/bpf: xdp_monitor tool based on tracepoints")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-01 23:08:24 -07:00
Martin KaFai Lau
88cda1c9da bpf: libbpf: Provide basic API support to specify BPF obj name
This patch extends the libbpf to provide API support to
allow specifying BPF object name.

In tools/lib/bpf/libbpf, the C symbol of the function
and the map is used.  Regarding section name, all maps are
under the same section named "maps".  Hence, section name
is not a good choice for map's name.  To be consistent with
map, bpf_prog also follows and uses its function symbol as
the prog's name.

This patch adds logic to collect function's symbols in libbpf.
There is existing codes to collect the map's symbols and no change
is needed.

The bpf_load_program_name() and bpf_map_create_name() are
added to take the name argument.  For the other bpf_map_create_xxx()
variants, a name argument is directly added to them.

In samples/bpf, bpf_load.c in particular, the symbol is also
used as the map's name and the map symbols has already been
collected in the existing code.  For bpf_prog, bpf_load.c does
not collect the function symbol name.  We can consider to collect
them later if there is a need to continue supporting the bpf_load.c.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-29 06:17:05 +01:00
Joel Fernandes
8bf2ac25a9 samples/bpf: Add documentation on cross compilation
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 11:59:16 -07:00
Joel Fernandes
b655fc1c2e samples/bpf: Fix pt_regs issues when cross-compiling
BPF samples fail to build when cross-compiling for ARM64 because of incorrect
pt_regs param selection. This is because clang defines __x86_64__ and
bpf_headers thinks we're building for x86. Since clang is building for the BPF
target, it shouldn't make assumptions about what target the BPF program is
going to run on. To fix this, lets pass ARCH so the header knows which target
the BPF program is being compiled for and can use the correct pt_regs code.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 11:59:16 -07:00
Joel Fernandes
876e88e327 samples/bpf: Enable cross compiler support
When cross compiling, bpf samples use HOSTCC for compiling the non-BPF part of
the sample, however what we really want is to use the cross compiler to build
for the cross target since that is what will load and run the BPF sample.
Detect this and compile samples correctly.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 11:59:16 -07:00
Joel Fernandes
95ec669685 samples/bpf: Use getppid instead of getpgrp for array map stress
When cross-compiling the bpf sample map_perf_test for aarch64, I find that
__NR_getpgrp is undefined. This causes build errors. This syscall is deprecated
and requires defining __ARCH_WANT_SYSCALL_DEPRECATED. To avoid having to define
that, just use a different syscall (getppid) for the array map stress test.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-21 11:59:16 -07:00
Martin KaFai Lau
637cd8c312 bpf: Add lru_hash_lookup performance test
Create a new case to test the LRU lookup performance.

At the beginning, the LRU map is fully loaded (i.e. the number of keys
is equal to map->max_entries).   The lookup is done through key 0
to num_map_entries and then repeats from 0 again.

This patch also creates an anonymous struct to properly
name the test params in stress_lru_hmap_alloc() in map_perf_test_kern.c.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 09:57:38 -07:00
David Ahern
0adc3dd900 samples/bpf: Update cgroup socket examples to use uid gid helper
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 06:05:15 +01:00
David Ahern
33aeb5e30a samples/bpf: Update cgrp2 socket tests
Update cgrp2 bpf sock tests to check that device, mark and priority
can all be set on a socket via bpf programs attached to a cgroup.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 06:05:15 +01:00
David Ahern
f776d460b8 samples/bpf: Add option to dump socket settings
Add option to dump socket settings. Will be used in the next patch
to verify bpf programs are correctly setting mark, priority and
device based on the cgroup attachment for the program run.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 06:05:15 +01:00
David Ahern
609b1c3275 samples/bpf: Add detach option to test_cgrp2_sock
Add option to detach programs from a cgroup.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 06:05:15 +01:00
David Ahern
fa38aa17bc samples/bpf: Update sock test to allow setting mark and priority
Update sock test to set mark and priority on socket create.

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-01 06:05:15 +01:00
Tariq Toukan
3edcf18ec2 samples/bpf: Fix compilation issue in redirect dummy program
Fix compilation error below:

$ make samples/bpf/

LLVM ERROR: 'xdp_redirect_dummy' label emitted multiple times to
assembly file
make[1]: *** [samples/bpf/xdp_redirect_kern.o] Error 1
make: *** [samples/bpf/] Error 2

Fixes: 306da4e685 ("samples/bpf: xdp_redirect load XDP dummy prog on TX device")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-31 11:56:57 -07:00
Jesper Dangaard Brouer
3ffab54602 samples/bpf: xdp_monitor tool based on tracepoints
This tool xdp_monitor demonstrate how to use the different xdp_redirect
tracepoints xdp_redirect{,_map}{,_err} from a BPF program.

The default mode is to only monitor the error counters, to avoid
affecting the per packet performance. Tracepoints comes with a base
overhead of 25 nanosec for an attached bpf_prog, and 48 nanosec for
using a full perf record (with non-matching filter).  Thus, default
loading the --stats mode could affect the maximum performance.

This version of the tool is very simple and count all types of errors
as one.  It will be natural to extend this later with the different
types of errors that can occur, which should help users quickly
identify common mistakes.

Because the TP_STRUCT was kept in sync all the tracepoints loads the
same BPF code.  It would also be natural to extend the map version to
demonstrate how the map information could be used.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29 10:51:29 -07:00
Jesper Dangaard Brouer
306da4e685 samples/bpf: xdp_redirect load XDP dummy prog on TX device
For supporting XDP_REDIRECT, a device driver must (obviously)
implement the "TX" function ndo_xdp_xmit().  An additional requirement
is you cannot TX out a device, unless it also have a xdp bpf program
attached. This dependency is caused by the driver code need to setup
XDP resources before it can ndo_xdp_xmit.

Update bpf samples xdp_redirect and xdp_redirect_map to automatically
attach a dummy XDP program to the configured ifindex_out device.  Use
the XDP flag XDP_FLAGS_UPDATE_IF_NOEXIST on the dummy load, to avoid
overriding an existing XDP prog on the device.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-29 10:51:29 -07:00
William Tu
ef88f89c83 samples/bpf: extend test_tunnel_bpf.sh with ERSPAN
Extend existing tests for vxlan, gre, geneve, ipip to
include ERSPAN tunnel.

Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-28 15:04:52 -07:00
Martin KaFai Lau
ad17d0e6c7 bpf: Allow numa selection in INNER_LRU_HASH_PREALLOC test of map_perf_test
This patch makes the needed changes to allow each process of
the INNER_LRU_HASH_PREALLOC test to provide its numa node id
when creating the lru map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-19 21:35:43 -07:00
John Fastabend
69e8cc134b bpf: sockmap sample program
This program binds a program to a cgroup and then matches hard
coded IP addresses and adds these to a sockmap.

This will receive messages from the backend and send them to
the client.

     client:X <---> frontend:10000 client:X <---> backend:10001

To keep things simple this is only designed for 1:1 connections
using hard coded values. A more complete example would allow many
backends and clients.

To run,

 # sockmap <cgroup2_dir>

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-16 11:27:53 -07:00
Yonghong Song
1da236b6be bpf: add a test case for syscalls/sys_{enter|exit}_* tracepoints
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07 14:09:48 -07:00
David S. Miller
29fda25a2d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two minor conflicts in virtio_net driver (bug fix overlapping addition
of a helper) and MAINTAINERS (new driver edit overlapping revamp of
PHY entry).

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 10:07:50 -07:00
William Tu
cc75f8514d samples/bpf: fix bpf tunnel cleanup
test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the
next geneve tunnelling test case fails.  In addition, the geneve reserved bit
in tcbpf2_kern.c should be zero, according to the RFC.

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 22:02:47 -07:00
Andy Gospodarek
24251c2647 samples/bpf: add option for native and skb mode for redirect apps
When testing with a driver that has both native and generic redirect support:

$ sudo ./samples/bpf/xdp_redirect -N 5 6
input: 5 output: 6
ifindex 6:    4961879 pkt/s
ifindex 6:    6391319 pkt/s
ifindex 6:    6419468 pkt/s

$ sudo ./samples/bpf/xdp_redirect -S 5 6
input: 5 output: 6
ifindex 6:    1845435 pkt/s
ifindex 6:    3882850 pkt/s
ifindex 6:    3893974 pkt/s

$ sudo ./samples/bpf/xdp_redirect_map -N 5 6
input: 5 output: 6
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 6:    2207374 pkt/s
ifindex 6:    6212869 pkt/s
ifindex 6:    6286515 pkt/s

$ sudo ./samples/bpf/xdp_redirect_map -S 5 6
input: 5 output: 6
map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
ifindex 6:    5052528 pkt/s
ifindex 6:    5736631 pkt/s
ifindex 6:    5739962 pkt/s

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 13:46:27 -07:00
John Fastabend
9d6e005287 xdp: bpf redirect with map sample program
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
John Fastabend
832622e6bd xdp: sample program for new bpf_redirect helper
This implements a sample program for testing bpf_redirect. It reports
the number of packets redirected per second and as input takes the
ifindex of the device to run the xdp program on and the ifindex of the
interface to redirect packets to.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-17 09:48:06 -07:00
Yonghong Song
533350227d samples/bpf: fix a build issue
With latest net-next:

====
clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h  -Isamples/bpf \
    -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
    -Wno-compare-distinct-pointer-types \
    -Wno-gnu-variable-sized-type-not-at-end \
    -Wno-address-of-packed-member -Wno-tautological-compare \
    -Wno-unknown-warning-option \
    -O2 -emit-llvm -c samples/bpf/tcp_synrto_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/tcp_synrto_kern.o
samples/bpf/tcp_synrto_kern.c:20:10: fatal error: 'bpf_endian.h' file not found
          ^~~~~~~~~~~~~~
1 error generated.
====

net has the same issue.

Add support for ntohl and htonl in tools/testing/selftests/bpf/bpf_endian.h.
Also move bpf_helpers.h from samples/bpf to selftests/bpf and change
compiler include logic so that programs in samples/bpf can access the headers
in selftests/bpf, but not the other way around.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-11 20:51:29 -07:00
Lawrence Brakmo
f856e46978 bpf: fix return in load_bpf_file
The function load_bpf_file ignores the return value of
load_and_attach(), so even if load_and_attach() returns an error,
load_bpf_file() will return 0.

Now, load_bpf_file() can call load_and_attach() multiple times and some
can succeed and some could fail. I think the correct behavor is to
return error on the first failed load_and_attach().

v2: Added missing SOB

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-05 09:05:28 +01:00
Lawrence Brakmo
6c4a01b278 bpf: Sample bpf program to set sndcwnd clamp
Sample BPF program, tcp_clamp_kern.c, to demostrate the use
of setting the sndcwnd clamp. This program assumes that if the
first 5.5 bytes of the host's IPv6 addresses are the same, then
the hosts are in the same datacenter and sets sndcwnd clamp to
100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer
sizes to 150KB.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo
7bc62e2854 bpf: Sample BPF program to set initial cwnd
Sample BPF program that assumes hosts are far away (i.e. large RTTs)
and sets initial cwnd and initial receive window to 40 packets,
send and receive buffers to 1.5MB.

In practice there would be a test to insure the hosts are actually
far enough away.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo
bb56d4449d bpf: Sample BPF program to set congestion control
Sample BPF program that sets congestion control to dctcp when both hosts
are within the same datacenter. In this example that is assumed to be
when they have the first 5.5 bytes of their IPv6 address are the same.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo
d9925368a6 bpf: Sample BPF program to set buffer sizes
This patch contains a BPF program to set initial receive window to
40 packets and send and receive buffers to 1.5MB. This would usually
be done after doing appropriate checks that indicate the hosts are
far enough away (i.e. large RTT).

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:14 -07:00
Lawrence Brakmo
8c4b4c7e9f bpf: Add setsockopt helper function to bpf
Added support for calling a subset of socket setsockopts from
BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather
than making the changes to call the socket setsockopt function because
the changes required would have been larger.

The ops supported are:
  SO_RCVBUF
  SO_SNDBUF
  SO_MAX_PACING_RATE
  SO_PRIORITY
  SO_RCVLOWAT
  SO_MARK

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
Lawrence Brakmo
c400296bf6 bpf: Sample bpf program to set initial window
The sample bpf program, tcp_rwnd_kern.c, sets the initial
advertized window to 40 packets in an environment where
distinct IPv6 prefixes indicate that both hosts are not
in the same data center.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
Lawrence Brakmo
61bc4d8daa bpf: Sample bpf program to set SYN/SYN-ACK RTOs
The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK
RTOs to 10ms when both hosts are within the same datacenter (i.e.
small RTTs) in an environment where common IPv6 prefixes indicate
both hosts are in the same data center.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
Lawrence Brakmo
ae16189efb bpf: program to load and attach sock_ops BPF progs
The program load_sock_ops can be used to load sock_ops bpf programs and
to attach it to an existing (v2) cgroup. It can also be used to detach
sock_ops programs.

Examples:
    load_sock_ops [-l] <cg-path> <prog filename>
	Load and attaches a sock_ops program at the specified cgroup.
	If "-l" is used, the program will continue to run to output the
	BPF log buffer.
	If the specified filename does not end in ".o", it appends
	"_kern.o" to the name.

    load_sock_ops -r <cg-path>
	Detaches the currently attached sock_ops program from the
	specified cgroup.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
Lawrence Brakmo
40304b2a15 bpf: BPF support for sock_ops
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.). It uses the
existing bpf cgroups infrastructure so the programs can be attached per
cgroup with full inheritance support. The program will be called at
appropriate times to set relevant connections parameters such as buffer
sizes, SYN and SYN-ACK RTOs, etc., based on connection information such
as IP addresses, port numbers, etc.

Alghough there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
distinct advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it oes not require
application changes and it can be updated easily at any time.

Although the bpf cgroup framework already contains a sock related
program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type
(BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called
only once during the connections's lifetime. In contrast, the new
program type will be called multiple times from different places in the
network stack code.  For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set
congestion control, etc. As a result it has "op" field to specify the
type of operation requested.

The purpose of this new program type is to simplify setting connection
parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
easy to use facebook's internal IPv6 addresses to determine if both hosts
of a connection are in the same datacenter. Therefore, it is easy to
write a BPF program to choose a small SYN RTO value when both hosts are
in the same datacenter.

This patch only contains the framework to support the new BPF program
type, following patches add the functionality to set various connection
parameters.

This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
and a new bpf syscall command to load a new program of this type:
BPF_PROG_LOAD_SOCKET_OPS.

Two new corresponding structs (one for the kernel one for the user/BPF
program):

/* kernel version */
struct bpf_sock_ops_kern {
        struct sock *sk;
        __u32  op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
};

/* user version
 * Some fields are in network byte order reflecting the sock struct
 * Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to
 * convert them to host byte order.
 */
struct bpf_sock_ops {
        __u32 op;
        union {
                __u32 reply;
                __u32 replylong[4];
        };
        __u32 family;
        __u32 remote_ip4;     /* In network byte order */
        __u32 local_ip4;      /* In network byte order */
        __u32 remote_ip6[4];  /* In network byte order */
        __u32 local_ip6[4];   /* In network byte order */
        __u32 remote_port;    /* In network byte order */
        __u32 local_port;     /* In host byte horder */
};

Currently there are two types of ops. The first type expects the BPF
program to return a value which is then used by the caller (or a
negative value to indicate the operation is not supported). The second
type expects state changes to be done by the BPF program, for example
through a setsockopt BPF helper function, and they ignore the return
value.

The reply fields of the bpf_sockt_ops struct are there in case a bpf
program needs to return a value larger than an integer.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01 16:15:13 -07:00
Martin KaFai Lau
a8744f2528 bpf: Add test for syscall on fd array/htab lookup
Checks are added to the existing sockex3 and test_map_in_map test.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-29 13:13:26 -04:00
Yonghong Song
00a3855d68 samples/bpf: fix a build problem
tracex5_kern.c build failed with the following error message:
  ../samples/bpf/tracex5_kern.c:12:10: fatal error: 'syscall_nrs.h' file not found
  #include "syscall_nrs.h"
The generated file syscall_nrs.h is put in build/samples/bpf directory,
but this directory is not in include path, hence build failed.

The fix is to add $(obj) into the clang compilation path.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-22 11:35:19 -04:00
David Daney
4b7190e841 samples/bpf: Fix tracex5 to work with MIPS syscalls.
There are two problems:

1) In MIPS the __NR_* macros expand to an expression, this causes the
   sections of the object file to be named like:

  .
  .
  .
  [ 5] kprobe/(5000 + 1) PROGBITS        0000000000000000 000160 ...
  [ 6] kprobe/(5000 + 0) PROGBITS        0000000000000000 000258 ...
  [ 7] kprobe/(5000 + 9) PROGBITS        0000000000000000 000348 ...
  .
  .
  .

The fix here is to use the "asm_offsets" trick to evaluate the macros
in the C compiler and generate a header file with a usable form of the
macros.

2) MIPS syscall numbers start at 5000, so we need a bigger map to hold
the sub-programs.

Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14 15:03:23 -04:00
David Daney
c1932cdb27 bpf: Add MIPS support to samples/bpf.
Signed-off-by: David Daney <david.daney@cavium.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-14 15:03:22 -04:00
Teng Qin
41e9a8046c samples/bpf: add tests for more perf event types
$ trace_event

tests attaching BPF program to HW_CPU_CYCLES, SW_CPU_CLOCK, HW_CACHE_L1D and other events.
It runs 'dd' in the background while bpf program collects user and kernel
stack trace on counter overflow.
User space expects to see sys_read and sys_write in the kernel stack.

$ tracex6

tests reading of various perf counters from BPF program.

Both tests were refactored to increase coverage and be more accurate.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-04 21:58:15 -04:00
Jesper Dangaard Brouer
7bc57950bd samples/bpf: bpf_load.c order of prog_fd[] should correspond with ELF order
An eBPF ELF file generated with LLVM can contain several program
section, which can be used for bpf tail calls.  The bpf prog file
descriptors are accessible via array prog_fd[].

At-least XDP samples assume ordering, and uses prog_fd[0] is the main
XDP program to attach.  The actual order of array prog_fd[] depend on
whether or not a bpf program section is referencing any maps or not.
Not using a map result in being loaded/processed after all other
prog section.  Thus, this can lead to some very strange and hard to
debug situation, as the user can only see a FD and cannot correlated
that with the ELF section name.

The fix is rather simple, and even removes duplicate memcmp code.
Simply load program sections as the last step, instead of
load_and_attach while processing the relocation section.

When working with tail calls, it become even more essential that the
order of prog_fd[] is consistant, like the current dependency of the
map_fd[] order.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-31 12:59:20 -04:00
Andy Gospodarek
ad990dbe6d samples/bpf: run cleanup routines when receiving SIGTERM
Shahid Habib noticed that when xdp1 was killed from a different console the xdp
program was not cleaned-up properly in the kernel and it continued to forward
traffic.

Most of the applications in samples/bpf cleanup properly, but only when getting
SIGINT.  Since kill defaults to using SIGTERM, add support to cleanup when the
application receives either SIGINT or SIGTERM.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Reported-by: Shahid Habib <shahid.habib@broadcom.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11 21:43:30 -04:00
Daniel Borkmann
0489df9a43 xdp: add flag to enforce driver mode
After commit b5cdae3291 ("net: Generic XDP") we automatically fall
back to a generic XDP variant if the driver does not support native
XDP. Allow for an option where the user can specify that always the
native XDP variant should be selected and in case it's not supported
by a driver, just bail out.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-11 21:30:57 -04:00
Jesper Dangaard Brouer
9178b4c17d samples/bpf: export map_data[] for more info on maps
Giving *_user.c side tools access to map_data[] provides easier
access to information on the maps being loaded.  Still provide
the guarantee that the order maps are being defined in inside the
_kern.c file corresponds with the order in the array.  Now user
tools are not blind, but can inspect and verify the maps that got
loaded from the ELF binary.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-03 09:30:24 -04:00
Jesper Dangaard Brouer
6979bcc731 samples/bpf: load_bpf.c make callback fixup more flexible
Do this change before others start to use this callback.
Change map_perf_test_user.c which seems to be the only user.

This patch extends capabilities of commit 9fd63d05f3 ("bpf:
Allow bpf sample programs (*_user.c) to change bpf_map_def").

Give fixup callback access to struct bpf_map_data, instead of
only stuct bpf_map_def.  This add flexibility to allow userspace
to reassign the map file descriptor.  This is very useful when
wanting to share maps between several bpf programs.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-03 09:30:24 -04:00
Jesper Dangaard Brouer
156450d9d9 samples/bpf: make bpf_load.c code compatible with ELF maps section changes
This patch does proper parsing of the ELF "maps" section, in-order to
be both backwards and forwards compatible with changes to the map
definition struct bpf_map_def, which gets compiled into the ELF file.

The assumption is that new features with value zero, means that they
are not in-use.  For backward compatibility where loading an ELF file
with a smaller struct bpf_map_def, only copy objects ELF size, leaving
rest of loaders struct zero.  For forward compatibility where ELF file
have a larger struct bpf_map_def, only copy loaders own struct size
and verify that rest of the larger struct is zero, assuming this means
the newer feature was not activated, thus it should be safe for this
older loader to load this newer ELF file.

Fixes: fb30d4b712 ("bpf: Add tests for map-in-map")
Fixes: 409526bea3c3 ("samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-03 09:30:24 -04:00
Jesper Dangaard Brouer
55de170382 samples/bpf: adjust rlimit RLIMIT_MEMLOCK for traceex2, tracex3 and tracex4
Needed to adjust max locked memory RLIMIT_MEMLOCK for testing these bpf samples
as these are using more and larger maps than can fit in distro default 64Kbytes limit.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-03 09:30:23 -04:00
Daniel Borkmann
eb6211d360 bpf, samples: fix build warning in cookie_uid_helper_example
Fix the following warnings triggered by 51570a5ab2 ("A Sample of
using socket cookie and uid for traffic monitoring"):

  In file included from /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:54:0:
  /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c: In function 'prog_load':
  /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:119:27: warning: overflow in implicit constant conversion [-Woverflow]
     -32 + offsetof(struct stats, uid)),
                           ^
  /home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
   .off   = OFF,     \
            ^
  /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:121:27: warning: overflow in implicit constant conversion [-Woverflow]
     -32 + offsetof(struct stats, packets), 1),
                           ^
  /home/foo/net-next/samples/bpf/libbpf.h:155:12: note: in definition of macro 'BPF_ST_MEM'
   .off   = OFF,     \
            ^
  /home/foo/net-next/samples/bpf/cookie_uid_helper_example.c:129:27: warning: overflow in implicit constant conversion [-Woverflow]
     -32 + offsetof(struct stats, bytes)),
                           ^
  /home/foo/net-next/samples/bpf/libbpf.h:135:12: note: in definition of macro 'BPF_STX_MEM'
   .off   = OFF,     \
            ^
  HOSTLD  /home/foo/net-next/samples/bpf/per_socket_stats_example

Fixes: 51570a5ab2 ("A Sample of using socket cookie and uid for traffic monitoring")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-02 09:29:35 -04:00
Jesper Dangaard Brouer
f76254a845 samples/bpf: fix XDP_FLAGS_SKB_MODE detach for xdp_tx_iptunnel
The xdp_tx_iptunnel program can be terminated in two ways, after
N-seconds or via Ctrl-C SIGINT.  The SIGINT code path does not
handle detatching the correct XDP program, in-case the program
was attached with XDP_FLAGS_SKB_MODE.

Fix this by storing the XDP flags as a global variable, which is
available for the SIGINT handler function.

Fixes: 3993f2cb98 ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:42:37 -04:00
Jesper Dangaard Brouer
6387d0111c samples/bpf: fix SKB_MODE flag to be a 32-bit unsigned int
The kernel side of XDP_FLAGS_SKB_MODE is unsigned, and the rtnetlink
IFLA_XDP_FLAGS is defined as NLA_U32. Thus, userspace programs under
samples/bpf/ should use the correct type.

Fixes: 3993f2cb98 ("samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-01 10:42:37 -04:00
Jesper Dangaard Brouer
5010e94842 samples/bpf: bpf_load.c detect and abort if ELF maps section size is wrong
The struct bpf_map_def was extended in commit fb30d4b712 ("bpf: Add tests
for map-in-map") with member unsigned int inner_map_idx.  This changed the size
of the maps section in the generated ELF _kern.o files.

Unfortunately the loader in bpf_load.c does not detect or handle this.  Thus,
older _kern.o files became incompatible, and caused hard-to-debug errors
where the syscall validation rejected BPF_MAP_CREATE request.

This patch only detect the situation and aborts load_bpf_file(). It also
add code comments warning people that read this loader for inspiration
for these pitfalls.

Fixes: fb30d4b712 ("bpf: Add tests for map-in-map")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-30 22:41:59 -04:00
David Ahern
3993f2cb98 samples/bpf: Add support for SKB_MODE to xdp1 and xdp_tx_iptunnel
Add option to xdp1 and xdp_tx_iptunnel to insert xdp program in
SKB_MODE:
 - update set_link_xdp_fd to take a flags argument that is added to the
   RTM_SETLINK message

 - Add -S option to xdp1 and xdp_tx_iptunnel user code. When passed in
   XDP_FLAGS_SKB_MODE is set in the flags arg passed to set_link_xdp_fd

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-27 12:49:26 -04:00
Alexander Alemayhu
dfc5be0dc0 samples/bpf: check before defining offsetof
Fixes the following warning

samples/bpf/test_lru_dist.c:28:0: warning: "offsetof" redefined
 #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)

In file included from ./tools/lib/bpf/bpf.h:25:0,
                 from samples/bpf/libbpf.h:5,
                 from samples/bpf/test_lru_dist.c:24:
/usr/lib/gcc/x86_64-redhat-linux/6.3.1/include/stddef.h:417:0: note: this is the location of the previous definition
 #define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER)

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 16:20:19 -04:00
Alexander Alemayhu
4784726f69 samples/bpf: add static to function with no prototype
Fixes the following warning

samples/bpf/cookie_uid_helper_example.c: At top level:
samples/bpf/cookie_uid_helper_example.c:276:6: warning: no previous prototype for ‘finish’ [-Wmissing-prototypes]
 void finish(int ret)
      ^~~~~~
  HOSTLD  samples/bpf/per_socket_stats_example

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 16:20:19 -04:00
Alexander Alemayhu
69b6a7f743 samples/bpf: add -Wno-unknown-warning-option to clang
I was initially going to remove '-Wno-address-of-packed-member' because I
thought it was not supposed to be there but Daniel suggested using
'-Wno-unknown-warning-option'.

This silences several warnings similiar to the one below

warning: unknown warning option '-Wno-address-of-packed-member' [-Wunknown-warning-option]
1 warning generated.
clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.3.1/include -I./arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I./include
 -I./arch/x86/include/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h  \
        -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
        -Wno-compare-distinct-pointer-types \
        -Wno-gnu-variable-sized-type-not-at-end \
        -Wno-address-of-packed-member -Wno-tautological-compare \
        -O2 -emit-llvm -c samples/bpf/xdp_tx_iptunnel_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/xdp_tx_iptunnel_kern.o

$ clang --version

 clang version 3.9.1 (tags/RELEASE_391/final)
 Target: x86_64-unknown-linux-gnu
 Thread model: posix
 InstalledDir: /usr/bin

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-24 16:20:19 -04:00
David S. Miller
b0c47807d3 bpf: Add sparc support to tools and samples.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
2017-04-22 13:01:52 -07:00
Martin KaFai Lau
3a5795b83d bpf: lru: Add map-in-map LRU example
This patch adds a map-in-map LRU example.
If we know only a subset of cores will use the
LRU, we can allocate a common LRU list per targeting core
and store it into an array-of-hashs.

It allows using the common LRU map with map-update performance
comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory
on the unused cores that we know they will never access the LRU map.

BPF_F_NO_COMMON_LRU:
> map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
9234314 (9.23M/s)

map-in-map LRU:
> map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}'
9962743 (9.96M/s)

Notes that the max_entries for the map-in-map LRU test is 1260000 which
is the max_entries for each inner LRU map.  8 processes have been
started, so 8 * 1260000 = 10080000 (~10M) which is close to what is
used in the BPF_F_NO_COMMON_LRU test.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:55:52 -04:00
Martin KaFai Lau
9fd63d05f3 bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def
The current bpf_map_def is statically defined during compile
time.  This patch allows the *_user.c program to change it during
runtime.  It is done by adding load_bpf_file_fixup_map() which
takes a callback.  The callback will be called before creating
each map so that it has a chance to modify the bpf_map_def.

The current usecase is to change max_entries in map_perf_test.
It is interesting to test with a much bigger map size in
some cases (e.g. the following patch on bpf_lru_map.c).
However,  it is hard to find one size to fit all testing
environment.  Hence, it is handy to take the max_entries
as a cmdline arg and then configure the bpf_map_def during
runtime.

This patch adds two cmdline args.  One is to configure
the map's max_entries.  Another is to configure the max_cnt
which controls how many times a syscall is called.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:55:52 -04:00
Martin KaFai Lau
bf8db5d243 bpf: lru: Refactor LRU map tests in map_perf_test
One more LRU test will be added later in this patch series.
In this patch, we first move all existing LRU map tests into
a single syscall (connect) first so that the future new
LRU test can be added without hunting another syscall.

One of the map name is also changed from percpu_lru_hash_map
to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-17 13:55:52 -04:00
Chenbo Feng
00f660eaf3 Sample program using SO_COOKIE
Added a per socket traffic monitoring option to illustrate the usage
of new getsockopt SO_COOKIE. The program is based on the socket traffic
monitoring program using xt_eBPF and in the new option the data entry
can be directly accessed using socket cookie. The cookie retrieved
allow us to lookup an element in the eBPF for a specific socket.

Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08 08:07:01 -07:00
Chenbo Feng
51570a5ab2 A Sample of using socket cookie and uid for traffic monitoring
Add a sample program to demostrate the possible usage of
get_socket_cookie and get_socket_uid helper function. The program will
store bytes and packets counting of in/out traffic monitored by iptables
and store the stats in a bpf map in per socket base. The owner uid of
the socket will be stored as part of the data entry. A shell script for
running the program is also included.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23 17:01:57 -07:00
Martin KaFai Lau
fb30d4b712 bpf: Add tests for map-in-map
Test cases for array of maps and hash of maps.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 15:45:45 -07:00
Alexei Starovoitov
95ff141e52 samples/bpf: add map_lookup microbenchmark
$ map_perf_test 128
speed of HASH bpf_map_lookup_elem() in lookups per second
	w/o JIT		w/JIT
before	46M		58M
after	42M		74M

perf report
before:
    54.23%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    14.24%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     8.84%  map_perf_test  [kernel.kallsyms]  [k] htab_map_lookup_elem
     5.93%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     2.30%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.49%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    60.03%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    18.07%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     2.91%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.94%  map_perf_test  [kernel.kallsyms]  [k] _einittext
     1.90%  map_perf_test  [kernel.kallsyms]  [k] __audit_syscall_exit
     1.72%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

Notice that bpf_map_lookup_elem() and htab_map_lookup_elem() are trivial
functions, yet they take sizeable amount of cpu time.
htab_map_gen_lookup() removes bpf_map_lookup_elem() and converts
htab_map_lookup_elem() into three BPF insns which causing cpu time
for bpf_prog_da4fc6a3f41761a2() slightly increase.

$ map_perf_test 256
speed of ARRAY bpf_map_lookup_elem() in lookups per second
	w/o JIT		w/JIT
before	97M		174M
after	64M		280M

before:
    37.33%  map_perf_test  [kernel.kallsyms]  [k] array_map_lookup_elem
    13.95%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     6.54%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     4.57%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    32.86%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     6.54%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

array_map_gen_lookup() removes calls to array_map_lookup_elem()
and bpf_map_lookup_elem() and replaces them with 7 bpf insns.

The performance without JIT is slower, since executing extra insns
in the interpreter is slower than running native C code,
but with JIT the performance gains are obvious,
since native C->x86 code is replaced with fewer bpf->x86 instructions.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 20:44:12 -07:00
Linus Torvalds
3051bf36c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
      Varadhan.

   2) Simplify classifier state on sk_buff in order to shrink it a bit.
      From Willem de Bruijn.

   3) Introduce SIPHASH and it's usage for secure sequence numbers and
      syncookies. From Jason A. Donenfeld.

   4) Reduce CPU usage for ICMP replies we are going to limit or
      suppress, from Jesper Dangaard Brouer.

   5) Introduce Shared Memory Communications socket layer, from Ursula
      Braun.

   6) Add RACK loss detection and allow it to actually trigger fast
      recovery instead of just assisting after other algorithms have
      triggered it. From Yuchung Cheng.

   7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

   8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

   9) Export MPLS packet stats via netlink, from Robert Shearman.

  10) Significantly improve inet port bind conflict handling, especially
      when an application is restarted and changes it's setting of
      reuseport. From Josef Bacik.

  11) Implement TX batching in vhost_net, from Jason Wang.

  12) Extend the dummy device so that VF (virtual function) features,
      such as configuration, can be more easily tested. From Phil
      Sutter.

  13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
      Dumazet.

  14) Add new bpf MAP, implementing a longest prefix match trie. From
      Daniel Mack.

  15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

  16) Add new aquantia driver, from David VomLehn.

  17) Add bpf tracepoints, from Daniel Borkmann.

  18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
      Florian Fainelli.

  19) Remove custom busy polling in many drivers, it is done in the core
      networking since 4.5 times. From Eric Dumazet.

  20) Support XDP adjust_head in virtio_net, from John Fastabend.

  21) Fix several major holes in neighbour entry confirmation, from
      Julian Anastasov.

  22) Add XDP support to bnxt_en driver, from Michael Chan.

  23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

  24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

  25) Support GRO in IPSEC protocols, from Steffen Klassert"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
  Revert "ath10k: Search SMBIOS for OEM board file extension"
  net: socket: fix recvmmsg not returning error from sock_error
  bnxt_en: use eth_hw_addr_random()
  bpf: fix unlocking of jited image when module ronx not set
  arch: add ARCH_HAS_SET_MEMORY config
  net: napi_watchdog() can use napi_schedule_irqoff()
  tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
  net/hsr: use eth_hw_addr_random()
  net: mvpp2: enable building on 64-bit platforms
  net: mvpp2: switch to build_skb() in the RX path
  net: mvpp2: simplify MVPP2_PRS_RI_* definitions
  net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
  net: mvpp2: remove unused register definitions
  net: mvpp2: simplify mvpp2_bm_bufs_add()
  net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
  net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
  net: mvpp2: release reference to txq_cpu[] entry after unmapping
  net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
  net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
  net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
  ...
2017-02-22 10:15:09 -08:00
Linus Torvalds
7f4eb0a6d5 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "On the kernel side the main changes in this cycle were:

   - Add Intel Kaby Lake CPU support (Srinivas Pandruvada)

   - AMD uncore driver updates for fam17 (Janakarajan Natarajan)

   - Intel/PT updates and core events optimizations and cleanups
     (Alexander Shishkin)

   - cgroups events fixes (David Carrillo-Cisneros)

   - kprobes improvements (Masami Hiramatsu)

   - ... plus misc fixes and updates.

  On the tooling side the main changes were:

   - Support clang build in tools/{perf,lib/{bpf,traceevent,api}} with
     CC=clang, to, for instance, take advantage of better warnings
     (Arnaldo Carvalho de Melo):

   - Introduce the 'delta-abs' 'perf diff' compute method, that orders
     the histogram entries by the absolute value of the percentage delta
     for a function in two perf.data files, i.e. the functions that
     changed the most (increase or decrease in samples) comes first
     (Namhyung Kim)

   - Add support for parsing Intel uncore vendor event files and add
     uncore vendor events for the Intel server processors (Haswell,
     Broadwell, IvyBridge), Xeon Phi (Knights Landing) and Broadwell DE
     (Andi Kleen)

   - Introduce 'perf ftrace' a perf front end to the kernel's ftrace
     function and function_graph tracer, defaulting to the
     "function_graph" tracer, more work will be done in reviving this
     effort, forward porting it from its initial patch submission
     (Namhyung Kim)

   - Add 'e' and 'c' hotkeys to expand/collapse call chains for a single
     hist entry in the 'perf report' and 'perf top' TUI (Jiri Olsa)

   - Account thread wait time (off CPU time) separately: sleep, iowait
     and preempt, based on the prev_state of the last event, show the
     breakdown when using "perf sched timehist --state" (Namhyumg Kim)

   - Add more triggers to switch the output file (perf.data.TIMESTAMP).

     Now, in addition to switching to a different output file when
     receiving a SIGUSR2, one can also specify file size and time based
     triggers:

           perf record -a --switch-output=signal

     is equivalent to what we had before:

           perf record -a --switch-output

     While we can also ask for the file to be "sliced" by size, taking
     into account that that will happen only when we get woken up by the
     kernel, i.e. one has to take into account the --mmap-pages (the
     size of the perf mmap ring buffer):

           perf record -a --switch-output=2G

     will break the perf.data output into multiple files limited to 2GB
     of samples, right when generating the output.

     For time based samples, alert() will be used, so to have 1 minute
     limited perf.data output files:

          perf record -a --switch-output=1m

     (Jiri Olsa)

   - Improve 'perf trace' (Arnaldo Carvalho de Melo)

   - 'perf kallsyms' toy tool to look for extended symbol information on
     the running kernel and demonstrate the machine/thread/symbol APIs
     for use in other tools, such as 'perf probe' (Arnaldo Carvalho de
     Melo)

   - ... plus tons of other changes, see the shortlog and Git log for
     details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (131 commits)
  perf tools: Add missing parse_events_error() prototype
  perf pmu: Fix check for unset alias->unit array
  perf tools: Be consistent on the type of map->symbols[] interator
  perf intel pt decoder: clang has no -Wno-override-init
  perf evsel: Do not put a variable sized type not at the end of a struct
  perf probe: Avoid accessing uninitialized 'map' variable
  perf tools: Do not put a variable sized type not at the end of a struct
  perf record: Do not put a variable sized type not at the end of a struct
  perf tests: Synthesize struct instead of using field after variable sized type
  perf bench numa: Make sure dprintf() is not defined
  Revert "perf bench futex: Sanitize numeric parameters"
  tools lib subcmd: Make it an error to pass a signed value to OPTION_UINTEGER
  tools: Set the maximum optimization level according to the compiler being used
  tools: Suppress request for warning options not existent in clang
  samples/bpf: Reset global variables
  samples/bpf: Ignore already processed ELF sections
  samples/bpf: Add missing header
  perf symbols: dso->name is an array, no need to check it against NULL
  perf tests record: No need to test an array against NULL
  perf symbols: No need to check if sym->name is NULL
  ...
2017-02-20 12:21:13 -08:00
David S. Miller
3f64116a83 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-02-16 19:34:01 -05:00
Mickaël Salaün
a734fb5d60 samples/bpf: Reset global variables
Before loading a new ELF, clean previous kernel version, license and
processed sections.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170208202744.16274-3-mic@digikod.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-13 17:22:53 -03:00
Mickaël Salaün
16ad132900 samples/bpf: Ignore already processed ELF sections
Add a missing check for the map fixup loop.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170208202744.16274-2-mic@digikod.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-13 17:22:47 -03:00
Mickaël Salaün
af392a8f53 samples/bpf: Add missing header
Include unistd.h to define __NR_getuid and __NR_getsid.

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Joe Stringer <joe@ovn.org>
Acked-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20170208202744.16274-4-mic@digikod.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-02-13 17:22:35 -03:00
Alexei Starovoitov
7f67763337 bpf: introduce BPF_F_ALLOW_OVERRIDE flag
If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
to the given cgroup the descendent cgroup will be able to override
effective bpf program that was inherited from this cgroup.
By default it's not passed, therefore override is disallowed.

Examples:
1.
prog X attached to /A with default
prog Y fails to attach to /A/B and /A/B/C
Everything under /A runs prog X

2.
prog X attached to /A with allow_override.
prog Y fails to attach to /A/B with default (non-override)
prog M attached to /A/B with allow_override.
Everything under /A/B runs prog M only.

3.
prog X attached to /A with allow_override.
prog Y fails to attach to /A with default.
The user has to detach first to switch the mode.

In the future this behavior may be extended with a chain of
non-overridable programs.

Also fix the bug where detach from cgroup where nothing is attached
was not throwing error. Return ENOENT in such case.

Add several testcases and adjust libbpf.

Fixes: 3007098494 ("cgroup: add support for eBPF programs")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Daniel Mack <daniel@zonque.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-12 21:52:19 -05:00
David S. Miller
4e8f2fc1a5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two trivial overlapping changes conflicts in MPLS and mlx5.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-28 10:33:06 -05:00
David Herrmann
b8a943e294 samples/bpf: add lpm-trie benchmark
Extend the map_perf_test_{user,kern}.c infrastructure to stress test
lpm-trie lookups. We hook into the kprobe on sys_gettid() and measure
the latency depending on trie size and lookup count.

On my Intel Haswell i7-6400U, a single gettid() syscall with an empty
bpf program takes roughly 6.5us on my system. Lookups in empty tries
take ~1.8us on first try, ~0.9us on retries. Lookups in tries with 8192
entries take ~7.1us (on the first _and_ any subsequent try).

Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Daniel Mack <daniel@zonque.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-23 16:10:38 -05:00
Jesper Dangaard Brouer
cdb749cef1 bpf: fix samples xdp_tx_iptunnel and tc_l2_redirect with fake KBUILD_MODNAME
Fix build errors for samples/bpf xdp_tx_iptunnel and tc_l2_redirect,
when dynamic debugging is enabled (CONFIG_DYNAMIC_DEBUG) by defining a
fake KBUILD_MODNAME.

Just like Daniel Borkmann fixed other samples/bpf in commit
96a8eb1eee ("bpf: fix samples to add fake KBUILD_MODNAME").

Fixes: 12d8bb64e3 ("bpf: xdp: Add XDP example for head adjustment")
Fixes: 90e02896f1 ("bpf: Add test for bpf_redirect to ipip/ip6tnl")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-20 12:04:07 -05:00
Ingo Molnar
4e06d4f083 perf/urgent fixes and one improvement:
Fixes:
 
 - Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)
 
 - Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)
 
 - Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)
 
 - Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)
 
 - Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)
 
 - 'perf probe' fixes for cross arch probing (Masami Hiramatsu)
 
 Improvement:
 
 - Show total scheduling time in 'perf sched timehist' (Namhyumg Kim)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYbS6KAAoJENZQFvNTUqpAWqkP/1xdvbDqeTA2YjcWG+ffQJye
 LULSNQpRShvBGpB3zsOzOebOtw0R5oWb7YAp4JFZNyMIk5qcnS9gVwdw9Qx1lc/W
 Gq8OuhI28Pr9AHE37eTJ1+12HI2aAOylwKFqUpEWMzoaUqBDRh5JV2MeVGVQXBRe
 YBJ4Qwk1a7Ng74uzN/pdiC6V7pjddPEoLZEDG5p35vrgtNAY1JGlX5rvNR7CO6nN
 72QHRnB4r1d9fgNGS5oO5zEU1q5+JN5phe+gDANdk/8pCqDegRjZpHnj5gcm71jr
 mEvbstmlV1XPZSA9aFVu2L5LZuv6asS02HjrbGydqO2DUTpDgrynHsjAuqqaTnwk
 kLGwA2I+Kyh8g155H+HrQANONe2TTOXRkBVWIoFBWkhd1JUS/p3GE2qvqx5N36sr
 U26d9i/Bk/pbOYOdRUFMc4lckZY2HrrRWj8dVr06qLr2rlbYoLv8hpqvdyHr9MJS
 Id2Yqqd1JVzBDU3/+GfdFDBjltVtpSG1mdc6LWy9IpXFMJoSOQNsrbjf+mgMadeV
 EbtY8MdDZ525EB8niv9E9Vfd56LiWTKCjg9Mc758+U+Ns50YlQFsDq2hBO2+jQ/k
 W3QNX+WJiLgxqQK8adveXoYgCSWPNVttIvg8Lj1KNk8Y//vUC1KSrvDfaKiROzLB
 iF0u5mLRAQ+Pfec6X5PP
 =o9KL
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo-4.10-20170104' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes and one improvement from Arnaldo Carvalho de Melo:

Fixes:

  - Fix prev/next_prio formatting for deadline tasks in libtraceevent (Daniel Bristot de Oliveira)

  - Robustify reading of build-ids from /sys/kernel/note (Arnaldo Carvalho de Melo)

  - Fix building some sample/bpf in Alpine Linux 3.4 (Arnaldo Carvalho de Melo)

  - Fix 'make install-bin' to install libtraceevent plugins (Arnaldo Carvalho de Melo)

  - Fix 'perf record --switch-output' documentation and comment (Jiri Olsa)

  - Fix 'perf probe' for cross arch probing (Masami Hiramatsu)

Improvement:

  - Show total scheduling time in 'perf sched timehist' (Namhyumg Kim)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-01-05 08:33:02 +01:00
Arnaldo Carvalho de Melo
b6f4c66704 samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3awp0nv8tpnblatojmwjww7z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-28 10:47:13 -03:00
Arnaldo Carvalho de Melo
ee12996c9d samples/bpf sock_example: Avoid getting ethhdr from two includes
To avoid the following build failure on Alpine Linux 3.4, that has
clang-3.8 with the bpf target:

    HOSTCC  samples/bpf/sock_example.o
  In file included from /usr/include/net/ethernet.h:10:0,
                   from /git/linux/samples/bpf/sock_example.h:7,
                   from /git/linux/samples/bpf/sock_example.c:30:
  /usr/include/netinet/if_ether.h:96:8: error: redefinition of 'struct
  ethhdr'
   struct ethhdr {
          ^
  In file included from /git/linux/samples/bpf/sock_example.c:26:0:
  ./usr/include/linux/if_ether.h:144:8: note: originally defined here
   struct ethhdr {
          ^
  scripts/Makefile.host:124: recipe for target
  'samples/bpf/sock_example.o' failed
  make[2]: *** [samples/bpf/sock_example.o] Error 1
  /git/linux/Makefile:1658: recipe for target 'samples/bpf/' failed

So include net/if_ether.h for the needs of sock_example.h, using the
same include that sock_example.c uses.

Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-m9avekl1b651qe1r1zd5tzz9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-27 21:49:17 -03:00
Linus Torvalds
00198dab3b Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
  plus on the tooling side there's a number of fixes and some late
  updates"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  perf sched timehist: Fix invalid period calculation
  perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
  perf sched timehist: Enlarge default 'comm_width'
  perf sched timehist: Honour 'comm_width' when aligning the headers
  perf/x86: Fix overlap counter scheduling bug
  perf/x86/pebs: Fix handling of PEBS buffer overflows
  samples/bpf: Move open_raw_sock to separate header
  samples/bpf: Remove perf_event_open() declaration
  samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
  tools lib bpf: Add bpf_prog_{attach,detach}
  samples/bpf: Switch over to libbpf
  perf diff: Do not overwrite valid build id
  perf annotate: Don't throw error for zero length symbols
  perf bench futex: Fix lock-pi help string
  perf trace: Check if MAP_32BIT is defined (again)
  samples/bpf: Make perf_event_read() static
  uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
  samples/bpf: Make samples more libbpf-centric
  tools lib bpf: Add flags to bpf_create_map()
  tools lib bpf: use __u32 from linux/types.h
  ...
2016-12-23 16:49:12 -08:00
Joe Stringer
9899694a7f samples/bpf: Move open_raw_sock to separate header
This function was declared in libbpf.c and was the only remaining
function in this library, but has nothing to do with BPF. Shift it out
into a new header, sock_example.h, and include it from the relevant
samples.

Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161209024620.31660-8-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:40 -03:00
Joe Stringer
205c8ada31 samples/bpf: Remove perf_event_open() declaration
This declaration was made in samples/bpf/libbpf.c for convenience, but
there's already one in tools/perf/perf-sys.h. Reuse that one.

Committer notes:

Testing it:

  $ make -j4 O=../build/v4.9.0-rc8+ samples/bpf/
  make[1]: Entering directory '/home/build/v4.9.0-rc8+'
    CHK     include/config/kernel.release
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /home/acme/git/linux as source for kernel
    CHK     include/generated/utsrelease.h
    CHK     include/generated/timeconst.h
    CHK     include/generated/bounds.h
    CHK     include/generated/asm-offsets.h
    CALL    /home/acme/git/linux/scripts/checksyscalls.sh
    HOSTCC  samples/bpf/test_verifier.o
    HOSTCC  samples/bpf/libbpf.o
    HOSTCC  samples/bpf/../../tools/lib/bpf/bpf.o
    HOSTCC  samples/bpf/test_maps.o
    HOSTCC  samples/bpf/sock_example.o
    HOSTCC  samples/bpf/bpf_load.o
<SNIP>
    HOSTLD  samples/bpf/trace_event
    HOSTLD  samples/bpf/sampleip
    HOSTLD  samples/bpf/tc_l2_redirect
  make[1]: Leaving directory '/home/build/v4.9.0-rc8+'
  $

Also tested the offwaketime resulting from the rebuild, seems to work as
before.

Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161209024620.31660-7-joe@ovn.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:40 -03:00
Arnaldo Carvalho de Melo
811b4f0d78 samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
Only one of the examples declare the bpf_insn bpf proggie as a const:

  $ grep 'struct bpf_insn [a-z]' samples/bpf/*.c
  samples/bpf/fds_example.c:	static const struct bpf_insn insns[] = {
  samples/bpf/sock_example.c:	struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_attach2.c:	struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_attach.c:	struct bpf_insn prog[] = {
  samples/bpf/test_cgrp2_sock.c:	struct bpf_insn prog[] = {
  $

Which causes this warning:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  <SNIP>
     HOSTCC  samples/bpf/fds_example.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        insns, insns_cnt, "GPL", 0,
        ^~~~~
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                   from /git/linux/samples/bpf/bpf_load.h:4,
                   from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
   int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
       ^~~~~~~~~~~~~~~~
    HOSTCC  samples/bpf/sockex1_user.o

So just ditch that 'const' to reduce build noise, leaving changing the
bpf_load_program() bpf_insn parameter to const to a later patch, if deemed
adequate.

Cc: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-1z5xee8n3oa66jf62bpv16ed@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:39 -03:00
Joe Stringer
5dc880de6e tools lib bpf: Add bpf_prog_{attach,detach}
Commit d8c5b17f2b ("samples: bpf: add userspace example for attaching
eBPF programs to cgroups") added these functions to samples/libbpf, but
during this merge all of the samples libbpf functionality is shifting to
tools/lib/bpf. Shift these functions there.

Committer notes:

Use bzero + attr.FIELD = value instead of 'attr = { .FIELD = value, just
like the other wrapper calls to sys_bpf with bpf_attr to make this build
in older toolchais, such as the ones in CentOS 5 and 6.

Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-au2zvtsh55vqeo3v3uw7jr4c@git.kernel.org
Link: 353e6f298c.patch
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:39 -03:00
Joe Stringer
43371c83f3 samples/bpf: Switch over to libbpf
Now that libbpf under tools/lib/bpf/* is synced with the version from
samples/bpf, we can get rid most of the libbpf library here.

Committer notes:

Built it in a docker fedora rawhide container and ran it in the f25 host, seems
to work just like it did before this patch, i.e. the switch to tools/lib/bpf/
doesn't seem to have introduced problems and Joe said he tested it with
all the entries in samples/bpf/ and other code he found:

  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux headers_install
  <SNIP>
  [root@f5065a7d6272 linux]# rm -rf /tmp/build/linux/samples/bpf/
  [root@f5065a7d6272 linux]# make -j4 O=/tmp/build/linux samples/bpf/
  make[1]: Entering directory '/tmp/build/linux'
    CHK     include/config/kernel.release
    HOSTCC  scripts/basic/fixdep
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel
    CHK     include/generated/utsrelease.h
    HOSTCC  scripts/basic/bin2c
    HOSTCC  arch/x86/tools/relocs_32.o
    HOSTCC  arch/x86/tools/relocs_64.o
    LD      samples/bpf/built-in.o
  <SNIP>
    HOSTCC  samples/bpf/fds_example.o
    HOSTCC  samples/bpf/sockex1_user.o
  /git/linux/samples/bpf/fds_example.c: In function 'bpf_prog_create':
  /git/linux/samples/bpf/fds_example.c:63:6: warning: passing argument 2 of 'bpf_load_program' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
        insns, insns_cnt, "GPL", 0,
        ^~~~~
  In file included from /git/linux/samples/bpf/libbpf.h:5:0,
                   from /git/linux/samples/bpf/bpf_load.h:4,
                   from /git/linux/samples/bpf/fds_example.c:15:
  /git/linux/tools/lib/bpf/bpf.h:31:5: note: expected 'struct bpf_insn *' but argument is of type 'const struct bpf_insn *'
   int bpf_load_program(enum bpf_prog_type type, struct bpf_insn *insns,
       ^~~~~~~~~~~~~~~~
    HOSTCC  samples/bpf/sockex2_user.o
  <SNIP>
    HOSTCC  samples/bpf/xdp_tx_iptunnel_user.o
  clang  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/6.2.1/include -I/git/linux/arch/x86/include -I./arch/x86/include/generated/uapi -I./arch/x86/include/generated  -I/git/linux/include -I./include -I/git/linux/arch/x86/include/uapi -I/git/linux/include/uapi -I./include/generated/uapi -include /git/linux/include/linux/kconfig.h  \
	  -D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
	  -Wno-compare-distinct-pointer-types \
	  -Wno-gnu-variable-sized-type-not-at-end \
	  -Wno-address-of-packed-member -Wno-tautological-compare \
	  -O2 -emit-llvm -c /git/linux/samples/bpf/sockex1_kern.c -o -| llc -march=bpf -filetype=obj -o samples/bpf/sockex1_kern.o
    HOSTLD  samples/bpf/tc_l2_redirect
  <SNIP>
    HOSTLD  samples/bpf/lwt_len_hist
    HOSTLD  samples/bpf/xdp_tx_iptunnel
  make[1]: Leaving directory '/tmp/build/linux'
  [root@f5065a7d6272 linux]#

And then, in the host:

  [root@jouet bpf]# mount | grep "docker.*devicemapper\/"
  /dev/mapper/docker-253:0-1705076-9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 on /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9 type xfs (rw,relatime,context="system_u:object_r:container_file_t:s0:c73,c276",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota)
  [root@jouet bpf]# cd /var/lib/docker/devicemapper/mnt/9bd8aa1e0af33adce89ff42090847868ca676932878942be53941a06ec5923f9/rootfs/tmp/build/linux/samples/bpf/
  [root@jouet bpf]# file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=f423d171e0487b2f802b6a792657f0f3c8f6d155, not stripped
  [root@jouet bpf]# readelf -SW offwaketime
  offwaketime         offwaketime_kern.o  offwaketime_user.o
  [root@jouet bpf]# readelf -SW offwaketime_kern.o
  There are 11 section headers, starting at offset 0x700:

  Section Headers:
    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
    [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
    [ 1] .strtab           STRTAB          0000000000000000 000658 0000a8 00      0   0  1
    [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
    [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
    [ 4] .relkprobe/try_to_wake_up REL             0000000000000000 0005a8 000020 10     10   3  8
    [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
    [ 6] .reltracepoint/sched/sched_switch REL             0000000000000000 0005c8 000090 10     10   5  8
    [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
    [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
    [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
    [10] .symtab           SYMTAB          0000000000000000 000488 000120 18      1   4  8
  Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings)
    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
    O (extra OS processing required) o (OS specific), p (processor specific)
    [root@jouet bpf]# ./offwaketime | head -3
  qemu-system-x86;entry_SYSCALL_64_fastpath;sys_ppoll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;hrtimer_wakeup;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel;start_cpu;;swapper/0 4
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 1
  swapper/2;start_cpu;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 61
  [root@jouet bpf]#

Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: 5c40f54a52.patch
Link: http://lkml.kernel.org/n/tip-xr8twtx7sjh5821g8qw47yxk@git.kernel.org
[ Use -I$(srctree)/tools/lib/ to support out of source code tree builds, as noticed by Wang Nan ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 12:00:38 -03:00
Arnaldo Carvalho de Melo
96c2fb69b9 samples/bpf: Make perf_event_read() static
While testing Joe's conversion of samples/bpf/ to use tools/lib/bpf/ I noticed
some warnings building samples/bpf/ on a Fedora Rawhide container, with
clang/llvm 3.9 I noticed this:

  [root@1e797fdfbf4f linux]# make -j4 O=/tmp/build/linux/ samples/bpf/
  make[1]: Entering directory '/tmp/build/linux'
    CHK     include/config/kernel.release
    GEN     ./Makefile
    CHK     include/generated/uapi/linux/version.h
    Using /git/linux as source for kernel
  <SNIP>
    HOSTCC  samples/bpf/trace_output_user.o
  /git/linux/samples/bpf/trace_output_user.c:64:6: warning: no previous
  prototype for 'perf_event_read' [-Wmissing-prototypes]
   void perf_event_read(print_fn fn)
        ^~~~~~~~~~~~~~~
    HOSTLD  samples/bpf/trace_output
  make[1]: Leaving directory '/tmp/build/linux'

Shut up the compiler by making that function static.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Joe Stringer <joe@ovn.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20161215152927.GC6866@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-20 09:37:33 -03:00
Linus Torvalds
41e0e24b45 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - prototypes for x86 asm-exported symbols (Adam Borowski) and a warning
   about missing CRCs (Nick Piggin)

 - asm-exports fix for LTO (Nicolas Pitre)

 - thin archives improvements (Nick Piggin)

 - linker script fix for CONFIG_LD_DEAD_CODE_DATA_ELIMINATION (Nick
   Piggin)

 - genksyms support for __builtin_va_list keyword

 - misc minor fixes

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  x86/kbuild: enable modversions for symbols exported from asm
  kbuild: fix scripts/adjust_autoksyms.sh* for the no modules case
  scripts/kallsyms: remove last remnants of --page-offset option
  make use of make variable CURDIR instead of calling pwd
  kbuild: cmd_export_list: tighten the sed script
  kbuild: minor improvement for thin archives build
  kbuild: modpost warn if export version crc is missing
  kbuild: keep data tables through dead code elimination
  kbuild: improve linker compatibility with lib-ksyms.o build
  genksyms: Regenerate parser
  kbuild/genksyms: handle va_list type
  kbuild: thin archives for multi-y targets
  kbuild: kallsyms allow 3-pass generation if symbols size has changed
2016-12-17 16:24:13 -08:00
Joe Stringer
d40fc181eb samples/bpf: Make samples more libbpf-centric
Switch all of the sample code to use the function names from
tools/lib/bpf so that they're consistent with that, and to declare their
own log buffers. This allow the next commit to be purely devoted to
getting rid of the duplicate library in samples/bpf.

Committer notes:

Testing it:

On a fedora rawhide container, with clang/llvm 3.9, sharing the host
linux kernel git tree:

  # make O=/tmp/build/linux/ headers_install
  # make O=/tmp/build/linux -C samples/bpf/

Since I forgot to make it privileged, just tested it outside the
container, using what it generated:

  # uname -a
  Linux jouet 4.9.0-rc8+ #1 SMP Mon Dec 12 11:20:49 BRT 2016 x86_64 x86_64 x86_64 GNU/Linux
  # cd /var/lib/docker/devicemapper/mnt/c43e09a53ff56c86a07baf79847f00e2cc2a17a1e2220e1adbf8cbc62734feda/rootfs/tmp/build/linux/samples/bpf/
  # ls -la offwaketime
  -rwxr-xr-x. 1 root root 24200 Dec 15 12:19 offwaketime
  # file offwaketime
  offwaketime: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=c940d3f127d5e66cdd680e42d885cb0b64f8a0e4, not stripped
  # readelf -SW offwaketime_kern.o  | grep PROGBITS
  [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
  [ 3] kprobe/try_to_wake_up PROGBITS        0000000000000000 000040 0000d8 00  AX  0   0  8
  [ 5] tracepoint/sched/sched_switch PROGBITS        0000000000000000 000118 000318 00  AX  0   0  8
  [ 7] maps              PROGBITS        0000000000000000 000430 000050 00  WA  0   0  4
  [ 8] license           PROGBITS        0000000000000000 000480 000004 00  WA  0   0  1
  [ 9] version           PROGBITS        0000000000000000 000484 000004 00  WA  0   0  4
  # ./offwaketime | head -5
  swapper/1;start_secondary;cpu_startup_entry;schedule_preempt_disabled;schedule;__schedule;-;---;; 106
  CPU 0/KVM;entry_SYSCALL_64_fastpath;sys_ioctl;do_vfs_ioctl;kvm_vcpu_ioctl;kvm_arch_vcpu_ioctl_run;kvm_vcpu_block;schedule;__schedule;-;try_to_wake_up;swake_up_locked;swake_up;apic_timer_expired;apic_timer_fn;__hrtimer_run_queues;hrtimer_interrupt;local_apic_timer_interrupt;smp_apic_timer_interrupt;__irqentry_text_start;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary;;swapper/3 2
  Compositor;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;futex_requeue;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;SoftwareVsyncTh 5
  firefox;entry_SYSCALL_64_fastpath;sys_poll;do_sys_poll;poll_schedule_timeout;schedule_hrtimeout_range;schedule_hrtimeout_range_clock;schedule;__schedule;-;try_to_wake_up;pollwake;__wake_up_common;__wake_up_sync_key;pipe_write;__vfs_write;vfs_write;sys_write;entry_SYSCALL_64_fastpath;;Timer 13
  JS Helper;entry_SYSCALL_64_fastpath;sys_futex;do_futex;futex_wait;futex_wait_queue_me;schedule;__schedule;-;try_to_wake_up;do_futex;sys_futex;entry_SYSCALL_64_fastpath;;firefox 2
  #

Signed-off-by: Joe Stringer <joe@ovn.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20161214224342.12858-2-joe@ovn.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-12-15 16:25:47 -03:00