linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-17 03:36:59 +07:00

Author	SHA1	Message	Date
Sargun Dhillon	cf9b1199de	samples/bpf: Add test/example of using bpf_probe_write_user bpf helper This example shows using a kprobe to act as a dnat mechanism to divert traffic for arbitrary endpoints. It rewrite the arguments to a syscall while they're still in userspace, and before the syscall has a chance to copy the argument into kernel space. Although this is an example, it also acts as a test because the mapped address is 255.255.255.255:555 -> real address, and that's not a legal address to connect to. If the helper is broken, the example will fail on the intermediate steps, as well as the final step to verify the rewrite of userspace memory succeeded. Signed-off-by: Sargun Dhillon <sargun@sargun.me> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-25 18:07:48 -07:00
Brenden Blanco	764cbccef8	bpf: add sample for xdp forwarding and rewrite Add a sample that rewrites and forwards packets out on the same interface. Observed single core forwarding performance of ~10Mpps. Since the mlx4 driver under test recycles every single packet page, the perf output shows almost exclusively just the ring management and bpf program work. Slowdowns are likely occurring due to cache misses. Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-19 21:46:33 -07:00
Brenden Blanco	86af8b4191	Add sample for adding simple drop program to link Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX hook of a link. With the drop-only program, observed single core rate is ~20Mpps. Other tests were run, for instance without the dropcnt increment or without reading from the packet header, the packet rate was mostly unchanged. $ perf record -a samples/bpf/xdp1 $(</sys/class/net/eth0/ifindex) proto 17: 20403027 drops/s ./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4 Running... ctrl^C to stop Device: eth4@0 Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags) 5056638pps 2427Mb/sec (2427186240bps) errors: 0 Device: eth4@1 Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags) 5133311pps 2463Mb/sec (2463989280bps) errors: 0 Device: eth4@2 Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags) 5077431pps 2437Mb/sec (2437166880bps) errors: 0 Device: eth4@3 Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags) 5043067pps 2420Mb/sec (2420672160bps) errors: 0 perf report --no-children: 26.05% ksoftirqd/0 [mlx4_en] [k] mlx4_en_process_rx_cq 17.84% ksoftirqd/0 [mlx4_en] [k] mlx4_en_alloc_frags 5.52% ksoftirqd/0 [mlx4_en] [k] mlx4_en_free_frag 4.90% swapper [kernel.vmlinux] [k] poll_idle 4.14% ksoftirqd/0 [kernel.vmlinux] [k] get_page_from_freelist 2.78% ksoftirqd/0 [kernel.vmlinux] [k] __free_pages_ok 2.57% ksoftirqd/0 [kernel.vmlinux] [k] bpf_map_lookup_elem 2.51% swapper [mlx4_en] [k] mlx4_en_process_rx_cq 1.94% ksoftirqd/0 [kernel.vmlinux] [k] percpu_array_map_lookup_elem 1.45% swapper [mlx4_en] [k] mlx4_en_alloc_frags 1.35% ksoftirqd/0 [kernel.vmlinux] [k] free_one_page 1.33% swapper [kernel.vmlinux] [k] intel_idle 1.04% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5c5 0.96% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c58d 0.93% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c6ee 0.92% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c6b9 0.89% ksoftirqd/0 [kernel.vmlinux] [k] __alloc_pages_nodemask 0.83% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c686 0.83% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5d5 0.78% ksoftirqd/0 [mlx4_en] [k] mlx4_alloc_pages.isra.23 0.77% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5b4 0.77% ksoftirqd/0 [kernel.vmlinux] [k] net_rx_action machine specs: receiver - Intel E5-1630 v3 @ 3.70GHz sender - Intel E5645 @ 2.40GHz Mellanox ConnectX-3 @40G Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-19 21:46:32 -07:00
Martin KaFai Lau	a3f7461734	cgroup: bpf: Add an example to do cgroup checking in BPF test_cgrp2_array_pin.c: A userland program that creates a bpf_map (BPF_MAP_TYPE_GROUP_ARRAY), pouplates/updates it with a cgroup2's backed fd and pins it to a bpf-fs's file. The pinned file can be loaded by tc and then used by the bpf prog later. This program can also update an existing pinned array and it could be useful for debugging/testing purpose. test_cgrp2_tc_kern.c: A bpf prog which should be loaded by tc. It is to demonstrate the usage of bpf_skb_in_cgroup. test_cgrp2_tc.sh: A script that glues the test_cgrp2_array_pin.c and test_cgrp2_tc_kern.c together. The idea is like: 1. Load the test_cgrp2_tc_kern.o by tc 2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY with a cgroup fd 3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been dropped because of a match on the cgroup Most of the lines in test_cgrp2_tc.sh is the boilerplate to setup the cgroup/bpf-fs/net-devices/netns...etc. It is not bulletproof on errors but should work well enough and give enough debug info if things did not go well. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-01 16:32:13 -04:00
Alexei Starovoitov	65d472fb00	samples/bpf: add 'pointer to packet' tests parse_simple.c - packet parser exapmle with single length check that filters out udp packets for port 9 parse_varlen.c - variable length parser that understand multiple vlan headers, ipip, ipip6 and ip options to filter out udp or tcp packets on port 9. The packet is parsed layer by layer with multitple length checks. parse_ldabs.c - classic style of packet parsing using LD_ABS instruction. Same functionality as parse_simple. simple = 24.1Mpps per core varlen = 22.7Mpps ldabs = 21.4Mpps Parser with LD_ABS instructions is slower than full direct access parser which does more packet accesses and checks. These examples demonstrate the choice bpf program authors can make between flexibility of the parser vs speed. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-06 16:01:54 -04:00
Jesper Dangaard Brouer	bdefbbf2ec	samples/bpf: like LLC also verify and allow redefining CLANG command Users are likely to manually compile both LLVM 'llc' and 'clang' tools. Thus, also allow redefining CLANG and verify command exist. Makefile implementation wise, the target that verify the command have been generalized. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 14:26:08 -04:00
Jesper Dangaard Brouer	b62a796c10	samples/bpf: allow make to be run from samples/bpf/ directory It is not intuitive that 'make' must be run from the top level directory with argument "samples/bpf/" to compile these eBPF samples. Introduce a kbuild make file trick that allow make to be run from the "samples/bpf/" directory itself. It basically change to the top level directory and call "make samples/bpf/" with the "/" slash after the directory name. Also add a clean target that only cleans this directory, by taking advantage of the kbuild external module setting M=$PWD. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 14:25:33 -04:00
Jesper Dangaard Brouer	7b01dd5793	samples/bpf: Makefile verify LLVM compiler avail and bpf target is supported Make compiling samples/bpf more user friendly, by detecting if LLVM compiler tool 'llc' is available, and also detect if the 'bpf' target is available in this version of LLVM. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 14:25:32 -04:00
Jesper Dangaard Brouer	6ccfba75d3	samples/bpf: add back functionality to redefine LLC command It is practical to be-able-to redefine the location of the LLVM command 'llc', because not all distros have a LLVM version with bpf target support. Thus, it is sometimes required to compile LLVM from source, and sometimes it is not desired to overwrite the distros default LLVM version. This feature was removed with `128d1514be` ("samples/bpf: Use llc in PATH, rather than a hardcoded value"). Add this features back. Note that it is possible to redefine the LLC on the make command like: make samples/bpf/ LLC=~/git/llvm/build/bin/llc Fixes: `128d1514be` ("samples/bpf: Use llc in PATH, rather than a hardcoded value") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 14:25:32 -04:00
David S. Miller	ae95d71261	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-04-09 17:41:41 -04:00
Alexei Starovoitov	e3edfdec04	samples/bpf: add tracepoint vs kprobe performance tests the first microbenchmark does fd=open("/proc/self/comm"); for() { write(fd, "test"); } and on 4 cpus in parallel: writes per sec base (no tracepoints, no kprobes) 930k with kprobe at __set_task_comm() 420k with tracepoint at task:task_rename 730k For kprobe + full bpf program manully fetches oldcomm, newcomm via bpf_probe_read. For tracepint bpf program does nothing, since arguments are copied by tracepoint. 2nd microbenchmark does: fd=open("/dev/urandom"); for() { read(fd, buf); } and on 4 cpus in parallel: reads per sec base (no tracepoints, no kprobes) 300k with kprobe at urandom_read() 279k with tracepoint at random:urandom_read 290k bpf progs attached to kprobe and tracepoint are noop. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-07 21:04:27 -04:00
Naveen N. Rao	128d1514be	samples/bpf: Use llc in PATH, rather than a hardcoded value While at it, remove the generation of .s files and fix some typos in the related comment. Cc: Alexei Starovoitov <ast@fb.com> Cc: David S. Miller <davem@davemloft.net> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 16:01:28 -04:00
Alexei Starovoitov	26e9093110	samples/bpf: add map performance test performance tests for hash map and per-cpu hash map with and without pre-allocation Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-08 23:22:03 -05:00
Alexei Starovoitov	9d8b612d88	samples/bpf: add bpf map stress test this test calls bpf programs from different contexts: from inside of slub, from rcu, from pretty much everywhere, since it kprobes all spin_lock functions. It stresses the bpf hash and percpu map pre-allocation, deallocation logic and call_rcu mechanisms. User space part adding more stress by walking and deleting map elements. Note that due to nature bpf_load.c the earlier kprobe+bpf programs are already active while loader loads new programs, creates new kprobes and attaches them. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-08 23:22:02 -05:00
Alexei Starovoitov	a6ffe7b9df	samples/bpf: offwaketime example This is simplified version of Brendan Gregg's offwaketime: This program shows kernel stack traces and task names that were blocked and "off-CPU", along with the stack traces and task names for the threads that woke them, and the total elapsed time from when they blocked to when they were woken up. The combined stacks, task names, and total time is summarized in kernel context for efficiency. Example: $ sudo ./offwaketime \| flamegraph.pl > demo.svg Open demo.svg in the browser as FlameGraph visualization. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-20 00:21:44 -05:00
Yang Shi	30b50aa612	bpf: samples: exclude asm/sysreg.h for arm64 commit `338d4f49d6` ("arm64: kernel: Add support for Privileged Access Never") includes sysreg.h into futex.h and uaccess.h. But, the inline assembly used by asm/sysreg.h is incompatible with llvm so it will cause BPF samples build failure for ARM64. Since sysreg.h is useless for BPF samples, just exclude it from Makefile via defining __ASM_SYSREG_H. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-16 14:40:49 -05:00
Daniel Borkmann	42984d7c1e	bpf: add sample usages for persistent maps/progs This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN and BPF_OBJ_GET commands can be used. Example with maps: # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42 bpf: map fd:3 (Success) bpf: pin ret:(0,Success) bpf: fd:3 u->(1:42) ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):42 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24 bpf: get fd:3 (Success) bpf: fd:3 u->(1:24) ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):24 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -P -m bpf: map fd:3 (Success) bpf: pin ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1 bpf: get fd:3 (Success) bpf: fd:3 l->(1):0 ret:(0,Success) # ./fds_example -F /sys/fs/bpf/m2 -G -m bpf: get fd:3 (Success) Example with progs: # ./fds_example -F /sys/fs/bpf/p -P -p bpf: prog fd:3 (Success) bpf: pin ret:(0,Success) bpf sock:4 <- fd:3 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p -G -p bpf: get fd:3 (Success) bpf: sock:4 <- fd:3 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o bpf: prog fd:5 (Success) bpf: pin ret:(0,Success) bpf: sock:3 <- fd:5 attached ret:(0,Success) # ./fds_example -F /sys/fs/bpf/p2 -G -p bpf: get fd:3 (Success) bpf: sock:4 <- fd:3 attached ret:(0,Success) Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 22:48:39 -05:00
Alexei Starovoitov	39111695b1	samples: bpf: add bpf_perf_event_output example Performance test and example of bpf_perf_event_output(). kprobe is attached to sys_write() and trivial bpf program streams pid+cookie into userspace via PERF_COUNT_SW_BPF_OUTPUT event. Usage: $ sudo ./bld_x64/samples/bpf/trace_output recv 2968913 events per sec Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-22 06:42:15 -07:00
Kaixu Xia	47efb30274	samples/bpf: example of get selected PMU counter value This is a simple example and shows how to use the new ability to get the selected Hardware PMU counter value. Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:50:06 -07:00
Daniel Wagner	0fb1170ee6	bpf: BPF based latency tracing BPF offers another way to generate latency histograms. We attach kprobes at trace_preempt_off and trace_preempt_on and calculate the time it takes to from seeing the off/on transition. The first array is used to store the start time stamp. The key is the CPU id. The second array stores the log2(time diff). We need to use static allocation here (array and not hash tables). The kprobes hooking into trace_preempt_on\|off should not calling any dynamic memory allocation or free path. We need to avoid recursivly getting called. Besides that, it reduces jitter in the measurement. CPU 0 latency : count distribution 1 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 0 \| \| 32 -> 63 : 0 \| \| 64 -> 127 : 0 \| \| 128 -> 255 : 0 \| \| 256 -> 511 : 0 \| \| 512 -> 1023 : 0 \| \| 1024 -> 2047 : 0 \| \| 2048 -> 4095 : 166723 \|************************************* \| 4096 -> 8191 : 19870 \|* \| 8192 -> 16383 : 6324 \| \| 16384 -> 32767 : 1098 \| \| 32768 -> 65535 : 190 \| \| 65536 -> 131071 : 179 \| \| 131072 -> 262143 : 18 \| \| 262144 -> 524287 : 4 \| \| 524288 -> 1048575 : 1363 \| \| CPU 1 latency : count distribution 1 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 0 \| \| 32 -> 63 : 0 \| \| 64 -> 127 : 0 \| \| 128 -> 255 : 0 \| \| 256 -> 511 : 0 \| \| 512 -> 1023 : 0 \| \| 1024 -> 2047 : 0 \| \| 2048 -> 4095 : 114042 \|************************************* \| 4096 -> 8191 : 9587 \| \| 8192 -> 16383 : 4140 \| \| 16384 -> 32767 : 673 \| \| 32768 -> 65535 : 179 \| \| 65536 -> 131071 : 29 \| \| 131072 -> 262143 : 4 \| \| 262144 -> 524287 : 1 \| \| 524288 -> 1048575 : 364 \| \| CPU 2 latency : count distribution 1 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 0 \| \| 32 -> 63 : 0 \| \| 64 -> 127 : 0 \| \| 128 -> 255 : 0 \| \| 256 -> 511 : 0 \| \| 512 -> 1023 : 0 \| \| 1024 -> 2047 : 0 \| \| 2048 -> 4095 : 40147 \|*************************************** \| 4096 -> 8191 : 2300 \|* \| 8192 -> 16383 : 828 \| \| 16384 -> 32767 : 178 \| \| 32768 -> 65535 : 59 \| \| 65536 -> 131071 : 2 \| \| 131072 -> 262143 : 0 \| \| 262144 -> 524287 : 1 \| \| 524288 -> 1048575 : 174 \| \| CPU 3 latency : count distribution 1 -> 1 : 0 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 0 \| \| 32 -> 63 : 0 \| \| 64 -> 127 : 0 \| \| 128 -> 255 : 0 \| \| 256 -> 511 : 0 \| \| 512 -> 1023 : 0 \| \| 1024 -> 2047 : 0 \| \| 2048 -> 4095 : 29626 \|************************************* \| 4096 -> 8191 : 2704 \| \| 8192 -> 16383 : 1090 \| \| 16384 -> 32767 : 160 \| \| 32768 -> 65535 : 72 \| \| 65536 -> 131071 : 32 \| \| 131072 -> 262143 : 26 \| \| 262144 -> 524287 : 12 \| \| 524288 -> 1048575 : 298 \| \| All this is based on the trace3 examples written by Alexei Starovoitov <ast@plumgrid.com>. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-23 06:09:58 -07:00
Alexei Starovoitov	530b2c8619	samples/bpf: bpf_tail_call example for networking Usage: $ sudo ./sockex3 IP src.port -> dst.port bytes packets 127.0.0.1.42010 -> 127.0.0.1.12865 1568 8 127.0.0.1.59526 -> 127.0.0.1.33778 11422636 173070 127.0.0.1.33778 -> 127.0.0.1.59526 11260224828 341974 127.0.0.1.12865 -> 127.0.0.1.42010 1832 12 IP src.port -> dst.port bytes packets 127.0.0.1.42010 -> 127.0.0.1.12865 1568 8 127.0.0.1.59526 -> 127.0.0.1.33778 23198092 351486 127.0.0.1.33778 -> 127.0.0.1.59526 22972698518 698616 127.0.0.1.12865 -> 127.0.0.1.42010 1832 12 this example is similar to sockex2 in a way that it accumulates per-flow statistics, but it does packet parsing differently. sockex2 inlines full packet parser routine into single bpf program. This sockex3 example have 4 independent programs that parse vlan, mpls, ip, ipv6 and one main program that starts the process. bpf_tail_call() mechanism allows each program to be small and be called on demand potentially multiple times, so that many vlan, mpls, ip in ip, gre encapsulations can be parsed. These and other protocol parsers can be added or removed at runtime. TLVs can be parsed in similar manner. Note, tail_call_cnt dynamic check limits the number of tail calls to 32. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-21 17:07:59 -04:00
Alexei Starovoitov	5bacd7805a	samples/bpf: bpf_tail_call example for tracing kprobe example that demonstrates how future seccomp programs may look like. It attaches to seccomp_phase1() function and tail-calls other BPF programs depending on syscall number. Existing optimized classic BPF seccomp programs generated by Chrome look like: if (sd.nr < 121) { if (sd.nr < 57) { if (sd.nr < 22) { if (sd.nr < 7) { if (sd.nr < 4) { if (sd.nr < 1) { check sys_read } else { if (sd.nr < 3) { check sys_write and sys_open } else { check sys_close } } } else { } else { } else { } else { } else { } the future seccomp using native eBPF may look like: bpf_tail_call(&sd, &syscall_jmp_table, sd.nr); which is simpler, faster and leaves more room for per-syscall checks. Usage: $ sudo ./tracex5 <...>-366 [001] d... 4.870033: : read(fd=1, buf=00007f6d5bebf000, size=771) <...>-369 [003] d... 4.870066: : mmap <...>-369 [003] d... 4.870077: : syscall=110 (one of get/set uid/pid/gid) <...>-369 [003] d... 4.870089: : syscall=107 (one of get/set uid/pid/gid) sh-369 [000] d... 4.891740: : read(fd=0, buf=00000000023d1000, size=512) sh-369 [000] d... 4.891747: : write(fd=1, buf=00000000023d3000, size=512) sh-369 [000] d... 4.891747: : read(fd=1, buf=00000000023d3000, size=512) Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-21 17:07:59 -04:00
Brenden Blanco	b88c06e36d	samples/bpf: fix in-source build of samples with clang in-source build of 'make samples/bpf/' was incorrectly using default compiler instead of invoking clang/llvm. out-of-source build was ok. Fixes: `a80857822b` ("samples: bpf: trivial eBPF program in C") Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-05-12 23:15:25 -04:00
Linus Torvalds	6c373ca893	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Add BQL support to via-rhine, from Tino Reichardt. 2) Integrate SWITCHDEV layer support into the DSA layer, so DSA drivers can support hw switch offloading. From Floria Fainelli. 3) Allow 'ip address' commands to initiate multicast group join/leave, from Madhu Challa. 4) Many ipv4 FIB lookup optimizations from Alexander Duyck. 5) Support EBPF in cls_bpf classifier and act_bpf action, from Daniel Borkmann. 6) Remove the ugly compat support in ARP for ugly layers like ax25, rose, etc. And use this to clean up the neigh layer, then use it to implement MPLS support. All from Eric Biederman. 7) Support L3 forwarding offloading in switches, from Scott Feldman. 8) Collapse the LOCAL and MAIN ipv4 FIB tables when possible, to speed up route lookups even further. From Alexander Duyck. 9) Many improvements and bug fixes to the rhashtable implementation, from Herbert Xu and Thomas Graf. In particular, in the case where an rhashtable user bulk adds a large number of items into an empty table, we expand the table much more sanely. 10) Don't make the tcp_metrics hash table per-namespace, from Eric Biederman. 11) Extend EBPF to access SKB fields, from Alexei Starovoitov. 12) Split out new connection request sockets so that they can be established in the main hash table. Much less false sharing since hash lookups go direct to the request sockets instead of having to go first to the listener then to the request socks hashed underneath. From Eric Dumazet. 13) Add async I/O support for crytpo AF_ALG sockets, from Tadeusz Struk. 14) Support stable privacy address generation for RFC7217 in IPV6. From Hannes Frederic Sowa. 15) Hash network namespace into IP frag IDs, also from Hannes Frederic Sowa. 16) Convert PTP get/set methods to use 64-bit time, from Richard Cochran. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1816 commits) fm10k: Bump driver version to 0.15.2 fm10k: corrected VF multicast update fm10k: mbx_update_max_size does not drop all oversized messages fm10k: reset head instead of calling update_max_size fm10k: renamed mbx_tx_dropped to mbx_tx_oversized fm10k: update xcast mode before synchronizing multicast addresses fm10k: start service timer on probe fm10k: fix function header comment fm10k: comment next_vf_mbx flow fm10k: don't handle mailbox events in iov_event path and always process mailbox fm10k: use separate workqueue for fm10k driver fm10k: Set PF queues to unlimited bandwidth during virtualization fm10k: expose tx_timeout_count as an ethtool stat fm10k: only increment tx_timeout_count in Tx hang path fm10k: remove extraneous "Reset interface" message fm10k: separate PF only stats so that VF does not display them fm10k: use hw->mac.max_queues for stats fm10k: only show actual queues, not the maximum in hardware fm10k: allow creation of VLAN on default vid fm10k: fix unused warnings ...	2015-04-15 09:00:47 -07:00
Alexei Starovoitov	91bc4822c3	tc: bpf: add checksum helpers Commit `608cd71a9c` ("tc: bpf: generalize pedit action") has added the possibility to mangle packet data to BPF programs in the tc pipeline. This patch adds two helpers bpf_l3_csum_replace() and bpf_l4_csum_replace() for fixing up the protocol checksums after the packet mangling. It also adds 'flags' argument to bpf_skb_store_bytes() helper to avoid unnecessary checksum recomputations when BPF programs adjusting l3/l4 checksums and documents all three helpers in uapi header. Moreover, a sample program is added to show how BPF programs can make use of the mangle and csum helpers. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-04-06 16:42:35 -04:00
Alexei Starovoitov	9811e35359	samples/bpf: Add kmem_alloc()/free() tracker tool One BPF program attaches to kmem_cache_alloc_node() and remembers all allocated objects in the map. Another program attaches to kmem_cache_free() and deletes corresponding object from the map. User space walks the map every second and prints any objects which are older than 1 second. Usage: $ sudo tracex4 Then start few long living processes. The 'tracex4' will print something like this: obj 0xffff880465928000 is 13sec old was allocated at ip ffffffff8105dc32 obj 0xffff88043181c280 is 13sec old was allocated at ip ffffffff8105dc32 obj 0xffff880465848000 is 8sec old was allocated at ip ffffffff8105dc32 obj 0xffff8804338bc280 is 15sec old was allocated at ip ffffffff8105dc32 $ addr2line -fispe vmlinux ffffffff8105dc32 do_fork at fork.c:1665 As soon as processes exit the memory is reclaimed and 'tracex4' prints nothing. Similar experiment can be done with the __kmalloc()/kfree() pair. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1427312966-8434-10-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 13:25:51 +02:00
Alexei Starovoitov	5c7fc2d27d	samples/bpf: Add IO latency analysis (iosnoop/heatmap) tool BPF C program attaches to blk_mq_start_request()/blk_update_request() kprobe events to calculate IO latency. For every completed block IO event it computes the time delta in nsec and records in a histogram map: map[log10(delta)10]++ User space reads this histogram map every 2 seconds and prints it as a 'heatmap' using gray shades of text terminal. Black spaces have many events and white spaces have very few events. Left most space is the smallest latency, right most space is the largest latency in the range. Usage: $ sudo ./tracex3 and do 'sudo dd if=/dev/sda of=/dev/null' in other terminal. Observe IO latencies and how different activity (like 'make kernel') affects it. Similar experiments can be done for network transmit latencies, syscalls, etc. '-t' flag prints the heatmap using normal ascii characters: $ sudo ./tracex3 -t heatmap of IO latency # - many events with this latency - few events \|1us \|10us \|100us \|1ms \|10ms \|100ms \|1s \|10s ooo. O.#. # 221 . # . # 125 .. .o#.. # 55 . . . . .#O # 37 .# # 175 .#. # 37 # # 199 . . #. # 55 #.. # 42 # # 266 ...**Oo#OO*o# . # 629 # # 271 . .#o* o.o # 221 . . o* *#O.. # 50 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1427312966-8434-9-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 13:25:51 +02:00
Alexei Starovoitov	d822a19268	samples/bpf: Add counting example for kfree_skb() function calls and the write() syscall this example has two probes in one C file that attach to different kprove events and use two different maps. 1st probe is x64 specific equivalent of dropmon. It attaches to kfree_skb, retrevies 'ip' address of kfree_skb() caller and counts number of packet drops at that 'ip' address. User space prints 'location - count' map every second. 2nd probe attaches to kprobe:sys_write and computes a histogram of different write sizes Usage: $ sudo tracex2 location 0xffffffff81695995 count 1 location 0xffffffff816d0da9 count 2 location 0xffffffff81695995 count 2 location 0xffffffff816d0da9 count 2 location 0xffffffff81695995 count 3 location 0xffffffff816d0da9 count 2 557145+0 records in 557145+0 records out 285258240 bytes (285 MB) copied, 1.02379 s, 279 MB/s syscall write() stats byte_size : count distribution 1 -> 1 : 3 \| \| 2 -> 3 : 0 \| \| 4 -> 7 : 0 \| \| 8 -> 15 : 0 \| \| 16 -> 31 : 2 \| \| 32 -> 63 : 3 \| \| 64 -> 127 : 1 \| \| 128 -> 255 : 1 \| \| 256 -> 511 : 0 \| \| 512 -> 1023 : 1118968 \|************************************* \| Ctrl-C at any time. Kernel will auto cleanup maps and programs $ addr2line -ape ./bld_x64/vmlinux 0xffffffff81695995 0xffffffff816d0da9 0xffffffff81695995: ./bld_x64/../net/ipv4/icmp.c:1038 0xffffffff816d0da9: ./bld_x64/../net/unix/af_unix.c:1231 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1427312966-8434-8-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 13:25:50 +02:00
Alexei Starovoitov	b896c4f95a	samples/bpf: Add simple non-portable kprobe filter example tracex1_kern.c - C program compiled into BPF. It attaches to kprobe:netif_receive_skb() When skb->dev->name == "lo", it prints sample debug message into trace_pipe via bpf_trace_printk() helper function. tracex1_user.c - corresponding user space component that: - loads BPF program via bpf() syscall - opens kprobes:netif_receive_skb event via perf_event_open() syscall - attaches the program to event via ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd); - prints from trace_pipe Note, this BPF program is non-portable. It must be recompiled with current kernel headers. kprobe is not a stable ABI and BPF+kprobe scripts may no longer be meaningful when kernel internals change. No matter in what way the kernel changes, neither the kprobe, nor the BPF program can ever crash or corrupt the kernel, assuming the kprobes, perf and BPF subsystem has no bugs. The verifier will detect that the program is using bpf_trace_printk() and the kernel will print 'this is a DEBUG kernel' warning banner, which means that bpf_trace_printk() should be used for debugging of the BPF program only. Usage: $ sudo tracex1 ping-19826 [000] d.s2 63103.382648: : skb ffff880466b1ca00 len 84 ping-19826 [000] d.s2 63103.382684: : skb ffff880466b1d300 len 84 ping-19826 [000] d.s2 63104.382533: : skb ffff880466b1ca00 len 84 ping-19826 [000] d.s2 63104.382594: : skb ffff880466b1d300 len 84 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1427312966-8434-7-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-04-02 13:25:50 +02:00
Alexei Starovoitov	fbe3310840	samples: bpf: large eBPF program in C sockex2_kern.c is purposefully large eBPF program in C. llvm compiles ~200 lines of C code into ~300 eBPF instructions. It's similar to __skb_flow_dissect() to demonstrate that complex packet parsing can be done by eBPF. Then it uses (struct flow_keys)->dst IP address (or hash of ipv6 dst) to keep stats of number of packets per IP. User space loads eBPF program, attaches it to loopback interface and prints dest_ip->#packets stats every second. Usage: $sudo samples/bpf/sockex2 ip 127.0.0.1 count 19 ip 127.0.0.1 count 178115 ip 127.0.0.1 count 369437 ip 127.0.0.1 count 559841 ip 127.0.0.1 count 750539 Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-05 21:47:34 -08:00
Alexei Starovoitov	a80857822b	samples: bpf: trivial eBPF program in C this example does the same task as previous socket example in assembler, but this one does it in C. eBPF program in kernel does: /* assume that packet is IPv4, load one byte of IP->proto / int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol)); long value; value = bpf_map_lookup_elem(&my_map, &index); if (value) __sync_fetch_and_add(value, 1); Corresponding user space reads map[tcp], map[udp], map[icmp] and prints protocol stats every second Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-05 21:47:33 -08:00
Alexei Starovoitov	03f4723ed7	samples: bpf: example of stateful socket filtering this socket filter example does: - creates arraymap in kernel with key 4 bytes and value 8 bytes - loads eBPF program which assumes that packet is IPv4 and loads one byte of IP->proto from the packet and uses it as a key in a map r0 = skb->data[ETH_HLEN + offsetof(struct iphdr, protocol)]; (u32)(fp - 4) = r0; value = bpf_map_lookup_elem(map_fd, fp - 4); if (value) ((u64)value) += 1; - attaches this program to raw socket - every second user space reads map[IPPROTO_TCP], map[IPPROTO_UDP], map[IPPROTO_ICMP] to see how many packets of given protocol were seen on loopback interface Usage: $sudo samples/bpf/sock_example TCP 0 UDP 0 ICMP 0 packets TCP 187600 UDP 0 ICMP 4 packets TCP 376504 UDP 0 ICMP 8 packets TCP 563116 UDP 0 ICMP 12 packets TCP 753144 UDP 0 ICMP 16 packets Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-12-05 21:47:32 -08:00
Alexei Starovoitov	ffb65f27a1	bpf: add a testsuite for eBPF maps . check error conditions and sanity of hash and array map APIs . check large maps (that kernel gracefully switches to vmalloc from kmalloc) . check multi-process parallel access and stress test Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-11-18 13:43:59 -05:00
Alexei Starovoitov	3c731eba48	bpf: mini eBPF library, test stubs and verifier testsuite 1. the library includes a trivial set of BPF syscall wrappers: int bpf_create_map(int key_size, int value_size, int max_entries); int bpf_update_elem(int fd, void key, void value); int bpf_lookup_elem(int fd, void key, void value); int bpf_delete_elem(int fd, void key); int bpf_get_next_key(int fd, void key, void next_key); int bpf_prog_load(enum bpf_prog_type prog_type, const struct sock_filter_int insns, int insn_len, const char license); bpf_prog_load() stores verifier log into global bpf_log_buf[] array and BPF_() macros to build instructions 2. test stubs configure eBPF infra with 'unspec' map and program types. These are fake types used by user space testsuite only. 3. verifier tests valid and invalid programs and expects predefined error log messages from kernel. 40 tests so far. $ sudo ./test_verifier #0 add+sub+mul OK #1 unreachable OK #2 unreachable2 OK #3 out of range jump OK #4 out of range jump2 OK #5 test1 ld_imm64 OK ... Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:15 -04:00

34 Commits