mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2025-01-16 06:56:51 +07:00
4eaf0b5c5e
Add fmod_ret BPF program to the existing test_overhead selftest. Also
re-implement the user-space benchmarking part in the benchmark runner to
compare results. Results with ./bench are consistently somewhat lower than
test_overhead's, but relative performance of various types of BPF programs
stays consistent (e.g., kretprobe is noticeably slower).

This slowdown seems to be coming from the fact that test_overhead is
single-threaded, while the benchmark always spins off at least one thread
for the producer. This has been confirmed by hacking a multi-threaded
test_overhead variant and also a single-threaded bench variant. Results are
below. The run_bench_rename.sh script from the benchs/ subdirectory was used
to produce results for ./bench.

Single-threaded implementations
===============================

/* bench: single-threaded, atomics */
base      :    4.622 ± 0.049M/s
kprobe    :    3.673 ± 0.052M/s
kretprobe :    2.625 ± 0.052M/s
rawtp     :    4.369 ± 0.089M/s
fentry    :    4.201 ± 0.558M/s
fexit     :    4.309 ± 0.148M/s
fmodret   :    4.314 ± 0.203M/s

/* selftest: single-threaded, no atomics */
task_rename base        4555K events per sec
task_rename kprobe      3643K events per sec
task_rename kretprobe   2506K events per sec
task_rename raw_tp      4303K events per sec
task_rename fentry      4307K events per sec
task_rename fexit       4010K events per sec
task_rename fmod_ret    3984K events per sec

Multi-threaded implementations
==============================

/* bench: multi-threaded w/ atomics */
base      :    3.910 ± 0.023M/s
kprobe    :    3.048 ± 0.037M/s
kretprobe :    2.300 ± 0.015M/s
rawtp     :    3.687 ± 0.034M/s
fentry    :    3.740 ± 0.087M/s
fexit     :    3.510 ± 0.009M/s
fmodret   :    3.485 ± 0.050M/s

/* selftest: multi-threaded w/ atomics */
task_rename base        3872K events per sec
task_rename kprobe      3068K events per sec
task_rename kretprobe   2350K events per sec
task_rename raw_tp      3731K events per sec
task_rename fentry      3639K events per sec
task_rename fexit       3558K events per sec
task_rename fmod_ret    3511K events per sec

/* selftest: multi-threaded, no atomics */
task_rename base        3945K events per sec
task_rename kprobe      3298K events per sec
task_rename kretprobe   2451K events per sec
task_rename raw_tp      3718K events per sec
task_rename fentry      3782K events per sec
task_rename fexit       3543K events per sec
task_rename fmod_ret    3526K events per sec

Note that the ./bench benchmark always uses atomic increments for counting,
while test_overhead doesn't; this difference doesn't influence test results
all that much.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-4-andriin@fb.com
.gitignore
attach_probe.c
bpf_iter.c
bpf_obj_id.c
bpf_tcp_ca.c
bpf_verif_scale.c
btf_dump.c
btf_map_in_map.c
cgroup_attach_autodetach.c
cgroup_attach_multi.c
cgroup_attach_override.c
cgroup_link.c
cls_redirect.c
connect_force_port.c
core_extern.c
core_reloc.c
cpu_mask.c
enable_stats.c
fentry_fexit.c
fentry_test.c
fexit_bpf2bpf.c
fexit_stress.c
fexit_test.c
flow_dissector_load_bytes.c
flow_dissector_reattach.c
flow_dissector.c
get_stack_raw_tp.c
global_data_init.c
global_data.c
hashmap.c
kfree_skb.c
l4lb_all.c
link_pinning.c
map_lock.c
mmap.c
modify_return.c
ns_current_pid_tgid.c
obj_name.c
perf_branches.c
perf_buffer.c
pinning.c
pkt_access.c
pkt_md_access.c
probe_user.c
prog_run_xattr.c
queue_stack_map.c
raw_tp_writable_reject_nbd_invalid.c
raw_tp_writable_test_run.c
rdonly_maps.c
reference_tracking.c
section_names.c
select_reuseport.c
send_signal_sched_switch.c
send_signal.c
signal_pending.c
sk_assign.c
skb_ctx.c
skeleton.c
sockmap_basic.c
sockmap_ktls.c
sockmap_listen.c
sockopt_inherit.c
sockopt_multi.c
sockopt_sk.c
sockopt.c
spinlock.c
stacktrace_build_id_nmi.c
stacktrace_build_id.c
stacktrace_map_raw_tp.c
stacktrace_map.c
tailcalls.c
task_fd_query_rawtp.c
task_fd_query_tp.c
tcp_estats.c
tcp_rtt.c
test_global_funcs.c
test_lsm.c
test_overhead.c
tp_attach_query.c
trampoline_count.c
vmlinux.c
xdp_adjust_tail.c
xdp_attach.c
xdp_bpf2bpf.c
xdp_info.c
xdp_noinline.c
xdp_perf.c
xdp.c