mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-11-24 06:40:54 +07:00
a6c32317cd
11887 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Nicholas Fraser
|
b0363faf98 |
perf symbols: Fix return value when loading PE DSO
[ Upstream commit 77771a97011fa9146ccfaf2983a3a2885dc57b6f ]
The first time dso__load() was called on a PE file it always returned -1
error. This caused the first call to map__find_symbol() to always fail
on a PE file so the first sample from each PE file always had symbol
<unknown>. Subsequent samples succeed however because the DSO is already
loaded.
This fixes dso__load() to return 0 when successfully loading a DSO with
libbfd.
Fixes:
|
||
Dmitry Safonov
|
c7a1a092d3 |
perf symbols: Use (long) for iterator for bfd symbols
[ Upstream commit 96de68fff5ded8833bf5832658cb43c54f86ff6c ]
GCC (GCC) 8.4.0 20200304 fails to build perf with:
: util/symbol.c: In function 'dso__load_bfd_symbols':
: util/symbol.c:1626:16: error: comparison of integer expressions of different signednes
: for (i = 0; i < symbols_count; ++i) {
: ^
: util/symbol.c:1632:16: error: comparison of integer expressions of different signednes
: while (i + 1 < symbols_count &&
: ^
: util/symbol.c:1637:13: error: comparison of integer expressions of different signednes
: if (i + 1 < symbols_count &&
: ^
: cc1: all warnings being treated as errors
It's unlikely that the symtable will be that big, but the fix is an
oneliner and as perf has CORE_CFLAGS += -Wextra, which makes build to
fail together with CORE_CFLAGS += -Werror
Fixes:
|
||
John Garry
|
5132b4f248 |
perf vendor events arm64: Fix Ampere eMag event typo
[ Upstream commit 2bf797be81fa808f05f1a7a65916619132256a27 ]
The "briefdescription" for event 0x35 has a typo - fix it.
Fixes:
|
||
Arnaldo Carvalho de Melo
|
100ba40217 |
perf tools: Fix DSO filtering when not finding a map for a sampled address
[ Upstream commit c69bf11ad3d30b6bf01cfa538ddff1a59467c734 ]
When we lookup an address and don't find a map we should filter that
sample if the user specified a list of --dso entries to filter on, fix
it.
Before:
$ perf script
sleep 274800 2843.556162: 1 cycles:u: ffffffffbb26bff4 [unknown] ([unknown])
sleep 274800 2843.556168: 1 cycles:u: ffffffffbb2b047d [unknown] ([unknown])
sleep 274800 2843.556171: 1 cycles:u: ffffffffbb2706b2 [unknown] ([unknown])
sleep 274800 2843.556174: 6 cycles:u: ffffffffbb2b0267 [unknown] ([unknown])
sleep 274800 2843.556176: 59 cycles:u: ffffffffbb2b03b1 [unknown] ([unknown])
sleep 274800 2843.556180: 691 cycles:u: ffffffffbb26bff4 [unknown] ([unknown])
sleep 274800 2843.556189: 9160 cycles:u: 7fa9550eeaa3 __GI___tunables_init+0xf3 (/usr/lib64/ld-2.32.so)
sleep 274800 2843.556312: 86937 cycles:u: 7fa9550e157b _dl_lookup_symbol_x+0x4b (/usr/lib64/ld-2.32.so)
$
So we have some samples we somehow didn't find in a map for, if we now
do:
$ perf report --stdio --dso /usr/lib64/ld-2.32.so
# dso: /usr/lib64/ld-2.32.so
#
# Total Lost Samples: 0
#
# Samples: 8 of event 'cycles:u'
# Event count (approx.): 96856
#
# Overhead Command Symbol
# ........ ....... ........................
#
89.76% sleep [.] _dl_lookup_symbol_x
9.46% sleep [.] __GI___tunables_init
0.71% sleep [k] 0xffffffffbb26bff4
0.06% sleep [k] 0xffffffffbb2b03b1
0.01% sleep [k] 0xffffffffbb2b0267
0.00% sleep [k] 0xffffffffbb2706b2
0.00% sleep [k] 0xffffffffbb2b047d
$
After this patch we get the right output with just entries for the DSOs
specified in --dso:
$ perf report --stdio --dso /usr/lib64/ld-2.32.so
# dso: /usr/lib64/ld-2.32.so
#
# Total Lost Samples: 0
#
# Samples: 8 of event 'cycles:u'
# Event count (approx.): 96856
#
# Overhead Command Symbol
# ........ ....... ........................
#
89.76% sleep [.] _dl_lookup_symbol_x
9.46% sleep [.] __GI___tunables_init
$
#
Fixes:
|
||
Jean-Philippe Brucker
|
cb14bbbb7b |
tools: Factor HOSTCC, HOSTLD, HOSTAR definitions
commit c8a950d0d3b926a02c7b2e713850d38217cec3d1 upstream. Several Makefiles in tools/ need to define the host toolchain variables. Move their definition to tools/scripts/Makefile.include Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/bpf/20201110164310.2600671-2-jean-philippe@linaro.org Cc: Alistair Delva <adelva@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Adrian Hunter
|
09b3e0bc8e |
perf intel-pt: Fix 'CPU too large' error
commit 5501e9229a80d95a1ea68609f44c447a75d23ed5 upstream. In some cases, the number of cpus (nr_cpus_online) is confused with the maximum cpu number (nr_cpus_avail), which results in the error in the example below: Example on system with 8 cpus: Before: # echo 0 > /sys/devices/system/cpu/cpu2/online # ./perf record --kcore -e intel_pt// taskset --cpu-list 7 uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.147 MB perf.data ] # ./perf script --itrace=e Requested CPU 7 too large. Consider raising MAX_NR_CPUS 0x25908 [0x8]: failed to process type: 68 [Invalid argument] After: # ./perf script --itrace=e # Fixes: |
||
Arnaldo Carvalho de Melo
|
b931ea024e |
perf probe: Fix memory leak when synthesizing SDT probes
[ Upstream commit 5149303fdfe5c67ddb51c911e23262f781cd75eb ]
The argv_split() function must be paired with argv_free(), else we must
keep a reference to the argv array received or do the freeing ourselves,
in synthesize_sdt_probe_command() we were simply leaking that argv[]
array.
Fixes:
|
||
Zheng Zengkai
|
98c9b3aeff |
perf record: Fix memory leak when using '--user-regs=?' to list registers
[ Upstream commit 2eb5dd418034ecea2f7031e3d33f2991a878b148 ]
When using 'perf record's option '-I' or '--user-regs=' along with
argument '?' to list available register names, memory of variable 'os'
allocated by strdup() needs to be released before __parse_regs()
returns, otherwise memory leak will occur.
Fixes:
|
||
Kajol Jain
|
33e8ef090b |
perf test: Fix metric parsing test
[ Upstream commit b2ce5dbc15819ea4bef47dbd368239cb1e965158 ] Commit |
||
Namhyung Kim
|
3647b89442 |
perf test: Use generic event for expand_libpfm_events()
[ Upstream commit 9b0a7836359443227c9af101f7aea8412e739458 ]
I found that the UNHALTED_CORE_CYCLES event is only available in the
Intel machines and it makes other vendors/archs fail on the test. As
libpfm4 can parse the generic events like cycles, let's use them.
Fixes:
|
||
Masami Hiramatsu
|
a9ffd0484e |
perf probe: Change function definition check due to broken DWARF
Since some gcc generates a broken DWARF which lacks DW_AT_declaration attribute from the subprogram DIE of function prototype. (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060) So, in addition to the DW_AT_declaration check, we also check the subprogram DIE has DW_AT_inline or actual entry pc. Committer testing: # cat /etc/fedora-release Fedora release 33 (Thirty Three) # Before: # perf test vfs_getname 78: Use vfs_getname probe to get syscall args filenames : FAILED! 79: Check open filename arg using perf trace + vfs_getname : FAILED! 81: Add vfs_getname probe to get syscall args filenames : FAILED! # After: # perf test vfs_getname 78: Use vfs_getname probe to get syscall args filenames : Ok 79: Check open filename arg using perf trace + vfs_getname : Ok 81: Add vfs_getname probe to get syscall args filenames : Ok # Reported-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Link: http://lore.kernel.org/lkml/160645613571.2824037.7441351537890235895.stgit@devnote2 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Masami Hiramatsu
|
ab4200c17b |
perf probe: Fix to die_entrypc() returns error correctly
Fix die_entrypc() to return error correctly if the DIE has no
DW_AT_ranges attribute. Since dwarf_ranges() will treat the case as an
empty ranges and return 0, we have to check it by ourselves.
Fixes:
|
||
Namhyung Kim
|
c0ee1d5ae8 |
perf stat: Use proper cpu for shadow stats
Currently perf stat shows some metrics (like IPC) for defined events.
But when no aggregation mode is used (-A option), it shows incorrect
values since it used a value from a different cpu.
Before:
$ perf stat -aA -e cycles,instructions sleep 1
Performance counter stats for 'system wide':
CPU0 116,057,380 cycles
CPU1 86,084,722 cycles
CPU2 99,423,125 cycles
CPU3 98,272,994 cycles
CPU0 53,369,217 instructions # 0.46 insn per cycle
CPU1 33,378,058 instructions # 0.29 insn per cycle
CPU2 58,150,086 instructions # 0.50 insn per cycle
CPU3 40,029,703 instructions # 0.34 insn per cycle
1.001816971 seconds time elapsed
So the IPC for CPU1 should be 0.38 (= 33,378,058 / 86,084,722)
but it was 0.29 (= 33,378,058 / 116,057,380) and so on.
After:
$ perf stat -aA -e cycles,instructions sleep 1
Performance counter stats for 'system wide':
CPU0 109,621,384 cycles
CPU1 159,026,454 cycles
CPU2 99,460,366 cycles
CPU3 124,144,142 cycles
CPU0 44,396,706 instructions # 0.41 insn per cycle
CPU1 120,195,425 instructions # 0.76 insn per cycle
CPU2 44,763,978 instructions # 0.45 insn per cycle
CPU3 69,049,079 instructions # 0.56 insn per cycle
1.001910444 seconds time elapsed
Fixes:
|
||
Namhyung Kim
|
aa50d953c1 |
perf record: Synthesize cgroup events only if needed
It didn't check the tool->cgroup_events bit which is set when the
--all-cgroups option is given. Without it, samples will not have cgroup
info so no reason to synthesize.
We can check the PERF_RECORD_CGROUP records after running perf record
*WITHOUT* the --all-cgroups option:
Before:
$ perf report -D | grep CGROUP
0 0 0x8430 [0x38]: PERF_RECORD_CGROUP cgroup: 1 /
CGROUP events: 1
CGROUP events: 0
CGROUP events: 0
After:
$ perf report -D | grep CGROUP
CGROUP events: 0
CGROUP events: 0
CGROUP events: 0
Committer testing:
Before:
# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.208 MB perf.data (10003 samples) ]
# perf report -D | grep "CGROUP events"
CGROUP events: 146
CGROUP events: 0
CGROUP events: 0
#
After:
# perf record -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.208 MB perf.data (10448 samples) ]
# perf report -D | grep "CGROUP events"
CGROUP events: 0
CGROUP events: 0
CGROUP events: 0
#
With all-cgroups:
# perf record --all-cgroups -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.374 MB perf.data (11526 samples) ]
# perf report -D | grep "CGROUP events"
CGROUP events: 146
CGROUP events: 0
CGROUP events: 0
#
Fixes:
|
||
Zhen Lei
|
9713070028 |
perf diff: Fix error return value in __cmd_diff()
An appropriate return value should be set on the failed path.
Fixes:
|
||
Arnaldo Carvalho de Melo
|
3b13eaf0ba |
perf tools: Update copy of libbpf's hashmap.c
To pick the changes in:
|
||
Ian Rogers
|
568beb2795 |
perf test: Avoid an msan warning in a copied stack.
This fix is for a failure that occurred in the DWARF unwind perf test. Stack unwinders may probe memory when looking for frames. Memory sanitizer will poison and track uninitialized memory on the stack, and on the heap if the value is copied to the heap. This can lead to false memory sanitizer failures for the use of an uninitialized value. Avoid this problem by removing the poison on the copied stack. The full msan failure with track origins looks like: ==2168==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x559ceb10755b in handle_cfi elfutils/libdwfl/frame_unwind.c:648:8 #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 #23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106acf in __libdwfl_frame_reg_set elfutils/libdwfl/frame_unwind.c:77:22 #1 0x559ceb106acf in handle_cfi elfutils/libdwfl/frame_unwind.c:627:13 #2 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #3 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #4 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #5 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #6 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #7 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 #8 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 #9 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 #10 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 #11 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #12 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #13 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 #14 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 #15 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 #16 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 #17 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 #18 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 #19 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 #20 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 #21 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 #22 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 #23 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 #24 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceb106a54 in handle_cfi elfutils/libdwfl/frame_unwind.c:613:9 #1 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #2 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #3 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #4 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #5 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #6 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 #7 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 #8 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 #9 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 #10 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #11 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #12 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 #13 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 #14 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 #15 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 #16 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 #17 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 #18 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 #19 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 #20 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 #21 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 #22 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 #23 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559ceaff8800 in memory_read tools/perf/util/unwind-libdw.c:156:10 #1 0x559ceb10f053 in expr_eval elfutils/libdwfl/frame_unwind.c:501:13 #2 0x559ceb1060cc in handle_cfi elfutils/libdwfl/frame_unwind.c:603:18 #3 0x559ceb105448 in __libdwfl_frame_unwind elfutils/libdwfl/frame_unwind.c:741:4 #4 0x559ceb0ece90 in dwfl_thread_getframes elfutils/libdwfl/dwfl_frame.c:435:7 #5 0x559ceb0ec6b7 in get_one_thread_frames_cb elfutils/libdwfl/dwfl_frame.c:379:10 #6 0x559ceb0ec6b7 in get_one_thread_cb elfutils/libdwfl/dwfl_frame.c:308:17 #7 0x559ceb0ec6b7 in dwfl_getthreads elfutils/libdwfl/dwfl_frame.c:283:17 #8 0x559ceb0ec6b7 in getthread elfutils/libdwfl/dwfl_frame.c:354:14 #9 0x559ceb0ec6b7 in dwfl_getthread_frames elfutils/libdwfl/dwfl_frame.c:388:10 #10 0x559ceaff6ae6 in unwind__get_entries tools/perf/util/unwind-libdw.c:236:8 #11 0x559ceabc9dbc in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:111:8 #12 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #13 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #14 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 #15 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 #16 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 #17 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 #18 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 #19 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 #20 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 #21 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 #22 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 #23 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 #24 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 #25 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was stored to memory at #0 0x559cea9027d9 in __msan_memcpy llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:1558:3 #1 0x559cea9d2185 in sample_ustack tools/perf/arch/x86/tests/dwarf-unwind.c:41:2 #2 0x559cea9d202c in test__arch_unwind_sample tools/perf/arch/x86/tests/dwarf-unwind.c:72:9 #3 0x559ceabc9cbd in test_dwarf_unwind__thread tools/perf/tests/dwarf-unwind.c:106:6 #4 0x559ceabca5cf in test_dwarf_unwind__compare tools/perf/tests/dwarf-unwind.c:138:26 #5 0x7f812a6865b0 in bsearch (libc.so.6+0x4e5b0) #6 0x559ceabca871 in test_dwarf_unwind__krava_3 tools/perf/tests/dwarf-unwind.c:162:2 #7 0x559ceabca926 in test_dwarf_unwind__krava_2 tools/perf/tests/dwarf-unwind.c:169:9 #8 0x559ceabca946 in test_dwarf_unwind__krava_1 tools/perf/tests/dwarf-unwind.c:174:9 #9 0x559ceabcae12 in test__dwarf_unwind tools/perf/tests/dwarf-unwind.c:211:8 #10 0x559ceabbc4ab in run_test tools/perf/tests/builtin-test.c:418:9 #11 0x559ceabbc4ab in test_and_print tools/perf/tests/builtin-test.c:448:9 #12 0x559ceabbac70 in __cmd_test tools/perf/tests/builtin-test.c:669:4 #13 0x559ceabbac70 in cmd_test tools/perf/tests/builtin-test.c:815:9 #14 0x559cea960e30 in run_builtin tools/perf/perf.c:313:11 #15 0x559cea95fbce in handle_internal_command tools/perf/perf.c:365:8 #16 0x559cea95fbce in run_argv tools/perf/perf.c:409:2 #17 0x559cea95fbce in main tools/perf/perf.c:539:3 Uninitialized value was created by an allocation of 'bf' in the stack frame of function 'perf_event__synthesize_mmap_events' #0 0x559ceafc5f60 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:445 SUMMARY: MemorySanitizer: use-of-uninitialized-value elfutils/libdwfl/frame_unwind.c:648:8 in handle_cfi Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: clang-built-linux@googlegroups.com Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sandeep Dasgupta <sdasgup@google.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20201113182053.754625-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Al Grant
|
1c756cd429 |
perf inject: Fix file corruption due to event deletion
"perf inject" can create corrupt files when synthesizing sample events from AUX data. This happens when in the input file, the first event (for the AUX data) has a different sample_type from the second event (generally dummy). Specifically, they differ in the bits that indicate the standard fields appended to perf records in the mmap buffer. "perf inject" deletes the first event and moves up the second event to first position. The problem is with the synthetic PERF_RECORD_MMAP (etc.) events created by "perf record". Since these are synthetic versions of events which are normally produced by the kernel, they have to have the standard fields appended as described by sample_type. "perf record" fills these in with zeroes, including the IDENTIFIER field; perf readers interpret records with zero IDENTIFIER using the descriptor for the first event in the file. Since "perf inject" changes the first event, these synthetic records are then processed with the wrong value of sample_type, and the perf reader reads bad data, reports on incorrect length records etc. Mismatching sample_types are seen with "perf record -e cs_etm//", where the AUX event has TID|TIME|CPU|IDENTIFIER and the dummy event has TID|TIME|IDENTIFIER. Perhaps they could be the same, but it isn't normally a problem if they aren't - perf has no problems reading the file. The sample_types have to agree on the position of IDENTIFIER, because that's how perf finds the right event descriptor in the first place, but they don't normally have to agree on other fields, and perf doesn't check that they do. The problem is specific to the way "perf inject" reorganizes the events and the way synthetic MMAP events are recorded with a zero identifier. A simple solution is to stop "perf inject" deleting the tracing event. Committer testing Removed the now unused 'evsel' variable, update the comment about the evsel removal not being performed anymore, and apply the patch manually as it failed with this warning: warning: Patch sent with format=flowed; space at the end of lines might be lost. Testing it with: $ perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 8.543 msec (+- 0.130 msec) Average time per event: 0.838 usec (+- 0.013 usec) Average memory usage: 12717 KB (+- 9 KB) Average build-id-all injection took: 5.710 msec (+- 0.058 msec) Average time per event: 0.560 usec (+- 0.006 usec) Average memory usage: 12079 KB (+- 7 KB) $ Signed-off-by: Al Grant <al.grant@arm.com> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> LPU-Reference: b9cf5611-daae-2390-3439-6617f8f0a34b@foss.arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
dd94ac807a |
perf test: Update branch sample pattern for cs-etm
Since the commit |
||
Leo Yan
|
db2ac2e49e |
perf test: Fix a typo in cs-etm testing
Fix a typo: s/devce_name/device_name.
Fixes:
|
||
Arnaldo Carvalho de Melo
|
db1a8b97a0 |
tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'
To bring in the change made in this cset: |
||
Leo Yan
|
b0e5a05cc9 |
perf lock: Don't free "lock_seq_stat" if read_count isn't zero
When execute command "perf lock report", it hits failure and outputs log
as follows:
perf: builtin-lock.c:623: report_lock_release_event: Assertion `!(seq->read_count < 0)' failed.
Aborted
This is an imbalance issue. The locking sequence structure
"lock_seq_stat" contains the reader counter and it is used to check if
the locking sequence is balance or not between acquiring and releasing.
If the tool wrongly frees "lock_seq_stat" when "read_count" isn't zero,
the "read_count" will be reset to zero when allocate a new structure at
the next time; thus it causes the wrong counting for reader and finally
results in imbalance issue.
To fix this issue, if detects "read_count" is not zero (means still have
read user in the locking sequence), goto the "end" tag to skip freeing
structure "lock_seq_stat".
Fixes:
|
||
Leo Yan
|
e24a87b54e |
perf lock: Correct field name "flags"
The tracepoint "lock:lock_acquire" contains field "flags" but not
"flag". Current code wrongly retrieves value from field "flag" and it
always gets zero for the value, thus "perf lock" doesn't report the
correct result.
This patch replaces the field name "flag" with "flags", so can read out
the correct flags for locking.
Fixes:
|
||
Namhyung Kim
|
2c589d933e |
perf tools: Add missing swap for cgroup events
It was missed to add a swap function for PERF_RECORD_CGROUP.
Fixes:
|
||
Jiri Olsa
|
fe01adb723 |
perf tools: Add missing swap for ino_generation
We are missing swap for ino_generation field.
Fixes:
|
||
Jiri Olsa
|
6311951d4f |
perf tools: Initialize output buffer in build_id__sprintf
We display garbage for undefined build_id objects, because we don't initialize the output buffer. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20201101233103.3537427-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Song Liu
|
86449b12f6 |
perf hists browser: Increase size of 'buf' in perf_evsel__hists_browse()
Making perf with gcc-9.1.1 generates the following warning:
CC ui/browsers/hists.o
ui/browsers/hists.c: In function 'perf_evsel__hists_browse':
ui/browsers/hists.c:3078:61: error: '%d' directive output may be \
truncated writing between 1 and 11 bytes into a region of size \
between 2 and 12 [-Werror=format-truncation=]
3078 | "Max event group index to sort is %d (index from 0 to %d)",
| ^~
ui/browsers/hists.c:3078:7: note: directive argument in the range [-2147483648, 8]
3078 | "Max event group index to sort is %d (index from 0 to %d)",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/stdio.h:937,
from ui/browsers/hists.c:5:
IOW, the string in line 3078 might be too long for buf[] of 64 bytes.
Fix this by increasing the size of buf[] to 128.
Fixes:
|
||
Arnaldo Carvalho de Melo
|
d0e7b0c71f |
perf scripting python: Avoid declaring function pointers with a visibility attribute
To avoid this: util/scripting-engines/trace-event-python.c: In function 'python_start_script': util/scripting-engines/trace-event-python.c:1595:2: error: 'visibility' attribute ignored [-Werror=attributes] 1595 | PyMODINIT_FUNC (*initfunc)(void); | ^~~~~~~~~~~~~~ That started breaking when building with PYTHON=python3 and these gcc versions (I haven't checked with the clang ones, maybe it breaks there as well): # export PERF_TARBALL=http://192.168.86.5/perf/perf-5.9.0.tar.xz # dm fedora:33 fedora:rawhide 1 107.80 fedora:33 : Ok gcc (GCC) 10.2.1 20201005 (Red Hat 10.2.1-5), clang version 11.0.0 (Fedora 11.0.0-1.fc33) 2 92.47 fedora:rawhide : Ok gcc (GCC) 10.2.1 20201016 (Red Hat 10.2.1-6), clang version 11.0.0 (Fedora 11.0.0-1.fc34) # Avoid that by ditching that 'initfunc' function pointer with its: #define Py_EXPORTED_SYMBOL _attribute_ ((visibility ("default"))) #define PyMODINIT_FUNC Py_EXPORTED_SYMBOL PyObject* And just call PyImport_AppendInittab() at the end of the ifdef python3 block with the functions that were being attributed to that initfunc. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Peter Zijlstra
|
9ae1e990f1 |
perf tools: Remove broken __no_tail_call attribute
The GCC specific __attribute__((optimize)) attribute does not what is commonly expected and is explicitly recommended against using in production code by the GCC people. Unlike what is often expected, it doesn't add to the optimization flags, but it fully replaces them, loosing any and all optimization flags provided by the compiler commandline. The only guaranteed upon means of inhibiting tail-calls is by placing a volatile asm with side-effects after the call such that the tail-call simply cannot be done. Given the original commit wasn't specific on which calls were the problem, this removal might re-introduce the problem, which can then be re-analyzed and cured properly. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ian Rogers <irogers@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Kook <keescook@chromium.org> Cc: Martin Liška <mliska@suse.cz> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/20201028081123.GT2628@hirez.programming.kicks-ass.net Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
0dfbe4c646 |
perf vendor events: Fix DRAM_BW_Use 0 issue for CLX/SKX
Ian reports an issue that the metric DRAM_BW_Use often remains 0. The metric expression for DRAM_BW_Use on CLX/SKX: "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time" The counts of uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/ are scaled up by 64, that is to turn a count of cache lines into bytes, the count is then divided by 1000000000 to give GB. However, the counts of uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/ have been scaled yet. The scale values are from sysfs, such as /sys/devices/uncore_imc_0/events/cas_count_read.scale. It's 6.103515625e-5 (64 / 1024.0 / 1024.0). So if we use original metric expression, the result is not correct. But the difficulty is, for SKL client, the counts are not scaled. The metric expression for DRAM_BW_Use on SKL: "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) / 1000000 / duration_time / 1000" root@kbl-ppc:~# perf stat -M DRAM_BW_Use -a -- sleep 1 Performance counter stats for 'system wide': 190 arb/event=0x84,umask=0x1/ # 1.86 DRAM_BW_Use 29,093,178 arb/event=0x81,umask=0x1/ 1,000,703,287 ns duration_time 1.000703287 seconds time elapsed The result is expected. So the easy way is just change the metric expression for CLX/SKX. This patch changes the metric expression to: "( ( ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) * 1048576 ) / 1000000000 ) / duration_time" 1048576 = 1024 * 1024. Before (tested on CLX): root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1 Performance counter stats for 'system wide': 765.35 MiB uncore_imc/cas_count_read/ # 0.00 DRAM_BW_Use 5.42 MiB uncore_imc/cas_count_write/ 1001515088 ns duration_time 1.001515088 seconds time elapsed After: root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1 Performance counter stats for 'system wide': 767.95 MiB uncore_imc/cas_count_read/ # 0.80 DRAM_BW_Use 5.02 MiB uncore_imc/cas_count_write/ 1001900010 ns duration_time 1.001900010 seconds time elapsed Fixes: |
||
Stanislav Ivanichkin
|
a6293f36ac |
perf trace: Fix segfault when trying to trace events by cgroup
# ./perf trace -e sched:sched_switch -G test -a sleep 1
perf: Segmentation fault
Obtained 11 stack frames.
./perf(sighandler_dump_stack+0x43) [0x55cfdc636db3]
/lib/x86_64-linux-gnu/libc.so.6(+0x3efcf) [0x7fd23eecafcf]
./perf(parse_cgroups+0x36) [0x55cfdc673f36]
./perf(+0x3186ed) [0x55cfdc70d6ed]
./perf(parse_options_subcommand+0x629) [0x55cfdc70e999]
./perf(cmd_trace+0x9c2) [0x55cfdc5ad6d2]
./perf(+0x1e8ae0) [0x55cfdc5ddae0]
./perf(+0x1e8ded) [0x55cfdc5ddded]
./perf(main+0x370) [0x55cfdc556f00]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fd23eeadb96]
./perf(_start+0x29) [0x55cfdc557389]
Segmentation fault
#
It happens because "struct trace" in option->value is passed to the
parse_cgroups function instead of "struct evlist".
Fixes:
|
||
Tommi Rantala
|
ab8bf5f2e0 |
perf tools: Fix crash with non-jited bpf progs
The addr in PERF_RECORD_KSYMBOL events for non-jited bpf progs points to the bpf interpreter, ie. within kernel text section. When processing the unregister event, this causes unexpected removal of vmlinux_map, crashing perf later in cleanup: # perf record -- timeout --signal=INT 2s /usr/share/bcc/tools/execsnoop PCOMM PID PPID RET ARGS [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.208 MB perf.data (5155 samples) ] perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed. Aborted (core dumped) # perf script -D|grep KSYM 0 0xa40 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6 0 0xab0 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_8c42dee26e8cd4c2 0 0xb20 [0x48]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6 108563691893 0x33d98 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x0 name bpf_prog_bc5697a410556fc2_syscall__execve 108568518458 0x34098 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x0 name bpf_prog_45e2203c2928704d_do_ret_sys_execve 109301967895 0x34830 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3b0 len 0 type 1 flags 0x1 name bpf_prog_bc5697a410556fc2_syscall__execve 109302007356 0x348b0 [0x58]: PERF_RECORD_KSYMBOL addr ffffffffa9b6b3f0 len 0 type 1 flags 0x1 name bpf_prog_45e2203c2928704d_do_ret_sys_execve perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed. Here the addresses match the bpf interpreter: # grep -e ffffffffa9b6b530 -e ffffffffa9b6b3b0 -e ffffffffa9b6b3f0 /proc/kallsyms ffffffffa9b6b3b0 t __bpf_prog_run224 ffffffffa9b6b3f0 t __bpf_prog_run192 ffffffffa9b6b530 t __bpf_prog_run32 Fix by not allowing vmlinux_map to be removed by PERF_RECORD_KSYMBOL unregister event. Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201016114718.54332-1-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
263e452eff |
tools headers UAPI: Update process_madvise affected files
To pick the changes from:
|
||
Arnaldo Carvalho de Melo
|
e555b4b8d7 |
perf tools: Update copy of libbpf's hashmap.c
To pick the changes in: |
||
Justin M. Forbes
|
b773ea6505 |
perf tools: Remove LTO compiler options when building perl support
To avoid breaking the build by mixing files compiled with things coming
from distro specific compiler options for perl with the rest of perf,
i.e. to avoid this:
`.gnu.debuglto_.debug_macro' referenced in section `.gnu.debuglto_.debug_macro' of /tmp/build/perf/util/scripting-engines/perf-in.o: defined in discarded section `.gnu.debuglto_.debug_macro[wm4.stdcpredef.h.19.8dc41bed5d9037ff9622e015fb5f0ce3]' of /tmp/build/perf/util/scripting-engines/perf-in.o
Noticed on Fedora 33.
Signed-off-by: Justin M. Forbes <jforbes@fedoraproject.org>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1593431
Cc: Jiri Olsa <jolsa@redhat.com>
Link:
|
||
Linus Torvalds
|
9d9af1007b |
perf tools changes for v5.10: 1st batch
- cgroup improvements for 'perf stat', allowing for compact specification of events and cgroups in the command line. - Support per thread topdown metrics in 'perf stat'. - Support sample-read topdown metric group in 'perf record' - Show start of latency in addition to its start in 'perf sched latency'. - Add min, max to 'perf script' futex-contention output, in addition to avg. - Allow usage of 'perf_event_attr->exclusive' attribute via the new ':e' event modifier. - Add 'snapshot' command to 'perf record --control', using it with Intel PT. - Support FIFO file names as alternative options to 'perf record --control'. - Introduce branch history "streams", to compare 'perf record' runs with 'perf diff' based on branch records and report hot streams. - Support PE executable symbol tables using libbfd, to profile, for instance, wine binaries. - Add filter support for option 'perf ftrace -F/--funcs'. - Allow configuring the 'disassembler_style' 'perf annotate' knob via 'perf config' - Update CascadelakeX and SkylakeX JSON vendor events files. - Add support for parsing perchip/percore JSON vendor events. - Add power9 hv_24x7 core level metric events. - Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD zen1. - Enable Family 19h users by matching Zen2 AMD vendor events. - Use debuginfod in 'perf probe' when required debug files not found locally. - Display negative tid in non-sample events in 'perf script'. - Make GTK2 support opt-in - Add build test with GTK+ - Add missing -lzstd to the fast path feature detection - Add scripts to auto generate 'mmap', 'mremap' string<->id tables for use in 'perf trace'. - Show python test script in verbose mode. - Fix uncore metric expressions - Msan uninitialized use fixes. - Use condition variables in 'perf bench numa' - Autodetect python3 binary in systems without python2. - Support md5 build ids in addition to sha1. - Add build id 'perf test' regression test. - Fix printable strings in python3 scripts. - Fix off by ones in 'perf trace' in arches using libaudit. - Fix JSON event code for events referencing std arch events. - Introduce 'perf test' shell script for Arm CoreSight testing. - Add rdtsc() for Arm64 for used in the PERF_RECORD_TIME_CONV metadata event and in 'perf test tsc'. - 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores", fixes and documentation update. - Fix usage of reloc_sym in 'perf probe' when using both kallsyms and debuginfo files. - Do not print 'Metric Groups:' unnecessarily in 'perf list' - Refcounting fixes in the event parsing code. - Add expand cgroup event 'perf test' entry. - Fix out of bounds CPU map access when handling armv8_pmu events in 'perf stat'. - Add build-id injection 'perf bench' benchmark. - Enter namespace when reading build-id in 'perf inject'. - Do not load map/dso when injecting build-id speeding up the 'perf inject' process. - Add --buildid-all option to avoid processing all samples, just the mmap metadata events. - Add feature test to check if libbfd has buildid support - Add 'perf test' entry for PE binary format support. - Fix typos in power8 PMU vendor events JSON files. - Hide libtraceevent non API functions. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Test results: The first ones are container based builds of tools/perf with and without libelf support. Where clang is available, it is also used to build perf with/without libelf, and building with LIBCLANGLLVM=1 (built-in clang) with gcc and clang when clang and its devel libraries are installed. The objtool and samples/bpf/ builds are disabled now that I'm switching from using the sources in a local volume to fetching them from a http server to build it inside the container, to make it easier to build in a container cluster. Those will come back later. Several are cross builds, the ones with -x-ARCH and the android one, and those may not have all the features built, due to lack of multi-arch devel packages, available and being used so far on just a few, like debian:experimental-x-{arm64,mipsel}. The 'perf test' one will perform a variety of tests exercising tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands with a variety of command line event specifications to then intercept the sys_perf_event syscall to check that the perf_event_attr fields are set up as expected, among a variety of other unit tests. Then there is the 'make -C tools/perf build-test' ones, that build tools/perf/ with a variety of feature sets, exercising the build with an incomplete set of features as well as with a complete one. It is planned to have it run on each of the containers mentioned above, using some container orchestration infrastructure. Get in contact if interested in helping having this in place. $ grep "model name" -m1 /proc/cpuinfo model name: AMD Ryzen 9 3900X 12-Core Processor $ export PERF_TARBALL=http://192.168.122.1/perf/perf-5.9.0-rc7.tar.xz $ dm Thu 15 Oct 2020 01:10:56 PM -03 1 67.40 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0, clang version 3.8.0 (tags/RELEASE_380/final) 2 69.01 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822, clang version 3.8.1 (tags/RELEASE_381/final) 3 70.79 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0, clang version 4.0.0 (tags/RELEASE_400/final) 4 79.89 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.0 (tags/RELEASE_500/final) (based on LLVM 5.0.0) 5 80.88 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0, Alpine clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1) 6 83.88 alpine:3.9 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 5.0.1 (tags/RELEASE_502/final) (based on LLVM 5.0.1) 7 107.87 alpine:3.10 : Ok gcc (Alpine 8.3.0) 8.3.0, Alpine clang version 8.0.0 (tags/RELEASE_800/final) (based on LLVM 8.0.0) 8 115.43 alpine:3.11 : Ok gcc (Alpine 9.3.0) 9.3.0, Alpine clang version 9.0.0 (https://git.alpinelinux.org/aports f7f0d2c2b8bcd6a5843401a9a702029556492689) (based on LLVM 9.0.0) 9 106.80 alpine:3.12 : Ok gcc (Alpine 9.3.0) 9.3.0, Alpine clang version 10.0.0 (https://gitlab.alpinelinux.org/alpine/aports.git 7445adce501f8473efdb93b17b5eaf2f1445ed4c) 10 114.06 alpine:edge : Ok gcc (Alpine 10.2.0) 10.2.0, Alpine clang version 10.0.1 11 70.42 alt:p8 : Ok x86_64-alt-linux-gcc (GCC) 5.3.1 20151207 (ALT p8 5.3.1-alt3.M80P.1), clang version 3.8.0 (tags/RELEASE_380/final) 12 98.70 alt:p9 : Ok x86_64-alt-linux-gcc (GCC) 8.4.1 20200305 (ALT p9 8.4.1-alt0.p9.1), clang version 10.0.0 13 80.37 alt:sisyphus : Ok x86_64-alt-linux-gcc (GCC) 9.3.1 20200518 (ALT Sisyphus 9.3.1-alt1), clang version 10.0.1 14 64.12 amazonlinux:1 : Ok gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2), clang version 3.6.2 (tags/RELEASE_362/final) 15 97.64 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-9), clang version 7.0.1 (Amazon Linux 2 7.0.1-1.amzn2.0.2) 16 22.70 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 17 22.72 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease) 18 26.70 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23) 19 31.86 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39) 20 113.19 centos:8 : Ok gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5), clang version 9.0.1 (Red Hat 9.0.1-2.module_el8.2.0+309+0c7b6b03) 21 57.23 clearlinux:latest : Ok gcc (Clear Linux OS for Intel Architecture) 10.2.1 20200908 releases/gcc-10.2.0-203-g127d693955, clang version 10.0.1 22 64.98 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u2) 4.9.2, Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0) 23 76.08 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, clang version 3.8.1-24 (tags/RELEASE_381/final) 24 74.49 debian:10 : Ok gcc (Debian 8.3.0-6) 8.3.0, clang version 7.0.1-8+deb10u2 (tags/RELEASE_701/final) 25 78.50 debian:experimental : Ok gcc (Debian 10.2.0-15) 10.2.0, Debian clang version 11.0.0-2 26 33.30 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 10.2.0-3) 10.2.0 27 30.96 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 9.3.0-8) 9.3.0 28 32.63 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 9.3.0-8) 9.3.0 29 30.12 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) 30 30.99 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.5.0 (tags/RELEASE_350/final) 31 68.60 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6), clang version 3.7.0 (tags/RELEASE_370/final) 32 78.92 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1), clang version 3.8.1 (tags/RELEASE_381/final) 33 26.15 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710 34 80.13 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1), clang version 3.9.1 (tags/RELEASE_391/final) 35 90.68 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2), clang version 4.0.1 (tags/RELEASE_401/final) 36 90.45 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), clang version 5.0.2 (tags/RELEASE_502/final) 37 100.88 fedora:28 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 6.0.1 (tags/RELEASE_601/final) 38 105.99 fedora:29 : Ok gcc (GCC) 8.3.1 20190223 (Red Hat 8.3.1-2), clang version 7.0.1 (Fedora 7.0.1-6.fc29) 39 111.05 fedora:30 : Ok gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), clang version 8.0.0 (Fedora 8.0.0-3.fc30) 40 29.96 fedora:30-x-ARC-glibc : Ok arc-linux-gcc (ARC HS GNU/Linux glibc toolchain 2019.03-rc1) 8.3.1 20190225 41 27.02 fedora:30-x-ARC-uClibc : Ok arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 42 110.47 fedora:31 : Ok gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2), clang version 9.0.1 (Fedora 9.0.1-2.fc31) 43 88.78 fedora:32 : Ok gcc (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1), clang version 10.0.0 (Fedora 10.0.0-2.fc32) 44 15.92 fedora:rawhide : FAIL gcc (GCC) 10.2.1 20200916 (Red Hat 10.2.1-4), clang version 11.0.0 (Fedora 11.0.0-0.4.rc3.fc34) 45 33.58 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 9.3.0-r1 p3) 9.3.0 46 65.32 mageia:5 : Ok gcc (GCC) 4.9.2, clang version 3.5.2 (tags/RELEASE_352/final) 47 81.35 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0, clang version 3.9.1 (tags/RELEASE_391/final) 48 103.94 mageia:7 : Ok gcc (Mageia 8.4.0-1.mga7) 8.4.0, clang version 8.0.0 (Mageia 8.0.0-1.mga7) 49 91.62 manjaro:latest : Ok gcc (GCC) 10.2.0, clang version 10.0.1 50 219.87 openmandriva:cooker : Ok gcc (GCC) 10.2.0 20200723 (OpenMandriva), OpenMandriva 11.0.0-0.20200909.1 clang version 11.0.0 (/builddir/build/BUILD/llvm-project-release-11.x/clang 5cb8ffbab42358a7cdb0a67acfadb84df0779579) 51 111.76 opensuse:15.0 : Ok gcc (SUSE Linux) 7.4.1 20190905 [gcc-7-branch revision 275407], clang version 5.0.1 (tags/RELEASE_501/final 312548) 52 118.03 opensuse:15.1 : Ok gcc (SUSE Linux) 7.5.0, clang version 7.0.1 (tags/RELEASE_701/final 349238) 53 107.91 opensuse:15.2 : Ok gcc (SUSE Linux) 7.5.0, clang version 9.0.1 54 102.34 opensuse:tumbleweed : Ok gcc (SUSE Linux) 10.2.1 20200825 [revision c0746a1beb1ba073c7981eb09f55b3d993b32e5c], clang version 10.0.1 55 25.33 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1) 56 30.45 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44.0.3) 57 104.65 oraclelinux:8 : Ok gcc (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5.0.3), clang version 9.0.1 (Red Hat 9.0.1-2.0.1.module+el8.2.0+5599+9ed9ef6d) 58 26.04 ubuntu:12.04 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, Ubuntu clang version 3.0-6ubuntu3 (tags/RELEASE_30/final) (based on LLVM 3.0) 59 29.49 ubuntu:14.04 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4 60 72.95 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final) 61 26.03 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 62 25.15 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 63 24.88 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 64 25.72 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 65 25.39 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 66 25.34 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609 67 84.84 ubuntu:18.04 : Ok gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, clang version 6.0.0-1ubuntu2 (tags/RELEASE_600/final) 68 27.15 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0 69 26.68 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0 70 22.38 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 71 26.35 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 72 28.58 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 73 28.18 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 74 178.55 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 75 24.58 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 76 26.89 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 77 24.81 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 78 68.90 ubuntu:19.10 : Ok gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, clang version 8.0.1-3build1 (tags/RELEASE_801/final) 79 69.31 ubuntu:20.04 : Ok gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0, clang version 10.0.0-4ubuntu1 80 30.00 ubuntu:20.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 10-20200411-0ubuntu1) 10.0.1 20200411 (experimental) [master revision bb87d5cc77d:75961caccb7:f883c46b4877f637e0fa5025b4d6b5c9040ec566] 81 70.34 ubuntu:20.10 : Ok gcc (Ubuntu 10.2.0-5ubuntu2) 10.2.0, Ubuntu clang version 10.0.1-1 $ # uname -a Linux five 5.9.0+ #1 SMP Thu Oct 15 09:06:41 -03 2020 x86_64 x86_64 x86_64 GNU/Linux # git log --oneline -1 |
||
Linus Torvalds
|
9ff9b0d392 |
networking changes for the 5.10 merge window
Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. Allow more than 255 IPv4 multicast interfaces. Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. Allow more calls to same peer in RxRPC. Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. Add TC actions for implementing MPLS L2 VPNs. Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. Support sleepable BPF programs, initially targeting LSM and tracing. Add bpf_d_path() helper for returning full path for given 'struct path'. Make bpf_tail_call compatible with bpf-to-bpf calls. Allow BPF programs to call map_update_elem on sockmaps. Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). Support listing and getting information about bpf_links via the bpf syscall. Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. Add XDP support for Intel's igb driver. Support offloading TC flower classification and filtering rules to mscc_ocelot switches. Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81 PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy 2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP 81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc= =bc1U -----END PGP SIGNATURE----- Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: - Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. - Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). - Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. - Allow more than 255 IPv4 multicast interfaces. - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. - In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. - Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. - Allow more calls to same peer in RxRPC. - Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. - Add TC actions for implementing MPLS L2 VPNs. - Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. - Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. - Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. - Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. - Support sleepable BPF programs, initially targeting LSM and tracing. - Add bpf_d_path() helper for returning full path for given 'struct path'. - Make bpf_tail_call compatible with bpf-to-bpf calls. - Allow BPF programs to call map_update_elem on sockmaps. - Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). - Support listing and getting information about bpf_links via the bpf syscall. - Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). - Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. - Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). - In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. - Add XDP support for Intel's igb driver. - Support offloading TC flower classification and filtering rules to mscc_ocelot switches. - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. - Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. - Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. - Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. - Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. - Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits) Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" net, sockmap: Don't call bpf_prog_put() on NULL pointer bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo bpf, sockmap: Add locking annotations to iterator netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements net: fix pos incrementment in ipv6_route_seq_next net/smc: fix invalid return code in smcd_new_buf_create() net/smc: fix valid DMBE buffer sizes net/smc: fix use-after-free of delayed events bpfilter: Fix build error with CONFIG_BPFILTER_UMH cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info bpf: Fix register equivalence tracking. rxrpc: Fix loss of final ack on shutdown rxrpc: Fix bundle counting for exclusive connections netfilter: restore NF_INET_NUMHOOKS ibmveth: Identify ingress large send packets. ibmveth: Switch order of ibmveth_helper calls. cxgb4: handle 4-tuple PEDIT to NAT mode translation selftests: Add VRF route leaking tests ... |
||
Leo Yan
|
744aec4df2 |
perf c2c: Update documentation for metrics reorganization
The output format for metrics has been reorganized, update documentation to reflect the changes for it. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Al Grant <al.grant@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Joe Mario <jmario@redhat.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20201015144548.18482-10-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
91d933c221 |
perf c2c: Add metrics "RMT Load Hit"
The metrics "LLC Ld Miss" and "Load Dram" overlap with each other for accouting items: "LLC Ld Miss" = "lcl_dram" + "rmt_dram" + "rmt_hit" + "rmt_hitm" "Load Dram" = "lcl_dram" + "rmt_dram" Furthermore, the metrics "LLC Ld Miss" is not directive to show statistics due to it contains summary value and cannot give out breakdown details. For this reason, add a new metrics "RMT Load Hit" which is used to present the remote cache hit; it contains two items: "RMT Load Hit" = remote hit ("rmt_hit") + remote hitm ("rmt_hitm") As result, the metrics "LLC Ld Miss" is perfectly divided into two metrics "RMT Load Hit" and "Load Dram". It's not necessary to keep metrics "LLC Ld Miss", so remove it. Before: # ----------- Cacheline ---------- Tot ------- Load Hitm ------- Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- LLC --- Load Dram ---- # Index Address Node PA cnt Hitm Total LclHitm RmtHitm records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm Ld Miss Lcl Rmt # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ....... ........ ........ # 0 0x55f07d580100 0 1499 85.89% 481 481 0 7243 3879 3364 2599 765 548 2615 66 169 481 0 0 0 1 0x55f07d580080 0 1 13.93% 78 78 0 664 664 0 0 0 187 361 27 11 78 0 0 0 2 0x55f07d5800c0 0 1 0.18% 1 1 0 405 405 0 0 0 131 0 10 263 1 0 0 0 After: # ----------- Cacheline ---------- Tot ------- Load Hitm ------- Total Total Total ---- Stores ---- ----- Core Load Hit ----- - LLC Load Hit -- - RMT Load Hit -- --- Load Dram ---- # Index Address Node PA cnt Hitm Total LclHitm RmtHitm records Loads Stores L1Hit L1Miss FB L1 L2 LclHit LclHitm RmtHit RmtHitm Lcl Rmt # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ....... ........ ....... ........ ........ # 0 0x55f07d580100 0 1499 85.89% 481 481 0 7243 3879 3364 2599 765 548 2615 66 169 481 0 0 0 0 1 0x55f07d580080 0 1 13.93% 78 78 0 664 664 0 0 0 187 361 27 11 78 0 0 0 0 2 0x55f07d5800c0 0 1 0.18% 1 1 0 405 405 0 0 0 131 0 10 263 1 0 0 0 0 Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-9-leo.yan@linaro.org |
||
Leo Yan
|
77c158698c |
perf c2c: Correct LLC load hit metrics
"rmt_hit" is accounted into two metrics: one is accounted into the metrics "LLC Ld Miss" (see the function llc_miss() for calculation "llcmiss"); and it's accounted into metrics "LLC Load Hit". Thus, for the literal meaning, it is contradictory that "rmt_hit" is accounted for both "LLC Ld Miss" (LLC miss) and "LLC Load Hit" (LLC hit). Thus this is easily to introduce confusion: "LLC Load Hit" gives impression that all items belong to it are LLC hit; in fact "rmt_hit" is LLC miss and remote cache hit. To give out clear semantics for metric "LLC Load Hit", "rmt_hit" is moved out from it and changes "LLC Load Hit" to contain two items: LLC Load Hit = LLC's hit ("ld_llchit") + LLC's hitm ("lcl_hitm") For output alignment, adjusts the header for "LLC Load Hit". Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-8-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
ed626a3e52 |
perf c2c: Change header for LLC local hit
Replace the header string "Lcl" with "LclHit", which is more explicit to express the event type is LLC local hit. Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-7-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
0fbe2fe965 |
perf c2c: Use more explicit headers for HITM
Local and remote HITM use the headers 'Lcl' and 'Rmt' respectively, suppose if we want to extend the tool to display these two dimensions under any one metrics, users cannot understand the semantics if only based on the header string 'Lcl' or 'Rmt'. To explicit express the meaning for HITM items, this patch changes the headers string as "LclHitm" and "RmtHitm", the strings are more readable and this allows to extend metrics for using HITM items. Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-6-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
fdd32d7e8e |
perf c2c: Change header from "LLC Load Hitm" to "Load Hitm"
The metrics "LLC Load Hitm" contains two items: one is "local Hitm" and another is "remote Hitm". "local Hitm" means: L3 HIT and was serviced by another processor core with a cross core snoop where modified copies were found; it's no doubt that "local Hitm" belongs to LLC access. But for "remote Hitm", based on the code in util/mem-events, it's the event for remote cache HIT and was serviced by another processor core with modified copies. Thus the remote Hitm is a remote cache's hit and actually it's LLC load miss. Now the display format gives users the impression that "local Hitm" and "remote Hitm" both belong to the LLC load, but this is not the fact as described. This patch changes the header from "LLC Load Hitm" to "Load Hitm", this can avoid the give the wrong impression that all Hitm belong to LLC. Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-5-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
6d662d730d |
perf c2c: Organize metrics based on memory hierarchy
The metrics are not organized based on memory hierarchy, e.g. the tool doesn't organize the metrics order based on memory nodes from the close node (e.g. L1/L2 cache) to far node (e.g. L3 cache and DRAM). To output metrics with more friendly form, this patch refines the metrics order based on memory hierarchy: "Core Load Hit" => "LLC Load Hit" => "LLC Ld Miss" => "Load Dram" Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-4-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
4f28641bde |
perf c2c: Display "Total Stores" as a standalone metrics
The total stores is displayed under the metrics "Store Reference", to output the same format with total records and all loads, extract the total stores number as a standalone metrics "Total Stores". After this patch, the tool shows the summary numbers ("Total records", "Total loads", "Total Stores") in the unified form. Before: # ----------- Cacheline ---------- Tot ----- LLC Load Hitm ----- Total Total ---- Store Reference ---- --- Load Dram ---- LLC ----- Core Load Hit ----- -- LLC Load Hit -- # Index Address Node PA cnt Hitm Total Lcl Rmt records Loads Total L1Hit L1Miss Lcl Rmt Ld Miss FB L1 L2 Llc Rmt # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ........ ....... ....... ....... ....... ........ ........ # 0 0x55f07d580100 0 1499 85.89% 481 481 0 7243 3879 3364 2599 765 0 0 0 548 2615 66 169 0 1 0x55f07d580080 0 1 13.93% 78 78 0 664 664 0 0 0 0 0 0 187 361 27 11 0 2 0x55f07d5800c0 0 1 0.18% 1 1 0 405 405 0 0 0 0 0 0 131 0 10 263 0 After: # ----------- Cacheline ---------- Tot ----- LLC Load Hitm ----- Total Total Total ---- Stores ---- --- Load Dram ---- LLC ----- Core Load Hit ----- -- LLC Load Hit -- # Index Address Node PA cnt Hitm Total Lcl Rmt records Loads Stores L1Hit L1Miss Lcl Rmt Ld Miss FB L1 L2 Llc Rmt # ..... .................. .... ...... ....... ....... ....... ....... ....... ....... ....... ....... ....... ........ ........ ....... ....... ....... ....... ........ ........ # 0 0x55f07d580100 0 1499 85.89% 481 481 0 7243 3879 3364 2599 765 0 0 0 548 2615 66 169 0 1 0x55f07d580080 0 1 13.93% 78 78 0 664 664 0 0 0 0 0 0 187 361 27 11 0 2 0x55f07d5800c0 0 1 0.18% 1 1 0 405 405 0 0 0 0 0 0 131 0 10 263 0 Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-3-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Leo Yan
|
b596e979c8 |
perf c2c: Display the total numbers continuously
To view the statistics with "breakdown" mode, it's good to show the summary numbers for the total records, all stores and all loads, then the sequential conlumns can be used to break into more detailed items. To achieve this purpose, this patch displays the summary numbers for records/stores/loads continuously and places them before breakdown items, this can allow uses to easily read the summarized statistics. Signed-off-by: Leo Yan <leo.yan@linaro.org> Tested-by: Joe Mario <jmario@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201014050921.5591-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
f92993851f |
perf bench: Use condition variables in numa.
The existing approach to synchronization between threads in the numa benchmark is unbalanced mutexes. This synchronization causes thread sanitizer to warn of locks being taken twice on a thread without an unlock, as well as unlocks with no corresponding locks. This change replaces the synchronization with more regular condition variables. While this fixes one class of thread sanitizer warnings, there still remain warnings of data races due to threads reading and writing shared memory without any atomics. Committer testing: Basic run on a non-NUMA machine. # perf bench numa # List of available benchmarks for collection 'numa': mem: Benchmark for NUMA workloads all: Run all NUMA benchmarks # perf bench numa all # Running numa/mem benchmark... # Running main, "perf bench numa numa-mem" # # Running test on: Linux five 5.8.12-200.fc32.x86_64 #1 SMP Mon Sep 28 12:17:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux # # Running RAM-bw-local, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk" 20.076 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.073 secs average thread-runtime 0.190 % difference between max/avg runtime 241.828 GB data processed, per thread 241.828 GB data processed, total 0.083 nsecs/byte/thread runtime 12.045 GB/sec/thread speed 12.045 GB/sec total speed # Running RAM-bw-local-NOTHP, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk --thp -1" 20.045 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.014 secs average thread-runtime 0.111 % difference between max/avg runtime 234.304 GB data processed, per thread 234.304 GB data processed, total 0.086 nsecs/byte/thread runtime 11.689 GB/sec/thread speed 11.689 GB/sec total speed # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp 1 --no-data_rand_walk" Test not applicable, system has only 1 nodes. # Running RAM-bw-local-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 0x2 -s 20 -zZq --thp 1 --no-data_rand_walk" 20.138 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.121 secs average thread-runtime 0.342 % difference between max/avg runtime 135.961 GB data processed, per thread 271.922 GB data processed, total 0.148 nsecs/byte/thread runtime 6.752 GB/sec/thread speed 13.503 GB/sec total speed # Running RAM-bw-remote-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 1x2 -s 20 -zZq --thp 1 --no-data_rand_walk" Test not applicable, system has only 1 nodes. # Running RAM-bw-cross, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk" Test not applicable, system has only 1 nodes. # Running 1x3-convergence, "perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1" 0.747 secs latency to NUMA-converge 0.747 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.714 secs average thread-runtime 50.000 % difference between max/avg runtime 3.228 GB data processed, per thread 9.683 GB data processed, total 0.231 nsecs/byte/thread runtime 4.321 GB/sec/thread speed 12.964 GB/sec total speed # Running 1x4-convergence, "perf bench numa mem -p 1 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 1.127 secs latency to NUMA-converge 1.127 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.089 secs average thread-runtime 5.624 % difference between max/avg runtime 3.765 GB data processed, per thread 15.062 GB data processed, total 0.299 nsecs/byte/thread runtime 3.342 GB/sec/thread speed 13.368 GB/sec total speed # Running 1x6-convergence, "perf bench numa mem -p 1 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 1.003 secs latency to NUMA-converge 1.003 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.889 secs average thread-runtime 50.000 % difference between max/avg runtime 2.141 GB data processed, per thread 12.847 GB data processed, total 0.469 nsecs/byte/thread runtime 2.134 GB/sec/thread speed 12.805 GB/sec total speed # Running 2x3-convergence, "perf bench numa mem -p 2 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 1.814 secs latency to NUMA-converge 1.814 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.716 secs average thread-runtime 22.440 % difference between max/avg runtime 3.747 GB data processed, per thread 22.483 GB data processed, total 0.484 nsecs/byte/thread runtime 2.065 GB/sec/thread speed 12.393 GB/sec total speed # Running 3x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 2.065 secs latency to NUMA-converge 2.065 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.947 secs average thread-runtime 25.788 % difference between max/avg runtime 2.855 GB data processed, per thread 25.694 GB data processed, total 0.723 nsecs/byte/thread runtime 1.382 GB/sec/thread speed 12.442 GB/sec total speed # Running 4x4-convergence, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 1.912 secs latency to NUMA-converge 1.912 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.775 secs average thread-runtime 23.852 % difference between max/avg runtime 1.479 GB data processed, per thread 23.668 GB data processed, total 1.293 nsecs/byte/thread runtime 0.774 GB/sec/thread speed 12.378 GB/sec total speed # Running 4x4-convergence-NOTHP, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 1.783 secs latency to NUMA-converge 1.783 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.633 secs average thread-runtime 21.960 % difference between max/avg runtime 1.345 GB data processed, per thread 21.517 GB data processed, total 1.326 nsecs/byte/thread runtime 0.754 GB/sec/thread speed 12.067 GB/sec total speed # Running 4x6-convergence, "perf bench numa mem -p 4 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 5.396 secs latency to NUMA-converge 5.396 secs slowest (max) thread-runtime 4.000 secs fastest (min) thread-runtime 4.928 secs average thread-runtime 12.937 % difference between max/avg runtime 2.721 GB data processed, per thread 65.306 GB data processed, total 1.983 nsecs/byte/thread runtime 0.504 GB/sec/thread speed 12.102 GB/sec total speed # Running 4x8-convergence, "perf bench numa mem -p 4 -t 8 -P 512 -s 100 -zZ0qcm --thp 1" 3.121 secs latency to NUMA-converge 3.121 secs slowest (max) thread-runtime 2.000 secs fastest (min) thread-runtime 2.836 secs average thread-runtime 17.962 % difference between max/avg runtime 1.194 GB data processed, per thread 38.192 GB data processed, total 2.615 nsecs/byte/thread runtime 0.382 GB/sec/thread speed 12.236 GB/sec total speed # Running 8x4-convergence, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 4.302 secs latency to NUMA-converge 4.302 secs slowest (max) thread-runtime 3.000 secs fastest (min) thread-runtime 4.045 secs average thread-runtime 15.133 % difference between max/avg runtime 1.631 GB data processed, per thread 52.178 GB data processed, total 2.638 nsecs/byte/thread runtime 0.379 GB/sec/thread speed 12.128 GB/sec total speed # Running 8x4-convergence-NOTHP, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 4.418 secs latency to NUMA-converge 4.418 secs slowest (max) thread-runtime 3.000 secs fastest (min) thread-runtime 4.104 secs average thread-runtime 16.045 % difference between max/avg runtime 1.664 GB data processed, per thread 53.254 GB data processed, total 2.655 nsecs/byte/thread runtime 0.377 GB/sec/thread speed 12.055 GB/sec total speed # Running 3x1-convergence, "perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 0.973 secs latency to NUMA-converge 0.973 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.955 secs average thread-runtime 50.000 % difference between max/avg runtime 4.124 GB data processed, per thread 12.372 GB data processed, total 0.236 nsecs/byte/thread runtime 4.238 GB/sec/thread speed 12.715 GB/sec total speed # Running 4x1-convergence, "perf bench numa mem -p 4 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 0.820 secs latency to NUMA-converge 0.820 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.808 secs average thread-runtime 50.000 % difference between max/avg runtime 2.555 GB data processed, per thread 10.220 GB data processed, total 0.321 nsecs/byte/thread runtime 3.117 GB/sec/thread speed 12.468 GB/sec total speed # Running 8x1-convergence, "perf bench numa mem -p 8 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 0.667 secs latency to NUMA-converge 0.667 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.607 secs average thread-runtime 50.000 % difference between max/avg runtime 1.009 GB data processed, per thread 8.069 GB data processed, total 0.661 nsecs/byte/thread runtime 1.512 GB/sec/thread speed 12.095 GB/sec total speed # Running 16x1-convergence, "perf bench numa mem -p 16 -t 1 -P 256 -s 100 -zZ0qcm --thp 1" 1.546 secs latency to NUMA-converge 1.546 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.485 secs average thread-runtime 17.664 % difference between max/avg runtime 1.162 GB data processed, per thread 18.594 GB data processed, total 1.331 nsecs/byte/thread runtime 0.752 GB/sec/thread speed 12.025 GB/sec total speed # Running 32x1-convergence, "perf bench numa mem -p 32 -t 1 -P 128 -s 100 -zZ0qcm --thp 1" 0.812 secs latency to NUMA-converge 0.812 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.739 secs average thread-runtime 50.000 % difference between max/avg runtime 0.309 GB data processed, per thread 9.874 GB data processed, total 2.630 nsecs/byte/thread runtime 0.380 GB/sec/thread speed 12.166 GB/sec total speed # Running 2x1-bw-process, "perf bench numa mem -p 2 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 20.044 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.020 secs average thread-runtime 0.109 % difference between max/avg runtime 125.750 GB data processed, per thread 251.501 GB data processed, total 0.159 nsecs/byte/thread runtime 6.274 GB/sec/thread speed 12.548 GB/sec total speed # Running 3x1-bw-process, "perf bench numa mem -p 3 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 20.148 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.090 secs average thread-runtime 0.367 % difference between max/avg runtime 85.267 GB data processed, per thread 255.800 GB data processed, total 0.236 nsecs/byte/thread runtime 4.232 GB/sec/thread speed 12.696 GB/sec total speed # Running 4x1-bw-process, "perf bench numa mem -p 4 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 20.169 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.100 secs average thread-runtime 0.419 % difference between max/avg runtime 63.144 GB data processed, per thread 252.576 GB data processed, total 0.319 nsecs/byte/thread runtime 3.131 GB/sec/thread speed 12.523 GB/sec total speed # Running 8x1-bw-process, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1" 20.175 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.107 secs average thread-runtime 0.433 % difference between max/avg runtime 31.267 GB data processed, per thread 250.133 GB data processed, total 0.645 nsecs/byte/thread runtime 1.550 GB/sec/thread speed 12.398 GB/sec total speed # Running 8x1-bw-process-NOTHP, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 20.216 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.113 secs average thread-runtime 0.535 % difference between max/avg runtime 30.998 GB data processed, per thread 247.981 GB data processed, total 0.652 nsecs/byte/thread runtime 1.533 GB/sec/thread speed 12.266 GB/sec total speed # Running 16x1-bw-process, "perf bench numa mem -p 16 -t 1 -P 256 -s 20 -zZ0q --thp 1" 20.234 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.174 secs average thread-runtime 0.577 % difference between max/avg runtime 15.377 GB data processed, per thread 246.039 GB data processed, total 1.316 nsecs/byte/thread runtime 0.760 GB/sec/thread speed 12.160 GB/sec total speed # Running 1x4-bw-thread, "perf bench numa mem -p 1 -t 4 -T 256 -s 20 -zZ0q --thp 1" 20.040 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.028 secs average thread-runtime 0.099 % difference between max/avg runtime 66.832 GB data processed, per thread 267.328 GB data processed, total 0.300 nsecs/byte/thread runtime 3.335 GB/sec/thread speed 13.340 GB/sec total speed # Running 1x8-bw-thread, "perf bench numa mem -p 1 -t 8 -T 256 -s 20 -zZ0q --thp 1" 20.064 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.034 secs average thread-runtime 0.160 % difference between max/avg runtime 32.911 GB data processed, per thread 263.286 GB data processed, total 0.610 nsecs/byte/thread runtime 1.640 GB/sec/thread speed 13.122 GB/sec total speed # Running 1x16-bw-thread, "perf bench numa mem -p 1 -t 16 -T 128 -s 20 -zZ0q --thp 1" 20.092 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.052 secs average thread-runtime 0.230 % difference between max/avg runtime 16.131 GB data processed, per thread 258.088 GB data processed, total 1.246 nsecs/byte/thread runtime 0.803 GB/sec/thread speed 12.845 GB/sec total speed # Running 1x32-bw-thread, "perf bench numa mem -p 1 -t 32 -T 64 -s 20 -zZ0q --thp 1" 20.099 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.063 secs average thread-runtime 0.247 % difference between max/avg runtime 7.962 GB data processed, per thread 254.773 GB data processed, total 2.525 nsecs/byte/thread runtime 0.396 GB/sec/thread speed 12.676 GB/sec total speed # Running 2x3-bw-process, "perf bench numa mem -p 2 -t 3 -P 512 -s 20 -zZ0q --thp 1" 20.150 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.120 secs average thread-runtime 0.372 % difference between max/avg runtime 44.827 GB data processed, per thread 268.960 GB data processed, total 0.450 nsecs/byte/thread runtime 2.225 GB/sec/thread speed 13.348 GB/sec total speed # Running 4x4-bw-process, "perf bench numa mem -p 4 -t 4 -P 512 -s 20 -zZ0q --thp 1" 20.258 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.168 secs average thread-runtime 0.636 % difference between max/avg runtime 17.079 GB data processed, per thread 273.263 GB data processed, total 1.186 nsecs/byte/thread runtime 0.843 GB/sec/thread speed 13.489 GB/sec total speed # Running 4x6-bw-process, "perf bench numa mem -p 4 -t 6 -P 512 -s 20 -zZ0q --thp 1" 20.559 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.382 secs average thread-runtime 1.359 % difference between max/avg runtime 10.758 GB data processed, per thread 258.201 GB data processed, total 1.911 nsecs/byte/thread runtime 0.523 GB/sec/thread speed 12.559 GB/sec total speed # Running 4x8-bw-process, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1" 20.744 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.516 secs average thread-runtime 1.792 % difference between max/avg runtime 8.069 GB data processed, per thread 258.201 GB data processed, total 2.571 nsecs/byte/thread runtime 0.389 GB/sec/thread speed 12.447 GB/sec total speed # Running 4x8-bw-process-NOTHP, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 20.855 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.561 secs average thread-runtime 2.050 % difference between max/avg runtime 8.069 GB data processed, per thread 258.201 GB data processed, total 2.585 nsecs/byte/thread runtime 0.387 GB/sec/thread speed 12.381 GB/sec total speed # Running 3x3-bw-process, "perf bench numa mem -p 3 -t 3 -P 512 -s 20 -zZ0q --thp 1" 20.134 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.077 secs average thread-runtime 0.333 % difference between max/avg runtime 28.091 GB data processed, per thread 252.822 GB data processed, total 0.717 nsecs/byte/thread runtime 1.395 GB/sec/thread speed 12.557 GB/sec total speed # Running 5x5-bw-process, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1" 20.588 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.375 secs average thread-runtime 1.427 % difference between max/avg runtime 10.177 GB data processed, per thread 254.436 GB data processed, total 2.023 nsecs/byte/thread runtime 0.494 GB/sec/thread speed 12.359 GB/sec total speed # Running 2x16-bw-process, "perf bench numa mem -p 2 -t 16 -P 512 -s 20 -zZ0q --thp 1" 20.657 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.429 secs average thread-runtime 1.589 % difference between max/avg runtime 8.170 GB data processed, per thread 261.429 GB data processed, total 2.528 nsecs/byte/thread runtime 0.395 GB/sec/thread speed 12.656 GB/sec total speed # Running 1x32-bw-process, "perf bench numa mem -p 1 -t 32 -P 2048 -s 20 -zZ0q --thp 1" 22.981 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 21.996 secs average thread-runtime 6.486 % difference between max/avg runtime 8.863 GB data processed, per thread 283.606 GB data processed, total 2.593 nsecs/byte/thread runtime 0.386 GB/sec/thread speed 12.341 GB/sec total speed # Running numa02-bw, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1" 20.047 secs slowest (max) thread-runtime 19.000 secs fastest (min) thread-runtime 20.026 secs average thread-runtime 2.611 % difference between max/avg runtime 8.441 GB data processed, per thread 270.111 GB data processed, total 2.375 nsecs/byte/thread runtime 0.421 GB/sec/thread speed 13.474 GB/sec total speed # Running numa02-bw-NOTHP, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1 --thp -1" 20.088 secs slowest (max) thread-runtime 19.000 secs fastest (min) thread-runtime 20.025 secs average thread-runtime 2.709 % difference between max/avg runtime 8.411 GB data processed, per thread 269.142 GB data processed, total 2.388 nsecs/byte/thread runtime 0.419 GB/sec/thread speed 13.398 GB/sec total speed # Running numa01-bw-thread, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1" 20.293 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.175 secs average thread-runtime 0.721 % difference between max/avg runtime 7.918 GB data processed, per thread 253.374 GB data processed, total 2.563 nsecs/byte/thread runtime 0.390 GB/sec/thread speed 12.486 GB/sec total speed # Running numa01-bw-thread-NOTHP, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1 --thp -1" 20.411 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.226 secs average thread-runtime 1.006 % difference between max/avg runtime 7.931 GB data processed, per thread 253.778 GB data processed, total 2.574 nsecs/byte/thread runtime 0.389 GB/sec/thread speed 12.434 GB/sec total speed # Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20201012161611.366482-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Linus Torvalds
|
6873139ed0 |
objtool changes for v5.10:
- Most of the changes are cleanups and reorganization to make the objtool code more arch-agnostic. This is in preparation for non-x86 support. Fixes: - KASAN fixes. - Handle unreachable trap after call to noreturn functions better. - Ignore unreachable fake jumps. - Misc smaller fixes & cleanups. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+FgwIRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1juGw/6A6goA5/HHapM965yG1eY/rTLp3eIbcma 1ZbkUsP0YfT6wVUzw/sOeZzKNOwOq1FuMfkjuH2KcnlxlcMekIaKvLk8uauW4igM hbFGuuZfZ0An5ka9iQ1W6HGdsuD3vVlN1w/kxdWk0c3lJCVQSTxdCfzF8fuF3gxX lF3Bc1D/ZFcHIHT/hu/jeIUCgCYpD3qZDjQJBScSwVthZC+Fw6weLLGp2rKDaCao HhSQft6MUfDrUKfH3LBIUNPRPCOrHo5+AX6BXxLXJVxqlwO/YU3e0GMwSLedMtBy TASWo7/9GAp+wNNZe8EliyTKrfC3sLxN1QImfjuojxbBVXx/YQ/ToTt9fVGpF4Y+ XhhRFv9520v1tS2wPHIgQGwbh7EWG6mdrmo10RAs/31ViONPrbEZ4WmcA08b/5FY KEkOVb18yfmDVzVZPpSc+HpIFkppEBOf7wPg27Bj3RTZmzIl/y+rKSnxROpsJsWb R6iov7SFVET14lHl1G7tPNXfqRaS7HaOQIj3rSUyAP0ZfX+yIupVJp32dc6Ofg8b SddUCwdIHoFdUNz4Y9csUCrewtCVJbxhV4MIdv0GpWbrgSw96RFZgetaH+6mGRpj 0Kh6M1eC3irDbhBuarWUBAr2doPAq4iOUeQU36Q6YSAbCs83Ws2uKOWOHoFBVwCH uSKT0wqqG+E= =KX5o -----END PGP SIGNATURE----- Merge tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull objtool updates from Ingo Molnar: "Most of the changes are cleanups and reorganization to make the objtool code more arch-agnostic. This is in preparation for non-x86 support. Other changes: - KASAN fixes - Handle unreachable trap after call to noreturn functions better - Ignore unreachable fake jumps - Misc smaller fixes & cleanups" * tag 'objtool-core-2020-10-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) perf build: Allow nested externs to enable BUILD_BUG() usage objtool: Allow nested externs to enable BUILD_BUG() objtool: Permit __kasan_check_{read,write} under UACCESS objtool: Ignore unreachable trap after call to noreturn functions objtool: Handle calling non-function symbols in other sections objtool: Ignore unreachable fake jumps objtool: Remove useless tests before save_reg() objtool: Decode unwind hint register depending on architecture objtool: Make unwind hint definitions available to other architectures objtool: Only include valid definitions depending on source file type objtool: Rename frame.h -> objtool.h objtool: Refactor jump table code to support other architectures objtool: Make relocation in alternative handling arch dependent objtool: Abstract alternative special case handling objtool: Move macros describing structures to arch-dependent code objtool: Make sync-check consider the target architecture objtool: Group headers to check in a single list objtool: Define 'struct orc_entry' only when needed objtool: Skip ORC entry creation for non-text sections objtool: Move ORC logic out of check() ... |
||
John Garry
|
caf7f9685d |
perf jevents: Fix event code for events referencing std arch events
The event code for events referencing std arch events is incorrectly evaluated in json_events(). The issue is that je.event is evaluated properly from try_fixup(), but later NULLified from the real_event() call, as "event" may be NULL. Fix by setting "event" same je.event in try_fixup(). Also remove support for overwriting event code for events using std arch events, as it is not used. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-By: Kajol Jain<kjain@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/1602170368-11892-1-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
2a09a84c72 |
perf diff: Support hot streams comparison
This patch enables perf-diff with "--stream" option. "--stream": Enable hot streams comparison Now let's see example. perf record -b ... Generate perf.data.old with branch data perf record -b ... Generate perf.data with branch data perf diff --stream [ Matched hot streams ] hot chain pair 1: cycles: 1, hits: 27.77% cycles: 1, hits: 9.24% --------------------------- -------------------------- main div.c:39 main div.c:39 main div.c:44 main div.c:44 hot chain pair 2: cycles: 34, hits: 20.06% cycles: 27, hits: 16.98% --------------------------- -------------------------- __random_r random_r.c:360 __random_r random_r.c:360 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:380 __random_r random_r.c:380 __random_r random_r.c:357 __random_r random_r.c:357 __random random.c:293 __random random.c:293 __random random.c:293 __random random.c:293 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:288 __random random.c:288 rand rand.c:27 rand rand.c:27 rand rand.c:26 rand rand.c:26 rand@plt rand@plt rand@plt rand@plt compute_flag div.c:25 compute_flag div.c:25 compute_flag div.c:22 compute_flag div.c:22 main div.c:40 main div.c:40 main div.c:40 main div.c:40 main div.c:39 main div.c:39 hot chain pair 3: cycles: 9, hits: 4.48% cycles: 6, hits: 4.51% --------------------------- -------------------------- __random_r random_r.c:360 __random_r random_r.c:360 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:380 __random_r random_r.c:380 [ Hot streams in old perf data only ] hot chain 1: cycles: 18, hits: 6.75% -------------------------- __random_r random_r.c:360 __random_r random_r.c:388 __random_r random_r.c:388 __random_r random_r.c:380 __random_r random_r.c:357 __random random.c:293 __random random.c:293 __random random.c:291 __random random.c:291 __random random.c:291 __random random.c:288 rand rand.c:27 rand rand.c:26 rand@plt rand@plt compute_flag div.c:25 compute_flag div.c:22 main div.c:40 hot chain 2: cycles: 29, hits: 2.78% -------------------------- compute_flag div.c:22 main div.c:40 main div.c:40 main div.c:39 [ Hot streams in new perf data only ] hot chain 1: cycles: 4, hits: 4.54% -------------------------- main div.c:42 compute_flag div.c:28 hot chain 2: cycles: 5, hits: 3.51% -------------------------- main div.c:39 main div.c:44 main div.c:42 compute_flag div.c:28 Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-8-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
5bbd6bad3b |
perf streams: Report hot streams
We show the streams separately. They are divided into different sections. 1. "Matched hot streams" 2. "Hot streams in old perf data only" 3. "Hot streams in new perf data only". For each stream, we report the cycles and hot percent (hits%). For example, cycles: 2, hits: 4.08% -------------------------- main div.c:42 compute_flag div.c:28 Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-7-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
28904f4dce |
perf streams: Calculate the sum of total streams hits
We have used callchain_node->hit to measure the hot level of one stream. This patch calculates the sum of hits of total streams. Thus in next patch, we can use following formula to report hot percent for one stream. hot percent = callchain_node->hit / sum of total hits Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-6-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
fa79aa6485 |
perf streams: Link stream pair
In previous patch, we have created an evsel_streams for one event, and top N hottest streams will be saved in a stream array in evsel_streams. This patch compares total streams among two evsel_streams. Once two streams are fully matched, they will be linked as a pair. From the pair, we can know which streams are matched. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-5-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
47ef8398c3 |
perf streams: Compare two streams
Stream is the branch history which is aggregated by the branch records from perf samples. Now we support the callchain as stream. If the callchain entries of one stream are fully matched with the callchain entries of another stream, we think two streams are matched. For example, cycles: 1, hits: 26.80% cycles: 1, hits: 27.30% ----------------------- ----------------------- main div.c:39 main div.c:39 main div.c:44 main div.c:44 Above two streams are matched (we don't consider the case that source code is changed). The matching logic is, compare the chain string first. If it's not matched, fallback to dso address comparison. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-4-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
dd1d841810 |
perf streams: Get the evsel_streams by evsel_idx
In previous patch, we have created evsel_streams array. This patch returns the specified evsel_streams according to the evsel_idx. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-3-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jin Yao
|
480accbb17 |
perf streams: Introduce branch history "streams"
We define a stream as the branch history which is aggregated by the branch records from perf samples. For example, the callchains aggregated from the branch records are considered as streams. By browsing the hot stream, we can understand the hot code path. Now we only support the callchain for stream. For measuring the hot level for a stream, we use the callchain_node->hit, higher is hotter. There may be many callchains sampled so we only focus on the top N hottest callchains. N is a user defined parameter or predefined default value (nr_streams_max). This patch creates an evsel_streams array per event, and saves the top N hottest streams in a stream array. So now we can get the per-event top N hottest streams. Signed-off-by: Jin Yao <yao.jin@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20201009022845.13141-2-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Andi Kleen
|
6556a75bec |
perf intel-pt: Improve PT documentation slightly
Document the higher level --insn-trace etc. perf script options. Include the howto how to build xed into the manpage Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lore.kernel.org/lkml/20201014035346.4772-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Andi Kleen
|
0997a2662f |
perf tools: Add support for exclusive groups/events
Peter suggested that using the exclusive mode in perf could avoid some problems with bad scheduling of groups. Exclusive is implemented in the kernel, but wasn't exposed by the perf tool, so hard to use without custom low level API users. Add support for marking groups or events with :e for exclusive in the perf tool. The implementation is basically the same as the existing pinned attribute. Committer testing: # perf test "parse event" 6: Parse event definition strings : Ok # perf test -v "parse event" |& grep :u*e running test 56 'instructions:uep' running test 57 '{cycles,cache-misses,branch-misses}:e' # # # grep "model name" -m1 /proc/cpuinfo model name : AMD Ryzen 9 3900X 12-Core Processor # # perf stat -a -e '{cycles,cache-misses,branch-misses}:e' sleep 1 Performance counter stats for 'system wide': <not counted> cycles (0.00%) <not counted> cache-misses (0.00%) <not counted> branch-misses (0.00%) 1.001269893 seconds time elapsed Some events weren't counted. Try disabling the NMI watchdog: echo 0 > /proc/sys/kernel/nmi_watchdog perf stat ... echo 1 > /proc/sys/kernel/nmi_watchdog # echo 0 > /proc/sys/kernel/nmi_watchdog # perf stat -a -e '{cycles,cache-misses,branch-misses}:e' sleep 1 Performance counter stats for 'system wide': 1,298,663,141 cycles 30,962,215 cache-misses 5,325,150 branch-misses 1.001474934 seconds time elapsed # # The output for asking for precise events on AMD needs to improve, it # supposedly works only for system wide or per CPU # # perf stat -a -e '{cycles,cache-misses,branch-misses}:uep' sleep 1 Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cycles). /bin/dmesg | grep -i perf may provide additional information. # perf stat -a -e '{cycles,cache-misses,branch-misses}:ue' sleep 1 Performance counter stats for 'system wide': 746,363,126 cycles 16,881,611 cache-misses 2,871,259 branch-misses 1.001636066 seconds time elapsed # Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20201014144255.22699-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
78b2c50c5d |
perf test: Add build id shell test
Add a test for the build id cache that adds a binary with sha1 and md5 build ids and verifies it's added properly. The test updates build id cache with 'perf record' and 'perf buildid-cache -a'. Committer testing: # perf test "build id" 82: build id cache operations : Ok # # perf test -v "build id" 82: build id cache operations : --- start --- test child forked, pid 447218 test binaries: /tmp/perf.ex.SHA1.B8I /tmp/perf.ex.MD5.7Nv Adding d1abc1eb7568358cf23c959566f23462461834d1 /tmp/perf.ex.SHA1.B8I: Ok build id: d1abc1eb7568358cf23c959566f23462461834d1 link: /tmp/perf.debug.sS2/.build-id/d1/abc1eb7568358cf23c959566f23462461834d1 file: /tmp/perf.debug.sS2/.build-id/d1/../../tmp/perf.ex.SHA1.B8I/d1abc1eb7568358cf23c959566f23462461834d1/elf OK for /tmp/perf.ex.SHA1.B8I Adding a50e350e97c43b4708d09bcd85ebfff7 /tmp/perf.ex.MD5.7Nv: Ok build id: a50e350e97c43b4708d09bcd85ebfff7 link: /tmp/perf.debug.IuW/.build-id/a5/0e350e97c43b4708d09bcd85ebfff7 file: /tmp/perf.debug.IuW/.build-id/a5/../../tmp/perf.ex.MD5.7Nv/a50e350e97c43b4708d09bcd85ebfff7/elf OK for /tmp/perf.ex.MD5.7Nv [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.034 MB /tmp/perf.data.xrH ] build id: d1abc1eb7568358cf23c959566f23462461834d1 link: /tmp/perf.debug.eGR/.build-id/d1/abc1eb7568358cf23c959566f23462461834d1 file: /tmp/perf.debug.eGR/.build-id/d1/../../tmp/perf.ex.SHA1.B8I/d1abc1eb7568358cf23c959566f23462461834d1/elf OK for /tmp/perf.ex.SHA1.B8I [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.034 MB /tmp/perf.data.cbE ] build id: a50e350e97c43b4708d09bcd85ebfff7 link: /tmp/perf.debug.82t/.build-id/a5/0e350e97c43b4708d09bcd85ebfff7 file: /tmp/perf.debug.82t/.build-id/a5/../../tmp/perf.ex.MD5.7Nv/a50e350e97c43b4708d09bcd85ebfff7/elf OK for /tmp/perf.ex.MD5.7Nv test child finished with 0 ---- end ---- build id cache operations: Ok # Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20201013192441.1299447-10-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
e9ad94381c |
perf tools: Align buildid list output for short build ids
With shorter md5 build ids we need to align their paths properly with other build ids: $ perf buildid-list 17f4e448cc746582ea1881528deb549f7fdb3fd5 [kernel.kallsyms] a50e350e97c43b4708d09bcd85ebfff7 .../tools/perf/buildid-ex-md5 1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so $ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-9-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
b0a323c7f0 |
perf tools: Add size to 'struct perf_record_header_build_id'
We do not store size with build ids in perf data, but there's enough space to do it. Adding misc bit PERF_RECORD_MISC_BUILD_ID_SIZE to mark build id event with size. With this fix the dso with md5 build id will have correct build id data and will be usable for debuginfod processing if needed (coming in following patches). Committer notes: Use %zu with size_t to fix this error on 32-bit arches: util/header.c: In function '__event_process_build_id': util/header.c:2105:3: error: format '%lu' expects argument of type 'long unsigned int', but argument 6 has type 'size_t' [-Werror=format=] pr_debug("build id event received for %s: %s [%lu]\n", ^ Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-8-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
39be8d0115 |
perf tools: Pass build_id object to dso__build_id_equal()
Passing build_id object to dso__build_id_equal(), so we can properly check build id with different size than sha1. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-7-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
8dfdf440d3 |
perf tools: Pass build_id object to dso__set_build_id()
Passing build_id object to dso__set_build_id(), so it's easier to initialize dos's build id object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-6-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
bf5411695a |
perf tools: Pass build_id object to build_id__sprintf()
Passing build_id object to build_id__sprintf function, so it can operate with the proper size of build id. This will create proper md5 build id readable names, like following: a50e350e97c43b4708d09bcd85ebfff7 instead of: a50e350e97c43b4708d09bcd85ebfff700000000 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-5-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
3ff1b8c8cc |
perf tools: Pass build id object to sysfs__read_build_id()
Passing build id object to sysfs__read_build_id function, so it can populate the size of the build_id object. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-4-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
f766819cd5 |
perf tools: Pass build_id object to filename__read_build_id()
Pass a build_id object to filename__read_build_id function, so it can populate the size of the build_id object. Changing filename__read_build_id() code for both ELF/non-ELF code. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-3-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Olsa
|
0aba7f036a |
perf tools: Use build_id object in dso
Replace build_id byte array with struct build_id object and all the code that references it. The objective is to carry size together with build id array, so it's better to keep both together. This is preparatory change for following patches, and there's no functional change. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20201013192441.1299447-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
79bbbabd22 |
perf config: Export the perf_config_from_file() function
We'll use it to ask for extra config files to be loaded, profile like stuff that will be used first to make 'perf trace' mimic 'strace' output via a 'perf strace' command that just sets up 'perf trace' output. At some point it'll be used for regression tests, where we'll run some simple commands like: perf strace ls > perf-strace.output strace ls > strace.output And then do some mutable syscall arg aware diff like tool to deal with arguments for things like mmap, that change at each execution, to be first ignored and then properly tracked when used accoss multiple syscalls. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
James Clark
|
79373082fa |
perf python: Autodetect python3 binary
Some distros don't come with python2 and only have python3 available. This causes the "'import perf' in python" self test to fail. This change adds python3 to the list of possible python versions that are autodetected but maintains the priorities for 'python2' and 'python' detection. Python3 has the lowest priority. Committer notes: On a fedora system without python2 packages the 'perf test python' continues to work: # python2 bash: python2: command not found... Similar command is: 'python' # rpm -qa | grep python2 # That "Similar command" gives the clue: # rpm -qf /usr/bin/python python-unversioned-command-3.8.5-5.fc32.noarch # rpm -ql python-unversioned-command /usr/bin/python /usr/share/man/man1/python.1.gz # With it in place the 'python' binary is found and perf builds the python binding using python3: # perf test -v python 19: 'import perf' in python : --- start --- test child forked, pid 379988 python usage test: "echo "import sys ; sys.path.append('/tmp/build/perf/python'); import perf" | '/usr/bin/python' " test child finished with 0 ---- end ---- 'import perf' in python: Ok # Looking at that path: # ls -la /tmp/build/perf/python total 1864 drwxrwxr-x. 2 acme acme 60 Oct 13 16:20 . drwxrwxr-x. 18 acme acme 4420 Oct 13 16:28 .. -rwxrwxr-x. 1 acme acme 1907216 Oct 13 16:28 perf.cpython-38-x86_64-linux-gnu.so # And: # ldd ~/bin/perf | grep python libpython3.8.so.1.0 => /lib64/libpython3.8.so.1.0 (0x00007f5471187000) # As soon as we remove it: # rpm -e python-unversioned-command-3.8.5-5.fc32.noarch # hash -r # python bash: python: command not found... Install package 'python-unversioned-command' to provide command 'python'? [N/y] n # And rebuilding perf now doesn't find python in the system: make: Entering directory '/home/acme/git/perf/tools/perf' BUILD: Doing 'make -j24' parallel build <SNIP> Makefile.config:786: No python interpreter was found: disables Python support - please install python-devel/python-dev <SNIP> After this patch: $ rpm -qi python-unversioned-command package python-unversioned-command is not installed $ $ python bash: python: command not found... Install package 'python-unversioned-command' to provide command 'python'? [N/y] ^C $ $ m make: Entering directory '/home/acme/git/perf/tools/perf' BUILD: Doing 'make -j24' parallel build <SNIP> CC /tmp/build/perf/tests/attr.o CC /tmp/build/perf/tests/python-use.o DESCEND plugins GEN /tmp/build/perf/python/perf.so INSTALL trace_plugins LD /tmp/build/perf/tests/perf-in.o LD /tmp/build/perf/perf-in.o LINK /tmp/build/perf/perf <SNIP> make: Leaving directory '/home/acme/git/perf/tools/perf' 19: 'import perf' in python : Ok $ ldd ~/bin/perf | grep python libpython3.8.so.1.0 => /lib64/libpython3.8.so.1.0 (0x00007f2c8c708000) $ ls -la /tmp/build/perf/python total 1864 drwxrwxr-x. 2 acme acme 60 Oct 13 16:20 . drwxrwxr-x. 18 acme acme 4420 Oct 13 16:31 .. -rwxrwxr-x. 1 acme acme 1907216 Oct 13 16:31 perf.cpython-38-x86_64-linux-gnu.so $ Signed-off-by: James Clark <james.clark@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> LPU-Reference: 20201005080645.6588-1-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
0fd0f00fdb |
perf tests: Show python test script in verbose mode
To help figure out where it is getting the binding. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Vasily Gorbik
|
6cf4ecf5c5 |
perf build: Allow nested externs to enable BUILD_BUG() usage
Currently BUILD_BUG() macro is expanded to smth like the following: do { extern void __compiletime_assert_0(void) __attribute__((error("BUILD_BUG failed"))); if (!(!(1))) __compiletime_assert_0(); } while (0); If used in a function body this obviously would produce build errors with -Wnested-externs and -Werror. To enable BUILD_BUG() usage in tools/arch/x86/lib/insn.c which perf includes in intel-pt-decoder, build perf without -Wnested-externs. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build tested Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lore.kernel.org/lkml/patch-1.thread-251403.git-2514037e9477.your-ad-here.call-01602244460-ext-7088@work.hours Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Jiri Slaby
|
f3013f7ed4 |
perf trace: Fix off by ones in memset() after realloc() in arches using libaudit
'perf trace ls' started crashing after commit |
||
Leo Yan
|
edac75a2f8 |
perf c2c: Update usage for showing memory events
Since commit
|
||
Arnaldo Carvalho de Melo
|
dbaa1b3d9a |
Merge branch 'perf/urgent' into perf/core
To pick fixes that missed v5.9. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Joel Fernandes (Google)
|
dc000c4593 |
perf sched: Show start of latency as well
The 'perf sched latency' tool is really useful at showing worst-case latencies that task encountered since wakeup. However it shows only the end of the latency. Often times the start of a latency is interesting as it can show what else was going on at the time to cause the latency. I certainly myself spending a lot of time backtracking to the start of the latency in "perf sched script" which wastes a lot of time. This patch therefore adds a new column "Max delay start". Considering this, also rename "Maximum delay at" to "Max delay end" as its easier to understand. Example of the new output: ---------------------------------------------------------------------------------------------------------------------------------- Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Max delay start | Max delay end | ---------------------------------------------------------------------------------------------------------------------------------- MediaScannerSer:11936 | 651.296 ms | 67978 | avg: 0.113 ms | max: 77.250 ms | max start: 477.691360 s | max end: 477.768610 s audio@2.0-servi:(3) | 0.000 ms | 3440 | avg: 0.034 ms | max: 72.267 ms | max start: 477.697051 s | max end: 477.769318 s AudioOut_1D:8112 | 0.000 ms | 2588 | avg: 0.083 ms | max: 64.020 ms | max start: 477.710740 s | max end: 477.774760 s Time-limited te:14973 | 7966.090 ms | 24807 | avg: 0.073 ms | max: 15.563 ms | max start: 477.162746 s | max end: 477.178309 s surfaceflinger:8049 | 9.680 ms | 603 | avg: 0.063 ms | max: 13.275 ms | max start: 476.931791 s | max end: 476.945067 s HeapTaskDaemon:(3) | 1588.830 ms | 7040 | avg: 0.065 ms | max: 6.880 ms | max start: 473.666043 s | max end: 473.672922 s mount-passthrou:(3) | 1370.809 ms | 68904 | avg: 0.011 ms | max: 6.524 ms | max start: 478.090630 s | max end: 478.097154 s ReferenceQueueD:(3) | 11.794 ms | 1725 | avg: 0.014 ms | max: 6.521 ms | max start: 476.119782 s | max end: 476.126303 s writer:14077 | 18.410 ms | 1427 | avg: 0.036 ms | max: 6.131 ms | max start: 474.169675 s | max end: 474.175805 s Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20200925235634.4089867-1-joel@joelfernandes.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Sandipan Das
|
70830f974e |
perf vendor events: Fix typos in power8 PMU events
This replaces the incorrectly spelled word "localtion" with "location"
in some power8 PMU event descriptions.
Fixes:
|
||
Namhyung Kim
|
bf7ef5ddb0 |
perf bench: Run inject-build-id with --buildid-all option too
For comparison, it now runs the benchmark twice - one if regular -b and another for --buildid-all. $ perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 21.002 msec (+- 0.172 msec) Average time per event: 2.059 usec (+- 0.017 usec) Average memory usage: 8169 KB (+- 0 KB) Average build-id-all injection took: 19.543 msec (+- 0.124 msec) Average time per event: 1.916 usec (+- 0.012 usec) Average memory usage: 7348 KB (+- 0 KB) Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201012070214.2074921-7-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
27c9c3424f |
perf inject: Add --buildid-all option
Like 'perf record', we can even more speedup build-id processing by just using all DSOs. Then we don't need to look at all the sample events anymore. The following patch will update 'perf bench' to show the result of the --buildid-all option too. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Original-patch-by: Stephane Eranian <eranian@google.com> Acked-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201012070214.2074921-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
e7b60c5a0c |
perf inject: Do not load map/dso when injecting build-id
No need to load symbols in a DSO when injecting build-id. I guess the reason was to check the DSO is a special file like anon files. Use some helper functions in map.c to check them before reading build-id. Also pass sample event's cpumode to a new build-id event. It brought a speedup in the benchmark of 25 -> 21 msec on my laptop. Also the memory usage (Max RSS) went down by ~200 KB. # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 21.389 msec (+- 0.138 msec) Average time per event: 2.097 usec (+- 0.014 usec) Average memory usage: 8225 KB (+- 0 KB) Committer notes: Before: $ perf stat -r5 perf bench internals inject-build-id > /dev/null Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 4,020.56 msec task-clock:u # 1.271 CPUs utilized ( +- 0.74% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 123,354 page-faults:u # 0.031 M/sec ( +- 0.81% ) 7,119,951,568 cycles:u # 1.771 GHz ( +- 1.74% ) (83.27%) 230,086,969 stalled-cycles-frontend:u # 3.23% frontend cycles idle ( +- 1.97% ) (83.41%) 1,168,298,765 stalled-cycles-backend:u # 16.41% backend cycles idle ( +- 1.13% ) (83.44%) 11,173,083,669 instructions:u # 1.57 insn per cycle # 0.10 stalled cycles per insn ( +- 1.58% ) (83.31%) 2,413,908,936 branches:u # 600.392 M/sec ( +- 1.69% ) (83.26%) 46,576,289 branch-misses:u # 1.93% of all branches ( +- 2.20% ) (83.31%) 3.1638 +- 0.0309 seconds time elapsed ( +- 0.98% ) $ After: $ perf stat -r5 perf bench internals inject-build-id > /dev/null Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 2,379.94 msec task-clock:u # 1.473 CPUs utilized ( +- 0.18% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 62,584 page-faults:u # 0.026 M/sec ( +- 0.07% ) 2,372,389,668 cycles:u # 0.997 GHz ( +- 0.29% ) (83.14%) 106,937,862 stalled-cycles-frontend:u # 4.51% frontend cycles idle ( +- 4.89% ) (83.20%) 581,697,915 stalled-cycles-backend:u # 24.52% backend cycles idle ( +- 0.71% ) (83.47%) 3,659,692,199 instructions:u # 1.54 insn per cycle # 0.16 stalled cycles per insn ( +- 0.10% ) (83.63%) 791,372,961 branches:u # 332.518 M/sec ( +- 0.27% ) (83.39%) 10,648,083 branch-misses:u # 1.35% of all branches ( +- 0.22% ) (83.16%) 1.61570 +- 0.00172 seconds time elapsed ( +- 0.11% ) $ Signed-off-by: Namhyung Kim <namhyung@kernel.org> Original-patch-by: Stephane Eranian <eranian@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Link: https://lore.kernel.org/r/20201012070214.2074921-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
336c95b297 |
perf inject: Enter namespace when reading build-id
It should be in a proper mnt namespace when accessing the file. I think this had no problem since the build-id was actually read from map__load() -> dso__load() already. But I'd like to change it in the following commit. Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20201012070214.2074921-4-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
2946ecedd0 |
perf inject: Add missing callbacks in perf_tool
I found some events (like PERF_RECORD_CGROUP) are not copied by perf inject due to the missing callbacks. Let's add them. While at it, I've changed the order of the callbacks to match with struct perf_tool so that we can compare them easily. Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20201012070214.2074921-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
0bf02a0d80 |
perf bench: Add build-id injection benchmark
Sometimes I can see that 'perf record' piped with 'perf inject' take a long time processing build-ids. So introduce a inject-build-id benchmark to the internals benchmark suite to measure its overhead regularly. It runs the 'perf inject' command internally and feeds the given number of synthesized events (MMAP2 + SAMPLE basically). Usage: perf bench internals inject-build-id <options> -i, --iterations <n> Number of iterations used to compute average (default: 100) -m, --nr-mmaps <n> Number of mmap events for each iteration (default: 100) -n, --nr-samples <n> Number of sample events per mmap event (default: 100) -v, --verbose be more verbose (show iteration count, DSO name, etc) By default, it measures average processing time of 100 MMAP2 events and 10000 SAMPLE events. Below is a result on my laptop. $ perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 25.789 msec (+- 0.202 msec) Average time per event: 2.528 usec (+- 0.020 usec) Average memory usage: 8411 KB (+- 7 KB) Committer testing: $ perf bench Usage: perf bench [<common options>] <collection> <benchmark> [<options>] # List of all available benchmark collections: sched: Scheduler and IPC benchmarks syscall: System call benchmarks mem: Memory access benchmarks numa: NUMA scheduling and MM benchmarks futex: Futex stressing benchmarks epoll: Epoll stressing benchmarks internals: Perf-internals benchmarks all: All benchmarks $ perf bench internals # List of available benchmarks for collection 'internals': synthesize: Benchmark perf event synthesis kallsyms-parse: Benchmark kallsyms parsing inject-build-id: Benchmark build-id injection $ perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.202 msec (+- 0.059 msec) Average time per event: 1.392 usec (+- 0.006 usec) Average memory usage: 12650 KB (+- 10 KB) Average build-id-all injection took: 12.831 msec (+- 0.071 msec) Average time per event: 1.258 usec (+- 0.007 usec) Average memory usage: 11895 KB (+- 10 KB) $ $ perf stat -r5 perf bench internals inject-build-id # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.380 msec (+- 0.056 msec) Average time per event: 1.410 usec (+- 0.006 usec) Average memory usage: 12608 KB (+- 11 KB) Average build-id-all injection took: 11.889 msec (+- 0.064 msec) Average time per event: 1.166 usec (+- 0.006 usec) Average memory usage: 11838 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.246 msec (+- 0.065 msec) Average time per event: 1.397 usec (+- 0.006 usec) Average memory usage: 12744 KB (+- 10 KB) Average build-id-all injection took: 12.019 msec (+- 0.066 msec) Average time per event: 1.178 usec (+- 0.006 usec) Average memory usage: 11963 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.321 msec (+- 0.067 msec) Average time per event: 1.404 usec (+- 0.007 usec) Average memory usage: 12690 KB (+- 10 KB) Average build-id-all injection took: 11.909 msec (+- 0.041 msec) Average time per event: 1.168 usec (+- 0.004 usec) Average memory usage: 11938 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.287 msec (+- 0.059 msec) Average time per event: 1.401 usec (+- 0.006 usec) Average memory usage: 12864 KB (+- 10 KB) Average build-id-all injection took: 11.862 msec (+- 0.058 msec) Average time per event: 1.163 usec (+- 0.006 usec) Average memory usage: 12103 KB (+- 10 KB) # Running 'internals/inject-build-id' benchmark: Average build-id injection took: 14.402 msec (+- 0.053 msec) Average time per event: 1.412 usec (+- 0.005 usec) Average memory usage: 12876 KB (+- 10 KB) Average build-id-all injection took: 11.826 msec (+- 0.061 msec) Average time per event: 1.159 usec (+- 0.006 usec) Average memory usage: 12111 KB (+- 10 KB) Performance counter stats for 'perf bench internals inject-build-id' (5 runs): 4,267.48 msec task-clock:u # 1.502 CPUs utilized ( +- 0.14% ) 0 context-switches:u # 0.000 K/sec 0 cpu-migrations:u # 0.000 K/sec 102,092 page-faults:u # 0.024 M/sec ( +- 0.08% ) 3,894,589,578 cycles:u # 0.913 GHz ( +- 0.19% ) (83.49%) 140,078,421 stalled-cycles-frontend:u # 3.60% frontend cycles idle ( +- 0.77% ) (83.34%) 948,581,189 stalled-cycles-backend:u # 24.36% backend cycles idle ( +- 0.46% ) (83.25%) 5,835,587,719 instructions:u # 1.50 insn per cycle # 0.16 stalled cycles per insn ( +- 0.21% ) (83.24%) 1,267,423,636 branches:u # 296.996 M/sec ( +- 0.22% ) (83.12%) 17,484,290 branch-misses:u # 1.38% of all branches ( +- 0.12% ) (83.55%) 2.84176 +- 0.00222 seconds time elapsed ( +- 0.08% ) $ Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20201012070214.2074921-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Vasily Gorbik
|
ab0a40ea88 |
perf build: Allow nested externs to enable BUILD_BUG() usage
Currently the BUILD_BUG() macro is expanded to the following: do { extern void __compiletime_assert_0(void) __attribute__((error("BUILD_BUG failed"))); if (!(!(1))) __compiletime_assert_0(); } while (0); If used in a function body this would obviously produce build errors with -Wnested-externs and -Werror. To enable BUILD_BUG() usage in tools/arch/x86/lib/insn.c which perf includes in intel-pt-decoder, build perf without -Wnested-externs. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> # build tested Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/patch-1.thread-251403.git-2514037e9477.your-ad-here.call-01602244460-ext-7088@work.hours |
||
Linus Torvalds
|
22230cd2c5 |
Merge branch 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat mount cleanups from Al Viro: "The last remnants of mount(2) compat buried by Christoph. Buried into NFS, that is. Generally I'm less enthusiastic about "let's use in_compat_syscall() deep in call chain" kind of approach than Christoph seems to be, but in this case it's warranted - that had been an NFS-specific wart, hopefully not to be repeated in any other filesystems (read: any new filesystem introducing non-text mount options will get NAKed even if it doesn't mess the layout up). IOW, not worth trying to grow an infrastructure that would avoid that use of in_compat_syscall()..." * 'compat.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: remove compat_sys_mount fs,nfs: lift compat nfs4 mount data handling into the nfs code nfs: simplify nfs4_parse_monolithic |
||
Linus Torvalds
|
85ed13e78d |
Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull compat iovec cleanups from Al Viro: "Christoph's series around import_iovec() and compat variant thereof" * 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: security/keys: remove compat_keyctl_instantiate_key_iov mm: remove compat_process_vm_{readv,writev} fs: remove compat_sys_vmsplice fs: remove the compat readv/writev syscalls fs: remove various compat readv/writev helpers iov_iter: transparently handle compat iovecs in import_iovec iov_iter: refactor rw_copy_check_uvector and import_iovec iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c compat.h: fix a spelling error in <linux/compat.h> |
||
Linus Torvalds
|
ca1b66922a |
* Extend the recovery from MCE in kernel space also to processes which
encounter an MCE in kernel space but while copying from user memory by sending them a SIGBUS on return to user space and umapping the faulty memory, by Tony Luck and Youquan Song. * memcpy_mcsafe() rework by splitting the functionality into copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables support for new hardware which can recover from a machine check encountered during a fast string copy and makes that the default and lets the older hardware which does not support that advance recovery, opt in to use the old, fragile, slow variant, by Dan Williams. * New AMD hw enablement, by Yazen Ghannam and Akshay Gupta. * Do not use MSR-tracing accessors in #MC context and flag any fault while accessing MCA architectural MSRs as an architectural violation with the hope that such hw/fw misdesigns are caught early during the hw eval phase and they don't make it into production. * Misc fixes, improvements and cleanups, as always. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+EIpUACgkQEsHwGGHe VUouoBAAgwb+NkWZtIqGImV4f+LOyFjhTR/r/7ZyiijXdbhOIuAdc/jQM31mQxug sX2jxaRYnf1n6SLA0ggX99gwr2deRQ/hsNf5Abw55GC+Z1dOxpGL0k59A3ELl1IR H9KYmCAFQIHvzfk38qcdND73XHcgthQoXFBOG9wAPAdgDWnaiWt6lcLAq8OiJTmp D8pInAYhcnL8YXwMGyQQ1KkFn9HwydoWDsK5Ff2shaw2/+dMQqd1zetenbVtjhLb iNYGvV7Bi/RQ8PyMbzmtTWa4kwQJAHC2gptkGxty//2ADGVBbqUQdqF9TjIWCNy5 V6Ldv5zo0/1s7DOzji3htzqkSs/K1Ea6d2LtZjejkJipHKV5x068UC6Fu+PlfS2D VZfcICeapU4G2F3Zvks2DlZ7dVTbHCvoI78Qi7bBgczPUVmk6iqah4xuQaiHyBJc kTFDA4Nnf/026GpoWRiFry9vqdnHBZyLet5A6Y+SoWF0FbhYnCVPpq4MnussYoav lUIi9ZZav6X2RZp9DDM1f9d5xubtKq0DKt93wvzqAhjK0T2DikckJ+riOYkI6N8t fHCBNUkdfgyMzJUTBPAzYQ7RmjbjKWJi7xWP0oz6+GqOJkQfSTVC5/2yEffbb3ya whYRS6iklbl7yshzaOeecXsZcAeK2oGPfoHg34WkHFgXdF5mNgA= =u1Wg -----END PGP SIGNATURE----- Merge tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Borislav Petkov: - Extend the recovery from MCE in kernel space also to processes which encounter an MCE in kernel space but while copying from user memory by sending them a SIGBUS on return to user space and umapping the faulty memory, by Tony Luck and Youquan Song. - memcpy_mcsafe() rework by splitting the functionality into copy_mc_to_user() and copy_mc_to_kernel(). This, as a result, enables support for new hardware which can recover from a machine check encountered during a fast string copy and makes that the default and lets the older hardware which does not support that advance recovery, opt in to use the old, fragile, slow variant, by Dan Williams. - New AMD hw enablement, by Yazen Ghannam and Akshay Gupta. - Do not use MSR-tracing accessors in #MC context and flag any fault while accessing MCA architectural MSRs as an architectural violation with the hope that such hw/fw misdesigns are caught early during the hw eval phase and they don't make it into production. - Misc fixes, improvements and cleanups, as always. * tag 'ras_updates_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Allow for copy_mc_fragile symbol checksum to be generated x86/mce: Decode a kernel instruction to determine if it is copying from user x86/mce: Recover from poison found while copying from user space x86/mce: Avoid tail copy when machine check terminated a copy from user x86/mce: Add _ASM_EXTABLE_CPY for copy user access x86/mce: Provide method to find out the type of an exception handler x86/mce: Pass pointer to saved pt_regs to severity calculation routines x86/copy_mc: Introduce copy_mc_enhanced_fast_string() x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() x86/mce: Drop AMD-specific "DEFERRED" case from Intel severity rule list x86/mce: Add Skylake quirk for patrol scrub reported errors RAS/CEC: Convert to DEFINE_SHOW_ATTRIBUTE() x86/mce: Annotate mce_rd/wrmsrl() with noinstr x86/mce/dev-mcelog: Do not update kflags on AMD systems x86/mce: Stop mce_reign() from re-computing severity for every CPU x86/mce: Make mce_rdmsrl() panic on an inaccessible MSR x86/mce: Increase maximum number of banks to 64 x86/mce: Delay clearing IA32_MCG_STATUS to the end of do_machine_check() x86/MCE/AMD, EDAC/mce_amd: Remove struct smca_hwid.xec_bitmap RAS/CEC: Fix cec_init() prototype |
||
Dan Williams
|
ec6347bb43 |
x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()
In reaction to a proposal to introduce a memcpy_mcsafe_fast() implementation Linus points out that memcpy_mcsafe() is poorly named relative to communicating the scope of the interface. Specifically what addresses are valid to pass as source, destination, and what faults / exceptions are handled. Of particular concern is that even though x86 might be able to handle the semantics of copy_mc_to_user() with its common copy_user_generic() implementation other archs likely need / want an explicit path for this case: On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote: > > > > However now I see that copy_user_generic() works for the wrong reason. > > It works because the exception on the source address due to poison > > looks no different than a write fault on the user address to the > > caller, it's still just a short copy. So it makes copy_to_user() work > > for the wrong reason relative to the name. > > Right. > > And it won't work that way on other architectures. On x86, we have a > generic function that can take faults on either side, and we use it > for both cases (and for the "in_user" case too), but that's an > artifact of the architecture oddity. > > In fact, it's probably wrong even on x86 - because it can hide bugs - > but writing those things is painful enough that everybody prefers > having just one function. Replace a single top-level memcpy_mcsafe() with either copy_mc_to_user(), or copy_mc_to_kernel(). Introduce an x86 copy_mc_fragile() name as the rename for the low-level x86 implementation formerly named memcpy_mcsafe(). It is used as the slow / careful backend that is supplanted by a fast copy_mc_generic() in a follow-on patch. One side-effect of this reorganization is that separating copy_mc_64.S to its own file means that perf no longer needs to track dependencies for its memcpy_64.S benchmarks. [ bp: Massage a bit. ] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <stable@vger.kernel.org> Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com |
||
Christoph Hellwig
|
c3973b401e |
mm: remove compat_process_vm_{readv,writev}
Now that import_iovec handles compat iovecs, the native syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Christoph Hellwig
|
598b3cec83 |
fs: remove compat_sys_vmsplice
Now that import_iovec handles compat iovecs, the native vmsplice syscall can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Christoph Hellwig
|
5f764d624a |
fs: remove the compat readv/writev syscalls
Now that import_iovec handles compat iovecs, the native readv and writev syscalls can be used for the compat case as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Jiri Olsa
|
6fcd5ddc3b |
perf python scripting: Fix printable strings in python3 scripts
Hagen reported broken strings in python3 tracepoint scripts:
make PYTHON=python3
perf record -e sched:sched_switch -a -- sleep 5
perf script --gen-script py
perf script -s ./perf-script.py
[..]
sched__sched_switch 7 563231.759525792 0 swapper prev_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), prev_pid=0, prev_prio=120, prev_state=, next_comm=bytearray(b'mutex-thread-co\x00'),
The problem is in the is_printable_array function that does not take the
zero byte into account and claim such string as not printable, so the
code will create byte array instead of string.
Committer testing:
After this fix:
sched__sched_switch 3 484522.497072626 1158680 kworker/3:0-eve prev_comm=kworker/3:0, prev_pid=1158680, prev_prio=120, prev_state=I, next_comm=swapper/3, next_pid=0, next_prio=120
Sample: {addr=0, cpu=3, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1158680, tid=1158680, time=484522497072626, transaction=0, values=[(0, 0)], weight=0}
sched__sched_switch 4 484522.497085610 1225814 perf prev_comm=perf, prev_pid=1225814, prev_prio=120, prev_state=, next_comm=migration/4, next_pid=30, next_prio=0
Sample: {addr=0, cpu=4, datasrc=84410401, datasrc_decode=N/A|SNP N/A|TLB N/A|LCK N/A, ip=18446744071841817196, period=1, phys_addr=0, pid=1225814, tid=1225814, time=484522497085610, transaction=0, values=[(0, 0)], weight=0}
Fixes:
|
||
Arnaldo Carvalho de Melo
|
388968d864 |
perf trace: Use the autogenerated mmap 'prot' string/id table
No change in behaviour: # perf trace -e mmap sleep 1 0.000 ( 0.009 ms): sleep/751870 mmap(len: 143317, prot: READ, flags: PRIVATE, fd: 3) = 0x7fa96d0f7000 0.028 ( 0.004 ms): sleep/751870 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7fa96d0f5000 0.037 ( 0.005 ms): sleep/751870 mmap(len: 1872744, prot: READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7fa96cf2b000 0.044 ( 0.011 ms): sleep/751870 mmap(addr: 0x7fa96cf50000, len: 1376256, prot: READ|EXEC, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x25000) = 0x7fa96cf50000 0.056 ( 0.007 ms): sleep/751870 mmap(addr: 0x7fa96d0a0000, len: 307200, prot: READ, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x175000) = 0x7fa96d0a0000 0.064 ( 0.007 ms): sleep/751870 mmap(addr: 0x7fa96d0eb000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 0x1bf000) = 0x7fa96d0eb000 0.075 ( 0.005 ms): sleep/751870 mmap(addr: 0x7fa96d0f1000, len: 13160, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7fa96d0f1000 0.253 ( 0.005 ms): sleep/751870 mmap(len: 218049136, prot: READ, flags: PRIVATE, fd: 3) = 0x7fa95ff38000 # # # set -o vi # strace -e mmap sleep 1 mmap(NULL, 143317, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f333bd83000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f333bd81000 mmap(NULL, 1872744, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f333bbb7000 mmap(0x7f333bbdc000, 1376256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f333bbdc000 mmap(0x7f333bd2c000, 307200, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x7f333bd2c000 mmap(0x7f333bd77000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f333bd77000 mmap(0x7f333bd7d000, 13160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f333bd7d000 mmap(NULL, 218049136, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f332ebc4000 +++ exited with 0 +++ # And you can as well tweak 'perf trace's output to more closely match strace's: # perf config trace.show_arg_names=no # perf config trace.show_duration=no # perf config trace.show_prefix=yes # perf config trace.show_timestamp=no # perf config trace.show_zeros=yes # perf config trace.no_inherit=yes # perf trace -e mmap sleep 1 mmap(NULL, 143317, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0d287ca000 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS) = 0x7f0d287c8000 mmap(NULL, 1872744, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0d285fe000 mmap(0x7f0d28623000, 1376256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f0d28623000 mmap(0x7f0d28773000, 307200, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x175000) = 0x7f0d28773000 mmap(0x7f0d287be000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bf000) = 0x7f0d287be000 mmap(0x7f0d287c4000, 13160, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS) = 0x7f0d287c4000 mmap(NULL, 218049136, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f0d1b60b000 # # perf config | grep ^trace trace.show_arg_names=no trace.show_duration=no trace.show_prefix=yes trace.show_timestamp=no trace.show_zeros=yes trace.no_inherit=yes # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
08fc476214 |
tools beauty: Add script to generate table of mmap's 'prot' argument
Will be wired up in the following csets: $ tools/perf/trace/beauty/mmap_prot.sh static const char *mmap_prot[] = { [ilog2(0x1) + 1] = "READ", #ifndef PROT_READ #define PROT_READ 0x1 #endif [ilog2(0x2) + 1] = "WRITE", #ifndef PROT_WRITE #define PROT_WRITE 0x2 #endif [ilog2(0x4) + 1] = "EXEC", #ifndef PROT_EXEC #define PROT_EXEC 0x4 #endif [ilog2(0x8) + 1] = "SEM", #ifndef PROT_SEM #define PROT_SEM 0x8 #endif [ilog2(0x01000000) + 1] = "GROWSDOWN", #ifndef PROT_GROWSDOWN #define PROT_GROWSDOWN 0x01000000 #endif [ilog2(0x02000000) + 1] = "GROWSUP", #ifndef PROT_GROWSUP #define PROT_GROWSUP 0x02000000 #endif }; $ $ $ $ tools/perf/trace/beauty/mmap_prot.sh alpha static const char *mmap_prot[] = { [ilog2(0x4) + 1] = "EXEC", #ifndef PROT_EXEC #define PROT_EXEC 0x4 #endif [ilog2(0x01000000) + 1] = "GROWSDOWN", #ifndef PROT_GROWSDOWN #define PROT_GROWSDOWN 0x01000000 #endif [ilog2(0x02000000) + 1] = "GROWSUP", #ifndef PROT_GROWSUP #define PROT_GROWSUP 0x02000000 #endif [ilog2(0x1) + 1] = "READ", #ifndef PROT_READ #define PROT_READ 0x1 #endif [ilog2(0x8) + 1] = "SEM", #ifndef PROT_SEM #define PROT_SEM 0x8 #endif [ilog2(0x2) + 1] = "WRITE", #ifndef PROT_WRITE #define PROT_WRITE 0x2 #endif }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
61693228b6 |
perf beauty mmap_flags: Conditionaly define the mmap flags
So that in older systems we get it in the mmap flags scnprintf routines: $ tools/perf/trace/beauty/mmap_flags.sh | head -9 2> /dev/null static const char *mmap_flags[] = { [ilog2(0x40) + 1] = "32BIT", #ifndef MAP_32BIT #define MAP_32BIT 0x40 #endif [ilog2(0x01) + 1] = "SHARED", #ifndef MAP_SHARED #define MAP_SHARED 0x01 #endif $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
9012e3dda2 |
perf trace beauty: Add script to autogenerate mremap's flags args string/id table
It'll also conditionally generate the defines, so that if we don't have those when building a new tool tarball in an older systems, we get those, and we need them sometimes in the actual scnprintf routine, such as when checking if a flags means we have an extra arg, like with MREMAP_FIXED. $ tools/perf/trace/beauty/mremap_flags.sh static const char *mremap_flags[] = { [ilog2(1) + 1] = "MAYMOVE", #ifndef MREMAP_MAYMOVE #define MREMAP_MAYMOVE 1 #endif [ilog2(2) + 1] = "FIXED", #ifndef MREMAP_FIXED #define MREMAP_FIXED 2 #endif [ilog2(4) + 1] = "DONTUNMAP", #ifndef MREMAP_DONTUNMAP #define MREMAP_DONTUNMAP 4 #endif }; $ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Arnaldo Carvalho de Melo
|
d758d5d474 |
perf tools: Separate the checking of headers only used to build beautification tables
Some headers are not used in building the tools directly, but instead to generate tables that then gets source code included to do id->string and string->id lookups for things like syscall flags and commands. We were adding it directly to tools/include/ and this sometimes gets in the way of building using system headers, lets untangle this a bit. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Ian Rogers
|
a55b7bb1c1 |
perf test: Fix msan uninitialized use.
Ensure 'st' is initialized before an error branch is taken.
Fixes test "67: Parse and process metrics" with LLVM msan:
==6757==WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x5570edae947d in rblist__exit tools/perf/util/rblist.c:114:2
#1 0x5570edb1c6e8 in runtime_stat__exit tools/perf/util/stat-shadow.c:141:2
#2 0x5570ed92cfae in __compute_metric tools/perf/tests/parse-metric.c:187:2
#3 0x5570ed92cb74 in compute_metric tools/perf/tests/parse-metric.c:196:9
#4 0x5570ed92c6d8 in test_recursion_fail tools/perf/tests/parse-metric.c:318:2
#5 0x5570ed92b8c8 in test__parse_metric tools/perf/tests/parse-metric.c:356:2
#6 0x5570ed8de8c1 in run_test tools/perf/tests/builtin-test.c:410:9
#7 0x5570ed8ddadf in test_and_print tools/perf/tests/builtin-test.c:440:9
#8 0x5570ed8dca04 in __cmd_test tools/perf/tests/builtin-test.c:661:4
#9 0x5570ed8dbc07 in cmd_test tools/perf/tests/builtin-test.c:807:9
#10 0x5570ed7326cc in run_builtin tools/perf/perf.c:313:11
#11 0x5570ed731639 in handle_internal_command tools/perf/perf.c:365:8
#12 0x5570ed7323cd in run_argv tools/perf/perf.c:409:2
#13 0x5570ed731076 in main tools/perf/perf.c:539:3
Fixes: commit
|
||
Ian Rogers
|
aa98d8482c |
perf parse-events: Reduce casts around bp_addr
perf_event_attr bp_addr is a u64. parse-events.y parses it as a u64, but casts it to a void* and then parse-events.c casts it back to a u64. Rather than all the casts, change the type of the address to be a u64. This removes an issue noted in: https://lore.kernel.org/lkml/20200903184359.GC3495158@kernel.org/ Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200925003903.561568-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
40b74c30ff |
perf test: Add expand cgroup event test
It'll expand given events for cgroups A, B and C. $ perf test -v expansion 69: Event expansion for cgroups : --- start --- test child forked, pid 983140 metric expr 1 / IPC for CPI metric expr instructions / cycles for IPC found event instructions found event cycles adding {instructions,cycles}:W copying metric event for cgroup 'A': instructions (idx=0) copying metric event for cgroup 'B': instructions (idx=0) copying metric event for cgroup 'C': instructions (idx=0) test child finished with 0 ---- end ---- Event expansion for cgroups: Ok Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: John Garry <john.garry@huawei.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
Namhyung Kim
|
89fb1ca2ab |
perf tools: Allow creation of cgroup without open
This is a preparation for a test case of expanding events for multiple cgroups. Instead of using real system cgroup, the test will use fake cgroups so it needs a way to have them without a open file descriptor. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20200924124455.336326-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |