perf bench: (Jiri Olsa):
. Fix NUMA report output code handling of less than 1s runtimes.
perf script: (Ravi Bangoria)
. Add missing output fields in a 'perf script -h' hint.
. Fix crash because of missing evsel->priv.
. Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE], which
is just a end of features header marker.
perf stat: (Thomas Richter)
. Remove duplicate event counting
perf test:
. Wire parsing error handling in 'parse events' test (Jiri Olsa)
. Fix 'session topology' test on s/390 (Thomas Richter)
eBPF: (Yonghong Song)
. Fix a clang 7.0 compilation error when building perf linking
with libclang
intel-pt: (Adrian Hunter)
. Fix packet decoding of CYC packets.
Copies of kernel files: (Arnaldo Carvalho de Melo)
. Synchronize drm/drm.h UAPI
. Update x86's syscall_64.tbl, adding support for 'io_pgetevents' and 'rseq'
in 'perf trace'.
. Update powerpc uapi/asm/unistd.h, adding support for the 'rseq' syscall.
. Update if_link.h and bpf.h, no effect on tool features.
PowerPC: (Sandipan Das)
. Fix crash if callchain is empty.
s/390: (Thomas Richter)
. Support random socked_id assignment in the perf header.
. Support s390 random socket_id assignment in perf.data file.
. Make PMU alias definitions taken from sysfs and JSON files comparable
by normalizing them wrt spaces and newlines.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlsxGm0ACgkQ1lAW81NS
qkCgLg/+JIm0GDKnYiLNRGEduw5nTy0+KHwE84Zo2GnW8BzCGzMnsFQNgKM0+xjb
tMrZ9uFG3zieNisVRCyDoXQvvmlsr0kggqUGDNSZJa7Cx2bX28GW3X2cVrqbV9zm
12ubPClk65lJ7WN3ti3gqzEbkKwoP6/KbIdAgwIhwCobVczw2eNgvYnB6ycWjh4D
3Ly7CLjzYI05QgGDoZntv9PkN7MQ9zil7lQjGc8FzMeeCxXuikVaOVywGda8FIyl
bdXMyVYQZ+fmGZ/Vxs1gwouLsm+734ad1SY0vwR9FK0gvFlRD2Ls4kROmNjpAxqj
68PHg5T8Bw9zz1MKQ02BK1Qzb+kAWWBMhOkKGnZWoG/lvQABbVpIMSuo8FqppjQ4
adUjxvxnFYIkeRiWneyv2/ezmDtWxjnwYE3SIMjwSJH1R1rSVqoJ6qot0TKRXXnt
UyF8mHTlVkPbOpYW9aZKFuYA5e7qdUQTLjhrbStE9U8YKLE4vlnkYdZpK9anJlzz
tPrM9rKGjszZuceRJFCWvoL01h73b3KsScW2GieyakxcFdldDcgTPDpNsoVwjGl7
YQwrJkuRW/M0yLYyZ7LYqBW1exCSayRC1L4cxZgP12xzEsxhg+MlLLxturF62F5Y
qERgDmeG8bcUmhpltHo8MIY3OAk1TNBtRdzWMEwOTxjybh93NOM=
=96FW
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo-4.18-20180625' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
perf bench: (Jiri Olsa):
- Fix NUMA report output code handling of less than 1s runtimes.
perf script: (Ravi Bangoria)
- Add missing output fields in a 'perf script -h' hint.
- Fix crash because of missing evsel->priv.
- Fix crash caused by accessing feat_ops[HEADER_LAST_FEATURE], which
is just a end of features header marker.
perf stat: (Thomas Richter)
- Remove duplicate event counting
perf test:
- Wire parsing error handling in 'parse events' test (Jiri Olsa)
- Fix 'session topology' test on s/390 (Thomas Richter)
eBPF: (Yonghong Song)
- Fix a clang 7.0 compilation error when building perf linking
with libclang
intel-pt: (Adrian Hunter)
- Fix packet decoding of CYC packets.
Copies of kernel files: (Arnaldo Carvalho de Melo)
- Synchronize drm/drm.h UAPI
- Update x86's syscall_64.tbl, adding support for 'io_pgetevents' and 'rseq'
in 'perf trace'.
- Update powerpc uapi/asm/unistd.h, adding support for the 'rseq' syscall.
- Update if_link.h and bpf.h, no effect on tool features.
PowerPC: (Sandipan Das)
- Fix crash if callchain is empty.
s/390: (Thomas Richter)
- Support random socked_id assignment in the perf header.
- Support s390 random socket_id assignment in perf.data file.
- Make PMU alias definitions taken from sysfs and JSON files comparable
by normalizing them wrt spaces and newlines.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf_event__process_feature() accesses feat_ops[HEADER_LAST_FEATURE]
which is not defined and thus perf is crashing. HEADER_LAST_FEATURE is
used as an end marker for the perf report but it's unused for perf
script/annotate. Ignore HEADER_LAST_FEATURE for perf script/annotate,
just like it is done in 'perf report'.
Before:
# perf record -o - ls | perf script
<SNIP 'ls' output>
Segmentation fault (core dumped)
#
After:
# perf record -o - ls | perf script
<SNIP 'ls' output>
Segmentation fault (core dumped)
ls 7031 4392.099856: 250000 cpu-clock:uhH: 7f5e0ce7cd60
ls 7031 4392.100355: 250000 cpu-clock:uhH: 7f5e0c706ef7
#
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 57b5de4639 ("perf report: Support forced leader feature in pipe mode")
Link: http://lkml.kernel.org/r/20180625124220.6434-4-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A few fields are missing in a perf script -F hint. Add them.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20180625124220.6434-2-ravi.bangoria@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we can hit following assert when running numa bench:
$ perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0cm --thp 1
perf: bench/numa.c:1577: __bench_numa: Assertion `!(!(((wait_stat) & 0x7f) == 0))' failed.
The assertion is correct, because we hit the SIGFPE in following line:
Thread 2.2 "thread 0/0" received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7fffd28c6700 (LWP 11750)]
0x000.. in worker_thread (__tdata=0x7.. ) at bench/numa.c:1257
1257 td->speed_gbs = bytes_done / (td->runtime_ns / NSEC_PER_SEC) / 1e9;
We don't check if the runtime is actually bigger than 1 second,
and thus this might end up with zero division within FPU.
Adding the check to prevent this.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180620094036.17278-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf stat' shows a mismatch in perf stat regarding counter names on
s390:
Run command:
[root@s35lp76 perf]# ./perf stat -e tx_nc_tend -v --
~/mytesttx 1 >/tmp/111
tx_nc_tend: 1 573146 573146
tx_nc_tend: 1 573146 573146
Performance counter stats for '/root/mytesttx 1':
3 tx_nc_tend
0.001037252 seconds time elapsed
[root@s35lp76 perf]#
shows transaction counter tx_nc_tend with value 3 but it was triggered
only once as seen by the output of mytesttx.
When looking up the event name tx_nc_tend the following function
sequence is called:
parse_events_multi_pmu_add()
+--> perf_pmu__scan() being called with NULL argument
+--> pmu_read_sysfs() scans directory ../devices/ for
all PMUs
+--> perf_pmu__find() tries to find a PMU in the
global pmu list.
+--> pmu_lookup() called to read all file
entries when not in global
list.
pmu_lookup() causes the issue. It calls
+---> pmu_aliases() to read all the entries in the PMU directory.
On s390 this is named
/sys/devices/cpum_cf/events.
+--> pmu_aliases_parse() reads all files and creates an
alias for each file name.
So we end up with first entry created by
reading the sysfs file
[root@s35lp76 perf]# cat /sys/devices/cpum_cf
/events/TX_NC_TEND
event=0x008d
[root@s35lp76 perf]#
Debug output shows this entry
tx_nc_tend -> 'cpum_cf'/'event=0x008d
'/
After all files in this directory have been
read and aliases created this function is called:
+--> pmu_add_cpu_aliases()
This function looks up the CPU tables
created by the json files.
With json files for s390 now available all
the aliases are added to
the PMU alias list a second time.
The second entry is added by
reading the json file converted by jevent
resulting in file pmu-events/pmu-events.c:
{
.name = "tx_nc_tend",
.event = "event=0x8d",
.desc = "Unit: cpum_cf Completed TEND \
instructions \
in non-constrained TX mode",
.topic = "extended",
.long_desc = "A TEND instruction has \
completed in a \
non-constrained \
transactional-execution mode",
.pmu = "cpum_cf",
},
Debug output shows this entry
tx_nc_tend -> 'cpum_cf'/'event=0x8d'/
Function pmu_aliases_parse() and pmu_add_cpu_aliases() both use
__perf_pmu__new_alias() to add an alias to the PMU alias list. There is
no check if an alias already exist
So we end up with 2 entries for tx_nc_tend in the PMU alias list.
Having set up the PMU alias list for this PMU now
parse_events_multi_add_pmu() reads the complete alias list and adds each
alias with parse_events_add_pmu() to the global perfev_list. This
causes the alias to be added multiple times to the event list.
Fix this by making __perf_pmu__new_alias() to merge alias definitions if
an alias is already on the alias list. Also print a debug message when
the alias has mismatches in some fields.
Output before:
[root@s35lp76 perf]# ./perf stat -e tx_nc_tend -v \
-- ~/mytesttx 1 >/tmp/111
tx_nc_tend: 1 551446 551446
Performance counter stats for '/root/mytesttx 1':
3 tx_nc_tend
0.000961134 seconds time elapsed
[root@s35lp76 perf]#
Output after:
[root@s35lp76 perf]# ./perf stat -e tx_nc_tend -v \
-- ~/mytesttx 1 >/tmp/111
tx_nc_tend: 1 551446 551446
Performance counter stats for '/root/mytesttx 1':
1 tx_nc_tend
0.000961134 seconds time elapsed
[root@s35lp76 perf]#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180615101105.47047-3-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
PMU alias definitions in sysfs files may have spaces, newlines and
numbers with leading zeroes. Some alias definitions may also appear in
JSON files without spaces, etc.
Scan alias definitions and remove leading zeroes, spaces, newlines, etc
and rebuild string to make alias->str member comparable.
s390 for example has terms specified as event=0x0091 (read from files
../<PMU>/events/<FILE> and terms specified as event=0x91 (read from JSON
files).
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180615101105.47047-2-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove a trailing newline when reading sysfs file contents such as
/sys/devices/cpum_cf/events/TX_NC_TEND. This shows when verbose option
-v is used.
Output before:
tx_nc_tend -> 'cpum_cf'/'event=0x008d
'/
Output after:
tx_nc_tend -> 'cpum_cf'/'event=0x8d'/
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180615101105.47047-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo reported the perf build failure with latest llvm/clang compiler
(7.0).
$ make LIBCLANGLLVM=1 -C tools/perf/
<SNIP>
CC /tmp/tmp.t53Qo38zci/tests/kmod-path.o
util/c++/clang.cpp: In function ‘std::unique_ptr<llvm::SmallVectorImpl<char> >
perf::getBPFObjectFromModule(llvm::Module*)’:
util/c++/clang.cpp:150:43: error: no matching function for call to
‘llvm::TargetMachine::addPassesToEmitFile(llvm::legacy::PassManager&,
llvm::raw_svector_ostream&, llvm::TargetMachine::CodeGenFileType)’
TargetMachine::CGFT_ObjectFile)) {
^
In file included from util/c++/clang.cpp:25:0:
/usr/local/include/llvm/Target/TargetMachine.h:254:16: note: candidate:
virtual bool llvm::TargetMachine::addPassesToEmitFile(
llvm::legacy::PassManagerBase&, llvm::raw_pwrite_stream&,
llvm::raw_pwrite_stream*, llvm::TargetMachine::CodeGenFileType, bool,
llvm::MachineModuleInfo*)
virtual bool addPassesToEmitFile(PassManagerBase &, raw_pwrite_stream &,
^~~~~~~~~~~~~~~~~~~
/usr/local/include/llvm/Target/TargetMachine.h:254:16: note:
candidate expects 6 arguments, 3 provided
mv: cannot stat '/tmp/tmp.t53Qo38zci/util/c++/.clang.o.tmp': No such file or directory
make[7]: *** [/home/acme/git/perf/tools/build/Makefile.build:101:
/tmp/tmp.t53Qo38zci/util/c++/clang.o] Error 1
make[6]: *** [/home/acme/git/perf/tools/build/Makefile.build:139: c++] Error 2
make[5]: *** [/home/acme/git/perf/tools/build/Makefile.build:139: util] Error 2
make[5]: *** Waiting for unfinished jobs....
CC /tmp/tmp.t53Qo38zci/tests/thread-map.o
The function addPassesToEmitFile signature changed in llvm 7.0 and such
a change caused the failure. This patch fixed the issue with using
proper function signatures under different compiler versions.
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20180616174739.1076733-1-yhs@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This updates the tools/perf/ copy of the system call table for x86 which makes
'perf trace' become aware of the new 'io_pgetevents' and 'rseq' syscalls, no
matter in which system it gets built, i.e. older systems where the syscalls are
not available in the running kernel (via tracefs) or in the system headers will
still be aware of these syscalls/.
These are the csets introducing the source drift:
05c17cedf8 ("x86: Wire up restartable sequence system call")
7a074e96de ("aio: implement io_pgetevents")
This results in this build time change:
$ diff -u /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c.old /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c
--- /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c.old 2018-06-15 11:48:17.648948094 -0300
+++ /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c 2018-06-15 11:48:22.133942480 -0300
@@ -332,5 +332,7 @@
[330] = "pkey_alloc",
[331] = "pkey_free",
[332] = "statx",
+ [333] = "io_pgetevents",
+ [334] = "rseq",
};
-#define SYSCALLTBL_x86_64_MAX_ID 332
+#define SYSCALLTBL_x86_64_MAX_ID 334
$
This silences the following tools/perf/ build warning:
Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-tfvyz51sabuzemrszbrhzxni@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding optional 'valid' callback for events tests in parse-events
object, so we don't try to parse PMUs, which are not supported.
Following line is displayed for skipped test:
running test 52 'intel_pt//u'... SKIP
Committer note:
Use named initializers in the struct evlist_test variable to avoid
breaking the build on centos:5, 6 and others with a similar gcc:
cc1: warnings being treated as errors
tests/parse-events.c: In function 'test_pmu_events':
tests/parse-events.c:1817: error: missing initializer
tests/parse-events.c:1817: error: (near initialization for 'e.type')
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lkml.kernel.org/r/20180611093422.1005-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add missing error handling for parse_events calls in test_event function
that led to following segfault on s390:
running test 52 'intel_pt//u'
perf: Segmentation fault
...
/lib64/libc.so.6(vasprintf+0xe6) [0x3fffca3f106]
/lib64/libc.so.6(asprintf+0x46) [0x3fffca1aa96]
./perf(parse_events_add_pmu+0xb8) [0x80132088]
./perf(parse_events_parse+0xc62) [0x8019529a]
./perf(parse_events+0x98) [0x801341c0]
./perf(test__parse_events+0x48) [0x800cd140]
./perf(cmd_test+0x26a) [0x800bd44a]
test child interrupted
Adding the struct parse_events_error argument to parse_events call. Also
adding parse_events_print_error to get more details on the parsing
failures, like:
# perf test 6 -v
running test 52 'intel_pt//u'failed to parse event 'intel_pt//u', err 1, str 'Cannot find PMU `intel_pt'. Missing kernel support?'
event syntax error: 'intel_pt//u'
\___ Cannot find PMU `intel_pt'. Missing kernel support?
Committer note:
Use named initializers in the struct parse_events_error variable to
avoid breaking the build on centos5, 6 and others with a similar gcc:
cc1: warnings being treated as errors
tests/parse-events.c: In function 'test_event':
tests/parse-events.c:1696: error: missing initializer
tests/parse-events.c:1696: error: (near initialization for 'err.str')
Reported-by: Kim Phillips <kim.phillips@arm.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Kim Phillips <kim.phillips@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: http://lkml.kernel.org/r/20180611093422.1005-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For some cases, the callchain provided by the kernel may be empty. So,
the callchain ip filtering code will cause a crash if we do not check
whether the struct ip_callchain pointer is NULL before accessing any
members.
This can be observed on a powerpc64le system running Fedora 27 as shown
below.
# perf record -b -e cycles:u ls
Before:
# perf report --branch-history
perf: Segmentation fault
-------- backtrace --------
perf[0x1027615c]
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fff856304d8]
perf(arch_skip_callchain_idx+0x44)[0x10257c58]
perf[0x1017f2e4]
perf(thread__resolve_callchain+0x124)[0x1017ff5c]
perf(sample__resolve_callchain+0xf0)[0x10172788]
...
After:
# perf report --branch-history
Samples: 25 of event 'cycles:u', Event count (approx.): 2306870
Overhead Source:Line Symbol Shared Object
+ 11.60% _init+35736 [.] _init ls
+ 9.84% strcoll_l.c:137 [.] __strcoll_l libc-2.26.so
+ 9.16% memcpy.S:175 [.] __memcpy_power7 libc-2.26.so
+ 9.01% gconv_charset.h:54 [.] _nl_find_locale libc-2.26.so
+ 8.87% dl-addr.c:52 [.] _dl_addr libc-2.26.so
+ 8.83% _init+236 [.] _init ls
...
Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20180611104049.11048-1-sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On s390 this test case fails because the socket identifiction numbers
assigned to the CPU are higher than the CPU identification numbers.
F/ix this by adding the platform architecture into the perf data header
flag information. This helps identifiing the test platform and handles
s390 specifics in process_cpu_topology().
Before:
[root@p23lp27 perf]# perf test -vvvvv -F 39
39: Session topology :
--- start ---
templ file: /tmp/perf-test-iUv755
socket_id number is too big.You may need to upgrade the perf tool.
---- end ----
Session topology: Skip
[root@p23lp27 perf]#
After:
[root@p23lp27 perf]# perf test -vvvvv -F 39
39: Session topology :
--- start ---
templ file: /tmp/perf-test-8X8VTs
CPU 0, core 0, socket 6
CPU 1, core 1, socket 3
---- end ----
Session topology: Ok
[root@p23lp27 perf]#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fixes: c84974ed9f ("perf test: Add entry to test cpu topology")
Link: http://lkml.kernel.org/r/20180611073153.15592-2-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On s390 the socket identifier assigned to a CPU identifier is random and
(depending on the configuration of the LPAR) may be higher than the CPU
identifier. This is currently not supported.
Fix this by allowing arbitrary socket identifiers being assigned to
CPU id.
Output before:
[root@p23lp27 perf]# ./perf report --header -I -v
...
socket_id number is too big.You may need to upgrade the perf tool.
Error:
The perf.data file has no samples!
# ========
# captured on : Tue May 29 09:29:57 2018
# header version : 1
...
# Core ID and Socket ID information is not available
...
[root@p23lp27 perf]#
Output after:
[root@p23lp27 perf]# ./perf report --header -I -v
...
Error:
The perf.data file has no samples!
# ========
# captured on : Tue May 29 09:29:57 2018
# header version : 1
...
# CPU 0: Core ID 0, Socket ID 6
# CPU 1: Core ID 1, Socket ID 3
# CPU 2: Core ID -1, Socket ID -1
...
[root@p23lp27 perf]#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180611073153.15592-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull perf fixes from Thomas Gleixner:
"A pile of perf updates:
Kernel side:
- Remove an incorrect warning in uprobe_init_insn() when
insn_get_length() fails. The error return code is handled at the
call site.
- Move the inline keyword to the right place in the perf ringbuffer
code to address a W=1 build warning.
Tooling:
perf stat:
- Fix metric column header display alignment
- Improve error messages for default attributes, providing better
output for error in command line.
- Add --interval-clear option, to provide a 'watch' like printing
perf script:
- Show hw-cache events too
perf c2c:
- Fix data dependency problem in layout of 'struct c2c_hist_entry'
Core:
- Do not blindly assume that 'struct perf_evsel' can be obtained via
a straight forward container_of() as there are call sites which
hand in a plain 'struct hist' which is not part of a container.
- Fix error index in the PMU event parser, so that error messages can
point to the problematic token"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Move the inline keyword at the beginning of the function declaration
uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn()
perf script: Show hw-cache events
perf c2c: Keep struct hist_entry at the end of struct c2c_hist_entry
perf stat: Add event parsing error handling to add_default_attributes
perf stat: Allow to specify specific metric column len
perf stat: Fix metric column header display alignment
perf stat: Use only color_fprintf call in print_metric_only
perf stat: Add --interval-clear option
perf tools: Fix error index for pmu event parser
perf hists: Reimplement hists__has_callchains()
perf hists browser gtk: Use hist_entry__has_callchains()
perf hists: Make hist_entry__has_callchains() work with 'perf c2c'
perf hists: Save the callchain_size in struct hist_entry
As we move stuff around, some doc references are broken. Fix some of
them via this script:
./scripts/documentation-file-ref-check --fix
Manually checked if the produced result is valid, removing a few
false-positives.
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Coly Li <colyli@suse.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Exactly as the comment just before 'struct c2c_hist_entry" says, i.e.
the last entry in struct hist_entry is a zero length array, that when
allocating space for hist_entry gets extra space if callchains are in
use, which, if hist_entry is not at the end of c2c_hist_entry, the
members after it gets corrupted when callchains get added to the rb
trees collecting them, etc.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Fixes: 7f834c2e84 ("perf c2c report: Display node for cacheline address")
Link: http://lkml.kernel.org/n/tip-bh0ke4fh2ygpj3yowna7o1di@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following change will introduce new metrics, that doesn't need such
wide hard coded spacing. Switch METRIC_ONLY_LEN macro usage with
metric_only_len variable.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180606221513.11302-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We can call color_fprintf also for non color case, it's handled
properly. This change simplifies following patch.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180606221513.11302-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding --interval-clear option to clear the screen before next interval.
Committer testing:
# perf stat -I 1000 --interval-clear
And, as expected, it behaves almost like:
# watch -n 0 perf stat -a sleep 1
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20180606221513.11302-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There are places where we have only access to struct hists and need to
know if any of its hist_entries has callchains, like when drawing
headers for the various output modes (stdio, TUI, etc), so, when adding
a new hist_entry, check if it has callchains, storing this info for
later use by hists__has_callchains().
This reimplementation is necessary because not always a 'struct hists'
is allocated together with a 'struct perf evsel', so we can't go from
'hists' to 'perf_event_attr.sample_type & PERF_SAMPLE_CALLCHAIN'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-hg5g7yddjio3ljwyqnnaj5dt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since we can't go from struct hists to struct evsel for all cases (c2c
is an exception) and we have access to the hist_entry, use
hist_entry__has_callchains() in the GTK+ hists browser to figure out
if callchains are available.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-8owkgrruzzi5emvblwh4e6le@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since 'perf c2c' uses 'struct hists' not allocated together with a
'struct perf_evsel' instance, we can't go from a 'struct hist_entry'
pointer to a 'struct perf_evsel' via he->hists, so, instead, check if
space was set aside for hist_entry->callchain[0] at hist_entry__new()
time.
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Reported-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: fabd37b837 ("perf hists: Check if a hist_entry has callchains before using them")
Link: https://lkml.kernel.org/n/tip-e8ife8djvvvwmeze3s4yodii@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that we can figure out the real size of the struct and also be able
to tell if callchains may be present in this histogram entry.
Since we can't always guarantee that from hist_entry->hists we can use
hists_to_evsel, to then look at evsel->attr.sample_type for
PERF_SAMPLE_CALLCHAIN, like with the 'perf c2c' tool, that uses plain
'struct hists' instances, we need another way of deciding if a specific
hist_entry instance has callchains associated with it, i.e. if its
hist_entry->callchain[0] has space allocated for.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ptvndealxs1k7myluvu9flnq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a brief introduction about fields to perf-script-python.txt.
It should help python script developers in easily finding what fields
are supported.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1527843663-32288-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When doing pmu sampling and then running a script with perf script -s
script.py, the process_event function gets dictionary with some fields
from the perf ring buffer (like ip, sym, callchain etc).
But we miss quite a few fields we report now, for example, LBRs, data
source, weight, transaction, iregs, uregs, etc.
This patch reports these fields for perf script python processing.
New keys/items:
---------------
key : brstack
items: from, to, from_dsoname, to_dsoname, mispred,
predicted, in_tx, abort, cycles.
key : brstacksym
items: from, to, pred, in_tx, abort (converted string)
key : datasrc
key : datasrc_decode (decoded string)
key : iregs
key : uregs
key : weight
key : transaction
v2:
---
Add new fields for dso.
Use PyBool_FromLong() for mispred/predicted/in_tx/abort
Committer notes:
!sym->name isn't valid, as its not a pointer, its a [0] array, use
!sym->name[0] instead, guaranteed to be the case by symbol__new.
This was caught by just one of the containers:
52 54.22 ubuntu:17.04 : FAIL gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
CC /tmp/build/perf/util/scripting-engines/trace-event-python.o
util/scripting-engines/trace-event-python.c:534:20: error: address of array 'sym->name' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
if (!sym || !sym->name)
~~~~~~^~~~
1 error generated.
mv: cannot stat '/tmp/build/perf/util/scripting-engines/.trace-event-python.o.tmp': No such file or directory
/git/linux/tools/build/Makefile.build:96: recipe for target '/tmp/build/perf/util/scripting-engines/trace-event-python.o' failed
make[5]: *** [/tmp/build/perf/util/scripting-engines/trace-event-python.o] Error 1
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1527843663-32288-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch creates a new function get_dsoname() and move the code which
gets the dsoname string to this function.
That's because in next patch, when we process LBR data, we will also
need get_dsoname() to return dsoname for branch from/to.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1527843663-32288-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were not considering 'B' and 'b' (BSS, uninitialized data objects,
that gets set to zero at program start), do it so that we can resolve
more symbols in tools doing resolution of data operands, like 'perf c2c'.
When using vmlinux, i.e. an ELF symbol table, those were already
considered, as the decision was about STT_FUNC or STT_OBJECT, and the
later covers BSS symbols.
# grep -i ' b ' /proc/kallsyms | head -20 | tail -5
ffffffffa789d030 b execute_command
ffffffffa789d038 b initcall_command_line
ffffffffa789d040 b static_command_line
ffffffffa789d048 B ROOT_DEV
ffffffffa789d050 b once.73786
#
# readelf -s /lib/modules/`uname -r`/build/vmlinux | grep ROOT_DEV
79219: ffffffff8289d048 4 OBJECT GLOBAL DEFAULT 58 ROOT_DEV
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-z960xobig39ca1pmp5brl2fr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Making it a bit more robust, this took place here when a sample appeared
right after:
ffffffff8a925000 D __nosave_end
And before the next considered symbol, which, using kallsyms make us
over guess the size of __nosave_end, and then the sequence:
hist_entry__inc_addr_samples ->
symbol__inc_addr_samples ->
symbol__hists ->
annotated_source__alloc_histograms
Ends up not liking to allocate gigabytes of ram for annotation...
This will be alleviated by considering BSS symbols, which we should but
don't so far, and then we should investigate those samples further.
The testcase was to have:
perf top -e cycles/call-graph=fp/,cache-misses/call-graph=dwarf/,instructions
Running for a while till it segfaulted trying to access NULL notes->src->histograms.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ndfjtpiop3tdcnyjgp320ra8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some Atom CPUs can produce FUP packets that contain NLIP (next linear
instruction pointer) instead of CLIP (current linear instruction
pointer). That will result in "Unexpected indirect branch" errors. Fix
by comparing IP to NLIP in that case.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1527762225-26024-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On some platforms, overflows will clear before MTC wraparound, and there
is no following TSC/TMA packet. In that case the previous TMA is valid.
Since there will be a valid TMA either way, stop setting 'have_tma' to
false upon overflow.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1527762225-26024-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
sync_switch is a facility to synchronize decoding more closely with the
point in the kernel when the context actually switched.
In one case, INTEL_PT_SS_NOT_TRACING state was not correctly
transitioning to INTEL_PT_SS_TRACING state due to a missing case clause.
Add it.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1527762225-26024-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian reported that this test fails in his system where:
probe libc's inet_pton & backtrace it with ping: FAILED!
root@kbl04:~/git/linux-perf# nm -g /lib/x86_64-linux-gnu/libc-2.19.so | grep inet_pton
nm: /lib/x86_64-linux-gnu/libc-2.19.so: no symbols
This fails on ubuntu systems, with Adrian's being kubuntu 14.04, I
tested with ubuntu 14.04.4 and 18.04, and there we need to use the
-D/--dynamic 'nm' option to have this test working. And it works as well
with that on fedora 27, so use it.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zlfnbauad3ljlmtjgo0v660u@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools uses map__rip_2objdump() to calculate objdump virtual addresses.
map__rip_2objdump() needs to be amended to deal with PTI entry trampolines.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1528183800-21577-1-git-send-email-adrian.hunter@intel.com
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "Object code reading" test will not create maps for the PTI entry
trampolines unless the machine environment exists to show that the arch is
x86_64.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1528183800-21577-1-git-send-email-adrian.hunter@intel.com
[ split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently all the event parsing fails end up
in the event_pmu rule, and display misleading
help like:
$ perf stat -e inst kill
event syntax error: 'inst'
\___ Cannot find PMU `inst'. Missing kernel support?
...
The reason is that the event_pmu is too strong
and match also single string. Changing it to
force the '/' separators to be part of the rule,
and getting the proper error now:
$ perf stat -e inst kill
event syntax error: 'inst'
\___ parser error
Run 'perf list' for a list of valid events
...
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180605121416.31645-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding the support to read rusage data once the workload is finished and
display the system/user time values:
$ perf stat --null perf bench sched pipe
...
Performance counter stats for 'perf bench sched pipe':
5.342599256 seconds time elapsed
2.544434000 seconds user
4.549691000 seconds sys
It works only in non -r mode and only for workload target.
So as of now, for workload targets, we display 3 types of timings. The
time we meassure in perf stat from enable to disable+period:
5.342599256 seconds time elapsed
The time spent in user and system lands, displayed only for workload
session/target:
2.544434000 seconds user
4.549691000 seconds sys
Those times are the very same displayed by 'time' tool. They are
returned by wait4 call via the getrusage struct interface.
Committer notes:
Had to rename some variables to avoid this on older systems such as
centos:6:
builtin-stat.c: In function 'print_footer':
builtin-stat.c:1831: warning: declaration of 'stime' shadows a global declaration
/usr/include/time.h:297: warning: shadowed declaration is here
Committer testing:
# perf stat --null time perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 5.526 [sec]
5.526534 usecs/op
180945 ops/sec
1.00user 6.25system 0:05.52elapsed 131%CPU (0avgtext+0avgdata 8056maxresident)k
0inputs+0outputs (0major+606minor)pagefaults 0swaps
Performance counter stats for 'time perf bench sched pipe':
5.530978744 seconds time elapsed
1.004037000 seconds user
6.259937000 seconds sys
#
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180605121313.31337-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enable complex event names containing [.:=,] symbols to be encoded into Perf
trace using name= modifier e.g. like this:
perf record -e cpu/name=\'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM\',\
period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex
Below is how it looks like in the report output. Please note explicit escaped
quoting at cmdline string in the header so that thestring can be directly reused
for another collection in shell:
perf report --header
# ========
...
# cmdline : /root/abudanko/kernel/tip/tools/perf/perf record -v -e cpu/name=\'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM\',period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex
# event : name = OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM, , type = 4, size = 112, config = 0x100003c, { sample_period, sample_freq } = 3500000, sample_type = IP|TID|TIME, disabled = 1, inh
...
# ========
#
#
# Total Lost Samples: 0
#
# Samples: 24K of event 'OFFCORE_RESPONSE:request=DEMAND_RFO:response=L3_HIT.SNOOP_HITM'
# Event count (approx.): 86492000000
#
# Overhead Command Shared Object Symbol
# ........ ....... ................ ..............................................
#
14.75% futex [kernel.vmlinux] [k] __entry_trampoline_start
...
perf stat -e cpu/name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\',period=0x3567e0,event=0x3c,cmask=0x1/Duk ./futex
10000000 process context switches in 16678890291ns (1667.9ns/ctxsw)
Performance counter stats for './futex':
88,095,770,571 CPU_CLK_UNHALTED.THREAD:cmask=0x1
16.679542407 seconds time elapsed
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/c194b060-761d-0d50-3b21-bb4ed680002d@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix __kmod_path__parse() so that perf tools does not treat vdso32 and
vdsox32 as kernel modules and fail to find the object.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org
Fixes: 1f121b03d0 ("perf tools: Deal with kernel module names in '[]' correctly")
Link: http://lkml.kernel.org/r/1528117014-30032-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add tests for vdso32 and vdsox32. This will cause the overall test to
fail because __kmod_path__parse() does not handle vdso32 or vdsox32.
Fixes: 1f121b03d0 ("perf tools: Deal with kernel module names in '[]' correctly")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1528117014-30032-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So far if we use 'perf record -g' this will make
symbol_conf.use_callchain 'true' and logic will assume that all events
have callchains enabled, but ever since we added the possibility of
setting up callchains for some events (e.g.: -e
cycles/call-graph=dwarf/) while not for others, we limit usage scenarios
by looking at that symbol_conf.use_callchain global boolean, we better
look at each event attributes.
On the road to that we need to look if a hist_entry has callchains, that
is, to go from hist_entry->hists to the evsel that contains it, to then
look at evsel->sample_type for PERF_SAMPLE_CALLCHAIN.
The next step is to add a symbol_conf.ignore_callchains global, to use
in the places where what we really want to know is if callchains should
be ignored, even if present.
Then -g will mean just to select a callchain mode to be applied to all
events not explicitely setting some other callchain mode, i.e. a default
callchain mode, and --no-call-graph will set
symbol_conf.ignore_callchains with that clear intention.
That too will at some point become a per evsel thing, that tools can set
for all or just a few of its evsels.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-0sas5cm4dsw2obn75g7ruz69@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We'll use this helper more frequently when reworking
symbol_conf.use_callchain logic, where knowing if a hist_entry has
callchains is the important bit, so make going from hist_entry to hists
to evsel easier, compact.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-p6gioxkzpkpz71dtt4wcs36o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of using symbol_conf.use_callchain, reducing its usage a bit
more.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-edgwb1b2mpbrdeg0w64wp7ms@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We were checking just if callchain processing was asked for by the
user, not if the evsel itself has callchains, and since we can have
some evsels with callchains and others without, check that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-inxl7k49q9f9w1se039fbxuw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Its common to have the (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN),
so add an evsel__has_callchain(evsel) helper.
This will actually get more uses as we check that instead of
symbol_conf.use_callchain in places where that produces the same result
but makes this decision to be more fine grained, per evsel.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-145340oytbthatpfeaq1do18@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is used in a single place, move the declaration to that function.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-p650ofrl8xike4dewxod51gg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the header file util/debug.h instead of declaration of verbose
variable.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180528134817.36643-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
One more step in grouping annotation options.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-sogzdhugoavm6fyw60jnb0vs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that things changed in the command line may percolate to the browser
code without using globals.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-5daawc40zhl6gcs600com1ua@git.kernel.org
[ Merged fix for NO_SLANG=1 build provided by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Continuing to group annotation specific stuff into a struct.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-p3cdhltj58jt0byjzg3g7obx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Continuing to group annotation options in an annotation specific struct.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-astei92tzxp4yccag5pxb2h7@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now all callers to symbol__disassemble() can hand it the per-tool
annotation_options, which will allow us to remove lots of stuff
from symbol_options, the kitchen sink of perf configs, reducing its
size and getting annotation specific stuff grouped together.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-vpr7ys7ggvs2fzpg8wbjcw7e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
No need to have "get_srcline", plain hist_entry__srcline() is enough and
shorter.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-irhzpfmgdaf6cyk0uqqexoh9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since we have 'struct addr_map_symbol' and the srcline sort order keys
all operate on those, make the code more compact by introducing a
function that receives a pointer to such struct and expands the
arguments to map__srcline().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-j540wq7n3ukkh70gk5be0in5@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Replacing a common open coded sequence.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-2d7d1nzd3ksqornloqeer99r@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Accross all the routines, this way we can have eventually have a
consistent set of defaults for all UIs.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-6qgtixurjgdk5u0n3rw78ges@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When we have multiple groups in an evlist, say:
$ perf stat -e '{cycles,instructions},{cache-references,cache-misses}' sleep 1
Performance counter stats for 'sleep 1':
343,134 cycles:u
249,292 instructions:u # 0.73 insn per cycle
15,556 cache-references:u
8,925 cache-misses:u # 57.373 % of all cache refs
1.000957550 seconds time elapsed
$
Then the perf_evsel instances for the two group leaders ("cycles" and
"cache-references") will have evsel->nr_members set to 2, while all the
evsel->evlist->nr_entries will be set to 4, so we can't use
evsel->evlist->nr_entries everywhere, as event groups need to be taken
into account.
But this probably requires us to audit at least the forced-group code,
where we want all of the events to be in a "group", to see them all in
the screen, one column for each, even knowing that they were not
necessarily scheduled to count at the same time by the kernel perf
subsystem.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-2g0vwqnc49wl4ttjk8dvpgcc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since over time the places where we need to pass this got reduced
because we can obtain it from evsel->evlist->nr_entries, no need to have
this global anymore.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ovhikrfj8pzdv93yq3gt6sei@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Its a bit shorter, so ditch the old symbol__alloc_hists() function.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-m7tienxk7dijh5ln62yln1m9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since now we have evsel->evlist->nr_entries in the single place calling
this function, use it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-9mgosbqa977h39j4i9ys8t75@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In this case we're wanting just notes->src->cycles_hist, allocating it if needed.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-pqj81aneunhftlntm66tmhz0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In this case we're wanting just notes->src->histograms, allocating it if needed.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4iatualjskia7sojmdb65cmm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It only operates on the histograms, so no need for the encompassing
'struct annotation'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-2se2v7rrjil0kwqywks04ey2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that we can call it independently, in contexts were we know we
already have notes->src allocated.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-f5fn7tr1asey6g013wavpn4c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
More stuff will go in there, all the parts that are not needed when a
symbol had no samples and that were mistakenly added to 'struct
annotation'.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-u4761kyzhixw9ydk6kib3f0o@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that we can allocate just the notes->src->cyc_hist, that, unlike
notes->src->histograms, is not per event, and in paths where we
need to lazily allocate notes->src->cyc_hist we don't have the
number of events handy to also allocate ->histograms.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-tsx7dhxzpi0criyx0sio3pz3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It only operates on the notes->src->cyc_hist, just pass that to it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zd1cu4zwmu21k0cxlr83y6vr@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The code gets shorter and we'll be able to use evsel->evlist in a
followup patch.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-t0s7vy19wq5kak74kavm8swf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Those functions always check if the argument is NULL before trying to
grab a reference count, and also will return the received object, so, to
make code more compact, no need to check for NULL.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-i9wycjdxh0fwhryu55lmafks@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By taking advantage that __get() routines return the pointer to the
object for which a reference count is being get.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-xnvd07rdxliy04oi062samik@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The __get() idiom returns a reference count for the object passed, i.e.
all functions of this type return the object passed, so take advantage
of that to make the code more compact.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ds6vdm7clh070512rpydidsc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In c68677014b ("perf tools: Remove support for command aliases") we
removed the only remaining use of a function provided by these files, so
ditch it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mgnzqbi46gucs48d7bzfwr55@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
ee6a7354a3 ("kprobes/x86: Prohibit probing on exception masking instructions")
That doesn't entail changes in tooling, but silences this perf build
warning:
Warning: Intel PT: x86 instruction decoder header at 'tools/perf/util/intel-pt-decoder/insn.h' differs from latest version at 'arch/x86/include/asm/insn.h'
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-o3wfwjnyh7r8l0gi9q3y9f44@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In the perf.data HEADER_CPUDESC feadure header we store first the number
of available CPUs in the system, then the number of CPUs at the time of
writing the header, not the other way around.
Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Lakshman Annadorai <lakshmana@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-j7o92acm2vnxjv70y4o3swoc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
ARM CoreSight auxtrace uses 'sample->addr' to record the target address
for branch instructions, so the data of 'sample->addr' is required for
tracing data analysis.
This commit collects data of 'sample->addr' into perf sample dict,
finally can be used for python script for parsing event.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: Tor Jeremiassen <tor@ti.com>
Cc: coresight@lists.linaro.org
Cc: kim.phillips@arm.co
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-doc@vger.kernel.org
Link: http://lkml.kernel.org/r/1527497103-3593-3-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add an explanation of each cpu's core and socket identifier to the
perf.data file format documentation.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180528074433.16652-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The tail of a queue is supposed to be pointing to the next available
slot in a queue. In this implementation the tail is incremented before
it is used and as such points to the last used element, something that
has the immense advantage of centralizing tail management at a single
location and eliminating a lot of redundant code.
But this needs to be taken into consideration on the dequeueing side
where the head also needs to be incremented before it is used, or the
first available element of the queue will be skipped.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Walker <robert.walker@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1527289854-10755-1-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
bpf_object__open()/bpf_object__open_buffer can return error pointer or
NULL, check the return values with IS_ERR_OR_NULL() in bpf__prepare_load
and bpf__prepare_load_buffer
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: netdev@vger.kernel.org
Link: https://lkml.kernel.org/n/tip-psf4xwc09n62al2cb9s33v9h@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The "perf test Session topology" entry fails with core dump on s390. The root
cause is a NULL pointer dereference in function check_cpu_topology() line 76
(or line 82 without -v).
The session->header.env.cpu variable is NULL because on s390 function
process_cpu_topology() returns with error:
socket_id number is too big.
You may need to upgrade the perf tool.
and releases the env.cpu variable via zfree() and sets it to NULL.
Here is the gdb output:
(gdb) n
76 pr_debug("CPU %d, core %d, socket %d\n", i,
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
0x00000000010f4d9e in check_cpu_topology (path=0x3ffffffd6c8
"/tmp/perf-test-J6CHMa", map=0x14a1740) at tests/topology.c:76
76 pr_debug("CPU %d, core %d, socket %d\n", i,
(gdb)
Make sure the env.cpu variable is not used when its NULL.
Test for NULL pointer and return TEST_SKIP if so.
Output before:
[root@p23lp27 perf]# ./perf test -F 39
39: Session topology :Segmentation fault (core dumped)
[root@p23lp27 perf]#
Output after:
[root@p23lp27 perf]# ./perf test -vF 39
39: Session topology :
--- start ---
templ file: /tmp/perf-test-Ajx59D
socket_id number is too big.You may need to upgrade the perf tool.
---- end ----
Session topology: Skip
[root@p23lp27 perf]#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180528073657.11743-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf stat doesn't count the uncore event aliases from the same uncore
block in a group, for example:
perf stat -e '{unc_m_cas_count.all,unc_m_clockticks}' -a -I 1000
# time counts unit events
1.000447342 <not counted> unc_m_cas_count.all
1.000447342 <not counted> unc_m_clockticks
2.000740654 <not counted> unc_m_cas_count.all
2.000740654 <not counted> unc_m_clockticks
The output is very misleading. It gives a wrong impression that the
uncore event doesn't work.
An uncore block could be composed by several PMUs. An uncore event alias
is a joint name which means the same event runs on all PMUs of a block.
Perf doesn't support mixed events from different PMUs in the same group.
It is wrong to put uncore event aliases in a big group.
The right way is to split the big group into multiple small groups which
only include the events from the same PMU.
Only uncore event aliases from the same uncore block should be specially
handled here. It doesn't make sense to mix the uncore events with other
uncore events from different blocks or even core events in a group.
With the patch:
# time counts unit events
1.001557653 140,833 unc_m_cas_count.all
1.001557653 1,330,231,332 unc_m_clockticks
2.002709483 85,007 unc_m_cas_count.all
2.002709483 1,429,494,563 unc_m_clockticks
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1525727623-19768-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
x86 PTI entry trampolines all map to the same physical page. If that is
reflected in the program headers of /proc/kcore, then do the same for the
copy of kcore.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-18-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Identify and copy any sections for x86 PTI entry trampolines.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-17-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In preparation to add more program headers, get rid of kernel_map and
modules_map by moving ->kernel_map and ->modules_map to newly allocated
entries in the ->phdrs list.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-16-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In preparation to add more program headers, iterate phdrs instead of
assuming there is only one for the kernel text and one for the modules.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-15-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In preparation to add more program headers, layout the relative offset
of each section.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-14-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In preparation to add more program headers, calculate offset from the
number of phdrs.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-13-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In preparation to add more program headers, keep a count of phdrs.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-12-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, kcore_copy makes 2 program headers, one for the kernel text
(namely kernel_map) and one for the modules (namely modules_map). Now
more program headers are needed, but treating each program header as a
special case results in much more code.
Instead, in preparation to add more program headers, change to keep
program header data (phdr_data) in a list.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-11-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When we enable the group, for tui/stdio2, the output first line includes
the group event string. While for stdio, it will show only one event.
For example,
perf record -e cycles,branches ./div
perf annotate --group --stdio
Percent | Source code & Disassembly of div for cycles (44407 samples)
......
The first line doesn't include the event 'branches'.
With this patch, it will show the correct group even string.
perf annotate --group --stdio
Percent | Source code & Disassembly of div for cycles, branches (44407 samples)
......
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1526989115-14435-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Like the kernel text, the location of x86 PTI entry trampolines must be
recorded in the perf.data file. Like the kernel, synthesize a mmap event
for that, and add processing for it.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-10-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Create maps for x86 PTI entry trampolines, based on symbols found in
kallsyms. It is also necessary to keep track of whether the trampolines
have been mapped particularly when the kernel dso is kcore.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-9-git-send-email-adrian.hunter@intel.com
[ Fix extra_kernel_map_info.cnt designed struct initializer on gcc 4.4.7 (centos:6, etc) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Identify extra kernel maps by name so that they can be distinguished
from the kernel map and module maps.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-8-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When kernel symbols are derived from /proc/kallsyms only (not using
vmlinux or /proc/kcore) map_groups__split_kallsyms() is used. However
that function makes assumptions that are not true with entry trampoline
symbols. For now, remove the entry trampoline symbols at that point, as
they are no longer needed at that point.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-7-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On x86_64 the PTI entry trampolines are not in the kernel map created by
perf tools. That results in the addresses having no symbols and prevents
annotation. It also causes Intel PT to have decoding errors at the
trampoline addresses.
Workaround that by creating maps for the trampolines.
At present the kernel does not export information revealing where the
trampolines are. Until that happens, the addresses are hardcoded.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a function to return the number of the machine's available CPUs.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526986485-6562-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since we created a new function perf_evlist__force_leader(), remove the
old code and use that new evlist method.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1526914666-31839-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For non-explicit group (e.g. those created with -e '{eventA,eventB}'),
'perf report' supports a option '--group' which can enable group output.
We also need to support 'perf annotate' with the same '--group'.
Create a new function perf_evlist__force_leader() which contains common
code to force setting the group leader.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1526914666-31839-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Opickn x86_64, PTI entry trampolines are less than the start of kernel text,
but still above 2^63. So leave kernel_start = 1ULL << 63 for x86_64.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526548928-20790-7-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a function to identify the machine architecture.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1526548928-20790-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently perf has a feature to account cycles for LBRs
For example, on skylake:
perf record -b ...
perf report or perf annotate
And then browsing the annotate browser gives average cycle counts for
program blocks.
For some analysis it would be useful if we could know not only the
average cycles but also the min and max cycles.
This patch records the min and max cycles.
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1526569118-14217-2-git-send-email-yao.jin@linux.intel.com
[ Switch from max/min to min/max ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since the ip shown for a symbol is now always a virtual address, it
becomes difficult to correlate this with objdump output and determine
the exact instruction address. So, we always show the offset from the
start of the symbol.
This can be verified on a powerpc64le system running Fedora 27 as
follows:
# perf probe -a sys_write
# perf record -e probe:sys_write -g ~/test
Before applying this patch:
# perf script
test 9710 [013] 95614.332431: probe:sys_write: (c0000000004025b0)
c0000000004025b0 sys_write (/lib/modules/4.17.0-rc4+/build/vmlinux)
c00000000000b9e0 system_call (/lib/modules/4.17.0-rc4+/build/vmlinux)
7fffb70d8234 __GI___libc_write (/usr/lib64/libc-2.26.so)
7fffb7052c74 _IO_file_write@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
5afc1818 [unknown] ([unknown])
7fffb7051a60 new_do_write (/usr/lib64/libc-2.26.so)
7fffb7054638 _IO_do_write@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
7fffb7054bbc _IO_file_overflow@@GLIBC_2.17 (/usr/lib64/libc-2.26.so)
7fffb7055a24 __overflow (/usr/lib64/libc-2.26.so)
7fffb7044548 _IO_puts (/usr/lib64/libc-2.26.so)
10000440 main (/home/sandipan/test)
7fffb6fe36a0 generic_start_main.isra.0 (/usr/lib64/libc-2.26.so)
7fffb6fe3898 __libc_start_main (/usr/lib64/libc-2.26.so)
0 [unknown] ([unknown])
...
After applying this patch:
# perf script
test 9710 [013] 95614.332431: probe:sys_write: (c0000000004025b0)
c0000000004025b0 sys_write+0x10 (/lib/modules/4.17.0-rc4+/build/vmlinux)
c00000000000b9e0 system_call+0x58 (/lib/modules/4.17.0-rc4+/build/vmlinux)
7fffb70d8234 __GI___libc_write+0x24 (/usr/lib64/libc-2.26.so)
7fffb7052c74 _IO_file_write@@GLIBC_2.17+0x44 (/usr/lib64/libc-2.26.so)
5afc1818 [unknown] ([unknown])
7fffb7051a60 new_do_write+0x90 (/usr/lib64/libc-2.26.so)
7fffb7054638 _IO_do_write@@GLIBC_2.17+0x38 (/usr/lib64/libc-2.26.so)
7fffb7054bbc _IO_file_overflow@@GLIBC_2.17+0x14c (/usr/lib64/libc-2.26.so)
7fffb7055a24 __overflow+0x64 (/usr/lib64/libc-2.26.so)
7fffb7044548 _IO_puts+0x218 (/usr/lib64/libc-2.26.so)
10000440 main+0x20 (/home/sandipan/test)
7fffb6fe36a0 generic_start_main.isra.0+0x140 (/usr/lib64/libc-2.26.so)
7fffb6fe3898 __libc_start_main+0xb8 (/usr/lib64/libc-2.26.so)
0 [unknown] ([unknown])
...
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: http://lkml.kernel.org/r/20180517063326.6319-2-sandipan@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When perf data is recorded with the call-graph option enabled, the
callchain shown by perf script shows the binary offsets of the symbols
as the ip. This is incorrect for kernel symbols as the ip values are
always off by a fixed offset depending on the architecture. If the
offsets from the start of the symbols are printed, they are also
incorrect for both kernel and userspace symbols.
Without the call-graph option, the callchain shows the virtual addresses
of the symbols rather than their binary offsets. The offsets printed in
this case are also correct.
This fixes the inconsistency in perf script's output.
This can be verified on a powerpc64le system running Fedora 27 as
follows:
# cat /proc/kallsyms | grep sys_write
...
c0000000004025a0 T sys_write
c0000000004025a0 T __se_sys_write
...
# perf probe -a sys_write
Before applying this patch:
# perf record -e probe:sys_write -g ~/test
# perf script -F ip,sym,symoff
4125b0 sys_write+0x8000000000008010
1b9e0 system_call+0x8000000000008058
118234 __GI___libc_write+0xffff0000f52c0024
92c74 _IO_file_write@@GLIBC_2.17+0xffff0000f52c0044
5afbfd8a [unknown]
91a60 new_do_write+0xffff0000f52c0090
94638 _IO_do_write@@GLIBC_2.17+0xffff0000f52c0038
94bbc _IO_file_overflow@@GLIBC_2.17+0xffff0000f52c014c
95a24 __overflow+0xffff0000f52c0064
84548 _IO_puts+0xffff0000f52c0218
440 main+0xffffffffe0000020
236a0 generic_start_main.isra.0+0xffff0000f52c0140
23898 __libc_start_main+0xffff0000f52c00b8
0 [unknown]
...
# perf record -e probe:sys_write ~/test
# perf script -F ip,sym,symoff
c0000000004025b0 sys_write+0x10
...
After applying this patch:
# perf record -e probe:sys_write -g ~/test
# perf script -F ip,sym,symoff
c0000000004025b0 sys_write+0x10
c00000000000b9e0 system_call+0x58
7fffb70d8234 __GI___libc_write+0x24
7fffb7052c74 _IO_file_write@@GLIBC_2.17+0x44
5afc1818 [unknown]
7fffb7051a60 new_do_write+0x90
7fffb7054638 _IO_do_write@@GLIBC_2.17+0x38
7fffb7054bbc _IO_file_overflow@@GLIBC_2.17+0x14c
7fffb7055a24 __overflow+0x64
7fffb7044548 _IO_puts+0x218
10000440 main+0x20
7fffb6fe36a0 generic_start_main.isra.0+0x140
7fffb6fe3898 __libc_start_main+0xb8
0 [unknown]
...
# perf record -e probe:sys_write ~/test
# perf script -F ip,sym,symoff
c0000000004025b0 sys_write+0x10
...
Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: http://lkml.kernel.org/r/20180517063326.6319-1-sandipan@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Let tools that need to have those variables with the sysctl current
values use a function that will read them.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-1ljj3oeo5kpt2n1icfd9vowe@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is not read as commonly as 'page_size', so it makes sense to read it
lazily, caching its value when it is first read.
Less files open unconditionally at startup.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-35xhrq91u94uc1djtclek1ie@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That takes care of using the right call to get the tracing_path
directory, the one that will end up calling tracing_path_set() to figure
out where tracefs is mounted.
One more step in doing just lazy reading of system structures to reduce
the number of operations done unconditionaly at 'perf' start.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-42zzi0f274909bg9mxzl81bu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of accessing the trace_events_path variable directly, that may
not have been properly initialized wrt detecting where tracefs is
mounted.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-id7hzn1ydgkxbumeve5wapqz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When using for_each_event() we needlessly rebuild the whole path to
the tracepoint directory, reuse the dir_path instead, saving some cycles
and reducing the size of the next patch.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-54bcs15n0cp6gwcgpc4hptyc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To make reading events files a tad more compact than with
get_tracing_files("events/foo").
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-do6xgtwpmfl8zjs1euxsd2du@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
One should use tracing_path_mount() instead, so more things get done
lazily instead of at every 'perf' tool call startup.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-fci4yll35idd9yuslp67vqc2@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We check what perf_config__init() does at each perf_config() call,
namely if the static perf_config instance was created, so instead of
bailing out in that case, try to allocate it, bailing if it fails.
Now to get the perf_config() call out of the start of perf's main()
function, doing it also lazily.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4bo45k6ivsmbxpfpdte4orsg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
bpf_object__open()/bpf_object__open_buffer can return error pointer or
NULL, check the return values with IS_ERR_OR_NULL() in bpf__prepare_load
and bpf__prepare_load_buffer
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: netdev@vger.kernel.org
Link: https://lkml.kernel.org/n/tip-psf4xwc09n62al2cb9s33v9h@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf stat doesn't count the uncore event aliases from the same uncore
block in a group, for example:
perf stat -e '{unc_m_cas_count.all,unc_m_clockticks}' -a -I 1000
# time counts unit events
1.000447342 <not counted> unc_m_cas_count.all
1.000447342 <not counted> unc_m_clockticks
2.000740654 <not counted> unc_m_cas_count.all
2.000740654 <not counted> unc_m_clockticks
The output is very misleading. It gives a wrong impression that the
uncore event doesn't work.
An uncore block could be composed by several PMUs. An uncore event alias
is a joint name which means the same event runs on all PMUs of a block.
Perf doesn't support mixed events from different PMUs in the same group.
It is wrong to put uncore event aliases in a big group.
The right way is to split the big group into multiple small groups which
only include the events from the same PMU.
Only uncore event aliases from the same uncore block should be specially
handled here. It doesn't make sense to mix the uncore events with other
uncore events from different blocks or even core events in a group.
With the patch:
# time counts unit events
1.001557653 140,833 unc_m_cas_count.all
1.001557653 1,330,231,332 unc_m_clockticks
2.002709483 85,007 unc_m_cas_count.all
2.002709483 1,429,494,563 unc_m_clockticks
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1525727623-19768-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The first symbol is not necessarily in the kernel text. Instead of
using the first symbol, use the _stest symbol to identify the kernel map
when loading kcore.
This allows for the introduction of symbols to identify the x86_64 PTI
entry trampolines.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Link: http://lkml.kernel.org/r/1525866228-30321-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that kprobe definitions become:
int probe(function, variables)(void *ctx, int err, var1, var2, ...)
The existing 5sec.c, got converted and goes from:
SEC("func=hrtimer_nanosleep rqtp->tv_sec")
int func(void *ctx, int err, long sec)
{
}
To:
int probe(hrtimer_nanosleep, rqtp->tv_sec)(void *ctx, int err, long sec)
{
}
If we decide to add tv_nsec as well, then it becomes:
$ cat tools/perf/examples/bpf/5sec.c
#include <bpf.h>
int probe(hrtimer_nanosleep, rqtp->tv_sec rqtp->tv_nsec)(void *ctx, int err, long sec, long nsec)
{
return sec == 5;
}
license(GPL);
$
And if we run it, system wide as before and run some 'sleep' with values
for the tv_nsec field, we get:
# perf trace --no-syscalls -e tools/perf/examples/bpf/5sec.c
0.000 perf_bpf_probe:hrtimer_nanosleep:(ffffffff9811b5f0) tv_sec=5 tv_nsec=100000000
9641.650 perf_bpf_probe:hrtimer_nanosleep:(ffffffff9811b5f0) tv_sec=5 tv_nsec=123450001
^C#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-1v9r8f6ds5av0w9pcwpeknyl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To further reduce boilerplate.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-vst6hj335s0ebxzqltes3nsc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Description:
. Disable strace like syscall tracing (--no-syscalls), or try tracing
just some (-e *sleep).
. Attach a filter function to a kernel function, returning when it should
be considered, i.e. appear on the output:
$ cat tools/perf/examples/bpf/5sec.c
#include <bpf.h>
SEC("func=hrtimer_nanosleep rqtp->tv_sec")
int func(void *ctx, int err, long sec)
{
return sec == 5;
}
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
$
. Run it system wide, so that any sleep of >= 5 seconds and < than 6
seconds gets caught.
. Ask for callgraphs using DWARF info, so that userspace can be unwound
. While this is running, run something like "sleep 5s".
# perf trace --no-syscalls -e tools/perf/examples/bpf/5sec.c/call-graph=dwarf/
0.000 perf_bpf_probe:func:(ffffffff9811b5f0) tv_sec=5
hrtimer_nanosleep ([kernel.kallsyms])
__x64_sys_nanosleep ([kernel.kallsyms])
do_syscall_64 ([kernel.kallsyms])
entry_SYSCALL_64 ([kernel.kallsyms])
__GI___nanosleep (/usr/lib64/libc-2.26.so)
rpl_nanosleep (/usr/bin/sleep)
xnanosleep (/usr/bin/sleep)
main (/usr/bin/sleep)
__libc_start_main (/usr/lib64/libc-2.26.so)
_start (/usr/bin/sleep)
^C#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-2nmxth2l2h09f9gy85lyexcq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So, the first helper is the one shortening a variable/function section
attribute, from, for instance:
char _license[] __attribute__((section("license"), used)) = "GPL";
to:
char _license[] SEC("license") = "GPL";
Convert empty.c to that and it becomes:
# cat ~acme/lib/examples/perf/bpf/empty.c
#include <bpf.h>
char _license[] SEC("license") = "GPL";
int _version SEC("version") = LINUX_VERSION_CODE;
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-zmeg52dlvy51rdlhyumfl5yf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The first one is the bare minimum that bpf infrastructure accepts before
it expects actual events to be set up:
$ cat tools/perf/examples/bpf/empty.c
char _license[] __attribute__((section("license"), used)) = "GPL";
int _version __attribute__((section("version"), used)) = LINUX_VERSION_CODE;
$
If you remove that "version" line, then it will be refused with:
# perf trace -e tools/perf/examples/bpf/empty.c
event syntax error: 'tools/perf/examples/bpf/empty.c'
\___ Failed to load tools/perf/examples/bpf/empty.c from source: 'version' section incorrect or lost
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
#
The next ones will, step by step, show simple filters, then the needs
for headers will be made clear, it will be put in place and tested with
new examples, rinse, repeat.
Back to using this first one to test the perf+bpf infrastructure:
If we run it will fail, as no functions are present connecting with,
say, a tracepoint or a function using the kprobes or uprobes
infrastructure:
# perf trace -e tools/perf/examples/bpf/empty.c
WARNING: event parser found nothing
invalid or unsupported event: 'tools/perf/examples/bpf/empty.c'
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
#
But, if we set things up to dump the generated object file to a file,
and do this after having run 'make install', still on the developer's
$HOME directory:
# cat ~/.perfconfig
[llvm]
dump-obj = true
#
# perf trace -e ~acme/lib/examples/perf/bpf/empty.c
LLVM: dumping /home/acme/lib/examples/perf/bpf/empty.o
WARNING: event parser found nothing
invalid or unsupported event: '/home/acme/lib/examples/perf/bpf/empty.c'
<SNIP>
#
We can look at the dumped object file:
# ls -la ~acme/lib/examples/perf/bpf/empty.o
-rw-r--r--. 1 root root 576 May 4 12:10 /home/acme/lib/examples/perf/bpf/empty.o
# file ~acme/lib/examples/perf/bpf/empty.o
/home/acme/lib/examples/perf/bpf/empty.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), not stripped
# readelf -sw ~acme/lib/examples/perf/bpf/empty.o
Symbol table '.symtab' contains 3 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 NOTYPE GLOBAL DEFAULT 3 _license
2: 0000000000000000 0 NOTYPE GLOBAL DEFAULT 4 _version
#
# tools/bpf/bpftool/bpftool --pretty ~acme/lib/examples/perf/bpf/empty.o
null
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-y7dkhakejz3013o0w21n98xd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To avoid regressions such as the one fixed by 4a35a9027f ("Revert
"perf pmu: Fix pmu events parsing rule""), where '-e intel_pt//u' got
broken, with this new entry in this 'perf tests' subtest, we would have
caught it before pushing upstream.
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-kw62fys9bwdgsp722so2ln1l@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is not specific to BPF but was found when parsing a .c BPF proggie
that while valid, had no events attached to tracepoints, kprobes, etc:
Very minimal file that perf's BPF code can compile:
# cat empty.c
char _license[] __attribute__((section("license"), used)) = "GPL";
int _version __attribute__((section("version"), used)) = LINUX_VERSION_CODE;
#
Before this patch:
# perf trace -e empty.c
WARNING: event parser found nothinginvalid or unsupported event: 'empty.c'
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
#
After:
# perf trace -e empty.c
WARNING: event parser found nothing
invalid or unsupported event: 'empty.c'
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-8ysughiz00h6mjpcot04qyjj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There have two spaces ahead function name cs_etm__set_pid_tid_cpu(), so
remove one space and correct indentation.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1525924920-4381-2-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
CoreSight doesn't allocate thread structure for unknown_thread in ETM
auxtrace, so unknown_thread is NULL pointer. If the perf data doesn't
contain valid tid and then cs_etm__mem_access() uses unknown_thread
instead as thread handler, this results in a segmentation fault when
thread__find_addr_map() accesses the thread handler.
This commit creates a new thread data which is used by unknown_thread, so
CoreSight tracing can roll back to use unknown_thread if perf data
doesn't include valid thread info. This commit also releases thread
data for initialization failure case and for normal auxtrace free flow.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1525924920-4381-1-git-send-email-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When we perform the following command lines:
$ perf record -e "{cycles,branches}" ./div
$ perf annotate main --stdio
The output shows only the first event, "cycles" and the displaying
format is not correct.
Percent | Source code & Disassembly of div for cycles (44550 samples)
-----------------------------------------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 00000000004004b0 <main>:
: main():
:
: return i;
: }
:
: int main(void)
: {
0.00 : 4004b0: push %rbx
: int i;
: int flag;
: volatile double x = 1212121212, y = 121212;
:
: s_randseed = time(0);
0.00 : 4004b1: xor %edi,%edi
: srand(s_randseed);
0.00 : 4004b3: mov $0x77359400,%ebx
:
: return i;
: }
The issue is that the value of the 'nr_percent' variable is hardcoded to
1. This patch fixes it.
With this patch, the output is:
Percent | Source code & Disassembly of div for cycles (44550 samples)
-----------------------------------------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 00000000004004b0 <main>:
: main():
:
: return i;
: }
:
: int main(void)
: {
0.00 0.00 : 4004b0: push %rbx
: int i;
: int flag;
: volatile double x = 1212121212, y = 121212;
:
: s_randseed = time(0);
0.00 0.00 : 4004b1: xor %edi,%edi
: srand(s_randseed);
0.00 0.00 : 4004b3: mov $0x77359400,%ebx
:
: return i;
: }
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: f681d593d1 ("perf annotate: Remove disasm__calc_percent() from disasm_line__print()")
Link: http://lkml.kernel.org/r/1525881435-4092-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test "probe libc's inet_pton & backtrace it with ping" fails on
4.17.0rc3 on s/390. It turned out that function __inet_pton is reported
as inline:
[root@s8360047 perf]# ./perf script -i /tmp/perf.data.111
ping 12457 [000] 1584.478959: probe_libc:inet_pton: (3ffb5a347e8)
1347e8 __inet_pton (inlined)
f19d7 gaih_inet.constprop.5 (/usr/lib64/libc-2.24.so)
f4c3f __GI_getaddrinfo (inlined)
410b main (/usr/bin/ping)
Allow __inet_pton listed as inline.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180503065837.71043-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As reported by Adrian Hunter, this breaks intel_pt event parsing:
# perf record -e intel_pt//u uname
event syntax error: 'intel_pt//u'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
#
This reverts commit 9a4a931ce8.
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ye1o2mji7x68xotiot1tn1gp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: William Cohen <wcohen@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180503195032.28871-1-wcohen@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'R' means access the data via reads instead of writes, fix this typo.
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1524644707-11030-1-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since we do not have split symtabs anymore, no need to have explicit
find_kernel_function variants, use the find_kernel_symbol ones.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-hiw2ryflju000f6wl62128it@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Trivial fix to spelling mistake in error message text
Signed-off-by: Colin King <colin.king@canonical.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20180427193158.17932-1-colin.king@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since it mainly will populate symtabs of its maps (kernel modules).
While looking at this I wonder if map_groups__split_kallsyms_for_kcore()
shouldn't be all that we need, seems much simpler.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-3d1f3iby76popdr8ia9yimsc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It was only using the map to obtain its kmap, so do the validation in
its called, __dso__load_kallsyms() and pass the kmap, that will be used
in the following patches in similar simplifications.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-u6p9hbonlqzpl6o1z9xzxd75@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Only the 'dso' is needed, so ditch the struct used to pass (map, dso),
passing just the used 'dso' pointer.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-17a4gkk1cs4up4smkviymi2g@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
More should be done to split this function, removing stuff map
relocation steps from the actual symbol table loading.
Arch specific stuff also should go elsewhere, to tools/arch/ and
we should have it keyed by data from the perf_env either in the
perf.data header or from the running environment.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-236gyo6cx6iet90u3uc01cws@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We can plain use the an else to the if block that is right after that
goto, so simplify it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-vnpc2rakf6vc98pcl5z1cfrg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove the split of symbol tables for data (MAP__VARIABLE) and for
functions (MAP__FUNCTION), its unneeded and there were various places
doing two lookups to find a symbol, so simplify this.
We still will consider only the symbols that matched the filters in
place, i.e. see the (elf_(sec,sym)|symbol_type)__filter() routines in
the patch, just so that we consider only the same symbols as before,
to reduce the possibility of regressions.
All the tests on 50-something build environments, in varios versions
of lots of distros and cross build environments were performed without
build regressions, as usual with all pull requests the other tests were
also performed: 'perf test' and 'make -C tools/perf build-test'.
Also this was done at a great granularity so that regressions can be
bisected more easily.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-hiq0fy2rsleupnqqwuojo1ne@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Its equivalent, one less use of enum map_type.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-6m18iv1ty7nh7kxlfmn89sgz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Equivalent, one step more in ditching enum map_type.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-mrjjc87a4tpf896j5u4sql4e@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
map->type is going away, we can derive it from map->prot, so use
the same logic as in the kernel's arch/arm/kernel/module.c file:
ELF32_ST_TYPE(sym->st_info) == STT_FUNC && !(sym->st_value & 1))
This was introduced in b2f8fb237e ("perf symbols: Fix annotation of
thumb code"), that fix is maintained with this change.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Dave Martin <dave.martin@linaro.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Dr. David Alan Gilbert <david.gilbert@linaro.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-us590h81uqgxaumucfttqj50@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In 39b12f7812 ("perf tools: Make it possible to read object code from
vmlinux") we special case MAP__FUNCTION maps inconsistently, the first
test tests the map type while the following tests added by this patch
don't do that, be consistent and elliminate this special case.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-khmi5jccpcwqa9nybefluzqp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To match the kernel when setting the PERF_RECORD_MISC_MMAP_DATA bit
in perf_event_attr.header.misc, that gets set when VM_EXEC is not
set in the vm_flags.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-r1z0tbdc7tich469aw4szinx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The kernel doesn't fill the map 'prot' field for PERF_RECORD_MMAP
records, and we will use that info to replace checking for
MAP__VARIABLE, so store that when processing the
PERF_RECORD_MISC_MMAP_DATA perf_event_attr.header.misc bit.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-es3zz9r0q2qlssg4wh1w1d8p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is code that needs to see if a resolved address is a function, so,
since we're going to ditch the MAP__{FUNCTION,VARIABLE} split, store
that info in the per symbol struct.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-9ugwxz0i8ryg5702rx8u5q6z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
One more step in ditching the split.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4pour7egur07tkrpbynawemv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We still have the split internally, but users don't see it anymore,
simplifying the growing number of cases where we end up searching
in the MAP__VARIABLE maps.
This further paves the way for ditching the split.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-86mfxrztf310konutxvhr5ua@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Simulate having all symbols in just one tree by searching the still
existing two trees.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-uss70e8tvzzbzs326330t83q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We have that equivalent, shorter helper, use it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-1hcgu3k7vxdy4vknqf3kbtzt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
All callers are for MAP__FUNCTION, so just ditch it and use
thread__find_symbol(), that already ditched MAP__FUNCTION, i.e.
internally uses it till we ditch it for good.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-i0ocxs00b4a0tlrx31lyh2cs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
One more step to ditch MAP__{VARIABLE,FUNCTION}
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-919d1k13ts62pjipnpibvgwd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Only the symbol core needs to use that, so provide a __ variant for that
case, that will end up removed when we ditch the MAP__ split.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-x29k9e1ohastsoqbilp3mguh@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now this is only used in the symbols.c file, where it will finally
disappear when we remove the MAP_{FUNCTION,VARIABLE} split.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-a9t4d4hfrycczq9vpsk5sr8q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Replacing equivalent, the equivalent and longer variation:
symbol__is_a(type, MAP__FUNCTION);
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-9t3dqogher54owfl9o2mir52@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
All users want MAP__FUNCTION, and this split is going away.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-sm72zwt1f03ma5uw78l6zze0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of the variant that allows asking for just a specific map_type,
because that map_type split will go away.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-eya0jvmu26qvro0nxxd49xia@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Removing the map_type, that is going away.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-18iiiw25r75xn7zlppjldk48@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We had this much shorter map__for_each_symbol() helper for ages, use it
and kill one more map_type use outside the code, in the tools.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-iswqjy1elghc5jjvr0nds3nc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We had this for ages, IIRC for 'perf probe' use initially, so use them
instead of the variants that pass the map_type, that is going away.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-x1jpogsvj822sh0q8leiaoep@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since it uses machine__kernel_map() and this function always returns the
MAP__FUNCTION map, it doesn't make sense to call it with MAP__VARIABLE.
And also this is a step in the direction of nuking the MAP__{FUNCTION,VARIABLE}
split.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-0h3eof3kx3kq32ixg5fquf3p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So far the only use is for MAP__FUNCTION, and since we're going to
remove that split, remove the map_type argument in machine__load_kallsyms().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-5dhgh7x8g9hx5hpxlp3k08jp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
That returns the a data structure contained the ordered list of kernel
modules + the main kernel maps, one more step in removing the
MAP__{FUNCTION,VARIABLE} split.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-qsgbxfyaohc80c9ma049dubm@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The asciidoc package seems behind the recent big wave of python3
conversion, and we were advised to switch to asciidoctor instead. It's
almost compatible but some extensions used for perf documentation don't
work with it. Here is the patch to cover them, and add the proper
support for asciidoctor.
Pass USE_ASCIIDOCTOR=yes to make for using asciidoctor instead of
asciidoc. The man source and manual attributes are passed via command
options. The support for these attributes have been fixed in the
latest asciidoctor code.
Since asciidoctor can covert to a man page and an HTML directly, we
can omit the dependency on xmlto when USE_ASCIIDOCTOR is set.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180424150456.17353-1-tiwai@suse.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Another step in the road to elliminate the MAP_{FUNCTION,VARIABLE}
separation, reducing the exposure to these details in the tools using
the symbol APIs.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-8a1hvrqe3r5i0kw865u3uxwt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Instead of just returning it in al.sym, allowing for some simplification
in its users, and to make it consistent with thread__find_map().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-4axi2sigslffdixzxbehvgoj@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It was returning the searched map just on the addr_location passed, with
the function itself returning void.
Make it return the map so that we can make the code more compact.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-tzlrrzdeoof4i6ktyqv1t6ks@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In dc323ce8e7 ("perf script: Enable printing of branch stack") it
first tries to find the map for an address, then the symbol in the DSO
backing that map, for that address, well, this is what
thread__find_symbol() does, so just use it and make the code shorter.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-03nx3aod955yqnf9l06im28j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Out of thread__find_addr_location(..., MAP__FUNCTION, ...), idea here is to
continue removing references to MAP__{FUNCTION,VARIABLE} ahead of
getting both types of symbols in the same rbtree, as various places do
two lookups, looking first at MAP__FUNCTION, then at MAP__VARIABLE.
So thread__find_symbol() will eventually do just that, and 'struct
symbol' will have the symbol type, for code that cares about that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-n7528en9e08yd3flzmb26tth@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The output of perf test and perf test list differ because perf test list
does not display subtests. Correct this behavior and also let perf test
list report subtests.
For example:
$ ./perf test 2>&1 |wc -l
65
Without this commit:
$ ./perf test list 2>&1 |wc -l
57
With this commit:
$ ./perf test list 2>&1 |wc -l
65
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Reviewed-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: linux-s390@vger.kernel.org
LPU-Reference: 1523605343-11970-1-git-send-email-brueckner@linux.ibm.com
Link: https://lkml.kernel.org/n/tip-efb74jw7x2xs2bucp5hf4ilu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Out of thread__find_add_map(..., MAP__FUNCTION, ...), idea here is to
continue removing references to MAP__{FUNCTION,VARIABLE} ahead of
getting both types of symbols in the same rbtree, as various places do
two lookups, looking first at MAP__FUNCTION, then at MAP__VARIABLE.
So thread__find_map() will eventually do just that, and 'struct symbol'
will have the symbol type, for code that cares about that.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-q27xee34l4izpfau49w103s6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To further simplify checking if symbols are available for a given map
and to reduce the number of users of MAP__{FUNCTION,VARIABLE}.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-iyfoyvbfdti5uehgpjum3qrq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To replace longer code sequences in various places.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-tlk3klbkfyjrbfjvryyznfju@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Shorter, should be equivalent code, use it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-q90olng8sfkvrnsrwu7xnul6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Shorter form to figure out if a given map is the kernel one and also
reduces the number of code accessing MAP__{FUNCTION,VARIABLE}, that
should go away at some point.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-rn8pexelsxpx92ce3elu3wiw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo suggested to display elapsed time for multirun workload (perf stat
-e) with precision based on the precision of the standard deviation.
In his own words:
> This output is a slightly bit misleading:
> Performance counter stats for 'make -j128' (10 runs):
> 27.988995256 seconds time elapsed ( +- 0.39% )
> The 9 significant digits in the result, while only 1 is valid, suggests accuracy
> where none exists.
> It would be better if 'perf stat' would display elapsed time with a precision
> adjusted to stddev, it should display at most 2 more significant digits than
> the stddev inaccuracy.
> I.e. in the above case 0.39% is 0.109, so we only have accuracy for 1 digit, and
> so we should only display 3:
> 27.988 seconds time elapsed ( +- 0.39% )
Plus a suggestion about the output, which is small enough and connected
with the above change that I merged both changes together.
> Small output style nit - I think it would be nice if with --repeat the stddev was
> also displayed in absolute values, besides percentage:
>
> 27.988 +- 0.109 seconds time elapsed ( +- 0.39% )
The output is now:
Performance counter stats for './perf bench sched pipe' (5 runs):
SNIP
13.3667 +- 0.0256 seconds time elapsed ( +- 0.19% )
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add 'check_2' function to check 2 different files, the 'check' function
stays to check files that differs only in the prefix path.
In upcoming changes we need to check header files in locations which
don't follow the prefix logic.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Passing whole string instead of parsing them after. It simplifies
things for the next patches, that adds another function call, which
makes it hard to pass arguments in the correct shape.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
User can remove files from cache using --remove/--purge options but both
needs list of files as an argument. It's not convenient when you want to
flush out entire cache. Add an option to purge all files from cache.
Ex,
# perf buildid-cache -l
8a86ef73e44067bca52cc3f6cd3e5446c783391c /tmp/a.out
ebe71fdcf4b366518cc154d570a33cd461a51c36 /tmp/a.out.1
# perf buildid-cache -P -v
Removing /tmp/a.out (8a86ef73e44067bca52cc3f6cd3e5446c783391c): Ok
Removing /tmp/a.out.1 (ebe71fdcf4b366518cc154d570a33cd461a51c36): Ok
Purged all: Ok
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Sihyeon Jang <uneedsihyeon@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180417041346.5617-4-ravi.bangoria@linux.vnet.ibm.com
[ Initialize 'err' in build_id_cache__purge_all(), to fix build on debian:7, as it can be used uninitialized ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf buildid-cache' allows to add/remove files into cache but there is
no option to list all cached files. Add --list option to list all
_valid_ cached files.
Ex,
# perf buildid-cache --add /tmp/a.out
# perf buildid-cache -l
8a86ef73e44067bca52cc3f6cd3e5446c783391c /tmp/a.out
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Sihyeon Jang <uneedsihyeon@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180417041346.5617-3-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
PMU name is printed repeatedly for interval print, for example:
perf stat --no-merge -e 'unc_m_clockticks' -a -I 1000
# time counts unit events
1.001053069 243,702,144 unc_m_clockticks [uncore_imc_4]
1.001053069 244,268,304 unc_m_clockticks [uncore_imc_2]
1.001053069 244,427,386 unc_m_clockticks [uncore_imc_0]
1.001053069 244,583,760 unc_m_clockticks [uncore_imc_5]
1.001053069 244,738,971 unc_m_clockticks [uncore_imc_3]
1.001053069 244,880,309 unc_m_clockticks [uncore_imc_1]
2.002024821 240,818,200 unc_m_clockticks [uncore_imc_4] [uncore_imc_4]
2.002024821 240,767,812 unc_m_clockticks [uncore_imc_2] [uncore_imc_2]
2.002024821 240,764,215 unc_m_clockticks [uncore_imc_0] [uncore_imc_0]
2.002024821 240,759,504 unc_m_clockticks [uncore_imc_5] [uncore_imc_5]
2.002024821 240,755,992 unc_m_clockticks [uncore_imc_3] [uncore_imc_3]
2.002024821 240,750,403 unc_m_clockticks [uncore_imc_1] [uncore_imc_1]
For each print, the PMU name is unconditionally appended to the
counter->name.
Need to check the counter->name first. If the PMU name is already
appended, do nothing.
Committer notes:
Add and use perf_evsel->uniquified_name bool instead of doing the more
expensive strstr(event->name, pmu->name).
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 8c5421c016 ("perf pmu: Display pmu name when printing unmerged events in stat")
Link: http://lkml.kernel.org/r/1524594014-79243-5-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf doesn't support mixed events from different PMUs (except software
event) in a group. The perf stat should output <not counted>/<not
supported> for all events, but it doesn't. For example,
perf stat -e '{cycles,uncore_imc_5/umask=0xF,event=0x4/,instructions}'
<not counted> cycles
<not supported> uncore_imc_5/umask=0xF,event=0x4/
1,024,300 instructions
If perf fails to open an event, it doesn't error out directly. It will
disable some features and retry, until the event is opened or all
features are disabled. The disabled features will not be re-enabled. The
group read is one of these features.
For the example as above, the IMC event and the leader event "cycles"
are from different PMUs. Opening the IMC event must fail. The group read
feature must be disabled for IMC event and the followed event
"instructions". The "instructions" event has the same PMU as the leader
"cycles". It can be opened successfully. Since the group read feature
has been disabled, the "instructions" event will be read as a single
event, which definitely has a value.
The group read fallback is still useful for the case which kernel
doesn't support group read. It is good enough to be handled only by the
leader.
For the fallback request from members, it must be caused by an error.
The fallback only breaks the semantics of group. Limit the group read
fallback only for the leader.
Committer testing:
On a broadwell t450s notebook:
Before:
# perf stat -e '{cycles,unc_cbo_cache_lookup.read_i,instructions}' sleep 1
Performance counter stats for 'sleep 1':
<not counted> cycles
<not supported> unc_cbo_cache_lookup.read_i
818,206 instructions
1.003170887 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
After:
# perf stat -e '{cycles,unc_cbo_cache_lookup.read_i,instructions}' sleep 1
Performance counter stats for 'sleep 1':
<not counted> cycles
<not supported> unc_cbo_cache_lookup.read_i
<not counted> instructions
1.001380511 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
#
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 82bf311e15 ("perf stat: Use group read for event groups")
Link: http://lkml.kernel.org/r/1524594014-79243-3-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf doesn't support mixed events from different PMUs (except software
event) in a group. For this case, only "<not counted>" or "<not
supported>" are printed out. There is no hint which guides users to fix
the issue.
Checking the PMU type of events to determine if they are from the same
PMU. There may be false alarm for the checking. E.g. the core PMU has
different PMU type. But it should not happen often.
The false alarm can also be tolerated, because:
- It only happens on error path.
- It just provides a possible solution for the issue.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1524594014-79243-2-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When counting uncore event with alias, core event is mistakenly
involved, for example:
perf stat --no-merge -e "unc_m_cas_count.all" -C0 sleep 1
Performance counter stats for 'CPU(s) 0':
0 unc_m_cas_count.all [uncore_imc_4]
0 unc_m_cas_count.all [uncore_imc_2]
0 unc_m_cas_count.all [uncore_imc_0]
153,640 unc_m_cas_count.all [cpu]
0 unc_m_cas_count.all [uncore_imc_5]
25,026 unc_m_cas_count.all [uncore_imc_3]
0 unc_m_cas_count.all [uncore_imc_1]
1.001447890 seconds time elapsed
The reason is that current implementation doesn't check PMU name of a
event when adding its alias into the alias list for core PMU. The
uncore event aliases are mistakenly added.
This bug was introduced in:
commit 14b22ae028 ("perf pmu: Add helper function is_pmu_core to
detect PMU CORE devices")
Checking the PMU name for all PMUs on X86 and other architectures except
ARM.
There is no behavior change for ARM.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 14b22ae028 ("perf pmu: Add helper function is_pmu_core to detect PMU CORE devices")
Link: http://lkml.kernel.org/r/1524594014-79243-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Command 'perf record' calls:
cmd_report()
record__auxtrace_init()
auxtrace_record__init()
On s390 function auxtrace_record__init() returns random return value due
to missing initialization.
This sometime causes 'perf record' to exit immediately without error
message and creating a perf.data file.
Fix this by setting error the return code to zero before returning from
platform specific functions which may not set the error code in call
cases.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Link: http://lkml.kernel.org/r/20180423142940.21143-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Several options were incorrectly described, some lacked describing
required arguments while others were simply not documented, fix it.
Signed-off-by: Sangwon Hong <qpakzk@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1524382146-19609-1-git-send-email-qpakzk@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>