Needed to use the PRI[xu](32,64) formatting macros.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wkbho8kaw24q67dd11q0j39f@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
TYPEOF(), for instance, was only used by MSB() that wasn't used at all,
besides typeof() is used in many places, should be the preferred way.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-golox8oa2w1oq28snki14z6s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pave the way for further cleanups where linux/kernel.h may stop being
included in some header.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-qqxan6tfsl6qx3l0v3nwgjvk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To match the kernel, then look for places redefining it to make it use
this version, which checks that its parameter is an array at build time.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-txlcf1im83bcbj6kh0wxmyy8@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We rely on symbol->name[0] since the beginning of tools/perf/, never
having received any complaint about it, also all the containers build
perf just fine, so remove this git codebase remnant.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-jsjpgojut8e22o2gtz83augk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In https://lkml.org/lkml/2017/2/2/16 I reported a build error that I
believed was caused by wrong uapi includes. The synthom was fixed by
Arnaldo in:
commit 2f7db55579 ("perf tools: Fix include of linux/mman.h")
but I was wrong attributing the problem to the uapi include.
The root cause was that I was using ARCH=x86_64, hence using the wrong
uapi include path. This explains why no one else ran into this build
problem.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170412064919.92449-8-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Besides memory allocation failure, tips.txt may fail to load because the
file is not found (a more likely cause).
Communicate that to the user in tips failure warning.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170412064919.92449-5-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
(This is a patch has been sitting in the Intel CQM/CMT driver series for
a while, despite not depend on it. Sending it now independently since
the series is being discarded.)
When an event is in error state, read() returns 0 instead of sizeof()
buffer. In certain modes, such as interval printing, ignoring the 0
return value may cause bogus count deltas to be computed and thus
invalid results printed.
This patch fixes this problem by modifying read_counters() to mark the
event as not scaled (scaled = -1) to force the printout routine to show
<NOT COUNTED>.
Signed-off-by: Stephane Eranian <eranian@google.com>
Reviewed-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170412182301.44406-1-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When parsing disassemble lines for source line number, use a stripped
line instead of raw line.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1491612748-1605-3-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When parsing disassemble lines, use ltrim() and rtrim() to strip them,
not using just while loop and isspace().
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1491612748-1605-2-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pipe-mode has no perf.data header, hence no upfront knowledge of presend
and missing features, hence, do not print missing features in pipe-mode.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170410201432.24807-8-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Session sets a number parameters that rely on evlist. These parameters
are not used in pipe-mode and should not be set, since evlist is
unavailable. Fix that.
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170410201432.24807-6-davidcc@google.com
[ Check if file != NULL in perf_session__new(), like when used by builtin-top.c ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
__perf_session__process_pipe_events reuses the same memory buffer to
process all events in the pipe.
When reordering is needed (e.g. -b option), events are not immediately
flushed, but kept around until reordering is possible, causing
memory corruption.
The problem is usually observed by a "Unknown sample error" output. It
can easily be reproduced by:
perf record -o - noploop | perf inject -b > output
Committer testing:
Before:
$ perf record -o - stress -t 2 -c 2 | perf inject -b > /dev/null
stress: info: [8297] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
stress: info: [8297] successful run completed in 2s
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
Warning:
Found 1 unknown events!
Is this an older tool processing a perf.data file generated by a more recent tool?
If that is not the case, consider reporting to linux-kernel@vger.kernel.org.
$
After:
$ perf record -o - stress -t 2 -c 2 | perf inject -b > /dev/null
stress: info: [9027] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
stress: info: [9027] successful run completed in 2s
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
no symbols found in /usr/bin/stress, maybe install a debug package?
no symbols found in /usr/bin/stress, maybe install a debug package?
$
Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170410201432.24807-3-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Implement simple detection for all kind of jumps and branches.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-s390 <linux-s390@vger.kernel.org>
Cc: stable@kernel.org # v4.10+
Link: http://lkml.kernel.org/r/1491465112-45819-3-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
since 4.10 perf annotate exits on s390 with an "unknown error -95".
Turns out that commit 786c1b5184 ("perf annotate: Start supporting
cross arch annotation") added a hard requirement for architecture
support when objdump is used but only provided x86 and arm support.
Meanwhile power was added so lets add s390 as well.
While at it make sure to implement the branch and jump types.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-s390 <linux-s390@vger.kernel.org>
Cc: stable@kernel.org # v4.10+
Fixes: 786c1b5184 "perf annotate: Start supporting cross arch annotation"
Link: http://lkml.kernel.org/r/1491465112-45819-2-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We don't need to use strlen(), a var, or check for the end explicitely,
isspace('\0') is false:
[acme@jouet c]$ cat ltrim.c
#include <ctype.h>
#include <stdio.h>
static char *ltrim(char *s)
{
while (isspace(*s))
++s;
return s;
}
int main(void)
{
printf("ltrim(\"\")='%s'\n", ltrim(""));
return 0;
}
[acme@jouet c]$ ./ltrim
ltrim("")=''
[acme@jouet c]$
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/n/tip-w3nk0x3pai2vojk2ab6kdvaw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
After reading command name from /proc/<pid>/status, use ltrim() and
rtrim() to strip command name, not using just while loop, isspace() and
etc.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1491575061-704-6-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The kernel has a special check for a specific irq_vectors trace event.
TRACE_EVENT_PERF_PERM(irq_work_exit,
is_sampling_event(p_event) ? -EPERM : 0);
The perf-record fails for this irq_vectors event when it is present,
like when using a wildcard:
root@skl:/tmp# perf record -a -e irq_vectors:* sleep 2
Error:
You may not have permission to collect system-wide stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).
The current value is 2:
-1: Allow use of (almost) all events by all users
>= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
kernel.perf_event_paranoid = -1
This patch prints out the exact sub event that failed with EPERM for
wildcards to help in understanding what went wrong when this event is
present:
After the patch:
root@skl:/tmp# perf record -a -e irq_vectors:* sleep 2
Error:
No permission to enable irq_vectors:irq_work_exit event.
You may not have permission to collect system-wide stats.
......
Committer notes:
So we have a lot of irq_vectors events:
[root@jouet ~]# perf list irq_vectors:*
List of pre-defined events (to be used in -e):
irq_vectors:call_function_entry [Tracepoint event]
irq_vectors:call_function_exit [Tracepoint event]
irq_vectors:call_function_single_entry [Tracepoint event]
irq_vectors:call_function_single_exit [Tracepoint event]
irq_vectors:deferred_error_apic_entry [Tracepoint event]
irq_vectors:deferred_error_apic_exit [Tracepoint event]
irq_vectors:error_apic_entry [Tracepoint event]
irq_vectors:error_apic_exit [Tracepoint event]
irq_vectors:irq_work_entry [Tracepoint event]
irq_vectors:irq_work_exit [Tracepoint event]
irq_vectors:local_timer_entry [Tracepoint event]
irq_vectors:local_timer_exit [Tracepoint event]
irq_vectors:reschedule_entry [Tracepoint event]
irq_vectors:reschedule_exit [Tracepoint event]
irq_vectors:spurious_apic_entry [Tracepoint event]
irq_vectors:spurious_apic_exit [Tracepoint event]
irq_vectors:thermal_apic_entry [Tracepoint event]
irq_vectors:thermal_apic_exit [Tracepoint event]
irq_vectors:threshold_apic_entry [Tracepoint event]
irq_vectors:threshold_apic_exit [Tracepoint event]
irq_vectors:x86_platform_ipi_entry [Tracepoint event]
irq_vectors:x86_platform_ipi_exit [Tracepoint event]
#
And some may be sampled:
[root@jouet ~]# perf record -e irq_vectors:local* sleep 20s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.020 MB perf.data (2 samples) ]
[root@jouet ~]# perf report -D | egrep 'stats:|events:'
Aggregated stats:
TOTAL events: 155
MMAP events: 144
COMM events: 2
EXIT events: 1
SAMPLE events: 2
MMAP2 events: 4
FINISHED_ROUND events: 1
TIME_CONV events: 1
irq_vectors:local_timer_entry stats:
TOTAL events: 1
SAMPLE events: 1
irq_vectors:local_timer_exit stats:
TOTAL events: 1
SAMPLE events: 1
[root@jouet ~]#
But, as shown in the tracepoint definition at the start of this message,
some, like "irq_vectors:irq_work_exit", may not be sampled, just counted,
i.e. if we try to sample, as when using 'perf record', we get an error:
[root@jouet ~]# perf record -e irq_vectors:irq_work_exit
Error:
You may not have permission to collect system-wide stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid,
<SNIP>
The error message is misleading, this patch will help in pointing out
what is the event causing such an error, but the error message needs
improvement, i.e. we need to figure out a way to check if a tracepoint
is counting only, like this one, when all we can do is to count it with
'perf stat', at most printing the delta using interval printing, as in:
[root@jouet ~]# perf stat -I 5000 -e irq_vectors:irq_work_*
# time counts unit events
5.000168871 0 irq_vectors:irq_work_entry
5.000168871 0 irq_vectors:irq_work_exit
10.000676730 0 irq_vectors:irq_work_entry
10.000676730 0 irq_vectors:irq_work_exit
15.001122415 0 irq_vectors:irq_work_entry
15.001122415 0 irq_vectors:irq_work_exit
20.001298051 0 irq_vectors:irq_work_entry
20.001298051 0 irq_vectors:irq_work_exit
25.001485020 1 irq_vectors:irq_work_entry
25.001485020 1 irq_vectors:irq_work_exit
30.001658706 0 irq_vectors:irq_work_entry
30.001658706 0 irq_vectors:irq_work_exit
^C 32.045711878 0 irq_vectors:irq_work_entry
32.045711878 0 irq_vectors:irq_work_exit
[root@jouet ~]#
But at least, when we use a wildcard, this patch helps a bit.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1491566932-503-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The option 'show-total-period' works fine without a option '-l'. But if
running 'perf annotate --stdio -l --show-total-period', you can see a
problem showing only zero '0' for number of samples.
Before:
$ perf annotate --stdio -l --show-total-period
...
0 : 400816: push %rbp
0 : 400817: mov %rsp,%rbp
0 : 40081a: mov %edi,-0x24(%rbp)
0 : 40081d: mov %rsi,-0x30(%rbp)
0 : 400821: mov -0x24(%rbp),%eax
0 : 400824: mov -0x30(%rbp),%rdx
0 : 400828: mov (%rdx),%esi
0 : 40082a: mov $0x0,%edx
...
The reason is it was missed to set number of samples of
source_line_samples, so set it ordinarily.
After:
$ perf annotate --stdio -l --show-total-period
...
3 : 400816: push %rbp
4 : 400817: mov %rsp,%rbp
0 : 40081a: mov %edi,-0x24(%rbp)
0 : 40081d: mov %rsi,-0x30(%rbp)
1 : 400821: mov -0x24(%rbp),%eax
2 : 400824: mov -0x30(%rbp),%rdx
0 : 400828: mov (%rdx),%esi
1 : 40082a: mov $0x0,%edx
...
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin Liska <mliska@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 0c4a5bcea4 ("perf annotate: Display total number of samples with --show-total-period")
Link: http://lkml.kernel.org/r/1490703125-13643-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The callers of perf_read_values__enlarge_counters() already propagate
errors, so just print some debug diagnostics and handle allocation
failures gracefully, not trying to do silly things like 'a =
realloc(a)'.
Link: http://lkml.kernel.org/n/tip-nsmmh7uzpg35rzcl9nq7yztp@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we fail in the following case:
$ unset HOME
$ ./perf record ls
$ echo $?
255
It's because the config code init fails due to a missing HOME variable
value. Fix this by skipping the user config init if there's no HOME
variable value.
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170330144637.7468-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Trivial fix to spelling mistake in pr_debug message.
Signed-off-by: Colin King <colin.king@canonical.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-janitors@vger.kernel.org
Link: http://lkml.kernel.org/r/20170330095440.19444-1-colin.king@canonical.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
For some platforms, for example Broadwell, it doesn't support cycles
for LBR. But the perf always prints cycles:0, it's not necessary.
The patch refactors the LBR info print code and drops the cycles:0.
For example: perf report --branch-history --no-children --stdio
On Broadwell:
--0.91%--__random_r random_r.c:394 (iterations:2)
__random_r random_r.c:360 (predicted:0.0%)
__random_r random_r.c:380 (predicted:0.0%)
__random_r random_r.c:357
On Skylake:
--1.07%--main div.c:39 (predicted:52.4% cycles:1 iterations:17)
main div.c:44 (predicted:52.4% cycles:1)
main div.c:42 (cycles:2)
compute_flag div.c:28 (cycles:2)
compute_flag div.c:27 (cycles:1)
rand rand.c:28 (cycles:1)
rand rand.c:28 (cycles:1)
__random random.c:298 (cycles:1)
__random random.c:297 (cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (cycles:1)
__random random.c:295 (cycles:1)
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1489046786-10061-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
SDT marker argument is in N@OP format. N is the size of argument and OP
is the actual assembly operand. OP is arch dependent component and hence
it's parsing logic also should be placed under tools/perf/arch/.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170328094754.3156-3-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This came from 'git', but isn't documented anywhere in
tools/perf/Documentation/, looks like baggage we can do without, ditch
it.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-e7uwkn60t4hmlnwj99ba4t2s@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
New features:
- Handle inline functions in callchains (Jin Yao)
- Enable sorting by srcline as key (Milian Wolff)
Fixes:
- Fix no_size logic in addr_filter__resolve_kernel_syms() in the
auxtrace code (Adrian Hunter)
- Fix some thread refcount leaks in 'perf trace' (Arnaldo Carvalho de Melo)
- Fix divide by zero when calculating percent for an event in a group in
the annotate by source line code (Taeung Song)
- build-id files now aren't anymore symlinks, their parent directories
are, so readlink the later (Taeung Song)
- Assorted fixes for null termination problems, mostly related to
readlink, detected by valgrind (Tommi Rantala)
Infrastructure:
- Make vfs_getname probe point logic in 'perf trace' more robust
wrt length of pathname (Arnaldo Carvalho de Melo)
- Remove unused 'prefix' parameter from builtins main functions (Arnaldo Carvalho de Melo)
- Show 'perf list sdt' option in man page (Ravi Bangoria)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJY2WNUAAoJENZQFvNTUqpAZ3AQAIn/Q+Y665oP57RbikedeifL
He8vdMUkD/haRo0atbvuu5tRrwiRUabkUa6GKPHNCDl8GUD6UbkztUirL4Cq4v9s
7ONbCHXzaPnPZbDbl/W7Yx4vADow3YMR9EyNkL8/i2ApZqMCPQ9mUBhxJlSDp7RY
agYcOugUlYuvHsKVX59fTyvTAq8btfyFQTqhJ+NPddcxsyR5jam9XxxvgMURdFJr
h6OLO9wqCxlMctqlGXU+6tpqiAR+bp8UZgzDKwabGR4mZR+uLBYGf0FUQz52vf2A
83ufaZ5UrQUsSnVeYXBPW+i8+Ixu8pEOFDMDcSpk/wQXunLlN52LmuatSCkPBEV1
jFth8SX3IAX349hpaRBNuLk5UuqS6NKBztYzlaVsKMpuIw4hRPVE3VvqKefZD/hx
Vdlr1v6fPXMcRUcc3lFFiVCIvs0hRV4IDDIimGjJHf8dm+GFMHH+bk+tfiSQAlmZ
q3aSKMImUM3vlD01E4BmTVr4IEZHTd3mv0Ml+nbQGNj6Bu2364eBsFRnNHJWwGmt
c9tcnmeRv6JzrmprVXMuOUyyTcml+b5/vincEEmTxUdbxCbYFkQS3JzPxfpxqFI/
zM5rlJJ9KKWXmwD6OgUoXT5IUzq4BuIVyJ3DxwuL2rrQggsv0zORxQtVduY+IJSj
ZD/Qu7SOiFfnAFM6kLwP
=Lm/M
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo-4.12-20170327' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Handle inline functions in callchains (Jin Yao)
- Enable sorting by srcline as key (Milian Wolff)
Fixes:
- Fix no_size logic in addr_filter__resolve_kernel_syms() in the
auxtrace code (Adrian Hunter)
- Fix some thread refcount leaks in 'perf trace' (Arnaldo Carvalho de Melo)
- Fix divide by zero when calculating percent for an event in a group in
the annotate by source line code (Taeung Song)
- build-id files now aren't anymore symlinks, their parent directories
are, so readlink the later (Taeung Song)
- Assorted fixes for null termination problems, mostly related to
readlink, detected by valgrind (Tommi Rantala)
Infrastructure changes:
- Make vfs_getname probe point logic in 'perf trace' more robust
wrt length of pathname (Arnaldo Carvalho de Melo)
- Remove unused 'prefix' parameter from builtins main functions (Arnaldo Carvalho de Melo)
- Show 'perf list sdt' option in man page (Ravi Bangoria)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Simplification: it is easier to open /proc/self/exe than /proc/$pid/exe.
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170322130624.21881-7-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ensure that the string that we read from the data file is null terminated.
Valgrind was complaining:
==31357== Invalid read of size 1
==31357== at 0x4EC8C1: __strtok_r_1c (string2.h:200)
==31357== by 0x4EC8C1: parse_ftrace_printk (trace-event-parse.c:161)
==31357== by 0x4F82A8: read_ftrace_printk (trace-event-read.c:204)
==31357== by 0x4F82A8: trace_report (trace-event-read.c:468)
==31357== by 0x4CD552: process_tracing_data (header.c:1576)
==31357== by 0x4D3397: perf_file_section__process (header.c:2705)
==31357== by 0x4D3397: perf_header__process_sections (header.c:2488)
==31357== by 0x4D3397: perf_session__read_header (header.c:2925)
==31357== by 0x4E71E2: perf_session__open (session.c:32)
==31357== by 0x4E71E2: perf_session__new (session.c:139)
==31357== by 0x429F5D: cmd_annotate (builtin-annotate.c:472)
==31357== by 0x497150: run_builtin (perf.c:359)
==31357== by 0x428CE0: handle_internal_command (perf.c:421)
==31357== by 0x428CE0: run_argv (perf.c:467)
==31357== by 0x428CE0: main (perf.c:614)
==31357== Address 0x8ac0efb is 0 bytes after a block of size 1,963 alloc'd
==31357== at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
==31357== by 0x4F827B: read_ftrace_printk (trace-event-read.c:195)
==31357== by 0x4F827B: trace_report (trace-event-read.c:468)
==31357== by 0x4CD552: process_tracing_data (header.c:1576)
==31357== by 0x4D3397: perf_file_section__process (header.c:2705)
==31357== by 0x4D3397: perf_header__process_sections (header.c:2488)
==31357== by 0x4D3397: perf_session__read_header (header.c:2925)
==31357== by 0x4E71E2: perf_session__open (session.c:32)
==31357== by 0x4E71E2: perf_session__new (session.c:139)
==31357== by 0x429F5D: cmd_annotate (builtin-annotate.c:472)
==31357== by 0x497150: run_builtin (perf.c:359)
==31357== by 0x428CE0: handle_internal_command (perf.c:421)
==31357== by 0x428CE0: run_argv (perf.c:467)
==31357== by 0x428CE0: main (perf.c:614)
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170322130624.21881-6-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ensure that we have space for the null byte in buf.
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170322130624.21881-5-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Valgrind was complaining:
$ valgrind ./perf list >/dev/null
==11643== Memcheck, a memory error detector
==11643== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==11643== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==11643== Command: ./perf list
==11643==
==11643== Conditional jump or move depends on uninitialised value(s)
==11643== at 0x4C30620: rindex (vg_replace_strmem.c:199)
==11643== by 0x49DAA9: build_id_cache__origname (build-id.c:198)
==11643== by 0x49E1C7: build_id_cache__valid_id (build-id.c:222)
==11643== by 0x49E1C7: build_id_cache__list_all (build-id.c:507)
==11643== by 0x4B9C8F: print_sdt_events (parse-events.c:2067)
==11643== by 0x4BB0B3: print_events (parse-events.c:2313)
==11643== by 0x439501: cmd_list (builtin-list.c:53)
==11643== by 0x497150: run_builtin (perf.c:359)
==11643== by 0x428CE0: handle_internal_command (perf.c:421)
==11643== by 0x428CE0: run_argv (perf.c:467)
==11643== by 0x428CE0: main (perf.c:614)
[...]
Additionally, a zero length result from readlink() is not very interesting.
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170322130624.21881-3-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Valgrind was complaining:
==2633== Syscall param open(filename) points to unaddressable byte(s)
==2633== at 0x5281CC0: __open_nocancel (syscall-template.S:84)
==2633== by 0x537D38: open (fcntl2.h:53)
==2633== by 0x537D38: get_sdt_note_list (symbol-elf.c:2017)
==2633== by 0x5396FD: probe_cache__scan_sdt (probe-file.c:700)
==2633== by 0x49EA2C: build_id_cache__add_sdt_cache (build-id.c:625)
==2633== by 0x49EA2C: build_id_cache__add_s (build-id.c:697)
==2633== by 0x49EE72: build_id_cache__add_b (build-id.c:717)
==2633== by 0x49EE72: dso__cache_build_id (build-id.c:782)
==2633== by 0x49F190: __dsos__cache_build_ids (build-id.c:793)
==2633== by 0x49F190: machine__cache_build_ids (build-id.c:801)
==2633== by 0x49F190: perf_session__cache_build_ids (build-id.c:815)
==2633== by 0x4CD4F2: write_build_id (header.c:165)
==2633== by 0x4D26F7: do_write_feat (header.c:2296)
==2633== by 0x4D26F7: perf_header__adds_write (header.c:2335)
==2633== by 0x4D26F7: perf_session__write_header (header.c:2414)
==2633== by 0x43B324: __cmd_record (builtin-record.c:1154)
==2633== by 0x43B324: cmd_record (builtin-record.c:1839)
==2633== by 0x455A07: __cmd_record (builtin-kmem.c:1868)
==2633== by 0x455A07: cmd_kmem (builtin-kmem.c:1944)
==2633== by 0x497150: run_builtin (perf.c:359)
==2633== by 0x428CE0: handle_internal_command (perf.c:421)
==2633== by 0x428CE0: run_argv (perf.c:467)
==2633== by 0x428CE0: main (perf.c:614)
==2633== Address 0x0 is not stack'd, malloc'd or (recently) free'd
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tommi Rantala <tommi.t.rantala@nokia.com>
Link: http://lkml.kernel.org/r/20170322130624.21881-2-tommi.t.rantala@nokia.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is wrong way to read link name from a build-id file. Because a
build-id file is not anymore a symbolic link but build-id directory of
it is symbolic link, so fix it.
For example, if build-id file name gotten from
dso__build_id_filename() is as below,
/root/.debug/.build-id/4f/75c7d197c951659d1c1b8b5fd49bcdf8f3f8b1/elf
To correctly read link name of build-id, use the build-id dir path that
is a symbolic link, instead of the above build-id file name like below.
/root/.debug/.build-id/4f/75c7d197c951659d1c1b8b5fd49bcdf8f3f8b1
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1490598638-13947-2-git-send-email-treeze.taeung@gmail.com
Fixes: 01412261d9 ("perf buildid-cache: Use path/to/bin/buildid/elf instead of path/to/bin/buildid")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Often it is interesting to know how costly a given source line is in
total. Previously, one had to build these sums manually based on all
addresses that pointed to the same source line. This patch introduces
srcline as a sort key, which will do the aggregation for us.
Paired with the recent addition of showing inline frames, this makes
perf report much more useful for many C++ work loads.
The following shows the new feature in action. First, let's show the
status quo output when we sort by address. The result contains many hist
entries that generate the same output:
~~~~~~~~~~~~~~~~
$ perf report --stdio --inline -g address
# Children Self Command Shared Object Symbol
# ........ ........ ............ ................... .........................................
#
99.89% 35.34% cpp-inlining cpp-inlining [.] main
|
|--64.55%--main complex:655
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/complex:664 (inline)
| |
| |--60.31%--hypot +20
| | |
| | |--8.52%--__hypot_finite +273
| | |
| | |--7.32%--__hypot_finite +411
...
--35.34%--_start +4194346
__libc_start_main +241
|
|--6.65%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
|
|--2.70%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
|
|--1.69%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
...
~~~~~~~~~~~~~~~~
With this patch and `-g srcline` we instead get the following output:
~~~~~~~~~~~~~~~~
$ perf report --stdio --inline -g srcline
# Children Self Command Shared Object Symbol
# ........ ........ ............ ................... .........................................
#
99.89% 35.34% cpp-inlining cpp-inlining [.] main
|
|--64.55%--main complex:655
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/complex:664 (inline)
| |
| |--64.02%--hypot
| | |
| | --59.81%--__hypot_finite
| |
| --0.53%--cabs
|
--35.34%--_start
__libc_start_main
|
|--12.48%--main random.tcc:3326
| /home/milian/projects/kdab/rnd/hotspot/tests/test-clients/cpp-inlining/main.cpp:39 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1809 (inline)
| /usr/include/c++/6.3.1/bits/random.h:1818 (inline)
| /usr/include/c++/6.3.1/bits/random.h:185 (inline)
...
~~~~~~~~~~~~~~~~
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170318214928.9047-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If the address belongs to an inlined function, the source information
back to the first non-inlined function will be printed.
For example:
1. Show inlined function name
perf report -g function --inline
- 0.69% 0.00% inline ld-2.23.so [.] dl_main
- dl_main
0.56% _dl_relocate_object
_dl_relocate_object (inline)
elf_dynamic_do_Rela (inline)
2. Show the file/line information
perf report -g address --inline
- 0.69% 0.00% inline ld-2.23.so [.] _dl_start
_dl_start rtld.c:307
/build/glibc-GKVZIf/glibc-2.23/elf/rtld.c:413 (inline)
+ _dl_sysdep_start dl-sysdep.c:250
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1490474069-15823-6-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It takes some time to look for inline stack for callgraph addresses. So
it provides new option "--inline" to let user decide if enable this
feature.
--inline:
If a callgraph address belongs to an inlined function, the inline stack
will be printed. Each entry is the inline function name or file/line.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1490474069-15823-4-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It would be useful for perf to support a mode to query the inline stack
for a given callgraph address. This would simplify finding the right
code in code that does a lot of inlining.
The srcline.c has contained the code which supports to translate the
address to filename:line_nr. This patch just extends the function to let
it support getting the inline stacks.
It introduces the inline_list which will store the inline function
result (filename:line_nr and funcname).
If BFD lib is not supported, the result is only filename:line_nr.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1490474069-15823-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce dso__name() and filename_split() out of existing code because
these codes will be used in several places in next patch.
For filename_split(), it may also solve a potential memory leak in
existing code. In existing addr2line(),
sep = strchr(filename, ':');
if (sep) {
*sep++ = '\0';
*file = filename;
*line_nr = strtoul(sep, NULL, 0);
ret = 1;
}
out:
pclose(fp);
return ret;
If sep is NULL, filename is not freed or returned via file.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Milian Wolff <milian.wolff@kdab.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Link: http://lkml.kernel.org/r/1490474069-15823-2-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Address filtering with kernel symbols incorrectly resulted in the error
"Cannot determine size of symbol" because the no_size logic was the wrong
way around.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org # v4.9+
Link: http://lkml.kernel.org/r/1490357752-27942-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move the printing of perf expressions and internal events to a new
clearer --details flag, instead of lumping it together with other debug
options in --debug. This makes it clearer to use.
Before
perf list --debug
...
unc_m_power_critical_throttle_cycles
[Cycles all ranks are in critical thermal throttle. Unit: uncore_imc]
uncore_imc_2/event=0x86/ MetricName: power_critical_throttle_cycles % MetricExpr: (unc_m_power_critical_throttle_cycles / unc_m_clockticks) * 100.
after
perf list --details
...
unc_m_power_critical_throttle_cycles
[Cycles all ranks are in critical thermal throttle. Unit: uncore_imc]
uncore_imc_2/event=0x86/ MetricName: power_critical_throttle_cycles % MetricExpr: (unc_m_power_critical_throttle_cycles / unc_m_clockticks) * 100.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20170320201711.14142-14-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add support for a new JSON event attribute to name MetricExpr for better
output in perf stat.
If the event has no MetricName it uses the normal event name instead to
describe the metric.
Before
% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
time unc_p_freq_max_os_cycles
1.000149775 15.7
2.000344807 19.3
3.000502544 16.7
4.000640656 6.6
5.000779955 9.9
After
% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
time freq_max_os_cycles %
1.000149775 15.7
2.000344807 19.3
3.000502544 16.7
4.000640656 6.6
5.000779955 9.9
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-13-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Output the metric expr in perf list when --debug is specified, so that
the user can check the formula.
Before:
% perf list
...
unc_m_power_channel_ppd
[Cycles where DRAM ranks are in power down (CKE) mode. Derived from unc_m_power_channel_ppd. Unit:
uncore_imc]
uncore_imc_2/event=0x85/
After:
% perf list --debug
...
unc_m_power_channel_ppd
[Cycles where DRAM ranks are in power down (CKE) mode. Derived from unc_m_power_channel_ppd. Unit:
uncore_imc]
Perf: uncore_imc_2/event=0x85/ MetricExpr: (unc_m_power_channel_ppd / unc_m_clockticks) * 100.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-12-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add generic infrastructure to perf stat to output ratios for
"MetricExpr" entries in the event lists. Many events are more useful as
ratios than in raw form, typically some count in relation to total
ticks.
Transfer the MetricExpr information from the alias to the evsel.
We mark the events that need to be collected for MetricExpr, and also
link the events using them with a pointer. The code is careful to always
prefer the right event in the same group to minimize multiplexing
errors. At the moment only a single relation is supported.
Then add a rblist to the stat shadow code that remembers stats based on
the cpu and context.
Then finally update and retrieve and print these values similarly to the
existing hardcoded perf metrics. We use the simple expression parser
added earlier to evaluate the expression.
Normally we just output the result without further commentary, but for
--metric-only this would lead to empty columns. So for this case use the
original event as description.
There is no attempt to automatically add the MetricExpr event, if it is
missing, however we suggest it to the user, because the user tool
doesn't have enough information to reliably construct a group that is
guaranteed to schedule. So we leave that to the user.
% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}'
1.000147889 800,085,181 unc_p_clockticks
1.000147889 93,126,241 unc_p_freq_max_os_cycles # 11.6
2.000448381 800,218,217 unc_p_clockticks
2.000448381 142,516,095 unc_p_freq_max_os_cycles # 17.8
3.000639852 800,243,057 unc_p_clockticks
3.000639852 162,292,689 unc_p_freq_max_os_cycles # 20.3
% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
# time freq_max_os_cycles %
1.000127077 0.9
2.000301436 0.7
3.000456379 0.0
v2: Change from DivideBy to MetricExpr
v3: Use expr__ prefix. Support more than one other event.
v4: Update description
v5: Only print warning message once for multiple PMUs.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-11-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add support for parsing the MetricExpr header in the JSON event lists
and storing them in the alias structure.
Used in the next patch.
v2: Change DividedBy to MetricExpr
v3: Really catch all uses of DividedBy
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-10-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add a simple expression parser good enough to parse JSON relation
expressions. The parser is implemented using bison.
This is just intended as an simple parser for internal usage in the
event lists, not the beginning of a "perf scripting language"
v2: Use expr__ prefix instead of expr_
Support multiple free variables for parser
Committer note:
The v2 patch had:
%define api.pure full
In expr.y, that is a feature introduced in bison 2.7, to have reentrant
parsers, not using global variables, which would make tools/perf stop
building with the bison version shipped in older distros, so Andi
realised that the other parsers (e.g. parse-events.y) were using:
%pure-parser
Which is present in older versions of bison and fits the bill.
I added:
CFLAGS_expr-bison.o += -DYYENABLE_NLS=0 -DYYLTYPE_IS_TRIVIAL=0 -w
To finally make it build, copying what was there for pmu-bison.o,
another parser.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170320201711.14142-8-andi@firstfloor.org
[ stdlib.h is needed in tests/expr.c for free() fixing build in systems such as ubuntu:16.04-x-s390 ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Special case uncore_ prefix in PMU match, to allow for shorter event
uncore specifications.
Before:
perf stat -a -e uncore_cbox/event=0x35,umask=0x1,filter_opc=0x19C/ sleep 1
After
perf stat -a -e cbox/event=0x35,umask=0x1,filter_opc=0x19C/ sleep 1
Committer tests:
# perf list uncore
List of pre-defined events (to be used in -e):
uncore_cbox_0/clockticks/ [Kernel PMU event]
uncore_cbox_1/clockticks/ [Kernel PMU event]
uncore_imc/data_reads/ [Kernel PMU event]
uncore_imc/data_writes/ [Kernel PMU event]
# perf stat -a -e cbox_0/clockticks/ sleep 1
Performance counter stats for 'system wide':
281,474,976,653,084 cbox_0/clockticks/
1.000870129 seconds time elapsed
#
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20170320201711.14142-7-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>