Pull perf updates from Ingo Molnar:
"The main updates in this cycle were:
- Lots of perf tooling changes too voluminous to list (big perf trace
and perf stat improvements, lots of libtraceevent reorganization,
etc.), so I'll list the authors and refer to the changelog for
details:
Benjamin Peterson, Jérémie Galarneau, Kim Phillips, Peter
Zijlstra, Ravi Bangoria, Sangwon Hong, Sean V Kelley, Steven
Rostedt, Thomas Gleixner, Ding Xiang, Eduardo Habkost, Thomas
Richter, Andi Kleen, Sanskriti Sharma, Adrian Hunter, Tzvetomir
Stoyanov, Arnaldo Carvalho de Melo, Jiri Olsa.
... with the bulk of the changes written by Jiri Olsa, Tzvetomir
Stoyanov and Arnaldo Carvalho de Melo.
- Continued intel_rdt work with a focus on playing well with perf
events. This also imported some non-perf RDT work due to
dependencies. (Reinette Chatre)
- Implement counter freezing for Arch Perfmon v4 (Skylake and newer).
This allows to speed up the PMI handler by avoiding unnecessary MSR
writes and make it more accurate. (Andi Kleen)
- kprobes cleanups and simplification (Masami Hiramatsu)
- Intel Goldmont PMU updates (Kan Liang)
- ... plus misc other fixes and updates"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (155 commits)
kprobes/x86: Use preempt_enable() in optimized_callback()
x86/intel_rdt: Prevent pseudo-locking from using stale pointers
kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack
perf/x86/intel: Export mem events only if there's PEBS support
x86/cpu: Drop pointless static qualifier in punit_dev_state_show()
x86/intel_rdt: Fix initial allocation to consider CDP
x86/intel_rdt: CBM overlap should also check for overlap with CDP peer
x86/intel_rdt: Introduce utility to obtain CDP peer
tools lib traceevent, perf tools: Move struct tep_handler definition in a local header file
tools lib traceevent: Separate out tep_strerror() for strerror_r() issues
perf python: More portable way to make CFLAGS work with clang
perf python: Make clang_has_option() work on Python 3
perf tools: Free temporary 'sys' string in read_event_files()
perf tools: Avoid double free in read_event_file()
perf tools: Free 'printk' string in parse_ftrace_printk()
perf tools: Cleanup trace-event-info 'tdata' leak
perf strbuf: Match va_{add,copy} with va_end
perf test: S390 does not support watchpoints in test 22
perf auxtrace: Include missing asm/bitsperlong.h to get BITS_PER_LONG
tools include: Adopt linux/bits.h
...
John reported crash when recording on an event under PMU with cpumask defined:
root@localhost:~# ./perf_debug_ record -e armv8_pmuv3_0/br_mis_pred/ sleep 1
perf: Segmentation fault
Obtained 9 stack frames.
./perf_debug_() [0x4c5ef8]
[0xffff82ba267c]
./perf_debug_() [0x4bc5a8]
./perf_debug_() [0x419550]
./perf_debug_() [0x41a928]
./perf_debug_() [0x472f58]
./perf_debug_() [0x473210]
./perf_debug_() [0x4070f4]
/lib/aarch64-linux-gnu/libc.so.6(__libc_start_main+0xe0) [0xffff8294c8a0]
Segmentation fault (core dumped)
We synthesize an update event that needs to touch the evsel id array, which is
not defined at that time. Fixing this by forcing the id allocation for events
with their own cpus.
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxarm@huawei.com
Fixes: bfd8f72c27 ("perf record: Synthesize unit/scale/... in event update")
Link: http://lkml.kernel.org/r/20181003212052.GA32371@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In order to make libtraceevent into a proper library, variables, data
structures and functions require a unique prefix to prevent name space
conflicts. That prefix will be "tep_". This renames enum format_flags
to enum tep_format_flags and adds prefix TEP_ to all of its members.
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Cc: linux-trace-devel@vger.kernel.org
Link: http://lkml.kernel.org/r/20180919185722.803127871@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In order to make libtraceevent into a proper library, variables, data
structures and functions require a unique prefix to prevent name space
conflicts. That prefix will be "tep_". This renames struct format to
struct tep_format and struct format_field to struct tep_format_field
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Cc: linux-trace-devel@vger.kernel.org
Link: http://lkml.kernel.org/r/20180919185722.661319373@goodmis.org
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add perf_evsel__store_ids() from stat's store_counter_ids() code to the
evsel class, so that it can be used globally.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180830063252.23729-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If evsel is NULL, we should return NULL to avoid a NULL pointer
dereference a bit later in the code.
Signed-off-by: Hisao Tanabe <xtanabe@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 03e0a7df3e ("perf tools: Introduce bpf-output event")
LPU-Reference: 20180824154556.23428-1-xtanabe@gmail.com
Link: https://lkml.kernel.org/n/tip-e5plzjhx6595a5yjaf22jss3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In order to make libtraceevent into a proper library, variables, data
structures and functions require a unique prefix to prevent name space
conflicts. That prefix will be "tep_" and not "pevent_". This changes
APIs: pevent_find_any_field, pevent_find_common_field,
pevent_find_event, pevent_find_field
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yordan Karadzhov (VMware) <y.karadz@gmail.com>
Cc: linux-trace-devel@vger.kernel.org
Link: http://lkml.kernel.org/r/20180808180700.316995920@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf record' will error out if both --delay and LBR are applied.
For example:
# perf record -D 1000 -a -e cycles -j any -- sleep 2
Error:
dummy:HG: PMU Hardware doesn't support sampling/overflow-interrupts.
Try 'perf stat'
#
A dummy event is added implicitly for initial delay, which has the same
configurations as real sampling events. The dummy event is a software
event. If LBR is configured, perf must error out.
The dummy event will only be used to track PERF_RECORD_MMAP while perf
waits for the initial delay to enable the real events. The BRANCH_STACK
bit can be safely cleared for the dummy event.
After applying the patch:
# perf record -D 1000 -a -e cycles -j any -- sleep 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.054 MB perf.data (828 samples) ]
#
Reported-by: Sunil K Pandey <sunil.k.pandey@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1531145722-16404-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's no reason to have separate function to display clock events.
It's only purpose was to convert the nanosecond value into microseconds.
We do that now in generic code, if the unit and scale values are
properly set, which this patch do for clock events.
The output differs in the unit field being displayed in its columns
rather than having it added as a suffix of the event name. Plus the
value is rounded into 2 decimal numbers as for any other event.
Before:
# perf stat -e cpu-clock,task-clock -C 0 sleep 3
Performance counter stats for 'CPU(s) 0':
3001.123137 cpu-clock (msec) # 1.000 CPUs utilized
3001.133250 task-clock (msec) # 1.000 CPUs utilized
3.001159813 seconds time elapsed
Now:
# perf stat -e cpu-clock,task-clock -C 0 sleep 3
Performance counter stats for 'CPU(s) 0':
3,001.05 msec cpu-clock # 1.000 CPUs utilized
3,001.05 msec task-clock # 1.000 CPUs utilized
3.001077794 seconds time elapsed
There's a small difference in csv output, as we now output the unit
field, which was empty before. It's in the proper spot, so there's no
compatibility issue.
Before:
# perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3
3001.065177,,cpu-clock,3001064187,100.00,1.000,CPUs utilized
3001.077085,,task-clock,3001077085,100.00,1.000,CPUs utilized
# perf stat -e cpu-clock,task-clock -C 0 -x, sleep 3
3000.80,msec,cpu-clock,3000799026,100.00,1.000,CPUs utilized
3000.80,msec,task-clock,3000799550,100.00,1.000,CPUs utilized
Add perf_evsel__is_clock to replace nsec_counter.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180720110036.32251-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Its common to have the (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN),
so add an evsel__has_callchain(evsel) helper.
This will actually get more uses as we check that instead of
symbol_conf.use_callchain in places where that produces the same result
but makes this decision to be more fine grained, per evsel.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-145340oytbthatpfeaq1do18@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Let tools that need to have those variables with the sysctl current
values use a function that will read them.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-1ljj3oeo5kpt2n1icfd9vowe@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf doesn't support mixed events from different PMUs (except software
event) in a group. The perf stat should output <not counted>/<not
supported> for all events, but it doesn't. For example,
perf stat -e '{cycles,uncore_imc_5/umask=0xF,event=0x4/,instructions}'
<not counted> cycles
<not supported> uncore_imc_5/umask=0xF,event=0x4/
1,024,300 instructions
If perf fails to open an event, it doesn't error out directly. It will
disable some features and retry, until the event is opened or all
features are disabled. The disabled features will not be re-enabled. The
group read is one of these features.
For the example as above, the IMC event and the leader event "cycles"
are from different PMUs. Opening the IMC event must fail. The group read
feature must be disabled for IMC event and the followed event
"instructions". The "instructions" event has the same PMU as the leader
"cycles". It can be opened successfully. Since the group read feature
has been disabled, the "instructions" event will be read as a single
event, which definitely has a value.
The group read fallback is still useful for the case which kernel
doesn't support group read. It is good enough to be handled only by the
leader.
For the fallback request from members, it must be caused by an error.
The fallback only breaks the semantics of group. Limit the group read
fallback only for the leader.
Committer testing:
On a broadwell t450s notebook:
Before:
# perf stat -e '{cycles,unc_cbo_cache_lookup.read_i,instructions}' sleep 1
Performance counter stats for 'sleep 1':
<not counted> cycles
<not supported> unc_cbo_cache_lookup.read_i
818,206 instructions
1.003170887 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
After:
# perf stat -e '{cycles,unc_cbo_cache_lookup.read_i,instructions}' sleep 1
Performance counter stats for 'sleep 1':
<not counted> cycles
<not supported> unc_cbo_cache_lookup.read_i
<not counted> instructions
1.001380511 seconds time elapsed
Some events weren't counted. Try disabling the NMI watchdog:
echo 0 > /proc/sys/kernel/nmi_watchdog
perf stat ...
echo 1 > /proc/sys/kernel/nmi_watchdog
#
Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Agustin Vega-Frias <agustinv@codeaurora.org>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 82bf311e15 ("perf stat: Use group read for event groups")
Link: http://lkml.kernel.org/r/1524594014-79243-3-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
.. and other related fields that do not need to be enabled
for events that have sampling leader.
It fixes the perf top usage Ingo reported broken:
# perf top -e '{cycles,msr/aperf/}:S'
The 'msr/aperf/' event is configured for write_back sampling, which is
not allowed by the MSR PMU, so it fails to create the event.
Adjusting related attr test.
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'perf stat' fallback for EACCES error sets the exclude_kernel
perf_event_attr and tries perf_event_open() again with it. In addition,
it also changes the name of the event to reflect that change by adding
the 'u' modifier.
But it does not take into account the '/' separator, so the event name
can end up mangled, like: (note the '/:' characters)
$ perf stat -e cpu/cpu-cycles/ kill
...
386,832 cpu/cpu-cycles/:u
Adding the code to check on the '/' separator and set the following
correct event name:
$ perf stat -e cpu/cpu-cycles/ kill
...
388,548 cpu/cpu-cycles/u
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180423090823.32309-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf record' suggests to enable the APIC on errors.
APIC is practically always used today and the problem is usually
somewhere else.
Just remove the outdated suggestion.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20180406203812.3087-5-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When perf record encounters an error setting up an event it suggests
to enable CONFIG_PERF_EVENTS. This is misleading because:
- Usually it is enabled (it is really hard to disable on x86)
- The problem is usually somewhere else, e.g. the CPU is not supported
or an invalid configuration has been used.
Remove the misleading suggestion.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20180406203812.3087-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Changing the output header for reporting forced groups via --groups
option on non grouped events, like:
$ perf record -e 'cycles,instructions'
$ perf report --stdio --group
Before:
# Samples: 24 of event 'anon group { cycles:u, instructions:u }'
After:
# Samples: 24 of events 'cycles:u, instructions:u'
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Fixes: ad52b8cb48 ("perf report: Add support to display group output for non group events")
Link: http://lkml.kernel.org/r/20180307155020.32613-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To simplify creation of events accross multiple instances of the same
type of PMU stat supports two methods for creating multiple events from
a single event specification:
1. A prefix or glob can be used in the PMU name.
2. Aliases, which are listed immediately after the Kernel PMU events
by perf list, are used.
When the --no-merge option is passed and these events are displayed
individually the PMU name is lost and it's not possible to see which
count corresponds to which pmu:
$ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null
Performance counter stats for 'system wide':
67 l3cache/read-miss/
67 l3cache/read-miss/
63 l3cache/read-miss/
60 l3cache/read-miss/
0.001675706 seconds time elapsed
$ perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null
Performance counter stats for 'system wide':
12 l3cache_read_miss
17 l3cache_read_miss
10 l3cache_read_miss
8 l3cache_read_miss
0.001661305 seconds time elapsed
This change adds the original pmu name to the event. For dynamic pmu
events the pmu name is restored in the event name:
$ perf stat -a -e l3cache/read-miss/ --no-merge ls > /dev/null
Performance counter stats for 'system wide':
63 l3cache_0_3/read-miss/
74 l3cache_0_1/read-miss/
64 l3cache_0_2/read-miss/
74 l3cache_0_0/read-miss/
0.001675706 seconds time elapsed
For alias events the name is added after the event name:
$ perf stat -a -e l3cache_read_miss --no-merge ls > /dev/null
Performance counter stats for 'system wide':
10 l3cache_read_miss [l3cache_0_3]
12 l3cache_read_miss [l3cache_0_1]
10 l3cache_read_miss [l3cache_0_2]
17 l3cache_read_miss [l3cache_0_0]
0.001661305 seconds time elapsed
Signed-off-by: Agustin Vega-Frias <agustinv@codeaurora.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Timur Tabi <timur@codeaurora.org>
Cc: linux-arm-kernel@lists.infradead.org
Change-Id: I8056b9eda74bda33e95065056167ad96e97cb1fb
Link: http://lkml.kernel.org/r/1520345084-42646-3-git-send-email-agustinv@codeaurora.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is not really closing the cgroup, but instead dropping a reference
count and if it hits zero, then calling delete, which will, among other
cleanup shores, close the cgroup fd.
So it is really dropping a reference to that cgroup, and the method name
for that is "put", so rename close_cgroup() to cgroup__put() to follow
this naming convention.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-sccxpnd7bgwc1llgokt6fcey@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If we execute 'perf stat --per-thread' with non-root account (even set
kernel.perf_event_paranoid = -1 yet), it reports the error:
jinyao@skl:~$ perf stat --per-thread
Error:
You may not have permission to collect system-wide stats.
Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).
The current value is 2:
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN
To make this setting permanent, edit /etc/sysctl.conf too, e.g.:
kernel.perf_event_paranoid = -1
Perhaps the ptrace rule doesn't allow to trace some processes. But anyway
the global --per-thread mode had better ignore such errors and continue
working on other threads.
This patch will record the index of error thread in perf_evsel__open()
and remove this thread before retrying.
For example (run with non-root, kernel.perf_event_paranoid isn't set):
jinyao@skl:~$ perf stat --per-thread
^C
Performance counter stats for 'system wide':
vmstat-3458 6.171984 cpu-clock:u (msec) # 0.000 CPUs utilized
perf-3670 0.515599 cpu-clock:u (msec) # 0.000 CPUs utilized
vmstat-3458 1,163,643 cycles:u # 0.189 GHz
perf-3670 40,881 cycles:u # 0.079 GHz
vmstat-3458 1,410,238 instructions:u # 1.21 insn per cycle
perf-3670 3,536 instructions:u # 0.09 insn per cycle
vmstat-3458 288,937 branches:u # 46.814 M/sec
perf-3670 936 branches:u # 1.815 M/sec
vmstat-3458 15,195 branch-misses:u # 5.26% of all branches
perf-3670 76 branch-misses:u # 8.12% of all branches
12.651675247 seconds time elapsed
Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1516117388-10120-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As tools may need to adjust to missing features, as 'perf top' will, in
the next csets, to cope with a missing 'write_backward' feature.
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-jelngl9q1ooaizvkcput9tic@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Stephane reported that we don't set properly PERIOD sample type for
events with period term defined.
Before:
$ perf record -e cpu/cpu-cycles,period=1000/u ls
$ perf evlist -v
cpu/cpu-cycles,period=1000/u: ... sample_type: IP|TID|TIME|PERIOD, ...
After:
$ perf record -e cpu/cpu-cycles,period=1000/u ls
$ perf evlist -v
cpu/cpu-cycles,period=1000/u: ... sample_type: IP|TID|TIME, ...
Setting PERIOD sample type based on period term setup.
Committer note:
When we use -c or a period=N term in the event definition, then we don't
need to ask the kernel, for this event, via perf_event_attr.sample_type
|= PERF_SAMPLE_PERIOD, to put the event period in each sample for this
event, as we know it already, it is in perf_event_attr.sample_period.
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Stephane Eranian <eranian@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180201083812.11359-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is never a need to synthesize a 'swapped' sample, so all callers
to perf_event__synthesize_sample() pass 'false' as the value to
'swapped'. So get rid of the unused 'swapped' parameter.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1516108492-21401-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
PERF_SAMPLE_CPU contains the cpu number in the first 4 bytes and the
second 4 bytes are reserved. Ensure the reserved bytes are zero in
perf_event__synthesize_sample().
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1516108492-21401-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When we use a global DWARF setting as in:
perf record --call-graph dwarf
According to 5c0cf22477 ("perf record: Store data mmaps for dwarf unwind") we need
to set up some extra perf_event_attr bits.
But when we instead do a per event dwarf setting:
perf record -e cycles/call-graph=dwarf/
This was not being done, make them equivalent.
This didn't produce any output changes in my tests while fixing up loose
ends in the per-event settings, I found it just by comparing the
perf_event_attr fields trying to find an explanation for those problems.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Noel Grandin <noelgrandin@gmail.com>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-6476r53h2o38skbs9qa4ust4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When setting the "dwarf" unwinder for a specific event and not
specifying the max-stack, the attr.sample_max_stack ended up using an
uninitialized callchain_param.max_stack, fix it by using designated
initializers for that callchain_param variable, zeroing all non
explicitely initialized struct members.
Here is what happened:
# perf trace -vv --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192
perf_event_attr:
type 2
size 112
config 0x730
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|ADDR|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER|DATA_SRC
exclude_callchain_user 1
{ wakeup_events, wakeup_watermark } 1
sample_regs_user 0xff0fff
sample_stack_user 8192
sample_max_stack 50656
sys_perf_event_open failed, error -75
Value too large for defined data type
# perf trace -vv --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192
perf_event_attr:
type 2
size 112
config 0x730
sample_type IP|TID|TIME|ADDR|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER|DATA_SRC
exclude_callchain_user 1
sample_regs_user 0xff0fff
sample_stack_user 8192
sample_max_stack 30448
sys_perf_event_open failed, error -75
Value too large for defined data type
#
Now the attr.sample_max_stack is set to zero and the above works as
expected:
# perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.072 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms
0.000 probe_libc:inet_pton:(7feb7a998350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa39b6108f3f] (/usr/bin/ping)
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-is9tramondqa9jlxxsgcm9iz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The construct:
if (callchain_param)
perf_evsel__config_callchain(evsel, opts, &callchain_param);
happens in several places, so make perf_evsel__config_callchain() work
just like free(NULL), do nothing if param->enabled is not set.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ykk0qzxnxwx3o611ctjnmxav@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit ("d0565132605f perf evsel: Enable type checking for
perf_evsel_config_term types") assumes PERF_EVSEL__CONFIG_TERM_DRV_CFG
isn't used and as such adds a BUG_ON().
Since the enumeration type is used in macro ADD_CONFIG_TERM() the change
break CoreSight trace acquisition.
This patch restores the original code.
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: d056513260 ("perf evsel: Enable type checking for perf_evsel_config_term types")
Link: http://lkml.kernel.org/r/1515617211-32024-1-git-send-email-mathieu.poirier@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding support to display sample misc field in form
of letter for each bit:
# perf script -F +misc ...
sched-messaging 1414 K 28690.636582: 4590 cycles ...
sched-messaging 1407 U 28690.636600: 325620 cycles ...
sched-messaging 1414 K 28690.636608: 19473 cycles ...
misc field __________/
The misc bits are assigned to following letters:
PERF_RECORD_MISC_KERNEL K
PERF_RECORD_MISC_USER U
PERF_RECORD_MISC_HYPERVISOR H
PERF_RECORD_MISC_GUEST_KERNEL G
PERF_RECORD_MISC_GUEST_USER g
PERF_RECORD_MISC_MMAP_DATA* M
PERF_RECORD_MISC_COMM_EXEC E
PERF_RECORD_MISC_SWITCH_OUT S
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180107160356.28203-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While monitoring a multithread process with pid option, perf sometimes
may return sys_perf_event_open failure with 3(No such process) if any of
the process's threads die before we open the event. However, we want
perf continue monitoring the remaining threads and do not exit with
error.
Here, the patch enables perf_evsel::ignore_missing_thread for -p option
to ignore complete failure if any of threads die before we open the event.
But it may still return sys_perf_event_open failure with 22(Invalid) if we
monitors several event groups.
sys_perf_event_open: pid 28960 cpu 40 group_fd 118202 flags 0x8
sys_perf_event_open: pid 28961 cpu 40 group_fd 118203 flags 0x8
WARNING: Ignored open failure for pid 28962
sys_perf_event_open: pid 28962 cpu 40 group_fd [118203] flags 0x8
sys_perf_event_open failed, error -22
That is because when we ignore a missing thread, we change the thread_idx
without dealing with its fds, FD(evsel, cpu, thread). Then get_group_fd()
may return a wrong group_fd for the next thread and sys_perf_event_open()
return with 22.
sys_perf_event_open(){
...
if (group_fd != -1)
perf_fget_light()//to get corresponding group_leader by group_fd
...
if (group_leader)
if (group_leader->ctx->task != ctx->task)//should on the same task
goto err_context
...
}
This patch also fixes this bug by introducing perf_evsel__remove_fd() and
update_fds to allow removing fds for the missing thread.
Changes since v1:
- Change group_fd__remove() into a more genetic way without changing code logic
- Remove redundant condition
Changes since v2:
- Use a proper function name and add some comment.
- Multiline comment style fixes.
Committer testing:
Before this patch the recently added 'perf stat --per-thread' for system
wide counting would race while enumerating all threads using /proc:
[root@jouet ~]# perf stat --per-thread
failed to parse CPUs map: No such file or directory
Usage: perf stat [<options>] [<command>]
-C, --cpu <cpu> list of cpus to monitor in system-wide
-a, --all-cpus system-wide collection from all CPUs
[root@jouet ~]# perf stat --per-thread
failed to parse CPUs map: No such file or directory
Usage: perf stat [<options>] [<command>]
-C, --cpu <cpu> list of cpus to monitor in system-wide
-a, --all-cpus system-wide collection from all CPUs
[root@jouet ~]#
When, say, the kernel was being built, so lots of shortlived threads,
after this patch this doesn't happen.
Signed-off-by: Mengting Zhang <zhangmengting@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Cheng Jian <cj.chengjian@huawei.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1513148513-6974-1-git-send-email-zhangmengting@huawei.com
[ Remove one use 'evlist' alias variable ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When we detect a different endianity we swap event before processing.
It's tricky for samples because we have no idea what's inside. We treat
it as an array of u64s, swap them and later on we swap back parts which
are different.
We mangle this way also the tracepoint raw data, which ends up in report
showing wrong data:
1.95% comm=Q^B pid=29285 prio=16777216 target_cpu=000
1.67% comm=l^B pid=0 prio=16777216 target_cpu=000
Luckily the traceevent library handles the endianity by itself (thank
you Steven!), so we can pass the RAW data directly in the other
endianity.
2.51% comm=beah-rhts-task pid=1175 prio=120 target_cpu=002
2.23% comm=kworker/0:0 pid=11566 prio=120 target_cpu=000
The fix is basically to swap back the raw data if different endianity is
detected.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20171129184346.3656-1-jolsa@kernel.org
[ Add util/memswap.c to python-ext-sources to link missing mem_bswap_64() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Paving the way to reuse these routines in other areas, like when
generating errno tables.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-rh1qv051vb8gfdcswskrn53h@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To reduce its function signature, since we get this from 'evsel' which
is already one of its arguments.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-070eap7t6uicg9c3w086xy2z@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add perf_evlist__parse_sample_timestamp to retrieve the timestamp of the
sample.
The idea is to use this function instead of the full sample parsing
before we queue the sample. At that time only the timestamp is needed
and we parse the sample once again later on delivery.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-o7syqo8lipj4or7renpu8e8y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move the initialization bits into common place at the beginning of the
function.
Also removing some superfluous zero initialization for addr and
transaction, because we zero all the data at the top.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-1gv5t6fvv735t1rt3mxpy1h9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Help identify to the user the event with the unsupported sampling error.
Also suggest a corrective action.
BEFORE:
$ sudo ./oldperf record -e armv8_pmuv3/mem_access/,ccn/cycles/,armv8_pmuv3/l2d_cache/ true
Error:
PMU Hardware doesn't support sampling/overflow-interrupts.
AFTER:
$ sudo ./newperf record -e armv8_pmuv3/mem_access/,ccn/cycles/,armv8_pmuv3/l2d_cache/ true
Error:
ccn/cycles/: PMU Hardware doesn't support sampling/overflow-interrupts. Try 'perf stat'
Signed-off-by: Kim Phillips <kim.phillips@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171114150452.e846f2e23684c7d7d8ee706f@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I forgot one conversion, which got noticed by Thomas when running:
$ perf stat -e '{cpu-clock,instructions}' kill
kill: not enough arguments
Segmentation fault (core dumped)
$
Fix it, those stats are in evsel->stats, not anymore in evsel->priv.
Reported-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: e669e833da ("perf evsel: Restore evsel->priv as a tool private area")
Link: http://lkml.kernel.org/r/20171109150046.GN4333@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use a typed enum for the perf_evsel_config_term type enum. This allows
gcc to do much stronger type checks, and also check for missing case
statements.
I removed the unused _MAX member from the number.
It found one missing case. I'm not sure it's a real problem, so I just
turned it into a BUG_ON for now.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20171020202755.21410-1-andi@firstfloor.org
[ Renamed the enum name to term_type as per jolsa's request ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The Intel PMU event aliases have a implicit period= specifier to set the
default period.
Unfortunately this breaks overriding these periods with -c or -F,
because the alias terms look like they are user specified to the
internal parser, and user specified event qualifiers override the
command line options.
Track that they are coming from aliases by adding a "weak" state to the
term. Any weak terms don't override command line options.
I only did it for -c/-F for now, I think that's the only case that's
broken currently.
Before:
$ perf record -c 1000 -vv -e uops_issued.any
...
{ sample_period, sample_freq } 2000003
After:
$ perf record -c 1000 -vv -e uops_issued.any
...
{ sample_period, sample_freq } 1000
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20171020202755.21410-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yet another fix for probing the max attr.precise_ip setting: it is not
enough settting attr.exclude_kernel for !root users, as they _can_
profile the kernel if the kernel.perf_event_paranoid sysctl is set to
-1, so check that as well.
Testing it:
As non root:
$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = 2
$ perf record sleep 1
$ perf evlist -v
cycles:uppp: ..., exclude_kernel: 1, ... precise_ip: 3, ...
Now as non-root, but with kernel.perf_event_paranoid set set to the
most permissive value, -1:
$ sysctl kernel.perf_event_paranoid
kernel.perf_event_paranoid = -1
$ perf record sleep 1
$ perf evlist -v
cycles:ppp: ..., exclude_kernel: 0, ... precise_ip: 3, ...
$
I.e. non-root, default kernel.perf_event_paranoid: :uppp modifier = not allowed to sample the kernel,
non-root, most permissible kernel.perf_event_paranoid: :ppp = allowed to sample the kernel.
In both cases, use the highest available precision: attr.precise_ip = 3.
Reported-and-Tested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: d37a369790 ("perf evsel: Fix attr.exclude_kernel setting for default cycles:p")
Link: http://lkml.kernel.org/n/tip-nj2qkf75xsd6pw6hhjzfqqdx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
USER_REGS can currently only collected implicitely with call graph
recording. Sometimes it is useful to see them separately, and filter
them. Add a new --user-regs option to record that is similar to
--intr-regs, but acts on user regs.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170905170029.19722-1-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Support new sample type PERF_SAMPLE_PHYS_ADDR for physical address.
Add new option --phys-data to record sample physical address.
Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1504026672-7304-2-git-send-email-kan.liang@intel.com
[ Added missing printing in evsel.c patch sent by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Set read_format for what we expect to get from read event generated by
perf_event_attr::inherit_stat.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix misprint CAP_IOC_LOCK -> CAP_IPC_LOCK. This capability have nothing
to do with raw tracepoints. This part is about bypassing mlock limits.
Sysctl kernel.perf_event_paranoid = -1 allows raw and ftrace function
tracepoints without CAP_SYS_ADMIN.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/150322916080.129746.11285255474738558340.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Fix buffer overflow for:
% perf stat -e msr/tsc/,cstate_core/c7-residency/ true
that causes glibc free list corruption. For some reason it doesn't
trigger in valgrind, but it is visible in AS:
=================================================================
==32681==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000003f5c at pc 0x0000005671ef bp 0x7ffdaaac9ac0 sp 0x7ffdaaac9ab0
READ of size 4 at 0x603000003f5c thread T0
#0 0x5671ee in perf_evsel__close_fd util/evsel.c:1196
#1 0x56c57a in perf_evsel__close util/evsel.c:1717
#2 0x55ed5f in perf_evlist__close util/evlist.c:1631
#3 0x4647e1 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:749
#4 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
#5 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
#6 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
#7 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
#8 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
#9 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
#10 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
#11 0x428419 in _start (/home/ak/hle/obj-perf/perf+0x428419)
0x603000003f5c is located 0 bytes to the right of 28-byte region [0x603000003f40,0x603000003f5c)
allocated by thread T0 here:
#0 0x7f0675139020 in calloc (/lib64/libasan.so.3+0xc7020)
#1 0x648a2d in zalloc util/util.h:23
#2 0x648a88 in xyarray__new util/xyarray.c:9
#3 0x566419 in perf_evsel__alloc_fd util/evsel.c:1039
#4 0x56b427 in perf_evsel__open util/evsel.c:1529
#5 0x56c620 in perf_evsel__open_per_thread util/evsel.c:1730
#6 0x461dea in create_perf_stat_counter /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:263
#7 0x4637d7 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:600
#8 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
#9 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
#10 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
#11 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
#12 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
#13 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
#14 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
The event is allocated with cpus == 1, but freed with cpus == real number
When the evsel close function walks the file descriptors it exceeds the
fd xyarray boundaries and reads random memory.
v2:
Now that xyarrays save their original dimensions we can use these to
iterate the two dimensional fd arrays. Fix some users (close, ioctl) in
evsel.c to use these fields directly. This allows simplifying the code
and dropping quite a few function arguments. Adjust all callers by
removing the unneeded arguments.
The actual perf event reading still uses the original values from the
evsel list.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-2-andi@firstfloor.org
[ Fix up xy_max_[xy]() -> xyarray__max_[xy]() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Make perf stat use group read if there are groups defined. The group
read will get the values for all member of groups within a single
syscall instead of calling read syscall for every event.
We can see considerable less amount of kernel cycles spent on single
group read, than reading each event separately, like for following perf
stat command:
# perf stat -e {cycles,instructions} -I 10 -a sleep 1
Monitored with "perf stat -r 5 -e '{cycles:u,cycles:k}'"
Before:
24,325,676 cycles:u
297,040,775 cycles:k
1.038554134 seconds time elapsed
After:
25,034,418 cycles:u
158,256,395 cycles:k
1.036864497 seconds time elapsed
The perf_evsel__open fallback changes contributed by Andi Kleen.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170726120206.9099-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add perf_evsel__read_counter() to read single or group counter. After
calling this function the counter's evsel::counts struct is filled with
values for the counter and member of its group if there are any.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170726120206.9099-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>