License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information in it,
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information.
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side-by-side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging were:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0                                               11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note", otherwise it was "GPL-2.0". Results of that were:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier                             # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note                         270
GPL-2.0+ WITH Linux-syscall-note                        169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)      21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)      17
LGPL-2.1+ WITH Linux-syscall-note                        15
GPL-1.0+ WITH Linux-syscall-note                         14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)      5
LGPL-2.0+ WITH Linux-syscall-note                         4
LGPL-2.1 WITH Linux-syscall-note                          3
((GPL-2.0 WITH Linux-syscall-note) OR MIT)                3
((GPL-2.0 WITH Linux-syscall-note) AND MIT)               1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
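As a rough illustration, the automatic part of the rules above fits in one C function. This is a hypothetical sketch, not the tooling actually used. `struct scan_result` and `spdx_for_file()` are invented names, and the `*/uapi/*` test is a plain substring match. The sketch also leaves out the rule that adds Linux-syscall-note to GPL-family licenses already present in uapi files. A NULL return stands for "flag the file for manual inspection".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of the two scanner results for one file. */
struct scan_result {
	const char *path;
	const char *scancode;	/* license found by ScanCode, NULL if none */
	const char *windriver;	/* license found by Windriver, NULL if none */
};

/*
 * Returns the SPDX identifier to apply automatically, or NULL when the
 * scanners disagree and a human has to look at the file.
 */
static const char *spdx_for_file(const struct scan_result *r)
{
	bool uapi = strstr(r->path, "/uapi/") != NULL;

	/* Neither scanner found anything: the top-level COPYING license applies. */
	if (!r->scancode && !r->windriver)
		return uapi ? "GPL-2.0 WITH Linux-syscall-note" : "GPL-2.0";

	/* Both scanners agree: that becomes the concluded license. */
	if (r->scancode && r->windriver && strcmp(r->scancode, r->windriver) == 0)
		return r->scancode;

	/* One found a license and the other did not, or they differ. */
	return NULL;
}
```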
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were any new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifiers in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types). Finally, Greg ran the script using the .csv files to
generate the patches.
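The comment style the script had to choose per file type follows the kernel's license rules document. C sources take a `//` comment, like the SPDX line at the top of machine.c that follows. Headers and assembly take `/* */`, and scripts take `#`. Here is a minimal sketch of that dispatch. The `spdx_tag_line()` helper is hypothetical and looks only at the file extension. Per the text above, the real script detected more file types.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the SPDX tag line for `path` into buf, using the comment style
 * expected for that file type. Unknown types return -1 so that a human
 * can decide.
 */
static int spdx_tag_line(char *buf, size_t len, const char *path, const char *id)
{
	const char *ext = strrchr(path, '.');

	if (ext && strcmp(ext, ".c") == 0)
		return snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
	if (ext && (strcmp(ext, ".h") == 0 || strcmp(ext, ".S") == 0))
		return snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	if (ext && (strcmp(ext, ".sh") == 0 || strcmp(ext, ".py") == 0))
		return snprintf(buf, len, "# SPDX-License-Identifier: %s", id);
	return -1;
}
```

Files that start with a `#!` line need the tag on the second line instead. This sketch only builds the tag text and leaves its placement to the caller.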
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 21:07:57 +07:00
// SPDX-License-Identifier: GPL-2.0
2017-04-18 22:26:44 +07:00
#include <dirent.h>
2017-04-18 20:46:11 +07:00
#include <errno.h>
2017-04-18 01:23:08 +07:00
#include <inttypes.h>
2017-04-18 22:33:30 +07:00
#include <regex.h>
2019-08-31 00:45:20 +07:00
#include <stdlib.h>
2012-12-08 03:39:39 +07:00
#include "callchain.h"
2012-10-07 02:26:02 +07:00
#include "debug.h"
2019-08-30 21:11:01 +07:00
#include "dso.h"
2019-08-31 00:45:20 +07:00
#include "env.h"
2012-10-07 02:26:02 +07:00
#include "event.h"
2012-12-08 03:39:39 +07:00
#include "evsel.h"
#include "hist.h"
2012-10-07 01:43:20 +07:00
#include "machine.h"
#include "map.h"
2019-08-31 01:09:54 +07:00
#include "map_symbol.h"
#include "branch.h"
#include "mem-events.h"
2019-08-23 03:10:08 +07:00
#include "srcline.h"
2019-01-28 06:03:34 +07:00
#include "symbol.h"
2012-12-08 03:39:39 +07:00
#include "sort.h"
2012-11-09 21:32:52 +07:00
#include "strlist.h"
2019-08-23 01:40:29 +07:00
#include "target.h"
2012-10-07 01:43:20 +07:00
#include "thread.h"
2019-08-23 03:10:08 +07:00
#include "util.h"
2014-07-23 18:23:00 +07:00
#include "vdso.h"
2012-10-07 01:43:20 +07:00
#include <stdbool.h>
2017-04-20 06:57:47 +07:00
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
2012-12-08 03:39:39 +07:00
#include "unwind.h"
perf callchain: Support handling complete branch stacks as histograms
Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.
This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.
This way in complex functions we can understand the control flow that
led to a particular sample, and may even see some control flow in the
caller for short functions.
Example (simplified; for such simple code this is of course usually not
needed). Please run this after the whole patchkit is in: at this point
in the patch order there is no --branch-history yet, as it is added in a
later patch:
tcall.c:
volatile int a = 10000, b = 100000, c;
__attribute__((noinline)) void f2(void)
{
	c = a / b;
}
__attribute__((noinline)) void f1(void)
{
	f2();
	f2();
}
int main(void)
{
	int i;
	for (i = 0; i < 1000000; i++)
		f1();
}
% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
54.91% tcall.c:6 [.] f2 tcall
|
|--65.53%-- f2 tcall.c:5
| |
| |--70.83%-- f1 tcall.c:11
| | f1 tcall.c:10
| | main tcall.c:18
| | main tcall.c:18
| | main tcall.c:17
| | main tcall.c:17
| | f1 tcall.c:13
| | f1 tcall.c:13
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:12
| | f1 tcall.c:12
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:11
| |
| --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12
The default output is unchanged.
This is only implemented in perf report, no change to record or anywhere
else.
This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.
The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.
- detect overlaps with the callchain
- remove small loop duplicates in the LBR
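The "remove small loop duplicates" step can be illustrated with a simplified sketch. It works on a plain array of branch addresses and deletes any run that immediately repeats the run before it, so a tight loop recorded several times in the LBR collapses to a single iteration. `remove_loops_sketch()` is a hypothetical name. The version in machine.c appears to use a small hash table keyed on the address (hence the linux/hash.h include) rather than this quadratic scan.

```c
#include <stdint.h>
#include <string.h>

/*
 * Collapse immediately repeated runs in an address trace: whenever the
 * run a[j..i) is followed by an identical run a[i..i+len) (len = i - j),
 * the second copy is deleted. Returns the new number of entries.
 */
static int remove_loops_sketch(uint64_t *a, int nr)
{
	for (int i = 1; i < nr; i++) {
		for (int j = 0; j < i; j++) {
			int len = i - j;

			if (a[j] != a[i] || i + len > nr)
				continue;
			if (memcmp(a + j, a + i, len * sizeof(*a)) == 0) {
				/* Shift the tail left over the duplicate run. */
				memmove(a + i, a + i + len,
					(nr - i - len) * sizeof(*a));
				nr -= len;
				i--;	/* re-check the entry now at index i */
				break;
			}
		}
	}
	return nr;
}
```

For example, the trace 0x10, 0x20, 0x10, 0x20, 0x10, 0x20, 0x30 shrinks to 0x10, 0x20, 0x30.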
Current limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)
v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
patch. Skip initial entries in callchain. Minor cleanups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-11-13 09:05:20 +07:00
#include "linux/hash.h"
perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info
Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
by the kernel when fork, clone, setns or unshare are invoked. And update
perf-record documentation with the new option to record namespace
events.
Committer notes:
Combined it with a later patch to allow printing it via 'perf report -D'
and be able to test the feature introduced in this patch. Had to move
here also perf_ns__name(), that was introduced in another later patch.
Also used PRIu64 and PRIx64 to fix the build in some environments wrt:
util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
ret += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
^
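The portability issue behind that error is that `%lx` is only correct when the argument is `unsigned long`. On 32-bit targets a 64-bit integer is `unsigned long long`, so a hard-coded length modifier breaks somewhere. For `uint64_t`, `<inttypes.h>` provides PRIu64/PRIx64, which expand to the matching modifier on every platform. Here is a minimal sketch with a hypothetical `format_ns_link()` helper. It produces the `0/net: 3/0xf0000081` form seen in the 'perf report -D' output that follows.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format one namespace link as "<idx>/<name>: <dev>/0x<ino>".
 * PRIu64/PRIx64 select the right length modifier for uint64_t on both
 * 32-bit and 64-bit targets, unlike a hard-coded "%lu"/"%lx".
 */
static int format_ns_link(char *buf, size_t len, unsigned int idx,
			  const char *name, uint64_t dev, uint64_t ino)
{
	return snprintf(buf, len, "%u/%s: %" PRIu64 "/0x%" PRIx64,
			idx, name, dev, ino);
}
```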
Testing it:
# perf record --namespaces -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
#
# perf report -D
<SNIP>
3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
[0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
0x1151e0 [0x30]: event: 9
.
. ... raw event: size 48 bytes
. 0000: 09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00 ......0..q.h....
. 0010: a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00 .9...9...(.c....
. 0020: 03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00 ................
<SNIP>
NAMESPACES events: 1
<SNIP>
#
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-08 03:41:43 +07:00
#include "asm/bug.h"
perf tools: Handle PERF_RECORD_BPF_EVENT
This patch adds basic handling of PERF_RECORD_BPF_EVENT. Tracking of
PERF_RECORD_BPF_EVENT is OFF by default. Option --bpf-event is added to
turn it on.
Committer notes:
Add dummy machine__process_bpf_event() variant that returns zero for
systems without HAVE_LIBBPF_SUPPORT, such as Alpine Linux, unbreaking
the build in such systems.
Remove the needless include <machine.h> from bpf-event.h, provide just
forward declarations for the structs and unions in the parameters, to
reduce compilation time and needless rebuilds when machine.h gets
changed.
Committer testing:
When running with:
# perf record --bpf-event
On an older kernel where PERF_RECORD_BPF_EVENT and PERF_RECORD_KSYMBOL
are not present, we fall back to removing those two bits from
perf_event_attr, allowing the tool to continue to work on older kernels:
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
read_format ID
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
precise_ip 3
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid 5779 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -22
switching off bpf_event
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
read_format ID
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
precise_ip 3
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
------------------------------------------------------------
sys_perf_event_open: pid 5779 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -22
switching off ksymbol
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
read_format ID
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
precise_ip 3
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
------------------------------------------------------------
And then proceeds to work without those two features.
As passing --bpf-event is an explicit action performed by the user, perhaps we
should emit a warning saying that the kernel has no such feature, but this can
be done on top of this patch.
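The fallback in the log above works in stages. It drops bpf_event, retries, then drops ksymbol and retries again. This can be sketched as a loop that clears the newest optional feature each time the kernel answers -EINVAL (the "error -22" lines). Everything here is hypothetical. `struct attr_sketch` stands in for the two perf_event_attr bits. `fake_open_old_kernel()` simulates a kernel that knows neither feature, in place of a real sys_perf_event_open() call.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two optional perf_event_attr bits. */
struct attr_sketch {
	bool ksymbol;
	bool bpf_event;
};

/*
 * Simulates sys_perf_event_open() on an older kernel: any bit it does not
 * know about makes it fail with -EINVAL; otherwise it returns a pretend fd.
 */
static int fake_open_old_kernel(const struct attr_sketch *attr)
{
	if (attr->ksymbol || attr->bpf_event)
		return -EINVAL;
	return 3;
}

/* Retry, switching off the newest optional feature after each -EINVAL. */
static int open_with_fallback(struct attr_sketch *attr)
{
	for (;;) {
		int fd = fake_open_old_kernel(attr);

		if (fd != -EINVAL)
			return fd;
		if (attr->bpf_event)
			attr->bpf_event = false;	/* "switching off bpf_event" */
		else if (attr->ksymbol)
			attr->ksymbol = false;		/* "switching off ksymbol" */
		else
			return fd;			/* nothing left to drop */
	}
}
```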
Now with a kernel that supports these events, start the 'record --bpf-event -a'
and then run 'perf trace sleep 10000' that will use the BPF
augmented_raw_syscalls.o prebuilt (for another kernel version even) and thus
should generate PERF_RECORD_BPF_EVENT events:
[root@quaco ~]# perf record -e dummy -a --bpf-event
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.713 MB perf.data ]
[root@quaco ~]# bpftool prog
13: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 13,14
14: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 13,14
15: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 15,16
16: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 15,16
17: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:44-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 17,18
18: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:44-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 17,18
21: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:45-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 21,22
22: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:45-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 21,22
31: tracepoint name sys_enter tag 12504ba9402f952f gpl
loaded_at 2019-01-19T09:19:56-0300 uid 0
xlated 512B jited 374B memlock 4096B map_ids 30,29,28
32: tracepoint name sys_exit tag c1bd85c092d6e4aa gpl
loaded_at 2019-01-19T09:19:56-0300 uid 0
xlated 256B jited 191B memlock 4096B map_ids 30,29
# perf report -D | grep PERF_RECORD_BPF_EVENT | nl
1 0 55834574849 0x4fc8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 13
2 0 60129542145 0x5118 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 14
3 0 64424509441 0x5268 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 15
4 0 68719476737 0x53b8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 16
5 0 73014444033 0x5508 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 17
6 0 77309411329 0x5658 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 18
7 0 90194313217 0x57a8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 21
8 0 94489280513 0x58f8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 22
9 7 620922484360 0xb6390 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 29
10 7 620922486018 0xb6410 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 29
11 7 620922579199 0xb6490 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 30
12 7 620922580240 0xb6510 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 30
13 7 620922765207 0xb6598 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 31
14 7 620922874543 0xb6620 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 32
#
There they are: ids 31 and 32, the tracepoint BPF programs put in place by 'perf trace'.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20190117161521.1341602-7-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-17 23:15:18 +07:00
#include "bpf-event.h"
2019-08-06 20:25:25 +07:00
#include <internal/lib.h> // page_size
2012-10-07 01:43:20 +07:00

tools perf: Move from sane_ctype.h obtained from git to the Linux's original
We got the sane_ctype.h headers from git and kept using it so far, but
since that code originally came from the kernel sources to the git
sources, perhaps it's better to just use the one in the kernel, so that
we can leverage tools/perf/check_headers.sh to be notified when our copy
gets out of sync, i.e. when fixes or goodies are added to the code we've
copied.
This will help with things like tools/lib/string.c where we want to have
more things in common with the kernel, such as strim(), skip_spaces(),
etc so as to go on removing the things that we have in tools/perf/util/
and instead using the code in the kernel, indirectly and removing things
like EXPORT_SYMBOL(), etc, getting notified when fixes and improvements
are made to the original code.
Hopefully this also should help with reducing the difference of code
hosted in tools/ to the one in the kernel proper.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-7k9868l713wqtgo01xxygn12@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-06-26 03:27:31 +07:00
#include <linux/ctype.h>
2017-04-18 02:10:49 +07:00
#include <symbol/kallsyms.h>
2018-04-26 21:30:50 +07:00
#include <linux/mman.h>
2019-08-30 02:18:59 +07:00
#include <linux/string.h>
2019-07-04 21:32:27 +07:00
#include <linux/zalloc.h>
2017-04-18 02:10:49 +07:00

perf machine: Protect the machine->threads with a rwlock
In addition to using refcounts for the struct thread lifetime
management, we need to protect access to machine->threads from
concurrent access.
That happens in 'perf top', where a thread processes events, inserting
and deleting entries from that rb_tree while another thread decays
hist_entries, that end up dropping references and ultimately deleting
threads from the rb_tree and releasing its resources when no further
hist_entry (or other data structures, like in 'perf sched') references
it.
So the rule is the same as for refcounts + protected trees in the kernel:
get the tree lock, find object, bump the refcount, drop the tree lock,
return, use object, drop the refcount if no more use of it is needed,
keep it if storing it in some other data structure, drop when releasing
that data structure.
I.e. pair "t = machine__find(new)_thread()" with a "thread__put(t)", and
"perf_event__preprocess_sample(&al)" with "addr_location__put(&al)".
The addr_location__put() one is because as we return references to
several data structures, we may end up adding more reference counting
for the other data structures and then we'll drop it at
addr_location__put() time.
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-bs9rt4n0jw3hi9f3zxyy3xln@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-07 06:43:22 +07:00
static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock);

2019-11-01 01:22:24 +07:00
static struct dso *machine__kernel_dso(struct machine *machine)
{
	return machine->vmlinux_map->dso;
}

perf machine: Add missing dsos->root rbtree root initialization
A segfault happens on 'perf test hists_link' because we end up using a
struct machines on the stack, and then machines__init() was not
initializing the newly introduced rb_root, just the existing list_head.
When we introduced struct dsos, to group the two ways to store dsos,
i.e. the linked list and the rbtree, we didn't turn the initialization
done in:
machines__init(machines->host) ->
machine__init() ->
INIT_LIST_HEAD
into a dsos__init() to keep on initializing the list_head but _as well_
initializing the rb_root, oops.
All worked because outside perf-test we probably zalloc the whole thing
which ends up initializing it to NULL.
So the problem looks contained to 'perf test' that uses it on stack,
etc.
Reported-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Waiman Long <Waiman.Long@hp.com>,
Cc: Adrian Hunter <adrian.hunter@intel.com>,
Cc: Don Zickus <dzickus@redhat.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Waiman Long <Waiman.Long@hp.com>,
Link: http://lkml.kernel.org/r/20141014180353.GF3198@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-10-15 01:07:48 +07:00
static void dsos__init(struct dsos *dsos)
{
	INIT_LIST_HEAD(&dsos->head);
	dsos->root = RB_ROOT;
2017-04-04 23:15:04 +07:00
	init_rwsem(&dsos->lock);
perf machine: Add missing dsos->root rbtree root initialization
2014-10-15 01:07:48 +07:00
}

2017-09-11 09:23:14 +07:00
static void machine__threads_init(struct machine *machine)
{
	int i;

	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
		struct threads *threads = &machine->threads[i];
2018-12-07 02:18:14 +07:00
		threads->entries = RB_ROOT_CACHED;
2017-04-04 23:15:04 +07:00
		init_rwsem(&threads->lock);
2017-09-11 09:23:14 +07:00
		threads->nr = 0;
		INIT_LIST_HEAD(&threads->dead);
		threads->last_match = NULL;
	}
}

2018-02-15 19:26:30 +07:00
static int machine__set_mmap_name(struct machine *machine)
{
2018-03-12 22:24:06 +07:00
	if (machine__is_host(machine))
		machine->mmap_name = strdup("[kernel.kallsyms]");
	else if (machine__is_default_guest(machine))
		machine->mmap_name = strdup("[guest.kernel.kallsyms]");
	else if (asprintf(&machine->mmap_name, "[guest.kernel.kallsyms.%d]",
			  machine->pid) < 0)
		machine->mmap_name = NULL;
2018-02-15 19:26:30 +07:00

	return machine->mmap_name ? 0 : -ENOMEM;
}

2012-11-09 21:32:52 +07:00
int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
{
2018-02-15 19:26:29 +07:00
	int err = -ENOMEM;

2015-12-08 09:25:44 +07:00
	memset(machine, 0, sizeof(*machine));
2014-10-22 03:29:02 +07:00
	map_groups__init(&machine->kmaps, machine);
2012-11-09 21:32:52 +07:00
	RB_CLEAR_NODE(&machine->rb_node);
2015-05-28 23:06:42 +07:00
	dsos__init(&machine->dsos);
2012-11-09 21:32:52 +07:00

2017-09-11 09:23:14 +07:00
	machine__threads_init(machine);
2012-11-09 21:32:52 +07:00

2014-07-23 18:23:00 +07:00
	machine->vdso_info = NULL;
2015-09-09 22:25:00 +07:00
	machine->env = NULL;
2014-07-23 18:23:00 +07:00

2012-11-09 21:32:52 +07:00
	machine->pid = pid;

2014-01-07 19:47:19 +07:00
	machine->id_hdr_size = 0;
2016-05-17 21:56:24 +07:00
	machine->kptr_restrict_warned = false;
2014-07-31 13:00:45 +07:00
	machine->comm_exec = false;
2014-08-16 02:08:39 +07:00
	machine->kernel_start = 0;
2018-04-27 02:52:34 +07:00
	machine->vmlinux_map = NULL;
2015-12-09 09:11:33 +07:00

2012-11-09 21:32:52 +07:00
	machine->root_dir = strdup(root_dir);
	if (machine->root_dir == NULL)
		return -ENOMEM;

2018-02-15 19:26:30 +07:00
	if (machine__set_mmap_name(machine))
		goto out;

2012-11-09 21:32:52 +07:00
	if (pid != HOST_KERNEL_ID) {
2014-07-14 17:02:25 +07:00
		struct thread *thread = machine__findnew_thread(machine, -1,
2013-08-27 15:23:03 +07:00
								pid);
2012-11-09 21:32:52 +07:00
		char comm[64];

		if (thread == NULL)
2018-02-15 19:26:29 +07:00
			goto out;
2012-11-09 21:32:52 +07:00

		snprintf(comm, sizeof(comm), "[guest/%d]", pid);
2013-09-11 21:18:24 +07:00
		thread__set_comm(thread, comm, 0);
perf machine: Protect the machine->threads with a rwlock
2015-04-07 06:43:22 +07:00
		thread__put(thread);
2012-11-09 21:32:52 +07:00
	}

2014-07-22 20:17:25 +07:00
	machine->current_tid = NULL;
2018-02-15 19:26:29 +07:00
	err = 0;
2014-07-22 20:17:25 +07:00

2018-02-15 19:26:29 +07:00
out:
2018-02-15 19:26:30 +07:00
	if (err) {
2018-02-15 19:26:29 +07:00
		zfree(&machine->root_dir);
2018-02-15 19:26:30 +07:00
		zfree(&machine->mmap_name);
	}
2012-11-09 21:32:52 +07:00
	return 0;
}

2013-09-29 02:13:00 +07:00
struct machine *machine__new_host(void)
{
	struct machine *machine = malloc(sizeof(*machine));

	if (machine != NULL) {
		machine__init(machine, "", HOST_KERNEL_ID);

		if (machine__create_kernel_maps(machine) < 0)
			goto out_delete;
	}

	return machine;
out_delete:
	free(machine);
	return NULL;
}

2017-01-06 01:31:48 +07:00
struct machine *machine__new_kallsyms(void)
{
	struct machine *machine = machine__new_host();
	/*
	 * FIXME:
2018-12-03 17:22:00 +07:00
	 * 1) We should switch to machine__load_kallsyms(), i.e. not explicitly
2017-01-06 01:31:48 +07:00
	 *    ask for not using the kcore parsing code, once this one is fixed
	 *    to create a map per module.
	 */
2018-04-25 21:40:32 +07:00
	if (machine && machine__load_kallsyms(machine, "/proc/kallsyms") <= 0) {
2017-01-06 01:31:48 +07:00
		machine__delete(machine);
		machine = NULL;
	}

	return machine;
}

2015-06-02 21:53:26 +07:00
static void dsos__purge(struct dsos *dsos)
2012-11-09 21:32:52 +07:00
{
	struct dso *pos, *n;

2017-04-04 23:15:04 +07:00
	down_write(&dsos->lock);
2015-06-02 01:40:01 +07:00

2014-09-30 03:07:28 +07:00
	list_for_each_entry_safe(pos, n, &dsos->head, node) {
2014-10-01 00:36:15 +07:00
		RB_CLEAR_NODE(&pos->rb_node);
2015-11-13 16:48:30 +07:00
		pos->root = NULL;
2015-06-02 21:53:26 +07:00
		list_del_init(&pos->node);
		dso__put(pos);
2012-11-09 21:32:52 +07:00
	}
2015-06-02 01:40:01 +07:00

2017-04-04 23:15:04 +07:00
	up_write(&dsos->lock);
2015-06-02 21:53:26 +07:00
}
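dsos__purge() walks the DSO list with list_for_each_entry_safe() because dso__put() may free the entry currently being visited. The same idea in a standalone sketch: the successor is read before the current node is released. `struct node`, `node_put()`, `push()` and `list_purge()` are hypothetical names, a plain singly linked list replaces the kernel-style list_head, and the `freed` counter exists only to make the effect observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted list node standing in for struct dso. */
struct node {
	int refcnt;
	struct node *next;
};

static int freed;	/* counts frees so the effect can be checked */

static void node_put(struct node *n)
{
	if (n && --n->refcnt == 0) {
		free(n);
		freed++;
	}
}

/* Push a new node at the head; its single reference belongs to the list. */
static struct node *push(struct node *head)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		abort();
	n->refcnt = 1;
	n->next = head;
	return n;
}

/*
 * Drop the list's reference on every node. The successor is saved
 * before node_put() may free the current node, which is the job of
 * the extra cursor in list_for_each_entry_safe().
 */
static void list_purge(struct node **head)
{
	struct node *pos, *n;

	for (pos = *head; pos; pos = n) {
		n = pos->next;
		pos->next = NULL;	/* detach, in the spirit of list_del_init() */
		node_put(pos);
	}
	*head = NULL;
}
```

A node that somebody else still references survives the purge with its link cleared and is freed only by that holder's final put.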
2015-06-02 01:40:01 +07:00

2015-06-02 21:53:26 +07:00
static void dsos__exit(struct dsos *dsos)
{
	dsos__purge(dsos);
2017-04-04 23:15:04 +07:00
	exit_rwsem(&dsos->lock);
2012-11-09 21:32:52 +07:00
}

2012-12-08 03:39:39 +07:00
void machine__delete_threads(struct machine *machine)
{
2015-04-07 06:43:22 +07:00
	struct rb_node *nd;
2017-09-11 09:23:14 +07:00
	int i;
2012-12-08 03:39:39 +07:00

2017-09-11 09:23:14 +07:00
	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
		struct threads *threads = &machine->threads[i];
2017-04-04 23:15:04 +07:00
		down_write(&threads->lock);
2018-12-07 02:18:14 +07:00
		nd = rb_first_cached(&threads->entries);
2017-09-11 09:23:14 +07:00
		while (nd) {
			struct thread *t = rb_entry(nd, struct thread, rb_node);
2012-12-08 03:39:39 +07:00

2017-09-11 09:23:14 +07:00
			nd = rb_next(nd);
			__machine__remove_thread(machine, t, false);
		}
2017-04-04 23:15:04 +07:00
		up_write(&threads->lock);
2012-12-08 03:39:39 +07:00
	}
}

2012-11-09 21:32:52 +07:00
void machine__exit(struct machine *machine)
{
2017-09-11 09:23:14 +07:00
	int i;

2017-11-14 02:06:29 +07:00
	if (machine == NULL)
		return;

2015-11-18 13:40:24 +07:00
	machine__destroy_kernel_maps(machine);
2012-11-09 21:32:52 +07:00
	map_groups__exit(&machine->kmaps);
2015-06-02 01:40:01 +07:00
	dsos__exit(&machine->dsos);
perf machine: Fix up vdso methods names
To make it consistent with the other dso lifetime routines.
For instance:
struct dso *vdso__new(struct machine *machine, const char *short_name,
const char *long_name)
Becomes:
struct dso *machine__addnew_vdso(struct machine *machine, const
char *short_name, const char *long_name)
Because:
1) There is no 'struct vdso' for us to have vdso__ prefixed routines.
2) Because it will not really just create a new instance of 'struct
dso', it'll call dso__new() but it will also insert it into the
DSO's list/rbtree, and we have a method name for that: 'addnew',
just like we have dsos__addnew().
3) So it is really a 'struct machine' operation, it is the first
argument, etc.
This way the place where this is used gets consistent:
if (vdso) {
pgoff = 0;
- dso = vdso__dso_findnew(machine, thread);
+ dso = machine__findnew_vdso(machine, thread);
} else
dso = machine__findnew_dso(machine, filename);
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-r3w3tvh8exm9xfz3p4tz9qbz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-29 21:54:08 +07:00
	machine__exit_vdso(machine);
2013-12-27 03:41:15 +07:00
	zfree(&machine->root_dir);
2018-02-15 19:26:30 +07:00
	zfree(&machine->mmap_name);
2014-07-22 20:17:25 +07:00
	zfree(&machine->current_tid);
2017-09-11 09:23:14 +07:00

	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
		struct threads *threads = &machine->threads[i];
perf thread: Allow references to thread objects after machine__exit()
Threads are created when we either synthesize PERF_RECORD_FORK events
for pre-existing threads or when we receive PERF_RECORD_FORK events from
the kernel as new threads get created.
We then keep them in machine->threads[].entries rb trees till when we
receive a PERF_RECORD_EXIT, i.e. that thread terminated.
The thread object has a reference count that is grabbed when, for
instance, we keep that thread referenced in struct hist_entry, in 'perf
report' and 'perf top'.
When we receive a PERF_RECORD_EXIT we remove the thread object from the
rb tree and move it to the corresponding machine->threads[].dead list,
then we do a thread__put(), dropping the reference we had for keeping it
in the rb tree.
In thread__put() we were assuming that when the reference count hit zero
we should remove it from the dead list by simply doing a
list_del_init(&thread->node).
That works well when all the thread lifetime is during the machine that
has the list heads lifetime, since we know that we can do the
list_del_init() and it will update the 'dead' list_head.
But in 'perf sched lat' we were doing:
machine__new() (via perf_session__new)
process events, grabbing refcounts to keep those thread objects
in 'perf sched' local data structures.
machine__exit() (via perf_session__delete) which would delete the
'dead' list heads.
And then doing the final thread__put() for the refcounts 'perf sched'
rightfully obtained for keeping those thread object references.
b00m, since thread__put() would do the list_del_init() touching
a dead dead list head.
Fix it by removing all the dead threads from machine->threads[].dead at
machine__exit(), since whatever is there should have refcounts taken by
things like 'perf sched lat', and make thread__put() check if the thread
is in a linked list before removing it from that list.
Reported-by: Wei Li <liwei391@huawei.com>
Link: https://lkml.kernel.org/r/20190508143648.8153-1-liwei391@huawei.com
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhipeng Xie <xiezhipeng1@huawei.com>
Link: https://lkml.kernel.org/r/20190704194355.GI10740@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-05 22:11:35 +07:00
		struct thread *thread, *n;
		/*
		 * Forget about the dead, at this point whatever threads were
		 * left in the dead lists better have a reference count taken
		 * by who is using them, and then, when they drop those references
		 * and it finally hits zero, thread__put() will check and see that
		 * it's not in the dead threads list and will not try to remove it
		 * from there, just calling thread__delete() straight away.
		 */
		list_for_each_entry_safe(thread, n, &threads->dead, node)
			list_del_init(&thread->node);

2017-04-04 23:15:04 +07:00
		exit_rwsem(&threads->lock);
2017-09-11 09:23:14 +07:00
	}
2012-11-09 21:32:52 +07:00
}
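The "Allow references to thread objects after machine__exit()" commit message above describes two halves of one fix: machine__exit() unlinks every thread left on the dead lists, and thread__put() only removes a thread from a list if it is still on one. A minimal sketch of that interplay, using a hand-written circular doubly linked list in the style of list.h; `struct thread` here is a hypothetical two-field stand-in, and `thread_put()`/`dead_list_forget()` are illustrative names, not perf functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal circular doubly linked list (own code, list.h style). */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

/* Unlink @n from its neighbours and make it point at itself. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical thread object: just a list node and a refcount. */
struct thread {
	struct list_head node;
	int refcnt;
};

static int deleted;

static void thread_put(struct thread *t)
{
	if (--t->refcnt == 0) {
		/* Only unlink if still linked: the list head may be gone. */
		if (!list_empty(&t->node))
			list_del_init(&t->node);
		free(t);
		deleted++;
	}
}

/* Detach every survivor so later puts never touch the owner's head. */
static void dead_list_forget(struct list_head *dead)
{
	while (!list_empty(dead))
		list_del_init(dead->next);
}
```

In this model the owner can free the dead-list head right after dead_list_forget(), and a user that still holds a reference (the 'perf sched lat' case in the commit message) can call thread_put() later without touching freed memory.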

void machine__delete(struct machine *machine)
{
2016-06-22 20:19:11 +07:00
	if (machine) {
		machine__exit(machine);
		free(machine);
	}
2012-11-09 21:32:52 +07:00
}

2012-12-19 05:15:48 +07:00
void machines__init(struct machines *machines)
{
	machine__init(&machines->host, "", HOST_KERNEL_ID);
2018-12-07 02:18:14 +07:00
	machines->guests = RB_ROOT_CACHED;
2012-12-19 05:15:48 +07:00
}

void machines__exit(struct machines *machines)
{
	machine__exit(&machines->host);
	/* XXX exit guest */
}

struct machine *machines__add(struct machines *machines, pid_t pid,
2012-11-09 21:32:52 +07:00
			      const char *root_dir)
{
2018-12-07 02:18:14 +07:00
	struct rb_node **p = &machines->guests.rb_root.rb_node;
2012-11-09 21:32:52 +07:00
	struct rb_node *parent = NULL;
	struct machine *pos, *machine = malloc(sizeof(*machine));
2018-12-07 02:18:14 +07:00
	bool leftmost = true;
2012-11-09 21:32:52 +07:00

	if (machine == NULL)
		return NULL;

	if (machine__init(machine, root_dir, pid) != 0) {
		free(machine);
		return NULL;
	}

	while (*p != NULL) {
		parent = *p;
		pos = rb_entry(parent, struct machine, rb_node);
		if (pid < pos->pid)
			p = &(*p)->rb_left;
2018-12-07 02:18:14 +07:00
		else {
2012-11-09 21:32:52 +07:00
			p = &(*p)->rb_right;
2018-12-07 02:18:14 +07:00
			leftmost = false;
		}
2012-11-09 21:32:52 +07:00
	}

	rb_link_node(&machine->rb_node, parent, p);
2018-12-07 02:18:14 +07:00
	rb_insert_color_cached(&machine->rb_node, &machines->guests, leftmost);
2012-11-09 21:32:52 +07:00

	return machine;
}
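machines__add() keeps a cached leftmost node (rb_root_cached) so that rb_first_cached() needs no tree walk: the new node is the minimum exactly when the descent never turned right. The sketch below shows only that bookkeeping on an unbalanced binary search tree. It deliberately omits the red-black recoloring that rb_insert_color_cached() performs, and `struct mnode`, `struct mtree` and `mtree_insert()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node keyed by pid (no colour bits: no rebalancing here). */
struct mnode {
	int pid;
	struct mnode *left, *right;
};

struct mtree {
	struct mnode *root;
	struct mnode *leftmost;	/* cached minimum, like rb_root_cached */
};

/* Equal pids go right, matching the else branch in machines__add(). */
static void mtree_insert(struct mtree *t, struct mnode *n)
{
	struct mnode **p = &t->root;
	int leftmost = 1;

	n->left = n->right = NULL;
	while (*p) {
		if (n->pid < (*p)->pid) {
			p = &(*p)->left;
		} else {
			p = &(*p)->right;
			leftmost = 0;	/* a smaller-or-equal node exists */
		}
	}
	*p = n;
	if (leftmost)
		t->leftmost = n;
}
```

Reading the minimum is then just `t->leftmost`, and a node with a duplicate pid never displaces the cached leftmost entry because it always descends to the right of its equal.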

2014-07-31 13:00:45 +07:00
void machines__set_comm_exec(struct machines *machines, bool comm_exec)
{
	struct rb_node *nd;

	machines->host.comm_exec = comm_exec;

2018-12-07 02:18:14 +07:00
	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
2014-07-31 13:00:45 +07:00
		struct machine *machine = rb_entry(nd, struct machine, rb_node);

		machine->comm_exec = comm_exec;
	}
}

2012-12-19 05:15:48 +07:00
struct machine *machines__find(struct machines *machines, pid_t pid)
2012-11-09 21:32:52 +07:00
{
2018-12-07 02:18:14 +07:00
	struct rb_node **p = &machines->guests.rb_root.rb_node;
2012-11-09 21:32:52 +07:00
	struct rb_node *parent = NULL;
	struct machine *machine;
	struct machine *default_machine = NULL;

2012-12-19 05:15:48 +07:00
	if (pid == HOST_KERNEL_ID)
		return &machines->host;

2012-11-09 21:32:52 +07:00
	while (*p != NULL) {
		parent = *p;
		machine = rb_entry(parent, struct machine, rb_node);
		if (pid < machine->pid)
			p = &(*p)->rb_left;
		else if (pid > machine->pid)
			p = &(*p)->rb_right;
		else
			return machine;
		if (!machine->pid)
			default_machine = machine;
	}

	return default_machine;
}
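machines__find() returns the guest machine whose pid matches exactly, otherwise a machine with pid 0 if one was visited while descending. A standalone sketch of the same search on a plain binary search tree (`struct mnode` and `mtree_find()` are hypothetical). As written, the fallback only applies when the pid-0 node lies on the search path for the requested pid: in a tree rooted at pid 3 with children 0 and 10, a lookup for pid 5 turns right at the root, never visits the pid-0 node, and returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical guest-machine node keyed by pid. */
struct mnode {
	int pid;
	struct mnode *left, *right;
};

/*
 * Return the node whose pid matches exactly; failing that, the last
 * pid-0 node seen while descending, mirroring machines__find().
 */
static struct mnode *mtree_find(struct mnode *root, int pid)
{
	struct mnode *n = root, *fallback = NULL;

	while (n) {
		if (pid == n->pid)
			return n;
		if (n->pid == 0)
			fallback = n;
		n = pid < n->pid ? n->left : n->right;
	}
	return fallback;
}
```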

2012-12-19 05:15:48 +07:00
struct machine *machines__findnew(struct machines *machines, pid_t pid)
2012-11-09 21:32:52 +07:00
{
	char path[PATH_MAX];
	const char *root_dir = "";
	struct machine *machine = machines__find(machines, pid);

	if (machine && (machine->pid == pid))
		goto out;

	if ((pid != HOST_KERNEL_ID) &&
	    (pid != DEFAULT_GUEST_KERNEL_ID) &&
	    (symbol_conf.guestmount)) {
		sprintf(path, "%s/%d", symbol_conf.guestmount, pid);
		if (access(path, R_OK)) {
			static struct strlist *seen;

			if (!seen)
2015-07-20 22:13:34 +07:00
				seen = strlist__new(NULL, NULL);
2012-11-09 21:32:52 +07:00

			if (!strlist__has_entry(seen, path)) {
				pr_err("Can't access file %s\n", path);
				strlist__add(seen, path);
			}
			machine = NULL;
			goto out;
		}
		root_dir = path;
	}

	machine = machines__add(machines, pid, root_dir);
out:
	return machine;
}
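When a guest directory under symbol_conf.guestmount is not readable, machines__findnew() reports it through pr_err() only once per path, remembering reported paths in a function-local static strlist. A self-contained sketch of that warn-once pattern; the fixed-size array, `MAX_SEEN` and `warn_once()` are hypothetical replacements for perf's strlist, and like the original it keeps unsynchronized static state, so it is not thread-safe.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SEEN 64	/* hypothetical capacity for this sketch */

/* Paths already reported (static state, no locking). */
static char *seen[MAX_SEEN];
static int nr_seen;

/* Print the error for @path once; returns 1 if printed, 0 if suppressed. */
static int warn_once(const char *path)
{
	for (int i = 0; i < nr_seen; i++)
		if (strcmp(seen[i], path) == 0)
			return 0;

	fprintf(stderr, "Can't access file %s\n", path);
	if (nr_seen < MAX_SEEN) {
		size_t len = strlen(path) + 1;
		char *copy = malloc(len);

		if (copy) {
			memcpy(copy, path, len);
			seen[nr_seen++] = copy;
		}
	}
	return 1;
}
```

If the table is full or the copy fails, the sketch simply prints again on the next call rather than losing the error.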

2012-12-19 05:15:48 +07:00
void machines__process_guests(struct machines *machines,
			      machine__process_t process, void *data)
2012-11-09 21:32:52 +07:00
{
	struct rb_node *nd;

2018-12-07 02:18:14 +07:00
	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
2012-11-09 21:32:52 +07:00
		struct machine *pos = rb_entry(nd, struct machine, rb_node);
		process(pos, data);
	}
}

2012-12-19 05:15:48 +07:00
void machines__set_id_hdr_size(struct machines *machines, u16 id_hdr_size)
2012-11-09 21:32:52 +07:00
{
	struct rb_node *node;
	struct machine *machine;

2012-12-19 05:15:48 +07:00
	machines->host.id_hdr_size = id_hdr_size;

2018-12-07 02:18:14 +07:00
	for (node = rb_first_cached(&machines->guests); node;
	     node = rb_next(node)) {
2012-11-09 21:32:52 +07:00
		machine = rb_entry(node, struct machine, rb_node);
		machine->id_hdr_size = id_hdr_size;
	}

	return;
}

2014-07-16 15:07:13 +07:00
static void machine__update_thread_pid(struct machine *machine,
				       struct thread *th, pid_t pid)
{
	struct thread *leader;

	if (pid == th->pid_ || pid == -1 || th->pid_ != -1)
		return;

	th->pid_ = pid;

	if (th->pid_ == th->tid)
		return;

2015-04-07 06:43:22 +07:00
	leader = __machine__findnew_thread(machine, th->pid_, th->pid_);
2014-07-16 15:07:13 +07:00
	if (!leader)
		goto out_err;

	if (!leader->mg)
2014-10-22 03:29:02 +07:00
		leader->mg = map_groups__new(machine);
2014-07-16 15:07:13 +07:00

	if (!leader->mg)
		goto out_err;

	if (th->mg == leader->mg)
		return;

	if (th->mg) {
		/*
		 * Maps are created from MMAP events which provide the pid and
		 * tid. Consequently there never should be any maps on a thread
		 * with an unknown pid. Just print an error if there are.
		 */
		if (!map_groups__empty(th->mg))
			pr_err("Discarding thread maps for %d:%d\n",
			       th->pid_, th->tid);
2015-05-20 06:07:14 +07:00
		map_groups__put(th->mg);
2014-07-16 15:07:13 +07:00
	}

	th->mg = map_groups__get(leader->mg);
2015-12-12 05:11:23 +07:00
out_put:
	thread__put(leader);
2014-07-16 15:07:13 +07:00
	return;
out_err:
	pr_err("Failed to join map groups for %d:%d\n", th->pid_, th->tid);
2015-12-12 05:11:23 +07:00
	goto out_put;
2014-07-16 15:07:13 +07:00
}
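Once a thread's pid becomes known, machine__update_thread_pid() makes it share its group leader's map_groups: it creates the leader's group on demand, releases the thread's own group with map_groups__put(), and takes a reference on the leader's with map_groups__get(). The refcounting step in isolation, as a sketch with hypothetical `struct mg`, `struct task` and helper names; the leader lookup and the matching thread__put(leader) are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted stand-in for struct map_groups. */
struct mg {
	int refcnt;
};

static int mg_freed;	/* counts frees so the effect can be checked */

static struct mg *mg_new(void)
{
	struct mg *m = calloc(1, sizeof(*m));

	if (m)
		m->refcnt = 1;
	return m;
}

static struct mg *mg_get(struct mg *m)
{
	if (m)
		m->refcnt++;
	return m;
}

static void mg_put(struct mg *m)
{
	if (m && --m->refcnt == 0) {
		free(m);
		mg_freed++;
	}
}

/* Hypothetical thread: only the map-group pointer matters here. */
struct task {
	struct mg *mg;
};

/* Make @th share @leader's map group; returns -1 if it cannot be created. */
static int task_join_leader(struct task *th, struct task *leader)
{
	if (!leader->mg)
		leader->mg = mg_new();
	if (!leader->mg)
		return -1;
	if (th->mg == leader->mg)
		return 0;		/* already shared: no extra reference */
	mg_put(th->mg);			/* drop the thread's private group, if any */
	th->mg = mg_get(leader->mg);	/* hold a reference on the shared one */
	return 0;
}
```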

2015-12-12 05:11:23 +07:00
/*
2018-07-19 21:33:42 +07:00
 * Front-end cache - TID lookups come in blocks,
 * so most of the time we don't have to look up
 * the full rbtree:
2015-12-12 05:11:23 +07:00
 */
2018-07-19 21:33:42 +07:00
static struct thread*
perf machine: Use last_match threads cache only in single thread mode
There's an issue with using threads::last_match in multithread mode
which is enabled during the perf top synthesize. It might crash with
following assertion:
perf: ...include/linux/refcount.h:109: refcount_inc:
Assertion `!(!refcount_inc_not_zero(r))' failed.
The gdb backtrace looks like this:
0x00007ffff50839fb in raise () from /lib64/libc.so.6
(gdb)
#0 0x00007ffff50839fb in raise () from /lib64/libc.so.6
#1 0x00007ffff5085800 in abort () from /lib64/libc.so.6
#2 0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000535ff9 in refcount_inc (r=0x7fffe8009a70)
at ...include/linux/refcount.h:109
#5 0x0000000000536771 in thread__get (thread=0x7fffe8009a40)
at util/thread.c:115
#6 0x0000000000523cd0 in ____machine__findnew_thread (machine=0xbfde38,
threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:432
#7 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
pid=2, tid=2) at util/machine.c:489
#8 0x0000000000523f24 in machine__findnew_thread (machine=0xbfde38,
pid=2, tid=2) at util/machine.c:499
#9 0x0000000000526fbe in machine__process_fork_event (machine=0xbfde38,
...
The failing assertion is this one:
REFCOUNT_WARN(!refcount_inc_not_zero(r), ...
The problem is that we don't serialize access to threads::last_match.
We serialize the access to the threads tree, but we don't care how
threads::last_match is being accessed. Both locked/unlocked paths use
that data and can set it. In multithreaded mode we can end up with
invalid object in thread__get call, like in following paths race:
thread 1
...
machine__findnew_thread
down_write(&threads->lock);
__machine__findnew_thread
____machine__findnew_thread
th = threads->last_match;
if (th->tid == tid) {
thread__get
thread 2
...
machine__find_thread
down_read(&threads->lock);
__machine__findnew_thread
____machine__findnew_thread
th = threads->last_match;
if (th->tid == tid) {
thread__get
thread 3
...
machine__process_fork_event
machine__remove_thread
__machine__remove_thread
threads->last_match = NULL
thread__put
thread__put
Threads 1 and 2 might get a stale last_match before thread 3 clears
it. Threads 1 and 2 then race with thread 3's thread__put and they
might trigger the refcnt == 0 assertion above.
The patch is disabling the last_match cache for multiple thread
mode. It was originally meant for single thread scenarios, where
it's common to have multiple sequential searches of the same
thread.
In multithread mode this does not make sense, because top's threads
process different /proc entries and so the 'struct threads' object
is queried for various threads. Moreover we'd need to add more locks
to make it work.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20180719143345.12963-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-19 21:33:44 +07:00
__threads__get_last_match(struct threads *threads, struct machine *machine,
			  int pid, int tid)
2012-10-07 01:43:20 +07:00
{
	struct thread *th;

2017-09-11 09:23:14 +07:00
	th = threads->last_match;
2015-03-03 08:21:35 +07:00
	if (th != NULL) {
		if (th->tid == tid) {
			machine__update_thread_pid(machine, th, pid);
2015-12-12 05:11:23 +07:00
			return thread__get(th);
2015-03-03 08:21:35 +07:00
		}

2017-09-11 09:23:14 +07:00
		threads->last_match = NULL;
2013-08-26 20:00:19 +07:00
	}
2012-10-07 01:43:20 +07:00

2018-07-19 21:33:42 +07:00
	return NULL;
}

2018-07-19 21:33:44 +07:00
static struct thread*
threads__get_last_match(struct threads *threads, struct machine *machine,
			int pid, int tid)
{
	struct thread *th = NULL;

	if (perf_singlethreaded)
		th = __threads__get_last_match(threads, machine, pid, tid);

	return th;
}

2018-07-19 21:33:43 +07:00
static void
2018-07-19 21:33:44 +07:00
__threads__set_last_match(struct threads *threads, struct thread *th)
2018-07-19 21:33:43 +07:00
{
	threads->last_match = th;
}

perf machine: Use last_match threads cache only in single thread mode
There's an issue with using threads::last_match in multithread mode
which is enabled during the perf top synthesize. It might crash with
following assertion:
perf: ...include/linux/refcount.h:109: refcount_inc:
Assertion `!(!refcount_inc_not_zero(r))' failed.
The gdb backtrace looks like this:
0x00007ffff50839fb in raise () from /lib64/libc.so.6
(gdb)
#0 0x00007ffff50839fb in raise () from /lib64/libc.so.6
#1 0x00007ffff5085800 in abort () from /lib64/libc.so.6
#2 0x00007ffff507c0da in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007ffff507c152 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000535ff9 in refcount_inc (r=0x7fffe8009a70)
at ...include/linux/refcount.h:109
#5 0x0000000000536771 in thread__get (thread=0x7fffe8009a40)
at util/thread.c:115
#6 0x0000000000523cd0 in ____machine__findnew_thread (machine=0xbfde38,
threads=0xbfdf28, pid=2, tid=2, create=true) at util/machine.c:432
#7 0x0000000000523eb4 in __machine__findnew_thread (machine=0xbfde38,
pid=2, tid=2) at util/machine.c:489
#8 0x0000000000523f24 in machine__findnew_thread (machine=0xbfde38,
pid=2, tid=2) at util/machine.c:499
#9 0x0000000000526fbe in machine__process_fork_event (machine=0xbfde38,
...
The failing assertion is this one:
REFCOUNT_WARN(!refcount_inc_not_zero(r), ...
the problem is that we don't serialize access to threads::last_match.
We serialize the access to the threads tree, but we don't care how's
threads::last_match being accessed. Both locked/unlocked paths use
that data and can set it. In multithreaded mode we can end up with
invalid object in thread__get call, like in following paths race:
  thread 1
    ...
    machine__findnew_thread
      down_write(&threads->lock);
      __machine__findnew_thread
        ____machine__findnew_thread
          th = threads->last_match;
          if (th->tid == tid) {
            thread__get

  thread 2
    ...
    machine__find_thread
      down_read(&threads->lock);
      __machine__findnew_thread
        ____machine__findnew_thread
          th = threads->last_match;
          if (th->tid == tid) {
            thread__get

  thread 3
    ...
    machine__process_fork_event
      machine__remove_thread
        __machine__remove_thread
          threads->last_match = NULL
          thread__put
      thread__put
Threads 1 and 2 might get a stale last_match before thread 3 clears
it. Threads 1 and 2 then race with thread 3's thread__put() and might
trigger the refcnt == 0 assertion above.
This patch disables the last_match cache in multithreaded mode. It was
originally meant for single-threaded scenarios, where it's common to
have multiple sequential searches for the same thread.
In multithreaded mode this does not make sense, because each of top's
threads processes different /proc entries, so the 'struct threads'
object is queried for many different threads. Moreover, we'd need to
add more locks to make it work.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20180719143345.12963-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
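A minimal sketch of the idea behind this change, assuming a simplified lookup table (a plain array instead of perf's rb-tree) and a hypothetical global single_threaded flag standing in for perf_singlethreaded. The names struct table, table__find() and the other helpers are illustrative only, not perf APIs. Both the read and the write of the one-entry cache are gated on the flag, so in multithreaded mode every caller falls back to the full search and nobody touches the unserialized last_match pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for perf's perf_singlethreaded global. */
static bool single_threaded = true;

struct entry {
	int tid;
};

struct table {
	struct entry *entries;    /* backing storage, not owned */
	size_t nr;
	struct entry *last_match; /* one-entry lookup cache */
	size_t slow_lookups;      /* full scans, so the cache effect is visible */
};

static struct entry *table__get_last_match(struct table *t, int tid)
{
	/* Never read the unserialized cache when several threads may run. */
	if (!single_threaded)
		return NULL;
	if (t->last_match != NULL && t->last_match->tid == tid)
		return t->last_match;
	return NULL;
}

static void table__set_last_match(struct table *t, struct entry *e)
{
	/* Likewise, only update the cache in single-threaded mode. */
	if (single_threaded)
		t->last_match = e;
}

struct entry *table__find(struct table *t, int tid)
{
	struct entry *e;

	assert(t != NULL);
	e = table__get_last_match(t, tid);
	if (e != NULL)
		return e;

	t->slow_lookups++;
	for (size_t i = 0; i < t->nr; i++) {
		if (t->entries[i].tid == tid) {
			table__set_last_match(t, &t->entries[i]);
			return &t->entries[i];
		}
	}
	return NULL;
}
```

In this sketch the cache holds a bare pointer, so any removal path would also have to clear last_match; as the race diagram above shows, perf's lookup path also takes a reference with thread__get() on a hit.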
2018-07-19 21:33:44 +07:00
static void
threads__set_last_match(struct threads *threads, struct thread *th)
{
	if (perf_singlethreaded)
		__threads__set_last_match(threads, th);
}

2018-07-19 21:33:42 +07:00
/*
 * Caller must eventually drop thread->refcnt returned with a successful
 * lookup/new thread inserted.
 */
static struct thread *____machine__findnew_thread(struct machine *machine,
						  struct threads *threads,
						  pid_t pid, pid_t tid,
						  bool create)
{
2018-12-07 02:18:14 +07:00
	struct rb_node **p = &threads->entries.rb_root.rb_node;
2018-07-19 21:33:42 +07:00
	struct rb_node *parent = NULL;
	struct thread *th;
2018-12-07 02:18:14 +07:00
	bool leftmost = true;
2018-07-19 21:33:42 +07:00

	th = threads__get_last_match(threads, machine, pid, tid);
	if (th)
		return th;

2012-10-07 01:43:20 +07:00
	while (*p != NULL) {
		parent = *p;
		th = rb_entry(parent, struct thread, rb_node);

2013-07-04 20:20:31 +07:00
		if (th->tid == tid) {
2018-07-19 21:33:43 +07:00
			threads__set_last_match(threads, th);
2014-07-16 15:07:13 +07:00
			machine__update_thread_pid(machine, th, pid);
2015-12-12 05:11:23 +07:00
			return thread__get(th);
2012-10-07 01:43:20 +07:00
		}

2013-07-04 20:20:31 +07:00
		if (tid < th->tid)
2012-10-07 01:43:20 +07:00
			p = &(*p)->rb_left;
2018-12-07 02:18:14 +07:00
		else {
2012-10-07 01:43:20 +07:00
			p = &(*p)->rb_right;
2018-12-07 02:18:14 +07:00
			leftmost = false;
		}
2012-10-07 01:43:20 +07:00
	}

	if (!create)
		return NULL;

2013-08-26 20:00:19 +07:00
	th = thread__new(pid, tid);
2012-10-07 01:43:20 +07:00
	if (th != NULL) {
		rb_link_node(&th->rb_node, parent, p);
2018-12-07 02:18:14 +07:00
		rb_insert_color_cached(&th->rb_node, &threads->entries, leftmost);
2014-04-10 01:54:29 +07:00

		/*
		 * We have to initialize map_groups separately
		 * after rb tree is updated.
		 *
		 * The reason is that we call machine__findnew_thread
		 * within thread__init_map_groups to find the thread
		 * leader and that would screw up the rb tree.
		 */
2014-07-16 14:19:44 +07:00
		if (thread__init_map_groups(th, machine)) {
2018-12-07 02:18:14 +07:00
			rb_erase_cached(&th->rb_node, &threads->entries);
perf machine: Protect the machine->threads with a rwlock
In addition to using refcounts for the struct thread lifetime
management, we need to protect access to machine->threads from
concurrent access.
That happens in 'perf top', where a thread processes events, inserting
and deleting entries from that rb_tree while another thread decays
hist_entries, which end up dropping references and ultimately deleting
threads from the rb_tree and releasing their resources when no further
hist_entry (or other data structure, like in 'perf sched') references
them.
So the rule is the same as for refcounts + protected trees in the
kernel: get the tree lock, find the object, bump the refcount, drop the
tree lock, return, use the object, drop the refcount if no more use of
it is needed, keep it if storing it in some other data structure, and
drop it when releasing that data structure.
I.e. pair "t = machine__find(new)_thread()" with a "thread__put(t)", and
"perf_event__preprocess_sample(&al)" with "addr_location__put(&al)".
The addr_location__put() one is needed because, as we return references
to several data structures, we may end up adding more reference counting
for the other data structures, and we'll then drop it at
addr_location__put() time.
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-bs9rt4n0jw3hi9f3zxyy3xln@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-07 06:43:22 +07:00
			RB_CLEAR_NODE(&th->rb_node);
2015-12-12 05:11:23 +07:00
			thread__put(th);
2014-04-10 01:54:29 +07:00
			return NULL;
2014-07-16 14:19:44 +07:00
		}
2015-03-03 08:21:35 +07:00
		/*
		 * It is now in the rbtree, get a ref
		 */
		thread__get(th);
2018-07-19 21:33:43 +07:00
		threads__set_last_match(threads, th);
2017-09-11 09:23:14 +07:00
		++threads->nr;
2012-10-07 01:43:20 +07:00
	}

	return th;
}

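The leftmost flag in ____machine__findnew_thread() above feeds rb_insert_color_cached(), which keeps a pointer to the smallest node so it can be reached without walking the tree. The bookkeeping can be sketched on a plain, unbalanced binary search tree (no rebalancing, so this is not a red-black tree); struct node, struct tree_cached and tree_insert() are hypothetical names. If the descent never takes a right branch, the new key is smaller than every existing key and becomes the cached minimum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical unbalanced BST node keyed by tid (no rebalancing here). */
struct node {
	int tid;
	struct node *left, *right;
};

/* Root plus cached leftmost node, in the spirit of struct rb_root_cached. */
struct tree_cached {
	struct node *root;
	struct node *leftmost;
};

/*
 * Insert n unless its tid already exists.  While descending, remember
 * whether we ever stepped right: if we only ever went left, the new
 * node is smaller than every existing key and becomes the new leftmost.
 * Returns the existing node on a duplicate, or n after insertion.
 */
struct node *tree_insert(struct tree_cached *t, struct node *n)
{
	struct node **p = &t->root;
	bool leftmost = true;

	while (*p != NULL) {
		struct node *cur = *p;

		if (n->tid == cur->tid)
			return cur;
		if (n->tid < cur->tid) {
			p = &cur->left;
		} else {
			p = &cur->right;
			leftmost = false;
		}
	}

	n->left = n->right = NULL;
	*p = n;
	if (leftmost)
		t->leftmost = n;  /* O(1) access to the minimum afterwards */

	/* A non-empty tree always has a cached minimum. */
	assert(t->leftmost != NULL);
	return n;
}
```

The extra cost is a single boolean per descent, which is why the cached variant is cheap to maintain on every insert.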
2015-04-07 06:43:22 +07:00
struct thread *__machine__findnew_thread(struct machine *machine, pid_t pid, pid_t tid)
{
2017-09-15 02:16:34 +07:00
	return ____machine__findnew_thread(machine, machine__threads(machine, tid), pid, tid, true);
2015-04-07 06:43:22 +07:00
}

2013-08-27 15:23:03 +07:00
struct thread *machine__findnew_thread(struct machine *machine, pid_t pid,
				       pid_t tid)
2012-10-07 01:43:20 +07:00
{
2017-09-11 09:23:14 +07:00
	struct threads *threads = machine__threads(machine, tid);
2015-04-07 06:43:22 +07:00
	struct thread *th;

2017-04-04 23:15:04 +07:00
	down_write(&threads->lock);
2015-12-12 05:11:23 +07:00
	th = __machine__findnew_thread(machine, pid, tid);
2017-04-04 23:15:04 +07:00
	up_write(&threads->lock);
2015-04-07 06:43:22 +07:00
	return th;
2012-10-07 01:43:20 +07:00
}

2014-03-14 21:00:03 +07:00
struct thread *machine__find_thread(struct machine *machine, pid_t pid,
				    pid_t tid)
2012-10-07 01:43:20 +07:00
{
2017-09-11 09:23:14 +07:00
	struct threads *threads = machine__threads(machine, tid);
2015-04-07 06:43:22 +07:00
	struct thread *th;
2017-09-11 09:23:14 +07:00

2017-04-04 23:15:04 +07:00
	down_read(&threads->lock);
2017-09-15 02:16:34 +07:00
	th = ____machine__findnew_thread(machine, threads, pid, tid, false);
2017-04-04 23:15:04 +07:00
	up_read(&threads->lock);
2015-04-07 06:43:22 +07:00
	return th;
2012-10-07 01:43:20 +07:00
}
2012-10-07 02:26:02 +07:00

2014-07-31 13:00:45 +07:00
struct comm *machine__thread_exec_comm(struct machine *machine,
				       struct thread *thread)
{
	if (machine->comm_exec)
		return thread__exec_comm(thread);
	else
		return thread__comm(thread);
}

2013-09-11 21:18:24 +07:00
int machine__process_comm_event(struct machine *machine, union perf_event *event,
				struct perf_sample *sample)
2012-10-07 02:26:02 +07:00
{
2013-08-27 15:23:03 +07:00
	struct thread *thread = machine__findnew_thread(machine,
							event->comm.pid,
							event->comm.tid);
2014-07-31 13:00:44 +07:00
	bool exec = event->header.misc & PERF_RECORD_MISC_COMM_EXEC;
2015-04-07 06:43:22 +07:00
	int err = 0;
2012-10-07 02:26:02 +07:00

2014-07-31 13:00:45 +07:00
	if (exec)
		machine->comm_exec = true;

2012-10-07 02:26:02 +07:00
	if (dump_trace)
		perf_event__fprintf_comm(event, stdout);

2014-07-31 13:00:44 +07:00
	if (thread == NULL ||
	    __thread__set_comm(thread, event->comm.comm, sample->time, exec)) {
2012-10-07 02:26:02 +07:00
		dump_printf("problem processing PERF_RECORD_COMM, skipping event.\n");
2015-04-07 06:43:22 +07:00
		err = -1;
2012-10-07 02:26:02 +07:00
	}

2015-04-07 06:43:22 +07:00
	thread__put(thread);

	return err;
2012-10-07 02:26:02 +07:00
}

perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info
Introduce a new option to record PERF_RECORD_NAMESPACES events, emitted
by the kernel when fork, clone, setns or unshare are invoked, and update
the perf-record documentation with the new option.
Committer notes:
Combined it with a later patch to allow printing it via 'perf report -D'
and thus be able to test the feature introduced in this patch. Also had
to move perf_ns__name() here, which was introduced in another later patch.
Used PRIu64 and PRIx64 as well to fix the build in some environments,
which failed with:
util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
ret += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
^
Testing it:
# perf record --namespaces -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
#
# perf report -D
<SNIP>
3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
[0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
0x1151e0 [0x30]: event: 9
.
. ... raw event: size 48 bytes
. 0000: 09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00 ......0..q.h....
. 0010: a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00 .9...9...(.c....
. 0020: 03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00 ................
<SNIP>
NAMESPACES events: 1
<SNIP>
#
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
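The build failure quoted in the committer notes is the classic 64-bit printf portability problem: the right length modifier for a 64-bit integer differs between targets. A minimal sketch of the standard C fix, assuming the values are C99 uint64_t; append_ns_entry() and struct ns_link are hypothetical, and perf's own u64 typedef is not guaranteed to be the same type as uint64_t on every target. The <inttypes.h> macros PRIu64 and PRIx64 expand to the matching conversion for uint64_t, so one format string compiles cleanly under -Werror=format everywhere.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of one (dev, ino) pair of a namespaces record. */
struct ns_link {
	uint64_t dev;
	uint64_t ino;
};

/*
 * Append "idx/name: dev/0xino" to the NUL-terminated string in buf,
 * separating entries with ", ".  PRIu64 and PRIx64 supply the right
 * conversion for uint64_t on the target (typically "lu" on LP64 glibc,
 * "llu" on 32-bit targets).
 */
void append_ns_entry(char *buf, size_t size, unsigned int idx,
		     const char *name, const struct ns_link *link)
{
	size_t len = strlen(buf);

	assert(len < size);
	snprintf(buf + len, size - len, "%s%u/%s: %" PRIu64 "/0x%" PRIx64,
		 len ? ", " : "", idx, name, link->dev, link->ino);
}
```

Appending the net and uts entries from the 'perf report -D' output above to an empty buffer yields "0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe".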
2017-03-08 03:41:43 +07:00
int machine__process_namespaces_event(struct machine *machine __maybe_unused,
				      union perf_event *event,
				      struct perf_sample *sample __maybe_unused)
{
	struct thread *thread = machine__findnew_thread(machine,
							event->namespaces.pid,
							event->namespaces.tid);
	int err = 0;

	WARN_ONCE(event->namespaces.nr_namespaces > NR_NAMESPACES,
		  "\nWARNING: kernel seems to support more namespaces than perf"
		  " tool.\nTry updating the perf tool..\n\n");

	WARN_ONCE(event->namespaces.nr_namespaces < NR_NAMESPACES,
		  "\nWARNING: perf tool seems to support more namespaces than"
		  " the kernel.\nTry updating the kernel..\n\n");

	if (dump_trace)
		perf_event__fprintf_namespaces(event, stdout);

	if (thread == NULL ||
	    thread__set_namespaces(thread, sample->time, &event->namespaces)) {
		dump_printf("problem processing PERF_RECORD_NAMESPACES, skipping event.\n");
		err = -1;
	}

	thread__put(thread);

	return err;
}

2012-10-07 02:26:02 +07:00
int machine__process_lost_event(struct machine *machine __maybe_unused,
2013-09-11 21:18:24 +07:00
				union perf_event *event, struct perf_sample *sample __maybe_unused)
2012-10-07 02:26:02 +07:00
{
2019-08-26 01:17:46 +07:00
	dump_printf(": id:%" PRI_lu64 ": lost:%" PRI_lu64 "\n",
2012-10-07 02:26:02 +07:00
		    event->lost.id, event->lost.lost);
	return 0;
}

2015-05-11 02:13:15 +07:00
int machine__process_lost_samples_event(struct machine *machine __maybe_unused,
					union perf_event *event, struct perf_sample *sample)
{
2019-08-26 01:17:47 +07:00
	dump_printf(": id:%" PRIu64 ": lost samples :%" PRI_lu64 "\n",
2015-05-11 02:13:15 +07:00
		    sample->id, event->lost_samples.lost);
	return 0;
}

perf machine: Fix up some more method names
Calling the function 'machine__new_module' implies that a new 'module'
will be allocated, when in fact what is returned is a 'struct map'
instance that is not necessarily newly instantiated: if one already
exists with the given module name, it is returned instead.
So be consistent with other "find, and if not there, create" functions,
like machine__findnew_thread, machine__findnew_dso, etc., and rename it
to machine__findnew_module_map(), which in turn calls
machine__findnew_module_dso().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-acv830vd3hwww2ih5vjtbmu3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-01 22:01:02 +07:00
static struct dso *machine__findnew_module_dso(struct machine *machine,
					       struct kmod_path *m,
					       const char *filename)
2015-02-13 04:10:52 +07:00
{
	struct dso *dso;

2017-04-04 23:15:04 +07:00
	down_write(&machine->dsos.lock);
2015-06-02 01:40:01 +07:00

	dso = __dsos__find(&machine->dsos, m->name, true);
2015-02-13 04:10:52 +07:00
	if (!dso) {
2015-06-02 01:40:01 +07:00
		dso = __dsos__addnew(&machine->dsos, m->name);
2015-02-13 04:10:52 +07:00
		if (dso == NULL)
2015-06-02 01:40:01 +07:00
			goto out_unlock;
2015-02-13 04:10:52 +07:00

2017-05-31 19:01:04 +07:00
		dso__set_module_info(dso, m, machine);
2015-02-17 23:29:57 +07:00
		dso__set_long_name(dso, strdup(filename), true);
2015-02-13 04:10:52 +07:00
	}

2015-06-02 21:53:26 +07:00
	dso__get(dso);
2015-06-02 01:40:01 +07:00
out_unlock:
2017-04-04 23:15:04 +07:00
	up_write(&machine->dsos.lock);
2015-02-13 04:10:52 +07:00
	return dso;
}

2015-04-30 21:37:29 +07:00
int machine__process_aux_event(struct machine *machine __maybe_unused,
			       union perf_event *event)
{
	if (dump_trace)
		perf_event__fprintf_aux(event, stdout);
	return 0;
}

2015-04-30 21:37:30 +07:00
int machine__process_itrace_start_event(struct machine *machine __maybe_unused,
					union perf_event *event)
{
	if (dump_trace)
		perf_event__fprintf_itrace_start(event, stdout);
	return 0;
}

2015-07-21 16:44:03 +07:00
int machine__process_switch_event(struct machine *machine __maybe_unused,
				  union perf_event *event)
{
	if (dump_trace)
		perf_event__fprintf_switch(event, stdout);
	return 0;
}

2019-01-17 23:15:17 +07:00
|
|
|
static int machine__process_ksymbol_register(struct machine *machine,
|
|
|
|
union perf_event *event,
|
|
|
|
struct perf_sample *sample __maybe_unused)
|
|
|
|
{
|
|
|
|
struct symbol *sym;
|
|
|
|
struct map *map;
|
|
|
|
|
2019-08-27 05:15:18 +07:00
|
|
|
map = map_groups__find(&machine->kmaps, event->ksymbol.addr);
|
2019-01-17 23:15:17 +07:00
|
|
|
if (!map) {
|
2019-08-27 05:15:18 +07:00
|
|
|
map = dso__new_map(event->ksymbol.name);
|
2019-01-17 23:15:17 +07:00
|
|
|
if (!map)
|
|
|
|
return -ENOMEM;
|
|
|
|
|
2019-08-27 05:15:18 +07:00
|
|
|
map->start = event->ksymbol.addr;
|
|
|
|
map->end = map->start + event->ksymbol.len;
|
2019-01-17 23:15:17 +07:00
|
|
|
map_groups__insert(&machine->kmaps, map);
|
|
|
|
}
|
|
|
|
|
2019-05-08 20:20:04 +07:00
|
|
|
sym = symbol__new(map->map_ip(map, map->start),
|
2019-08-27 05:15:18 +07:00
|
|
|
event->ksymbol.len,
|
|
|
|
0, 0, event->ksymbol.name);
|
2019-01-17 23:15:17 +07:00
|
|
|
if (!sym)
|
|
|
|
return -ENOMEM;
|
|
|
|
dso__insert_symbol(map->dso, sym);
|
|
|
|
return 0;
|
|
|
|
}
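machine__process_ksymbol_register() first asks map_groups__find() whether an existing kernel map already covers event->ksymbol.addr, and only creates a new map spanning [addr, addr + len) when none does. The sketch below models that lookup under simplifying assumptions: the hypothetical struct range_map and range_find() stand in for perf's maps, which perf keeps in a tree rather than the sorted, non-overlapping array used here. The end address is treated as exclusive, matching map->end = map->start + event->ksymbol.len.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel map covering [start, end). */
struct range_map {
	uint64_t start;
	uint64_t end;		/* exclusive, like map->start + event->ksymbol.len */
	const char *name;
};

/*
 * Binary search over maps sorted by start and not overlapping.
 * Returns the map whose range contains addr, or NULL if none does,
 * in which case the caller would create and insert a new map.
 */
static const struct range_map *range_find(const struct range_map *maps,
					  size_t nr, uint64_t addr)
{
	size_t lo = 0, hi = nr;

	assert(maps != NULL || nr == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < maps[mid].start)
			hi = mid;		/* addr lies left of this map */
		else if (addr >= maps[mid].end)
			lo = mid + 1;		/* addr is at or past this map's end */
		else
			return &maps[mid];	/* start <= addr < end */
	}
	return NULL;
}
```

An address equal to a map's end is not inside that map, so a ksymbol registered right after an existing one gets its own map instead of being folded into its neighbour.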

static int machine__process_ksymbol_unregister(struct machine *machine,
union perf_event *event,
struct perf_sample *sample __maybe_unused)
{
struct map *map;

2019-08-27 05:15:18 +07:00
map = map_groups__find(&machine->kmaps, event->ksymbol.addr);
2019-01-17 23:15:17 +07:00
if (map)
map_groups__remove(&machine->kmaps, map);

return 0;
}

int machine__process_ksymbol(struct machine *machine __maybe_unused,
union perf_event *event,
struct perf_sample *sample)
{
if (dump_trace)
perf_event__fprintf_ksymbol(event, stdout);

2019-08-27 05:15:18 +07:00
if (event->ksymbol.flags & PERF_RECORD_KSYMBOL_FLAGS_UNREGISTER)
2019-01-17 23:15:17 +07:00
return machine__process_ksymbol_unregister(machine, event,
sample);
return machine__process_ksymbol_register(machine, event, sample);
}

2019-11-14 22:28:41 +07:00
static struct map *machine__addnew_module_map(struct machine *machine, u64 start,
const char *filename)
2012-12-08 03:39:39 +07:00
{
2015-02-17 23:29:57 +07:00
struct map *map = NULL;
struct kmod_path m;
2019-11-14 22:28:41 +07:00
struct dso *dso;
2012-12-08 03:39:39 +07:00

2015-02-17 23:29:57 +07:00
if (kmod_path__parse_name(&m, filename))
2012-12-08 03:39:39 +07:00
return NULL;

perf machine: Fix up some more method names
Calling the function 'machine__new_module' implies that a new 'module' will
be allocated, when in fact what is returned is a 'struct map' instance
that will not necessarily be newly instantiated: if one already exists
with the given module name, it will be returned instead.
So be consistent with other "find and, if not there, create" functions,
such as machine__findnew_thread, machine__findnew_dso, etc., and
rename it to machine__findnew_module_map(), which in turn calls
machine__findnew_module_dso().
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-acv830vd3hwww2ih5vjtbmu3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-01 22:01:02 +07:00
dso = machine__findnew_module_dso(machine, &m, filename);
2015-02-17 23:29:57 +07:00
if (dso == NULL)
goto out;

2018-04-27 02:52:34 +07:00
map = map__new2(start, dso);
2012-12-08 03:39:39 +07:00
if (map == NULL)
2015-02-17 23:29:57 +07:00
goto out;
2012-12-08 03:39:39 +07:00

map_groups__insert(&machine->kmaps, map);
2015-02-17 23:29:57 +07:00

2015-11-18 13:40:20 +07:00
/* Put the map here because map_groups__insert already got it */
map__put(map);
2015-02-17 23:29:57 +07:00
out:
2015-11-18 13:40:35 +07:00
/* put the dso here, corresponding to machine__findnew_module_dso */
dso__put(dso);
2019-07-04 22:06:20 +07:00
zfree(&m.name);
2012-12-08 03:39:39 +07:00
return map;
}
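The commit message above explains the "findnew" naming (find an existing object, or create it if missing), and machine__addnew_module_map() shows the matching reference handoff: map_groups__insert() takes its own reference, so the function drops its creation reference with map__put(). The sketch below combines both ideas with a hypothetical refcounted struct obj and a fixed-size registry; none of these names are perf API. The registry keeps one reference, and every caller of registry_findnew() receives another that it must drop with obj_put().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted object. */
struct obj {
	int refcnt;
	char name[32];
};

/* Hypothetical registry; it owns one reference to each stored object. */
struct registry {
	struct obj *slots[8];
	int nr;
};

static struct obj *obj_get(struct obj *o)
{
	if (o)
		o->refcnt++;
	return o;
}

static void obj_put(struct obj *o)
{
	if (!o)
		return;
	assert(o->refcnt > 0);
	if (--o->refcnt == 0)
		free(o);
}

/*
 * "findnew": return the existing object with this name, or create and
 * register one. Either way the caller receives its own reference.
 */
static struct obj *registry_findnew(struct registry *r, const char *name)
{
	struct obj *o;
	int i;

	for (i = 0; i < r->nr; i++)
		if (strcmp(r->slots[i]->name, name) == 0)
			return obj_get(r->slots[i]);	/* found: extra ref for caller */

	if (r->nr == (int)(sizeof(r->slots) / sizeof(r->slots[0])))
		return NULL;				/* registry full */
	o = calloc(1, sizeof(*o));
	if (!o)
		return NULL;
	o->refcnt = 1;					/* creator's reference */
	strncpy(o->name, name, sizeof(o->name) - 1);
	r->slots[r->nr++] = obj_get(o);			/* registry takes its own reference */
	return o;					/* creator's reference goes to the caller */
}
```

Every successful registry_findnew() is paired with exactly one obj_put(), in the same way the function above pairs machine__findnew_module_dso() with dso__put() at its out: label.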

2012-12-19 05:15:48 +07:00
size_t machines__fprintf_dsos(struct machines *machines, FILE *fp)
2012-12-08 03:39:39 +07:00
{
struct rb_node *nd;
2015-05-28 23:06:42 +07:00
size_t ret = __dsos__fprintf(&machines->host.dsos.head, fp);
2012-12-08 03:39:39 +07:00

2018-12-07 02:18:14 +07:00
for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
2012-12-08 03:39:39 +07:00
struct machine *pos = rb_entry(nd, struct machine, rb_node);
2015-05-28 23:06:42 +07:00
ret += __dsos__fprintf(&pos->dsos.head, fp);
2012-12-08 03:39:39 +07:00
}

return ret;
}

2014-09-30 03:07:28 +07:00
size_t machine__fprintf_dsos_buildid(struct machine *m, FILE *fp,
2012-12-08 03:39:39 +07:00
bool (skip)(struct dso *dso, int parm), int parm)
{
2015-05-28 23:06:42 +07:00
return __dsos__fprintf_buildid(&m->dsos.head, fp, skip, parm);
2012-12-08 03:39:39 +07:00
}

2012-12-19 05:15:48 +07:00
size_t machines__fprintf_dsos_buildid(struct machines *machines, FILE *fp,
2012-12-08 03:39:39 +07:00
bool (skip)(struct dso *dso, int parm), int parm)
{
struct rb_node *nd;
2012-12-19 05:15:48 +07:00
size_t ret = machine__fprintf_dsos_buildid(&machines->host, fp, skip, parm);
2012-12-08 03:39:39 +07:00

2018-12-07 02:18:14 +07:00
for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
2012-12-08 03:39:39 +07:00
struct machine *pos = rb_entry(nd, struct machine, rb_node);
ret += machine__fprintf_dsos_buildid(pos, fp, skip, parm);
}
return ret;
}

size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp)
{
int i;
size_t printed = 0;
2019-11-01 01:22:24 +07:00
struct dso *kdso = machine__kernel_dso(machine);
2012-12-08 03:39:39 +07:00

if (kdso->has_build_id) {
char filename[PATH_MAX];
2017-07-06 08:48:13 +07:00
if (dso__build_id_filename(kdso, filename, sizeof(filename),
false))
2012-12-08 03:39:39 +07:00
printed += fprintf(fp, "[0] %s\n", filename);
}

for (i = 0; i < vmlinux_path__nr_entries; ++i)
printed += fprintf(fp, "[%d] %s\n",
i + kdso->has_build_id, vmlinux_path[i]);

return printed;
}

size_t machine__fprintf(struct machine *machine, FILE *fp)
{
struct rb_node *nd;
2017-09-11 09:23:14 +07:00
size_t ret = 0;
int i;
2012-12-08 03:39:39 +07:00

2017-09-11 09:23:14 +07:00
for (i = 0; i < THREADS__TABLE_SIZE; i++) {
struct threads *threads = &machine->threads[i];
2017-04-04 23:15:04 +07:00

down_read(&threads->lock);
2016-05-04 20:09:33 +07:00

2017-09-11 09:23:14 +07:00
ret += fprintf(fp, "Threads: %u\n", threads->nr);
2012-12-08 03:39:39 +07:00

2018-12-07 02:18:14 +07:00
for (nd = rb_first_cached(&threads->entries); nd;
nd = rb_next(nd)) {
2017-09-11 09:23:14 +07:00
struct thread *pos = rb_entry(nd, struct thread, rb_node);
2012-12-08 03:39:39 +07:00

2017-09-11 09:23:14 +07:00
ret += thread__fprintf(pos, fp);
}
perf machine: Protect the machine->threads with a rwlock
In addition to using refcounts for the struct thread lifetime
management, we need to protect access to machine->threads from
concurrent access.
That happens in 'perf top', where a thread processes events, inserting
and deleting entries from that rb_tree while another thread decays
hist_entries, which end up dropping references and ultimately deleting
threads from the rb_tree and releasing their resources when no further
hist_entry (or other data structures, like in 'perf sched') references
them.
So the rule is the same as for refcounts + protected trees in the kernel:
get the tree lock, find the object, bump the refcount, drop the tree lock,
return, use the object, drop the refcount if no more use of it is needed,
keep it if storing it in some other data structure, and drop it when
releasing that data structure.
I.e. pair "t = machine__find(new)_thread()" with a "thread__put(t)", and
"perf_event__preprocess_sample(&al)" with "addr_location__put(&al)".
The addr_location__put() one is because as we return references to
several data structures, we may end up adding more reference counting
for the other data structures and then we'll drop it at
addr_location__put() time.
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-bs9rt4n0jw3hi9f3zxyy3xln@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-07 06:43:22 +07:00

2017-04-04 23:15:04 +07:00
up_read(&threads->lock);
2017-09-11 09:23:14 +07:00
}
2012-12-08 03:39:39 +07:00
return ret;
}

static struct dso *machine__get_kernel(struct machine *machine)
{
2018-02-15 19:26:30 +07:00
const char *vmlinux_name = machine->mmap_name;
2012-12-08 03:39:39 +07:00
struct dso *kernel;

if (machine__is_host(machine)) {
2018-03-12 22:24:06 +07:00
if (symbol_conf.vmlinux_name)
vmlinux_name = symbol_conf.vmlinux_name;

2015-05-28 22:40:55 +07:00
kernel = machine__findnew_kernel(machine, vmlinux_name,
"[kernel]", DSO_TYPE_KERNEL);
2012-12-08 03:39:39 +07:00
} else {
2018-03-12 22:24:06 +07:00
if (symbol_conf.default_guest_vmlinux_name)
vmlinux_name = symbol_conf.default_guest_vmlinux_name;

2015-05-28 22:40:55 +07:00
kernel = machine__findnew_kernel(machine, vmlinux_name,
"[guest.kernel]",
DSO_TYPE_GUEST_KERNEL);
2012-12-08 03:39:39 +07:00
}

if (kernel != NULL && (!kernel->has_build_id))
dso__read_running_kernel_build_id(kernel, machine);

return kernel;
}

struct process_args {
u64 start;
};

2018-05-22 17:54:36 +07:00
void machine__get_kallsyms_filename(struct machine *machine, char *buf,
size_t bufsz)
2014-01-29 21:14:38 +07:00
{
if (machine__is_default_guest(machine))
scnprintf(buf, bufsz, "%s", symbol_conf.default_guest_kallsyms);
else
scnprintf(buf, bufsz, "%s/proc/kallsyms", machine->root_dir);
}
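machine__get_kallsyms_filename() chooses where kallsyms is read from. The default guest uses the file configured in symbol_conf.default_guest_kallsyms, and every other machine reads <root_dir>/proc/kallsyms. The sketch below reproduces that selection with plain snprintf() in place of perf's scnprintf(). The function name and parameters are hypothetical, and the first test case assumes the host machine's root_dir is the empty string, which makes the result plain /proc/kallsyms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical mirror of machine__get_kallsyms_filename(): the default
 * guest uses a separately configured kallsyms file, every other machine
 * reads <root_dir>/proc/kallsyms.
 */
static void kallsyms_path(char *buf, size_t bufsz, bool default_guest,
			  const char *guest_kallsyms, const char *root_dir)
{
	assert(buf != NULL && bufsz > 0);
	if (default_guest)
		snprintf(buf, bufsz, "%s", guest_kallsyms);
	else
		snprintf(buf, bufsz, "%s/proc/kallsyms", root_dir);
}
```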

2014-06-17 01:32:09 +07:00
const char *ref_reloc_sym_names[] = {"_text", "_stext", NULL};

/* Figure out the start address of kernel map from /proc/kallsyms.
* Returns the name of the start symbol in *symbol_name. Pass in NULL as
* symbol_name if it's not that important.
*/
perf symbols: Accept symbols starting at address 0
That is the case of _text on s390, and we have some functions that return an
address, using address zero to report problems, oops.
This would lead the symbol loading routines to not use "_text" as the reference
relocation symbol, or the first symbol for the kernel, but use instead
"_stext", that is at the same address on x86_64 and others, but not on s390:
[acme@localhost perf-4.11.0-rc6]$ head -15 /proc/kallsyms
0000000000000000 T _text
0000000000000418 t iplstart
0000000000000800 T start
000000000000080a t .base
000000000000082e t .sk8x8
0000000000000834 t .gotr
0000000000000842 t .cmd
0000000000000846 t .parm
000000000000084a t .lowcase
0000000000010000 T startup
0000000000010010 T startup_kdump
0000000000010214 t startup_kdump_relocated
0000000000011000 T startup_continue
00000000000112a0 T _ehead
0000000000100000 T _stext
[acme@localhost perf-4.11.0-rc6]$
Which in turn would make 'perf test vmlinux' fail because it wouldn't find
the symbols before "_stext" in kallsyms.
Fix it by using the return value only for errors and storing the
address, when the symbol is successfully found, in a provided pointer
arg.
Before this patch:
[acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
1: vmlinux symtab matches kallsyms :
--- start ---
test child forked, pid 40693
Looking at the vmlinux_path (8 entries long)
Using /usr/lib/debug/lib/modules/3.10.0-654.el7.s390x/vmlinux for symbols
ERR : 0: _text not on kallsyms
ERR : 0x418: iplstart not on kallsyms
ERR : 0x800: start not on kallsyms
ERR : 0x80a: .base not on kallsyms
ERR : 0x82e: .sk8x8 not on kallsyms
ERR : 0x834: .gotr not on kallsyms
ERR : 0x842: .cmd not on kallsyms
ERR : 0x846: .parm not on kallsyms
ERR : 0x84a: .lowcase not on kallsyms
ERR : 0x10000: startup not on kallsyms
ERR : 0x10010: startup_kdump not on kallsyms
ERR : 0x10214: startup_kdump_relocated not on kallsyms
ERR : 0x11000: startup_continue not on kallsyms
ERR : 0x112a0: _ehead not on kallsyms
<SNIP warnings>
test child finished with -1
---- end ----
vmlinux symtab matches kallsyms: FAILED!
[acme@localhost perf-4.11.0-rc6]$
After:
[acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
1: vmlinux symtab matches kallsyms :
--- start ---
test child forked, pid 47160
<SNIP warnings>
test child finished with 0
---- end ----
vmlinux symtab matches kallsyms: Ok
[acme@localhost perf-4.11.0-rc6]$
Reported-by: Michael Petlan <mpetlan@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-9x9bwgd3btwdk1u51xie93fz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-28 07:21:09 +07:00
static int machine__get_running_kernel_start(struct machine *machine,
2019-05-08 20:20:03 +07:00
const char **symbol_name,
u64 *start, u64 *end)
2012-12-08 03:39:39 +07:00
{
2014-01-29 21:14:38 +07:00
char filename[PATH_MAX];
2017-04-28 07:21:09 +07:00
int i, err = -1;
2014-06-17 01:32:09 +07:00
const char *name;
u64 addr = 0;
2012-12-08 03:39:39 +07:00

2014-01-29 21:14:38 +07:00
machine__get_kallsyms_filename(machine, filename, PATH_MAX);
2012-12-08 03:39:39 +07:00

if (symbol__restricted_filename(filename, "/proc/kallsyms"))
return 0;

2014-06-17 01:32:09 +07:00
for (i = 0; (name = ref_reloc_sym_names[i]) != NULL; i++) {
2017-04-28 07:21:09 +07:00
err = kallsyms__get_function_start(filename, name, &addr);
if (!err)
2014-06-17 01:32:09 +07:00
break;
}

2017-04-28 07:21:09 +07:00
if (err)
return -1;

2014-06-17 01:32:09 +07:00
if (symbol_name)
*symbol_name = name;
2012-12-08 03:39:39 +07:00

2017-04-28 07:21:09 +07:00
*start = addr;
2019-05-08 20:20:03 +07:00

err = kallsyms__get_function_start(filename, "_etext", &addr);
if (!err)
*end = addr;

2017-04-28 07:21:09 +07:00
return 0;
2012-12-08 03:39:39 +07:00
}
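The fix described in the "Accept symbols starting at address 0" commit message above keeps "found" separate from "address". The return value only signals errors, and the address comes back through a pointer, so a symbol at address 0 (such as _text on s390) is accepted. The sketch below shows that convention by parsing kallsyms-formatted text held in a string instead of calling perf's kallsyms__get_function_start(). Both helper names are hypothetical. kernel_start() mirrors the fallback loop over ref_reloc_sym_names, trying "_text" before "_stext".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up `name` in kallsyms-formatted text ("<hex addr> <type> <name>\n").
 * Returns 0 and stores the address in *addr on success, -1 if not found.
 * Reporting the address through the pointer keeps address 0 usable.
 */
static int kallsyms_text_find(const char *text, const char *name,
			      unsigned long long *addr)
{
	const char *line = text;

	assert(text && name && addr);
	while (line && *line) {
		unsigned long long a;
		char type;
		char sym[128];

		if (sscanf(line, "%llx %c %127s", &a, &type, sym) == 3 &&
		    strcmp(sym, name) == 0) {
			*addr = a;
			return 0;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return -1;
}

/* Try each candidate in order, like the ref_reloc_sym_names loop. */
static int kernel_start(const char *text, const char **found,
			unsigned long long *start)
{
	static const char *const names[] = { "_text", "_stext", NULL };
	int i;

	for (i = 0; names[i]; i++) {
		if (kallsyms_text_find(text, names[i], start) == 0) {
			if (found)
				*found = names[i];
			return 0;
		}
	}
	return -1;
}
```

With the old convention, where a returned address of 0 meant "not found", the s390 listing in the commit message would have skipped _text and fallen through to _stext at 0x100000.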

2018-05-22 17:54:36 +07:00
int machine__create_extra_kernel_map(struct machine *machine,
struct dso *kernel,
struct extra_kernel_map *xm)
2018-05-22 17:54:33 +07:00
{
struct kmap *kmap;
struct map *map;

map = map__new2(xm->start, kernel);
if (!map)
return -1;

map->end = xm->end;
map->pgoff = xm->pgoff;

kmap = map__kmap(map);

kmap->kmaps = &machine->kmaps;
2018-05-22 17:54:35 +07:00
strlcpy(kmap->name, xm->name, KMAP_NAME_LEN);
2018-05-22 17:54:33 +07:00

map_groups__insert(&machine->kmaps, map);

2018-05-22 17:54:35 +07:00
pr_debug2("Added extra kernel map %s %" PRIx64 "-%" PRIx64 "\n",
kmap->name, map->start, map->end);
2018-05-22 17:54:33 +07:00

map__put(map);

return 0;
}
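machine__create_extra_kernel_map() only records start, end and pgoff. Translation happens later through the map's map_ip/unmap_ip callbacks. The sketch below assumes the usual perf translation for a non-identity map (ip - start + pgoff, and its inverse) and uses a hypothetical struct simple_map to show how an address inside an extra map resolves to its pgoff-based offset. The pgoff value in the test is made up for illustration. For the entry trampolines created in this file, pgoff is whatever find_entry_trampoline() returned.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical minimal map: addresses [start, end) backed at offset pgoff. */
struct simple_map {
	uint64_t start;
	uint64_t end;
	uint64_t pgoff;
};

/* Address -> offset in the backing object: ip - start + pgoff. */
static uint64_t simple_map_ip(const struct simple_map *m, uint64_t ip)
{
	assert(ip >= m->start && ip < m->end);
	return ip - m->start + m->pgoff;
}

/* Offset -> address, the inverse; unsigned wrap-around is well defined. */
static uint64_t simple_unmap_ip(const struct simple_map *m, uint64_t off)
{
	return off + m->start - m->pgoff;
}
```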

static u64 find_entry_trampoline(struct dso *dso)
{
/* Duplicates are removed, so look up all aliases */
const char *syms[] = {
"_entry_trampoline",
"__entry_trampoline_start",
"entry_SYSCALL_64_trampoline",
};
struct symbol *sym = dso__first_symbol(dso);
unsigned int i;

for (; sym; sym = dso__next_symbol(sym)) {
if (sym->binding != STB_GLOBAL)
continue;
for (i = 0; i < ARRAY_SIZE(syms); i++) {
if (!strcmp(sym->name, syms[i]))
return sym->start;
}
}

return 0;
}

/*
* These values can be used for kernels that do not have symbols for the entry
* trampolines in kallsyms.
*/
#define X86_64_CPU_ENTRY_AREA_PER_CPU 0xfffffe0000000000ULL
#define X86_64_CPU_ENTRY_AREA_SIZE 0x2c000
#define X86_64_ENTRY_TRAMPOLINE 0x6000

/* Map x86_64 PTI entry trampolines */
int machine__map_x86_64_entry_trampolines(struct machine *machine,
struct dso *kernel)
{
2018-05-22 17:54:36 +07:00
struct map_groups *kmaps = &machine->kmaps;
struct maps *maps = &kmaps->maps;
2018-05-22 17:54:33 +07:00
int nr_cpus_avail, cpu;
2018-05-22 17:54:36 +07:00
bool found = false;
struct map *map;
u64 pgoff;

/*
* In the vmlinux case, pgoff is a virtual address which must now be
* mapped to a vmlinux offset.
*/
2019-10-28 21:31:38 +07:00
maps__for_each_entry(maps, map) {
2018-05-22 17:54:36 +07:00
struct kmap *kmap = __map__kmap(map);
struct map *dest_map;

if (!kmap || !is_entry_trampoline(kmap->name))
continue;

dest_map = map_groups__find(kmaps, map->pgoff);
if (dest_map && dest_map != map)
map->pgoff = dest_map->map_ip(dest_map, map->pgoff);
|
|
|
|
found = true;
|
|
|
|
}
|
|
|
|
if (found || machine->trampolines_mapped)
|
|
|
|
return 0;
|
2018-05-22 17:54:33 +07:00
|
|
|
|
2018-05-22 17:54:36 +07:00
|
|
|
pgoff = find_entry_trampoline(kernel);
|
2018-05-22 17:54:33 +07:00
|
|
|
if (!pgoff)
|
|
|
|
return 0;
|
|
|
|
|
|
|
|
nr_cpus_avail = machine__nr_cpus_avail(machine);
|
|
|
|
|
|
|
|
/* Add a 1 page map for each CPU's entry trampoline */
|
|
|
|
for (cpu = 0; cpu < nr_cpus_avail; cpu++) {
|
|
|
|
u64 va = X86_64_CPU_ENTRY_AREA_PER_CPU +
|
|
|
|
cpu * X86_64_CPU_ENTRY_AREA_SIZE +
|
|
|
|
X86_64_ENTRY_TRAMPOLINE;
|
|
|
|
struct extra_kernel_map xm = {
|
|
|
|
.start = va,
|
|
|
|
.end = va + page_size,
|
|
|
|
.pgoff = pgoff,
|
|
|
|
};
|
|
|
|
|
2018-05-22 17:54:35 +07:00
|
|
|
strlcpy(xm.name, ENTRY_TRAMPOLINE_NAME, KMAP_NAME_LEN);
|
|
|
|
|
2018-05-22 17:54:33 +07:00
|
|
|
if (machine__create_extra_kernel_map(machine, kernel, &xm) < 0)
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
|
2018-05-22 17:54:36 +07:00
|
|
|
machine->trampolines_mapped = nr_cpus_avail;
|
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
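
The fallback addresses used by machine__map_x86_64_entry_trampolines() follow directly from the three X86_64_* constants: the trampoline page of CPU n starts at X86_64_CPU_ENTRY_AREA_PER_CPU + n * X86_64_CPU_ENTRY_AREA_SIZE + X86_64_ENTRY_TRAMPOLINE and spans one page. The standalone sketch below reproduces only that arithmetic. entry_trampoline_va(), entry_trampoline_range() and struct tramp_range are hypothetical names, and the 4096-byte page size used in the usage note is an assumption (perf itself reads its runtime page_size global).

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the defines above: fallback x86_64 CPU entry area layout. */
#define X86_64_CPU_ENTRY_AREA_PER_CPU	0xfffffe0000000000ULL
#define X86_64_CPU_ENTRY_AREA_SIZE	0x2c000ULL
#define X86_64_ENTRY_TRAMPOLINE		0x6000ULL

/* Hypothetical [start, end) range of one trampoline mapping. */
struct tramp_range {
	uint64_t start, end;
};

/* Start of the entry trampoline page of @cpu, as computed for xm.start. */
static uint64_t entry_trampoline_va(unsigned int cpu)
{
	return X86_64_CPU_ENTRY_AREA_PER_CPU +
	       (uint64_t)cpu * X86_64_CPU_ENTRY_AREA_SIZE +
	       X86_64_ENTRY_TRAMPOLINE;
}

/* One-page range for @cpu, mirroring xm.start and xm.end. */
static struct tramp_range entry_trampoline_range(unsigned int cpu,
						 uint64_t page_size)
{
	struct tramp_range r;

	r.start = entry_trampoline_va(cpu);
	r.end = r.start + page_size;
	return r;
}
```

For example, entry_trampoline_va(1) evaluates to 0xfffffe0000032000, that is 0x2c000 + 0x6000 past the start of the per-CPU area.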

int __weak machine__create_extra_kernel_maps(struct machine *machine __maybe_unused,
					     struct dso *kernel __maybe_unused)
{
2018-05-22 17:54:33 +07:00
	return 0;
}

2018-02-15 19:26:32 +07:00
static int
__machine__create_kernel_maps(struct machine *machine, struct dso *kernel)
2012-12-08 03:39:39 +07:00
{
2018-04-27 02:52:34 +07:00
	struct kmap *kmap;
	struct map *map;
2012-12-08 03:39:39 +07:00

2015-12-09 09:11:33 +07:00
	/* When renewing the kernel map, destroy the previous one */
	machine__destroy_kernel_maps(machine);

2018-04-27 02:52:34 +07:00
	machine->vmlinux_map = map__new2(0, kernel);
	if (machine->vmlinux_map == NULL)
		return -1;
2012-12-08 03:39:39 +07:00

2018-04-27 02:52:34 +07:00
	machine->vmlinux_map->map_ip = machine->vmlinux_map->unmap_ip = identity__map_ip;
	map = machine__kernel_map(machine);
	kmap = map__kmap(map);
	if (!kmap)
		return -1;
2015-04-07 15:22:45 +07:00

2018-04-27 02:52:34 +07:00
	kmap->kmaps = &machine->kmaps;
	map_groups__insert(&machine->kmaps, map);
2012-12-08 03:39:39 +07:00

	return 0;
}

void machine__destroy_kernel_maps(struct machine *machine)
{
2018-04-27 02:52:34 +07:00
	struct kmap *kmap;
	struct map *map = machine__kernel_map(machine);
2012-12-08 03:39:39 +07:00

2018-04-27 02:52:34 +07:00
	if (map == NULL)
		return;
2012-12-08 03:39:39 +07:00

2018-04-27 02:52:34 +07:00
	kmap = map__kmap(map);
	map_groups__remove(&machine->kmaps, map);
	if (kmap && kmap->ref_reloc_sym) {
		zfree((char **)&kmap->ref_reloc_sym->name);
		zfree(&kmap->ref_reloc_sym);
2012-12-08 03:39:39 +07:00
	}
2018-04-27 02:52:34 +07:00

	map__zput(machine->vmlinux_map);
2012-12-08 03:39:39 +07:00
}

2012-12-19 05:15:48 +07:00
int machines__create_guest_kernel_maps(struct machines *machines)
2012-12-08 03:39:39 +07:00
{
	int ret = 0;
	struct dirent **namelist = NULL;
	int i, items = 0;
	char path[PATH_MAX];
	pid_t pid;
	char *endp;

	if (symbol_conf.default_guest_vmlinux_name ||
	    symbol_conf.default_guest_modules ||
	    symbol_conf.default_guest_kallsyms) {
		machines__create_kernel_maps(machines, DEFAULT_GUEST_KERNEL_ID);
	}

	if (symbol_conf.guestmount) {
		items = scandir(symbol_conf.guestmount, &namelist, NULL, NULL);
		if (items <= 0)
			return -ENOENT;
		for (i = 0; i < items; i++) {
			if (!isdigit(namelist[i]->d_name[0])) {
				/* Filter out . and .. */
				continue;
			}
			/* strtol() does not clear errno on success */
			errno = 0;
			pid = (pid_t)strtol(namelist[i]->d_name, &endp, 10);
			if ((*endp != '\0') ||
			    (endp == namelist[i]->d_name) ||
			    (errno == ERANGE)) {
				pr_debug("invalid directory (%s). Skipping.\n",
					 namelist[i]->d_name);
				continue;
			}
			snprintf(path, sizeof(path), "%s/%s/proc/kallsyms",
				 symbol_conf.guestmount,
				 namelist[i]->d_name);
			ret = access(path, R_OK);
			if (ret) {
				pr_debug("Can't access file %s\n", path);
				goto failure;
			}
			machines__create_kernel_maps(machines, pid);
		}
failure:
		/* scandir() allocates every entry as well as the array */
		for (i = 0; i < items; i++)
			free(namelist[i]);
		free(namelist);
	}

	return ret;
}
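
The guestmount loop accepts a directory entry only if its whole name parses as a decimal pid. A minimal standalone sketch of that check follows; parse_guest_pid() is a hypothetical helper, not part of perf. It clears errno before calling strtol(), because strtol() never resets errno on success and a stale ERANGE would otherwise reject a valid name, and it assumes pid_t is an int, as on Linux.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: parse a guestmount entry name such as "1234". */
static bool parse_guest_pid(const char *name, pid_t *pid)
{
	char *endp;
	long val;

	/* Non-numeric entries such as "." and ".." are not guest pids. */
	if (!isdigit((unsigned char)name[0]))
		return false;

	/* strtol() only sets errno on failure, so clear any stale value. */
	errno = 0;
	val = strtol(name, &endp, 10);
	if (*endp != '\0' || endp == name || errno == ERANGE)
		return false;

	/* Assumption: pid_t is an int, as on Linux. */
	if (val > INT_MAX)
		return false;

	*pid = (pid_t)val;
	return true;
}
```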

2012-12-19 05:15:48 +07:00
void machines__destroy_kernel_maps(struct machines *machines)
2012-12-08 03:39:39 +07:00
{
2018-12-07 02:18:14 +07:00
	struct rb_node *next = rb_first_cached(&machines->guests);
2012-12-19 05:15:48 +07:00

	machine__destroy_kernel_maps(&machines->host);
2012-12-08 03:39:39 +07:00

	while (next) {
		struct machine *pos = rb_entry(next, struct machine, rb_node);

		next = rb_next(&pos->rb_node);
2018-12-07 02:18:14 +07:00
		rb_erase_cached(&pos->rb_node, &machines->guests);
2012-12-08 03:39:39 +07:00
		machine__delete(pos);
	}
}
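
machines__destroy_kernel_maps() fetches the successor with rb_next() before rb_erase_cached() and machine__delete() invalidate the current node. The sketch below shows the same save-next-before-delete ordering on a plain singly linked list rather than perf's cached rbtree; struct node and destroy_all() are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical list node standing in for a struct machine in the guests tree. */
struct node {
	int id;
	struct node *next;
};

/*
 * Free every node and return how many were freed. The successor is read
 * before free(), because the link stored in a freed node must not be used.
 */
static size_t destroy_all(struct node **head)
{
	struct node *pos = *head;
	size_t freed = 0;

	while (pos) {
		struct node *next = pos->next;	/* save before free() */

		free(pos);
		freed++;
		pos = next;
	}
	*head = NULL;
	return freed;
}
```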

2012-12-19 05:15:48 +07:00
int machines__create_kernel_maps(struct machines *machines, pid_t pid)
2012-12-08 03:39:39 +07:00
{
	struct machine *machine = machines__findnew(machines, pid);

	if (machine == NULL)
		return -1;

	return machine__create_kernel_maps(machine);
}

2018-04-27 02:52:34 +07:00
int machine__load_kallsyms(struct machine *machine, const char *filename)
2012-12-08 03:39:39 +07:00
{
2015-09-30 21:54:04 +07:00
	struct map *map = machine__kernel_map(machine);
2018-02-15 19:26:33 +07:00
	int ret = __dso__load_kallsyms(map->dso, filename, map, true);
2012-12-08 03:39:39 +07:00

	if (ret > 0) {
2018-04-27 02:52:34 +07:00
		dso__set_loaded(map->dso);
2012-12-08 03:39:39 +07:00
		/*
		 * Since /proc/kallsyms will have multiple sections for the
		 * kernel, with modules between them, fixup the end of all
		 * sections.
		 */
2018-04-27 02:52:34 +07:00
		map_groups__fixup_end(&machine->kmaps);
2012-12-08 03:39:39 +07:00
	}

	return ret;
}

2018-04-25 22:18:11 +07:00
int machine__load_vmlinux_path(struct machine *machine)
2012-12-08 03:39:39 +07:00
{
2015-09-30 21:54:04 +07:00
	struct map *map = machine__kernel_map(machine);
2016-09-02 05:25:52 +07:00
	int ret = dso__load_vmlinux_path(map->dso, map);
2012-12-08 03:39:39 +07:00

2013-08-07 18:38:47 +07:00
	if (ret > 0)
2018-04-27 02:52:34 +07:00
		dso__set_loaded(map->dso);
2012-12-08 03:39:39 +07:00

	return ret;
}

static char *get_kernel_version(const char *root_dir)
{
	char version[PATH_MAX];
	FILE *file;
	char *name, *tmp;
	const char *prefix = "Linux version ";

	snprintf(version, sizeof(version), "%s/proc/version", root_dir);
	file = fopen(version, "r");
	if (!file)
		return NULL;

	tmp = fgets(version, sizeof(version), file);
	fclose(file);
2019-05-28 20:41:28 +07:00
	if (!tmp)
		return NULL;
2012-12-08 03:39:39 +07:00

	name = strstr(version, prefix);
	if (!name)
		return NULL;
	name += strlen(prefix);
	tmp = strchr(name, ' ');
	if (tmp)
		*tmp = '\0';

	return strdup(name);
}
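
get_kernel_version() keeps the token that follows "Linux version " on the first line of <root_dir>/proc/version. To make that parsing step testable without a filesystem, the sketch below takes the line as a string. parse_kernel_version() is a hypothetical helper and the sample lines in the test are made up; unlike the original, it also stops at a newline, so a line that ends right after the release string does not keep the trailing '\n'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, I/O-free variant of get_kernel_version(): return a malloc'd
 * copy of the release string after "Linux version ", or NULL if the prefix
 * is missing or allocation fails. The caller frees the result.
 */
static char *parse_kernel_version(const char *line)
{
	const char *prefix = "Linux version ";
	const char *name = strstr(line, prefix);
	size_t len;
	char *out;

	if (!name)
		return NULL;
	name += strlen(prefix);

	/* The release ends at the first space or newline. */
	len = strcspn(name, " \n");
	out = malloc(len + 1);
	if (!out)
		return NULL;
	memcpy(out, name, len);
	out[len] = '\0';
	return out;
}
```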

2015-02-13 04:20:01 +07:00
static bool is_kmod_dso(struct dso *dso)
{
	return dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE ||
	       dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE;
}

static int map_groups__set_module_path(struct map_groups *mg, const char *path,
				       struct kmod_path *m)
{
	char *long_name;
2018-04-24 22:16:09 +07:00
	struct map *map = map_groups__find_by_name(mg, m->name);
2015-02-13 04:20:01 +07:00

	if (map == NULL)
		return 0;

	long_name = strdup(path);
	if (long_name == NULL)
		return -ENOMEM;

	dso__set_long_name(map->dso, long_name, true);
	dso__kernel_module_get_build_id(map->dso, "");

	/*
	 * The full name can reveal kmod compression, so
	 * update the symtab_type if needed.
	 */
2018-08-17 16:48:07 +07:00
	if (m->comp && is_kmod_dso(map->dso)) {
2015-02-13 04:20:01 +07:00
		map->dso->symtab_type++;
2018-08-17 16:48:07 +07:00
		map->dso->comp = m->comp;
	}
2015-02-13 04:20:01 +07:00

	return 0;
}

2012-12-08 03:39:39 +07:00
static int map_groups__set_modules_path_dir(struct map_groups *mg,
2014-04-27 00:17:55 +07:00
				const char *dir_name, int depth)
2012-12-08 03:39:39 +07:00
{
	struct dirent *dent;
	DIR *dir = opendir(dir_name);
	int ret = 0;

	if (!dir) {
		pr_debug("%s: cannot open %s dir\n", __func__, dir_name);
		return -1;
	}

	while ((dent = readdir(dir)) != NULL) {
		char path[PATH_MAX];
		struct stat st;

		/* sshfs might return a bad dent->d_type, so we have to stat */
		snprintf(path, sizeof(path), "%s/%s", dir_name, dent->d_name);
		if (stat(path, &st))
			continue;

		if (S_ISDIR(st.st_mode)) {
			if (!strcmp(dent->d_name, ".") ||
			    !strcmp(dent->d_name, ".."))
				continue;

2014-04-27 00:17:55 +07:00
			/* Do not follow top-level source and build symlinks */
			if (depth == 0) {
				if (!strcmp(dent->d_name, "source") ||
				    !strcmp(dent->d_name, "build"))
					continue;
			}

			ret = map_groups__set_modules_path_dir(mg, path,
							       depth + 1);
2012-12-08 03:39:39 +07:00
			if (ret < 0)
				goto out;
		} else {
2015-02-13 04:20:01 +07:00
			struct kmod_path m;
2012-12-08 03:39:39 +07:00

2015-02-13 04:20:01 +07:00
			ret = kmod_path__parse_name(&m, dent->d_name);
			if (ret)
				goto out;
2014-11-04 08:14:27 +07:00

2015-02-13 04:20:01 +07:00
			if (m.kmod)
				ret = map_groups__set_module_path(mg, path, &m);
2014-11-04 08:14:27 +07:00

2019-07-04 22:06:20 +07:00
			zfree(&m.name);
2012-12-08 03:39:39 +07:00

2015-02-13 04:20:01 +07:00
			if (ret)
2012-12-08 03:39:39 +07:00
				goto out;
		}
	}

out:
	closedir(dir);
	return ret;
}
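
The recursion in map_groups__set_modules_path_dir() skips "." and ".." at every level and, only at depth 0 of /lib/modules/<version>, the "source" and "build" entries, which are usually symlinks into the kernel source and build trees. A hypothetical predicate, should_descend(), capturing just that rule:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical: should the module path scan recurse into directory @name? */
static bool should_descend(const char *name, int depth)
{
	/* Never revisit the current or parent directory. */
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return false;

	/* At the top level, do not follow the source/build symlinks. */
	if (depth == 0 && (!strcmp(name, "source") || !strcmp(name, "build")))
		return false;

	return true;
}
```

Restricting the source/build check to depth 0 means a module subdirectory that happens to be called "build" deeper in the tree is still scanned.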

static int machine__set_modules_path(struct machine *machine)
{
	char *version;
	char modules_path[PATH_MAX];

	version = get_kernel_version(machine->root_dir);
	if (!version)
		return -1;

2014-04-27 00:17:55 +07:00
	snprintf(modules_path, sizeof(modules_path), "%s/lib/modules/%s",
2012-12-08 03:39:39 +07:00
		 machine->root_dir, version);
	free(version);

2014-04-27 00:17:55 +07:00
	return map_groups__set_modules_path_dir(&machine->kmaps, modules_path, 0);
2012-12-08 03:39:39 +07:00
}

2016-07-21 10:10:51 +07:00
int __weak arch__fix_module_text_start(u64 *start __maybe_unused,
2019-07-24 19:27:02 +07:00
				u64 *size __maybe_unused,
2016-07-21 10:10:51 +07:00
				const char *name __maybe_unused)
{
	return 0;
}
2012-12-08 03:39:39 +07:00

2017-08-03 20:49:02 +07:00
static int machine__create_module(void *arg, const char *name, u64 start,
				  u64 size)
2012-12-08 03:39:39 +07:00
{
2013-10-08 15:45:48 +07:00
	struct machine *machine = arg;
2012-12-08 03:39:39 +07:00
	struct map *map;
2013-10-08 15:45:48 +07:00

2019-07-24 19:27:02 +07:00
	if (arch__fix_module_text_start(&start, &size, name) < 0)
2016-07-21 10:10:51 +07:00
		return -1;

2019-11-14 22:28:41 +07:00
	map = machine__addnew_module_map(machine, start, name);
2013-10-08 15:45:48 +07:00
	if (map == NULL)
		return -1;
2017-08-03 20:49:02 +07:00
	map->end = start + size;
2013-10-08 15:45:48 +07:00

	dso__kernel_module_get_build_id(map->dso, machine->root_dir);

	return 0;
}

static int machine__create_modules(struct machine *machine)
{
2012-12-08 03:39:39 +07:00
	const char *modules;
	char path[PATH_MAX];

2013-09-22 17:22:09 +07:00
	if (machine__is_default_guest(machine)) {
2012-12-08 03:39:39 +07:00
		modules = symbol_conf.default_guest_modules;
2013-09-22 17:22:09 +07:00
	} else {
		snprintf(path, PATH_MAX, "%s/proc/modules", machine->root_dir);
2012-12-08 03:39:39 +07:00
		modules = path;
	}

2013-09-22 17:22:09 +07:00
	if (symbol__restricted_filename(modules, "/proc/modules"))
2012-12-08 03:39:39 +07:00
		return -1;

2013-10-08 15:45:48 +07:00
	if (modules__parse(modules, machine, machine__create_module))
2012-12-08 03:39:39 +07:00
		return -1;

2013-10-08 15:45:48 +07:00
	if (!machine__set_modules_path(machine))
		return 0;
2012-12-08 03:39:39 +07:00

2013-10-08 15:45:48 +07:00
	pr_debug("Problems setting modules path maps, continuing anyway...\n");
2012-12-08 03:39:39 +07:00

2013-07-16 03:27:53 +07:00
	return 0;
2012-12-08 03:39:39 +07:00
}

2018-02-15 19:26:32 +07:00
static void machine__set_kernel_mmap(struct machine *machine,
				     u64 start, u64 end)
{
2018-04-27 02:52:34 +07:00
	machine->vmlinux_map->start = start;
	machine->vmlinux_map->end = end;
	/*
	 * Be a bit paranoid here: some perf.data files came with
	 * a zero-sized synthesized MMAP event for the kernel.
	 */
	if (start == 0 && end == 0)
		machine->vmlinux_map->end = ~0ULL;
2018-02-15 19:26:32 +07:00
}

perf machine: Update kernel map address and re-order properly
Since commit 1fb87b8e9599 ("perf machine: Don't search for active kernel
start in __machine__create_kernel_maps"), the __machine__create_kernel_maps()
just create a map what start and end are both zero. Though the address will be
updated later, the order of map in the rbtree may be incorrect.
The commit ee05d21791db ("perf machine: Set main kernel end address properly")
fixed the logic in machine__create_kernel_maps(), but it's still wrong in
function machine__process_kernel_mmap_event().
To reproduce this issue, we need an environment which the module address
is before the kernel text segment. I tested it on an aarch64 machine with
kernel 4.19.25:
[root@localhost hulk]# grep _stext /proc/kallsyms
ffff000008081000 T _stext
[root@localhost hulk]# grep _etext /proc/kallsyms
ffff000009780000 R _etext
[root@localhost hulk]# tail /proc/modules
hisi_sas_v2_hw 77824 0 - Live 0xffff00000191d000
nvme_core 126976 7 nvme, Live 0xffff0000018b6000
mdio 20480 1 ixgbe, Live 0xffff0000018ab000
hisi_sas_main 106496 1 hisi_sas_v2_hw, Live 0xffff000001861000
hns_mdio 20480 2 - Live 0xffff000001822000
hnae 28672 3 hns_dsaf,hns_enet_drv, Live 0xffff000001815000
dm_mirror 40960 0 - Live 0xffff000001804000
dm_region_hash 32768 1 dm_mirror, Live 0xffff0000017f5000
dm_log 32768 2 dm_mirror,dm_region_hash, Live 0xffff0000017e7000
dm_mod 315392 17 dm_mirror,dm_log, Live 0xffff000001780000
[root@localhost hulk]#
Before fix:
[root@localhost bin]# perf record sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
[root@localhost bin]# perf buildid-list -i perf.data
4c4e46c971ca935f781e603a09b52a92e8bdfee8 [vdso]
[root@localhost bin]# perf buildid-list -i perf.data -H
0000000000000000000000000000000000000000 /proc/kcore
[root@localhost bin]#
After fix:
[root@localhost tools]# ./perf/perf record sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
[root@localhost tools]# ./perf/perf buildid-list -i perf.data
28a6c690262896dbd1b5e1011ed81623e6db0610 [kernel.kallsyms]
106c14ce6e4acea3453e484dc604d66666f08a2f [vdso]
[root@localhost tools]# ./perf/perf buildid-list -i perf.data -H
28a6c690262896dbd1b5e1011ed81623e6db0610 /proc/kcore
Signed-off-by: Wei Li <liwei391@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190228092003.34071-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-28 16:20:03 +07:00
static void machine__update_kernel_mmap(struct machine *machine,
				     u64 start, u64 end)
{
	struct map *map = machine__kernel_map(machine);

	map__get(map);
	map_groups__remove(&machine->kmaps, map);

	machine__set_kernel_mmap(machine, start, end);

	map_groups__insert(&machine->kmaps, map);
	map__put(map);
}
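
machine__update_kernel_mmap() exists because the kernel map sits in a container ordered by start address: changing start in place would leave the map at the wrong position, so it is removed, updated and reinserted, with map__get()/map__put() keeping it alive in between. The sketch below shows the same remove-update-reinsert step on a sorted array instead of perf's rbtree. struct range, sorted_insert(), remove_at() and update_range() are hypothetical names; the test mirrors the aarch64 case from the commit message above, where module maps (dm_mod, hisi_sas_v2_hw) lie below _stext.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a map: start is the sort key. */
struct range {
	uint64_t start, end;
};

/* Insert @r into @v[0..*n), keeping the array sorted by start. */
static void sorted_insert(struct range *v, size_t *n, struct range r)
{
	size_t i = *n;

	while (i > 0 && v[i - 1].start > r.start) {
		v[i] = v[i - 1];	/* shift larger keys one slot right */
		i--;
	}
	v[i] = r;
	(*n)++;
}

/* Remove and return the element at @idx, closing the gap. */
static struct range remove_at(struct range *v, size_t *n, size_t idx)
{
	struct range r = v[idx];

	memmove(&v[idx], &v[idx + 1], (*n - idx - 1) * sizeof(*v));
	(*n)--;
	return r;
}

/* Re-key an element: take it out, change start/end, put it back in order. */
static void update_range(struct range *v, size_t *n, size_t idx,
			 uint64_t start, uint64_t end)
{
	struct range r = remove_at(v, n, idx);

	r.start = start;
	r.end = end;
	sorted_insert(v, n, r);
}
```

Updating the first element in place instead would leave a kernel range starting at 0xffff000008081000 ahead of modules at lower addresses, which is the misordering the commit describes.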

2012-12-08 03:39:39 +07:00
int machine__create_kernel_maps(struct machine *machine)
{
	struct dso *kernel = machine__get_kernel(machine);
perf symbols: Accept symbols starting at address 0
That is the case of _text on s390, and we have some functions that return an
address, using address zero to report problems, oops.
This would lead the symbol loading routines to not use "_text" as the reference
relocation symbol, or the first symbol for the kernel, but use instead
"_stext", that is at the same address on x86_64 and others, but not on s390:
[acme@localhost perf-4.11.0-rc6]$ head -15 /proc/kallsyms
0000000000000000 T _text
0000000000000418 t iplstart
0000000000000800 T start
000000000000080a t .base
000000000000082e t .sk8x8
0000000000000834 t .gotr
0000000000000842 t .cmd
0000000000000846 t .parm
000000000000084a t .lowcase
0000000000010000 T startup
0000000000010010 T startup_kdump
0000000000010214 t startup_kdump_relocated
0000000000011000 T startup_continue
00000000000112a0 T _ehead
0000000000100000 T _stext
[acme@localhost perf-4.11.0-rc6]$
Which in turn would make 'perf test vmlinux' to fail because it wouldn't find
the symbols before "_stext" in kallsyms.
Fix it by using the return value only for errors and storing the
address, when the symbol is successfully found, in a provided pointer
arg.
Before this patch:
[acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
1: vmlinux symtab matches kallsyms :
--- start ---
test child forked, pid 40693
Looking at the vmlinux_path (8 entries long)
Using /usr/lib/debug/lib/modules/3.10.0-654.el7.s390x/vmlinux for symbols
ERR : 0: _text not on kallsyms
ERR : 0x418: iplstart not on kallsyms
ERR : 0x800: start not on kallsyms
ERR : 0x80a: .base not on kallsyms
ERR : 0x82e: .sk8x8 not on kallsyms
ERR : 0x834: .gotr not on kallsyms
ERR : 0x842: .cmd not on kallsyms
ERR : 0x846: .parm not on kallsyms
ERR : 0x84a: .lowcase not on kallsyms
ERR : 0x10000: startup not on kallsyms
ERR : 0x10010: startup_kdump not on kallsyms
ERR : 0x10214: startup_kdump_relocated not on kallsyms
ERR : 0x11000: startup_continue not on kallsyms
ERR : 0x112a0: _ehead not on kallsyms
<SNIP warnings>
test child finished with -1
---- end ----
vmlinux symtab matches kallsyms: FAILED!
[acme@localhost perf-4.11.0-rc6]$
After:
[acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
1: vmlinux symtab matches kallsyms :
--- start ---
test child forked, pid 47160
<SNIP warnings>
test child finished with 0
---- end ----
vmlinux symtab matches kallsyms: Ok
[acme@localhost perf-4.11.0-rc6]$
Reported-by: Michael Petlan <mpetlan@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-9x9bwgd3btwdk1u51xie93fz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-04-28 07:21:09 +07:00
	const char *name = NULL;
2018-02-19 17:05:45 +07:00
	struct map *map;
2019-05-08 20:20:03 +07:00
	u64 start = 0, end = ~0ULL;
2015-11-18 13:40:33 +07:00
	int ret;

2016-05-17 21:52:26 +07:00
	if (kernel == NULL)
2014-01-29 21:14:39 +07:00
		return -1;
2012-12-08 03:39:39 +07:00

2015-11-18 13:40:33 +07:00
	ret = __machine__create_kernel_maps(machine, kernel);
	if (ret < 0)
2018-05-22 17:54:36 +07:00
		goto out_put;
2012-12-08 03:39:39 +07:00

	if (symbol_conf.use_modules && machine__create_modules(machine) < 0) {
		if (machine__is_host(machine))
			pr_debug("Problems creating module maps, "
				 "continuing anyway...\n");
		else
			pr_debug("Problems creating module maps for guest %d, "
				 "continuing anyway...\n", machine->pid);
	}

2019-05-08 20:20:03 +07:00
	if (!machine__get_running_kernel_start(machine, &name, &start, &end)) {
2017-06-26 16:51:53 +07:00
		if (name &&
2019-05-08 20:20:03 +07:00
		    map__set_kallsyms_ref_reloc_sym(machine->vmlinux_map, name, start)) {
2017-06-26 16:51:53 +07:00
			machine__destroy_kernel_maps(machine);
2018-05-22 17:54:36 +07:00
			ret = -1;
			goto out_put;
2017-06-26 16:51:53 +07:00
		}
2018-02-19 17:05:45 +07:00

2019-02-28 16:20:03 +07:00
		/*
		 * We have a real start address now, so re-order the kmaps;
		 * assume it's the last in the kmaps.
		 */
2019-05-08 20:20:03 +07:00
		machine__update_kernel_mmap(machine, start, end);
2014-01-29 21:14:39 +07:00
	}

2018-05-22 17:54:36 +07:00
	if (machine__create_extra_kernel_maps(machine, kernel))
		pr_debug("Problems creating extra kernel maps, continuing anyway...\n");

2019-05-08 20:20:03 +07:00
	if (end == ~0ULL) {
		/* update end address of the kernel map using adjacent module address */
		map = map__next(machine__kernel_map(machine));
		if (map)
			machine__set_kernel_mmap(machine, start, map->start);
	}

2018-05-22 17:54:36 +07:00
out_put:
	dso__put(kernel);
	return ret;
2012-12-08 03:39:39 +07:00
}
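
machine__get_running_kernel_start() reports success through its return value and hands the address back through a pointer argument, so a symbol located at address 0 (as _text is on s390, per the commit message above) is not confused with a lookup failure. A hypothetical lookup written in that style, with struct sym and lookup_symbol() invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol table entry. */
struct sym {
	const char *name;
	uint64_t addr;
};

/*
 * Return 0 and store the address in *addr if @name is found, -1 otherwise.
 * Because the address never travels in the return value, 0 is a valid result.
 */
static int lookup_symbol(const struct sym *tab, size_t n, const char *name,
			 uint64_t *addr)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].name, name) == 0) {
			*addr = tab[i].addr;
			return 0;
		}
	}
	return -1;
}
```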

2013-08-07 18:38:51 +07:00
static bool machine__uses_kcore(struct machine *machine)
{
	struct dso *dso;

2015-05-28 23:06:42 +07:00
	list_for_each_entry(dso, &machine->dsos.head, node) {
2013-08-07 18:38:51 +07:00
		if (dso__is_kcore(dso))
			return true;
	}

	return false;
}

2018-05-22 17:54:37 +07:00
static bool perf_event__is_extra_kernel_mmap(struct machine *machine,
					     union perf_event *event)
{
	return machine__is(machine, "x86_64") &&
	       is_entry_trampoline(event->mmap.filename);
}

static int machine__process_extra_kernel_map(struct machine *machine,
					     union perf_event *event)
{
2019-11-01 01:22:24 +07:00
	struct dso *kernel = machine__kernel_dso(machine);
2018-05-22 17:54:37 +07:00
	struct extra_kernel_map xm = {
		.start = event->mmap.start,
		.end   = event->mmap.start + event->mmap.len,
		.pgoff = event->mmap.pgoff,
	};

	if (kernel == NULL)
		return -1;

	strlcpy(xm.name, event->mmap.filename, KMAP_NAME_LEN);

	return machine__create_extra_kernel_map(machine, kernel, &xm);
}

2012-10-07 02:26:02 +07:00
static int machine__process_kernel_mmap_event(struct machine *machine,
					      union perf_event *event)
{
	struct map *map;
	enum dso_kernel_type kernel_type;
	bool is_kernel_mmap;

2013-08-07 18:38:51 +07:00
	/* If we have maps from kcore then we do not need or want any others */
	if (machine__uses_kcore(machine))
		return 0;

2012-10-07 02:26:02 +07:00
	if (machine__is_host(machine))
		kernel_type = DSO_TYPE_KERNEL;
	else
		kernel_type = DSO_TYPE_GUEST_KERNEL;

	is_kernel_mmap = memcmp(event->mmap.filename,
2018-02-15 19:26:30 +07:00
				machine->mmap_name,
				strlen(machine->mmap_name) - 1) == 0;
2012-10-07 02:26:02 +07:00
	if (event->mmap.filename[0] == '/' ||
	    (!is_kernel_mmap && event->mmap.filename[0] == '[')) {
2019-11-14 22:28:41 +07:00
		map = machine__addnew_module_map(machine, event->mmap.start,
						 event->mmap.filename);
2012-10-07 02:26:02 +07:00
		if (map == NULL)
			goto out_problem;

		map->end = map->start + event->mmap.len;
	} else if (is_kernel_mmap) {
		const char *symbol_name = (event->mmap.filename +
2018-02-15 19:26:30 +07:00
				strlen(machine->mmap_name));
2012-10-07 02:26:02 +07:00
		/*
		 * Should be there already, from the build-id table in
		 * the header.
		 */
perf tools: Fix build-id matching on vmlinux
There's a problem on finding correct kernel symbols when perf report
runs on a different kernel. Although a part of the problem was solved
by the prior commit 0a7e6d1b6844 ("perf tools: Check recorded kernel
version when finding vmlinux"), there's a remaining problem still.
When perf records samples, it synthesizes the kernel map using
machine__mmap_name() and ref_reloc_sym like "[kernel.kallsyms]_text".
You can easily see it using 'perf report -D' command.
After finishing record, it goes through the recorded events to find
maps/dsos actually used. And then record build-id info of them.
During this process, it needs to load symbols in a dso and it'd call
dso__load_vmlinux_path() since the default value of the symbol_conf.
try_vmlinux_path is true. However it changes dso->long_name to a real
path of the vmlinux file (e.g. /lib/modules/3.16.4/build/vmlinux) if one
is running on a custom kernel.
It resulted in that perf report reads the build-id of the vmlinux, but
cannot use it since it only knows about the [kernel.kallsyms] map. It
then falls back to possible vmlinux paths by using the recorded kernel
version (in case of a recent version) or a running kernel silently.
Even with the recent tools, this still has a possibility of breaking
the result. As the build directory is a symbolic link, if one built a
new kernel in the same directory with different source/config, the old
link to vmlinux will point the new file. So it's absolutely needed to
use build-id when finding a kernel image.
In this patch, it's now changed to try to search a kernel dso in the
existing dso list which was constructed during build-id table parsing
so it'll always have a build-id. If not found, search "[kernel.kallsyms]".
Before:
$ perf report
# Children Self Command Shared Object Symbol
# ........ ........ ....... ................. ...............................
#
72.15% 0.00% swapper [kernel.kallsyms] [k] set_curr_task_rt
72.15% 0.00% swapper [kernel.kallsyms] [k] native_calibrate_tsc
72.15% 0.00% swapper [kernel.kallsyms] [k] tsc_refine_calibration_work
71.87% 71.87% swapper [kernel.kallsyms] [k] module_finalize
...
After (for the same perf.data):
72.15% 0.00% swapper vmlinux [k] cpu_startup_entry
72.15% 0.00% swapper vmlinux [k] arch_cpu_idle
72.15% 0.00% swapper vmlinux [k] default_idle
71.87% 71.87% swapper vmlinux [k] native_safe_halt
...
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/20140924073356.GB1962@gmail.com
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1415063674-17206-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-11-04 08:14:33 +07:00
		struct dso *kernel = NULL;
		struct dso *dso;

2017-04-04 23:15:04 +07:00
		down_read(&machine->dsos.lock);
2015-06-02 01:40:01 +07:00

2015-05-28 23:06:42 +07:00
		list_for_each_entry(dso, &machine->dsos.head, node) {
2015-06-03 15:52:21 +07:00

			/*
			 * The cpumode passed to is_kernel_module is not the
			 * cpumode of *this* event. If we insist on passing
			 * the correct cpumode to is_kernel_module, we should
			 * record the cpumode when adding this dso to the
			 * linked list.
			 *
			 * However, we don't really need to pass the correct
			 * cpumode. We know the correct cpumode must be kernel
			 * mode (if not, we should not link it onto the
			 * kernel_dsos list).
			 *
			 * Therefore, we pass PERF_RECORD_MISC_CPUMODE_UNKNOWN;
			 * is_kernel_module() treats it as a kernel cpumode.
			 */

			if (!dso->kernel ||
			    is_kernel_module(dso->long_name,
					     PERF_RECORD_MISC_CPUMODE_UNKNOWN))
perf tools: Fix build-id matching on vmlinux
2014-11-04 08:14:33 +07:00
				continue;

2015-06-03 15:52:21 +07:00

perf tools: Fix build-id matching on vmlinux
2014-11-04 08:14:33 +07:00
			kernel = dso;
			break;
		}

2017-04-04 23:15:04 +07:00
		up_read(&machine->dsos.lock);
2015-06-02 01:40:01 +07:00

perf tools: Fix build-id matching on vmlinux
2014-11-04 08:14:33 +07:00
		if (kernel == NULL)
2018-02-15 19:26:30 +07:00
			kernel = machine__findnew_dso(machine, machine->mmap_name);
2012-10-07 02:26:02 +07:00
		if (kernel == NULL)
			goto out_problem;

		kernel->kernel = kernel_type;
2015-06-02 21:53:26 +07:00
		if (__machine__create_kernel_maps(machine, kernel) < 0) {
			dso__put(kernel);
2012-10-07 02:26:02 +07:00
			goto out_problem;
2015-06-02 21:53:26 +07:00
		}
2012-10-07 02:26:02 +07:00

2014-11-18 11:30:28 +07:00
		if (strstr(kernel->long_name, "vmlinux"))
			dso__set_short_name(kernel, "[kernel.vmlinux]", false);
2014-11-04 08:14:34 +07:00

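The hunks attributed to "perf tools: Fix build-id matching on vmlinux" change how the kernel-mmap handler picks its DSO. It first walks `machine->dsos` for a kernel DSO that is not a module; those entries were created from the perf.data build-id table. Only if none is found does it fall back to `machine__findnew_dso(machine, machine->mmap_name)`.

The sketch below is a simplified, hypothetical model of that selection order, not perf's code. `struct toy_dso` and `find_kernel_dso()` are invented names, a plain array replaces the locked DSO list, and an `is_module` flag stands in for `is_kernel_module()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for perf's struct dso (not the real layout). */
struct toy_dso {
	const char *long_name;
	bool kernel;		/* kernel-space DSO */
	bool is_module;		/* what is_kernel_module() would report */
};

/*
 * Prefer the first kernel DSO that is not a module: in perf these entries
 * come from the build-id table, so they carry a build-id. Otherwise fall
 * back to the entry named fallback_name (e.g. "[kernel.kallsyms]").
 * Returns NULL where perf would instead create a new DSO.
 */
static struct toy_dso *find_kernel_dso(struct toy_dso *dsos, size_t n,
				       const char *fallback_name)
{
	assert(dsos != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!dsos[i].kernel || dsos[i].is_module)
			continue;	/* user-space DSO or kernel module */
		return &dsos[i];
	}
	for (size_t i = 0; i < n; i++) {
		if (strcmp(dsos[i].long_name, fallback_name) == 0)
			return &dsos[i];
	}
	return NULL;
}
```

In the real handler, the fallback `machine__findnew_dso()` creates the DSO when it does not exist yet. The sketch returns NULL instead so that case stays visible to a caller.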
perf machine: Update kernel map address and re-order properly
Since commit 1fb87b8e9599 ("perf machine: Don't search for active kernel
start in __machine__create_kernel_maps"), __machine__create_kernel_maps()
just creates a map whose start and end are both zero. Though the address is
updated later, the order of the maps in the rbtree may be incorrect.
The commit ee05d21791db ("perf machine: Set main kernel end address properly")
fixed the logic in machine__create_kernel_maps(), but it's still wrong in
machine__process_kernel_mmap_event().
To reproduce this issue, we need an environment in which the module addresses
are below the kernel text segment. I tested it on an aarch64 machine with
kernel 4.19.25:
[root@localhost hulk]# grep _stext /proc/kallsyms
ffff000008081000 T _stext
[root@localhost hulk]# grep _etext /proc/kallsyms
ffff000009780000 R _etext
[root@localhost hulk]# tail /proc/modules
hisi_sas_v2_hw 77824 0 - Live 0xffff00000191d000
nvme_core 126976 7 nvme, Live 0xffff0000018b6000
mdio 20480 1 ixgbe, Live 0xffff0000018ab000
hisi_sas_main 106496 1 hisi_sas_v2_hw, Live 0xffff000001861000
hns_mdio 20480 2 - Live 0xffff000001822000
hnae 28672 3 hns_dsaf,hns_enet_drv, Live 0xffff000001815000
dm_mirror 40960 0 - Live 0xffff000001804000
dm_region_hash 32768 1 dm_mirror, Live 0xffff0000017f5000
dm_log 32768 2 dm_mirror,dm_region_hash, Live 0xffff0000017e7000
dm_mod 315392 17 dm_mirror,dm_log, Live 0xffff000001780000
[root@localhost hulk]#
Before fix:
[root@localhost bin]# perf record sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
[root@localhost bin]# perf buildid-list -i perf.data
4c4e46c971ca935f781e603a09b52a92e8bdfee8 [vdso]
[root@localhost bin]# perf buildid-list -i perf.data -H
0000000000000000000000000000000000000000 /proc/kcore
[root@localhost bin]#
After fix:
[root@localhost tools]# ./perf/perf record sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
[root@localhost tools]# ./perf/perf buildid-list -i perf.data
28a6c690262896dbd1b5e1011ed81623e6db0610 [kernel.kallsyms]
106c14ce6e4acea3453e484dc604d66666f08a2f [vdso]
[root@localhost tools]# ./perf/perf buildid-list -i perf.data -H
28a6c690262896dbd1b5e1011ed81623e6db0610 /proc/kcore
Signed-off-by: Wei Li <liwei391@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190228092003.34071-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
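The problem described here is that changing a map's start address in place does not move the map within the address-sorted tree. The title's "re-order properly" implies the fix: take the map out, update its range, and insert it again.

The sketch below illustrates the failure and the remove/update/re-insert approach. It uses a sorted array with binary search in place of perf's rbtree. `struct toy_map`, `maps_insert()`, `maps_find()` and `maps_update_range()` are hypothetical names, not perf APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified map: covers addresses [start, end). */
struct toy_map {
	uint64_t start, end;
	int id;
};

#define MAX_MAPS 16

/* Array kept sorted by start; stands in for perf's maps rbtree. */
struct toy_maps {
	struct toy_map v[MAX_MAPS];
	size_t n;
};

static void maps_insert(struct toy_maps *m, struct toy_map map)
{
	size_t i = m->n;

	assert(m->n < MAX_MAPS);
	/* Shift entries with a larger start right to keep the order. */
	while (i > 0 && m->v[i - 1].start > map.start) {
		m->v[i] = m->v[i - 1];
		i--;
	}
	m->v[i] = map;
	m->n++;
}

static struct toy_map maps_remove(struct toy_maps *m, size_t idx)
{
	assert(idx < m->n);
	struct toy_map out = m->v[idx];

	memmove(&m->v[idx], &m->v[idx + 1],
		(m->n - idx - 1) * sizeof(m->v[0]));
	m->n--;
	return out;
}

/* Binary search that trusts the sort order, as a tree lookup would. */
static const struct toy_map *maps_find(const struct toy_maps *m, uint64_t addr)
{
	size_t lo = 0, hi = m->n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < m->v[mid].start)
			hi = mid;
		else if (addr >= m->v[mid].end)
			lo = mid + 1;
		else
			return &m->v[mid];
	}
	return NULL;
}

/* Change a map's range without breaking the order: remove, update, insert. */
static void maps_update_range(struct toy_maps *m, size_t idx,
			      uint64_t start, uint64_t end)
{
	struct toy_map map = maps_remove(m, idx);

	map.start = start;
	map.end = end;
	maps_insert(m, map);
}
```

Consider what happens if a zero-range kernel placeholder gets its real range by writing `v[0].start` directly. It stays in front of a module with a lower address, and the binary search then misses it. That mirrors the misordering this commit describes for aarch64, where modules sit below `_stext`.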
2019-02-28 16:20:03 +07:00
		machine__update_kernel_mmap(machine, event->mmap.start,
2018-02-15 19:26:31 +07:00
					 event->mmap.start + event->mmap.len);
2012-10-07 02:26:02 +07:00

		/*
		 * Avoid using a zero address (kptr_restrict) for the ref reloc
		 * symbol. Effectively having zero here means that at record
		 * time /proc/sys/kernel/kptr_restrict was non zero.
		 */
		if (event->mmap.pgoff != 0) {
2018-04-27 02:52:34 +07:00
			map__set_kallsyms_ref_reloc_sym(machine->vmlinux_map,
							symbol_name,
							event->mmap.pgoff);
2012-10-07 02:26:02 +07:00
		}

		if (machine__is_default_guest(machine)) {
			/*
			 * preload dso of guest kernel and modules
			 */
2016-09-02 05:25:52 +07:00
			dso__load(kernel, machine__kernel_map(machine));
2012-10-07 02:26:02 +07:00
		}
2018-05-22 17:54:37 +07:00
	} else if (perf_event__is_extra_kernel_mmap(machine, event)) {
		return machine__process_extra_kernel_map(machine, event);
2012-10-07 02:26:02 +07:00
	}
	return 0;
out_problem:
	return -1;
}

2013-08-21 17:10:25 +07:00
int machine__process_mmap2_event(struct machine *machine,
2013-09-11 21:18:24 +07:00
				 union perf_event *event,
2016-03-23 04:23:43 +07:00
				 struct perf_sample *sample)
2013-08-21 17:10:25 +07:00
{
	struct thread *thread;
	struct map *map;
	int ret = 0;

	if (dump_trace)
		perf_event__fprintf_mmap2(event, stdout);

2016-03-23 04:23:43 +07:00
	if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
	    sample->cpumode == PERF_RECORD_MISC_KERNEL) {
2013-08-21 17:10:25 +07:00
		ret = machine__process_kernel_mmap_event(machine, event);
		if (ret < 0)
			goto out_problem;
		return 0;
	}

	thread = machine__findnew_thread(machine, event->mmap2.pid,
2014-02-26 22:45:27 +07:00
					event->mmap2.tid);
2013-08-21 17:10:25 +07:00
	if (thread == NULL)
		goto out_problem;

2014-07-22 20:17:53 +07:00
	map = map__new(machine, event->mmap2.start,
2013-08-21 17:10:25 +07:00
			event->mmap2.len, event->mmap2.pgoff,
2017-07-06 08:48:09 +07:00
			event->mmap2.maj,
2013-08-21 17:10:25 +07:00
			event->mmap2.min, event->mmap2.ino,
			event->mmap2.ino_generation,
2014-05-20 02:13:49 +07:00
			event->mmap2.prot,
			event->mmap2.flags,
2018-04-27 02:52:34 +07:00
			event->mmap2.filename, thread);
2013-08-21 17:10:25 +07:00

	if (map == NULL)
perf machine: Protect the machine->threads with a rwlock
In addition to using refcounts for struct thread lifetime
management, we need to protect machine->threads from
concurrent access.
That happens in 'perf top', where one thread processes events, inserting
and deleting entries from that rb_tree, while another thread decays
hist_entries, which ends up dropping references and ultimately deleting
threads from the rb_tree and releasing their resources when no further
hist_entry (or other data structure, like in 'perf sched') references
them.
So the rule is the same as for refcounts + protected trees in the kernel:
get the tree lock, find the object, bump the refcount, drop the tree lock,
return, use the object, drop the refcount if no more use of it is needed,
keep it if storing it in some other data structure, and drop it when
releasing that data structure.
I.e. pair "t = machine__find(new)_thread()" with a "thread__put(t)", and
"perf_event__preprocess_sample(&al)" with "addr_location__put(&al)".
The addr_location__put() pairing is needed because, as we return references
to several data structures, we may end up adding more reference counting
for those other data structures, and then we'll drop it at
addr_location__put() time.
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-bs9rt4n0jw3hi9f3zxyy3xln@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-07 06:43:22 +07:00
		goto out_problem_map;
2013-08-21 17:10:25 +07:00

2016-06-03 10:33:13 +07:00
	ret = thread__insert_map(thread, map);
	if (ret)
		goto out_problem_insert;

perf machine: Protect the machine->threads with a rwlock
2015-04-07 06:43:22 +07:00
	thread__put(thread);
2015-05-26 02:59:56 +07:00
	map__put(map);
2013-08-21 17:10:25 +07:00
	return 0;

2016-06-03 10:33:13 +07:00
out_problem_insert:
	map__put(map);
perf machine: Protect the machine->threads with a rwlock
2015-04-07 06:43:22 +07:00
out_problem_map:
	thread__put(thread);
2013-08-21 17:10:25 +07:00
out_problem:
	dump_printf("problem processing PERF_RECORD_MMAP2, skipping event.\n");
	return 0;
}
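machine__process_mmap2_event() releases its resources through a goto ladder:

- `out_problem_insert` drops the map.
- It then falls through to `out_problem_map`, which drops the thread.
- It then falls through to `out_problem`.

Each failure point therefore releases only what has been acquired so far, in reverse order.

The sketch below is a minimal model of that shape. `struct res`, `res_new()`, `res_put()`, `process_event()` and the `live` leak counter are hypothetical, and the `fail_*` flags are test hooks. It also assumes `res_new()` hands back the only reference, whereas perf's `machine__findnew_thread()` returns an extra reference to a thread the machine keeps.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted resource standing in for struct thread/struct map. */
struct res {
	int refcnt;
};

static int live;	/* resources allocated but not yet freed */

static struct res *res_new(void)
{
	struct res *r = malloc(sizeof(*r));

	if (r) {
		r->refcnt = 1;
		live++;
	}
	return r;
}

static void res_put(struct res *r)
{
	assert(r->refcnt > 0);
	if (--r->refcnt == 0) {
		free(r);
		live--;
	}
}

/*
 * Acquire a thread, then a map, then "insert" the map. Each failure jumps
 * to a label that releases only what was acquired so far. Returning -1 is
 * a sketch choice: the real handlers log via dump_printf() and return 0.
 */
static int process_event(int fail_map, int fail_insert, struct res **kept)
{
	struct res *thread, *map;

	thread = res_new();
	if (thread == NULL)
		goto out_problem;

	map = fail_map ? NULL : res_new();
	if (map == NULL)
		goto out_problem_map;

	if (fail_insert)
		goto out_problem_insert;

	map->refcnt++;		/* the container keeps its own reference */
	*kept = map;
	res_put(thread);
	res_put(map);
	return 0;

out_problem_insert:
	res_put(map);
out_problem_map:
	res_put(thread);
out_problem:
	return -1;
}
```

The `live` counter makes leaks checkable. After a failed call it should return to its previous value, and after a successful call only the map held by the container should remain.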
2013-09-11 21:18:24 +07:00
int machine__process_mmap_event(struct machine *machine, union perf_event *event,
2016-03-23 04:23:43 +07:00
				struct perf_sample *sample)
2012-10-07 02:26:02 +07:00
{
	struct thread *thread;
	struct map *map;
2018-04-26 21:30:50 +07:00
	u32 prot = 0;
2012-10-07 02:26:02 +07:00
	int ret = 0;

	if (dump_trace)
		perf_event__fprintf_mmap(event, stdout);

2016-03-23 04:23:43 +07:00
	if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
	    sample->cpumode == PERF_RECORD_MISC_KERNEL) {
2012-10-07 02:26:02 +07:00
		ret = machine__process_kernel_mmap_event(machine, event);
		if (ret < 0)
			goto out_problem;
		return 0;
	}

2013-08-27 15:23:03 +07:00
	thread = machine__findnew_thread(machine, event->mmap.pid,
2014-02-26 22:45:27 +07:00
					 event->mmap.tid);
2012-10-07 02:26:02 +07:00
	if (thread == NULL)
		goto out_problem;
2013-01-24 22:10:40 +07:00

2018-04-27 02:52:34 +07:00
	if (!(event->header.misc & PERF_RECORD_MISC_MMAP_DATA))
2018-04-26 21:30:50 +07:00
		prot = PROT_EXEC;
2013-01-24 22:10:40 +07:00

2014-07-22 20:17:53 +07:00
	map = map__new(machine, event->mmap.start,
2012-10-07 02:26:02 +07:00
			event->mmap.len, event->mmap.pgoff,
2018-04-26 21:30:50 +07:00
			0, 0, 0, 0, prot, 0,
2013-08-21 17:10:25 +07:00
			event->mmap.filename,
2018-04-27 02:52:34 +07:00
			thread);
2013-01-24 22:10:40 +07:00

2012-10-07 02:26:02 +07:00
	if (map == NULL)
perf machine: Protect the machine->threads with a rwlock
2015-04-07 06:43:22 +07:00
		goto out_problem_map;
2012-10-07 02:26:02 +07:00

2016-06-03 10:33:13 +07:00
	ret = thread__insert_map(thread, map);
	if (ret)
		goto out_problem_insert;

perf machine: Protect the machine->threads with a rwlock
2015-04-07 06:43:22 +07:00
	thread__put(thread);
2015-05-26 02:59:56 +07:00
	map__put(map);
2012-10-07 02:26:02 +07:00
	return 0;

2016-06-03 10:33:13 +07:00
out_problem_insert:
	map__put(map);
perf machine: Protect the machine->threads with a rwlock
2015-04-07 06:43:22 +07:00
out_problem_map:
	thread__put(thread);
2012-10-07 02:26:02 +07:00
out_problem:
	dump_printf("problem processing PERF_RECORD_MMAP, skipping event.\n");
	return 0;
}
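Unlike PERF_RECORD_MMAP2, a PERF_RECORD_MMAP event carries no prot field. machine__process_mmap_event() therefore assumes PROT_EXEC unless `header.misc` has PERF_RECORD_MISC_MMAP_DATA set, and it passes zeros for the device, inode and flags arguments of `map__new()`.

The sketch below shows that derivation on its own. The helper name `mmap_event_prot()` is hypothetical. The fallback definition of the flag assumes the value used in `<linux/perf_event.h>`; prefer that header where it is available.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>	/* PROT_EXEC */

/*
 * Bit in perf_event_header.misc marking a data (non-executable) mmap.
 * Assumed to match PERF_RECORD_MISC_MMAP_DATA from <linux/perf_event.h>.
 */
#ifndef PERF_RECORD_MISC_MMAP_DATA
#define PERF_RECORD_MISC_MMAP_DATA (1 << 13)
#endif

_Static_assert((PERF_RECORD_MISC_MMAP_DATA &
		(PERF_RECORD_MISC_MMAP_DATA - 1)) == 0,
	       "misc flag must be a single bit");

/* PERF_RECORD_MMAP has no prot field, so infer one from header.misc. */
static uint32_t mmap_event_prot(uint16_t misc)
{
	uint32_t prot = 0;

	if (!(misc & PERF_RECORD_MISC_MMAP_DATA))
		prot = PROT_EXEC;	/* code mappings are the default */
	return prot;
}
```

This is only a heuristic. A legacy MMAP event says whether the mapping was data or executable, but nothing about read/write permissions, so those bits stay zero.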
perf machine: Protect the machine->threads with a rwlock
2015-04-07 06:43:22 +07:00
static void __machine__remove_thread(struct machine *machine, struct thread *th, bool lock)
2013-08-14 21:49:27 +07:00
{
2017-09-11 09:23:14 +07:00
	struct threads *threads = machine__threads(machine, th->tid);

	if (threads->last_match == th)
2018-07-19 21:33:43 +07:00
		threads__set_last_match(threads, NULL);
2015-03-03 08:21:35 +07:00

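__machine__remove_thread() starts by clearing `threads->last_match` when it points at the thread being removed. This appears to guard a one-entry lookup cache consulted before the full search; that lookup is not shown in this section, so treat this reading as an assumption. Under that assumption, a stale pointer left in the cache would let a later lookup return a thread that has already been removed.

The sketch below shows the idea with hypothetical types (`struct cached_thread`, `struct thread_table`) and an array instead of an rb_tree.

```c
#include <assert.h>
#include <stddef.h>

#define NTHREADS 4

/* Hypothetical stand-ins for struct thread and struct threads. */
struct cached_thread {
	int tid;
};

struct thread_table {
	struct cached_thread *entries[NTHREADS];
	struct cached_thread *last_match;	/* one-entry lookup cache */
};

static struct cached_thread *threads_find(struct thread_table *ts, int tid)
{
	if (ts->last_match && ts->last_match->tid == tid)
		return ts->last_match;	/* cache hit: no search needed */

	for (size_t i = 0; i < NTHREADS; i++) {
		if (ts->entries[i] && ts->entries[i]->tid == tid) {
			ts->last_match = ts->entries[i];
			return ts->entries[i];
		}
	}
	return NULL;
}

/* Removal must clear the cache too, or a later find returns a stale entry. */
static void threads_remove(struct thread_table *ts, struct cached_thread *th)
{
	assert(th != NULL);
	if (ts->last_match == th)
		ts->last_match = NULL;

	for (size_t i = 0; i < NTHREADS; i++) {
		if (ts->entries[i] == th)
			ts->entries[i] = NULL;
	}
}
```

Without the cache reset in `threads_remove()`, the second lookup of tid 2 in the test below would still return the removed entry.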
perf machine: Protect the machine->threads with a rwlock
2015-04-07 06:43:22 +07:00
	if (lock)
2017-04-04 23:15:04 +07:00
		down_write(&threads->lock);
perf thread: Allow references to thread objects after machine__exit()
Threads are created when we either synthesize PERF_RECORD_FORK events
for pre-existing threads or when we receive PERF_RECORD_FORK events from
the kernel as new threads get created.
We then keep them in machine->threads[].entries rb trees till when we
receive a PERF_RECORD_EXIT, i.e. that thread terminated.
The thread object has a reference count that is grabbed when, for
instance, we keep that thread referenced in struct hist_entry, in 'perf
report' and 'perf top'.
When we receive a PERF_RECORD_EXIT we remove the thread object from the
rb tree and move it to the corresponding machine->threads[].dead list,
then we do a thread__put(), dropping the reference we had for keeping it
in the rb tree.
In thread__put() we were assuming that when the reference count hit zero
we should remove it from the dead list by simply doing a
list_del_init(&thread->node).
That works well when all the thread lifetime is during the machine that
has the list heads lifetime, since we know that we can do the
list_del_init() and it will update the 'dead' list_head.
But in 'perf sched lat' we were doing:
machine__new() (via perf_session__new)
process events, grabbing refcounts to keep those thread objects
in 'perf sched' local data structures.
machine__exit() (via perf_session__delete) which would delete the
'dead' list heads.
And then doing the final thread__put() for the refcounts 'perf sched'
rightfully obtained for keeping those thread object references.
b00m, since thread__put() would do the list_del_init() touching
a dead dead list head.
Fix it by removing all the dead threads from machine->threads[].dead at
machine__exit(), since whatever is there should have refcounts taken by
things like 'perf sched lat', and make thread__put() check if the thread
is in a linked list before removing it from that list.
Reported-by: Wei Li <liwei391@huawei.com>
Link: https://lkml.kernel.org/r/20190508143648.8153-1-liwei391@huawei.com
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhipeng Xie <xiezhipeng1@huawei.com>
Link: https://lkml.kernel.org/r/20190704194355.GI10740@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-05 22:11:35 +07:00

	BUG_ON(refcount_read(&th->refcnt) == 0);

2018-12-07 02:18:14 +07:00
	rb_erase_cached(&th->rb_node, &threads->entries);
2015-04-07 06:43:22 +07:00
	RB_CLEAR_NODE(&th->rb_node);
2017-09-11 09:23:14 +07:00
	--threads->nr;
2013-08-14 21:49:27 +07:00
	/*
2015-03-03 08:21:35 +07:00
	 * Move it first to the dead_threads list, then drop the reference,
	 * if this is the last reference, then the thread__delete destructor
	 * will be called and we will remove it from the dead_threads list.
2013-08-14 21:49:27 +07:00
	 */
2017-09-11 09:23:14 +07:00
	list_add_tail(&th->node, &threads->dead);
2019-07-05 22:11:35 +07:00

	/*
	 * We need to do the put here because if this is the last refcount,
	 * then we will be touching the threads->dead head when removing the
	 * thread.
	 */
	thread__put(th);

2015-04-07 06:43:22 +07:00
	if (lock)
2017-04-04 23:15:04 +07:00
		up_write(&threads->lock);
2013-08-14 21:49:27 +07:00
}
2015-04-07 06:43:22 +07:00
void machine__remove_thread(struct machine *machine, struct thread *th)
{
	return __machine__remove_thread(machine, th, true);
}
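The fix in the "Allow references to thread objects after machine__exit()" commit message has two parts, and a small self-contained model shows both. First, `thr_put()` unlinks a thread from the dead list only if the thread is still linked. Second, `machine_exit()` detaches every dead thread before the list head goes away. The list helpers below are a minimal reimplementation modelled on the kernel's `list_head` API. `struct thr`, `thr_put()` and `machine_exit()` are hypothetical names; they are not perf's.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);	/* a detached node points at itself */
}

/* Hypothetical thread object; 'node' links it into a machine's dead list. */
struct thr {
	int refcnt;
	struct list_head node;
};

static int nr_deleted;

static void thr_put(struct thr *t)
{
	if (t && --t->refcnt == 0) {
		/*
		 * Only unlink if still on a list: the list head may already
		 * have been torn down by machine_exit().
		 */
		if (!list_empty(&t->node))
			list_del_init(&t->node);
		free(t);
		nr_deleted++;
	}
}

/* Detach every dead thread so that later puts never touch 'dead'. */
static void machine_exit(struct list_head *dead)
{
	while (!list_empty(dead))
		list_del_init(dead->next);
}
```

This mirrors the 'perf sched lat' scenario. A tool-held reference outlives the machine, and the final put happens after the dead list head has been freed. Because the node was self-linked at exit time, that final put touches only the thread itself.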
2013-09-11 21:18:24 +07:00
int machine__process_fork_event(struct machine *machine, union perf_event *event,
				struct perf_sample *sample)
2012-10-07 02:26:02 +07:00
{
2014-03-14 21:00:03 +07:00
	struct thread *thread = machine__find_thread(machine,
						     event->fork.pid,
						     event->fork.tid);
2013-08-27 15:23:03 +07:00
	struct thread *parent = machine__findnew_thread(machine,
							event->fork.ppid,
							event->fork.ptid);
2018-10-31 12:24:04 +07:00
	bool do_maps_clone = true;
2015-04-07 06:43:22 +07:00
	int err = 0;
2012-10-07 02:26:02 +07:00

2015-08-19 21:29:20 +07:00
	if (dump_trace)
		perf_event__fprintf_task(event, stdout);

	/*
	 * There may be an existing thread that is not actually the parent,
	 * either because we are processing events out of order, or because the
	 * (fork) event that would have removed the thread was lost. Assume the
	 * latter case and continue on as best we can.
	 */
	if (parent->pid_ != (pid_t)event->fork.ppid) {
		dump_printf("removing erroneous parent thread %d/%d\n",
			    parent->pid_, parent->tid);
		machine__remove_thread(machine, parent);
		thread__put(parent);
		parent = machine__findnew_thread(machine, event->fork.ppid,
						 event->fork.ptid);
	}

2013-08-14 21:49:27 +07:00
	/* if a thread currently exists for the thread id remove it */
2015-04-07 06:43:22 +07:00
	if (thread != NULL) {
2013-08-14 21:49:27 +07:00
		machine__remove_thread(machine, thread);
2015-04-07 06:43:22 +07:00
		thread__put(thread);
	}
2013-08-14 21:49:27 +07:00

2013-08-27 15:23:03 +07:00
	thread = machine__findnew_thread(machine, event->fork.pid,
					 event->fork.tid);
2018-10-31 12:24:04 +07:00
	/*
	 * When synthesizing FORK events, we are trying to create thread
	 * objects for the already running tasks on the machine.
	 *
	 * Normally, for a kernel FORK event, we want to clone the parent's
	 * maps because that is what the kernel just did.
	 *
	 * But when synthesizing, this should not be done. If we do, we end up
	 * with overlapping maps as we process the synthesized MMAP2 events that
	 * get delivered shortly thereafter.
	 *
	 * Use the FORK event misc flags in an internal way to signal this
	 * situation, so we can elide the map clone when appropriate.
	 */
	if (event->fork.header.misc & PERF_RECORD_MISC_FORK_EXEC)
		do_maps_clone = false;
2012-10-07 02:26:02 +07:00

	if (thread == NULL || parent == NULL ||
2018-10-31 12:24:04 +07:00
	    thread__fork(thread, parent, sample->time, do_maps_clone) < 0) {
2012-10-07 02:26:02 +07:00
		dump_printf("problem processing PERF_RECORD_FORK, skipping event.\n");
2015-04-07 06:43:22 +07:00
		err = -1;
2012-10-07 02:26:02 +07:00
	}
2015-04-07 06:43:22 +07:00
	thread__put(thread);
	thread__put(parent);
2012-10-07 02:26:02 +07:00

2015-04-07 06:43:22 +07:00
	return err;
2012-10-07 02:26:02 +07:00
}
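Before calling `thread__fork()`, `machine__process_fork_event()` makes two decisions, and each can be isolated as a predicate. First, a cached parent whose pid disagrees with the event's ppid is treated as a stale entry left over from tid reuse. Second, a FORK record carrying the internal "synthesized" misc flag does not clone the parent's maps. This is a sketch under stated assumptions. `struct fork_info`, `struct task`, both helper names and `SKETCH_MISC_FORK_EXEC` (including its bit value) are hypothetical. perf uses its own `PERF_RECORD_MISC_FORK_EXEC` definition, which is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical flag bit; NOT perf's actual PERF_RECORD_MISC_FORK_EXEC value. */
#define SKETCH_MISC_FORK_EXEC (1u << 13)

struct fork_info {		/* hypothetical subset of a FORK record */
	uint16_t misc;
	pid_t pid, tid, ppid, ptid;
};

struct task {			/* hypothetical cached thread entry */
	pid_t pid, tid;
};

/* A cached parent found by tid whose pid disagrees with the event is stale. */
static bool parent_is_stale(const struct task *parent,
			    const struct fork_info *ev)
{
	return parent->tid == ev->ptid && parent->pid != ev->ppid;
}

/* Synthesized FORKs are flagged so that the parent's maps are not cloned. */
static bool should_clone_maps(const struct fork_info *ev)
{
	return !(ev->misc & SKETCH_MISC_FORK_EXEC);
}
```

Keeping the flag test separate from the clone itself matches the comment's intent: the MMAP2 events that follow a synthesized FORK populate the child's maps, so cloning first would only create overlaps.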
2013-09-11 21:18:24 +07:00
int machine__process_exit_event(struct machine *machine, union perf_event *event,
				struct perf_sample *sample __maybe_unused)
2012-10-07 02:26:02 +07:00
{
2014-03-14 21:00:03 +07:00
	struct thread *thread = machine__find_thread(machine,
						     event->fork.pid,
						     event->fork.tid);
2012-10-07 02:26:02 +07:00

	if (dump_trace)
		perf_event__fprintf_task(event, stdout);

2015-04-07 06:43:22 +07:00
	if (thread != NULL) {
2013-08-14 21:49:27 +07:00
		thread__exited(thread);
2015-04-07 06:43:22 +07:00
		thread__put(thread);
	}
2012-10-07 02:26:02 +07:00

	return 0;
}

2013-09-11 21:18:24 +07:00
int machine__process_event(struct machine *machine, union perf_event *event,
			   struct perf_sample *sample)
2012-10-07 02:26:02 +07:00
{
	int ret;

	switch (event->header.type) {
	case PERF_RECORD_COMM:
2013-09-11 21:18:24 +07:00
		ret = machine__process_comm_event(machine, event, sample); break;
2012-10-07 02:26:02 +07:00
	case PERF_RECORD_MMAP:
2013-09-11 21:18:24 +07:00
		ret = machine__process_mmap_event(machine, event, sample); break;
perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info
Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
by the kernel when fork, clone, setns or unshare are invoked. And update
perf-record documentation with the new option to record namespace
events.
Committer notes:
Combined it with a later patch to allow printing it via 'perf report -D'
and be able to test the feature introduced in this patch. Had to move
here also perf_ns__name(), that was introduced in another later patch.
Also used PRIu64 and PRIx64 to fix the build in some environments wrt:
util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
ret += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
^
Testing it:
# perf record --namespaces -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
#
# perf report -D
<SNIP>
3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
[0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
0x1151e0 [0x30]: event: 9
.
. ... raw event: size 48 bytes
. 0000: 09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00 ......0..q.h....
. 0010: a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00 .9...9...(.c....
. 0020: 03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00 ................
<SNIP>
NAMESPACES events: 1
<SNIP>
#
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-08 03:41:43 +07:00
	case PERF_RECORD_NAMESPACES:
		ret = machine__process_namespaces_event(machine, event, sample); break;
2013-08-21 17:10:25 +07:00
	case PERF_RECORD_MMAP2:
2013-09-11 21:18:24 +07:00
		ret = machine__process_mmap2_event(machine, event, sample); break;
2012-10-07 02:26:02 +07:00
	case PERF_RECORD_FORK:
2013-09-11 21:18:24 +07:00
		ret = machine__process_fork_event(machine, event, sample); break;
2012-10-07 02:26:02 +07:00
	case PERF_RECORD_EXIT:
2013-09-11 21:18:24 +07:00
		ret = machine__process_exit_event(machine, event, sample); break;
2012-10-07 02:26:02 +07:00
	case PERF_RECORD_LOST:
2013-09-11 21:18:24 +07:00
		ret = machine__process_lost_event(machine, event, sample); break;
2015-04-30 21:37:29 +07:00
	case PERF_RECORD_AUX:
		ret = machine__process_aux_event(machine, event); break;
2015-04-30 21:37:30 +07:00
	case PERF_RECORD_ITRACE_START:
2015-06-29 18:27:45 +07:00
		ret = machine__process_itrace_start_event(machine, event); break;
2015-05-11 02:13:15 +07:00
	case PERF_RECORD_LOST_SAMPLES:
		ret = machine__process_lost_samples_event(machine, event, sample); break;
2015-07-21 16:44:03 +07:00
	case PERF_RECORD_SWITCH:
	case PERF_RECORD_SWITCH_CPU_WIDE:
		ret = machine__process_switch_event(machine, event); break;
2019-01-17 23:15:17 +07:00
	case PERF_RECORD_KSYMBOL:
		ret = machine__process_ksymbol(machine, event, sample); break;
perf tools: Handle PERF_RECORD_BPF_EVENT
This patch adds basic handling of PERF_RECORD_BPF_EVENT. Tracking of
PERF_RECORD_BPF_EVENT is OFF by default. Option --bpf-event is added to
turn it on.
Committer notes:
Add a dummy machine__process_bpf_event() variant that returns zero on
systems without HAVE_LIBBPF_SUPPORT, such as Alpine Linux, unbreaking
the build on such systems.
Remove the needless include <machine.h> from bpf-event.h and provide just
forward declarations for the structs and unions used in the parameters,
to reduce compilation time and avoid needless rebuilds when machine.h
changes.
Committer testing:
When running with:
# perf record --bpf-event
On an older kernel where PERF_RECORD_BPF_EVENT and PERF_RECORD_KSYMBOL
are not present, we fall back to removing those two bits from
perf_event_attr, allowing the tool to continue to work on older kernels:
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
read_format ID
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
precise_ip 3
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
------------------------------------------------------------
sys_perf_event_open: pid 5779 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -22
switching off bpf_event
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
read_format ID
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
precise_ip 3
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
ksymbol 1
------------------------------------------------------------
sys_perf_event_open: pid 5779 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -22
switching off ksymbol
------------------------------------------------------------
perf_event_attr:
size 112
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|PERIOD
read_format ID
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
precise_ip 3
sample_id_all 1
exclude_guest 1
mmap2 1
comm_exec 1
------------------------------------------------------------
And then proceeds to work without those two features.
As passing --bpf-event is an explicit action performed by the user, perhaps we
should emit a warning that the kernel lacks this feature, but this can
be done on top of this patch.
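The fallback visible in the log above (the open fails with -22, i.e. -EINVAL, one feature bit is switched off, and the open is retried) can be sketched as follows. This is a hypothetical model, not perf's evsel code: the struct, fake_open() and open_with_fallback() are illustrative stand-ins, and the "old kernel" simply rejects both newer bits.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of the perf_event_attr feature bits shown in the log. */
struct feature_bits {
	bool ksymbol;
	bool bpf_event;
};

/* Hypothetical open: models an older kernel that rejects both newer bits. */
static int fake_open(const struct feature_bits *f)
{
	return (f->ksymbol || f->bpf_event) ? -EINVAL : 3;
}

/*
 * Retry on -EINVAL, switching off one feature per attempt in the order
 * seen in the log: bpf_event first, then ksymbol.
 */
static int open_with_fallback(struct feature_bits *f)
{
	for (;;) {
		int fd = fake_open(f);

		if (fd != -EINVAL)
			return fd;
		if (f->bpf_event)
			f->bpf_event = false;   /* "switching off bpf_event" */
		else if (f->ksymbol)
			f->ksymbol = false;     /* "switching off ksymbol" */
		else
			return fd;              /* nothing left to switch off */
	}
}
```

Disabling the newest feature first keeps as much functionality as the running kernel accepts.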
Now, with a kernel that supports these events, start 'record --bpf-event -a'
and then run 'perf trace sleep 10000', which will use the prebuilt BPF
augmented_raw_syscalls.o (even one built for another kernel version) and thus
should generate PERF_RECORD_BPF_EVENT events:
[root@quaco ~]# perf record -e dummy -a --bpf-event
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.713 MB perf.data ]
[root@quaco ~]# bpftool prog
13: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 13,14
14: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 13,14
15: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 15,16
16: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:43-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 15,16
17: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:44-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 17,18
18: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:44-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 17,18
21: cgroup_skb tag 7be49e3934a125ba gpl
loaded_at 2019-01-19T09:09:45-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 21,22
22: cgroup_skb tag 2a142ef67aaad174 gpl
loaded_at 2019-01-19T09:09:45-0300 uid 0
xlated 296B jited 229B memlock 4096B map_ids 21,22
31: tracepoint name sys_enter tag 12504ba9402f952f gpl
loaded_at 2019-01-19T09:19:56-0300 uid 0
xlated 512B jited 374B memlock 4096B map_ids 30,29,28
32: tracepoint name sys_exit tag c1bd85c092d6e4aa gpl
loaded_at 2019-01-19T09:19:56-0300 uid 0
xlated 256B jited 191B memlock 4096B map_ids 30,29
# perf report -D | grep PERF_RECORD_BPF_EVENT | nl
1 0 55834574849 0x4fc8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 13
2 0 60129542145 0x5118 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 14
3 0 64424509441 0x5268 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 15
4 0 68719476737 0x53b8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 16
5 0 73014444033 0x5508 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 17
6 0 77309411329 0x5658 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 18
7 0 90194313217 0x57a8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 21
8 0 94489280513 0x58f8 [0x18]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 22
9 7 620922484360 0xb6390 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 29
10 7 620922486018 0xb6410 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 29
11 7 620922579199 0xb6490 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 30
12 7 620922580240 0xb6510 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 2, flags 0, id 30
13 7 620922765207 0xb6598 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 31
14 7 620922874543 0xb6620 [0x30]: PERF_RECORD_BPF_EVENT bpf event with type 1, flags 0, id 32
#
There they are: the 31 and 32 tracepoint BPF programs put in place by 'perf trace'.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20190117161521.1341602-7-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-17 23:15:18 +07:00
	case PERF_RECORD_BPF_EVENT:
2019-08-27 05:28:13 +07:00
		ret = machine__process_bpf(machine, event, sample); break;
2012-10-07 02:26:02 +07:00
	default:
		ret = -1;
		break;
	}

	return ret;
}
2012-12-08 03:39:39 +07:00

2012-12-07 12:48:05 +07:00
static bool symbol__match_regex(struct symbol *sym, regex_t *regex)
2012-12-08 03:39:39 +07:00
{
2017-02-14 02:52:15 +07:00
	if (!regexec(regex, sym->name, 0, NULL, 0))
2012-12-08 03:39:39 +07:00
		return 1;
	return 0;
}
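symbol__match_regex() relies on the POSIX convention that regexec() returns 0 on a match. A standalone sketch of the same test, assuming REG_EXTENDED syntax; the helper name is hypothetical, and unlike the function above, which receives an already compiled regex_t, it compiles the pattern on every call for brevity.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when name matches pattern; false on no match or a bad pattern. */
static bool name_matches(const char *pattern, const char *name)
{
	regex_t re;
	bool match;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	match = regexec(&re, name, 0, NULL, 0) == 0;  /* 0 means "matched" */
	regfree(&re);
	return match;
}
```

In a hot path such as callchain processing, compiling once and reusing the regex_t, as the code above does, avoids repeated regcomp() costs.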

2014-10-23 22:50:25 +07:00
static void ip__resolve_ams(struct thread *thread,
2012-12-08 03:39:39 +07:00
			    struct addr_map_symbol *ams,
			    u64 ip)
{
	struct addr_location al;

	memset(&al, 0, sizeof(al));
2014-03-12 02:16:49 +07:00
	/*
	 * We cannot use the header.misc hint to determine whether a
	 * branch stack address is user, kernel, guest, hypervisor.
	 * Branches may straddle the kernel/user/hypervisor boundaries.
	 * Thus, we have to try consecutively until we find a match
	 * or else, the symbol is unknown
	 */
2018-04-26 03:58:03 +07:00
	thread__find_cpumode_addr_location(thread, ip, &al);
2012-12-08 03:39:39 +07:00

	ams->addr = ip;
	ams->al_addr = al.addr;
2019-11-05 02:02:35 +07:00
	ams->ms.mg = al.mg;
2019-11-05 01:57:38 +07:00
	ams->ms.sym = al.sym;
	ams->ms.map = al.map;
2017-08-30 00:11:09 +07:00
	ams->phys_addr = 0;
2012-12-08 03:39:39 +07:00
}
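The comment in ip__resolve_ams() says each address has to be probed mode by mode until one resolves it. A toy sketch of that idea, assuming hypothetical fixed address ranges per mode and a hypothetical probe order; the real thread__find_cpumode_addr_location() looks up the thread's maps for each cpumode instead.

```c
#include <stddef.h>
#include <stdint.h>

enum probe_mode { PROBE_USER, PROBE_KERNEL, PROBE_GUEST_USER,
		  PROBE_GUEST_KERNEL, PROBE_UNKNOWN };

/* Hypothetical address ranges standing in for each mode's maps. */
struct mode_range {
	enum probe_mode mode;
	uint64_t start, end;   /* inclusive */
};

static const struct mode_range ranges[] = {
	{ PROBE_USER,   0x400000ULL,           0x7fffffffffffULL },
	{ PROBE_KERNEL, 0xffffffff80000000ULL, UINT64_MAX },
};

/* Try each mode in turn; the first one whose range contains ip wins. */
static enum probe_mode probe_cpumode(uint64_t ip)
{
	static const enum probe_mode order[] = {
		PROBE_USER, PROBE_KERNEL, PROBE_GUEST_USER, PROBE_GUEST_KERNEL,
	};
	size_t i, j;

	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++)
		for (j = 0; j < sizeof(ranges) / sizeof(ranges[0]); j++)
			if (ranges[j].mode == order[i] &&
			    ip >= ranges[j].start && ip <= ranges[j].end)
				return order[i];
	return PROBE_UNKNOWN;   /* no mode matched: the symbol stays unknown */
}
```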

2014-10-23 22:50:25 +07:00
static void ip__resolve_data(struct thread *thread,
2017-08-30 00:11:09 +07:00
			     u8 m, struct addr_map_symbol *ams,
			     u64 addr, u64 phys_addr)
2013-01-24 22:10:35 +07:00
{
	struct addr_location al;

	memset(&al, 0, sizeof(al));

2018-04-26 04:16:53 +07:00
	thread__find_symbol(thread, m, addr, &al);
2014-08-21 10:25:11 +07:00

2013-01-24 22:10:35 +07:00
	ams->addr = addr;
	ams->al_addr = al.addr;
2019-11-05 02:02:35 +07:00
	ams->ms.mg = al.mg;
2019-11-05 01:57:38 +07:00
	ams->ms.sym = al.sym;
	ams->ms.map = al.map;
2017-08-30 00:11:09 +07:00
	ams->phys_addr = phys_addr;
2013-01-24 22:10:35 +07:00
}

2014-01-22 23:05:06 +07:00
struct mem_info *sample__resolve_mem(struct perf_sample *sample,
				     struct addr_location *al)
2013-01-24 22:10:35 +07:00
{
2018-03-07 22:50:06 +07:00
	struct mem_info *mi = mem_info__new();
2013-01-24 22:10:35 +07:00

	if (!mi)
		return NULL;

2014-10-23 22:50:25 +07:00
	ip__resolve_ams(al->thread, &mi->iaddr, sample->ip);
2017-08-30 00:11:09 +07:00
	ip__resolve_data(al->thread, al->cpumode, &mi->daddr,
			 sample->addr, sample->phys_addr);
2013-01-24 22:10:35 +07:00
	mi->data_src.val = sample->data_src;

	return mi;
}

2019-11-04 22:14:32 +07:00
static char *callchain_srcline(struct map_symbol *ms, u64 ip)
2017-10-10 03:32:56 +07:00
{
2019-11-04 22:14:32 +07:00
	struct map *map = ms->map;
perf report: Cache srclines for callchain nodes
On one hand this ensures that the memory is properly freed when the DSO
gets freed. On the other hand this significantly speeds up the
processing of the callchain nodes when lots of srclines are requested.
For example, with one of my data files:
Before:
Performance counter stats for 'perf report -s srcline -g srcline --stdio':
52496.495043 task-clock (msec) # 0.999 CPUs utilized
634 context-switches # 0.012 K/sec
2 cpu-migrations # 0.000 K/sec
191,561 page-faults # 0.004 M/sec
165,074,498,235 cycles # 3.144 GHz
334,170,832,408 instructions # 2.02 insn per cycle
90,220,029,745 branches # 1718.591 M/sec
654,525,177 branch-misses # 0.73% of all branches
52.533273822 seconds time elapsed
Processed 236605 events and lost 40 chunks!
After:
Performance counter stats for 'perf report -s srcline -g srcline --stdio':
22606.323706 task-clock (msec) # 1.000 CPUs utilized
31 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
185,471 page-faults # 0.008 M/sec
71,188,113,681 cycles # 3.149 GHz
133,204,943,083 instructions # 1.87 insn per cycle
34,886,384,979 branches # 1543.214 M/sec
278,214,495 branch-misses # 0.80% of all branches
22.609857253 seconds time elapsed
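Working the two runs above through, elapsed time and cycles both drop by roughly 2.3x, and retired instructions by roughly 2.5x:

```latex
\frac{52.533\,\mathrm{s}}{22.610\,\mathrm{s}} \approx 2.32, \qquad
\frac{165.07 \times 10^{9}\ \text{cycles}}{71.19 \times 10^{9}\ \text{cycles}} \approx 2.32, \qquad
\frac{334.17 \times 10^{9}\ \text{insns}}{133.20 \times 10^{9}\ \text{insns}} \approx 2.51
```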
Note that the difference is only this large when `--inline` is not
passed. When `--inline` is passed, the inliner cache is used instead and
this code path does not run nearly as often.
I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons of
srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative would
be to remove the formatting options and handle them at a different level,
i.e. print the sym/addr on demand wherever we actually output
something, and move unwind_inlines into a separate function that does
not return the srcline.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-19 18:38:34 +07:00
	char *srcline = NULL;

2017-10-10 03:32:56 +07:00
	if (!map || callchain_param.key == CCKEY_FUNCTION)
2017-10-19 18:38:34 +07:00
		return srcline;

	srcline = srcline__tree_find(&map->dso->srclines, ip);
	if (!srcline) {
		bool show_sym = false;
		bool show_addr = callchain_param.key == CCKEY_ADDRESS;

		srcline = get_srcline(map->dso, map__rip_2objdump(map, ip),
2019-11-04 22:14:32 +07:00
				      ms->sym, show_sym, show_addr, ip);
2017-10-19 18:38:34 +07:00
		srcline__tree_insert(&map->dso->srclines, ip, srcline);
	}
2017-10-10 03:32:56 +07:00

2017-10-19 18:38:34 +07:00
	return srcline;
2017-10-10 03:32:56 +07:00
}
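callchain_srcline() is a look-up-then-insert cache: a srcline is resolved only on a miss and then kept with its owner, so it is freed together with it. A minimal sketch of that pattern, assuming a hypothetical resolve_srcline() in place of get_srcline(); the code above keys a per-DSO tree by ip (srcline__tree_find()/srcline__tree_insert()), whereas this sketch uses a linked list to stay short.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct srcline_node {
	uint64_t ip;
	char *srcline;
	struct srcline_node *next;
};

static int resolve_calls;   /* counts how often the slow path runs */

/* Hypothetical slow resolver standing in for get_srcline(). */
static char *resolve_srcline(uint64_t ip)
{
	char *s = malloc(32);

	resolve_calls++;
	if (s)
		snprintf(s, 32, "file.c:%u", (unsigned)(ip & 0xff));
	return s;
}

/* Return the cached srcline for ip, resolving and caching it on a miss. */
static const char *cached_srcline(struct srcline_node **cache, uint64_t ip)
{
	struct srcline_node *n;

	for (n = *cache; n; n = n->next)
		if (n->ip == ip)
			return n->srcline;

	n = malloc(sizeof(*n));
	if (!n)
		return NULL;
	n->ip = ip;
	n->srcline = resolve_srcline(ip);
	n->next = *cache;
	*cache = n;
	return n->srcline;
}

/* Free every cached entry at once, as when the owning DSO goes away. */
static void srcline_cache_free(struct srcline_node *cache)
{
	while (cache) {
		struct srcline_node *next = cache->next;

		free(cache->srcline);
		free(cache);
		cache = next;
	}
}
```

A second lookup of the same ip returns the same pointer without calling the resolver again, which is where the speedup in the commit message comes from.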

2017-08-07 20:05:15 +07:00
struct iterations {
	int nr_loop_iter;
	u64 cycles;
};

2014-11-13 09:05:19 +07:00
static int add_callchain_ip(struct thread *thread,
2016-04-15 00:48:07 +07:00
			    struct callchain_cursor *cursor,
2014-11-13 09:05:19 +07:00
			    struct symbol **parent,
			    struct addr_location *root_al,
2015-03-30 15:11:00 +07:00
			    u8 *cpumode,
perf report: Add branch flag to callchain cursor node
Since the branch ip has been added to the call stack for easier browsing,
this patch adds more branch information: a flag indicating whether this
ip is a branch, together with the branch flags themselves. Then we can
tell whether the cursor node represents a branch and which branch flags
it has.
The branch history code has a loop detection pass that removes loops. It
would be useful to know how many loops were removed, so that in later
steps we can compute the average number of iterations.
For example:
Before remove_loops(),
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x300, to = 0x250
entry3: from = 0x300, to = 0x250
entry4: from = 0x700, to = 0x800
After remove_loops()
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x700, to = 0x800
The original entry2 and entry3 are removed. So the number of iterations
of (from = 0x300, to = 0x250) equals the removed number + 1 (2 + 1 = 3).
iterations = removed number + 1;
average iterations = Sum(iterations) / number of samples
This formula ignores other cases, for example iterations that cross
multiple buffers, or one buffer that contains 2+ loops, because in
practice it is good enough.
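The before/after example can be reproduced with a small sketch. This is a simplified, hypothetical collapse_loops(), not the hash-based remove_loops() in machine.c: it drops a block of entries whenever it repeats the block immediately before it, and counts the dropped copies per kept entry so that iterations = removed + 1.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for perf's struct branch_entry. */
struct br_entry {
	uint64_t from;
	uint64_t to;
};

/* True when the k entries at a equal the k entries at b. */
static bool same_block(const struct br_entry *a, const struct br_entry *b, int k)
{
	int j;

	for (j = 0; j < k; j++)
		if (a[j].from != b[j].from || a[j].to != b[j].to)
			return false;
	return true;
}

/*
 * Drop each repeated loop body in place.  removed[i] counts the copies
 * dropped after kept entry i (the caller zeroes it).  Returns the new length.
 */
static int collapse_loops(struct br_entry *l, int *removed, int nr)
{
	int i = 1;

	while (i < nr) {
		bool dropped = false;
		int k;

		for (k = 1; k <= i && i + k <= nr; k++) {
			if (same_block(&l[i - k], &l[i], k)) {
				int tail = nr - (i + k);

				memmove(&l[i], &l[i + k], tail * sizeof(*l));
				memmove(&removed[i], &removed[i + k],
					tail * sizeof(*removed));
				removed[i - k]++;
				nr -= k;
				dropped = true;
				break;
			}
		}
		if (!dropped)
			i++;   /* only advance when nothing was removed here */
	}
	return nr;
}
```

Applied to the five entries above, it returns three entries and records two removals for (from = 0x300, to = 0x250), i.e. 2 + 1 = 3 iterations.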
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com
[ Renamed 'iter' to 'nr_loop_iter' for clarity ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-31 08:19:49 +07:00
			    u64 ip,
			    bool branch,
			    struct branch_flags *flags,
2017-08-07 20:05:15 +07:00
			    struct iterations *iter,
2017-07-18 19:13:15 +07:00
			    u64 branch_from)
2014-11-13 09:05:19 +07:00
{
2019-11-04 22:14:32 +07:00
	struct map_symbol ms;
2014-11-13 09:05:19 +07:00
	struct addr_location al;
2017-08-07 20:05:15 +07:00
	int nr_loop_iter = 0;
	u64 iter_cycles = 0;
2017-10-10 03:32:56 +07:00
	const char *srcline = NULL;
2014-11-13 09:05:19 +07:00

	al.filtered = 0;
	al.sym = NULL;
2015-03-30 15:11:00 +07:00
	if (!cpumode) {
2018-04-26 03:58:03 +07:00
		thread__find_cpumode_addr_location(thread, ip, &al);
2015-03-30 15:11:00 +07:00
	} else {
2014-12-02 22:06:53 +07:00
		if (ip >= PERF_CONTEXT_MAX) {
			switch (ip) {
			case PERF_CONTEXT_HV:
2015-03-30 15:11:00 +07:00
				*cpumode = PERF_RECORD_MISC_HYPERVISOR;
2014-12-02 22:06:53 +07:00
				break;
			case PERF_CONTEXT_KERNEL:
2015-03-30 15:11:00 +07:00
				*cpumode = PERF_RECORD_MISC_KERNEL;
2014-12-02 22:06:53 +07:00
				break;
			case PERF_CONTEXT_USER:
2015-03-30 15:11:00 +07:00
				*cpumode = PERF_RECORD_MISC_USER;
2014-12-02 22:06:53 +07:00
				break;
			default:
				pr_debug("invalid callchain context: "
					 "%"PRId64"\n", (s64) ip);
				/*
				 * It seems the callchain is corrupted.
				 * Discard all.
				 */
2016-04-15 00:48:07 +07:00
				callchain_cursor_reset(cursor);
2014-12-02 22:06:53 +07:00
				return 1;
			}
			return 0;
		}
2018-04-24 21:24:49 +07:00
		thread__find_symbol(thread, *cpumode, ip, &al);
2014-12-02 22:06:53 +07:00
	}

2014-11-13 09:05:19 +07:00
	if (al.sym != NULL) {
2016-05-03 18:54:43 +07:00
		if (perf_hpp_list.parent && !*parent &&
2014-11-13 09:05:19 +07:00
		    symbol__match_regex(al.sym, &parent_regex))
			*parent = al.sym;
		else if (have_ignore_callees && root_al &&
		  symbol__match_regex(al.sym, &ignore_callees_regex)) {
			/* Treat this symbol as the root,
			   forgetting its callees. */
			*root_al = al;
2016-04-15 00:48:07 +07:00
			callchain_cursor_reset(cursor);
2014-11-13 09:05:19 +07:00
		}
	}

2015-11-26 14:08:20 +07:00
	if (symbol_conf.hide_unresolved && al.sym == NULL)
		return 0;
2017-08-07 20:05:15 +07:00

	if (iter) {
		nr_loop_iter = iter->nr_loop_iter;
		iter_cycles = iter->cycles;
	}

2019-11-05 02:02:35 +07:00
	ms.mg = al.mg;
2019-11-04 22:14:32 +07:00
	ms.map = al.map;
	ms.sym = al.sym;
	srcline = callchain_srcline(&ms, al.addr);
	return callchain_cursor_append(cursor, ip, &ms,
2017-08-07 20:05:15 +07:00
				       branch, flags, nr_loop_iter,
2017-10-10 03:32:56 +07:00
				       iter_cycles, branch_from, srcline);
2014-11-13 09:05:19 +07:00
}
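add_callchain_ip() treats any ip at or above PERF_CONTEXT_MAX as a context marker rather than a return address. A self-contained sketch of that classification: the constant values mirror the PERF_CONTEXT_* and PERF_RECORD_MISC_* definitions in include/uapi/linux/perf_event.h, while the CTX_/MISC_ names and the return convention of classify_ip() are hypothetical. As in the switch above, any other marker, such as PERF_CONTEXT_GUEST ((u64)-2048), falls through to the error path.

```c
#include <stdint.h>

/* Values mirror PERF_CONTEXT_* in include/uapi/linux/perf_event.h. */
#define CTX_HV      ((uint64_t)-32)
#define CTX_KERNEL  ((uint64_t)-128)
#define CTX_USER    ((uint64_t)-512)
#define CTX_MAX     ((uint64_t)-4095)

/* Values mirror PERF_RECORD_MISC_{KERNEL,USER,HYPERVISOR}. */
#define MISC_KERNEL     1
#define MISC_USER       2
#define MISC_HYPERVISOR 3

/*
 *  0: ordinary address, *cpumode untouched.
 *  1: context marker, *cpumode updated, no frame to add.
 * -1: unknown marker, which the code above treats as a corrupted chain.
 */
static int classify_ip(uint64_t ip, int *cpumode)
{
	if (ip < CTX_MAX)
		return 0;

	switch (ip) {
	case CTX_HV:
		*cpumode = MISC_HYPERVISOR;
		return 1;
	case CTX_KERNEL:
		*cpumode = MISC_KERNEL;
		return 1;
	case CTX_USER:
		*cpumode = MISC_USER;
		return 1;
	default:
		return -1;
	}
}
```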

2014-01-22 23:15:36 +07:00
struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
					   struct addr_location *al)
2012-12-08 03:39:39 +07:00
{
	unsigned int i;
2014-01-22 23:15:36 +07:00
	const struct branch_stack *bs = sample->branch_stack;
	struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
2012-12-08 03:39:39 +07:00

	if (!bi)
		return NULL;

	for (i = 0; i < bs->nr; i++) {
2014-10-23 22:50:25 +07:00
		ip__resolve_ams(al->thread, &bi[i].to, bs->entries[i].to);
		ip__resolve_ams(al->thread, &bi[i].from, bs->entries[i].from);
2012-12-08 03:39:39 +07:00
		bi[i].flags = bs->entries[i].flags;
	}
	return bi;
}

2017-08-07 20:05:15 +07:00
static void save_iterations(struct iterations *iter,
			    struct branch_entry *be, int nr)
{
	int i;

2019-01-04 13:10:30 +07:00
	iter->nr_loop_iter++;
2017-08-07 20:05:15 +07:00
	iter->cycles = 0;

	for (i = 0; i < nr; i++)
		iter->cycles += be[i].flags.cycles;
}

perf callchain: Support handling complete branch stacks as histograms
Currently branch stacks can only be shown as edge histograms for
individual branches. I never found this display particularly useful.
This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.
This way, in complex functions we can understand the control flow that
led to a particular sample, and may even see some control flow in the
caller for short functions.
Example (simplified; for such simple code this is of course usually not
needed). Please run this after the whole patchkit is in: at this point
in the patch order there is no --branch-history yet, as it is added in
a later patch:
tcall.c:
volatile int a = 10000, b = 100000, c;
__attribute__((noinline)) void f2(void)
{
	c = a / b;
}
__attribute__((noinline)) void f1(void)
{
	f2();
	f2();
}
int main(void)
{
	int i;
	for (i = 0; i < 1000000; i++)
		f1();
}
% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
54.91% tcall.c:6 [.] f2 tcall
|
|--65.53%-- f2 tcall.c:5
| |
| |--70.83%-- f1 tcall.c:11
| | f1 tcall.c:10
| | main tcall.c:18
| | main tcall.c:18
| | main tcall.c:17
| | main tcall.c:17
| | f1 tcall.c:13
| | f1 tcall.c:13
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:12
| | f1 tcall.c:12
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:11
| |
| --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12
The default output is unchanged.
This is only implemented in perf report; there is no change to record or
anywhere else.
This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.
The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.
- detect overlaps with the callchain
- remove small loop duplicates in the LBR
Current limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)
v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
patch. Skip initial entries in callchain. Minor cleanups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-11-13 09:05:20 +07:00
#define CHASHSZ 127
#define CHASHBITS 7
#define NO_ENTRY 0xff

#define PERF_MAX_BRANCH_DEPTH 127

/* Remove loops. */
2017-08-07 20:05:15 +07:00
static int remove_loops(struct branch_entry *l, int nr,
			struct iterations *iter)
perf callchain: Support handling complete branch stacks as histograms
Currently branch stacks can only be shown as edge histograms for
individual branches. I never found this display particularly useful.
This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.
This way, in complex functions we can understand the control flow that
led to a particular sample, and may even see some control flow in the
caller for short functions.
Example (simplified; for such simple code this is usually not needed).
Please run this after the whole patchkit is in: at this point in the
patch order there is no --branch-history yet; it will be added in a
later patch:
tcall.c:
volatile int a = 10000, b = 100000, c;
__attribute__((noinline)) void f2(void)
{
	c = a / b;
}
__attribute__((noinline)) void f1(void)
{
	f2();
	f2();
}
int main(void)
{
	int i;
	for (i = 0; i < 1000000; i++)
		f1();
}
% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
54.91% tcall.c:6 [.] f2 tcall
|
|--65.53%-- f2 tcall.c:5
| |
| |--70.83%-- f1 tcall.c:11
| | f1 tcall.c:10
| | main tcall.c:18
| | main tcall.c:18
| | main tcall.c:17
| | main tcall.c:17
| | f1 tcall.c:13
| | f1 tcall.c:13
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:12
| | f1 tcall.c:12
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:11
| |
| --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12
The default output is unchanged.
This is only implemented in perf report; there is no change to record
or anywhere else.
This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.
The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.
- detect overlaps with the callchain
- remove small loop duplicates in the LBR
Current limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)
v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
patch. Skip initial entries in callchain. Minor cleanups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-11-13 09:05:20 +07:00
{
	int i, j, off;
	unsigned char chash[CHASHSZ];

	memset(chash, NO_ENTRY, sizeof(chash));

	BUG_ON(PERF_MAX_BRANCH_DEPTH > 255);

	for (i = 0; i < nr; i++) {
		int h = hash_64(l[i].from, CHASHBITS) % CHASHSZ;

		/* no collision handling for now */
		if (chash[h] == NO_ENTRY) {
			chash[h] = i;
		} else if (l[chash[h]].from == l[i].from) {
			bool is_loop = true;
			/* check if it is a real loop */
			off = 0;
			for (j = chash[h]; j < i && i + off < nr; j++, off++)
				if (l[j].from != l[i + off].from) {
					is_loop = false;
					break;
				}
			if (is_loop) {
2017-08-07 20:05:15 +07:00
				j = nr - (i + off);
				if (j > 0) {
					save_iterations(iter + i + off,
						l + i, off);

					memmove(iter + i, iter + i + off,
						j * sizeof(*iter));

					memmove(l + i, l + i + off,
						j * sizeof(*l));
				}

perf callchain: Support handling complete branch stacks as histograms
2014-11-13 09:05:20 +07:00
				nr -= off;
			}
		}
	}
	return nr;
}

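A minimal userspace sketch of the loop-removal pass above, for experimenting outside the kernel. It assumes a hypothetical two-field `struct br` in place of `struct branch_entry`, replaces `hash_64()` with a plain `from % CHASHSZ` (an assumption: it changes which addresses collide, not the detection logic), and drops the `iterations` bookkeeping done by `save_iterations()`. `remove_loops_sketch` and `br_slot` are illustrative names, not perf functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHASHSZ 127
#define NO_ENTRY 0xff

/* Hypothetical stand-in for struct branch_entry: loop detection only
 * looks at the source address. */
struct br {
	uint64_t from;
	uint64_t to;
};

/* Assumption: a plain modulo replaces hash_64(from, CHASHBITS). */
static int br_slot(uint64_t from)
{
	return (int)(from % CHASHSZ);
}

/* Port of the remove_loops() logic without iteration bookkeeping.
 * Returns the new number of entries in l[]. */
static int remove_loops_sketch(struct br *l, int nr)
{
	unsigned char chash[CHASHSZ];
	int i, j, off;

	/* Indices are stored in a byte and must never equal NO_ENTRY. */
	assert(nr >= 0 && nr < NO_ENTRY);
	memset(chash, NO_ENTRY, sizeof(chash));

	for (i = 0; i < nr; i++) {
		int h = br_slot(l[i].from);

		if (chash[h] == NO_ENTRY) {
			chash[h] = (unsigned char)i;	/* first sighting */
		} else if (l[chash[h]].from == l[i].from) {
			int is_loop = 1;

			/* Compare the earlier run [chash[h], i) with the run at i. */
			off = 0;
			for (j = chash[h]; j < i && i + off < nr; j++, off++)
				if (l[j].from != l[i + off].from) {
					is_loop = 0;
					break;
				}
			if (is_loop) {
				j = nr - (i + off);	/* entries after the repeat */
				if (j > 0)
					memmove(l + i, l + i + off, j * sizeof(*l));
				nr -= off;	/* drop the repeated run */
			}
		}
	}
	return nr;
}
```

With entries whose `from` addresses are 0x100, 0x300, 0x400, 0x300, 0x400, 0x700, the repeated 0x300/0x400 pair is spliced out and the function returns 4. Because the loop index still advances after a splice, a run of three identical `from` addresses (0x100, 0x300, 0x300, 0x300, 0x700) only shrinks to four entries in a single pass of this port.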
2015-01-06 01:23:05 +07:00
/*
 * Resolve LBR callstack chain sample
 * Return:
 * 1 on success, LBR callchain information was obtained
 * 0 if no LBR callchain information is available, should try fp
 * negative error code on other errors.
 */
static int resolve_lbr_callchain_sample(struct thread *thread,
2016-04-15 00:48:07 +07:00
					struct callchain_cursor *cursor,
2015-01-06 01:23:05 +07:00
					struct perf_sample *sample,
					struct symbol **parent,
					struct addr_location *root_al,
					int max_stack)
2012-12-08 03:39:39 +07:00
{
2015-01-06 01:23:05 +07:00
	struct ip_callchain *chain = sample->callchain;
2016-10-03 21:07:24 +07:00
	int chain_nr = min(max_stack, (int)chain->nr), i;
2015-03-30 15:11:00 +07:00
	u8 cpumode = PERF_RECORD_MISC_USER;
2017-07-18 19:13:15 +07:00
	u64 ip, branch_from = 0;
2015-01-06 01:23:05 +07:00

	for (i = 0; i < chain_nr; i++) {
		if (chain->ips[i] == PERF_CONTEXT_USER)
			break;
	}

	/* LBR only affects the user callchain */
	if (i != chain_nr) {
		struct branch_stack *lbr_stack = sample->branch_stack;
perf report: Add branch flag to callchain cursor node
Since the branch ip has been added to the call stack for easier browsing,
this patch adds more branch information. For example, it adds a flag to
indicate whether this ip is a branch, and also stores the branch flags.
Then we can tell whether the cursor node represents a branch and which
branch flags it has.
The branch history code has a loop detection pass that removes loops. It
would be nice to know how many loops were removed, so that in later
steps we can compute the average number of iterations.
For example:
Before remove_loops(),
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x300, to = 0x250
entry3: from = 0x300, to = 0x250
entry4: from = 0x700, to = 0x800
After remove_loops()
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x700, to = 0x800
The original entry2 and entry3 are removed. So the number of iterations
(from = 0x300, to = 0x250) is equal to removed number + 1 (2 + 1).
iterations = removed number + 1;
average iterations = Sum(iterations) / number of samples
This formula ignores other cases, for example iterations that cross
multiple buffers, or one buffer that contains 2+ loops, because in
practice it's good enough.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com
[ Renamed 'iter' to 'nr_loop_iter' for clarity ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-31 08:19:49 +07:00
		int lbr_nr = lbr_stack->nr, j, k;
		bool branch;
		struct branch_flags *flags;
2015-01-06 01:23:05 +07:00
		/*
		 * The LBR callstack can only provide the user call chain.
		 * mix_chain_nr is the number of kernel call chain entries
		 * plus the number of LBR user call chain entries:
		 * i is the number of kernel call chain entries,
		 * 1 is for PERF_CONTEXT_USER,
		 * lbr_nr + 1 is the number of user call chain entries.
		 * For details, please refer to the comments
		 * in callchain__printf
		 */
		int mix_chain_nr = i + 1 + lbr_nr + 1;

		for (j = 0; j < mix_chain_nr; j++) {
2016-10-03 21:07:24 +07:00
			int err;
perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
			branch = false;
			flags = NULL;

2015-01-06 01:23:05 +07:00
			if (callchain_param.order == ORDER_CALLEE) {
				if (j < i + 1)
					ip = chain->ips[j];
perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
				else if (j > i + 1) {
					k = j - i - 2;
					ip = lbr_stack->entries[k].from;
					branch = true;
					flags = &lbr_stack->entries[k].flags;
				} else {
2015-01-06 01:23:05 +07:00
					ip = lbr_stack->entries[0].to;
perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
					branch = true;
					flags = &lbr_stack->entries[0].flags;
2017-07-18 19:13:15 +07:00
					branch_from =
						lbr_stack->entries[0].from;
perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
				}
2015-01-06 01:23:05 +07:00
			} else {
perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
				if (j < lbr_nr) {
					k = lbr_nr - j - 1;
					ip = lbr_stack->entries[k].from;
					branch = true;
					flags = &lbr_stack->entries[k].flags;
				}
2015-01-06 01:23:05 +07:00
				else if (j > lbr_nr)
					ip = chain->ips[i + 1 - (j - lbr_nr)];
perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
				else {
2015-01-06 01:23:05 +07:00
					ip = lbr_stack->entries[0].to;
perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
					branch = true;
					flags = &lbr_stack->entries[0].flags;
2017-07-18 19:13:15 +07:00
					branch_from =
						lbr_stack->entries[0].from;
perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
				}
2015-01-06 01:23:05 +07:00
			}

perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
			err = add_callchain_ip(thread, cursor, parent,
					       root_al, &cpumode, ip,
2017-08-07 20:05:15 +07:00
					       branch, flags, NULL,
2017-07-18 19:13:15 +07:00
					       branch_from);
2015-01-06 01:23:05 +07:00
			if (err)
				return (err < 0) ? err : 0;
		}
		return 1;
	}

	return 0;
}

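The index arithmetic in resolve_lbr_callchain_sample() is easier to check in isolation. The sketch below reproduces only the branch selection from the loop above and reports which array slot supplies the j-th merged entry, given `i` (the position of `PERF_CONTEXT_USER` in `chain->ips`) and `lbr_nr`. The names `lbr_mix_source`, `enum src_kind` and `struct src` are hypothetical and not part of perf.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the j-th merged callchain entry comes from. */
enum src_kind {
	SRC_CHAIN,	/* chain->ips[idx]: kernel entries, then PERF_CONTEXT_USER at idx == i */
	SRC_LBR_TO,	/* lbr_stack->entries[idx].to */
	SRC_LBR_FROM,	/* lbr_stack->entries[idx].from */
};

struct src {
	enum src_kind kind;
	int idx;
};

/* callee == true mirrors ORDER_CALLEE, false the caller order. */
static struct src lbr_mix_source(int i, int lbr_nr, int j, bool callee)
{
	int mix_chain_nr = i + 1 + lbr_nr + 1;

	assert(j >= 0 && j < mix_chain_nr);

	if (callee) {
		if (j < i + 1)
			return (struct src){ SRC_CHAIN, j };
		if (j > i + 1)
			return (struct src){ SRC_LBR_FROM, j - i - 2 };
		return (struct src){ SRC_LBR_TO, 0 };
	}
	if (j < lbr_nr)
		return (struct src){ SRC_LBR_FROM, lbr_nr - j - 1 };
	if (j > lbr_nr)
		return (struct src){ SRC_CHAIN, i + 1 - (j - lbr_nr) };
	return (struct src){ SRC_LBR_TO, 0 };
}
```

For example, with `i = 2` and `lbr_nr = 3` there are 2 + 1 + 3 + 1 = 7 merged slots: in callee order, slots 0-2 come from `chain->ips[0..2]`, slot 3 from `entries[0].to` and slots 4-6 from `entries[0..2].from`; caller order visits the same sources in reverse. The `entries[0].to` slot is also the only one for which the listing assigns `branch_from`.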
perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}
When processing using 'perf report -g caller', which is the default, we
ended up reversing the callchain entries received from the kernel, but
simply reversing throws away the information that tells that from a
point onwards the addresses are for userspace, kernel, guest kernel,
guest user or hypervisor.
The idea is that if we are walking backwards, for each cluster of
non-cpumode entries we have to first scan backwards for the next cpumode
entry and use that for the cluster.
This seems silly and more expensive than it needs to be, but it is
enough for an initial fix.
The code here is really complicated because it is intimately intertwined
with the lbr and branch handling, as well as this callchain order;
further fixes will be needed to properly take into account the cpumode
in those cases.
Another problem with ORDER_CALLER is that the NULL "0" IP that is at the
end of most callchains shows up at the top of the histogram, because
every callchain contains it and with ORDER_CALLER it is the first entry.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Souvik Banerjee <souvik1997@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org # 4.19
Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-30 22:12:26 +07:00
static int find_prev_cpumode(struct ip_callchain *chain, struct thread *thread,
			     struct callchain_cursor *cursor,
			     struct symbol **parent,
			     struct addr_location *root_al,
			     u8 *cpumode, int ent)
{
	int err = 0;

	while (--ent >= 0) {
		u64 ip = chain->ips[ent];

		if (ip >= PERF_CONTEXT_MAX) {
			err = add_callchain_ip(thread, cursor, parent,
					       root_al, cpumode, ip,
					       false, NULL, NULL, 0);
			break;
		}
	}
	return err;
}

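find_prev_cpumode() walks backwards from `ent` to the nearest context marker (any value at or above `PERF_CONTEXT_MAX`) and passes it to add_callchain_ip(), presumably so that `*cpumode` is updated for the cluster that follows, as the commit message above describes. A standalone sketch of just the backward scan, assuming the constants below mirror the values in the uapi `<linux/perf_event.h>` (check them against your headers); `find_prev_context` and `context_for` are hypothetical helpers, not perf functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror PERF_CONTEXT_KERNEL, PERF_CONTEXT_USER and
 * PERF_CONTEXT_MAX from the uapi <linux/perf_event.h>. */
#define CTX_KERNEL ((uint64_t)-128)
#define CTX_USER   ((uint64_t)-512)
#define CTX_MAX    ((uint64_t)-4095)

/* Scan ips[0..ent-1] backwards for the nearest context marker, as the
 * while loop in find_prev_cpumode() does. Returns its index or -1. */
static int find_prev_context(const uint64_t *ips, int ent)
{
	assert(ent >= 0);
	while (--ent >= 0) {
		if (ips[ent] >= CTX_MAX)
			return ent;
	}
	return -1;
}

/* Context marker governing the entry at ips[ent], or 0 if there is none. */
static uint64_t context_for(const uint64_t *ips, int ent)
{
	int m = find_prev_context(ips, ent);

	return m < 0 ? 0 : ips[m];
}
```

With a chain laid out as `{ CTX_KERNEL, k1, k2, CTX_USER, u1, u2 }`, the user addresses u1 and u2 resolve to `CTX_USER` and the kernel addresses k1 and k2 to `CTX_KERNEL`, even when the array is walked from the end; this is the per-cluster information that a plain reversal of the array would lose.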
2015-01-06 01:23:05 +07:00
static int thread__resolve_callchain_sample(struct thread *thread,
2016-04-15 00:48:07 +07:00
					    struct callchain_cursor *cursor,
2019-07-21 18:23:51 +07:00
					    struct evsel *evsel,
2015-01-06 01:23:05 +07:00
					    struct perf_sample *sample,
					    struct symbol **parent,
					    struct addr_location *root_al,
					    int max_stack)
{
	struct branch_stack *branch = sample->branch_stack;
	struct ip_callchain *chain = sample->callchain;
perf report: Make --branch-history work without callgraphs(-g) option in perf record
perf record -b -g <command>
perf report --branch-history
This merges the LBRs with the callgraphs.
However, it would be nice if it also worked without callgraphs (-g) set
in perf record, so that only the LBRs are displayed. But currently perf
report fails with an error in this case. For example:
perf record -b <command>
perf report --branch-history
Error:
Selected -g or --branch-history but no callchain data. Did
you call 'perf record' without -g?
With this patch, perf report displays only the LBRs when callgraphs (-g)
were not enabled in perf record.
Change log:
v2: According to Milian Wolff's comment, change the obsolete error
message. Now the error message is:
┌─Error:─────────────────────────────────────┐
│Selected -g or --branch-history. │
│But no callchain or branch data. │
│Did you call 'perf record' without -g or -b?│
│ │
│ │
│Press any key... │
└────────────────────────────────────────────┘
Also, change "|" to "||" in the last parameter passed to
hists__fprintf:
hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout,
symbol_conf.use_callchain || symbol_conf.show_branchflag_count);
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1494240182-28899-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-05-08 17:43:02 +07:00
	int chain_nr = 0;
2015-03-30 15:11:00 +07:00
	u8 cpumode = PERF_RECORD_MISC_USER;
2016-05-19 21:14:15 +07:00
	int i, j, err, nr_entries;
perf callchain: Support handling complete branch stacks as histograms
Currently branch stacks can be only shown as edge histograms for
individual branches. I never found this display particularly useful.
This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.
This way, in complex functions we can understand the control flow that
led to a particular sample, and may even see some control flow in the
caller for short functions.
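The following sketch gives a rough illustration of placing the branch stack in front of the normal callgraph. It builds one flat list of addresses: branch records first, then callchain entries. This is a simplified model, not perf's code. `struct lbr_entry`, `build_branch_history()` and the output-array interface are hypothetical.

The sketch follows the loop in machine.c that calls add_callchain_ip() with be[i].to and then be[i].from, so each branch contributes its target and then its source. In caller order the records are read back to front, mirroring `branch->entries[branch->nr - i - 1]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an LBR record: one taken branch. */
struct lbr_entry {
	uint64_t from;  /* address of the branch instruction */
	uint64_t to;    /* branch target */
};

/*
 * Write branch addresses (target, then source) for up to max_stack
 * records, followed by callchain ips starting at first_call.
 * callee_order != 0 keeps the stored order; otherwise records are
 * read back to front. Returns the number of ips written (<= out_cap).
 */
static size_t build_branch_history(const struct lbr_entry *lbr, size_t lbr_nr,
				   size_t max_stack, int callee_order,
				   const uint64_t *chain, size_t chain_nr,
				   size_t first_call,
				   uint64_t *out, size_t out_cap)
{
	size_t n = lbr_nr < max_stack ? lbr_nr : max_stack;
	size_t len = 0;

	for (size_t i = 0; i < n; i++) {
		const struct lbr_entry *e =
			callee_order ? &lbr[i] : &lbr[lbr_nr - i - 1];

		if (len + 2 > out_cap)
			return len;          /* no room for both addresses */
		out[len++] = e->to;
		out[len++] = e->from;
	}
	/* Append the regular callchain after the branch history. */
	for (size_t i = first_call; i < chain_nr && len < out_cap; i++)
		out[len++] = chain[i];
	return len;
}
```

Because the combined list is just a longer call stack, the existing callgraph histogram code can aggregate it unchanged. This is the unification the paragraph above describes.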
Example (simplified; for such simple code this is usually not needed).
Run this only after the whole patch kit is applied: at this point in the
patch order there is no --branch-history option yet; it is added in a
later patch:
tcall.c:
volatile int a = 10000, b = 100000, c;
__attribute__((noinline)) void f2(void)
{
	c = a / b;
}
__attribute__((noinline)) void f1(void)
{
	f2();
	f2();
}
int main(void)
{
	int i;
	for (i = 0; i < 1000000; i++)
		f1();
}
% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
54.91% tcall.c:6 [.] f2 tcall
|
|--65.53%-- f2 tcall.c:5
| |
| |--70.83%-- f1 tcall.c:11
| | f1 tcall.c:10
| | main tcall.c:18
| | main tcall.c:18
| | main tcall.c:17
| | main tcall.c:17
| | f1 tcall.c:13
| | f1 tcall.c:13
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:12
| | f1 tcall.c:12
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:11
| |
| --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12
The default output is unchanged.
This is implemented only in perf report; there is no change to record or
anywhere else.
This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set, include the LBR in the callstack in machine.c.
The rest of the history code is unchanged and does not distinguish
between an LBR entry and a normal call entry.
- detect overlaps with the callchain
- remove small loop duplicates in the LBR
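The overlap detection step can be sketched as follows. It mirrors the rule in the "Check for overlap into the callchain" comment in machine.c. A return address points just past the call instruction, so a callchain entry counts as already covered by a branch record when the branch source lies at most 8 bytes before it.

The function names are hypothetical. `CONTEXT_MARKER_MIN` is an assumed stand-in for `PERF_CONTEXT_MAX`, taken here to be `(uint64_t)-4095` as in the perf_event UAPI header. The bound check on `first_call` is an extra safety guard added for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for PERF_CONTEXT_MAX: ips at or above it are markers. */
#define CONTEXT_MARKER_MIN ((uint64_t)-4095)

/*
 * A call at `from` overlaps return address `ret_ip` if it lies in
 * [ret_ip - max_insn_len, ret_ip), i.e. the call instruction is
 * assumed to be at most max_insn_len bytes long.
 */
static int overlaps_callchain(uint64_t from, uint64_t ret_ip,
			      uint64_t max_insn_len)
{
	return from < ret_ip && from >= ret_ip - max_insn_len;
}

/*
 * Count leading callchain entries already covered by branch sources
 * (callee order): skip the arch-specific skip_idx entry and context
 * markers, and skip entries that overlap a branch source.
 */
static size_t count_overlaps(const uint64_t *from, size_t nr,
			     const uint64_t *ips, size_t ips_nr, long skip_idx)
{
	size_t first_call = 0;

	/* first_call < ips_nr is a safety guard added for this sketch. */
	for (size_t i = 0; i < nr && first_call < ips_nr; i++) {
		if ((long)i == skip_idx || ips[first_call] >= CONTEXT_MARKER_MIN)
			first_call++;
		else if (overlaps_callchain(from[i], ips[first_call], 8))
			first_call++;
	}
	return first_call;
}
```

The callchain would then be appended starting at the returned index, so frames already represented by branch records are not shown twice.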
Current limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)
v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
patch. Skip initial entries in callchain. Minor cleanups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-11-13 09:05:20 +07:00
	int skip_idx = -1;
	int first_call = 0;
perf report: Make --branch-history work without callgraphs(-g) option in perf record
2017-05-08 17:43:02 +07:00
	if (chain)
		chain_nr = chain->nr;
2016-04-18 20:35:03 +07:00
	if (perf_evsel__has_branch_callstack(evsel)) {
2016-04-15 00:48:07 +07:00
		err = resolve_lbr_callchain_sample(thread, cursor, sample, parent,
2015-01-06 01:23:05 +07:00
						   root_al, max_stack);
		if (err)
			return (err < 0) ? err : 0;
	}
perf callchain: Support handling complete branch stacks as histograms
2014-11-13 09:05:20 +07:00
	/*
	 * Based on DWARF debug information, some architectures skip
	 * a callchain entry saved by the kernel.
	 */
2016-05-19 21:14:15 +07:00
	skip_idx = arch_skip_callchain_idx(thread, chain);
2012-12-08 03:39:39 +07:00
perf callchain: Support handling complete branch stacks as histograms
2014-11-13 09:05:20 +07:00
	/*
	 * Add branches to call stack for easier browsing. This gives
	 * more context for a sample than just the callers.
	 *
	 * This uses individual histograms of paths compared to the
	 * aggregated histograms the normal LBR mode uses.
	 *
	 * Limitations for now:
	 * - No extra filters
	 * - No annotations (should annotate somehow)
	 */

	if (branch && callchain_param.branch_callstack) {
		int nr = min(max_stack, (int)branch->nr);
		struct branch_entry be[nr];
2017-08-07 20:05:15 +07:00
		struct iterations iter[nr];
perf callchain: Support handling complete branch stacks as histograms
2014-11-13 09:05:20 +07:00
		if (branch->nr > PERF_MAX_BRANCH_DEPTH) {
			pr_warning("corrupted branch chain. skipping...\n");
			goto check_calls;
		}

		for (i = 0; i < nr; i++) {
			if (callchain_param.order == ORDER_CALLEE) {
				be[i] = branch->entries[i];
perf report: Make --branch-history work without callgraphs(-g) option in perf record
2017-05-08 17:43:02 +07:00
				if (chain == NULL)
					continue;
perf callchain: Support handling complete branch stacks as histograms
2014-11-13 09:05:20 +07:00
				/*
				 * Check for overlap into the callchain.
				 * The return address is one off compared to
				 * the branch entry. To adjust for this
				 * assume the calling instruction is not longer
				 * than 8 bytes.
				 */
				if (i == skip_idx ||
				    chain->ips[first_call] >= PERF_CONTEXT_MAX)
					first_call++;
				else if (be[i].from < chain->ips[first_call] &&
				    be[i].from >= chain->ips[first_call] - 8)
					first_call++;
			} else
				be[i] = branch->entries[branch->nr - i - 1];
		}
2017-08-07 20:05:15 +07:00
		memset(iter, 0, sizeof(struct iterations) * nr);
		nr = remove_loops(be, nr, iter);
perf report: Add branch flag to callchain cursor node
Since the branch ip has been added to the call stack for easier browsing,
this patch adds more branch information: a flag indicating whether this ip
is a branch, together with the branch flags themselves. This tells us
whether a cursor node represents a branch and which branch flags it has.
The branch history code has a loop detection pass that removes loops. It
would be useful to know how many loop iterations were removed, so that in
later steps we can compute the average number of iterations.
For example:
Before remove_loops():
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x300, to = 0x250
entry3: from = 0x300, to = 0x250
entry4: from = 0x700, to = 0x800
After remove_loops():
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x700, to = 0x800
The original entry2 and entry3 are removed, so the number of iterations
of (from = 0x300, to = 0x250) equals the number removed + 1 (2 + 1 = 3).
iterations = removed number + 1;
average iterations = Sum(iterations) / number of samples
This formula ignores other cases, for example iterations that cross
multiple buffers, or one buffer that contains 2+ loops. In practice, it
is good enough.
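The iteration bookkeeping above can be sketched in C. This is a simplified model, not perf's remove_loops(). It only collapses runs of identical consecutive entries, i.e. single-branch loops, which is enough to reproduce the entry0..entry4 example. As far as we know, the real pass is more general and detects repeated multi-entry sequences. `struct br`, `collapse_repeats()`, `loop_iterations()` and `average_iterations()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified branch record. */
struct br {
	uint64_t from, to;
};

/*
 * Collapse runs of identical consecutive entries in place.
 * removed[k] receives how many duplicates were dropped after kept
 * entry k (0 if none); removed must hold at least nr elements.
 * Returns the new number of entries.
 */
static size_t collapse_repeats(struct br *e, size_t nr, unsigned *removed)
{
	size_t out = 0;

	for (size_t i = 0; i < nr; i++) {
		if (out > 0 && e[i].from == e[out - 1].from &&
		    e[i].to == e[out - 1].to) {
			removed[out - 1]++;     /* one more loop iteration */
			continue;
		}
		e[out] = e[i];
		removed[out] = 0;
		out++;
	}
	return out;
}

/* iterations = removed number + 1 */
static unsigned loop_iterations(unsigned removed)
{
	return removed + 1;
}

/* average iterations = Sum(iterations) / number of samples */
static double average_iterations(const unsigned *iters, size_t samples)
{
	unsigned long sum = 0;

	assert(samples > 0);
	for (size_t i = 0; i < samples; i++)
		sum += iters[i];
	return (double)sum / samples;
}
```

Applied to the five entries above, the sketch keeps three entries and records two removed duplicates for (0x300, 0x250), giving 3 iterations.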
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com
[ Renamed 'iter' to 'nr_loop_iter' for clarity ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-31 08:19:49 +07:00
perf callchain: Support handling complete branch stacks as histograms
2014-11-13 09:05:20 +07:00
		for (i = 0; i < nr; i++) {
2017-08-07 20:05:15 +07:00
			err = add_callchain_ip(thread, cursor, parent,
					       root_al,
					       NULL, be[i].to,
					       true, &be[i].flags,
					       NULL, be[i].from);
perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
perf callchain: Support handling complete branch stacks as histograms
2014-11-13 09:05:20 +07:00
			if (!err)
2016-04-15 00:48:07 +07:00
				err = add_callchain_ip(thread, cursor, parent, root_al,
perf report: Add branch flag to callchain cursor node
Since the branch ip has been added to the call stack for easier browsing,
this patch adds more branch information. For example, it adds a flag to
indicate whether this ip is a branch, and also stores the branch flags.
Then we can tell whether the cursor node represents a branch and which
branch flags it has.
The branch history code has a loop detection pass that removes loops. It
would be nice to know how many loops were removed, so that in later
steps we can compute the average number of iterations.
For example:
Before remove_loops(),
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x300, to = 0x250
entry3: from = 0x300, to = 0x250
entry4: from = 0x700, to = 0x800
After remove_loops()
entry0: from = 0x100, to = 0x200
entry1: from = 0x300, to = 0x250
entry2: from = 0x700, to = 0x800
The original entry2 and entry3 are removed, so the number of iterations
of (from = 0x300, to = 0x250) equals the number of removed entries + 1 (2 + 1).
iterations = removed number + 1;
average iterations = Sum(iterations) / number of samples
This formula ignores other cases, for example iterations that cross multiple
buffers, or one buffer that contains 2+ loops, because in practice it is good
enough.
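The iteration arithmetic above can be illustrated with a minimal C sketch. It is a simplification, not perf's remove_loops(): it assumes a loop shows up only as identical consecutive entries, and `struct br`, `collapse_repeats()` and `average_iterations()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical branch record: one LBR entry (source and target address). */
struct br {
	uint64_t from, to;
};

/*
 * Collapse runs of identical consecutive entries in place.
 * removed[k] receives how many copies were dropped after output slot k,
 * so that loop ran removed[k] + 1 times. Returns the new entry count.
 * Simplified sketch: only single-entry loops are detected.
 */
static size_t collapse_repeats(struct br *e, size_t n, unsigned int *removed)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0 && e[out - 1].from == e[i].from &&
		    e[out - 1].to == e[i].to) {
			removed[out - 1]++;
			continue;
		}
		e[out] = e[i];
		removed[out] = 0;
		out++;
	}
	return out;
}

/* average iterations = Sum(iterations) / number of samples */
static double average_iterations(unsigned long long sum_iterations,
				 unsigned long long nr_samples)
{
	return nr_samples ? (double)sum_iterations / nr_samples : 0.0;
}
```

With the five entries from the example, collapse_repeats() leaves three entries and sets removed[1] to 2, giving 2 + 1 = 3 iterations for (from = 0x300, to = 0x250).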
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linux-kernel@vger.kernel.org
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com
[ Renamed 'iter' to 'nr_loop_iter' for clarity ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-10-31 08:19:49 +07:00
						       NULL, be[i].from,
						       true, &be[i].flags,
2017-08-07 20:05:15 +07:00
						       &iter[i], 0);
perf callchain: Support handling complete branch stacks as histograms
Currently branch stacks can only be shown as edge histograms for
individual branches. I never found this display particularly useful.
This implements an alternative mode that creates histograms over
complete branch traces, instead of individual branches, similar to how
normal callgraphs are handled. This is done by putting it in front of
the normal callgraph and then using the normal callgraph histogram
infrastructure to unify them.
This way, in complex functions we can understand the control flow that
led to a particular sample, and may even see some control flow in the
caller for short functions.
Example (simplified; for such simple code this is usually not needed).
Please run this after the whole patchkit is in: at this point in the
patch order there is no --branch-history yet, as it is added in a later
patch:
tcall.c:
volatile a = 10000, b = 100000, c;
__attribute__((noinline)) f2()
{
c = a / b;
}
__attribute__((noinline)) f1()
{
f2();
f2();
}
main()
{
int i;
for (i = 0; i < 1000000; i++)
f1();
}
% perf record -b -g ./tsrc/tcall
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.044 MB perf.data (~1923 samples) ]
% perf report --no-children --branch-history
...
54.91% tcall.c:6 [.] f2 tcall
|
|--65.53%-- f2 tcall.c:5
| |
| |--70.83%-- f1 tcall.c:11
| | f1 tcall.c:10
| | main tcall.c:18
| | main tcall.c:18
| | main tcall.c:17
| | main tcall.c:17
| | f1 tcall.c:13
| | f1 tcall.c:13
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:12
| | f1 tcall.c:12
| | f2 tcall.c:7
| | f2 tcall.c:5
| | f1 tcall.c:11
| |
| --29.17%-- f1 tcall.c:12
| f1 tcall.c:12
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:11
| f1 tcall.c:10
| main tcall.c:18
| main tcall.c:18
| main tcall.c:17
| main tcall.c:17
| f1 tcall.c:13
| f1 tcall.c:13
| f2 tcall.c:7
| f2 tcall.c:5
| f1 tcall.c:12
The default output is unchanged.
This is only implemented in perf report, no change to record or anywhere
else.
This adds the basic code to report:
- add a new "branch" option to the -g option parser to enable this mode
- when the flag is set include the LBR into the callstack in machine.c.
The rest of the history code is unchanged and doesn't know the
difference between LBR entry and normal call entry.
- detect overlaps with the callchain
- remove small loop duplicates in the LBR
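The core idea, putting the LBR entries in front of the normal callchain so the existing callchain histogram code can unify them, can be sketched in C as below. This is a hypothetical illustration, not the code in machine.c: the helper name and types are invented, the per-entry order (branch target, then branch source) is an assumption, and the overlap detection and loop removal steps listed above are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LBR record. */
struct lbr_entry {
	uint64_t from, to;
};

/*
 * Build one combined chain: for each LBR entry emit the branch target
 * and then the branch source, followed by the regular callchain.
 * Writes at most cap addresses to out and returns how many were written.
 */
static size_t merge_branch_history(const struct lbr_entry *lbr, size_t nr_lbr,
				   const uint64_t *chain, size_t nr_chain,
				   uint64_t *out, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; i < nr_lbr && n + 2 <= cap; i++) {
		out[n++] = lbr[i].to;
		out[n++] = lbr[i].from;
	}
	for (size_t i = 0; i < nr_chain && n < cap; i++)
		out[n++] = chain[i];
	return n;
}
```

Feeding such combined chains to the normal callchain histogram code is what groups identical complete traces together, instead of counting each branch edge on its own.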
Current limitations:
- The LBR flags (mispredict etc.) are not shown in the history
and LBR entries have no special marker.
- It would be nice if annotate marked the LBR entries somehow
(e.g. with arrows)
v2: Various fixes.
v3: Merge further patches into this one. Fix white space.
v4: Improve manpage. Address review feedback.
v5: Rename functions. Better error message without -g. Fix crash without
-b.
v6: Rebase
v7: Rebase. Use NO_ENTRY in memset.
v8: Port to latest tip. Move add_callchain_ip to separate
patch. Skip initial entries in callchain. Minor cleanups.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1415844328-4884-3-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2014-11-13 09:05:20 +07:00
			if (err == -EINVAL)
				break;
			if (err)
				return err;
		}
perf report: Make --branch-history work without callgraphs(-g) option in perf record
perf record -b -g <command>
perf report --branch-history
This merges the LBRs with the callgraphs.
However, it would be nice if it also worked without callgraphs (-g) set in
perf record, so that only the LBRs are displayed. But currently perf
report fails with an error in this case. For example,
perf record -b <command>
perf report --branch-history
Error:
Selected -g or --branch-history but no callchain data. Did
you call 'perf record' without -g?
This patch displays only the LBRs when callgraphs (-g) are not enabled
in perf record.
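The new behaviour boils down to failing only when both kinds of data are missing. A hypothetical sketch of that check (the function name and boolean parameters are illustrative, not perf's code; the message text is the v2 error shown in the change log):

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical check: --branch-history can proceed when the data file has
 * callchains (-g) or branch stacks (-b); only when both are missing is the
 * error message printed. Returns 0 on success, -1 on error.
 */
static int check_branch_history(bool has_callchain, bool has_branch_stack,
				FILE *err)
{
	if (has_callchain || has_branch_stack)
		return 0;

	fputs("Selected -g or --branch-history.\n"
	      "But no callchain or branch data.\n"
	      "Did you call 'perf record' without -g or -b?\n", err);
	return -1;
}
```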
Change log:
v2: According to Milian Wolff's comment, change the obsolete error
message. Now the error message is:
┌─Error:─────────────────────────────────────┐
│Selected -g or --branch-history. │
│But no callchain or branch data. │
│Did you call 'perf record' without -g or -b?│
│ │
│ │
│Press any key... │
└────────────────────────────────────────────┘
In addition, the last parameter passed to hists__fprintf changes from
"|" to "||":
hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout,
symbol_conf.use_callchain || symbol_conf.show_branchflag_count);
Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1494240182-28899-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-05-08 17:43:02 +07:00
		if (chain_nr == 0)
			return 0;
perf callchain: Support handling complete branch stacks as histograms
2014-11-13 09:05:20 +07:00
		chain_nr -= nr;
	}

check_calls:
2019-11-14 21:25:38 +07:00
	if (chain && callchain_param.order != ORDER_CALLEE) {
perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}
When processing with 'perf report -g caller', which is the default, we
ended up reversing the callchain entries received from the kernel, but
simply reversing them throws away the information that says from which
point onwards the addresses are for userspace, kernel, guest kernel,
guest user, or hypervisor.
The idea is that if we are walking backwards, for each cluster of
non-cpumode entries we first have to scan backwards for the next cpumode
entry and use that for the cluster.
This seems silly and more expensive than it needs to be, but it is enough
for an initial fix.
The code here is really complicated because it is intimately intertwined
with the LBR and branch handling, as well as with this callchain order;
further fixes will be needed to properly take the cpumode into account
in those cases.
Another problem with ORDER_CALLER is that the NULL "0" IP that is at the
end of most callchains shows up at the top of the histogram because
every callchain contains it and with ORDER_CALLER it is the first entry.
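The backward scan described above can be sketched as follows. It is a minimal stand-alone model, assuming a callee-first array in which each PERF_CONTEXT_* marker applies to the addresses that follow it; only the kernel and user markers are modelled, and `walk_caller_order()` and `find_prev_mode()` are hypothetical names (perf's own helper is find_prev_cpumode(), whose exact behaviour is not reproduced here).

```c
#include <stddef.h>
#include <stdint.h>

/* Marker values copied from PERF_CONTEXT_* in linux/perf_event.h (subset). */
#define CTX_KERNEL ((uint64_t)-128)
#define CTX_USER   ((uint64_t)-512)
#define CTX_MAX    ((uint64_t)-4095)

enum cpumode { MODE_UNKNOWN, MODE_KERNEL, MODE_USER };

static enum cpumode marker_to_mode(uint64_t marker)
{
	if (marker == CTX_KERNEL)
		return MODE_KERNEL;
	if (marker == CTX_USER)
		return MODE_USER;
	return MODE_UNKNOWN;
}

/* Scan backwards from index j - 1 for the nearest context marker. */
static enum cpumode find_prev_mode(const uint64_t *ips, size_t j)
{
	while (j-- > 0)
		if (ips[j] >= CTX_MAX)
			return marker_to_mode(ips[j]);
	return MODE_UNKNOWN;
}

/*
 * Walk a callee-first chain in caller order (last entry first). A marker
 * applies to the entries after it in the array, so each time the reverse
 * walk crosses a marker it scans further back for the marker governing the
 * next cluster. Returns the number of real addresses written to out_ip.
 */
static size_t walk_caller_order(const uint64_t *ips, size_t nr,
				uint64_t *out_ip, enum cpumode *out_mode)
{
	enum cpumode mode = find_prev_mode(ips, nr);
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t j = nr - i - 1;

		if (ips[j] >= CTX_MAX) {
			mode = find_prev_mode(ips, j);
			continue;
		}
		out_ip[n] = ips[j];
		out_mode[n] = mode;
		n++;
	}
	return n;
}
```

For the chain { CTX_KERNEL, 0x1000, 0x1001, CTX_USER, 0x400, 0x401 }, the caller-order walk yields 0x401 and 0x400 as user addresses followed by 0x1001 and 0x1000 as kernel addresses, instead of attributing the reversed chain to the wrong mode.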
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Souvik Banerjee <souvik1997@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: stable@vger.kernel.org # 4.19
Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-10-30 22:12:26 +07:00
		err = find_prev_cpumode(chain, thread, cursor, parent, root_al,
					&cpumode, chain->nr - first_call);
		if (err)
			return (err < 0) ? err : 0;
	}
2016-05-19 21:14:15 +07:00
	for (i = first_call, nr_entries = 0;
2016-05-17 07:16:54 +07:00
	     i < chain_nr && nr_entries < max_stack; i++) {
2012-12-08 03:39:39 +07:00
		u64 ip;

		if (callchain_param.order == ORDER_CALLEE)
2014-06-25 22:49:03 +07:00
			j = i;
2012-12-08 03:39:39 +07:00
		else
2014-06-25 22:49:03 +07:00
			j = chain->nr - i - 1;

#ifdef HAVE_SKIP_CALLCHAIN_IDX
		if (j == skip_idx)
			continue;
#endif
		ip = chain->ips[j];
2016-05-19 21:14:15 +07:00
		if (ip < PERF_CONTEXT_MAX)
			++nr_entries;
perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc}
2018-10-30 22:12:26 +07:00
		else if (callchain_param.order != ORDER_CALLEE) {
			err = find_prev_cpumode(chain, thread, cursor, parent,
						root_al, &cpumode, j);
			if (err)
				return (err < 0) ? err : 0;
			continue;
		}
2016-05-17 07:16:54 +07:00
perf report: Add branch flag to callchain cursor node
2016-10-31 08:19:49 +07:00
		err = add_callchain_ip(thread, cursor, parent,
				       root_al, &cpumode, ip,
2017-08-07 20:05:15 +07:00
				       false, NULL, NULL, 0);
2012-12-08 03:39:39 +07:00

		if (err)
2014-12-02 22:06:53 +07:00
			return (err < 0) ? err : 0;
2012-12-08 03:39:39 +07:00
	}

	return 0;
}

2019-11-04 21:58:21 +07:00
static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip)
2017-10-10 03:32:59 +07:00
{
2019-11-04 21:58:21 +07:00
	struct symbol *sym = ms->sym;
	struct map *map = ms->map;
2017-10-10 03:32:59 +07:00
	struct inline_node *inline_node;
	struct inline_list *ilist;
	u64 addr;
perf report: Cache failed lookups of inlined frames
When no inlined frames could be found for a given address, we did not
store this information anywhere. That means we potentially do the costly
inliner lookup repeatedly for cases where we know it can never succeed.
This patch makes dso__parse_addr_inlines always return a valid
inline_node. It will be empty when no inliners are found. This enables
us to cache the empty list in the DSO, thereby improving the performance
when many addresses fail to find the inliners.
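The negative-caching idea can be shown with a small C sketch. It does not use perf's data structures (the DSO's inlined_nodes tree and dso__parse_addr_inlines()); the fixed-size array, the `lookup_inlines()` helper and the fake `parse_addr_inlines()` are hypothetical stand-ins chosen to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

#define INLINE_CACHE_CAP 64

/* nr_frames == 0 means "no inliners found"; that result is cached too. */
struct inline_result {
	uint64_t addr;
	int nr_frames;
};

struct inline_cache {
	struct inline_result entries[INLINE_CACHE_CAP];
	size_t nr;
	unsigned int expensive_lookups;	/* how often the slow path ran */
};

/* Hypothetical stand-in for the costly DWARF inliner lookup. */
static int parse_addr_inlines(struct inline_cache *c, uint64_t addr)
{
	c->expensive_lookups++;
	return (addr & 1) ? 2 : 0;	/* pretend odd addresses have 2 inlined frames */
}

static int lookup_inlines(struct inline_cache *c, uint64_t addr)
{
	for (size_t i = 0; i < c->nr; i++)
		if (c->entries[i].addr == addr)
			return c->entries[i].nr_frames;

	int nr_frames = parse_addr_inlines(c, addr);

	if (c->nr < INLINE_CACHE_CAP) {
		/* Store the result even when it is empty (negative caching). */
		c->entries[c->nr].addr = addr;
		c->entries[c->nr].nr_frames = nr_frames;
		c->nr++;
	}
	return nr_frames;
}
```

Looking up the same address twice runs the slow path only once, even when the answer is "no inlined frames", which is the effect the measurements below illustrate.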
For my trivial example, the performance impact is already quite
significant:
Before:
~~~~~
Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
594.804032 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% )
53 context-switches # 0.089 K/sec ( +- 4.09% )
0 cpu-migrations # 0.000 K/sec ( +-100.00% )
5,687 page-faults # 0.010 M/sec ( +- 0.02% )
2,300,918,213 cycles # 3.868 GHz ( +- 0.09% )
4,395,839,080 instructions # 1.91 insn per cycle ( +- 0.00% )
939,177,205 branches # 1578.969 M/sec ( +- 0.00% )
11,824,633 branch-misses # 1.26% of all branches ( +- 0.10% )
0.596246531 seconds time elapsed ( +- 0.07% )
~~~~~
After:
~~~~~
Performance counter stats for 'perf report --stdio --inline -g srcline -s srcline' (5 runs):
113.111405 task-clock (msec) # 0.990 CPUs utilized ( +- 0.89% )
29 context-switches # 0.255 K/sec ( +- 54.25% )
0 cpu-migrations # 0.000 K/sec
5,380 page-faults # 0.048 M/sec ( +- 0.01% )
432,378,779 cycles # 3.823 GHz ( +- 0.75% )
670,057,633 instructions # 1.55 insn per cycle ( +- 0.01% )
141,001,247 branches # 1246.570 M/sec ( +- 0.01% )
2,346,845 branch-misses # 1.66% of all branches ( +- 0.19% )
0.114222393 seconds time elapsed ( +- 1.19% )
~~~~~
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-3-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-19 18:38:33 +07:00
	int ret = 1;
2017-10-10 03:32:59 +07:00

	if (!symbol_conf.inline_name || !map || !sym)
perf report: Cache failed lookups of inlined frames
2017-10-19 18:38:33 +07:00
		return ret;
2017-10-10 03:32:59 +07:00

2018-09-26 20:52:06 +07:00
	addr = map__map_ip(map, ip);
	addr = map__rip_2objdump(map, addr);
2017-10-10 03:32:59 +07:00

	inline_node = inlines__tree_find(&map->dso->inlined_nodes, addr);
	if (!inline_node) {
		inline_node = dso__parse_addr_inlines(map->dso, addr, sym);
		if (!inline_node)
perf report: Cache failed lookups of inlined frames
2017-10-19 18:38:33 +07:00
			return ret;
2017-10-10 03:32:59 +07:00
		inlines__tree_insert(&map->dso->inlined_nodes, inline_node);
	}

	list_for_each_entry(ilist, &inline_node->val, list) {
2019-11-04 22:14:32 +07:00
		struct map_symbol ilist_ms = {
			.map = map,
			.sym = ilist->symbol,
		};
		ret = callchain_cursor_append(cursor, ip, &ilist_ms, false,
perf report: Cache failed lookups of inlined frames
2017-10-19 18:38:33 +07:00
					      NULL, 0, 0, 0, ilist->srcline);
2017-10-10 03:32:59 +07:00

		if (ret != 0)
			return ret;
	}
perf report: Cache failed lookups of inlined frames
2017-10-19 18:38:33 +07:00
	return ret;
2017-10-10 03:32:59 +07:00
}

2012-12-08 03:39:39 +07:00
static int unwind_entry(struct unwind_entry *entry, void *arg)
{
	struct callchain_cursor *cursor = arg;
2017-10-10 03:32:56 +07:00
	const char *srcline = NULL;
perf report: Don't try to map ip to invalid map
Fixes a crash when the report encounters an address that could not be
associated with an mmapped region:
#0 0x00005555557bdc4a in callchain_srcline (ip=<error reading variable: Cannot access memory at address 0x38>, sym=0x0, map=0x0) at util/machine.c:2329
#1 unwind_entry (entry=entry@entry=0x7fffffff9180, arg=arg@entry=0x7ffff5642498) at util/machine.c:2329
#2 0x00005555558370af in entry (arg=0x7ffff5642498, cb=0x5555557bdb50 <unwind_entry>, thread=<optimized out>, ip=18446744073709551615) at util/unwind-libunwind-local.c:586
#3 get_entries (ui=ui@entry=0x7fffffff9620, cb=0x5555557bdb50 <unwind_entry>, arg=0x7ffff5642498, max_stack=<optimized out>) at util/unwind-libunwind-local.c:703
#4 0x0000555555837192 in _unwind__get_entries (cb=<optimized out>, arg=<optimized out>, thread=<optimized out>, data=<optimized out>, max_stack=<optimized out>) at util/unwind-libunwind-local.c:725
#5 0x00005555557c310f in thread__resolve_callchain_unwind (max_stack=127, sample=0x7fffffff9830, evsel=0x555555c7b3b0, cursor=0x7ffff5642498, thread=0x555555c7f6f0) at util/machine.c:2351
#6 thread__resolve_callchain (thread=0x555555c7f6f0, cursor=0x7ffff5642498, evsel=0x555555c7b3b0, sample=0x7fffffff9830, parent=0x7fffffff97b8, root_al=0x7fffffff9750, max_stack=127) at util/machine.c:2378
#7 0x00005555557ba4ee in sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fffffff97b8, evsel=<optimized out>, al=al@entry=0x7fffffff9750,
max_stack=<optimized out>) at util/callchain.c:1085
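The fix amounts to guarding the address translation against a missing map. A minimal sketch, assuming a simplified map that only records its load address and file offset; `struct map_sketch` and `entry_addr()` are hypothetical, and the ip - start + pgoff arithmetic is an assumed simplification of what map__map_ip() computes:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified mapping: a region loaded at start, backed by the file at pgoff. */
struct map_sketch {
	uint64_t start;
	uint64_t pgoff;
};

/*
 * Translate a sampled ip to a file-relative address only when a map is
 * known; for unresolved entries keep the raw ip instead of dereferencing
 * a NULL map.
 */
static uint64_t entry_addr(const struct map_sketch *map, uint64_t ip)
{
	if (map == NULL)
		return ip;
	return ip - map->start + map->pgoff;
}
```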
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Tested-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Fixes: 2a9d5050dc84 ("perf script: Show correct offsets for DWARF-based unwinding")
Link: http://lkml.kernel.org/r/20180926135207.30263-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-09-26 20:52:05 +07:00
	u64 addr = entry->ip;
2015-11-26 14:08:20 +07:00

2019-11-04 21:58:21 +07:00
	if (symbol_conf.hide_unresolved && entry->ms.sym == NULL)
2015-11-26 14:08:20 +07:00
		return 0;
2017-10-10 03:32:56 +07:00

2019-11-04 21:58:21 +07:00
	if (append_inlines(cursor, &entry->ms, entry->ip) == 0)
2017-10-10 03:32:59 +07:00
		return 0;
perf script: Show correct offsets for DWARF-based unwinding
When perf/data is recorded with the dwarf call-graph option, the
callchain shown by 'perf script' still shows the binary offsets of the
userspace symbols instead of their virtual addresses. Since the symbol
offset calculation is based on using the virtual address as the ip, we see
incorrect offsets as well.
Using virtual addresses affects the ability to find the line number
in the corresponding source file to which an address maps, as
described in commit 67540759151a ("perf unwind: Use
addr_location::addr instead of ip for entries").
This has also been addressed by temporarily converting the virtual
address to the corresponding binary offset so that it can be mapped
to the source line number correctly.
This is a follow-up for commit 19610184693c ("perf script: Show
virtual addresses instead of offsets").
This can be verified on a powerpc64le system running Fedora 27 as
shown below:
# perf probe -x /usr/lib64/libc-2.26.so -a inet_pton
# perf record -e probe_libc:inet_pton --call-graph=dwarf ping -6 -c 1 ::1
Before:
# perf report --stdio --no-children -s sym,srcline -g address
# Samples: 1 of event 'probe_libc:inet_pton'
# Event count (approx.): 1
#
# Overhead Symbol Source:Line
# ........ .................... ...........
#
100.00% [.] __GI___inet_pton inet_pton.c
|
---gaih_inet getaddrinfo.c:537 (inlined)
__GI_getaddrinfo getaddrinfo.c:2304 (inlined)
main ping.c:519
generic_start_main libc-start.c:308 (inlined)
__libc_start_main libc-start.c:102
...
# perf script -F comm,ip,sym,symoff,srcline,dso
ping
15af28 __GI___inet_pton+0xffff000099160008 (/usr/lib64/libc-2.26.so)
libc-2.26.so[ffff80004ca0af28]
10fa53 gaih_inet+0xffff000099160f43
libc-2.26.so[ffff80004c9bfa53] (inlined)
1105b3 __GI_getaddrinfo+0xffff000099160163
libc-2.26.so[ffff80004c9c05b3] (inlined)
2d6f main+0xfffffffd9f1003df (/usr/bin/ping)
ping[fffffffecf882d6f]
2369f generic_start_main+0xffff00009916013f
libc-2.26.so[ffff80004c8d369f] (inlined)
23897 __libc_start_main+0xffff0000991600b7 (/usr/lib64/libc-2.26.so)
libc-2.26.so[ffff80004c8d3897]
After:
# perf report --stdio --no-children -s sym,srcline -g address
# Samples: 1 of event 'probe_libc:inet_pton'
# Event count (approx.): 1
#
# Overhead Symbol Source:Line
# ........ .................... ...........
#
100.00% [.] __GI___inet_pton inet_pton.c
|
---gaih_inet.constprop.7 getaddrinfo.c:537
getaddrinfo getaddrinfo.c:2304
main ping.c:519
generic_start_main.isra.0 libc-start.c:308
__libc_start_main libc-start.c:102
...
# perf script -F comm,ip,sym,symoff,srcline,dso
ping
7fffb38aaf28 __GI___inet_pton+0x8 (/usr/lib64/libc-2.26.so)
inet_pton.c:68
7fffb385fa53 gaih_inet.constprop.7+0xf43 (/usr/lib64/libc-2.26.so)
getaddrinfo.c:537
7fffb38605b3 getaddrinfo+0x163 (/usr/lib64/libc-2.26.so)
getaddrinfo.c:2304
130782d6f main+0x3df (/usr/bin/ping)
ping.c:519
7fffb377369f generic_start_main.isra.0+0x13f (/usr/lib64/libc-2.26.so)
libc-start.c:308
7fffb3773897 __libc_start_main+0xb7 (/usr/lib64/libc-2.26.so)
libc-start.c:102
Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Fixes: 67540759151a ("perf unwind: Use addr_location::addr instead of ip for entries")
Link: http://lkml.kernel.org/r/20180703120555.32971-1-sandipan@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-07-03 19:05:55 +07:00
	/*
	 * Convert entry->ip from a virtual address to an offset in
	 * its corresponding binary.
	 */
2019-11-04 21:58:21 +07:00
	if (entry->ms.map)
		addr = map__map_ip(entry->ms.map, entry->ip);
2018-07-03 19:05:55 +07:00

2019-11-04 22:14:32 +07:00
	srcline = callchain_srcline(&entry->ms, addr);
	return callchain_cursor_append(cursor, entry->ip, &entry->ms,
2017-10-10 03:32:56 +07:00
				       false, NULL, 0, 0, 0, srcline);
2012-12-08 03:39:39 +07:00
}

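The following is a rough illustration of the address arithmetic behind the conversion in unwind_entry() above. The sketch assumes a file-backed mapping described only by its start address, end address and file offset (pgoff), and computes a binary offset as ip - start + pgoff. The struct and function names are hypothetical and are not perf's struct map API. The sketch also shows why a symbol offset must be computed from two addresses in the same address space.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of a file-backed mapping (not perf's
 * struct map): just the fields the address arithmetic needs.
 */
struct simple_map {
	uint64_t start;	/* first virtual address covered by the mapping */
	uint64_t end;	/* one past the last covered virtual address */
	uint64_t pgoff;	/* offset in the binary that 'start' maps to */
};

/* Virtual address -> offset inside the mapped binary. */
static uint64_t simple_map__file_offset(const struct simple_map *map,
					uint64_t ip)
{
	assert(ip >= map->start && ip < map->end);
	return ip - map->start + map->pgoff;
}

/*
 * Offset of 'ip' from the start of its symbol. Both arguments must be in
 * the same address space; mixing a binary offset with a virtual address
 * makes the unsigned subtraction wrap around.
 */
static uint64_t symbol_offset(uint64_t ip, uint64_t sym_start)
{
	return ip - sym_start;
}
```

Subtracting a symbol's virtual start address from a binary offset wraps around to a very large unsigned value. This resembles the bogus +0xffff... offsets in the "Before" output of the commit message above.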
perf callchain: Fix incorrect ordering of entries
The existing implementation of thread__resolve_callchain, under certain
circumstances, can assemble callchain entries in the incorrect order.
The callchain entries are resolved incorrectly for a sample when all of
the following conditions are met:
1. callchain_param.order is set to ORDER_CALLER
2. thread__resolve_callchain_sample is able to resolve callchain entries
for the sample.
3. unwind__get_entries is also able to resolve callchain entries for the
sample.
The fix is accomplished by reversing the order in which
thread__resolve_callchain_sample and unwind__get_entries are called when
callchain_param.order is set to ORDER_CALLER.
Unwind specific code from thread__resolve_callchain is also moved into a
new static function to improve readability of the fix.
How to Reproduce the Existing Bug:
Modifying perf script to print call trees in the opposite order or
applying the remaining patches from this series and comparing the
results output from export-to-postgresql.py are the easiest ways to see
the bug; however, it can still be seen in current builds using perf
report.
Here is how I can reproduce the bug using perf report:
# perf record --call-graph=dwarf stress -c 1 -t 5
When I run this command:
# perf report --call-graph=flat,0,0,callee
This callchain, containing kernel (handle_irq_event, etc) and userspace
samples (__libc_start_main, etc) is contained in the output, which looks
correct (callee order):
gen8_irq_handler
handle_irq_event_percpu
handle_irq_event
handle_edge_irq
handle_irq
do_IRQ
ret_from_intr
__random
rand
0x558f2a04dded
0x558f2a04c774
__libc_start_main
0x558f2a04dcd9
Now run this command using caller order:
# perf report --call-graph=flat,0,0,caller
It is expected to see the exact reverse of the above when using caller
order (with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the
bottom) in the output, but it is nowhere to be found.
Instead, you see this:
ret_from_intr
do_IRQ
handle_irq
handle_edge_irq
handle_irq_event
handle_irq_event_percpu
gen8_irq_handler
0x558f2a04dcd9
__libc_start_main
0x558f2a04c774
0x558f2a04dded
rand
__random
Notice how the kernel symbols and the user space symbols are each
reversed internally, but the kernel symbols still appear above the user
space symbols.
If this patch is applied and perf script is re-run, you will see the
expected output (with "0x558f2a04dcd9" at the top and "gen8_irq_handler"
at the bottom):
0x558f2a04dcd9
__libc_start_main
0x558f2a04c774
0x558f2a04dded
rand
__random
ret_from_intr
do_IRQ
handle_irq
handle_edge_irq
handle_irq_event
handle_irq_event_percpu
gen8_irq_handler
Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1461831551-12213-2-git-send-email-cphlipot0@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-04-28 15:19:06 +07:00
static int thread__resolve_callchain_unwind(struct thread *thread,
					    struct callchain_cursor *cursor,
2019-07-21 18:23:51 +07:00
					    struct evsel *evsel,
2016-04-28 15:19:06 +07:00
					    struct perf_sample *sample,
					    int max_stack)
2012-12-08 03:39:39 +07:00
{
	/* Can we do dwarf post unwind? */
2019-07-21 18:24:29 +07:00
	if (!((evsel->core.attr.sample_type & PERF_SAMPLE_REGS_USER) &&
	      (evsel->core.attr.sample_type & PERF_SAMPLE_STACK_USER)))
2012-12-08 03:39:39 +07:00
		return 0;

	/* Bail out if nothing was captured. */
	if ((!sample->user_regs.regs) ||
	    (!sample->user_stack.size))
		return 0;

2016-04-15 00:48:07 +07:00
	return unwind__get_entries(unwind_entry, cursor,
2014-01-07 19:47:25 +07:00
				   thread, sample, max_stack);
2016-04-28 15:19:06 +07:00
}
2012-12-08 03:39:39 +07:00

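The two early returns in thread__resolve_callchain_unwind() can be modelled in isolation. DWARF post-unwinding needs both the user registers and a copy of the user stack. These must first be requested in the event's sample_type and then actually be present in the individual sample. In this sketch, the bit values are copied from PERF_SAMPLE_REGS_USER (1 << 12) and PERF_SAMPLE_STACK_USER (1 << 13) in <linux/perf_event.h>. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Same values as PERF_SAMPLE_REGS_USER / PERF_SAMPLE_STACK_USER in
 * <linux/perf_event.h>, redefined so the sketch needs no kernel headers.
 */
#define SKETCH_SAMPLE_REGS_USER  (1ULL << 12)
#define SKETCH_SAMPLE_STACK_USER (1ULL << 13)

/* Hypothetical subset of a sample: only what the second check looks at. */
struct sketch_sample {
	const uint64_t *user_regs;	/* NULL if no registers were captured */
	size_t user_stack_size;		/* 0 if no stack was captured */
};

/* Both bits must have been requested at record time. */
static bool can_dwarf_unwind(uint64_t sample_type)
{
	const uint64_t need = SKETCH_SAMPLE_REGS_USER | SKETCH_SAMPLE_STACK_USER;

	return (sample_type & need) == need;
}

/* Even when requested, an individual sample may carry no data. */
static bool sample_has_unwind_data(const struct sketch_sample *s)
{
	return s->user_regs != NULL && s->user_stack_size != 0;
}
```

Writing the first check as (sample_type & need) == need is equivalent to testing each bit separately, as the original code does.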
2016-04-28 15:19:06 +07:00
int thread__resolve_callchain(struct thread *thread,
			      struct callchain_cursor *cursor,
2019-07-21 18:23:51 +07:00
			      struct evsel *evsel,
2016-04-28 15:19:06 +07:00
			      struct perf_sample *sample,
			      struct symbol **parent,
			      struct addr_location *root_al,
			      int max_stack)
{
	int ret = 0;

2017-08-06 21:39:39 +07:00
	callchain_cursor_reset(cursor);
2016-04-28 15:19:06 +07:00

	if (callchain_param.order == ORDER_CALLEE) {
		ret = thread__resolve_callchain_sample(thread, cursor,
						       evsel, sample,
						       parent, root_al,
						       max_stack);
		if (ret)
			return ret;
		ret = thread__resolve_callchain_unwind(thread, cursor,
						       evsel, sample,
						       max_stack);
	} else {
		ret = thread__resolve_callchain_unwind(thread, cursor,
						       evsel, sample,
						       max_stack);
		if (ret)
			return ret;
		ret = thread__resolve_callchain_sample(thread, cursor,
						       evsel, sample,
						       parent, root_al,
						       max_stack);
	}

	return ret;
2012-12-08 03:39:39 +07:00
}
2013-09-29 02:12:58 +07:00

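The ordering fix in thread__resolve_callchain() can be reproduced with a toy model. This sketch assumes, as in the reproduction in the commit message above, that the sample's own callchain supplies only the kernel frames and DWARF unwinding supplies only the user frames. Each set of frames is stored innermost first. In caller order, each source emits its own frames reversed, so the two sources must also be appended in the opposite order. All names are hypothetical stand-ins for thread__resolve_callchain_sample() and thread__resolve_callchain_unwind().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_order { SKETCH_ORDER_CALLEE, SKETCH_ORDER_CALLER };

#define SKETCH_MAX_FRAMES 32

/* Hypothetical flat cursor: frame names in display order. */
struct sketch_cursor {
	const char *frames[SKETCH_MAX_FRAMES];
	size_t nr;
};

/*
 * Append one source's frames. 'frames' is innermost-first; in caller
 * order the source emits them outermost-first, like each resolver does.
 */
static void append_source(struct sketch_cursor *c, const char *const *frames,
			  size_t n, enum sketch_order order)
{
	for (size_t i = 0; i < n && c->nr < SKETCH_MAX_FRAMES; i++) {
		size_t idx = (order == SKETCH_ORDER_CALLEE) ? i : n - 1 - i;

		c->frames[c->nr++] = frames[idx];
	}
}

/*
 * Kernel frames are the innermost part of the whole chain and user frames
 * the outermost, so in caller order the user source must go first.
 */
static void resolve(struct sketch_cursor *c,
		    const char *const *kernel, size_t nk,
		    const char *const *user, size_t nu,
		    enum sketch_order order)
{
	c->nr = 0;
	if (order == SKETCH_ORDER_CALLEE) {
		append_source(c, kernel, nk, order);
		append_source(c, user, nu, order);
	} else {
		append_source(c, user, nu, order);
		append_source(c, kernel, nk, order);
	}
}
```

Swapping only the per-source direction without swapping the call order yields exactly the broken output shown in the commit message: each half is reversed, but the kernel half stays on top.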
int machine__for_each_thread(struct machine *machine,
			     int (*fn)(struct thread *thread, void *p),
			     void *priv)
{
2017-09-11 09:23:14 +07:00
	struct threads *threads;
2013-09-29 02:12:58 +07:00
	struct rb_node *nd;
	struct thread *thread;
	int rc = 0;
2017-09-11 09:23:14 +07:00
	int i;
2013-09-29 02:12:58 +07:00

2017-09-11 09:23:14 +07:00
	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
		threads = &machine->threads[i];
2018-12-07 02:18:14 +07:00
		for (nd = rb_first_cached(&threads->entries); nd;
		     nd = rb_next(nd)) {
2017-09-11 09:23:14 +07:00
			thread = rb_entry(nd, struct thread, rb_node);
			rc = fn(thread, priv);
			if (rc != 0)
				return rc;
		}
2013-09-29 02:12:58 +07:00

2017-09-11 09:23:14 +07:00
		list_for_each_entry(thread, &threads->dead, node) {
			rc = fn(thread, priv);
			if (rc != 0)
				return rc;
		}
2013-09-29 02:12:58 +07:00
	}

	return rc;
}
2013-11-11 21:28:02 +07:00

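machine__for_each_thread() is a visitor with early termination. It walks every bucket of the threads table, first the live threads kept in the rb-tree and then the dead list, and stops as soon as the callback returns non-zero. The sketch below keeps that control flow but, as a simplifying assumption, replaces the rb-tree and list_head with singly linked lists. The types and the counting callback are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_TABLE_SIZE 4

/* Hypothetical stand-ins; perf uses rb-trees and list_heads instead. */
struct sketch_thread {
	int tid;
	struct sketch_thread *next;	/* singly linked for brevity */
};

struct sketch_bucket {
	struct sketch_thread *live;	/* threads still in the lookup tree */
	struct sketch_thread *dead;	/* exited threads still referenced */
};

/*
 * Visit every thread, live ones first and then dead ones, bucket by
 * bucket. A non-zero return from fn stops the walk and is propagated.
 */
static int sketch_for_each_thread(struct sketch_bucket *table,
				  int (*fn)(struct sketch_thread *t, void *p),
				  void *priv)
{
	for (size_t i = 0; i < SKETCH_TABLE_SIZE; i++) {
		struct sketch_thread *t;
		int rc;

		for (t = table[i].live; t; t = t->next) {
			rc = fn(t, priv);
			if (rc != 0)
				return rc;
		}
		for (t = table[i].dead; t; t = t->next) {
			rc = fn(t, priv);
			if (rc != 0)
				return rc;
		}
	}
	return 0;
}

/* Example callback: count threads, stop with 1 once 'limit' is reached. */
struct count_ctx {
	int seen;
	int limit;
};

static int count_until(struct sketch_thread *t, void *p)
{
	struct count_ctx *ctx = p;

	(void)t;
	return ++ctx->seen >= ctx->limit ? 1 : 0;
}
```

machines__for_each_thread(), which follows, layers the same contract on top. It visits the host machine first and then each guest, and propagates the first non-zero return.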
2015-05-29 20:33:30 +07:00
int machines__for_each_thread(struct machines *machines,
			      int (*fn)(struct thread *thread, void *p),
			      void *priv)
{
	struct rb_node *nd;
	int rc = 0;

	rc = machine__for_each_thread(&machines->host, fn, priv);
	if (rc != 0)
		return rc;

2018-12-07 02:18:14 +07:00
	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
2015-05-29 20:33:30 +07:00
		struct machine *machine = rb_entry(nd, struct machine, rb_node);

		rc = machine__for_each_thread(machine, fn, priv);
		if (rc != 0)
			return rc;
	}
	return rc;
}

2014-07-22 20:17:25 +07:00
pid_t machine__get_current_tid(struct machine *machine, int cpu)
{
2019-08-28 04:43:50 +07:00
	int nr_cpus = min(machine->env->nr_cpus_online, MAX_NR_CPUS);

	if (cpu < 0 || cpu >= nr_cpus || !machine->current_tid)
2014-07-22 20:17:25 +07:00
		return -1;

	return machine->current_tid[cpu];
}

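machine__get_current_tid() above and machine__set_current_tid() below treat current_tid as a lazily allocated per-CPU table in which -1 means "no tid known". The following sketch reproduces that contract without the rest of struct machine. The tid_table type and its functions are hypothetical, and the thread lookup done by machine__findnew_thread() is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical container for a per-CPU "current tid" table. */
struct tid_table {
	pid_t *current_tid;	/* NULL until the first successful set */
	int nr_cpus;
};

/* -1 means unknown: CPU out of range, or table not allocated yet. */
static pid_t tid_table__get(const struct tid_table *t, int cpu)
{
	if (cpu < 0 || cpu >= t->nr_cpus || !t->current_tid)
		return -1;
	return t->current_tid[cpu];
}

static int tid_table__set(struct tid_table *t, int cpu, pid_t tid)
{
	assert(t->nr_cpus > 0);

	if (cpu < 0)
		return -EINVAL;

	if (!t->current_tid) {
		/* Allocate on first use and mark every CPU as unknown. */
		t->current_tid = calloc((size_t)t->nr_cpus, sizeof(pid_t));
		if (!t->current_tid)
			return -ENOMEM;
		for (int i = 0; i < t->nr_cpus; i++)
			t->current_tid[i] = -1;
	}

	if (cpu >= t->nr_cpus)
		return -EINVAL;

	t->current_tid[cpu] = tid;
	return 0;
}
```

As in the original, the table is allocated before the upper bound is checked, so an out-of-range set still leaves an initialized table behind.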
int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
			     pid_t tid)
{
	struct thread *thread;
2019-08-28 04:43:50 +07:00
	int nr_cpus = min(machine->env->nr_cpus_online, MAX_NR_CPUS);
2014-07-22 20:17:25 +07:00

	if (cpu < 0)
		return -EINVAL;

	if (!machine->current_tid) {
		int i;

2019-08-28 04:43:50 +07:00
		machine->current_tid = calloc(nr_cpus, sizeof(pid_t));
2014-07-22 20:17:25 +07:00
		if (!machine->current_tid)
			return -ENOMEM;
2019-08-28 04:43:50 +07:00
		for (i = 0; i < nr_cpus; i++)
2014-07-22 20:17:25 +07:00
			machine->current_tid[i] = -1;
	}

2019-08-28 04:43:50 +07:00
	if (cpu >= nr_cpus) {
2014-07-22 20:17:25 +07:00
		pr_err("Requested CPU %d too large. ", cpu);
		pr_err("Consider raising MAX_NR_CPUS\n");
		return -EINVAL;
	}

	machine->current_tid[cpu] = tid;

	thread = machine__findnew_thread(machine, pid, tid);
	if (!thread)
		return -ENOMEM;

	thread->cpu = cpu;
perf machine: Protect the machine->threads with a rwlock
In addition to using refcounts for the struct thread lifetime
management, we need to protect access to machine->threads from
concurrent access.
That happens in 'perf top', where a thread processes events, inserting
and deleting entries from that rb_tree while another thread decays
hist_entries, which end up dropping references and ultimately deleting
threads from the rb_tree, releasing each thread's resources once no
further hist_entry (or other data structure, like in 'perf sched')
references it.
So the rule is the same as for refcounts + protected trees in the kernel:
get the tree lock, find the object, bump the refcount, drop the tree lock,
return, use the object, drop the refcount if no more use of it is needed,
keep it if storing it in some other data structure, and drop it when
releasing that data structure.
I.e. pair "t = machine__find(new)_thread()" with a "thread__put(t)", and
"perf_event__preprocess_sample(&al)" with "addr_location__put(&al)".
The addr_location__put() one is because as we return references to
several data structures, we may end up adding more reference counting
for the other data structures and then we'll drop it at
addr_location__put() time.
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-bs9rt4n0jw3hi9f3zxyy3xln@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
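The locking rule described in this commit message (take the tree lock, find the object, bump its refcount, drop the lock, use the object, then put it) can be sketched in plain C. This is a minimal illustration under stated assumptions, not perf's implementation: `struct obj`, `struct collection`, `obj__get()`, `obj__put()`, `collection__add()`, `collection__find()` and `collection__remove()` are hypothetical names, a singly linked list stands in for the rb_tree, and POSIX `pthread_rwlock_t` plus C11 atomics stand in for perf's own locking and refcount helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted object (stands in for perf's struct thread). */
struct obj {
	int id;
	atomic_int refcnt;
	struct obj *next;
};

/* Hypothetical collection: a list guarded by a rwlock (perf uses an rb_tree). */
struct collection {
	pthread_rwlock_t lock;
	struct obj *head;
};

static struct obj *obj__get(struct obj *o)
{
	if (o)
		atomic_fetch_add(&o->refcnt, 1);
	return o;
}

/* Drop one reference; the last one frees the object. */
static void obj__put(struct obj *o)
{
	int old;

	if (!o)
		return;
	old = atomic_fetch_sub(&o->refcnt, 1);
	assert(old > 0);
	if (old == 1)
		free(o);
}

/* Insert a new object; the collection owns its initial reference. */
static int collection__add(struct collection *c, int id)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return -1;
	o->id = id;
	atomic_init(&o->refcnt, 1);
	pthread_rwlock_wrlock(&c->lock);
	o->next = c->head;
	c->head = o;
	pthread_rwlock_unlock(&c->lock);
	return 0;
}

/* Lock, find, bump the refcount, unlock; the caller must obj__put() it. */
static struct obj *collection__find(struct collection *c, int id)
{
	struct obj *o, *found = NULL;

	pthread_rwlock_rdlock(&c->lock);
	for (o = c->head; o; o = o->next) {
		if (o->id == id) {
			found = obj__get(o);
			break;
		}
	}
	pthread_rwlock_unlock(&c->lock);
	return found;
}

/* Unlink under the write lock, then drop the collection's reference. */
static void collection__remove(struct collection *c, int id)
{
	struct obj **pp, *victim = NULL;

	pthread_rwlock_wrlock(&c->lock);
	for (pp = &c->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			victim = *pp;
			*pp = victim->next;
			break;
		}
	}
	pthread_rwlock_unlock(&c->lock);
	obj__put(victim);
}
```

The collection owns one reference for as long as an object is linked, so a reader holding the read lock never bumps a count that has already reached zero. `collection__remove()` unlinks under the write lock and only then drops that reference, so a caller that still holds its own reference keeps the object alive until its matching `obj__put()`.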
2015-04-07 06:43:22 +07:00
	thread__put(thread);
2014-07-22 20:17:25 +07:00

	return 0;
}
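The per-CPU bookkeeping in machine__set_current_tid() (lazily allocate one slot per CPU, mark every slot as -1, reject out-of-range CPU numbers, then record the tid) can be isolated into a small sketch. `struct tid_table`, `tid_table__set()` and `tid_table__get()` are hypothetical names; the thread lookup and refcounting of the original are deliberately left out, and the `cpu < 0` check is an assumption of the sketch rather than something shown in this excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for the per-CPU part of struct machine. */
struct tid_table {
	pid_t *current_tid;	/* allocated on first use, one slot per CPU */
	int nr_cpus;		/* number of slots to allocate */
};

/* Record that `tid` is now running on `cpu`; returns 0 or a negative errno. */
static int tid_table__set(struct tid_table *t, int cpu, pid_t tid)
{
	assert(t != NULL);
	if (cpu < 0)
		return -EINVAL;

	if (!t->current_tid) {
		t->current_tid = calloc(t->nr_cpus, sizeof(pid_t));
		if (!t->current_tid)
			return -ENOMEM;
		/* -1 means "no tid recorded for this CPU yet". */
		for (int i = 0; i < t->nr_cpus; i++)
			t->current_tid[i] = -1;
	}

	if (cpu >= t->nr_cpus) {
		fprintf(stderr, "Requested CPU %d too large.\n", cpu);
		return -EINVAL;
	}

	t->current_tid[cpu] = tid;
	return 0;
}

/* Return the tid last recorded for `cpu`, or -1 if none is known. */
static pid_t tid_table__get(const struct tid_table *t, int cpu)
{
	if (!t->current_tid || cpu < 0 || cpu >= t->nr_cpus)
		return -1;
	return t->current_tid[cpu];
}
```

Initializing every slot to -1 lets a reader distinguish "no task seen yet on this CPU" from a real tid, which calloc()'s zero fill alone would not.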
2014-08-16 02:08:39 +07:00

2018-05-17 16:21:53 +07:00
/*
 * Compares the raw arch string. N.B. see instead perf_env__arch() if a
 * normalized arch is needed.
 */
bool machine__is(struct machine *machine, const char *arch)
{
	return machine && !strcmp(perf_env__raw_arch(machine->env), arch);
}
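machine__is() compares the raw architecture string exactly, which is what lets machine__get_kernel_start() single out "x86_64": a normalized name would lump 32-bit and 64-bit x86 together. The sketch contrasts the two kinds of comparison. `normalize_arch()`, `arch_is_raw()` and `arch_is_normalized()` are hypothetical; the normalizer covers only a few spellings and is not claimed to match perf_env__arch() exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical normalizer: folds a few `uname -m` spellings into a family
 * name. Illustrative only; not a copy of perf_env__arch().
 */
static const char *normalize_arch(const char *raw)
{
	assert(raw != NULL);
	if (!strcmp(raw, "x86_64"))
		return "x86";
	if (strlen(raw) == 4 && raw[0] == 'i' && !strcmp(raw + 2, "86"))
		return "x86";		/* i386 ... i686 */
	if (!strcmp(raw, "aarch64") || !strcmp(raw, "arm64"))
		return "arm64";
	return raw;
}

/* Exact match on the raw string, the way machine__is() compares. */
static bool arch_is_raw(const char *raw, const char *arch)
{
	return raw && !strcmp(raw, arch);
}

/* Match after normalization: "i686" and "x86_64" both count as "x86". */
static bool arch_is_normalized(const char *raw, const char *arch)
{
	return raw && !strcmp(normalize_arch(raw), arch);
}
```

With these helpers, `arch_is_raw("i686", "x86_64")` is false while `arch_is_normalized("i686", "x86")` is true, which is exactly the distinction the comment above machine__is() warns about.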

2018-05-22 17:54:32 +07:00
int machine__nr_cpus_avail(struct machine *machine)
{
	return machine ? perf_env__nr_cpus_avail(machine->env) : 0;
}

2014-08-16 02:08:39 +07:00
int machine__get_kernel_start(struct machine *machine)
{
2015-09-30 21:54:04 +07:00
	struct map *map = machine__kernel_map(machine);
2014-08-16 02:08:39 +07:00
	int err = 0;

	/*
	 * The only addresses above 2^63 are kernel addresses of a 64-bit
	 * kernel. Note that addresses are unsigned so that on a 32-bit system
	 * all addresses including kernel addresses are less than 2^32. In
	 * that case (32-bit system), if the kernel mapping is unknown, all
	 * addresses will be assumed to be in user space - see
	 * machine__kernel_ip().
	 */
	machine->kernel_start = 1ULL << 63;
	if (map) {
2016-09-02 05:25:52 +07:00
		err = map__load(map);
2018-05-17 16:21:54 +07:00
		/*
		 * On x86_64, PTI entry trampolines are less than the
		 * start of kernel text, but still above 2^63. So leave
		 * kernel_start = 1ULL << 63 for x86_64.
		 */
		if (!err && !machine__is(machine, "x86_64"))
2014-08-16 02:08:39 +07:00
			machine->kernel_start = map->start;
	}
	return err;
}
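The kernel_start heuristic in machine__get_kernel_start() can be tried out in isolation. In this sketch `pick_kernel_start()` and `is_kernel_ip()` are hypothetical helpers; `is_kernel_ip()` assumes that machine__kernel_ip() (not shown in this excerpt) treats an address as kernel when it is at or above `kernel_start`, and `map_loaded` stands for "a kernel map exists and map__load() succeeded".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper mirroring machine__get_kernel_start(): `map_loaded`
 * means a kernel map exists and loading it succeeded.
 */
static uint64_t pick_kernel_start(const char *raw_arch, bool map_loaded,
				  uint64_t map_start)
{
	/* Only 64-bit kernel addresses lie above 2^63. */
	uint64_t kernel_start = 1ULL << 63;

	assert(raw_arch != NULL);
	/*
	 * Keep the 2^63 default on x86_64: PTI entry trampolines sit below
	 * kernel text but above 2^63 and must still count as kernel.
	 */
	if (map_loaded && strcmp(raw_arch, "x86_64") != 0)
		kernel_start = map_start;
	return kernel_start;
}

/* Assumed rule for machine__kernel_ip(): kernel iff ip >= kernel_start. */
static bool is_kernel_ip(uint64_t kernel_start, uint64_t ip)
{
	return ip >= kernel_start;
}
```

For example, with kernel text at a hypothetical 0xffffffff81000000, an x86_64 address such as 0xfffffe0000006000 (above 2^63 but below text) is still reported as kernel because the 2^63 default is kept; had kernel_start been moved to the map start, the same address would have been classified as user space.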
2015-05-29 21:31:12 +07:00

2018-11-07 04:07:10 +07:00
u8 machine__addr_cpumode(struct machine *machine, u8 cpumode, u64 addr)
{
	u8 addr_cpumode = cpumode;
	bool kernel_ip;

	if (!machine->single_address_space)
		goto out;

	kernel_ip = machine__kernel_ip(machine, addr);
	switch (cpumode) {
	case PERF_RECORD_MISC_KERNEL:
	case PERF_RECORD_MISC_USER:
		addr_cpumode = kernel_ip ? PERF_RECORD_MISC_KERNEL :
					   PERF_RECORD_MISC_USER;
		break;
	case PERF_RECORD_MISC_GUEST_KERNEL:
	case PERF_RECORD_MISC_GUEST_USER:
		addr_cpumode = kernel_ip ? PERF_RECORD_MISC_GUEST_KERNEL :
					   PERF_RECORD_MISC_GUEST_USER;
		break;
	default:
		break;
	}
out:
	return addr_cpumode;
}
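machine__addr_cpumode() only changes anything when user and kernel share one address space. In that case the address itself, rather than the sample's cpumode, decides between kernel and user, guest modes stay guest modes, and any other mode passes through unchanged. The sketch models that decision with local stand-ins for the PERF_RECORD_MISC_* values from <linux/perf_event.h> (KERNEL=1, USER=2, HYPERVISOR=3, GUEST_KERNEL=4, GUEST_USER=5); `addr_cpumode()` is a hypothetical free function, and the `addr >= kernel_start` test is an assumed stand-in for machine__kernel_ip().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the PERF_RECORD_MISC_* cpumode values. */
enum {
	MISC_KERNEL = 1,
	MISC_USER = 2,
	MISC_HYPERVISOR = 3,
	MISC_GUEST_KERNEL = 4,
	MISC_GUEST_USER = 5,
};

/* Hypothetical free-function version of machine__addr_cpumode(). */
static uint8_t addr_cpumode(bool single_address_space, uint64_t kernel_start,
			    uint8_t cpumode, uint64_t addr)
{
	bool kernel_ip;

	assert((cpumode & ~7u) == 0);	/* cpumode uses the low 3 bits */
	if (!single_address_space)
		return cpumode;		/* trust the sample's cpumode */

	kernel_ip = addr >= kernel_start;	/* assumed machine__kernel_ip() */
	switch (cpumode) {
	case MISC_KERNEL:
	case MISC_USER:
		return kernel_ip ? MISC_KERNEL : MISC_USER;
	case MISC_GUEST_KERNEL:
	case MISC_GUEST_USER:
		return kernel_ip ? MISC_GUEST_KERNEL : MISC_GUEST_USER;
	default:
		return cpumode;		/* e.g. hypervisor: unchanged */
	}
}
```

A user-mode sample whose address lies above kernel_start is therefore reported as kernel, while the host/guest distinction carried by the original cpumode is never crossed.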

2015-05-29 21:31:12 +07:00
struct dso *machine__findnew_dso(struct machine *machine, const char *filename)
{
2015-06-02 01:40:01 +07:00
	return dsos__findnew(&machine->dsos, filename);
2015-05-29 21:31:12 +07:00
}
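machine__findnew_dso() simply delegates to dsos__findnew(), which, as the "findnew" naming suggests, returns the DSO already registered under `filename` or creates one. A minimal find-or-create sketch follows; `struct dso_entry`, `struct dso_registry`, `registry__find()` and `registry__findnew()` are hypothetical, and the real dsos code may additionally take a lock and manage reference counts, both omitted here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical DSO entry and registry (perf's real dsos differ). */
struct dso_entry {
	char *name;
	struct dso_entry *next;
};

struct dso_registry {
	struct dso_entry *head;
};

static struct dso_entry *registry__find(struct dso_registry *r, const char *name)
{
	for (struct dso_entry *d = r->head; d; d = d->next)
		if (!strcmp(d->name, name))
			return d;
	return NULL;
}

/* Return the entry registered under `name`, creating it on first use. */
static struct dso_entry *registry__findnew(struct dso_registry *r, const char *name)
{
	struct dso_entry *d;
	size_t len;

	assert(r != NULL && name != NULL);
	d = registry__find(r, name);
	if (d)
		return d;

	d = malloc(sizeof(*d));
	if (!d)
		return NULL;
	len = strlen(name) + 1;
	d->name = malloc(len);
	if (!d->name) {
		free(d);
		return NULL;
	}
	memcpy(d->name, name, len);
	d->next = r->head;
	r->head = d;
	return d;
}
```

Calling registry__findnew() twice with the same name returns the same pointer, so callers never need to check for an existing entry before asking for one.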
2015-07-23 02:14:29 +07:00

char *machine__resolve_kernel_addr(void *vmachine, unsigned long long *addrp, char **modp)
{
	struct machine *machine = vmachine;
	struct map *map;
2018-04-30 22:20:54 +07:00
	struct symbol *sym = machine__find_kernel_symbol(machine, *addrp, &map);
2015-07-23 02:14:29 +07:00

	if (sym == NULL)
		return NULL;

	*modp = __map__is_kmodule(map) ? (char *)map->dso->short_name : NULL;
	*addrp = map->unmap_ip(map, sym->start);
	return sym->name;
}
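machine__resolve_kernel_addr() has a callback-friendly shape: an opaque `void *` context, an in/out address that is rewritten to the start of the containing symbol, an out parameter for the module name (NULL when the map is not a kernel module), and the symbol name as the return value, or NULL when nothing matches. The sketch reproduces that contract over a hypothetical sorted symbol table; `struct ksym`, `struct ksym_table` and `resolve_addr()` are illustrative names. Because the table already stores absolute addresses, it needs no counterpart to map->unmap_ip().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol entry: [start, end) plus an optional module name. */
struct ksym {
	unsigned long long start, end;
	const char *name;
	const char *module;	/* NULL for symbols in the core kernel image */
};

struct ksym_table {
	const struct ksym *syms;	/* sorted by start, non-overlapping */
	size_t nr;
};

/*
 * Same shape as machine__resolve_kernel_addr(): opaque context pointer,
 * in/out address, out module name; returns the symbol name or NULL.
 */
static char *resolve_addr(void *ctx, unsigned long long *addrp, char **modp)
{
	const struct ksym_table *t = ctx;
	size_t lo = 0, hi;

	assert(t != NULL && addrp != NULL && modp != NULL);
	hi = t->nr;
	/* Binary search for the symbol whose range contains *addrp. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct ksym *s = &t->syms[mid];

		if (*addrp < s->start) {
			hi = mid;
		} else if (*addrp >= s->end) {
			lo = mid + 1;
		} else {
			*modp = (char *)s->module;
			*addrp = s->start;	/* rewrite to the symbol start */
			return (char *)s->name;
		}
	}
	return NULL;
}
```

On a miss the function returns NULL and leaves both `*addrp` and `*modp` untouched, matching the early `return NULL` in the original.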