2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += annotate.o
|
perf annotate: Add branch stack / basic block
I wanted to know the hottest path through a function and figured the
branch-stack (LBR) information should be able to help out with that.
The below uses the branch-stack to create basic blocks and generate
statistics from them.
  from    to          branch_i
  * ----> *
          |
          | block
          v
          * ----> *
          from    to  branch_i+1
The blocks are broken down into non-overlapping ranges, while tracking
if the start of each range is an entry point and/or the end of a range
is a branch.
Each block iterates all ranges it covers (while splitting where required
to exactly match the block) and increments the 'coverage' count.
For the range including the branch we increment the taken counter, as
well as the pred counter if flags.predicted.
Using these numbers we can find if an instruction:
- had coverage; given by:
br->coverage / br->sym->max_coverage
This metric ensures each symbol has a 100% spot, which reflects the
observation that each symbol must have a most covered/hottest
block.
- is a branch target: br->is_target && br->start == addr
- for targets, how much of a branch's coverage comes from it:
target->entry / branch->coverage
- is a branch: br->is_branch && br->end == addr
- for branches, how often it was taken:
br->taken / br->coverage
after all, all execution that didn't take the branch would have
incremented the coverage and continued onward to a later branch.
- for branches, how often it was predicted:
br->pred / br->taken
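As an illustration of the ratios listed above, the following C sketch
computes them from per-range counters. The struct layout and the helper
names (range_stats, pct, coverage_pct, entry_pct, taken_pct, pred_pct) are
assumptions made for this example and are not taken from the perf sources;
only the counter names and the formulas come from the text.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-range counters. The field names follow the commit
 * text, but this layout is an assumption, not the real perf structure.
 */
struct range_stats {
	uint64_t coverage;	/* times the range was executed */
	uint64_t entry;		/* times execution entered at the range start */
	uint64_t taken;		/* times the terminating branch was taken */
	uint64_t pred;		/* taken branches that were predicted */
};

/* Percentage helper that returns 0 instead of dividing by zero. */
double pct(uint64_t num, uint64_t den)
{
	return den ? 100.0 * (double)num / (double)den : 0.0;
}

/* Coverage relative to the hottest block of the symbol. */
double coverage_pct(const struct range_stats *r, uint64_t max_coverage)
{
	assert(r->coverage <= max_coverage);
	return pct(r->coverage, max_coverage);
}

/* Share of a branch's coverage that arrives through a given target. */
double entry_pct(const struct range_stats *target,
		 const struct range_stats *branch)
{
	return pct(target->entry, branch->coverage);
}

/* How often the branch ending this range was taken. */
double taken_pct(const struct range_stats *r)
{
	return pct(r->taken, r->coverage);
}

/* How often a taken branch was predicted. */
double pred_pct(const struct range_stats *r)
{
	return pct(r->pred, r->taken);
}
```

For a range with coverage 200, taken 100 and pred 80 in a symbol whose
max_coverage is 400, these helpers give 50% coverage, 50% taken and 80%
predicted.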
The coverage percentage is used to color the address and asm sections;
for low (<1%) coverage we use NORMAL (uncolored), indicating that these
instructions are not 'important'. For high coverage (>75%) we color the
address RED.
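The thresholds in this paragraph can be sketched as a small classifier.
The enum and function names are hypothetical, and the text does not say
which color is used between 1% and 75%, so COV_MID is only a placeholder
for that band.

```c
#include <assert.h>

/* Hypothetical coverage classes; only the two thresholds come from the text. */
enum cov_class {
	COV_LOW,	/* < 1%: NORMAL (uncolored) */
	COV_MID,	/* in between: color not specified in the text */
	COV_HIGH	/* > 75%: RED */
};

/* Classify a coverage percentage in the range 0..100. */
enum cov_class classify_coverage(double percent)
{
	assert(percent >= 0.0 && percent <= 100.0);
	if (percent > 75.0)
		return COV_HIGH;
	if (percent < 1.0)
		return COV_LOW;
	return COV_MID;
}
```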
For each branch, we add an asm comment after the instruction with
information on how often it was taken and predicted.
Output looks like (sans color, which does lose a lot of the
information :/)
$ perf record --branch-filter u,any -e cycles:p ./branches 27
$ perf annotate branches
Percent | Source code & Disassembly of branches for cycles:pu (217 samples)
---------------------------------------------------------------------------------
: branches():
0.00 : 40057a: push %rbp
0.00 : 40057b: mov %rsp,%rbp
0.00 : 40057e: sub $0x20,%rsp
0.00 : 400582: mov %rdi,-0x18(%rbp)
0.00 : 400586: mov %rsi,-0x20(%rbp)
0.00 : 40058a: mov -0x18(%rbp),%rax
0.00 : 40058e: mov %rax,-0x10(%rbp)
0.00 : 400592: movq $0x0,-0x8(%rbp)
0.00 : 40059a: jmpq 400656 <branches+0xdc>
1.84 : 40059f: mov -0x10(%rbp),%rax # +100.00%
3.23 : 4005a3: and $0x1,%eax
1.84 : 4005a6: test %rax,%rax
0.00 : 4005a9: je 4005bf <branches+0x45> # -54.50% (p:42.00%)
0.46 : 4005ab: mov 0x200bbe(%rip),%rax # 601170 <acc>
12.90 : 4005b2: add $0x1,%rax
2.30 : 4005b6: mov %rax,0x200bb3(%rip) # 601170 <acc>
0.46 : 4005bd: jmp 4005d1 <branches+0x57> # -100.00% (p:100.00%)
0.92 : 4005bf: mov 0x200baa(%rip),%rax # 601170 <acc> # +49.54%
13.82 : 4005c6: sub $0x1,%rax
0.46 : 4005ca: mov %rax,0x200b9f(%rip) # 601170 <acc>
2.30 : 4005d1: mov -0x10(%rbp),%rax # +50.46%
0.46 : 4005d5: mov %rax,%rdi
0.46 : 4005d8: callq 400526 <lfsr> # -100.00% (p:100.00%)
0.00 : 4005dd: mov %rax,-0x10(%rbp) # +100.00%
0.92 : 4005e1: mov -0x18(%rbp),%rax
0.00 : 4005e5: and $0x1,%eax
0.00 : 4005e8: test %rax,%rax
0.00 : 4005eb: je 4005ff <branches+0x85> # -100.00% (p:100.00%)
0.00 : 4005ed: mov 0x200b7c(%rip),%rax # 601170 <acc>
0.00 : 4005f4: shr $0x2,%rax
0.00 : 4005f8: mov %rax,0x200b71(%rip) # 601170 <acc>
0.00 : 4005ff: mov -0x10(%rbp),%rax # +100.00%
7.37 : 400603: and $0x1,%eax
3.69 : 400606: test %rax,%rax
0.00 : 400609: jne 400612 <branches+0x98> # -59.25% (p:42.99%)
1.84 : 40060b: mov $0x1,%eax
14.29 : 400610: jmp 400617 <branches+0x9d> # -100.00% (p:100.00%)
1.38 : 400612: mov $0x0,%eax # +57.65%
10.14 : 400617: test %al,%al # +42.35%
0.00 : 400619: je 40062f <branches+0xb5> # -57.65% (p:100.00%)
0.46 : 40061b: mov 0x200b4e(%rip),%rax # 601170 <acc>
2.76 : 400622: sub $0x1,%rax
0.00 : 400626: mov %rax,0x200b43(%rip) # 601170 <acc>
0.46 : 40062d: jmp 400641 <branches+0xc7> # -100.00% (p:100.00%)
0.92 : 40062f: mov 0x200b3a(%rip),%rax # 601170 <acc> # +56.13%
2.30 : 400636: add $0x1,%rax
0.92 : 40063a: mov %rax,0x200b2f(%rip) # 601170 <acc>
0.92 : 400641: mov -0x10(%rbp),%rax # +43.87%
2.30 : 400645: mov %rax,%rdi
0.00 : 400648: callq 400526 <lfsr> # -100.00% (p:100.00%)
0.00 : 40064d: mov %rax,-0x10(%rbp) # +100.00%
1.84 : 400651: addq $0x1,-0x8(%rbp)
0.92 : 400656: mov -0x8(%rbp),%rax
5.07 : 40065a: cmp -0x20(%rbp),%rax
0.00 : 40065e: jb 40059f <branches+0x25> # -100.00% (p:100.00%)
0.00 : 400664: nop
0.00 : 400665: leaveq
0.00 : 400666: retq
(Note: --branch-filter u,any was used to avoid spurious target and
branch points due to interrupts/faults; these show up as very small -/+
annotations on 'weird' locations)
Committer note:
Please take a look at:
http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png
To see the colors.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: David Carrillo-Cisneros <davidcc@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
[ Moved sym->max_coverage to 'struct annotate', aka symbol__annotate(sym) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-09-06 02:08:12 +07:00
|
|
|
libperf-y += block-range.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += build-id.o
|
|
|
|
libperf-y += config.o
|
|
|
|
libperf-y += ctype.o
|
|
|
|
libperf-y += db-export.o
|
2015-09-08 23:30:00 +07:00
|
|
|
libperf-y += env.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += event.o
|
|
|
|
libperf-y += evlist.o
|
|
|
|
libperf-y += evsel.o
|
2016-04-15 05:45:01 +07:00
|
|
|
libperf-y += evsel_fprintf.o
|
2016-01-08 20:46:52 +07:00
|
|
|
libperf-y += find_bit.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += kallsyms.o
|
|
|
|
libperf-y += levenshtein.o
|
2015-07-21 18:13:34 +07:00
|
|
|
libperf-y += llvm-utils.o
|
2017-04-26 01:45:35 +07:00
|
|
|
libperf-y += memswap.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += parse-events.o
|
2015-09-25 04:53:49 +07:00
|
|
|
libperf-y += perf_regs.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += path.o
|
2017-04-18 02:23:22 +07:00
|
|
|
libperf-y += print_binary.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += rbtree.o
|
2015-11-16 21:36:29 +07:00
|
|
|
libperf-y += libstring.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += bitmap.o
|
|
|
|
libperf-y += hweight.o
|
|
|
|
libperf-y += quote.o
|
|
|
|
libperf-y += strbuf.o
|
|
|
|
libperf-y += string.o
|
|
|
|
libperf-y += strlist.o
|
|
|
|
libperf-y += strfilter.o
|
|
|
|
libperf-y += top.o
|
|
|
|
libperf-y += usage.o
|
|
|
|
libperf-y += dso.o
|
|
|
|
libperf-y += symbol.o
|
2016-04-15 01:54:36 +07:00
|
|
|
libperf-y += symbol_fprintf.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += color.o
|
|
|
|
libperf-y += header.o
|
|
|
|
libperf-y += callchain.o
|
|
|
|
libperf-y += values.o
|
|
|
|
libperf-y += debug.o
|
|
|
|
libperf-y += machine.o
|
|
|
|
libperf-y += map.o
|
|
|
|
libperf-y += pstack.o
|
|
|
|
libperf-y += session.o
|
2016-04-04 23:32:20 +07:00
|
|
|
libperf-$(CONFIG_AUDIT) += syscalltbl.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += ordered-events.o
|
perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info
Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
by the kernel when fork, clone, setns or unshare are invoked. And update
perf-record documentation with the new option to record namespace
events.
Committer notes:
Combined it with a later patch to allow printing it via 'perf report -D'
and be able to test the feature introduced in this patch. Also had to move
perf_ns__name() here, which was introduced in another later patch.
Also used PRIu64 and PRIx64 to fix the build in some environments wrt:
util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
ret += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
^
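A minimal sketch of this kind of fix, assuming the two numeric fields are
64-bit values (the helper name format_ns_entry and its parameters are
hypothetical, not the actual code in util/event.c): the <inttypes.h>
macros expand to the correct length modifier for uint64_t on every
platform, so the format string no longer depends on whether uint64_t is
'unsigned long' or 'unsigned long long'.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats one "idx/name: dev/0xino" entry.
 * Returns what snprintf() returns, i.e. the length the full string needs.
 */
int format_ns_entry(char *buf, size_t len, unsigned int idx,
		    const char *name, uint64_t dev, uint64_t ino)
{
	assert(buf != NULL && name != NULL);
	/* PRIu64/PRIx64 replace the non-portable "%lu" and "%lx". */
	return snprintf(buf, len, "%u/%s: %" PRIu64 "/0x%" PRIx64,
			idx, name, dev, ino);
}
```

Calling it with idx 0, name "net", dev 3 and ino 0xf0000081 produces the
"0/net: 3/0xf0000081" form seen in the 'perf report -D' output of this
commit message.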
Testing it:
# perf record --namespaces -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
#
# perf report -D
<SNIP>
3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
[0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
0x1151e0 [0x30]: event: 9
.
. ... raw event: size 48 bytes
. 0000: 09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00 ......0..q.h....
. 0010: a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00 .9...9...(.c....
. 0020: 03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00 ................
<SNIP>
NAMESPACES events: 1
<SNIP>
#
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sargun Dhillon <sargun@sargun.me>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-03-08 03:41:43 +07:00
|
|
|
libperf-y += namespaces.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += comm.o
|
|
|
|
libperf-y += thread.o
|
|
|
|
libperf-y += thread_map.o
|
|
|
|
libperf-y += trace-event-parse.o
|
|
|
|
libperf-y += parse-events-flex.o
|
|
|
|
libperf-y += parse-events-bison.o
|
|
|
|
libperf-y += pmu.o
|
|
|
|
libperf-y += pmu-flex.o
|
|
|
|
libperf-y += pmu-bison.o
|
|
|
|
libperf-y += trace-event-read.o
|
|
|
|
libperf-y += trace-event-info.o
|
|
|
|
libperf-y += trace-event-scripting.o
|
|
|
|
libperf-y += trace-event.o
|
|
|
|
libperf-y += svghelper.o
|
|
|
|
libperf-y += sort.o
|
|
|
|
libperf-y += hist.o
|
|
|
|
libperf-y += util.o
|
|
|
|
libperf-y += xyarray.o
|
|
|
|
libperf-y += cpumap.o
|
|
|
|
libperf-y += cgroup.o
|
|
|
|
libperf-y += target.o
|
|
|
|
libperf-y += rblist.o
|
|
|
|
libperf-y += intlist.o
|
|
|
|
libperf-y += vdso.o
|
2015-08-07 17:51:03 +07:00
|
|
|
libperf-y += counts.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += stat.o
|
2015-06-03 21:25:59 +07:00
|
|
|
libperf-y += stat-shadow.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += record.o
|
|
|
|
libperf-y += srcline.o
|
|
|
|
libperf-y += data.o
|
2016-03-08 15:38:50 +07:00
|
|
|
libperf-y += tsc.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += cloexec.o
|
2016-04-28 15:19:07 +07:00
|
|
|
libperf-y += call-path.o
|
2014-12-29 23:42:46 +07:00
|
|
|
libperf-y += thread-stack.o
|
2015-04-30 21:37:27 +07:00
|
|
|
libperf-$(CONFIG_AUXTRACE) += auxtrace.o
|
2015-07-17 23:33:37 +07:00
|
|
|
libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
|
2015-07-17 23:33:41 +07:00
|
|
|
libperf-$(CONFIG_AUXTRACE) += intel-pt.o
|
perf tools: Add Intel BTS support
Intel BTS support fits within the new auxtrace infrastructure. Recording is
supported by identifying the Intel BTS PMU, parsing options and setting up
events.
Decoding is supported by queuing up trace data by thread and then decoding
synchronously, delivering synthesized event samples into the session processing
for tools to consume.
Committer note:
E.g:
[root@felicio ~]# perf record --per-thread -e intel_bts// ls
anaconda-ks.cfg apctest.output bin kernel-rt-3.10.0-298.rt56.171.el7.x86_64.rpm libexec lock_page.bpf.c perf.data perf.data.old
[ perf record: Woken up 3 times to write data ]
[ perf record: Captured and wrote 4.367 MB perf.data ]
[root@felicio ~]# perf evlist -v
intel_bts//: type: 6, size: 112, { sample_period, sample_freq }: 1, sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
dummy:u: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 1, sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1
[root@felicio ~]# perf script # then navigate in the pager to some interesting place:
ls 1843 1 branches: ffffffff810a60cb flush_signal_handlers ([kernel.kallsyms]) => ffffffff8121a522 setup_new_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8121a529 setup_new_exec ([kernel.kallsyms]) => ffffffff8122fa30 do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122fa5d do_close_on_exec ([kernel.kallsyms]) => ffffffff81767ae0 _raw_spin_lock ([kernel.kallsyms])
ls 1843 1 branches: ffffffff81767af4 _raw_spin_lock ([kernel.kallsyms]) => ffffffff8122fa62 do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122fac9 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fad2 do_close_on_exec ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8122fadd do_close_on_exec ([kernel.kallsyms]) => ffffffff8120fc80 filp_close ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8120fcaf filp_close ([kernel.kallsyms]) => ffffffff8120fcb6 filp_close ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8120fcc2 filp_close ([kernel.kallsyms]) => ffffffff812547f0 dnotify_flush ([kernel.kallsyms])
ls 1843 1 branches: ffffffff81254823 dnotify_flush ([kernel.kallsyms]) => ffffffff8120fcc7 filp_close ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8120fccd filp_close ([kernel.kallsyms]) => ffffffff81261790 locks_remove_posix ([kernel.kallsyms])
ls 1843 1 branches: ffffffff812617a3 locks_remove_posix ([kernel.kallsyms]) => ffffffff812617b9 locks_remove_posix ([kernel.kallsyms])
ls 1843 1 branches: ffffffff812617b9 locks_remove_posix ([kernel.kallsyms]) => ffffffff8120fcd2 filp_close ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8120fcd5 filp_close ([kernel.kallsyms]) => ffffffff812142c0 fput ([kernel.kallsyms])
ls 1843 1 branches: ffffffff812142d6 fput ([kernel.kallsyms]) => ffffffff812142df fput ([kernel.kallsyms])
ls 1843 1 branches: ffffffff8121430c fput ([kernel.kallsyms]) => ffffffff810b6580 task_work_add ([kernel.kallsyms])
ls 1843 1 branches: ffffffff810b65ad task_work_add ([kernel.kallsyms]) => ffffffff810b65b1 task_work_add ([kernel.kallsyms])
ls 1843 1 branches: ffffffff810b65c1 task_work_add ([kernel.kallsyms]) => ffffffff810bc710 kick_process ([kernel.kallsyms])
ls 1843 1 branches: ffffffff810bc725 kick_process ([kernel.kallsyms]) => ffffffff810bc742 kick_process ([kernel.kallsyms])
ls 1843 1 branches: ffffffff810bc742 kick_process ([kernel.kallsyms]) => ffffffff810b65c6 task_work_add ([kernel.kallsyms])
ls 1843 1 branches: ffffffff810b65c9 task_work_add ([kernel.kallsyms]) => ffffffff81214311 fput ([kernel.kallsyms])
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-9-git-send-email-adrian.hunter@intel.com
[ Merged sample->time fix for bug found after first round of testing on slightly older kernel ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-17 23:33:43 +07:00
|
|
|
libperf-$(CONFIG_AUXTRACE) += intel-bts.o
|
2015-05-28 00:51:51 +07:00
|
|
|
libperf-y += parse-branch-options.o
|
2017-02-24 06:46:34 +07:00
|
|
|
libperf-y += dump-insn.o
|
perf record: Add ability to name registers to record
This patch modifies the -I/--intr-regs option to enable passing the name
of the registers to sample on interrupt. Registers can be specified by
their symbolic names. For instance on x86, --intr-regs=ax,si.
The motivation is to reduce the size of the perf.data file and the
overhead of sampling by only collecting the registers useful to a
specific analysis. For instance, for value profiling, sampling only the
registers used to pass arguments to functions.
With no parameter, --intr-regs still records all possible registers
based on the architecture.
To name registers, it is necessary to use the long form of the option,
i.e., --intr-regs:
$ perf record --intr-regs=si,di,r8,r9 .....
To record any possible registers:
$ perf record -I .....
$ perf report --intr-regs ...
To display the register, one can use perf report -D
To list the available registers:
$ perf record --intr-regs=\?
available registers: AX BX CX DX SI DI BP SP IP FLAGS CS SS R8 R9 R10 R11 R12 R13 R14 R15
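To illustrate how a list such as "si,di,r8,r9" could be turned into a
register mask, here is a self-contained C sketch. It is not the parser
added by this patch: the function names are hypothetical, and using the
position in the "available registers" list as the bit number is an
assumption made for the example (real masks use the architecture's
register numbering). It also does not model the "no parameter means all
registers" behaviour.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Names in the order printed by --intr-regs=\? above. */
static const char *const reg_names[] = {
	"AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP", "IP", "FLAGS",
	"CS", "SS", "R8", "R9", "R10", "R11", "R12", "R13", "R14", "R15",
};

/* Case-insensitive compare of a token of length len against a name. */
static int name_eq(const char *tok, size_t len, const char *name)
{
	size_t i;

	if (strlen(name) != len)
		return 0;
	for (i = 0; i < len; i++)
		if (toupper((unsigned char)tok[i]) != name[i])
			return 0;
	return 1;
}

/*
 * Parse a comma-separated register list into *mask.
 * Returns 0 on success, -1 on an unknown or empty register name.
 */
int parse_reg_list(const char *list, uint64_t *mask)
{
	const size_t nr = sizeof(reg_names) / sizeof(reg_names[0]);

	assert(list != NULL && mask != NULL);
	*mask = 0;
	while (*list) {
		size_t len = strcspn(list, ",");	/* length of this token */
		size_t i;

		for (i = 0; i < nr; i++)
			if (name_eq(list, len, reg_names[i]))
				break;
		if (i == nr)
			return -1;
		*mask |= (uint64_t)1 << i;	/* assumed bit = list index */
		list += len;
		if (*list == ',')
			list++;
	}
	return 0;
}
```

With this assumed numbering, "si,di,r8,r9" sets bits 4, 5, 12 and 13, and
a name that is not in the table makes the function return -1.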
Signed-off-by: Stephane Eranian <eranian@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1441039273-16260-4-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-08-31 23:41:12 +07:00
|
|
|
libperf-y += parse-regs-options.o
|
2015-12-08 11:21:42 +07:00
|
|
|
libperf-y += term.o
|
2015-12-14 11:18:09 +07:00
|
|
|
libperf-y += help-unknown-cmd.o
|
2016-02-15 15:34:34 +07:00
|
|
|
libperf-y += mem-events.o
|
2016-07-08 01:42:33 +07:00
|
|
|
libperf-y += vsprintf.o
|
2016-09-16 22:50:02 +07:00
|
|
|
libperf-y += drv_configs.o
|
2017-04-20 02:05:56 +07:00
|
|
|
libperf-y += units.o
|
2016-11-30 00:15:41 +07:00
|
|
|
libperf-y += time-utils.o
|
2017-03-21 03:17:05 +07:00
|
|
|
libperf-y += expr-bison.o
|
2017-07-18 19:13:13 +07:00
|
|
|
libperf-y += branch.o
|
2014-12-29 23:42:46 +07:00
|
|
|
|
perf tools: Enable passing bpf object file to --event
By introducing new rules in tools/perf/util/parse-events.[ly], this
patch enables 'perf record --event bpf_file.o' to select events by an
eBPF object file. It calls parse_events_load_bpf() to load that file,
which uses bpf__prepare_load() and finally calls bpf_object__open() for
the object files.
After applying this patch, commands like:
# perf record --event foo.o sleep
become possible.
However, at this point nothing useful can be linked onto the evsel list,
because creating probe points and attaching BPF programs have not been
implemented yet. Until real events can be extracted, this patch links a
dummy evsel so that perf report does not fail on an empty evsel list. The
dummy event code will be removed once the probing and extraction code is
ready.
Committer notes:
Using it:
$ ls -la foo.o
ls: cannot access foo.o: No such file or directory
$ perf record --event foo.o sleep
libbpf: failed to open foo.o: No such file or directory
event syntax error: 'foo.o'
\___ BPF object file 'foo.o' is invalid
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
$
$ file /tmp/build/perf/perf.o
/tmp/build/perf/perf.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
$ perf record --event /tmp/build/perf/perf.o sleep
libbpf: /tmp/build/perf/perf.o is not an eBPF object file
event syntax error: '/tmp/build/perf/perf.o'
\___ BPF object file '/tmp/build/perf/perf.o' is invalid
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
$
$ file /tmp/foo.o
/tmp/foo.o: ELF 64-bit LSB relocatable, no machine, version 1 (SYSV), not stripped
$ perf record --event /tmp/foo.o sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data ]
$ perf evlist
/tmp/foo.o
$ perf evlist -v
/tmp/foo.o: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1
$
So, type 1 is PERF_TYPE_SOFTWARE, config 0x9 is PERF_COUNT_SW_DUMMY, ok.
$ perf report --stdio
Error:
The perf.data file has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
$
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1444826502-49291-4-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-14 19:41:14 +07:00
|
|
|
libperf-$(CONFIG_LIBBPF) += bpf-loader.o
|
perf bpf: Add prologue for BPF programs for fetching arguments
This patch generates a prologue for a BPF program which fetches arguments for
it. With this patch, the program can have arguments as follows:
SEC("lock_page=__lock_page page->flags")
int lock_page(struct pt_regs *ctx, int err, unsigned long flags)
{
return 1;
}
This patch passes at most 3 arguments from r3, r4 and r5. r1 is still the ctx
pointer. r2 is used to indicate if dereferencing was done successfully.
This patch uses r6 to hold ctx (struct pt_regs) and r7 to hold the stack
pointer for the result. The result for each argument is first stored on the stack:
low address
BPF_REG_FP - 24 ARG3
BPF_REG_FP - 16 ARG2
BPF_REG_FP - 8 ARG1
BPF_REG_FP
high address
Then loaded into r3, r4 and r5.
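The slot and register assignment described above reduces to two small
mappings, sketched here in C. The macro and function names are
hypothetical and are not taken from bpf-prologue.c; only the offsets and
register numbers come from the text.

```c
#include <assert.h>

#define MAX_PROLOGUE_ARGS 3	/* "at most 3 arguments" per the text */

/*
 * Offset, relative to BPF_REG_FP, of the stack slot that holds
 * argument argn (1-based): ARG1 -> -8, ARG2 -> -16, ARG3 -> -24.
 */
int arg_stack_offset(int argn)
{
	assert(argn >= 1 && argn <= MAX_PROLOGUE_ARGS);
	return -8 * argn;
}

/*
 * eBPF register number that finally receives argument argn:
 * ARG1 -> r3, ARG2 -> r4, ARG3 -> r5 (r1 is ctx, r2 the error flag).
 */
int arg_register(int argn)
{
	assert(argn >= 1 && argn <= MAX_PROLOGUE_ARGS);
	return argn + 2;
}
```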
The output prologue for offn(...off2(off1(reg))) should be:
r6 <- r1 // save ctx into a callee saved register
r7 <- fp
r7 <- r7 - stack_offset // pointer to result slot
/* load r3 with the offset in pt_regs of 'reg' */
(r7) <- r3 // make slot valid
r3 <- r3 + off1 // prepare to read unsafe pointer
r2 <- 8
r1 <- r7 // result put onto stack
call probe_read // read unsafe pointer
jnei r0, 0, err // error checking
r3 <- (r7) // read result
r3 <- r3 + off2 // prepare to read unsafe pointer
r2 <- 8
r1 <- r7
call probe_read
jnei r0, 0, err
...
/* load r3, r4, r5 from stack */
goto success
err:
r2 <- 1
/* load r3, r4, r5 with 0 */
goto usercode
success:
r2 <- 0
usercode:
r1 <- r6 // restore ctx
// original user code
If all of the arguments reside in registers (no dereferencing is
required), gen_prologue_fastpath() is used to create a
fast prologue:
r3 <- (r1 + offset of reg1)
r4 <- (r1 + offset of reg2)
r5 <- (r1 + offset of reg3)
r2 <- 0
P.S.
eBPF calling convention is defined as:
* r0 - return value from in-kernel function, and exit value
for eBPF program
* r1 - r5 - arguments from eBPF program to in-kernel function
* r6 - r9 - callee saved registers that in-kernel function will
preserve
* r10 - read-only frame pointer to access stack
Committer note:
At least testing if it builds and loads:
# cat test_probe_arg.c
struct pt_regs;
__attribute__((section("lock_page=__lock_page page->flags"), used))
int func(struct pt_regs *ctx, int err, unsigned long flags)
{
return 1;
}
char _license[] __attribute__((section("license"), used)) = "GPL";
int _version __attribute__((section("version"), used)) = 0x40300;
# perf record -e ./test_probe_arg.c usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data ]
# perf evlist
perf_bpf_probe:lock_page
#
Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1447675815-166222-11-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-11-16 19:10:12 +07:00
|
|
|
libperf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o
|
2014-12-30 05:52:25 +07:00
|
|
|
libperf-$(CONFIG_LIBELF) += symbol-elf.o
|
2015-07-15 16:14:07 +07:00
|
|
|
libperf-$(CONFIG_LIBELF) += probe-file.o
|
2014-12-30 05:52:25 +07:00
|
|
|
libperf-$(CONFIG_LIBELF) += probe-event.o
|
|
|
|
|
|
|
|
ifndef CONFIG_LIBELF
|
|
|
|
libperf-y += symbol-minimal.o
|
|
|
|
endif
|
|
|
|
|
2017-07-19 03:15:29 +07:00
|
|
|
ifndef CONFIG_SETNS
|
|
|
|
libperf-y += setns.o
|
|
|
|
endif
|
|
|
|
|
2014-12-30 06:06:25 +07:00
|
|
|
libperf-$(CONFIG_DWARF) += probe-finder.o
|
|
|
|
libperf-$(CONFIG_DWARF) += dwarf-aux.o
|
2016-08-25 23:24:57 +07:00
|
|
|
libperf-$(CONFIG_DWARF) += dwarf-regs.o
|
2014-12-30 06:06:25 +07:00
|
|
|
|
2014-12-30 06:11:11 +07:00
|
|
|
libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
|
2016-06-03 10:33:16 +07:00
|
|
|
libperf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind-local.o
|
2016-06-03 10:33:17 +07:00
|
|
|
libperf-$(CONFIG_LIBUNWIND) += unwind-libunwind.o
|
2016-06-03 10:33:22 +07:00
|
|
|
libperf-$(CONFIG_LIBUNWIND_X86) += libunwind/x86_32.o
|
2016-06-03 10:33:23 +07:00
|
|
|
libperf-$(CONFIG_LIBUNWIND_AARCH64) += libunwind/arm64.o
|
2014-12-30 06:11:11 +07:00
|
|
|
|
perf data: Add perf data to CTF conversion support
Add 'perf data convert' to convert a perf data file into a different
format. This patch adds support for conversion to the CTF format.
To convert perf.data into CTF run:
$ perf data convert --to-ctf=./ctf-data/
[ perf data convert: Converted 'perf.data' into CTF data './ctf-data/' ]
[ perf data convert: Converted and wrote 11.268 MB (100230 samples) ]
The command creates CTF metadata from the perf.data file (or the one
specified via the -i option) and then converts all sample events into a
single CTF stream.
Each sample_type bit is translated into a separate CTF event field, apart
from the following exceptions:
PERF_SAMPLE_RAW - added in next patch
PERF_SAMPLE_READ - TODO
PERF_SAMPLE_CALLCHAIN - TODO
PERF_SAMPLE_BRANCH_STACK - TODO
PERF_SAMPLE_REGS_USER - TODO
PERF_SAMPLE_STACK_USER - TODO
$ perf --debug=data-convert=2 data convert ...
The converted CTF data can be analyzed by CTF tools, like babeltrace
or tracecompass [1].
$ babeltrace ./ctf-data/
[03:19:13.962125533] (+?.?????????) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 1 }
[03:19:13.962130001] (+0.000004468) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 1 }
[03:19:13.962131936] (+0.000001935) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 8 }
[03:19:13.962133732] (+0.000001796) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 114 }
[03:19:13.962135557] (+0.000001825) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 2087 }
[03:19:13.962137627] (+0.000002070) cycles: { }, { ip = 0xFFFFFFFF81361938, tid = 20714, pid = 20714, period = 37582 }
[03:19:13.962161091] (+0.000023464) cycles: { }, { ip = 0xFFFFFFFF8124218F, tid = 20714, pid = 20714, period = 600246 }
[03:19:13.962517569] (+0.000356478) cycles: { }, { ip = 0xFFFFFFFF811A75DB, tid = 20714, pid = 20714, period = 1325731 }
[03:19:13.969518008] (+0.007000439) cycles: { }, { ip = 0x34080917B2, tid = 20714, pid = 20714, period = 1144298 }
The following members were added to the ctf-environment to distinguish
and describe perf CTF data:
- domain
It says "kernel" because it contains a kernel trace (not to be
confused with a user space like lttng-ust does)
- tracer_name
It says perf. This can be used to distinguish between lttng and perf
CTF-based traces.
- version
The kernel version from stream. In addition to release, this is what
it looks like on a Debian kernel:
release = "3.14-1-amd64";
version = "3.14.0";
[1] http://projects.eclipse.org/projects/tools.tracecompass
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: David Ahern <dsahern@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeremie Galarneau <jgalar@efficios.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1424470628-5969-4-git-send-email-jolsa@kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-02-21 05:17:00 +07:00
|
|
|
libperf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o
|
|
|
|
|
2014-12-30 19:11:32 +07:00
|
|
|
libperf-y += scripting-engines/
|
|
|
|
|
2014-12-30 19:31:12 +07:00
|
|
|
libperf-$(CONFIG_ZLIB) += zlib.o
|
2015-01-29 19:29:39 +07:00
|
|
|
libperf-$(CONFIG_LZMA) += lzma.o
|
2015-11-30 16:02:20 +07:00
|
|
|
libperf-y += demangle-java.o
|
2016-07-09 14:20:00 +07:00
|
|
|
libperf-y += demangle-rust.o
|
2016-03-08 04:48:45 +07:00
|
|
|
|
2016-03-10 23:41:13 +07:00
|
|
|
ifdef CONFIG_JITDUMP
|
2015-11-30 16:02:21 +07:00
|
|
|
libperf-$(CONFIG_LIBELF) += jitdump.o
|
|
|
|
libperf-$(CONFIG_LIBELF) += genelf.o
|
2016-10-13 17:59:36 +07:00
|
|
|
libperf-$(CONFIG_DWARF) += genelf_debug.o
|
2016-03-08 04:48:45 +07:00
|
|
|
endif
|
2014-12-30 19:30:04 +07:00
|
|
|
|
2016-11-26 14:03:28 +07:00
|
|
|
libperf-y += perf-hooks.o
|
|
|
|
|
2016-11-26 14:03:34 +07:00
|
|
|
libperf-$(CONFIG_CXX) += c++/
|
|
|
|
|
2014-12-29 23:42:46 +07:00
|
|
|
CFLAGS_config.o += -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
|
2015-11-30 16:02:23 +07:00
|
|
|
# avoid compiler warnings in 32-bit mode
|
|
|
|
CFLAGS_genelf_debug.o += -Wno-packed
|
2014-12-29 23:42:46 +07:00
|
|
|
|
|
|
|
$(OUTPUT)util/parse-events-flex.c: util/parse-events.l $(OUTPUT)util/parse-events-bison.c
|
2014-12-31 00:44:38 +07:00
|
|
|
$(call rule_mkdir)
|
2015-05-14 22:31:48 +07:00
|
|
|
$(Q)$(call echo-cmd,flex)$(FLEX) -o $@ --header-file=$(OUTPUT)util/parse-events-flex.h $(PARSER_DEBUG_FLEX) util/parse-events.l
|
2014-12-29 23:42:46 +07:00
|
|
|
|
|
|
|
$(OUTPUT)util/parse-events-bison.c: util/parse-events.y
|
2014-12-31 00:44:38 +07:00
|
|
|
$(call rule_mkdir)
|
2015-05-14 22:31:48 +07:00
|
|
|
$(Q)$(call echo-cmd,bison)$(BISON) -v util/parse-events.y -d $(PARSER_DEBUG_BISON) -o $@ -p parse_events_
|
2014-12-29 23:42:46 +07:00
|
|
|
|
2017-03-21 03:17:05 +07:00
|
|
|
$(OUTPUT)util/expr-bison.c: util/expr.y
|
|
|
|
$(call rule_mkdir)
|
|
|
|
$(Q)$(call echo-cmd,bison)$(BISON) -v util/expr.y -d $(PARSER_DEBUG_BISON) -o $@ -p expr__
|
|
|
|
|
2014-12-29 23:42:46 +07:00
|
|
|
$(OUTPUT)util/pmu-flex.c: util/pmu.l $(OUTPUT)util/pmu-bison.c
|
2014-12-31 00:44:38 +07:00
|
|
|
$(call rule_mkdir)
|
2015-05-14 22:31:48 +07:00
|
|
|
$(Q)$(call echo-cmd,flex)$(FLEX) -o $@ --header-file=$(OUTPUT)util/pmu-flex.h util/pmu.l
|
2014-12-29 23:42:46 +07:00
|
|
|
|
|
|
|
$(OUTPUT)util/pmu-bison.c: util/pmu.y
|
2014-12-31 00:44:38 +07:00
|
|
|
$(call rule_mkdir)
|
2015-05-14 22:31:48 +07:00
|
|
|
$(Q)$(call echo-cmd,bison)$(BISON) -v util/pmu.y -d -o $@ -p perf_pmu_
|
2014-12-29 23:42:46 +07:00
|
|
|
|
|
|
|
CFLAGS_parse-events-flex.o += -w
|
|
|
|
CFLAGS_pmu-flex.o += -w
|
2015-04-29 22:55:00 +07:00
|
|
|
CFLAGS_parse-events-bison.o += -DYYENABLE_NLS=0 -w
|
2014-12-29 23:42:46 +07:00
|
|
|
CFLAGS_pmu-bison.o += -DYYENABLE_NLS=0 -DYYLTYPE_IS_TRIVIAL=0 -w
|
2017-03-21 03:17:05 +07:00
|
|
|
CFLAGS_expr-bison.o += -DYYENABLE_NLS=0 -DYYLTYPE_IS_TRIVIAL=0 -w
|
2014-12-29 23:42:46 +07:00
|
|
|
|
|
|
|
$(OUTPUT)util/parse-events.o: $(OUTPUT)util/parse-events-flex.c $(OUTPUT)util/parse-events-bison.c
|
|
|
|
$(OUTPUT)util/pmu.o: $(OUTPUT)util/pmu-flex.c $(OUTPUT)util/pmu-bison.c
|
|
|
|
|
2016-01-08 22:33:37 +07:00
|
|
|
CFLAGS_bitmap.o += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
|
2016-01-08 20:46:52 +07:00
|
|
|
CFLAGS_find_bit.o += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
|
2014-12-29 23:42:46 +07:00
|
|
|
CFLAGS_rbtree.o += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
|
2015-11-16 21:36:29 +07:00
|
|
|
CFLAGS_libstring.o += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
|
2014-12-29 23:42:46 +07:00
|
|
|
CFLAGS_hweight.o += -Wno-unused-parameter -DETC_PERFCONFIG="BUILD_STR($(ETC_PERFCONFIG_SQ))"
|
|
|
|
CFLAGS_parse-events.o += -Wno-redundant-decls
|
2017-01-16 22:22:37 +07:00
|
|
|
CFLAGS_header.o += -include $(OUTPUT)PERF-VERSION-FILE
|
2014-12-29 23:42:46 +07:00
|
|
|
|
|
|
|
$(OUTPUT)util/kallsyms.o: ../lib/symbol/kallsyms.c FORCE
|
2014-12-31 00:44:38 +07:00
|
|
|
$(call rule_mkdir)
|
2014-12-29 23:42:46 +07:00
|
|
|
$(call if_changed_dep,cc_o_c)
|
|
|
|
|
2016-01-08 22:33:37 +07:00
|
|
|
$(OUTPUT)util/bitmap.o: ../lib/bitmap.c FORCE
|
|
|
|
$(call rule_mkdir)
|
|
|
|
$(call if_changed_dep,cc_o_c)
|
|
|
|
|
2016-01-08 20:46:52 +07:00
|
|
|
$(OUTPUT)util/find_bit.o: ../lib/find_bit.c FORCE
|
2014-12-31 00:44:38 +07:00
|
|
|
$(call rule_mkdir)
|
2014-12-29 23:42:46 +07:00
|
|
|
$(call if_changed_dep,cc_o_c)
|
|
|
|
|
2015-07-06 08:48:21 +07:00
|
|
|
$(OUTPUT)util/rbtree.o: ../lib/rbtree.c FORCE
|
2014-12-31 00:44:38 +07:00
|
|
|
$(call rule_mkdir)
|
2014-12-29 23:42:46 +07:00
|
|
|
$(call if_changed_dep,cc_o_c)
|
|
|
|
|
2015-11-16 21:36:29 +07:00
|
|
|
$(OUTPUT)util/libstring.o: ../lib/string.c FORCE
|
|
|
|
$(call rule_mkdir)
|
|
|
|
$(call if_changed_dep,cc_o_c)
|
|
|
|
|
2015-07-10 02:27:25 +07:00
|
|
|
$(OUTPUT)util/hweight.o: ../lib/hweight.c FORCE
|
2014-12-31 00:44:38 +07:00
|
|
|
$(call rule_mkdir)
|
2014-12-29 23:42:46 +07:00
|
|
|
$(call if_changed_dep,cc_o_c)
|
2016-07-08 01:42:33 +07:00
|
|
|
|
|
|
|
$(OUTPUT)util/vsprintf.o: ../lib/vsprintf.c FORCE
|
|
|
|
$(call rule_mkdir)
|
|
|
|
$(call if_changed_dep,cc_o_c)
|