linux_dsm_epyc7002/tools
Milian Wolff 21ac9d547f perf report: Cache srclines for callchain nodes
On one hand this ensures that the memory is properly freed when the DSO
gets freed. On the other hand this significantly speeds up the
processing of the callchain nodes when lots of srclines are requested.
For one of my data files e.g.:

Before:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      52496.495043      task-clock (msec)         #    0.999 CPUs utilized
               634      context-switches          #    0.012 K/sec
                 2      cpu-migrations            #    0.000 K/sec
           191,561      page-faults               #    0.004 M/sec
   165,074,498,235      cycles                    #    3.144 GHz
   334,170,832,408      instructions              #    2.02  insn per cycle
    90,220,029,745      branches                  # 1718.591 M/sec
       654,525,177      branch-misses             #    0.73% of all branches

      52.533273822 seconds time elapsedProcessed 236605 events and lost 40 chunks!

After:

 Performance counter stats for 'perf report -s srcline -g srcline --stdio':

      22606.323706      task-clock (msec)         #    1.000 CPUs utilized
                31      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
           185,471      page-faults               #    0.008 M/sec
    71,188,113,681      cycles                    #    3.149 GHz
   133,204,943,083      instructions              #    1.87  insn per cycle
    34,886,384,979      branches                  # 1543.214 M/sec
       278,214,495      branch-misses             #    0.80% of all branches

      22.609857253 seconds time elapsed

Note that the difference is only this large when `--inline` is not
passed. In such situations, we would use the inliner cache and thus do
not run this code path that often.

I think that this cache should actually be used in other places, too.
When looking at the valgrind leak report for perf report, we see tons of
srclines being leaked, most notably from calls to
hist_entry__get_srcline. The problem is that get_srcline has many
different formatting options (show_sym, show_addr, potentially even
unwind_inlines when calling __get_srcline directly). As such, the
srcline cannot easily be cached for all calls, or we'd have to add
caches for all formatting combinations (6 so far). An alternative would
be to remove the formatting options and handle that on a different level
- i.e. print the sym/addr on demand wherever we actually output
something. And the unwind_inlines could be moved into a separate
function that does not return the srcline.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-4-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25 10:50:46 -03:00
..
accounting tools: move accounting tool from Documentation 2016-09-23 13:07:15 -06:00
arch tools include: Sync kernel ABI headers with tooling headers 2017-09-25 10:39:44 -03:00
build tools build tests: Don't hardcode gcc name 2017-08-28 16:44:44 -03:00
cgroup
firewire
gpio gpio-hammer: fix make consumer_label suitable to work on gpio-nails 2017-01-26 16:29:09 +01:00
hv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-09-06 14:45:08 -07:00
iio iio: tools: add install section 2017-08-09 14:31:58 +01:00
include Merge branch 'perf/urgent' into perf/core, to pick up fixes 2017-10-20 11:02:05 +02:00
kvm/kvm_stat tools/kvm_stat: add '-f help' to get the available event list 2017-07-26 19:04:53 +02:00
laptop tools: move laptops dslm tool from Documentation 2016-09-23 13:07:21 -06:00
leds tools/leds: Add led_hw_brightness_mon program 2017-02-14 22:20:23 +01:00
lib perf/urgent fixes: 2017-09-13 09:25:10 +02:00
net tools: bpf_jit_disasm: Handle large images. 2017-06-14 15:03:22 -04:00
nfsd
objtool objtool: Support unoptimized frame pointer setup 2017-09-28 07:25:54 +02:00
pci tools: PCI: Add a missing option help line 2017-08-29 16:03:34 -05:00
pcmcia tools: move pcmcia crc32hash tool from Documentation 2016-09-23 13:07:27 -06:00
perf perf report: Cache srclines for callchain nodes 2017-10-25 10:50:46 -03:00
power Revert "tools/power turbostat: stop migrating, unless '-m'" 2017-10-18 03:17:45 +02:00
scripts Kbuild updates for v4.14 2017-09-14 13:46:33 -07:00
spi spi: tools: add install section 2017-07-26 12:25:28 +01:00
testing userfaultfd: selftest: exercise -EEXIST only in background transfer 2017-10-13 16:18:32 -07:00
thermal/tmon
time
usb usbip: auto retry for concurrent attach 2017-08-31 18:08:45 +02:00
virtio tools/virtio: fix spelling mistake: "wakeus" -> "wakeups" 2017-05-09 16:43:24 +03:00
vm tools/vm: add missing Makefile rules 2017-02-22 16:41:26 -08:00
Makefile spi: Updates for v4.14 2017-09-05 11:40:38 -07:00