Commit Graph

337 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo
08f6680e62 perf tools: Add a 'struct map_groups' pointer to 'struct map_symbol'
And fill it whenever we setup a a 'struct map_symbol', now we need to
use it, next cset.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-fzwfcnddenz1o7uj1fzw3g46@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-12 08:20:53 -03:00
Arnaldo Carvalho de Melo
d46a4cdf49 pref tools: Make 'struct addr_map_symbol' contain 'struct map_symbol'
So that we pass that substructure around and with it consolidate lots of
functions that receive a (map, symbol) pair and now can receive just a
'struct map_symbol' pointer.

This further paves the way to add 'struct map_groups' to 'struct
map_symbol' so that we can have all we need for annotation so that we
can ditch 'struct map'->groups, i.e. have the map_groups pointer in a
more central place, avoiding the pointer in the 'struct map' that have
tons of instances.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-fs90ttd9q12l7989fo7pw81q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-12 08:20:53 -03:00
Ingo Molnar
56b2147f34 perf/core improvements and fixes:
perf report:
 
   Jin Yao:
 
   - Introduce --total-cycles, for basic block profiling, further using data
     obtained from LBR, an example should suffice:
 
       # perf record -b
       ^C[ perf record: Woken up 595 times to write data ]
       [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]
 
       # perf evlist -v
       cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY
 
       # perf report --total-cycles --stdio
       # To display the perf.data header info, please use --header/--header-only options.
       #
       # Total Lost Samples: 0
       #
       # Samples: 6M of event 'cycles'
       # Event count (approx.): 6299936
       #
       # Sampled  Sampled   Avg     Avg
       # Cycles%  Cycles  Cycles%  Cycles                 [Program Block Range]     Shared Object
       # .......  ......  .......  .....   ....................................  ................
       #
          2.17%     1.7M   0.08%     607       [compiler.h:199 -> common.c:221]  [kernel.vmlinux]
          0.72%   544.5K   0.03%     230     [entry_64.S:657 -> entry_64.S:662]  [kernel.vmlinux]
          0.56%   541.8K   0.09%     672       [compiler.h:199 -> common.c:300]  [kernel.vmlinux]
          0.39%   293.2K   0.01%     104   [list_debug.c:43 -> list_debug.c:61]  [kernel.vmlinux]
          0.36%   278.6K   0.03%     272   [entry_64.S:1289 -> entry_64.S:1308]  [kernel.vmlinux]
 
 perf record:
 
   Adrian Hunter:
 
   - Allow storing perf.data in a directory together with a copy of /proc/kcore.
 
   Jiwei Sun:
 
   - Add support for limit perf output file size, i.e.:
 
     # perf record --all-cpus -F 10000 --max-size=4M sleep 10h
     [ perf record: perf size limit reached (4097 KB), stopping session ]
     [ perf record: Woken up 6 times to write data ]
     [ perf record: Captured and wrote 4.048 MB perf.data (54094 samples) ]
     Terminated
     # ls -lah perf.data
     -rw-------. 1 root root 4.1M Nov  7 15:27 perf.data
     #
 
 perf stat:
 
   Jiri Olsa:
 
   - Add --per-node agregation support:
 
     In live mode:
 
       # perf stat  -a -I 1000 -e cycles --per-node
       #           time node   cpus             counts unit events
            1.000542550 N0       20          6,202,097      cycles
            1.000542550 N1       20            639,559      cycles
            2.002040063 N0       20          7,412,495      cycles
            2.002040063 N1       20          2,185,577      cycles
            3.003451699 N0       20          6,508,917      cycles
            3.003451699 N1       20            765,607      cycles
       ...
 
     Or in the record/report stat session:
 
       # perf stat record -a -I 1000 -e cycles
       #           time             counts unit events
            1.000536937         10,008,468      cycles
            2.002090152          9,578,539      cycles
            3.003625233          7,647,869      cycles
            4.005135036          7,032,086      cycles
       ^C     4.340902364          3,923,893      cycles
 
       # perf stat report --per-node
       #           time node   cpus             counts unit events
            1.000536937 N0       20          9,355,086      cycles
            1.000536937 N1       20            653,382      cycles
            2.002090152 N0       20          7,712,838      cycles
            2.002090152 N1       20          1,865,701      cycles
        ...
 
 perf probe:
 
   Masami Hiramatsu:
 
   Various fixes related to recent additions to the DWARF format:
 
   - Fix to find range-only function instance
 
   - Walk function lines in lexical blocks
 
   - Fix to show function entry line as probe-able
 
   - Fix wrong address verification
 
   - Fix to probe a function which has no entry pc
 
   - Fix to probe an inline function which has no entry pc
 
   - Fix to list probe event with correct line number
 
   - Fix to show inlined function callsite without entry_pc
 
   - Fix to show ranges of variables in functions without entry_pc
 
   - Return a better scope DIE if there is no best scope
 
   - Skip end-of-sequence and non statement lines
 
   - Filter out instances except for inlined subroutine and subprogram
 
   - Fix to show calling lines of inlined functions
 
   - Skip overlapped location on searching variables
 
 perf inject:
 
   Adrian Hunter:
 
   - Do not strip evsels with --strip, as they are needed for create_gcov
     (see the autofdo example in tools/perf/Documentation/intel-pt.txt).
 
 Intel PT:
 
   Adrian Hunter:
 
   - Intel PT uses an auxtrace_cache to store the results of code-walking, to avoid
     repeated decoding. Add an auxtrace_cache__remove to handle text poke events.
 
 core:
 
   Andi Kleen:
 
   - Always preserve errno while cleaning up perf_event_open failures.
 
 llvm:
 
   Arnaldo Carvalho de Melo:
 
   - No need to tell that the request for saving a .o file for BPF events, as
     expressed in ~/.perfconfig was satisfied, make that a debug message.
 
 perf vendor events:
 
 Intel:
 
   Haiyan Song:
 
   - Update CascadelakeX events to v1.05.
 
   - Update all the Intel JSON metrics from TMAM 3.6.
 
 Treewide:
 
   Ian Rogers:
 
   - Improve error paths, plugging leaks found using LLVM tools
     such as libFuzzer.
 
 jevents:
 
   Yunfeng Ye:
 
   - Fix resource leak in process_mapfile() and main()
 
 perf kvm:
 
   Igor Lubashev:
 
   - Use evlist layer api when possible.
 
 libsubcmd:
 
   James Clark:
 
   - Move EXTRA_FLAGS to the end to allow overriding existing flags.
 
   - Use -O0 with DEBUG=1
 
 perf diff:
 
   Jin Yao:
 
   - Don't use hack to skip column length calculation
 
 CoreSight ETM:
 
   Leo yan:
 
   - Fix definition of macro TO_CS_QUEUE_NR
 
 ARM64:
 
   John Garry:
 
   - Do not try to include libelf header files when its feature detection
     failed, fixing the cross build for ARM64.
 
 perf tests:
 
   Leo Yan:
 
   - Fix out of bounds memory access in the backward ring buffer test.
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXcRowQAKCRCyPKLppCJ+
 JxHcAQCTtl9N3zkNjLWif1i6AGKNU9TzYpup+jDR5J83ggLqgQD+O931nR9wXUOe
 9bDUr45cNw3ZkRbc1558hKPWIsceJgU=
 =Rko+
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-5.5-20191107' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

perf report:

  Jin Yao:

  - Introduce --total-cycles, for basic block profiling, further using data
    obtained from LBR, an example should suffice:

      # perf record -b
      ^C[ perf record: Woken up 595 times to write data ]
      [ perf record: Captured and wrote 156.672 MB perf.data (196873 samples) ]

      # perf evlist -v
      cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY

      # perf report --total-cycles --stdio
      # To display the perf.data header info, please use --header/--header-only options.
      #
      # Total Lost Samples: 0
      #
      # Samples: 6M of event 'cycles'
      # Event count (approx.): 6299936
      #
      # Sampled  Sampled   Avg     Avg
      # Cycles%  Cycles  Cycles%  Cycles                 [Program Block Range]     Shared Object
      # .......  ......  .......  .....   ....................................  ................
      #
         2.17%     1.7M   0.08%     607       [compiler.h:199 -> common.c:221]  [kernel.vmlinux]
         0.72%   544.5K   0.03%     230     [entry_64.S:657 -> entry_64.S:662]  [kernel.vmlinux]
         0.56%   541.8K   0.09%     672       [compiler.h:199 -> common.c:300]  [kernel.vmlinux]
         0.39%   293.2K   0.01%     104   [list_debug.c:43 -> list_debug.c:61]  [kernel.vmlinux]
         0.36%   278.6K   0.03%     272   [entry_64.S:1289 -> entry_64.S:1308]  [kernel.vmlinux]

perf record:

  Adrian Hunter:

  - Allow storing perf.data in a directory together with a copy of /proc/kcore.

  Jiwei Sun:

  - Add support for limit perf output file size, i.e.:

    # perf record --all-cpus -F 10000 --max-size=4M sleep 10h
    [ perf record: perf size limit reached (4097 KB), stopping session ]
    [ perf record: Woken up 6 times to write data ]
    [ perf record: Captured and wrote 4.048 MB perf.data (54094 samples) ]
    Terminated
    # ls -lah perf.data
    -rw-------. 1 root root 4.1M Nov  7 15:27 perf.data
    #

perf stat:

  Jiri Olsa:

  - Add --per-node agregation support:

    In live mode:

      # perf stat  -a -I 1000 -e cycles --per-node
      #           time node   cpus             counts unit events
           1.000542550 N0       20          6,202,097      cycles
           1.000542550 N1       20            639,559      cycles
           2.002040063 N0       20          7,412,495      cycles
           2.002040063 N1       20          2,185,577      cycles
           3.003451699 N0       20          6,508,917      cycles
           3.003451699 N1       20            765,607      cycles
      ...

    Or in the record/report stat session:

      # perf stat record -a -I 1000 -e cycles
      #           time             counts unit events
           1.000536937         10,008,468      cycles
           2.002090152          9,578,539      cycles
           3.003625233          7,647,869      cycles
           4.005135036          7,032,086      cycles
      ^C     4.340902364          3,923,893      cycles

      # perf stat report --per-node
      #           time node   cpus             counts unit events
           1.000536937 N0       20          9,355,086      cycles
           1.000536937 N1       20            653,382      cycles
           2.002090152 N0       20          7,712,838      cycles
           2.002090152 N1       20          1,865,701      cycles
       ...

perf probe:

  Masami Hiramatsu:

  Various fixes related to recent additions to the DWARF format:

  - Fix to find range-only function instance

  - Walk function lines in lexical blocks

  - Fix to show function entry line as probe-able

  - Fix wrong address verification

  - Fix to probe a function which has no entry pc

  - Fix to probe an inline function which has no entry pc

  - Fix to list probe event with correct line number

  - Fix to show inlined function callsite without entry_pc

  - Fix to show ranges of variables in functions without entry_pc

  - Return a better scope DIE if there is no best scope

  - Skip end-of-sequence and non statement lines

  - Filter out instances except for inlined subroutine and subprogram

  - Fix to show calling lines of inlined functions

  - Skip overlapped location on searching variables

perf inject:

  Adrian Hunter:

  - Do not strip evsels with --strip, as they are needed for create_gcov
    (see the autofdo example in tools/perf/Documentation/intel-pt.txt).

Intel PT:

  Adrian Hunter:

  - Intel PT uses an auxtrace_cache to store the results of code-walking, to avoid
    repeated decoding. Add an auxtrace_cache__remove to handle text poke events.

core:

  Andi Kleen:

  - Always preserve errno while cleaning up perf_event_open failures.

llvm:

  Arnaldo Carvalho de Melo:

  - No need to tell that the request for saving a .o file for BPF events, as
    expressed in ~/.perfconfig was satisfied, make that a debug message.

perf vendor events:

Intel:

  Haiyan Song:

  - Update CascadelakeX events to v1.05.

  - Update all the Intel JSON metrics from TMAM 3.6.

Treewide:

  Ian Rogers:

  - Improve error paths, plugging leaks found using LLVM tools
    such as libFuzzer.

jevents:

  Yunfeng Ye:

  - Fix resource leak in process_mapfile() and main()

perf kvm:

  Igor Lubashev:

  - Use evlist layer api when possible.

libsubcmd:

  James Clark:

  - Move EXTRA_FLAGS to the end to allow overriding existing flags.

  - Use -O0 with DEBUG=1

perf diff:

  Jin Yao:

  - Don't use hack to skip column length calculation

CoreSight ETM:

  Leo yan:

  - Fix definition of macro TO_CS_QUEUE_NR

ARM64:

  John Garry:

  - Do not try to include libelf header files when its feature detection
    failed, fixing the cross build for ARM64.

perf tests:

  Leo Yan:

  - Fix out of bounds memory access in the backward ring buffer test.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-12 12:06:08 +01:00
Jin Yao
b65a7d372b perf hist: Support block formats with compare/sort/display
This patch provides helper routines to support new columns for block
info output.

The new columns are:

  Sampled Cycles%
  Sampled Cycles
  Avg Cycles%
  Avg Cycles
  [Program Block Range]
  Shared Object

 v5:
 ---
 1. Move more block related functions from builtin-report.c to
    block-info.c

 2. Set ms (map+sym) in block hist_entry. Because this info
    is needed for reporting the block range (i.e. source line)

Committer notes:

Remove unused set_fmt() function, some build were not completing with:

  util/block-info.c:396:20: error: unused function 'set_fmt' [-Werror,-Wunused-function]
  static inline void set_fmt(struct block_fmt *block_fmt,
                     ^
  1 error generated.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191107074719.26139-5-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07 10:14:05 -03:00
Jin Yao
7841f40aed perf hist: Count the total cycles of all samples
We can get the per sample cycles by hist__account_cycles(). It's also
useful to know the total cycles of all samples in order to get the
cycles coverage for a single program block in further. For example:

  coverage = per block sampled cycles / total sampled cycles

This patch creates a new argument 'total_cycles' in hist__account_cycles(),
which will be added with the cycles of each sample.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191107074719.26139-4-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07 09:14:15 -03:00
Jin Yao
6041441870 perf block: Cleanup and refactor block info functions
We have already implemented some block-info related functions.
Now it's time to do some cleanup, refactoring and move the
functions and structures to new block-info.h/block-info.c.

 v4:
 ---
 Move code for skipping column length calculation to patch:
 'perf diff: Don't use hack to skip column length calculation'

 v3:
 ---
 1. Rename the patch title
 2. Rename from block.h/block.c to block-info.h/block-info.c
 3. Move more common part to block-info, such as
    block_info__process_sym.
 4. Remove the nasty hack for skipping calculation of column
    length

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191107074719.26139-3-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07 09:09:18 -03:00
Jin Yao
0bdf181fe0 perf diff: Don't use hack to skip column length calculation
Previously we use a nasty hack to skip the hists__calc_col_len for block
since this function is not very suitable for block column length
calculation.

This patch removes the hack code and add a check at the entry of
hists__calc_col_len to skip for block case.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20191107074719.26139-2-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-07 09:08:03 -03:00
Jiri Olsa
722ddfde36 perf tools: Fix time sorting
The final sort might get confused when the comparison is done over
bigger numbers than int like for -s time.

Check the following report for longer workloads:

  $ perf report -s time -F time,overhead --stdio

Fix hist_entry__sort() to properly return int64_t and not possible cut
int.

Fixes: 043ca389a3 ("perf tools: Use hpp formats to sort final output")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org # v3.16+
Link: http://lore.kernel.org/lkml/20191104232711.16055-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-11-05 08:49:14 -03:00
Arnaldo Carvalho de Melo
d3300a3c4e perf symbols: Move mem_info and branch_info out of symbol.h
The mem_info struct goes to mem-events.h and branch_info goes to
branch.h, where they belong, this way we can remove several headers from
symbols.h and trim the include dependency tree more.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-aupw71xnravcsu2xoabfmhpc@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31 22:27:48 -03:00
Arnaldo Carvalho de Melo
5c9dbe6da1 perf tools: Remove needless sort.h include directives
Now that sort.h isn't included by any other header, we can check where
it is really needed, i.e. we can remove it and be sure that it isn't
being obtained indirectly.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-tom8k0lbsxd9joprr8zpu6w1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31 22:24:10 -03:00
Arnaldo Carvalho de Melo
4a3cec8494 perf dsos: Move the dsos struct and its methods to separate source files
So that we can reduce the header dependency tree further, in the process
noticed that lots of places were getting even things like build-id
routines and 'struct perf_tool' definition indirectly, so fix all those
too.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-ti0btma9ow5ndrytyoqdk62j@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31 22:24:10 -03:00
Arnaldo Carvalho de Melo
8520a98dba perf debug: Remove needless include directives from debug.h
All we need there is a forward declaration for 'union perf_event', so
remove it from there and add missing header directives in places using
things from this indirect include.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-7ftk0ztstqub1tirjj8o8xbl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-31 19:10:19 -03:00
Arnaldo Carvalho de Melo
b42090256f perf tools: Remove debug.h from header files not needing it
And fix the fallout, adding it to places that must have it since they
use its definitions.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-1s3jel4i26chq2g0lydoz7i3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-29 17:38:32 -03:00
Namhyung Kim
be5863b7d9 perf top: Fix event group with more than two events
The event group feature links relevant hist entries among events so that
they can be displayed together.  During the link process, each hist
entry in non-leader events is connected to a hist entry in the leader
event.  This is done in order of events specified in the command line so
it assumes that events are linked in the order.

But 'perf top' can break the assumption since it does the link process
multiple times.  For example, a hist entry can be in the third event
only at first so it's linked after the leader.  Some time later, second
event has a hist entry for it and it'll be linked after the entry of the
third event.

This makes the code compilicated to deal with such unordered entries.
This patch simply unlink all the entries after it's printed so that they
can assume the correct order after the repeated link process.  Also it'd
be easy to deal with decaying old entries IMHO.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20190827231555.121411-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-28 18:15:03 -03:00
Andi Kleen
3dab6ac080 perf report: Fix --ns time sort key output
If the user specified --ns, the column to print the sort time stamp
wasn't wide enough to actually print the full nanoseconds.

Widen the time key column width when --ns is specified.

Before:

  % perf record -a sleep 1
  % perf report --sort time,overhead,symbol --stdio --ns
  ...
       2.39%  187851.10000  [k] smp_call_function_single   -      -
       1.53%  187851.10000  [k] intel_idle                 -      -
       0.59%  187851.10000  [.] __wcscmp_ifunc             -      -
       0.33%  187851.10000  [.] 0000000000000000           -      -
       0.28%  187851.10000  [k] cpuidle_enter_state        -      -

After:

  % perf report --sort time,overhead,symbol --stdio --ns
  ...
       2.39%  187851.100000000  [k] smp_call_function_single   -      -
       1.53%  187851.100000000  [k] intel_idle                 -      -
       0.59%  187851.100000000  [.] __wcscmp_ifunc             -      -
       0.33%  187851.100000000  [.] 0000000000000000           -      -
       0.28%  187851.100000000  [k] cpuidle_enter_state        -      -

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190823210338.12360-2-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-26 11:58:29 -03:00
Arnaldo Carvalho de Melo
5f8b4d5d23 perf hist: Remove dummy entries when finding real ones.
When he have an event group we have multiple struct hist instances, one
per evsel, and in each of these hists we may have hist_entries that
point to the same thing being observed, say a symbol, i.e. if we're
looking at instructions and cycles, then we'll have one hist_entry in
the "instructions" evsel and another in the "cycles" evsel.

We need to link those to then show one column for each. When we're
looking at some other pair of events, say instructions and cache misses,
we may have just the "instructions" hist entry and not one for "cache
misses", as instructions not necessarily generate cache misses, as the
logic expects one hist_entry per evsel, we end up adding "dummy"
hist_entries.

This is enough for 'perf report', that does this matching operation
(hists__match()) just once after processing all events, but for 'perf
top', we do this at each refresh, so we may finally find events matching
and then we need to trow away the dummies and link with the real events.

So if we find a match, traverse the link of matches and trow away
dummies for that hists.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-dwvtjqqifsbsczeb35q6mqkk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-12 16:26:02 -03:00
Arnaldo Carvalho de Melo
7d1a5efa20 perf hists: Do not link a pair if already linked
When we have multiple events in a group we link hist_entries in the
non-leader evsel hists to the one in the leader that points to the same
sorting criteria, in hists__match().

For 'perf report' we do this just once and then print the results, but
for 'perf top' we need to look if this was already done in the previous
refresh of the screen, so check for that and don't try to link again.

This is part of having 'perf top' using the hists browser for showing
multiple events in multiple columns.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-iwvb37rgb7upswhruwpcdnhw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-08-12 16:26:02 -03:00
Jiri Olsa
5643b1a59e libperf: Move nr_members from perf's evsel to libperf's perf_evsel
Move the nr_members member from perf's evsel to libperf's perf_evsel.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-60-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29 18:34:46 -03:00
Jiri Olsa
1fc632cef4 libperf: Move perf_event_attr field from perf's evsel to libperf's perf_evsel
Move the perf_event_attr struct fron 'struct evsel' to 'struct perf_evsel'.

Committer notes:

Fixed up these:

 tools/perf/arch/arm/util/auxtrace.c
 tools/perf/arch/arm/util/cs-etm.c
 tools/perf/arch/arm64/util/arm-spe.c
 tools/perf/arch/s390/util/auxtrace.c
 tools/perf/util/cs-etm.c

Also

  cc1: warnings being treated as errors
  tests/sample-parsing.c: In function 'do_test':
  tests/sample-parsing.c:162: error: missing initializer
  tests/sample-parsing.c:162: error: (near initialization for 'evsel.core.cpus')

   	struct evsel evsel = {
   		.needs_swap = false,
  -		.core.attr = {
  -			.sample_type = sample_type,
  -			.read_format = read_format,
  +		.core = {
  +			. attr = {
  +				.sample_type = sample_type,
  +				.read_format = read_format,
  +			},

  [perfbuilder@a70e4eeb5549 /]$ gcc --version |& head -1
  gcc (GCC) 4.4.7

Also we don't need to include perf_event.h in
tools/perf/lib/include/perf/evsel.h, forward declaring 'struct
perf_event_attr' is enough. And this even fixes the build in some
systems where things are used somewhere down the include path from
perf_event.h without defining __always_inline.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-43-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29 18:34:45 -03:00
Jiri Olsa
63503dba87 perf evlist: Rename struct perf_evlist to struct evlist
Rename struct perf_evlist to struct evlist, so we don't have a name
clash when we add struct perf_evlist in libperf.

Committer notes:

Added fixes to build on arm64, from Jiri and from me
(tools/perf/util/cs-etm.c)

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29 18:34:42 -03:00
Jiri Olsa
32dcd021d0 perf evsel: Rename struct perf_evsel to struct evsel
Rename struct perf_evsel to struct evsel, so we don't have a name clash
when we add struct perf_evsel in libperf.

Committer notes:

Added fixes for arm64, provided by Jiri.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190721112506.12306-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-29 18:34:42 -03:00
Arnaldo Carvalho de Melo
e56fbc9dc7 perf tools: Use list_del_init() more thorougly
To allow for destructors to check if they're operating on a object still
in a list, and to avoid going from use after free list entries into
still valid, or even also other already removed from list entries.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-deh17ub44atyox3j90e6rksu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09 10:13:27 -03:00
Arnaldo Carvalho de Melo
d8f9da2404 perf tools: Use zfree() where applicable
In places where the equivalent was already being done, i.e.:

   free(a);
   a = NULL;

And in placs where struct members are being freed so that if we have
some erroneous reference to its struct, then accesses to freed members
will result in segfaults, which we can detect faster than use after free
to areas that may still have something seemingly valid.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-jatyoofo5boc1bsvoig6bb6i@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09 10:13:27 -03:00
Arnaldo Carvalho de Melo
7f7c536f23 tools lib: Adopt zalloc()/zfree() from tools/perf
Eroding a bit more the tools/perf/util/util.h hodpodge header.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-09 10:13:26 -03:00
Jin Yao
b10c78c509 perf diff: Print the basic block cycles diff
$ perf record -b ./div
 $ perf record -b ./div

Following is the default perf diff output

 $ perf diff

 # Event 'cycles'
 #
 # Baseline  Delta Abs  Shared Object     Symbol
 # ........  .........  ................  ..................................
 #
     48.75%     +0.33%  div               [.] main
      8.21%     -0.20%  div               [.] compute_flag
     19.02%     -0.12%  libc-2.23.so      [.] __random_r
     16.17%     -0.09%  libc-2.23.so      [.] __random
      2.27%     -0.03%  div               [.] rand@plt
                +0.02%  [i915]            [k] gen8_irq_handler
      5.52%     +0.02%  libc-2.23.so      [.] rand

This patch creates a new computation selection 'cycles'.

 $ perf diff -c cycles

 # Event 'cycles'
 #
 # Baseline       [Program Block Range] Cycles Diff Shared Object Symbol
 # ........ ....................................... .........................................
 #
     48.75%             [div.c:42 -> div.c:45]  147 div           [.] main
     48.75%             [div.c:31 -> div.c:40]    4 div           [.] main
     48.75%             [div.c:40 -> div.c:40]    0 div           [.] main
     48.75%             [div.c:42 -> div.c:42]    0 div           [.] main
     48.75%             [div.c:42 -> div.c:44]    0 div           [.] main
     19.02% [random_r.c:357 -> random_r.c:360]    0 libc-2.23.so  [.] __random_r
     19.02% [random_r.c:357 -> random_r.c:373]    0 libc-2.23.so  [.] __random_r
     19.02% [random_r.c:357 -> random_r.c:376]    0 libc-2.23.so  [.] __random_r
     19.02% [random_r.c:357 -> random_r.c:380]    0 libc-2.23.so  [.] __random_r
     19.02% [random_r.c:357 -> random_r.c:392]    0 libc-2.23.so  [.] __random_r
     16.17%     [random.c:288 -> random.c:291]    0 libc-2.23.so  [.] __random
     16.17%     [random.c:288 -> random.c:291]    0 libc-2.23.so  [.] __random
     16.17%     [random.c:288 -> random.c:295]    0 libc-2.23.so  [.] __random
     16.17%     [random.c:288 -> random.c:297]    0 libc-2.23.so  [.] __random
     16.17%     [random.c:291 -> random.c:291]    0 libc-2.23.so  [.] __random
     16.17%     [random.c:293 -> random.c:293]    0 libc-2.23.so  [.] __random
      8.21%             [div.c:22 -> div.c:22]  148 div           [.] compute_flag
      8.21%             [div.c:22 -> div.c:25]    0 div           [.] compute_flag
      8.21%             [div.c:27 -> div.c:28]    0 div           [.] compute_flag
      5.52%           [rand.c:26 -> rand.c:27]    0 libc-2.23.so  [.] rand
      5.52%           [rand.c:26 -> rand.c:28]    0 libc-2.23.so  [.] rand
      2.27%         [rand@plt+0 -> rand@plt+0]    0 div           [.] rand@plt
      0.01% [entry_64.S:694 -> entry_64.S:694]   16 [vmlinux]     [k] native_irq_return_iret
      0.00%       [fair.c:7676 -> fair.c:7665]  162 [vmlinux]     [k] update_blocked_averages

"[Program Block Range]" indicates the range of program basic block
(start -> end). If we can find the source line it prints the source line
otherwise it prints the symbol+offset instead.

 v4:
 ---
 Use source lines or symbol+offset to indicate the basic block. It should
 be easier to understand.

 v3:
 ---
 Cast 'struct hist_entry' to 'struct block_hist' in hist_entry__block_fprintf.
 Use symbol_conf.report_block to check if executing hist_entry__block_fprintf.

 v2:
 ---
 Keep standard perf diff format and display the 'Baseline' and
 'Shared Object'.

The output is sorted by "Baseline" and the basic blocks in the same
function are sorted by cycles diff.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1561713784-30533-7-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-02 13:20:51 -03:00
Jin Yao
99150a1faa perf diff: Use hists to manage basic blocks per symbol
The hist__account_cycles() can account cycles per basic block. The basic
block information is saved in cycles_hist structure.

This patch processes each symbol, get basic blocks from cycles_hist and
add the basic block entries to a new hists (in 'struct block_hist').
Using a hists is because we need to compare, sort and print the basic
blocks later.

 v6:
 ---
 Since 'ops' argument is removed from hists__add_entry_block,
 update the code accordingly. No functional change.

 v5:
 ---
 Since now we still carry block_info in 'struct hist_entry'
 we don't need to use our own new/free ops for hist entries.
 And the block_info is released in hist_entry__delete.

 v3:
 ---
 1. In v2, we put block stuffs in 'struct hist_entry', but
 it's not a good design. In v3, we create a new
 'struct block_hist' and cast the 'struct hist_entry' to
 'struct block_hist' in some places, which can avoid adding
 new stuffs in 'struct hist_entry'.

 2. abs() -> labs(), in block_cycles_diff_cmp().

 v2:
 ---
 v1 adds the basic block entries to per data-file hists
 but v2 adds the basic block entries to per symbol hists.
 That is to keep current perf-diff format. Will show the
 result in next patches.

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1561713784-30533-5-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-02 12:47:07 -03:00
Jin Yao
fe96245c7f perf hists: Add block_info in hist_entry
The block_info contains the program basic block information, i.e,
contains the start address and the end address of this basic block and
how much cycles it takes.

We need to compare, sort and even print out the basic block by some
orders, i.e. sort by cycles.

For this purpose, we add block_info field to hist_entry. In order not to
impact current interface, we creates a new function
hists__add_entry_block.

 v6:
 ---
 Remove the 'ops' argument in hists__add_entry_block

Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1561713784-30533-3-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-07-02 12:45:23 -03:00
Namhyung Kim
7cb10a08df perf tools: Remove const from thread read accessors
The namespaces and comm fields of a thread are protected by rwsem and
require write access for it.  So it ended up using a cast to remove
the const qualifier.  Let's get rid of the const then.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Link: http://lkml.kernel.org/r/20190527061149.168640-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-05-28 18:37:43 -03:00
Changbin Du
cb6186aeff perf hist: Add missing map__put() in error case
We need to map__put() before returning from failure of
sample__resolve_callchain().

Detected with gcc's ASan.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: 9c68ae98c6 ("perf callchain: Reference count maps")
Link: http://lkml.kernel.org/r/20190316080556.3075-10-changbin.du@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-19 16:52:04 -03:00
Andi Kleen
4968ac8fb7 perf report: Implement browsing of individual samples
Now 'perf report' can show whole time periods with 'perf script', but
the user still has to find individual samples of interest manually.

It would be expensive and complicated to search for the right samples in
the whole perf file. Typically users only need to look at a small number
of samples for useful analysis.

Also the full scripts tend to show samples of all CPUs and all threads
mixed up, which can be very confusing on larger systems.

Add a new --samples option to save a small random number of samples per
hist entry.

Use a reservoir sample technique to select a representatve number of
samples.

Then allow browsing the samples using 'perf script' as part of the hist
entry context menu. This automatically adds the right filters, so only
the thread or cpu of the sample is displayed. Then we use less' search
functionality to directly jump the to the time stamp of the selected
sample.

It uses different menus for assembler and source display.  Assembler
needs xed installed and source needs debuginfo.

Currently it only supports as many samples as fit on the screen due to
some limitations in the slang ui code.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311174605.GA29294@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-11 16:33:19 -03:00
Andi Kleen
3723908d05 perf report: Support time sort key
Add a time sort key to perf report to display samples for different time
quantums separately. This allows easier analysis of workloads that
change over time, and also will allow looking at the context of samples.

% perf record ...
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
     0.67%  277061.87300  [.] _dl_start
     0.50%  277061.87300  [.] f1
     0.50%  277061.87300  [.] f2
     0.33%  277061.87300  [.] main
     0.29%  277061.87300  [.] _dl_lookup_symbol_x
     0.29%  277061.87300  [.] dl_main
     0.29%  277061.87300  [.] do_lookup_x
     0.17%  277061.87300  [.] _dl_debug_initialize
     0.17%  277061.87300  [.] _dl_init_paths
     0.08%  277061.87300  [.] check_match
     0.04%  277061.87300  [.] _dl_count_modids
     1.33%  277061.87400  [.] f1
     1.33%  277061.87400  [.] f2
     1.33%  277061.87400  [.] main
     1.17%  277061.87500  [.] main
     1.08%  277061.87500  [.] f1
     1.08%  277061.87500  [.] f2
     1.00%  277061.87600  [.] main
     0.83%  277061.87600  [.] f1
     0.83%  277061.87600  [.] f2
     1.00%  277061.87700  [.] main

Committer notes:

Rename 'time' argument to hist_time() to htime to overcome this in older
distros:

  cc1: warnings being treated as errors
  util/hist.c: In function 'hist_time':
  util/hist.c:251: error: declaration of 'time' shadows a global declaration
  /usr/include/time.h:186: error: shadowed declaration is here

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-11 16:32:31 -03:00
Jiri Olsa
2634958586 perf hist: Fix memory leak of srcline
We can't allocate he->srcline unconditionaly, only when new hist_entry
is created. Moving he->srcline allocation into hist_entry__init
function.

Original-patch-by: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Nageswara R Sastry <nasastry@in.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: http://lkml.kernel.org/r/20190305152536.21035-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-06 18:16:57 -03:00
Jiri Olsa
c57589106f perf hist: Add error path into hist_entry__init
Adding error path into hist_entry__init to unify error handling, so
every new member does not need to free everything else.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jonas Rabenstein <jonas.rabenstein@studium.uni-erlangen.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: nageswara r sastry <nasastry@in.ibm.com>
Link: http://lkml.kernel.org/r/20190305152536.21035-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-03-06 18:16:30 -03:00
Jiri Olsa
5749618764 perf evsel: Add output_resort_cb method
Add perf_evsel__output_resort_cb() so we have an interface with a
callback for each hist entry. It will be used in the following patch.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190204141808.23031-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:40 -03:00
Jiri Olsa
e4c38fd4a0 perf hists: Add argument to hists__resort_cb_t callback
Add argument to hists__resort_cb_t so that we can pass data from upper
layers to the callback function. It will be used in the following
patches.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190204141808.23031-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:39 -03:00
Arnaldo Carvalho de Melo
b10ba7f1a2 perf tools: Add missing include <callchain.h> in various places
Its getting it from hist.h and that will go away, as that header doesn't
need callchain.h at all.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-6ebl3mwwiqocl79yts44qltu@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:38 -03:00
Arnaldo Carvalho de Melo
daecf9e0fa perf tools: Add missing include for symbols.h
Several places were using definitions found in symbols.h but not
including it, getting it by sheer luck from some other headers that now
are in the process of removing that include because they don't need it
or because simply having struct forward declarations is enough, fix it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-xbcvvx296d70kpg9wb0qmeq9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-02-06 10:00:38 -03:00
Davidlohr Bueso
2eb3d6894a perf hist: Use cached rbtrees
At the cost of an extra pointer, we can avoid the O(logN) cost of
finding the first element in the tree (smallest node), which is
something heavily required for histograms. Specifically, the following
are converted to rb_root_cached, and users accordingly:

hist::entries_in_array
hist::entries_in
hist::entries
hist::entries_collapsed
hist_entry::hroot_in
hist_entry::hroot_out

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-7-dave@stgolabs.net
[ Added some missing conversions to rb_first_cached() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2019-01-25 15:12:10 +01:00
Ingo Molnar
adba163441 perf tools: Fix diverse comment typos
Go over the tools/ files that are maintained in Arnaldo's tree and
fix common typos: half of them were in comments, the other half
in JSON files.

No change in functionality intended.

Committer notes:

This was split from a larger patch as there are code that is,
additionally, maintained outside the kernel tree, so to ease
cherry-picking and/or backporting, split this into multiple patches.

Just typos in comments, no need to backport, reducing the possibility of
possible backporting artifacts.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20181203102200.GA104797@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-12-17 14:56:47 -03:00
Arnaldo Carvalho de Melo
c9d3662870 perf hists: Reimplement hists__has_callchains()
There are places where we have only access to struct hists and need to
know if any of its hist_entries has callchains, like when drawing
headers for the various output modes (stdio, TUI, etc), so, when adding
a new hist_entry, check if it has callchains, storing this info for
later use by hists__has_callchains().

This reimplementation is necessary because not always a 'struct hists'
is allocated together with a 'struct perf evsel', so we can't go from
'hists' to 'perf_event_attr.sample_type & PERF_SAMPLE_CALLCHAIN'.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-hg5g7yddjio3ljwyqnnaj5dt@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-07 14:42:27 -03:00
Arnaldo Carvalho de Melo
41477acf09 perf hists: Save the callchain_size in struct hist_entry
So that we can figure out the real size of the struct and also be able
to tell if callchains may be present in this histogram entry.

Since we can't always guarantee that from hist_entry->hists we can use
hists_to_evsel, to then look at evsel->attr.sample_type for
PERF_SAMPLE_CALLCHAIN, like with the 'perf c2c' tool, that uses plain
'struct hists' instances, we need another way of deciding if a specific
hist_entry instance has callchains associated with it, i.e. if its
hist_entry->callchain[0] has space allocated for.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-ptvndealxs1k7myluvu9flnq@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-07 14:22:53 -03:00
Arnaldo Carvalho de Melo
fabd37b837 perf hists: Check if a hist_entry has callchains before using them
So far if we use 'perf record -g' this will make
symbol_conf.use_callchain 'true' and logic will assume that all events
have callchains enabled, but ever since we added the possibility of
setting up callchains for some events (e.g.: -e
cycles/call-graph=dwarf/) while not for others, we limit usage scenarios
by looking at that symbol_conf.use_callchain global boolean, we better
look at each event attributes.

On the road to that we need to look if a hist_entry has callchains, that
is, to go from hist_entry->hists to the evsel that contains it, to then
look at evsel->sample_type for PERF_SAMPLE_CALLCHAIN.

The next step is to add a symbol_conf.ignore_callchains global, to use
in the places where what we really want to know is if callchains should
be ignored, even if present.

Then -g will mean just to select a callchain mode to be applied to all
events not explicitely setting some other callchain mode, i.e. a default
callchain mode, and --no-call-graph will set
symbol_conf.ignore_callchains with that clear intention.

That too will at some point become a per evsel thing, that tools can set
for all or just a few of its evsels.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-0sas5cm4dsw2obn75g7ruz69@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-06 12:52:03 -03:00
Arnaldo Carvalho de Melo
27de9b2bd9 perf evsel: Add has_callchain() helper to make code more compact/clear
Its common to have the (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN),
so add an evsel__has_callchain(evsel) helper.

This will actually get more uses as we check that instead of
symbol_conf.use_callchain in places where that produces the same result
but makes this decision to be more fine grained, per evsel.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-145340oytbthatpfeaq1do18@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-05 10:09:54 -03:00
Arnaldo Carvalho de Melo
362379aad5 perf tools: No need to check if the argument to __get() function is NULL
Those functions always check if the argument is NULL before trying to
grab a reference count, and also will return the received object, so, to
make code more compact, no need to check for NULL.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-i9wycjdxh0fwhryu55lmafks@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-06-04 10:28:50 -03:00
Arnaldo Carvalho de Melo
25c312dbf8 perf hists: Move hists__scnprintf_title() away from the TUI code
The previous patch made this function useful to non-TUI parts of the
tools, but left it where the function from what it was carved, so that
the patch showed more clearly the process.

Now just move it outside the TUI parts so that we can finally use it,
even when the TUI code doesn't get built/linked.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196935
Link: https://lkml.kernel.org/n/tip-hqj7hvcr3mu5lvcqp3cssio6@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-03 10:23:32 -03:00
Jiri Olsa
9f87498f1c perf tools: Add refcnt into struct mem_info
It's passed along several hists entries in --hierarchy mode, so it's
better we keep track of it.

The current fail I see is that it gets removed in hierarchy --mem-mode
mode, where it's shared in the different hierarchies, but removed from
the template hist entry, so the report crashes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180307155020.32613-6-jolsa@kernel.org
[ Rename mem_info__aloc() to mem_info__new(), to fix the typo and use the convention for constructors ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-03-08 11:30:44 -03:00
Jiri Olsa
e3ebaa4651 perf report: Fix memory corruption in --branch-history mode --branch-history
Jin Yao reported memory corrupton in perf report with
branch info used for stack trace:

  > Following command lines will cause perf crash.

  > perf record -j call -g -a <application>
  > perf report --branch-history
  >
  > *** Error in `perf': double free or corruption (!prev): 0x00000000104aa040 ***
  > ======= Backtrace: =========
  > /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725]
  > /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a]
  > /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc]
  > perf[0x51b914]
  > perf(hist_entry_iter__add+0x1e5)[0x51f305]
  > perf[0x43cf01]
  > perf[0x4fa3bf]
  > perf[0x4fa923]
  > perf[0x4fd396]
  > perf[0x4f9614]
  > perf(perf_session__process_events+0x89e)[0x4fc38e]
  > perf(cmd_report+0x15d2)[0x43f202]
  > perf[0x4a059f]
  > perf(main+0x631)[0x427b71]
  > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830]
  > perf(_start+0x29)[0x427d89]

For the cumulative output, we allocate the he_cache array based on the
--max-stack option value and populate it with data from 'callchain_cursor'.

The --max-stack option value does not ensure now the limit for number of
callchain_cursor nodes, so the cumulative iter code will allocate smaller array
than it's actually needed and cause above corruption.

I think the --max-stack limit does not apply here anyway, because we add
callchain data as normal hist entries, while the --max-stack control the limit
of single entry callchain depth.

Using the callchain_cursor.nr as he_cache array count to fix this. Also
removing struct hist_entry_iter::max_stack, because there's no longer any use
for it.

We need more fixes to ensure that the branch stack code follows properly the
logic of --max-stack, which is not the case at the moment.

Original-patch-by: Jin Yao <yao.jin@linux.intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Jin Yao <yao.jin@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180216123619.GA9945@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-02-16 14:55:47 -03:00
Ingo Molnar
15bcdc9477 Merge branch 'linus' into perf/core, to fix conflicts
Conflicts:
	tools/perf/arch/arm/annotate/instructions.c
	tools/perf/arch/arm64/annotate/instructions.c
	tools/perf/arch/powerpc/annotate/instructions.c
	tools/perf/arch/s390/annotate/instructions.c
	tools/perf/arch/x86/tests/intel-cqm.c
	tools/perf/ui/tui/progress.c
	tools/perf/util/zlib.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07 10:30:18 +01:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Milian Wolff
1fb7d06a50 perf report: Use srcline from callchain for hist entries
This also removes the symbol name from the srcline column, more on this
below.

This ensures we use the correct srcline, which could originate from a
potentially inlined function. The hist entries used to query for the
srcline based purely on the IP, which leads to wrong results for inlined
entries.

Before:

~~~~~
  perf report --inline -s srcline -g none --stdio
  ...
  # Children      Self  Source:Line
  # ........  ........  ..................................................................................................................................
  #
      94.23%     0.00%  __libc_start_main+18446603487898210537
      94.23%     0.00%  _start+41
      44.58%     0.00%  main+100
      44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
      44.58%     0.00%  std::__complex_abs+100
      44.58%     0.00%  std::abs<double>+100
      44.58%     0.00%  std::norm<double>+100
      36.01%     0.00%  hypot+18446603487892193300
      25.81%     0.00%  main+41
      25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
      25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
      25.75%    25.75%  random.h:143
      18.39%     0.00%  main+57
      18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
      18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
      13.80%    13.80%  random.tcc:3330
       5.64%     0.00%  ??:0
       4.13%     4.13%  __hypot_finite+163
       4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

After:

~~~~~
  perf report --inline -s srcline -g none --stdio
  ...
  # Children      Self  Source:Line
  # ........  ........  ...........................................
  #
      94.30%     1.19%  main.cpp:39
      94.23%     0.00%  __libc_start_main+18446603487898210537
      94.23%     0.00%  _start+41
      48.44%     1.70%  random.h:1823
      48.44%     0.00%  random.h:1814
      46.74%     2.53%  random.h:185
      44.68%     0.10%  complex:589
      44.68%     0.00%  complex:597
      44.68%     0.00%  complex:654
      44.68%     0.00%  complex:664
      40.61%    13.80%  random.tcc:3330
      36.01%     0.00%  hypot+18446603487892193300
      26.81%     0.00%  random.h:151
      26.81%     0.00%  random.h:332
      25.75%    25.75%  random.h:143
       5.64%     0.00%  ??:0
       4.13%     4.13%  __hypot_finite+163
       4.13%     0.00%  __hypot_finite+18446603487892193443
...
~~~~~

Note that this change removes the symbol from the source:line hist
column. If this information is desired, users should explicitly query
for it if needed. I.e. run this command instead:

~~~~~
  perf report --inline -s sym,srcline -g none --stdio
  ...
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 1K of event 'cycles:uppp'
  # Event count (approx.): 1381229476
  #
  # Children      Self  Symbol                                                                                                                               Source:Line
  # ........  ........  ...................................................................................................................................  ...........................................
  #
      94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
      94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
      94.23%     0.00%  [.] _start                                                                                                                           _start+41
      48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
      48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
      46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
      44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
      44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
      44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
      44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
      39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
      36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
      26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
      26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
      25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
      25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
       4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
       4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Compared to the old behavior, this reduces duplication in the output.
Before we used to print the symbol name in the srcline column even
when the sym column was explicitly requested. I.e. the output was:

~~~~~
  perf report --inline -s sym,srcline -g none --stdio
  ...
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 1K of event 'cycles:uppp'
  # Event count (approx.): 1381229476
  #
  # Children      Self  Symbol                                                                                                                               Source:Line
  # ........  ........  ...................................................................................................................................  ..................................................................................................................................
  #
      94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
      94.23%     0.00%  [.] _start                                                                                                                           _start+41
      44.58%     0.00%  [.] main                                                                                                                             main+100
      44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
      44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
      44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
      44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
      36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
      25.81%     0.00%  [.] main                                                                                                                             main+41
      25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
      25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
      25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
      18.39%     0.00%  [.] main                                                                                                                             main+57
      18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
      18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
      13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
       4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
       4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
...
~~~~~

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-10-25 10:50:46 -03:00