Commit Graph

45 Commits

Author SHA1 Message Date
Arnaldo Carvalho de Melo
30d7a77dd5 perf_counter tools: Adjust symbols in ET_EXEC files too
Ingo Molnar wrote:

> i just bisected a 'perf report' bug that would cause us to not
> resolve all user-space symbols in a 'git gc' run to:
>
> f5812a7a33 is first bad commit
> commit f5812a7a33
> Author: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date:   Tue Jun 30 11:43:17 2009 -0300
>
>     perf_counter tools: Adjust only prelinked symbol's addresses

Rename ->prelinked to ->adjust_symbols and making what was done
only for prelinked libraries also to ET_EXEC binaries, such as
/usr/bin/git:

[acme@doppio pahole]$ readelf -h /usr/bin/git | grep Type
  Type:                              EXEC (Executable file)
[acme@doppio pahole]$

And after installing the 'git-debuginfo' package, I get correct results:

[acme@doppio linux-2.6-tip]$ perf report --sort comm,dso,symbol -d /usr/bin/git | head -20

 #
 # (1139614 samples)
 #
 # Overhead           Command  Shared Object              Symbol
 # ........  ................  .........................  ......
 #
    34.98%               git  /usr/bin/git               [.] send_sideband
    33.39%               git  /usr/bin/git               [.] enter_repo
     6.81%               git  /usr/bin/git               [.] diff_opt_parse
     4.95%               git  /usr/bin/git               [.] is_repository_shallow
     3.24%               git  /usr/bin/git               [.] odb_mkstemp
     1.39%               git  /usr/bin/git               [.] output
     1.34%               git  /usr/bin/git               [.] xmmap
     1.25%               git  /usr/bin/git               [.] receive_pack_config
     1.16%               git  /usr/bin/git               [.] git_pathdup
     0.90%               git  /usr/bin/git               [.] read_object_with_reference
     0.86%               git  /usr/bin/git               [.] show_patch_diff
     0.85%               git  /usr/bin/git               0x00000000095e2e
     0.69%               git  /usr/bin/git               [.] display
[acme@doppio linux-2.6-tip]$

I'll check what are the last cases where we can't resolve symbols, like
this 0x00000000095e2e later.

And I guess this will fix the problems Mike were seeing too:

 [acme@doppio linux-2.6-tip]$ readelf -h ../build/perf/vmlinux | grep Type
   Type:                              EXEC (Executable file)
 [acme@doppio linux-2.6-tip]$

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-03 08:24:13 +02:00
Frederic Weisbecker
1e11fd82d2 perf_counter tools: Provide helper to print percents color
Among perf annotate, perf report and perf top, we can find the
common colored printing of percents according to the following
rules:

    High overhead =  > 5%, colored in red
    Mid overhead =  > 0.5%, colored in green
    Low overhead =  < 0.5%, default color

Factorize these multiple checks in a single function named
percent_color_fprintf() and also provide a get_percent_color()
for sites which print percentages and other things at the same
time.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 21:38:37 +02:00
Frederic Weisbecker
c20ab37ef3 perf_counter tools: Set the minimum percent for callchains to be displayed
Callchains output may become a burden on a trace because even
rarely hit site are exposed. This can be too much information.

Let the user set a threshold as a minimum percent of hits using
the new pattern for the -c option:

    -c mode,min_percent

Example:

$ perf report -s sym -c flat,4

     8.25%  [k] copy_user_generic_string
             4.19%
                copy_user_generic_string
                generic_file_aio_read
                do_sync_read
                vfs_read
                sys_pread64
                system_call_fastpath
                pread64

     5.39%  [k] search_by_key
     4.63%  0x00000000009e0a
     2.36%  [k] memcpy_c
[...]

$ perf report -s sym -c graph,2

     8.25%  [k] copy_user_generic_string
                |
                |--4.31%-- generic_file_aio_read
                |          do_sync_read
                |          vfs_read
                |          |
                |           --4.19%-- sys_pread64
                |                     system_call_fastpath
                |                     pread64
                |
                 --3.24%-- generic_file_buffered_write
                           __generic_file_aio_write_nolock
                           generic_file_aio_write
                           do_sync_write
                           reiserfs_file_write
                           vfs_write
                           |
                            --3.14%-- sys_pwrite64
                                      system_call_fastpath
                                      __pwrite64

     5.39%  [k] search_by_key
                |
                 --2.23%-- reiserfs_update_sd_size

     4.63%  0x00000000009e0a

     2.36%  [k] memcpy_c
[...]

You can also omit it and it will default to 0.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 21:38:37 +02:00
Frederic Weisbecker
4eb3e4788b perf report: Add support for callchain graph output
Currently, the printing of callchains is done in a single
vertical level, this is the "flat" mode:

8.25%  [k] copy_user_generic_string
             4.19%
                copy_user_generic_string
                generic_file_aio_read
                do_sync_read
                vfs_read
                sys_pread64
                system_call_fastpath
                pread64

This patch introduces a new "graph" mode which provides a
hierarchical output of factorized paths recursively sorted:

 8.25%  [k] copy_user_generic_string
                |
                |--4.31%-- generic_file_aio_read
                |          do_sync_read
                |          vfs_read
                |          |
                |          |--4.19%-- sys_pread64
                |          |          system_call_fastpath
                |          |          pread64
                |          |
                |           --0.12%-- sys_read
                |                     system_call_fastpath
                |                     __read
                |
                |--3.24%-- generic_file_buffered_write
                |          __generic_file_aio_write_nolock
                |          generic_file_aio_write
                |          do_sync_write
                |          reiserfs_file_write
                |          vfs_write
                |          |
                |          |--3.14%-- sys_pwrite64
                |          |          system_call_fastpath
                |          |          __pwrite64
                |          |
                |           --0.10%-- sys_write
[...]

The command line has then changed.

By providing the -c option, the callchain will output in the
flat mode by default.

But you can override it:

    perf report -c graph

or

    perf report -c flat

You can also pass the abreviated mode:

    perf report -c g

or

    perf report -c gra

will both make use of the graph mode.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246550301-8954-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 20:47:15 +02:00
Frederic Weisbecker
5a4b181721 perf_counter tools: Add new OPT_CALLBACK_DEFAULT option
There is no predefined macro to create an option that can have
a custom value or a default one if none is given.

This patch provides a new helper OPT_CALLBACK_DEFAULT() which
defines such kind of option.

For example, considering an option -c, we want to get the
default value in the following cases:

    perf command -c -d
    perf command -d -c

And the foo value when it's given:

    perf command -c foo -d
    perf command -d -c foo

That's also why PARSE_OPT_LASTARG_DEFAULT is extended here to
support default values whatever the position of the option, not
only in the end.

Should it now be renamed to PARSE_OPT_ARG_DEFAULT ?

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: git@vger.kernel.org
LKML-Reference: <1246550301-8954-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 20:47:14 +02:00
Frederic Weisbecker
14f4654cbd perf_counter tools: Create new chain_for_each_child() iterator
Iterating through children of a node in the callchain tree
shows something that may be quite confusing at a first glance.
The head is the children field of the parent and the list nodes
are in the brothers field of the children.

This is because the childs are linked to the parent as a list
of "brothers" using the "children" list of the parent as a
head:

  ---------------
 | Parent (head) |-------------------------------------
  ---------------                                      |
     |                                                 |
  children                                             |
     |                                                 |
  -----------               -----------                |
 | 1st child |---brother---| 2nd child |---brother-----
  -----------               -----------

This makes the following strange pattern often occuring:

 list_for_each_entry(child, &parent->children, brothers) {
        // do something with children
 }

Abstract it to chain_for_each_child() to factorize and simplify
this pattern.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246550301-8954-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 20:47:14 +02:00
Mike Galbraith
6cfcc53ed4 perf_counter tools: Connect module support infrastructure to symbol loading infrastructure
Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246514916.13293.46.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 08:42:21 +02:00
Mike Galbraith
208b4b4a59 perf_counter tools: Add infrastructure to support loading of kernel module symbols
Add infrastructure for module path discovery and section load addresses.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246514830.13293.44.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 08:42:20 +02:00
Mike Galbraith
9974f49678 perf_counter tools: Make symbol loading consistently return number of loaded symbols
perf_counter tools: Make symbol loading consistently return number of loaded symbols.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246514758.13293.42.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-02 08:42:20 +02:00
Arnaldo Carvalho de Melo
5da5025858 perf_counter tools: Share list.h with the kernel
The copy we were using came from another copy I did for the dwarves
(pahole) package, that came from the kernel years ago.

The only function that is used by the perf tools and that isn't in the
kernel is list_del_range, that I'm leaving in the perf tools only for
now.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090701174608.GA5823@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 22:37:23 +02:00
Arnaldo Carvalho de Melo
43cbcd8acb perf_counter tools: Share rbtree.with the kernel
The tools/perf/util/rbtree.c copy already drifted by three
csets:

 4b324126e0
 4c60117811
 16c047add3

So remove the copy and use the lib/rbtree.c directly, sharing
the source code while still generating a separate object file,
since tools/perf uses a far more agressive -O6 switch.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090701152837.GG15682@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 22:37:22 +02:00
Jaswinder Singh Rajput
73c24cb86c perf list: Add cache events
After:

$ ./perf list

List of pre-defined events (to be used in -e):

  cpu-cycles OR cycles                     [Hardware event]
  instructions                             [Hardware event]
  cache-references                         [Hardware event]
  cache-misses                             [Hardware event]
  branch-instructions OR branches          [Hardware event]
  branch-misses                            [Hardware event]
  bus-cycles                               [Hardware event]

  cpu-clock                                [Software event]
  task-clock                               [Software event]
  page-faults OR faults                    [Software event]
  minor-faults                             [Software event]
  major-faults                             [Software event]
  context-switches OR cs                   [Software event]
  cpu-migrations OR migrations             [Software event]

  L1-d$-loads                              [Hardware cache event]
  L1-d$-load-misses                        [Hardware cache event]
  L1-d$-stores                             [Hardware cache event]
  L1-d$-store-misses                       [Hardware cache event]
  L1-d$-prefetches                         [Hardware cache event]
  L1-d$-prefetch-misses                    [Hardware cache event]
  L1-i$-loads                              [Hardware cache event]
  L1-i$-load-misses                        [Hardware cache event]
  L1-i$-prefetches                         [Hardware cache event]
  L1-i$-prefetch-misses                    [Hardware cache event]
  LLC-loads                                [Hardware cache event]
  LLC-load-misses                          [Hardware cache event]
  LLC-stores                               [Hardware cache event]
  LLC-store-misses                         [Hardware cache event]
  LLC-prefetches                           [Hardware cache event]
  LLC-prefetch-misses                      [Hardware cache event]
  dTLB-loads                               [Hardware cache event]
  dTLB-load-misses                         [Hardware cache event]
  dTLB-stores                              [Hardware cache event]
  dTLB-store-misses                        [Hardware cache event]
  dTLB-prefetches                          [Hardware cache event]
  dTLB-prefetch-misses                     [Hardware cache event]
  iTLB-loads                               [Hardware cache event]
  iTLB-load-misses                         [Hardware cache event]
  branch-loads                             [Hardware cache event]
  branch-load-misses                       [Hardware cache event]

  rNNN                                     [raw hardware event descriptor]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1246453578.3072.1.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 15:25:03 +02:00
Ingo Molnar
f37a291c52 perf_counter tools: Add more warnings and fix/annotate them
Enable -Wextra. This found a few real bugs plus a number
of signed/unsigned type mismatches/uncleanlinesses. It
also required a few annotations

All things considered it was still worth it so lets try with
this enabled for now.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 12:49:48 +02:00
Paul Mackerras
61c45981dd perf_counter tools: Rework event string parsing/syntax
This reworks the parser for event descriptors to make it more
consistent in what it accepts.  It is now structured as a
recursive descent parser for the following grammar:

events		::= event ( ("," | space) space* event )*
event		::= ( raw_event | numeric_event | symbolic_event |
		      generic_hw_event ) [ event_modifier ]
raw_event	::= "r" hex_number
numeric_event	::= number ":" number
number		::= decimal_number | "0x" hex_number | "0" octal_number
symbolic_event	::= string_from_event_symbols_array
generic_hw_event::= cache_type ( "-" ( cache_op | cache_result ) )*
event_modifier	::= ":" ( "u" | "k" | "h" )+

with the extra restriction that you can have at most one
cache_op and at most one cache_result.

We pass the current string pointer by reference (i.e. as a
const char **) to the various parsing functions so that they
can advance the pointer to indicate how much they consumed.
They return 0 if they didn't recognize the thing at the pointer
or 1 if they did (and advance the pointer past it).

This also fixes parse_aliases to take the longest matching
alias from the table, not the first one.  Otherwise "l1-data"
would match the "l1-d" alias and the "ata" would not be
consumed.

This allows event modifiers indicating what processor modes to
count in to be applied to any event, not just numeric events,
and adds a ":h" modifier to indicate counting in hypervisor
mode.  Specifying ":u" now sets both exclude_kernel and
exclude_hv, and so on.  Multiple modes can be specified, e.g.
":uk" will count in user or hypervisor mode (i.e. only
exclude_kernel will be set).

Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <19018.53826.843815.189847@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 10:23:17 +02:00
Frederic Weisbecker
deac911cbd perf_counter tools: Various fixes for callchains
The symbol resolving has of course revealed some bugs in the
callchain tree handling. This patch fixes some of them,
including:

- inherit the children from the parents while splitting a node
- fix list range moving
- fix indexes setting in callchains
- create a child on the current node if the path doesn't match in
  the existent children (was only done on the root)
- compare using symbols when possible so that we can match a function
  using any ip inside by referring to its start address.

The practical effects are:

- remove double callchains
- fix upside down or any random order of callchains
- fix wrong paths
- fix bad hits and percentage accounts

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246419315-9968-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 09:58:52 +02:00
Frederic Weisbecker
4424961ad6 perf_counter tools: Resolve symbols in callchains
This patch resolves the names, when possible, of each ip
present in the callchains while using the -c option with perf
report.

Example:

5.40%  [k] __d_lookup
             5.37%
                perf_callchain
                perf_counter_overflow
                intel_pmu_handle_irq
                perf_counter_nmi_handler
                notifier_call_chain
                atomic_notifier_call_chain
                notify_die
                do_nmi
                nmi
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                sys_faccessat
                sys_access
                system_call_fastpath
                0x7fb609846f77

             0.01%
                perf_callchain
                perf_counter_overflow
                intel_pmu_handle_irq
                perf_counter_nmi_handler
                notifier_call_chain
                atomic_notifier_call_chain
                notify_die
                do_nmi
                nmi
                do_lookup
                __link_path_walk
                path_walk
                do_path_lookup
                user_path_at
                sys_faccessat

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246419315-9968-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 09:58:26 +02:00
Frederic Weisbecker
9198aa77b6 perf_counter tools: Fix storage size allocation of callchain list
Fix a confusion while giving the size of a callchain list
during its allocation. We are using the wrong structure size.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246419315-9968-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 09:58:23 +02:00
Arnaldo Carvalho de Melo
25903407da perf report: Add --dsos parameter
So that we can filter by dso. Symbols in other dsos won't be
accounted for.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1246399282-20934-2-git-send-email-acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 00:07:09 +02:00
Arnaldo Carvalho de Melo
f5812a7a33 perf_counter tools: Adjust only prelinked symbol's addresses
I.e. we can't handle these two kinds of files in the same way:

1) prelinked system library:

[acme@doppio pahole]$ readelf -s /usr/lib64/libdw-0.141.so | egrep 'FUNC.+GLOBAL.+dwfl_report_elf'
   278: 00000030450105a0   261 FUNC    GLOBAL DEFAULT   12 dwfl_report_elf@@ELFUTILS_0.122

2) not prelinked library with debug information from a -debuginfo package:

[acme@doppio pahole]$ readelf -s /usr/lib/debug/usr/lib64/libdw-0.141.so.debug | egrep 'FUNC.+GLOBAL.+dwfl_report_elf'
   629: 00000000000105a0   261 FUNC    GLOBAL DEFAULT   12 dwfl_report_elf
[acme@doppio pahole]$

Now the numbers I got for a pahole perf run are in line with
the numbers I get from oprofile.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090630144317.GB12663@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-30 17:09:30 +02:00
Ingo Molnar
fde953c1c6 perf_counter tools: Remove dead code
Vince Weaver reported that there's a handful of #ifdef __MINGW32__
sections in the code.

Remove them as they are in essence dead code - as unlike upstream
Git, the perf tool is unlikely to be ported to Windows.

Reported-by: Vince Weaver <vince@deater.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-27 06:06:39 +02:00
Frederic Weisbecker
8cb76d99d7 perf_counter tools: Prepare a small callchain framework
We plan to display the callchains depending on some user-configurable
parameters.

To gather the callchains stats from the recorded stream in a fast way,
this patch introduces an ad hoc radix tree adapted for callchains and also
a rbtree to sort these callchains once we have gathered every events
from the stream.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1246026481-8314-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-26 16:47:00 +02:00
Jaswinder Singh Rajput
4418351f06 perf_counter tools: Add alias for 'l1d' and 'l1i'
Add 'l1d' and 'l1i' aliases again as shortcuts - just dont make them
the primary display alias.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1245945462.9157.11.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-25 21:54:53 +02:00
Peter Zijlstra
7c6a1c65bb perf_counter tools: Rework the file format
Create a structured file format that includes the full
perf_counter_attr and all its relevant counter IDs so that
the reporting program has full information.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-25 21:39:04 +02:00
Jaswinder Singh Rajput
e5c5954779 perf_counter tools: Shorten names for events
Added new alias for events.

On AMD box:

 $ ./perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses -- ls -lR /usr/include/ > /dev/null

Before :

 Performance counter stats for 'ls -lR /usr/include/':

      248064467  L1-data-Cache-Load-Referencees  (scaled from 23.27%)
        1001433  L1-data-Cache-Load-Misses  (scaled from 23.34%)
         153691  L1-data-Cache-Store-Referencees  (scaled from 23.34%)
         423248  L1-data-Cache-Prefetch-Referencees  (scaled from 23.33%)
         302138  L1-data-Cache-Prefetch-Misses  (scaled from 23.25%)
      251217546  L1-instruction-Cache-Load-Referencees  (scaled from 23.25%)
        5757005  L1-instruction-Cache-Load-Misses  (scaled from 23.23%)
          93435  L1-instruction-Cache-Prefetch-Referencees  (scaled from 23.24%)
        6496073  L2-Cache-Load-Referencees  (scaled from 23.32%)
         609485  L2-Cache-Load-Misses  (scaled from 23.45%)
        6876991  L2-Cache-Store-Referencees  (scaled from 23.71%)
      248922840  Data-TLB-Cache-Load-Referencees  (scaled from 23.94%)
        5828386  Data-TLB-Cache-Load-Misses  (scaled from 24.17%)
      257613506  Instruction-TLB-Cache-Load-Referencees  (scaled from 24.20%)
           6833  Instruction-TLB-Cache-Load-Misses  (scaled from 23.88%)
      109043606  Branch-Cache-Load-Referencees  (scaled from 23.64%)
        5552296  Branch-Cache-Load-Misses  (scaled from 23.42%)

    0.413702461  seconds time elapsed.

After :

 Peformance counter stats for 'ls -lR /usr/include/':

      266590464  L1-d$-loads           (scaled from 23.03%)
        1222273  L1-d$-load-misses     (scaled from 23.58%)
         146204  L1-d$-stores          (scaled from 23.83%)
         406344  L1-d$-prefetches      (scaled from 24.09%)
         283748  L1-d$-prefetch-misses (scaled from 24.10%)
      249650965  L1-i$-loads           (scaled from 23.80%)
        3353961  L1-i$-load-misses     (scaled from 23.82%)
         104599  L1-i$-prefetches      (scaled from 23.68%)
        4836405  LLC-loads             (scaled from 23.67%)
         498214  LLC-load-misses       (scaled from 23.66%)
        4953994  LLC-stores            (scaled from 23.64%)
      243354097  dTLB-loads            (scaled from 23.77%)
        6468584  dTLB-load-misses      (scaled from 23.74%)
      249719549  iTLB-loads            (scaled from 23.25%)
           5060  iTLB-load-misses      (scaled from 23.00%)
      112343016  branch-loads          (scaled from 22.76%)
        5528876  branch-load-misses    (scaled from 22.54%)

    0.427154051  seconds time elapsed.

Reported-by : Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245934522.5308.39.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-25 17:30:23 +02:00
Jaswinder Singh Rajput
06813f6c74 perf_counter tools: Check for valid cache operations
Made new table for cache operartion stat 'hw_cache_stat' as:

 L1I : Read and prefetch only
 ITLB and BPU : Read-only

introduce is_cache_op_valid() for cache operation validity

And checks for valid cache operations.

Reported-by : Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245930367.5308.33.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-25 14:08:49 +02:00
Roel Kluin
f7679dabfa perf_counter tools: Fix strbuf_fread() error path handling
size_t res cannot be less than 0 - fread returns 0 on error.

[ Updated by: René Scharfe <rene.scharfe@lsrfire.ath.cx> ]

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Junio C Hamano <gitster@pobox.com>
LKML-Reference: <4A3FB479.2090902@lsrfire.ath.cx>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-24 10:22:06 +02:00
Jaswinder Singh Rajput
c0c22dbfa8 perf_counter tools: Set alias for page-faults
"faults" should be alias for "page-faults"

Also fixed alignment and 80 characters issue

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245683846.12092.1.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-22 17:25:53 +02:00
Peter Zijlstra
520f2c346a perf report: Output more symbol related debug data
Print more symbol relocation related info under -vv.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-22 17:02:07 +02:00
Jaswinder Singh Rajput
74d5b5889e perf_counter tools: Introduce alias member in event_symbol
By introducing alias member in event_symbol :

1. duplicate lines are removed, like:
   cpu-cycles and cycles
   branch-instructions and branches
   context-switches and cs
   cpu-migrations and migrations

2. We can also add alias for another events.

Now ./perf list looks like :

List of pre-defined events (to be used in -e):

  cpu-cycles OR cycles                     [Hardware event]
  instructions                             [Hardware event]
  cache-references                         [Hardware event]
  cache-misses                             [Hardware event]
  branch-instructions OR branches          [Hardware event]
  branch-misses                            [Hardware event]
  bus-cycles                               [Hardware event]

  cpu-clock                                [Software event]
  task-clock                               [Software event]
  page-faults                              [Software event]
  faults                                   [Software event]
  minor-faults                             [Software event]
  major-faults                             [Software event]
  context-switches OR cs                   [Software event]
  cpu-migrations OR migrations             [Software event]

  rNNN                                     [raw hardware event descriptor]

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245669268.17153.8.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-22 13:29:58 +02:00
Jaswinder Singh Rajput
51e2684231 perf_counter tools: Define separate declarations for H/W and S/W events
Define separate declarations for H/W and S/W events to:

 1. Shorten name to save some space so that we can add more members
 2. Fix alignment
 3. Avoid declaring HARDWARE/SOFTWARE again and again.

Removed unused CR(x, y)

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <1245669194.17153.6.camel@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-22 13:29:57 +02:00
Ingo Molnar
c1f47b454c perf_counter tools: Fix vmlinux fallback when running on a different kernel
Lucas De Marchi reported that perf report and perf annotate
displays mismatching profile if a perf.data is analyzed on
an older kernel - even if the correct vmlinux is specified
via the -k option.

The reason is the fallback path in util/symbol.c:dso__load_kernel():

int dso__load_kernel(struct dso *self, const char *vmlinux,
                     symbol_filter_t filter, int verbose)
{
        int err = -1;

        if (vmlinux)
                err = dso__load_vmlinux(self, vmlinux, filter, verbose);

        if (err)
                err = dso__load_kallsyms(self, filter, verbose);

        return err;
}

dso__load_vmlinux() returns negative on error, but on success it
returns the number of symbols loaded - which confuses the function
to load the kallsyms.

This is normally harmless, as reporting is usually performed on the
same kernel that is analyzed - but if there's a mismatch then we
load the wrong kallsyms and create a non-sensical symbol tree.

The fix is to only fall back to kallsyms on errors.

Reported-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-21 13:58:51 +02:00
Paul Mackerras
9cffa8d533 perf_counter tools: Define and use our own u64, s64 etc. definitions
On 64-bit powerpc, __u64 is defined to be unsigned long rather than
unsigned long long.  This causes compiler warnings every time we
print a __u64 value with %Lx.

Rather than changing __u64, we define our own u64 to be unsigned long
long on all architectures, and similarly s64 as signed long long.
For consistency we also define u32, s32, u16, s16, u8 and s8.  These
definitions are put in a new header, types.h, because these definitions
are needed in util/string.h and util/symbol.h.

The main change here is the mechanical change of __[us]{64,32,16,8}
to remove the "__".  The other changes are:

* Create types.h
* Include types.h in perf.h, util/string.h and util/symbol.h
* Add types.h to the LIB_H definition in Makefile
* Added (u64) casts in process_overflow_event() and print_sym_table()
  to kill two remaining warnings.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: benh@kernel.crashing.org
LKML-Reference: <19003.33494.495844.956580@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-19 18:25:47 +02:00
Peter Zijlstra
a73c7d84a1 perf_counter tools: Add and use isprint()
Introduce isprint() to print out raw event dumps to ASCII, etc.

(This is an extension to upstream Git's ctype.c.)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
[ removed openssl.h inclusion from util.h - it leaked ctype.h ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-18 09:46:00 +02:00
Peter Zijlstra
5aa75a0fd4 perf_counter tools: Replace isprint() with issane()
The Git utils came with a ctype replacement that doesn't provide
isprint(). Add a replacement.

Solves a build bug on certain distros.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 19:23:53 +02:00
Ingo Molnar
44175b6f39 perf stat: Reorganize output
- use IPC for the instruction normalization output
 - CPUs for the CPU utilization factor value.
 - print out time elapsed like the other rows
 - tidy up the task-clocks/cpu-clocks printout

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 13:40:03 +02:00
Frederic Weisbecker
301406b9c6 perf annotate: Print the filename:line for annotated colored lines
When we have a colored line in perf annotate, ie a middle/high
overhead one, it's sometimes useful to get the matching line
and filename from the source file, especially this path prepares
to another subsequent one which will print a sorted summary of
midle/high overhead lines in the beginning of the output.

Filename:Lines have the same color than the concerned ip lines.

It can be slow because it relies on addr2line. We could also
use objdump with -l but that implies we would have to bufferize
objdump output and parse it to filter the relevant lines since
we want to print a sorted summary in the beginning.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1244844682-12928-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 12:58:23 +02:00
Yong Wang
faafec1e61 perf_counter tools: Remove one L1-data alias
Otherwise all L1-instruction aliases will be recognized as
L1-data by strcasestr() when calling function parse_aliases.

Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090612031706.GA22126@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-12 13:45:09 +02:00
Peter Zijlstra
f4dbfa8f31 perf_counter: Standardize event names
Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 17:54:15 +02:00
Ingo Molnar
729ff5e2aa perf_counter tools: Clean up u64 usage
A build error slipped in:

 builtin-report.c: In function ‘hist_entry__fprintf’:
 builtin-report.c:711: error: format ‘%12d’ expects type ‘int’, but argument 3 has type ‘uint64_t’

Because we got a bit sloppy with those types. uint64_t really sucks,
because there's no printf format for it. So standardize on __u64
instead - for all types that go to or come from the ABI (which is __u64),
or for values that need to be large enough even on 32-bit.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-11 16:48:38 +02:00
Pekka Enberg
80d496be89 perf report: Add support for profiling JIT generated code
This patch adds support for profiling JIT generated code to 'perf
report'. A JIT compiler is required to generate a "/tmp/perf-$PID.map"
symbols map that is parsed when looking and displaying symbols.

Thanks to Peter Zijlstra for his help with this patch!

Example "perf report" output with the Jato JIT:

 #
 # (40311 samples)
 #
 # Overhead           Command  Shared Object              Symbol
 # ........  ................  .........................  ......
 #
     97.80%              jato  /tmp/perf-11915.map        [.] Fibonacci.fib(I)I
      0.56%              jato  00000000b7fa023b           0x000000b7fa023b
      0.45%              jato  /tmp/perf-11915.map        [.] Fibonacci.main([Ljava/lang/String;)V
      0.38%              jato  [kernel]                   [k] get_page_from_freelist
      0.06%              jato  [kernel]                   [k] kunmap_atomic
      0.05%              jato  ./jato                     [.] utf8Hash
      0.04%              jato  ./jato                     [.] executeJava
      0.04%              jato  ./jato                     [.] defineClass

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: a.p.zijlstra@chello.nl
Cc: acme@redhat.com
LKML-Reference: <Pine.LNX.4.64.0906082111590.12407@melkki.cs.Helsinki.FI>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-08 23:10:44 +02:00
Ingo Molnar
716c69feca perf top: Fall back to cpu-clock-tick hrtimer sampling if no cycle counter available
On architectures/CPUs without PMU support but with perfcounters
enabled 'perf top' currently fails because it cannot create a
cycle based hw-perfcounter.

Fall back to the cpu-clock-tick sw-perfcounter in this case, which
is hrtimer based and will always work (as long as perfcounters
is enabled).

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-07 17:31:52 +02:00
Arjan van de Ven
e9fbc9dc92 perf_counter tools: Initialize a stack variable before use
the "perf report" utility crashed in some circumstances
because the "sym" stack variable was not initialized before used
(as also proven by valgrind).

With this fix both the crash goes away and valgrind no longer complains.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-06 21:22:33 +02:00
Ingo Molnar
8953645fec perf_counter tools: Fix error condition in parse_aliases()
gcc warned about this bug:

util/parse-events.c: In function ‘parse_generic_hw_symbols’:
util/parse-events.c:175: warning: comparison is always false due to limited range of data type
util/parse-events.c:182: warning: comparison is always false due to limited range of data type
util/parse-events.c:190: warning: comparison is always false due to limited range of data type

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-06 21:09:08 +02:00
Arjan van de Ven
7d37a0cbd6 perf_counter tools: Warning fixes on 32-bit
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-06 20:46:19 +02:00
Ingo Molnar
864709302a perf_counter tools: Move from Documentation/perf_counter/ to tools/perf/
Several people have suggested that 'perf' has become a full-fledged
tool that should be moved out of Documentation/. Move it to the
(new) tools/ directory.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-06 20:33:43 +02:00