Commit Graph

285 Commits

Author SHA1 Message Date
Frederic Weisbecker
8a0ecfb8b4 perf hist: Fix missing getline declaration
hist.c needs to include util.h so that it gets stdio.h
inclusion with __GNU_SOURCE defined.

Fixes:
	util/hist.c: In function ‘hist_entry__parse_objdump_line’:
	util/hist.c:931: erreur: implicit declaration of function ‘getline’
	util/hist.c:931: erreur: nested extern declaration of ‘getline’

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1273772836-11533-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-13 16:32:58 -03:00
Arnaldo Carvalho de Melo
ef7b93a119 perf report: Librarize the annotation code and use it in the newt browser
Now we don't anymore use popen to run 'perf annotate' for the selected
symbol, instead we collect per address samplings when processing samples
in 'perf report' if we're using the newt browser, then we use this data
directly to do annotation.

Done this way we can actually traverse the objdump_line objects
directly, matching the addresses to the collected samples and colouring
them appropriately using lower level slang routines.

The new ui_browser class will be reused for the main, callchain aware,
histogram browser, when it will be made generic and don't assume that
the objects are always instances of the objdump_line class maintained
using list_heads.

Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-11 23:23:20 -03:00
Arnaldo Carvalho de Melo
b09e0190ac perf hist: Adopt filter by dso and by thread methods from the newt browser
Those are really not specific to the newt code, can be used by other UI
frontends.

Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-11 12:43:10 -03:00
Arnaldo Carvalho de Melo
fefb0b94bb perf hist: Calculate max_sym name len and nr_entries
Better done when we are adding entries, be it initially of when we're
re-sorting the histograms.

Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-10 19:49:08 -03:00
Arnaldo Carvalho de Melo
1c02c4d2e9 perf hist: Introduce hists class and move lots of methods to it
In cbbc79a we introduced support for multiple events by introducing a
new "event_stat_id" struct and then made several perf_session methods
receive a point to it instead of a pointer to perf_session, and kept the
event_stats and hists rb_tree in perf_session.

While working on the new newt based browser, I realised that it would be
better to introduce a new class, "hists" (short for "histograms"),
renaming the "event_stat_id" struct and the perf_session methods that
were really "hists" methods, as they manipulate only struct hists
members, not touching anything in the other perf_session members.

Other optimizations, such as calculating the maximum lenght of a symbol
name present in an hists instance will be possible as we add them,
avoiding a re-traversal just for finding that information.

The rationale for the name "hists" to replace "event_stat_id" is that we
may have multiple sets of hists for the same event_stat id, as, for
instance, the 'perf diff' tool has, so event stat id is not what
characterizes what this struct and the functions that manipulate it do.

Cc: Eric B Munson <ebmunson@us.ibm.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-10 13:13:49 -03:00
Arnaldo Carvalho de Melo
232a5c948d perf report: Allow limiting the number of entries to print in callchains
Works by adding a third parameter to the '-g' argument, after the graph
type and minimum percentage, for example:

[root@doppio linux-2.6-tip]# perf report -g fractal,0.5,2

Will show only the first two symbols where at least 0.5% of the samples
took place.

All the other symbols that don't fall outside these constraints will be
put together in the last entry, prefixed with "[...]" and the total
percentage for them.

Suggested-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-09 21:15:35 -03:00
Arnaldo Carvalho de Melo
28e2a106d1 perf hist: Simplify the insertion of new hist_entry instances
And with that fix at least one bug:

The first hit for an entry, the one that calls malloc to create a new
instance in __perf_session__add_hist_entry, wasn't adding the count to
the per cpumode (PERF_RECORD_MISC_USER, etc) total variable.

Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-09 13:10:39 -03:00
Zhang, Yanmin
a1645ce12a perf: 'perf kvm' tool for monitoring guest performance from host
Here is the patch of userspace perf tool.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-19 12:37:24 +03:00
Frederic Weisbecker
fcd1498405 perf tools: Fix accidentally preprocessed snprintf callback
struct sort_entry has a callback named snprintf that turns an
entry into a string result.
But there are glibc versions that implement snprintf through a
macro. The following expression is then going to get the snprintf
call preprocessed:

        ent->snprintf(...)

to finally end up in a build error:

        util/hist.c: Dans la fonction «hist_entry__snprintf» :
        util/hist.c:539: erreur: «struct sort_entry» has no member named «__builtin___snprintf_chk»

To fix this, prepend struct sort_entry callbacks with an "se_"
prefix.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-14 16:59:21 -03:00
Arnaldo Carvalho de Melo
b9fb930477 perf hist: Only allocate callchain_node if processing callchains
The struct callchain_node size is 120 bytes, that are never used when
there are no callchains or '-g none' is specified, so conditionally
allocate it, reducing sizeof(struct hist_entry) from 210 bytes to only
96, greatly speeding the non-callchain processing.

LKML-Reference: <new-submission>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-02 16:28:28 -03:00
Arnaldo Carvalho de Melo
a4e3b956a8 perf hist: Replace ->print() routines by ->snprintf() equivalents
Then hist_entry__fprintf will just us the newly introduced
hist_entry__snprintf, add the newline and fprintf it to the supplied
FILE descriptor.

This allows us to remove the use_browser checking in the color_printf
routines, that now got color_snprintf variants too.

The newt TUI browser (and other GUIs that may come in the future) don't
have to worry about stdio specific stuff in the strings they get from
the se->snprintf routines and instead use whatever means to do the
equivalent.

Also the newt TUI browser don't have to use the fmemopen() hack, instead
it can use the se->snprintf routines directly. For now tho use the
hist_entry__snprintf routine to reduce the patch size.

Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-02 16:28:15 -03:00
Arnaldo Carvalho de Melo
5f4d3f8816 perf report: Add progress bars
For when we are processing the events and inserting the entries in the
browser.

Experimentation here: naming "ui_something" we may be treading into
creating a TUI/GUI set of routines that can then be implemented in terms
of multiple backends.

Also the time it takes for adding things to the "browser" takes, visually
(I guess I should do some profiling here ;-) ), more time than for
processing the events...

That means we probably need to create a custom hist_entry browser, so
that we reuse the structures we have in place instead of duplicating
them in newt.

But progress was made and at least we can see something while long files
are being loaded, that must be one of UI 101 bullet points :-)

Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-02 16:27:55 -03:00
Arnaldo Carvalho de Melo
c6e718ff8c perf symbols: Move more map_groups methods to map.c
While writing a standalone test app that uses the symbol system to
find kernel space symbols I noticed these also need to be moved.

Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-04-02 16:27:39 -03:00
Arnaldo Carvalho de Melo
b3c9ac0846 perf callchains: Store the map together with the symbol
We need this to know where a symbol in a callchain came from,
for various reasons, among them precise annotation from a
TUI/GUI tool.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1269459619-982-5-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26 08:52:57 +01:00
Arnaldo Carvalho de Melo
59fd53062f perf tools: Introduce struct map_symbol
That will be in both struct hist_entry and struct
callchain_list, so that the TUI can store a pointer to the pair
(map, symbol) in the trees where hist_entries and
callchain_lists are present, to allow precise annotation instead
of looking for the first symbol with the selected name.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1269459619-982-4-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-26 08:52:57 +01:00
Frederic Weisbecker
301fde27c7 perf: Fix orphan callchain branches
Callchains have markers inside their capture to tell we
enter a context (kernel, user, ...).

Those are not displayed in the callchains but they are
incidentally an active part of the radix tree where
callchains are stored, just like any other address.

If we have the two following callchains:

addr1 -> addr2 -> user context -> addr3
addr1 -> addr2 -> user context -> addr4
addr1 -> addr2 -> addr 5

This is pretty common if addr1 and addr2 are part of an
interrupt path, addr3 and addr4 are user addresses and
addr5 is a kernel non interrupt path.

This will be stored as follows in the tree:

                   addr1
                   addr2
                   /   \
                  /     addr5
            user context
               /    \
             addr3  addr4

But we ignore the context markers in the report, hence
the addr3 and addr4 will appear as orphan branches:

    |--28.30%-- hrtimer_interrupt
    |          smp_apic_timer_interrupt
    |          apic_timer_interrupt
    |          |           <------------- here, no parent!
    |          |          |
    |          |          |--11.11%-- 0x7fae7bccb875
    |          |          |
    |          |          |--11.11%-- 0xffffffffff60013b
    |          |          |
    |          |          |--11.11%-- __pthread_mutex_lock_internal
    |          |          |
    |          |          |--11.11%-- __errno_location

Fix this by removing the context markers when we process the
callchains to the tree.

Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1269274173-20328-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-22 18:47:34 +01:00
Ingo Molnar
d2f1e15b66 Merge commit 'v2.6.34-rc2' into perf/core
Merge reason: Pick up latest perf fixes from upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-22 18:47:01 +01:00
Linus Torvalds
f82c37e7bb Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (35 commits)
  perf: Fix unexported generic perf_arch_fetch_caller_regs
  perf record: Don't try to find buildids in a zero sized file
  perf: export perf_trace_regs and perf_arch_fetch_caller_regs
  perf, x86: Fix hw_perf_enable() event assignment
  perf, ppc: Fix compile error due to new cpu notifiers
  perf: Make the install relative to DESTDIR if specified
  kprobes: Calculate the index correctly when freeing the out-of-line execution slot
  perf tools: Fix sparse CPU numbering related bugs
  perf_event: Fix oops triggered by cpu offline/online
  perf: Drop the obsolete profile naming for trace events
  perf: Take a hot regs snapshot for trace events
  perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot
  perf/x86-64: Use frame pointer to walk on irq and process stacks
  lockdep: Move lock events under lockdep recursion protection
  perf report: Print the map table just after samples for which no map was found
  perf report: Add multiple event support
  perf session: Change perf_session post processing functions to take histogram tree
  perf session: Add storage for seperating event types in report
  perf session: Change add_hist_entry to take the tree root instead of session
  perf record: Add ID and to recorded event data when recording multiple events
  ...
2010-03-18 16:52:46 -07:00
Arnaldo Carvalho de Melo
3997d3776a perf hist: Don't fprintf the callgraph unconditionally
[root@doppio ~]# perf report -i newt.data | head -10
  # Samples: 11999679868
  #
  # Overhead  Command                  Shared Object  Symbol
  # ........  .......  .............................  ......
  #
      63.61%     perf  libslang.so.2.1.4              [.] SLsmg_write_chars
       6.30%     perf  perf                           [.] symbols__find
       2.19%     perf  libnewt.so.0.52.10             [.] newtListboxAppendEntry
       2.08%     perf  libslang.so.2.1.4              [.] SLsmg_write_chars@plt
       1.99%     perf  libc-2.10.2.so                 [.] _IO_vfprintf_internal
  [root@doppio ~]#

Not good, the newt form for report works, but slang has to eat
the cost of the additional callgraph lines everytime it prints a
line, and the callgraph doesn't appear on the screen, so move
the callgraph printing to a separate function and don't use it
in newt.c.

Newt tree widgets are being investigated to properly support
callgraphs, but till that gets merged, lets remove this huge
overhead and show at least the symbol overheads for a callgraph
rich perf.data with good performance.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1268408808-13595-2-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-12 20:31:53 +01:00
Arnaldo Carvalho de Melo
dd2ee78dd8 perf tools: Add missing bytes printed in hist_entry__fprintf
We need those to properly size the browser widht in the newt
TUI.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1268349164-5822-4-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-12 10:28:37 +01:00
Arnaldo Carvalho de Melo
65f2ed2b2f perf report: Print the map table just after samples for which no map was found
If -vv is used just the map table will be printed, -vvv will
print the symbol table too, with it we can see that we have a
bug where some samples are not being resolved to a map when we
get them in the perf.data stream, but after we have it all
processed, we can find the right map, some reordering probably
is happening.

Upcoming patches will provide ways to ask for most PERF_SAMPLE_
conditional samples to be taken for !PERF_RECORD_SAMPLE events
too, then we'll be able to ask for PERF_SAMPLE_TIME and
PERF_SAMPLE_CPU to help diagnose this.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1268161097-17761-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-10 13:53:52 +01:00
Eric B Munson
eefc465cdd perf session: Change perf_session post processing functions to take histogram tree
Now that report can store historgrams for multiple events we
need to be able to do the post processing work for each
histogram. This patch changes the post processing functions so
that they can be called individually for each event's histogram.

Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
[ Guarantee bisectabilty by fixing up builtin-report.c ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1267804269-22660-5-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-10 13:53:49 +01:00
Eric B Munson
d403d0acc9 perf session: Change add_hist_entry to take the tree root instead of session
In order to minimize the impact of storing multiple events in a
report this function will now take the root of the histogram
tree so that the logic for selecting the proper tree can be
inserted before the call.

Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1267804269-22660-3-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-10 13:53:47 +01:00
Daniel Mack
3ad2f3fbb9 tree-wide: Assorted spelling fixes
In particular, several occurances of funny versions of 'success',
'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
'beginning', 'desirable', 'separate' and 'necessary' are fixed.

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Cc: Joe Perches <joe@perches.com>
Cc: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-02-09 11:13:56 +01:00
Arnaldo Carvalho de Melo
9b33827de6 perf diff: Percent calcs should use double values
Otherwise we do integer math and the delta values round up to
multiples of 1.0%.

Also, calculate absolute values. Things look precise now:

$ perf report -i perf.data.old --sort dso,symbol | head -13
     9.02%  libc-2.10.1.so               [.] _IO_vfprintf_internal
     4.88%  find                         [.] 0x00000000014af0
     2.91%  [kernel]                     [k] __kmalloc
     2.85%  [kernel]                     [k] ext4_htree_store_dirent
     2.50%  libc-2.10.1.so               [.] __GI_memmove
     2.44%  [kernel]                     [k] half_md4_transform
     2.43%  [kernel]                     [k] _spin_lock
     2.33%  [kernel]                     [k] system_call
$ perf report -i perf.data --sort dso,symbol | head -13
     8.55%  libc-2.10.1.so               [.] _IO_vfprintf_internal
     3.11%  [kernel]                     [k] __kmalloc
     3.07%  [kernel]                     [k] ext4_htree_store_dirent
     2.66%  find                         [.] 0x00000000016bcf
     2.61%  [kernel]                     [k] _atomic_dec_and_lock
     2.46%  [kernel]                     [k] half_md4_transform
     2.41%  libc-2.10.1.so               [.] __GI_memmove
     2.30%  find                         [.] 0x00000000009219
$ perf diff | head -13
     9.02%     -0.47%  libc-2.10.1.so               [.] _IO_vfprintf_internal
     2.91%     +0.20%  [kernel]                     [k] __kmalloc
     2.85%     +0.23%  [kernel]                     [k] ext4_htree_store_dirent
     1.99%     +0.62%  [kernel]                     [k] _atomic_dec_and_lock
     2.44%     +0.02%  [kernel]                     [k] half_md4_transform
     2.50%     -0.09%  libc-2.10.1.so               [.] __GI_memmove
     1.88%     +0.01%  [kernel]                     [k] __d_lookup
     2.43%     -0.75%  [kernel]                     [k] _spin_lock
     0.97%     +0.62%  [kernel]                     [k] path_get
     1.99%     -0.42%  libc-2.10.1.so               [.] _int_malloc
$

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1260981109-2621-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-16 18:29:10 +01:00
Arnaldo Carvalho de Melo
c351c28161 perf diff: Use perf_session__fprintf_hists just like 'perf record'
That means that almost everything you can do with 'perf report'
can be done with 'perf diff', for instance:

$ perf record -f find / > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.062 MB perf.data (~2699
samples) ] $ perf record -f find / > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.062 MB perf.data (~2687
samples) ] perf diff | head -8
     9.02%     +1.00%     find  libc-2.10.1.so               [.] _IO_vfprintf_internal
     2.91%     -1.00%     find  [kernel]                     [k] __kmalloc
     2.85%     -1.00%     find  [kernel]                     [k] ext4_htree_store_dirent
     1.99%     -1.00%     find  [kernel]                     [k] _atomic_dec_and_lock
     2.44%                find  [kernel]                     [k] half_md4_transform
$

So if you want to zoom into libc:

$ perf diff --dsos libc-2.10.1.so | head -8
    37.34%                find  [.] _IO_vfprintf_internal
    10.34%                find  [.] __GI_memmove
     8.25%     +2.00%     find  [.] _int_malloc
     5.07%     -1.00%     find  [.] __GI_mempcpy
     7.62%     +2.00%     find  [.] _int_free
$

And if there were multiple commands using libc, it is also
possible to aggregate them all by using --sort symbol:

$ perf diff --dsos libc-2.10.1.so --sort symbol | head -8
    37.34%             [.] _IO_vfprintf_internal
    10.34%             [.] __GI_memmove
     8.25%     +2.00%  [.] _int_malloc
     5.07%     -1.00%  [.] __GI_mempcpy
     7.62%     +2.00%  [.] _int_free
$

The displacement column now is off by default, to use it:

perf diff -m --dsos libc-2.10.1.so --sort symbol | head -8
    37.34%                   [.] _IO_vfprintf_internal
    10.34%                   [.] __GI_memmove
     8.25%     +2.00%        [.] _int_malloc
     5.07%     -1.00%    +2  [.] __GI_mempcpy
     7.62%     +2.00%    -1  [.] _int_free
$

Using -t/--field-separator can be used for scripting:

$ perf diff -t, -m --dsos libc-2.10.1.so --sort symbol | head -8
37.34, , ,[.] _IO_vfprintf_internal
10.34, , ,[.] __GI_memmove
8.25,+2.00%, ,[.] _int_malloc
5.07,-1.00%,  +2,[.] __GI_mempcpy
7.62,+2.00%,  -1,[.] _int_free
6.99,+1.00%,  -1,[.] _IO_new_file_xsputn
1.89,-2.00%,  +4,[.] __readdir64
$

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1260978567-550-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-16 16:53:37 +01:00
Arnaldo Carvalho de Melo
3e6055ab98 perf session: Move perf report specific hits out of perf_session__fprintf_hists
Those don't make sense for tools such as 'perf diff'.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1260973631-28035-2-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-16 16:51:50 +01:00
Arnaldo Carvalho de Melo
4ecf84d086 perf tools: Move hist entries printing routines from perf report
Will be used in other tools such as 'perf diff'.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1260973631-28035-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-16 16:51:50 +01:00
Arnaldo Carvalho de Melo
d599db3fc5 perf report: Generalize perf_session__fprintf_hists()
Pull it out of builtin-report - further changes will be made and it
will then be reusable in 'perf diff' as well.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1260914682-29652-4-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-16 08:53:50 +01:00
Arnaldo Carvalho de Melo
4e4f06e4c8 perf session: Move the hist_entries rb tree to perf_session
As we'll need to sort multiple times for multiple perf sessions,
so that we can then do a diff.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1260803439-16783-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 16:57:18 +01:00
Arnaldo Carvalho de Melo
b9bf089212 perf tools: No need for three rb_trees for sorting hist entries
All hist entries are in only one of them, so use just one and a
temporary rb_root while sorting/collapsing.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1260797831-11220-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-14 16:57:17 +01:00
Arnaldo Carvalho de Melo
1ed091c45a perf tools: Consolidate symbol resolving across all tools
Now we have a very high level routine for simple tools to
process IP sample events:

	int event__preprocess_sample(const event_t *self,
				     struct addr_location *al,
				     symbol_filter_t filter)

It receives the event itself and will insert new threads in the
global threads list and resolve the map and symbol, filling all
this info into the new addr_location struct, so that tools like
annotate and report can further process the event by creating
hist_entries in their specific way (with or without callgraphs,
etc).

It in turn uses the new next layer function:

	void thread__find_addr_location(struct thread *self, u8 cpumode,
					enum map_type type, u64 addr,
					struct addr_location *al,
					symbol_filter_t filter)

This one will, given a thread (userspace or the kernel kthread
one), will find the given type (MAP__FUNCTION now, MAP__VARIABLE
too in the near future) at the given cpumode, taking vdsos into
account (userspace hit, but kernel symbol) and will fill all
these details in the addr_location given.

Tools that need a more compact API for plain function
resolution, like 'kmem', can use this other one:

	struct symbol *thread__find_function(struct thread *self, u64 addr,
					     symbol_filter_t filter)

So, to resolve a kernel symbol, that is all the 'kmem' tool
needs, its just a matter of calling:

	sym = thread__find_function(kthread, addr, NULL);

The 'filter' parameter is needed because we do lazy
parsing/loading of ELF symtabs or /proc/kallsyms.

With this we remove more code duplication all around, which is
always good, huh? :-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1259346563-12568-12-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-27 20:22:02 +01:00
Arnaldo Carvalho de Melo
62daacb51a perf tools: Reorganize event processing routines, lotsa dups killed
While implementing event__preprocess_sample, that will do all of
the symbol lookup in one convenient function, I noticed that
util/process_event.[ch] were not being used at all, then started
looking if there were other functions that could be shared
and...

All those functions really don't need to receive offset + head,
the only thing they did was common to all of them, so do it at
one place instead.

Stats about number of each type of event processed now is done
in a central place.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1259346563-12568-11-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-27 20:22:01 +01:00
Arnaldo Carvalho de Melo
9735abf11b perf tools: Move hist_entry__add common code to hist.c
Now perf report and annotate do the callgraph/hit processing in
their specialized hist_entry__add functions.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-03 16:01:59 +02:00
John Kacur
3d1d07ecd2 perf tools: Put common histogram functions in their own file
Move histogram related functions into their own files (hist.c and
hist.h) and make use of them in builtin-annotate.c and
builtin-report.c.

Signed-off-by: John Kacur <jkacur@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <alpine.LFD.2.00.0909281531180.8316@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-30 13:57:56 +02:00