mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-25 19:45:07 +07:00
4cb93446c5
There is an upper limit to what tooling considers a valid callchain, and it was tied to the hardcoded value in the kernel, PERF_MAX_STACK_DEPTH (127). Now that this can be tuned via a sysctl, make the tooling read the sysctl and use that as the upper limit, falling back to PERF_MAX_STACK_DEPTH for kernels where this sysctl isn't present.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-yjqsd30nnkogvj5oyx9ghir9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
276 lines
7.5 KiB
Plaintext
perf-top(1)
===========

NAME
----
perf-top - System profiling tool.

SYNOPSIS
--------
[verse]
'perf top' [-e <EVENT> | --event=EVENT] [<options>]

DESCRIPTION
-----------
This command generates and displays a performance counter profile in real time.


OPTIONS
-------
-a::
--all-cpus::
System-wide collection. (default)

-c <count>::
--count=<count>::
Event period to sample.

-C <cpu-list>::
--cpu=<cpu>::
Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a
comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
Default is to monitor all CPUs.
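
The CPU list syntax described above can be illustrated with a small Python sketch. This is not how perf parses the option; the function name `parse_cpu_list` is hypothetical, and the strict rejection of whitespace is an assumption based on the "no space" wording.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0,1" or "0-2,5" into a sorted list of ints.

    Hypothetical helper for illustration only; perf has its own parser.
    """
    def to_cpu(text):
        # Reject empty fields, signs and embedded spaces ("no space" rule).
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"bad CPU number: {text!r}")
        return int(text)

    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            lo, hi = to_cpu(first), to_cpu(last)
            if lo > hi:
                raise ValueError(f"bad CPU range: {part!r}")
            cpus.update(range(lo, hi + 1))
        else:
            cpus.add(to_cpu(part))
    return sorted(cpus)


if __name__ == "__main__":
    print(parse_cpu_list("0,1"))    # comma-separated list
    print(parse_cpu_list("0-2,5"))  # range combined with a single CPU
```

Returning a sorted, de-duplicated list keeps overlapping specifications such as "0-2,1" harmless.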

-d <seconds>::
--delay=<seconds>::
Number of seconds to delay between refreshes.

-e <event>::
--event=<event>::
Select the PMU event. Selection can be a symbolic event name
(use 'perf list' to list all events) or a raw PMU
event (eventsel+umask) in the form of rNNN where NNN is a
hexadecimal event descriptor.
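
As a rough illustration of the rNNN form, the Python sketch below decodes a raw event string into its numeric descriptor. The function name `parse_raw_event` is hypothetical, and the split into an event-select byte (bits 0-7) and a unit-mask byte (bits 8-15) assumes the common x86 layout; other architectures encode raw events differently.

```python
import re


def parse_raw_event(name):
    """Decode a raw event string such as "r1a8" (illustrative only).

    Assumes the x86 convention: low byte is the event select,
    next byte is the unit mask. This layout is an assumption.
    """
    match = re.fullmatch(r"r([0-9a-fA-F]+)", name)
    if match is None:
        raise ValueError(f"not a raw event descriptor: {name!r}")
    config = int(match.group(1), 16)
    return {
        "config": config,
        "eventsel": config & 0xFF,
        "umask": (config >> 8) & 0xFF,
    }


if __name__ == "__main__":
    event = parse_raw_event("r1a8")
    print({key: hex(value) for key, value in event.items()})
```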

-E <entries>::
--entries=<entries>::
Display this many functions.

-f <count>::
--count-filter=<count>::
Only display functions with more events than this.

--group::
Put the counters into a counter group.

-F <freq>::
--freq=<freq>::
Profile at this frequency.

-i::
--no-inherit::
Child tasks do not inherit counters.

-k <path>::
--vmlinux=<path>::
Path to vmlinux. Required for annotation functionality.

-m <pages>::
--mmap-pages=<pages>::
Number of mmap data pages (must be a power of two) or size
specification with appended unit character - B/K/M/G. The
size is rounded up to the nearest power-of-two number of pages.
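
The rounding rule for size specifications can be sketched in Python as follows. This is an illustration of the rule as documented here, not perf's implementation: the function name `mmap_pages`, the 4096-byte default page size, and rejecting a bare page count that is not a power of two are all assumptions.

```python
# Unit suffixes named in the option description.
UNITS = {"B": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def mmap_pages(spec, page_size=4096):
    """Turn an --mmap-pages style value into a page count (sketch only).

    page_size=4096 is an assumed default; real systems may differ.
    """
    unit = spec[-1:]
    if unit in UNITS:
        size = int(spec[:-1]) * UNITS[unit]
        if size <= 0:
            raise ValueError(f"size must be positive: {spec!r}")
        pages = -(-size // page_size)  # ceiling division
        rounded = 1
        while rounded < pages:         # round up to a power of two
            rounded <<= 1
        return rounded
    pages = int(spec)
    if pages <= 0 or pages & (pages - 1):
        raise ValueError(f"page count must be a power of two: {spec!r}")
    return pages


if __name__ == "__main__":
    print(mmap_pages("128"))   # plain page count
    print(mmap_pages("512K"))  # 512 KiB with 4 KiB pages
    print(mmap_pages("3M"))    # 768 pages, rounded up to a power of two
```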

-p <pid>::
--pid=<pid>::
Profile events on existing Process ID (comma separated list).

-t <tid>::
--tid=<tid>::
Profile events on existing thread ID (comma separated list).

-u::
--uid=::
Record events in threads owned by uid. Name or number.

-r <priority>::
--realtime=<priority>::
Collect data with this RT SCHED_FIFO priority.

--sym-annotate=<symbol>::
Annotate this symbol.

-K::
--hide_kernel_symbols::
Hide kernel symbols.

-U::
--hide_user_symbols::
Hide user symbols.

--demangle-kernel::
Demangle kernel symbols.

-D::
--dump-symtab::
Dump the symbol table used for profiling.

-v::
--verbose::
Be more verbose (show counter open errors, etc).

-z::
--zero::
Zero history across display updates.

-s::
--sort::
Sort by key(s): pid, comm, dso, symbol, parent, srcline, weight,
local_weight, abort, in_tx, transaction, overhead, sample, period.
Please see description of --sort in the perf-report man page.

--fields=::
Specify output field - multiple keys can be specified in CSV format.
Following fields are available:
overhead, overhead_sys, overhead_us, overhead_children, sample and period.
Also it can contain any sort key(s).

By default, every sort key not specified in --fields will be appended
automatically.

-n::
--show-nr-samples::
Show a column with the number of samples.

--show-total-period::
Show a column with the sum of periods.

--dsos::
Only consider symbols in these dsos. This option will affect the
percentage of the overhead column. See --percentage for more info.

--comms::
Only consider symbols in these comms. This option will affect the
percentage of the overhead column. See --percentage for more info.

--symbols::
Only consider these symbols. This option will affect the
percentage of the overhead column. See --percentage for more info.

-M::
--disassembler-style=:: Set disassembler style for objdump.

--source::
Interleave source code with assembly code. Enabled by default,
disable with --no-source.

--asm-raw::
Show raw instruction encoding of assembly instructions.

-g::
Enables call-graph (stack chain/backtrace) recording.

--call-graph [mode,type,min[,limit],order[,key][,branch]]::
Setup and enable call-graph (stack chain/backtrace) recording,
implies -g. See `--call-graph` section in perf-record and
perf-report man pages for details.

--children::
Accumulate callchain of children to parent entry so that they can
show up in the output. The output will have a new "Children" column
and will be sorted on the data. It requires the -g/--call-graph option
to be enabled. See the `overhead calculation' section for more details.

--max-stack::
Set the stack depth limit when parsing the callchain, anything
beyond the specified depth will be ignored. This is a trade-off
between information loss and faster processing especially for
workloads that can have a very long callchain stack.

Default: /proc/sys/kernel/perf_event_max_stack when present, 127 otherwise.
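
The default selection described above can be sketched in Python. This mirrors the documented behavior only and is not the tool's code; the function names `default_max_stack` and `truncate_callchain` are hypothetical.

```python
from pathlib import Path

PERF_MAX_STACK_DEPTH = 127  # fallback value named in this page
SYSCTL_PATH = "/proc/sys/kernel/perf_event_max_stack"


def default_max_stack(path=SYSCTL_PATH):
    """Return the sysctl value when readable, else the hardcoded fallback."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Sysctl missing (older kernel) or unreadable: use the old limit.
        return PERF_MAX_STACK_DEPTH


def truncate_callchain(frames, max_stack):
    """Ignore every frame beyond the requested depth."""
    return frames[:max_stack]


if __name__ == "__main__":
    limit = default_max_stack()
    print("default --max-stack:", limit)
    print(len(truncate_callchain(list(range(500)), limit)))
```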

--ignore-callees=<regex>::
Ignore callees of the function(s) matching the given regex.
This has the effect of collecting the callers of each such
function into one place in the call-graph tree.

--percent-limit::
Do not show entries which have an overhead under that percent.
(Default: 0).

--percentage::
Determine how to display the overhead percentage of filtered entries.
Filters can be applied by --comms, --dsos and/or --symbols options and
Zoom operations on the TUI (thread, dso, etc).

"relative" means it's relative to filtered entries only so that the
sum of shown entries will be always 100%. "absolute" means it retains
the original value before and after the filter is applied.
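
The difference between the two modes can be shown with a short Python sketch. The function `overhead_percentages` and its inputs are hypothetical; it only restates the arithmetic described above and does not reflect how the tool stores its data.

```python
def overhead_percentages(periods, keep, mode="relative"):
    """Compute the overhead column for the entries that pass a filter.

    periods: dict mapping an entry name to its sampled period.
    keep:    predicate selecting the entries that remain after filtering.
    mode:    "relative" or "absolute", as in --percentage.
    """
    if mode not in ("relative", "absolute"):
        raise ValueError(f"unknown mode: {mode!r}")
    shown = {name: p for name, p in periods.items() if keep(name)}
    if mode == "relative":
        base = sum(shown.values())    # only the filtered entries
    else:
        base = sum(periods.values())  # everything, filtered or not
    if base == 0:
        return {name: 0.0 for name in shown}
    return {name: 100.0 * p / base for name, p in shown.items()}


if __name__ == "__main__":
    periods = {"a": 50, "b": 30, "c": 20}

    def keep(name):
        return name != "c"

    print(overhead_percentages(periods, keep, "absolute"))
    print(overhead_percentages(periods, keep, "relative"))
```

In relative mode the shown values always sum to 100%, while in absolute mode the share of the hidden entry simply goes missing from the column.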

-w::
--column-widths=<width[,width...]>::
Force each column width to the provided list, for large terminal
readability. 0 means no limit (default behavior).

--proc-map-timeout::
When processing the /proc/XXX/maps files of pre-existing threads, it may
take a long time, because the file may be huge. A timeout is needed
in such cases.
This option sets the timeout limit. The default value is 500 ms.


-b::
--branch-any::
Enable taken branch stack sampling. Any type of taken branch may be sampled.
This is a shortcut for --branch-filter any. See --branch-filter for more information.

-j::
--branch-filter::
Enable taken branch stack sampling. Each sample captures a series of consecutive
taken branches. The number of branches captured with each sample depends on the
underlying hardware, the type of branches of interest, and the executed code.
It is possible to select the types of branches captured by enabling filters.
For a full list of modifiers please see the perf record manpage.

The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
The privilege levels may be omitted, in which case, the privilege levels of the associated
event are applied to the branch filter. Both kernel (k) and hypervisor (hv) privilege
levels are subject to permissions. When sampling on multiple events, branch stack sampling
is enabled for all the sampling events. The sampled branch type is the same for all events.
The various filters must be specified as a comma separated list: --branch-filter any_ret,u,k
Note that this feature may not be available on all processors.
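
The rule that a filter list needs at least one branch type, with privilege levels optional, can be sketched in Python as below. The function `parse_branch_filter` is hypothetical, and the accepted names are limited to those that appear in this page (the perf record manpage lists more), so treat the sets as assumptions rather than the full grammar.

```python
# Only the names mentioned in this page; the full list is longer.
BRANCH_TYPES = {"any", "any_call", "any_ret", "ind_call", "cond"}
PRIV_LEVELS = {"u", "k", "hv"}


def parse_branch_filter(spec):
    """Split a comma separated filter into branch types and privilege levels."""
    types, privs = [], []
    for token in spec.split(","):
        if token in BRANCH_TYPES:
            types.append(token)
        elif token in PRIV_LEVELS:
            privs.append(token)
        else:
            raise ValueError(f"unknown branch filter: {token!r}")
    if not types:
        raise ValueError("at least one branch type is required")
    # An empty privs list means: inherit the levels of the associated event.
    return types, privs


if __name__ == "__main__":
    print(parse_branch_filter("any_ret,u,k"))
    print(parse_branch_filter("any"))
```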

--raw-trace::
When displaying traceevent output, do not use print fmt or plugins.

--hierarchy::
Enable hierarchy output.

INTERACTIVE PROMPTING KEYS
--------------------------

[d]::
Display refresh delay.

[e]::
Number of entries to display.

[E]::
Event to display when multiple counters are active.

[f]::
Profile display filter (>= hit count).

[F]::
Annotation display filter (>= % of total).

[s]::
Annotate symbol.

[S]::
Stop annotation, return to full profile display.

[z]::
Toggle event count zeroing across display updates.

[qQ]::
Quit.

Pressing any unmapped key displays a menu, and prompts for input.

include::callchain-overhead-calculation.txt[]

SEE ALSO
--------
linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-report[1]