mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2025-01-25 12:19:30 +07:00
dd41f660c0
Counterpart of --switch-on:

  # perf record -e sched:*,syscalls:sys_*_nanosleep sleep 1
  [ perf record: Woken up 36 times to write data ]
  [ perf record: Captured and wrote 0.032 MB perf.data (10 samples) ]
  #
  # perf script
            :20918 20918 [002] 109866.143696: sched:sched_waking: comm=perf pid=20919 prio=120 target_cpu=001
            :20918 20918 [002] 109866.143702: sched:sched_wakeup: perf:20919 [120] success=1 CPU:001
             sleep 20919 [001] 109866.144081: sched:sched_process_exec: filename=/usr/bin/sleep pid=20919 old_pid=20919
             sleep 20919 [001] 109866.144408: syscalls:sys_enter_nanosleep: rqtp: 0x7ffc2384fef0, rmtp: 0x00000000
             sleep 20919 [001] 109866.144411: sched:sched_stat_runtime: comm=sleep pid=20919 runtime=521249 [ns] vruntime=202919398131 [n>
             sleep 20919 [001] 109866.144412: sched:sched_switch: sleep:20919 [120] S ==> swapper/1:0 [120]
           swapper     0 [001] 109867.144568: sched:sched_waking: comm=sleep pid=20919 prio=120 target_cpu=001
           swapper     0 [001] 109867.144586: sched:sched_wakeup: sleep:20919 [120] success=1 CPU:001
             sleep 20919 [001] 109867.144614: syscalls:sys_exit_nanosleep: 0x0
             sleep 20919 [001] 109867.144753: sched:sched_process_exit: comm=sleep pid=20919 prio=120
  #
  # perf script --switch-off syscalls:sys_exit_nanosleep
            :20918 20918 [002] 109866.143696: sched:sched_waking: comm=perf pid=20919 prio=120 target_cpu=001
            :20918 20918 [002] 109866.143702: sched:sched_wakeup: perf:20919 [120] success=1 CPU:001
             sleep 20919 [001] 109866.144081: sched:sched_process_exec: filename=/usr/bin/sleep pid=20919 old_pid=20919
             sleep 20919 [001] 109866.144408: syscalls:sys_enter_nanosleep: rqtp: 0x7ffc2384fef0, rmtp: 0x00000000
             sleep 20919 [001] 109866.144411: sched:sched_stat_runtime: comm=sleep pid=20919 runtime=521249 [ns] vruntime=202919398131 [n>
             sleep 20919 [001] 109866.144412: sched:sched_switch: sleep:20919 [120] S ==> swapper/1:0 [120]
           swapper     0 [001] 109867.144568: sched:sched_waking: comm=sleep pid=20919 prio=120 target_cpu=001
           swapper     0 [001] 109867.144586: sched:sched_wakeup: sleep:20919 [120] success=1 CPU:001
             sleep 20919 [001] 109867.144753: sched:sched_process_exit: comm=sleep pid=20919 prio=120
  #
  # perf script --switch-on syscalls:sys_enter_nanosleep --switch-off syscalls:sys_exit_nanosleep
             sleep 20919 [001] 109866.144411: sched:sched_stat_runtime: comm=sleep pid=20919 runtime=521249 [ns] vruntime=202919398131 [n>
             sleep 20919 [001] 109866.144412: sched:sched_switch: sleep:20919 [120] S ==> swapper/1:0 [120]
           swapper     0 [001] 109867.144568: sched:sched_waking: comm=sleep pid=20919 prio=120 target_cpu=001
           swapper     0 [001] 109867.144586: sched:sched_wakeup: sleep:20919 [120] success=1 CPU:001
  #
  # perf script --switch-on syscalls:sys_enter_nanosleep --switch-off syscalls:sys_exit_nanosleep --show-on-off
             sleep 20919 [001] 109866.144408: syscalls:sys_enter_nanosleep: rqtp: 0x7ffc2384fef0, rmtp: 0x00000000
             sleep 20919 [001] 109866.144411: sched:sched_stat_runtime: comm=sleep pid=20919 runtime=521249 [ns] vruntime=202919398131 [n>
             sleep 20919 [001] 109866.144412: sched:sched_switch: sleep:20919 [120] S ==> swapper/1:0 [120]
           swapper     0 [001] 109867.144568: sched:sched_waking: comm=sleep pid=20919 prio=120 target_cpu=001
           swapper     0 [001] 109867.144586: sched:sched_wakeup: sleep:20919 [120] success=1 CPU:001
             sleep 20919 [001] 109867.144614: syscalls:sys_exit_nanosleep: 0x0
  #

Now think about using this together with 'perf probe' to create custom
on/off events in your app :-)

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Link: https://lkml.kernel.org/n/tip-li3j01c4tmj9kw6ydsl8swej@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
433 lines
15 KiB
Plaintext
perf-script(1)
=============

NAME
----
perf-script - Read perf.data (created by perf record) and display trace output

SYNOPSIS
--------
[verse]
'perf script' [<options>]
'perf script' [<options>] record <script> [<record-options>] <command>
'perf script' [<options>] report <script> [script-args]
'perf script' [<options>] <script> <required-script-args> [<record-options>] <command>
'perf script' [<options>] <top-script> [script-args]

DESCRIPTION
-----------
This command reads the input file and displays the trace recorded.

There are several variants of perf script:

'perf script' to see a detailed trace of the workload that was
recorded.

You can also run a set of pre-canned scripts that aggregate and
summarize the raw trace data in various ways (the list of scripts is
available via 'perf script -l'). The following variants allow you to
record and run those scripts:

'perf script record <script> <command>' to record the events required
for 'perf script report'. <script> is the name displayed in the
output of 'perf script --list', i.e. the actual script name minus any
language extension. If <command> is not specified, the events are
recorded using the -a (system-wide) 'perf record' option.

'perf script report <script> [args]' to run and display the results
of <script>. <script> is the name displayed in the output of 'perf
script --list', i.e. the actual script name minus any language
extension. The perf.data output from a previous run of 'perf script
record <script>' is used and must be present for this command to
succeed. [args] refers to the (mostly optional) args expected by
the script.

'perf script <script> <required-script-args> <command>' to both
record the events required for <script> and run <script> in
'live-mode', i.e. without writing anything to disk. <script>
is the name displayed in the output of 'perf script --list', i.e. the
actual script name minus any language extension. If <command> is
not specified, the events are recorded using the -a (system-wide)
'perf record' option. If <script> has any required args, they
should be specified before <command>. This mode doesn't allow
optional script args to be specified; if optional script args are
desired, they can be specified using separate 'perf script record'
and 'perf script report' commands, with the stdout of the record step
piped to the stdin of the report script, using the '-o -' and '-i -'
options of the corresponding commands.

'perf script <top-script>' to both record the events required for
<top-script> and run <top-script> in 'live-mode', i.e. without
writing anything to disk. <top-script> is the name displayed in the
output of 'perf script --list', i.e. the actual script name minus any
language extension; a <top-script> is defined as any script name
ending with the string 'top'.

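The naming rule for <top-script> can be illustrated with a short Python sketch. It models only the rule as stated here, assuming the listed name is the script's file name without its directory and language extension; it does not reproduce how perf actually discovers its scripts, and the file names used below are examples.

```python
import os


def listed_name(script_file: str) -> str:
    """Name as shown by 'perf script --list' under the rule described above:
    the script's file name with any directory and language extension removed."""
    base = os.path.basename(script_file)
    root, _ext = os.path.splitext(base)
    return root


def is_top_script(script_file: str) -> bool:
    """A <top-script> is any script whose listed name ends with 'top'."""
    return listed_name(script_file).endswith("top")
```

For example, `is_top_script("rwtop.pl")` is true, while `is_top_script("syscall-counts.py")` is false.
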
[<record-options>] can be passed to the record steps of the 'perf script
record' and 'live-mode' variants; however, this isn't possible for the
<top-script> 'live-mode' or 'perf script report' variants.

See the 'SEE ALSO' section for links to language-specific
information on how to write and run your own trace scripts.

OPTIONS
-------
<command>...::
	Any command you can specify in a shell.

-D::
--dump-raw-trace::
	Display verbose dump of the trace data.

-L::
--Latency::
	Show latency attributes (irqs/preemption disabled, etc).

-l::
--list::
	Display a list of available trace scripts.

-s ['lang']::
--script=::
	Process trace data with the given script ([lang]:script[.ext]).
	If the string 'lang' is specified in place of a script name, a
	list of supported languages will be displayed instead.

-g::
--gen-script=::
	Generate perf-script.[ext] starter script for given language,
	using current perf.data.

-a::
	Force system-wide collection. Scripts run without a <command>
	normally use -a by default, while scripts run with a <command>
	normally don't; this option allows the latter to be run in
	system-wide mode.

-i::
--input=::
	Input file name. (default: perf.data unless stdin is a fifo)

-d::
--debug-mode::
	Perform various checks, such as sample ordering and lost events.

-F::
--fields::
	Comma-separated list of fields to print. Options are:
	comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
	srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output, brstackinsn,
	brstackoff, callindent, insn, insnlen, synth, phys_addr, metric, misc, srccode, ipc.
	The field list can be prepended with the type (trace, sw or hw)
	to indicate which event type the field list applies to,
	e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace

		perf script -F <fields>

	is equivalent to:

		perf script -F trace:<fields> -F sw:<fields> -F hw:<fields>

	i.e., the specified fields apply to all event types if the type string
	is not given.

	In addition to overriding fields, it is also possible to add or remove
	fields from the defaults. For example

		-F -cpu,+insn

	removes the cpu field and adds the insn field. Adding/removing fields
	cannot be mixed with normal overriding.

	The arguments are processed in the order received. A later usage can
	reset a prior request, e.g.:

		-F trace: -F comm,tid,time,ip,sym

	The first -F suppresses trace events (field list is ""), but then the
	second invocation sets the fields to comm,tid,time,ip,sym. In this case a
	warning is given to the user:

		"Overriding previous field request for all events."

	Alternatively, consider the order:

		-F comm,tid,time,ip,sym -F trace:

	The first -F sets the fields for all events and the second -F
	suppresses trace events. The user is given a warning message about
	the override, and the result of the above is that only S/W and H/W
	events are displayed with the given fields.

	It's possible to add/remove fields only for a specific event type:

		-Fsw:-cpu,-period

	removes cpu and period from software events.

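The ordering rules above can be modelled as a sequence of updates to one field set per event type. The following Python sketch is a hypothetical model of the documented behaviour, not perf's implementation: the default field sets are invented for illustration, and the per-type validity checks and override warnings described in this section are not modelled.

```python
from typing import Dict, Set

EVENT_TYPES = ("trace", "sw", "hw")

# Hypothetical default field sets, for illustration only; perf's real
# defaults differ and depend on the recorded events.
DEFAULTS: Dict[str, Set[str]] = {
    "trace": {"comm", "tid", "cpu", "time", "event", "trace"},
    "sw": {"comm", "tid", "cpu", "time", "period", "event", "ip", "sym"},
    "hw": {"comm", "tid", "cpu", "time", "period", "event", "ip", "sym"},
}


def apply_fields_arg(current: Dict[str, Set[str]], arg: str) -> Dict[str, Set[str]]:
    """Apply one -F argument to the per-type field sets and return new sets."""
    if arg == "":
        raise ValueError('-F "" is not allowed')
    result = {t: set(fields) for t, fields in current.items()}

    # An optional "type:" prefix restricts the change to one event type.
    prefix, sep, rest = arg.partition(":")
    if sep and prefix in EVENT_TYPES:
        types, spec = (prefix,), rest
    elif sep:
        raise ValueError(f"unknown event type: {prefix!r}")
    else:
        types, spec = EVENT_TYPES, arg

    items = [f for f in spec.split(",") if f]
    modifiers = [f[0] in "+-" for f in items]
    if any(modifiers) and not all(modifiers):
        raise ValueError("adding/removing fields cannot be mixed with overriding")

    for t in types:
        if items and all(modifiers):
            # "+field" adds to and "-field" removes from the current set.
            for f in items:
                if f[0] == "+":
                    result[t].add(f[1:])
                else:
                    result[t].discard(f[1:])
        else:
            # A plain list overrides; "type:" with an empty list suppresses that type.
            result[t] = set(items)
    return result
```

Applying "trace:" and then "comm,tid,time,ip,sym" leaves every type with comm,tid,time,ip,sym, while the reverse order leaves trace events with no fields, matching the two orderings discussed above.
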
	For the 'wildcard' option, if a user-selected field is invalid for an
	event type, a message is displayed to the user that the option is
	ignored for that type. For example:

		$ perf script -F comm,tid,trace
		'trace' not valid for hardware events. Ignoring.
		'trace' not valid for software events. Ignoring.

	Alternatively, if the type is given and an invalid field is specified,
	it is an error. For example:

		perf script -v -F sw:comm,tid,trace
		'trace' not valid for software events.

	At this point usage is displayed, and perf-script exits.

	The flags field is synthesized and may have a value when doing Instruction
	Trace decoding. The flags are "bcrosyiABEx" which stand for branch,
	call, return, conditional, system, asynchronous, interrupt,
	transaction abort, trace begin, trace end, and in transaction,
	respectively. Known combinations of flags are printed more nicely, e.g.
	"call" for "bc", "return" for "br", "jcc" for "bo", "jmp" for "b",
	"int" for "bci", "iret" for "bri", "syscall" for "bcs", "sysret" for "brs",
	"async" for "by", "hw int" for "bcyi", "tx abrt" for "bA", "tr strt" for "bB",
	"tr end" for "bE". However, the "x" flag will be displayed separately in those
	cases, e.g. "jcc (x)" for a conditional branch within a transaction.

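The mapping from flag letters to the readable names listed above can be sketched as a table lookup on the set of letters, with "x" split off and appended. This Python sketch assumes an exact match on the letter set and falls back to the raw letters for unlisted combinations; perf's own fallback formatting may differ.

```python
FLAG_NAMES = {
    "b": "branch", "c": "call", "r": "return", "o": "conditional",
    "s": "system", "y": "asynchronous", "i": "interrupt",
    "A": "transaction abort", "B": "trace begin", "E": "trace end",
    "x": "in transaction",
}

# The known combinations listed above, keyed by their set of flag letters.
NICE_NAMES = {
    frozenset("bc"): "call", frozenset("br"): "return",
    frozenset("bo"): "jcc", frozenset("b"): "jmp",
    frozenset("bci"): "int", frozenset("bri"): "iret",
    frozenset("bcs"): "syscall", frozenset("brs"): "sysret",
    frozenset("by"): "async", frozenset("bcyi"): "hw int",
    frozenset("bA"): "tx abrt", frozenset("bB"): "tr strt",
    frozenset("bE"): "tr end",
}


def describe_flags(flags: str) -> str:
    """Return the readable name for a flags string, e.g. "bo" -> "jcc"."""
    letters = set(flags)
    unknown = letters - set(FLAG_NAMES)
    if unknown:
        raise ValueError(f"unknown flag letters: {sorted(unknown)}")
    # "x" (in transaction) is not part of the combination; it is shown separately.
    name = NICE_NAMES.get(frozenset(letters - {"x"}))
    if name is None:
        return flags  # no known combination: fall back to the raw letters
    return f"{name} (x)" if "x" in letters else name
```

With this sketch, `describe_flags("box")` gives "jcc (x)", the conditional-branch-in-transaction case mentioned above.
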
	The callindent field is synthesized and may have a value when doing
	Instruction Trace decoding. For calls and returns, it will display the
	name of the symbol indented with spaces to reflect the stack depth.

	When doing instruction trace decoding, insn and insnlen give the
	instruction bytes and the instruction length of the current
	instruction.

	The synth field is used by synthesized events which may be created when
	doing Instruction Trace decoding.

	The ipc (instructions per cycle) field is synthesized and may have a value when
	doing Instruction Trace decoding.

	Finally, a user may not set fields to none for all event types,
	i.e., -F "" is not allowed.

	The brstack output includes branch-related information with raw addresses using the
	/v/v/v/v/cycles syntax in the following order:

		FROM: branch source instruction
		TO  : branch target instruction
		M/P/-: M=branch target mispredicted or branch direction was mispredicted, P=target predicted or direction predicted, -=not supported
		X/- : X=branch inside a transactional region, -=not in transaction region or not supported
		A/- : A=TSX abort entry, -=not aborted region or not supported
		cycles

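One brstack entry can be split back into its six parts as a quick way to see the field order above. This Python sketch assumes hexadecimal addresses with a "0x" prefix and whitespace-separated entries; the sample addresses are made up, and the parser is an illustration rather than a tool shipped with perf.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BranchEntry:
    from_addr: int
    to_addr: int
    mispredicted: Optional[bool]  # None when the M/P field is "-" (not supported)
    in_tx: bool      # "-" means not in a transaction *or* not supported
    aborted: bool    # "-" means not an aborted region *or* not supported
    cycles: int


def parse_brstack_entry(entry: str) -> BranchEntry:
    """Parse one FROM/TO/M-P/X/A/cycles entry, e.g. "0x400530/0x4005c0/P/-/-/0"."""
    parts = entry.split("/")
    if len(parts) != 6:
        raise ValueError(f"expected 6 '/'-separated fields: {entry!r}")
    frm, to, pred, tx, abort, cycles = parts
    if pred not in ("M", "P", "-"):
        raise ValueError(f"bad M/P field: {pred!r}")
    mispredicted = {"M": True, "P": False}.get(pred)  # None for "-"
    return BranchEntry(int(frm, 16), int(to, 16), mispredicted,
                       tx == "X", abort == "A", int(cycles))


def parse_brstack(field: str) -> List[BranchEntry]:
    """Parse a whole brstack field, assuming entries are whitespace-separated."""
    return [parse_brstack_entry(e) for e in field.split()]
```

Note that "-" in the X and A positions cannot distinguish "no" from "not supported", so the sketch reduces both to False.
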
	The brstacksym field is identical to brstack, except that the FROM and TO addresses are printed in a symbolic form if possible.

	When brstackinsn is specified, the full assembler sequences of the branch sequences for each sample
	are printed. This is the full execution path leading to the sample. This is only supported when the
	sample was recorded with perf record -b or -j any.

	The brstackoff field will print an offset into a specific dso/binary.

	With the metric option, perf script can compute metrics for
	sampling periods, similar to perf stat. This requires
	specifying a group with multiple events defining metrics with the :S option
	for perf record. perf will sample on the first event, and
	print computed metrics for all the events in the group. Please note
	that the metric computed is averaged over the whole sampling
	period (since the last sample), not just for the sample point.

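To make the averaging concrete: assuming each sample of a group such as '{cycles,instructions}:S' carries running totals for every event in the group, a metric like instructions per cycle is the ratio of the counter deltas since the previous sample. The Python sketch below uses hypothetical event names and values and is not perf's metric engine; it only shows why the result describes the whole period rather than the sample point.

```python
from typing import Dict, List


def per_period_ratio(samples: List[Dict[str, int]], num: str, den: str) -> List[float]:
    """For each sample after the first, return delta(num) / delta(den) over the
    period since the previous sample (NaN if den did not advance)."""
    ratios = []
    for prev, cur in zip(samples, samples[1:]):
        d_num = cur[num] - prev[num]
        d_den = cur[den] - prev[den]
        ratios.append(d_num / d_den if d_den else float("nan"))
    return ratios
```

For running totals of (cycles, instructions) of (0, 0), (1000, 2000) and (3000, 3000), the sketch reports 2.0 and then 0.5 instructions per cycle for the two periods.
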
	For sample events it's possible to display the misc field with the -F +misc option;
	the following letters are displayed for each bit:

		PERF_RECORD_MISC_KERNEL               K
		PERF_RECORD_MISC_USER                 U
		PERF_RECORD_MISC_HYPERVISOR           H
		PERF_RECORD_MISC_GUEST_KERNEL         G
		PERF_RECORD_MISC_GUEST_USER           g
		PERF_RECORD_MISC_MMAP_DATA*           M
		PERF_RECORD_MISC_COMM_EXEC            E
		PERF_RECORD_MISC_SWITCH_OUT           S
		PERF_RECORD_MISC_SWITCH_OUT_PREEMPT   Sp

		$ perf script -F +misc ...
		 sched-messaging  1414 K     28690.636582:       4590 cycles ...
		 sched-messaging  1407 U     28690.636600:     325620 cycles ...
		 sched-messaging  1414 K     28690.636608:      19473 cycles ...
		misc field ____________/

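Although the table above speaks of bits, the first five entries are not independent bits: in linux/perf_event.h they are values of a 3-bit cpumode field selected with PERF_RECORD_MISC_CPUMODE_MASK. A minimal Python sketch of that decoding follows. The remaining letters (M, E, S, Sp) come from higher bits whose meaning depends on the record type and are deliberately left out, and returning "" for an unknown cpumode is an assumption of the sketch.

```python
# Values from the PERF_RECORD_MISC_* definitions in linux/perf_event.h.
PERF_RECORD_MISC_CPUMODE_MASK = 0x7
CPUMODE_LETTERS = {
    1: "K",  # PERF_RECORD_MISC_KERNEL
    2: "U",  # PERF_RECORD_MISC_USER
    3: "H",  # PERF_RECORD_MISC_HYPERVISOR
    4: "G",  # PERF_RECORD_MISC_GUEST_KERNEL
    5: "g",  # PERF_RECORD_MISC_GUEST_USER
}


def cpumode_letter(misc: int) -> str:
    """Letter for the cpumode part of a misc value; "" if unknown (assumption)."""
    return CPUMODE_LETTERS.get(misc & PERF_RECORD_MISC_CPUMODE_MASK, "")
```

Masking first means that higher flag bits set in the same misc value do not change the K/U/H/G/g letter.
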
-k::
--vmlinux=<file>::
	vmlinux pathname

--kallsyms=<file>::
	kallsyms pathname

--symfs=<directory>::
	Look for files with symbols relative to this directory.

-G::
--hide-call-graph::
	When printing symbols, do not display the call chain.

--stop-bt::
	Stop display of callgraph at these symbols.

-C::
--cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
	be provided as a comma-separated list with no space: 0,1. Ranges of
	CPUs are specified with -: 0-2. Default is to report samples on all
	CPUs.

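The CPU list syntax above can be illustrated with a small Python parser. This is a sketch of the documented "0,1" and "0-2" forms only, not perf's own CPU-map parser, which may accept other inputs; rejecting spaces follows the "with no space" wording.

```python
import re
from typing import List


def parse_cpu_list(spec: str) -> List[int]:
    """Expand a -C/--cpu list such as "0,1" or "0-2,5" into sorted CPU numbers."""
    cpus = set()
    for item in spec.split(","):
        # Each item is "N" or "N-M"; spaces are not allowed.
        m = re.fullmatch(r"(\d+)(?:-(\d+))?", item)
        if m is None:
            raise ValueError(f"bad CPU list item: {item!r}")
        lo = int(m.group(1))
        hi = int(m.group(2)) if m.group(2) is not None else lo
        if hi < lo:
            raise ValueError(f"descending range: {item!r}")
        cpus.update(range(lo, hi + 1))
    return sorted(cpus)
```

For example, "0-2,5" expands to CPUs 0, 1, 2 and 5.
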
-c::
--comms=::
	Only display events for these comms. CSV that understands
	file://filename entries.

--pid=::
	Only show events for given process ID (comma-separated list).

--tid=::
	Only show events for given thread ID (comma-separated list).

-I::
--show-info::
	Display extended information about the perf.data file. This adds
	information which may be very large and thus may clutter the display.
	It currently includes the CPU and NUMA topology of the host system.
	It can only be used with the perf script report mode.

--show-kernel-path::
	Try to resolve the path of [kernel.kallsyms].

--show-task-events::
	Display task-related events (e.g. FORK, COMM, EXIT).

--show-mmap-events::
	Display mmap-related events (e.g. MMAP, MMAP2).

--show-namespace-events::
	Display namespace events, i.e. events of type PERF_RECORD_NAMESPACES.

--show-switch-events::
	Display context switch events, i.e. events of type PERF_RECORD_SWITCH or
	PERF_RECORD_SWITCH_CPU_WIDE.

--show-lost-events::
	Display lost events, i.e. events of type PERF_RECORD_LOST.

--show-round-events::
	Display finished round events, i.e. events of type PERF_RECORD_FINISHED_ROUND.

--show-bpf-events::
	Display bpf events, i.e. events of type PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT.

--demangle::
	Demangle symbol names to human-readable form. It's enabled by default;
	disable with --no-demangle.

--demangle-kernel::
	Demangle kernel symbol names to human-readable form (for C++ kernels).

--header::
	Show perf.data header.

--header-only::
	Show only perf.data header.

--itrace::
	Options for decoding instruction tracing data. The options are:

include::itrace.txt[]

	To disable decoding entirely, use --no-itrace.

--full-source-path::
	Show the full path for source files for srcline output.

--max-stack::
	Set the stack depth limit when parsing the callchain; anything
	beyond the specified depth will be ignored. This is a trade-off
	between information loss and faster processing, especially for
	workloads that can have a very long callchain stack.
	Note that when using the --itrace option, the synthesized callchain size
	will override this value if the synthesized callchain size is bigger.

	Default: 127

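The interaction between --max-stack and --itrace reduces to taking the larger of the two limits. A hypothetical Python sketch of that rule follows; the parameter names are illustrative and are not perf options.

```python
from typing import List, Sequence

DEFAULT_MAX_STACK = 127


def effective_max_stack(max_stack: int = DEFAULT_MAX_STACK,
                        itrace: bool = False,
                        synth_callchain_size: int = 0) -> int:
    """Depth limit after the --itrace override described above."""
    if itrace and synth_callchain_size > max_stack:
        return synth_callchain_size
    return max_stack


def truncate_callchain(frames: Sequence[str], depth: int) -> List[str]:
    """Keep at most `depth` frames; deeper frames are ignored."""
    return list(frames[:depth])
```

A smaller limit trades lost outer frames for faster processing of very deep callchains.
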
--ns::
	Use 9 decimal places when displaying time (i.e. show the nanoseconds).

-f::
--force::
	Don't do ownership validation.

--time::
	Only analyze samples within the given time window: <start>,<stop>. Times
	have the format seconds.nanoseconds. If start is not given (i.e. time
	string is ',x.y') then analysis starts at the beginning of the file. If
	stop time is not given (i.e. time string is 'x.y,') then analysis goes
	to the end of the file. Multiple ranges can be separated by spaces, which
	requires the argument to be quoted, e.g. --time "1234.567,1234.789 1235,"

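The window syntax above can be sketched as a small Python parser plus a membership test. Timestamps are kept as integer nanoseconds to avoid float rounding, an empty start or stop means the range is open on that side, and treating both ends as inclusive is an assumption of the sketch rather than something this page states.

```python
from typing import List, Optional, Tuple

NS_PER_SEC = 10**9
TimeRange = Tuple[Optional[int], Optional[int]]


def parse_ns(text: str) -> int:
    """Convert "seconds.nanoseconds" to integer nanoseconds."""
    sec, _, frac = text.partition(".")
    if len(frac) > 9:
        raise ValueError(f"more than 9 fractional digits: {text!r}")
    return int(sec) * NS_PER_SEC + int(frac.ljust(9, "0"))


def parse_time_ranges(arg: str) -> List[TimeRange]:
    """Parse '<start>,<stop>' ranges separated by spaces; None means open-ended."""
    ranges = []
    for token in arg.split():
        start, sep, stop = token.partition(",")
        if not sep:
            raise ValueError(f"missing ',' in {token!r}")
        ranges.append((parse_ns(start) if start else None,
                       parse_ns(stop) if stop else None))
    return ranges


def in_ranges(ts_ns: int, ranges: List[TimeRange]) -> bool:
    """True if ts_ns falls in any range (both ends treated as inclusive)."""
    return any((start is None or ts_ns >= start) and (stop is None or ts_ns <= stop)
               for start, stop in ranges)
```

With the quoted example "1234.567,1234.789 1235,", a sample at 1234.6 is kept, one at 1234.9 is skipped, and anything from 1235 onwards is kept.
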
	Time percentages are also supported, with multiple time ranges. The time
	string is 'a%/n,b%/m,...' or 'a%-b%,c%-d%,...'.

	For example:
	Select the second 10% time slice:

		perf script --time 10%/2

	Select from 0% to 10% time slice:

		perf script --time 0%-10%

	Select the first and second 10% time slices:

		perf script --time 10%/1,10%/2

	Select from 0% to 10% and 30% to 40% slices:

		perf script --time 0%-10%,30%-40%

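The two percentage forms can be turned into absolute windows as in the following Python sketch: 'a%/n' is read as the n-th (1-based) slice of width a%, and 'a%-b%' as an explicit range. Measuring percentages between the first and last timestamps of the trace, and the rounding of slice boundaries, are assumptions here; perf's exact boundary handling may differ.

```python
from typing import List, Tuple


def _pct(text: str) -> float:
    if not text.endswith("%"):
        raise ValueError(f"expected a percentage: {text!r}")
    return float(text[:-1])


def percent_windows(spec: str, first_ns: int, last_ns: int) -> List[Tuple[int, int]]:
    """Turn 'a%/n,...' or 'a%-b%,...' into absolute (start, stop) windows,
    measured between the first and last timestamps of the trace."""
    span = last_ns - first_ns
    windows = []
    for item in spec.split(","):
        if "/" in item:
            # a%/n: the n-th slice when the trace is cut into a% slices.
            pct_s, n_s = item.split("/")
            pct, n = _pct(pct_s), int(n_s)
            if n < 1 or pct * n > 100:
                raise ValueError(f"slice out of range: {item!r}")
            lo, hi = pct * (n - 1), pct * n
        else:
            # a%-b%: an explicit percentage range.
            lo_s, hi_s = item.split("-")
            lo, hi = _pct(lo_s), _pct(hi_s)
        windows.append((first_ns + round(span * lo / 100),
                        first_ns + round(span * hi / 100)))
    return windows
```

For a trace running from 0 to 1000 ns, "10%/2" maps to (100, 200), i.e. the second 10% slice, and "10%/1,10%/2" to the two adjacent slices (0, 100) and (100, 200).
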
--max-blocks::
	Set the maximum number of program blocks to print with brstackinsn for
	each sample.

--reltime::
	Print time stamps relative to trace start.

--per-event-dump::
	Create per-event files with a "perf.data.EVENT.dump" name instead of
	printing to stdout; this is useful, for instance, for generating flamegraphs.

--inline::
	If a callgraph address belongs to an inlined function, the inline stack
	will be printed. Each entry has function name and file/line. Enabled by
	default; disable with --no-inline.

--insn-trace::
	Show instruction stream for intel_pt traces. Combine with --xed to
	show disassembly.

--xed::
	Run xed disassembler on output. Requires installing the xed disassembler.

--call-trace::
	Show call stream for intel_pt traces. The CPUs are interleaved, but
	can be filtered with -C.

--call-ret-trace::
	Show call and return stream for intel_pt traces.

--graph-function::
	For itrace, only show the specified functions and their callees.
	Multiple functions can be separated by commas.

--switch-on EVENT_NAME::
	Only consider events after this event is found.

--switch-off EVENT_NAME::
	Stop considering events after this event is found.

--show-on-off-events::
	Show the --switch-on/off events too.

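The on/off window can be modelled as a two-state filter. The Python sketch below assumes both --switch-on and --switch-off are given, identifies events by name only, and assumes that a later occurrence of the on event opens a new window; it mirrors the option descriptions above rather than perf's source.

```python
from typing import Iterable, List


def switch_filter(events: Iterable[str], on: str, off: str,
                  show_on_off: bool = False) -> List[str]:
    """Keep only the events between an 'on' event and the next 'off' event."""
    shown: List[str] = []
    discarding = True  # nothing is considered until the 'on' event is seen
    for ev in events:
        if discarding:
            if ev == on:
                discarding = False
                if show_on_off:
                    shown.append(ev)
            continue
        if ev == off:
            discarding = True  # stop considering events until 'on' recurs
            if show_on_off:
                shown.append(ev)
            continue
        shown.append(ev)
    return shown
```

Applied to a sequence of sched and syscall event names with on = "syscalls:sys_enter_nanosleep" and off = "syscalls:sys_exit_nanosleep", the filter keeps only the events recorded while the nanosleep was in progress, plus the two boundary events when show_on_off is true.
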
SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-script-perl[1],
linkperf:perf-script-python[1]