linux_dsm_epyc7002/tools/perf
Frederic Weisbecker aa1ab9d26a perf tools: Fix processing of randomly serialized sched traces
Currently it's possible to meet such too high latency results
with 'perf sched latency'.

 -----------------------------------------------------------------------------------
 Task              |  Runtime ms | Switches | Average delay ms | Maximum delay ms |
 -----------------------------------------------------------------------------------
 xfce4-panel       |    0.222 ms |        2 | avg: 4718.345 ms | max: 9436.493 ms |
 scsi_eh_3         |    3.962 ms |       36 | avg:   55.957 ms | max: 1977.829 ms |

The origin is on traces that are sometimes badly serialized across cpus.
For example the raw traces that raised such results for xfce4-panel:

(1)          [init]-0     [000]  1494.663899990: sched_switch: task swapper:0 [140] (R) ==> xfce4-panel:4569 [120]
(2)     xfce4-panel-4569  [000]  1494.663928373: sched_switch: task xfce4-panel:4569 [120] (S) ==> swapper:0 [140]
(3)            Xorg-4276  [001]  1494.663860125: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(4)            Xorg-4276  [001]  1504.098252756: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(5)            perf-5219  [000]  1504.100353302: sched_switch: task perf:5219 [120] (S) ==> xfce4-panel:4569 [120]

The traces are processed in the order they arrive. Then in (2),
xfce4-panel sleeps, it is first waken up in (3) and eventually
scheduled in (5).

The latency reported is then 1504 - 1495 = 9 secs, as reported by perf
sched. But this is wrong, we are confident in the fact the traces are
nicely serialized while we should actually more trust the timestamps.

If we reorder by timestamps we get:

(1)            Xorg-4276  [001]  1494.663860125: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(2)          [init]-0     [000]  1494.663899990: sched_switch: task swapper:0 [140] (R) ==> xfce4-panel:4569 [120]
(3)     xfce4-panel-4569  [000]  1494.663928373: sched_switch: task xfce4-panel:4569 [120] (S) ==> swapper:0 [140]
(4)            Xorg-4276  [001]  1504.098252756: sched_wakeup: task xfce4-panel:4569 [120] success=1 [000]
(5)            perf-5219  [000]  1504.100353302: sched_switch: task perf:5219 [120] (S) ==> xfce4-panel:4569 [120]

Now the trace make more sense, xfce4-panel is sleeping. Then it is
woken up in (1), scheduled in (2)
It goes to sleep in (3), woken up in (4) and scheduled in (5).

Now, latency captured between (1) and (2) is of 39 us.
And between (4) and (5) it is 2.1 ms.

Such pattern of bad serializing is the origin of the high latencies
reported by perf sched.

Basically, we need to check whether wake up time is higher than
schedule out time. If it's not the case, we need to tag the current
work atom as invalid.

Beside that, we may need to work later on a better ordering of the
traces given by the kernel.

After this patch:

xfce4-session     |    0.221 ms |        1 | avg:    0.538 ms | max:    0.538 ms |

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-14 15:45:10 +02:00
..
Documentation perf sched: Add 'perf sched trace', improve documentation 2009-09-13 17:55:23 +02:00
util perf sched: Implement the 'perf sched record' subcommand 2009-09-13 10:22:51 +02:00
.gitignore perf_counter tools: Move from Documentation/perf_counter/ to tools/perf/ 2009-06-06 20:33:43 +02:00
builtin-annotate.c perf tools: Librarize idle thread registration 2009-08-31 10:04:48 +02:00
builtin-help.c perf: Enable more compiler warnings 2009-08-16 10:47:47 +02:00
builtin-list.c perf list: Fix large list output by using the pager 2009-08-13 09:05:48 +02:00
builtin-record.c perf tools: Add an option to multiplex counters in a single channel 2009-09-14 09:52:23 +02:00
builtin-report.c perf tools: Librarize idle thread registration 2009-08-31 10:04:48 +02:00
builtin-sched.c perf tools: Fix processing of randomly serialized sched traces 2009-09-14 15:45:10 +02:00
builtin-stat.c perf stat: Clean up statistics calculations a bit more 2009-09-04 20:27:26 +02:00
builtin-top.c perf top: Show RIP only in verbose mode 2009-08-26 20:21:45 +02:00
builtin-trace.c perf trace: Fix parsing of perf.data 2009-09-03 16:19:57 +02:00
builtin.h perf: Add 'perf sched' tool 2009-09-13 10:22:36 +02:00
command-list.txt perf: Add 'perf sched' tool 2009-09-13 10:22:36 +02:00
CREDITS perf_counter tools: Add CREDITS file for Git contributors 2009-06-24 19:54:29 +02:00
design.txt perf_counter: Start documenting HAVE_PERF_COUNTERS requirements 2009-06-12 19:37:30 +02:00
Makefile perf: Add 'perf sched' tool 2009-09-13 10:22:36 +02:00
perf.c perf: Add 'perf sched' tool 2009-09-13 10:22:36 +02:00
perf.h perf tools: Librarize trace_event() helper 2009-08-16 23:06:45 +02:00