mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-01 08:06:50 +07:00
8b2a5dac78
Disabling interrupts in trace_clock_local takes quite a performance hit to the recording of traces. Using perf top we see: ------------------------------------------------------------------------------ PerfTop: 244 irqs/sec kernel:100.0% [1000Hz cpu-clock-msecs], (all, 4 CPUs) ------------------------------------------------------------------------------ samples pcnt kernel function _______ _____ _______________ 2842.00 - 40.4% : trace_clock_local 1043.00 - 14.8% : rb_reserve_next_event 784.00 - 11.1% : ring_buffer_lock_reserve 600.00 - 8.5% : __rb_reserve_next 579.00 - 8.2% : rb_end_commit 440.00 - 6.3% : ring_buffer_unlock_commit 290.00 - 4.1% : ring_buffer_producer_thread [ring_buffer_benchmark] 155.00 - 2.2% : debug_smp_processor_id 117.00 - 1.7% : trace_recursive_unlock 103.00 - 1.5% : ring_buffer_event_data 28.00 - 0.4% : do_gettimeofday 22.00 - 0.3% : _spin_unlock_irq 14.00 - 0.2% : native_read_tsc 11.00 - 0.2% : getnstimeofday Where trace_clock_local is 40% of the tracing, and the time for recording a trace according to ring_buffer_benchmark is 210ns. After converting the interrupts to preemption disabling we have from perf top: ------------------------------------------------------------------------------ PerfTop: 1084 irqs/sec kernel:99.9% [1000Hz cpu-clock-msecs], (all, 4 CPUs) ------------------------------------------------------------------------------ samples pcnt kernel function _______ _____ _______________ 1277.00 - 16.8% : native_read_tsc 1148.00 - 15.1% : rb_reserve_next_event 896.00 - 11.8% : ring_buffer_lock_reserve 688.00 - 9.1% : __rb_reserve_next 664.00 - 8.8% : rb_end_commit 563.00 - 7.4% : ring_buffer_unlock_commit 508.00 - 6.7% : _spin_unlock_irq 365.00 - 4.8% : debug_smp_processor_id 321.00 - 4.2% : trace_clock_local 303.00 - 4.0% : ring_buffer_producer_thread [ring_buffer_benchmark] 273.00 - 3.6% : native_sched_clock 122.00 - 1.6% : trace_recursive_unlock 113.00 - 1.5% : sched_clock 101.00 - 1.3% : ring_buffer_event_data 53.00 - 0.7% : tick_nohz_stop_sched_tick Where trace_clock_local drops from 40% to only taking 4% of the total time. The trace time also goes from 210ns down to 179ns (31ns). I talked with Peter Zijlstra about the impact that sched_clock may have without having interrupts disabled, and he told me that if a timer interrupt comes in, sched_clock may report a wrong time. Balancing a seldom incorrect timestamp with a 15% performance boost, I'll take the performance boost. Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> |
||
---|---|---|
.. | ||
blktrace.c | ||
ftrace.c | ||
Kconfig | ||
kmemtrace.c | ||
Makefile | ||
power-traces.c | ||
ring_buffer_benchmark.c | ||
ring_buffer.c | ||
trace_boot.c | ||
trace_branch.c | ||
trace_clock.c | ||
trace_entries.h | ||
trace_event_profile.c | ||
trace_events_filter.c | ||
trace_events.c | ||
trace_export.c | ||
trace_functions_graph.c | ||
trace_functions.c | ||
trace_hw_branches.c | ||
trace_irqsoff.c | ||
trace_mmiotrace.c | ||
trace_nop.c | ||
trace_output.c | ||
trace_output.h | ||
trace_printk.c | ||
trace_sched_switch.c | ||
trace_sched_wakeup.c | ||
trace_selftest_dynamic.c | ||
trace_selftest.c | ||
trace_stack.c | ||
trace_stat.c | ||
trace_stat.h | ||
trace_syscalls.c | ||
trace_sysprof.c | ||
trace_workqueue.c | ||
trace.c | ||
trace.h |