mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-14 23:16:43 +07:00
f4929bd372
Change the Nehalem cache events to use retired memory instruction counters (similar to Westmere), this greatly improves the provided stats. Using: main () { int i; for (i = 0; i < 1000000000; i++) { asm("mov (%%rsp), %%rbx;" "mov %%rbx, (%%rsp);" : : : "rbx"); } } We find: $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores Performance counter stats for './loop_1b_loads+stores' (10 runs): 4,000,081,056 instructions:u # 0.000 IPC ( +- 0.000% ) 4,999,502,846 l1-dcache-loads:u ( +- 0.008% ) 1,000,034,832 l1-dcache-stores:u ( +- 0.000% ) 1.565184942 seconds time elapsed ( +- 0.005% ) The 5b is surprising - we'd expect 1b: $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores Performance counter stats for './loop_1b_loads+stores' (10 runs): 4,000,081,054 instructions:u # 0.000 IPC ( +- 0.000% ) 1,000,021,961 r10b:u ( +- 0.000% ) 1,000,030,951 l1-dcache-stores:u ( +- 0.000% ) 1.565055422 seconds time elapsed ( +- 0.003% ) Which this patch thus fixes. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stephane Eranian <eranian@google.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu> |
||
---|---|---|
.. | ||
cpufreq | ||
mcheck | ||
mtrr | ||
.gitignore | ||
amd.c | ||
bugs_64.c | ||
bugs.c | ||
centaur.c | ||
common.c | ||
cpu.h | ||
cyrix.c | ||
hypervisor.c | ||
intel_cacheinfo.c | ||
intel.c | ||
Makefile | ||
mkcapflags.pl | ||
mshyperv.c | ||
perf_event_amd.c | ||
perf_event_intel_ds.c | ||
perf_event_intel_lbr.c | ||
perf_event_intel.c | ||
perf_event_p4.c | ||
perf_event_p6.c | ||
perf_event.c | ||
perfctr-watchdog.c | ||
powerflags.c | ||
proc.c | ||
scattered.c | ||
sched.c | ||
topology.c | ||
transmeta.c | ||
umc.c | ||
vmware.c |