linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-21 09:24:37 +07:00


Andi Kleen 724697648e perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp

Add a new 'three-p' precise level, that uses INST_RETIRED.PREC_DIST as base. The basic mechanism of abusing the inverse cmask to get all cycles works the same as before.

PREC_DIST is available on Sandy Bridge or later. It had some problems on Sandy Bridge, so we only use it on IvyBridge and later. I tested it on Broadwell and Skylake.

PREC_DIST has special support for avoiding shadow effects, which can give better results compared to UOPS_RETIRED. The drawback is that PREC_DIST can only schedule on counter 1, but that is ok for cycle sampling, as there is normally no need to do multiple cycle sampling runs in parallel. It is still possible to run perf top in parallel, as that doesn't use precise mode. Also of course the multiplexing can still allow parallel operation.

:pp stays with the previous event.

Example:

Sample a loop with 10 sqrt with old cycles:pp

	  0.14 │10:   sqrtps %xmm1,%xmm0     <--------------
	  9.13 │      sqrtps %xmm1,%xmm0
	 11.58 │      sqrtps %xmm1,%xmm0
	 11.51 │      sqrtps %xmm1,%xmm0
	  6.27 │      sqrtps %xmm1,%xmm0
	 10.38 │      sqrtps %xmm1,%xmm0
	 12.20 │      sqrtps %xmm1,%xmm0
	 12.74 │      sqrtps %xmm1,%xmm0
	  5.40 │      sqrtps %xmm1,%xmm0
	 10.14 │      sqrtps %xmm1,%xmm0
	 10.51 │    ↑ jmp    10

We expect all 10 sqrt to get roughly the same number of samples.

But you can see that the instruction directly after the JMP is systematically underestimated in the result, due to sampling shadow effects.

With the new PREC_DIST based sampling this problem is gone and all instructions show up roughly evenly:

	  9.51 │10:   sqrtps %xmm1,%xmm0
	 11.74 │      sqrtps %xmm1,%xmm0
	 11.84 │      sqrtps %xmm1,%xmm0
	  6.05 │      sqrtps %xmm1,%xmm0
	 10.46 │      sqrtps %xmm1,%xmm0
	 12.25 │      sqrtps %xmm1,%xmm0
	 12.18 │      sqrtps %xmm1,%xmm0
	  5.26 │      sqrtps %xmm1,%xmm0
	 10.13 │      sqrtps %xmm1,%xmm0
	 10.43 │      sqrtps %xmm1,%xmm0
	  0.16 │    ↑ jmp    10

Even with PREC_DIST there is still sampling skid and the result is not completely even, but systematic shadow effects are significantly reduced.

The improvements are mainly expected to make a difference in high IPC code. With low IPC it should be similar.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1448929689-13771-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2016-01-06 11:15:32 +01:00
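
The "inverse cmask" trick mentioned in the commit message above can be illustrated with a small sketch. With the invert bit set and a counter mask of 16, the counter increments in every cycle in which fewer than 16 of the selected events occur, which in practice means every cycle. The field positions below follow the architectural IA32_PERFEVTSELx layout. The event and umask codes (0xc0/0x01 for INST_RETIRED.PREC_DIST, 0xc2/0x01 for UOPS_RETIRED.ALL) and the cmask value of 16 are assumptions taken from Intel event tables rather than from this page, and `raw_config()` and the two wrappers are hypothetical helpers, not the kernel's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural IA32_PERFEVTSELx field positions. */
#define EVTSEL_EVENT_SHIFT  0   /* bits 0-7:   event select            */
#define EVTSEL_UMASK_SHIFT  8   /* bits 8-15:  unit mask               */
#define EVTSEL_INV_SHIFT    23  /* bit 23:     invert cmask comparison */
#define EVTSEL_CMASK_SHIFT  24  /* bits 24-31: counter mask            */

/*
 * Hypothetical helper (not a kernel function): pack the four fields
 * into a raw event config value.
 */
static uint64_t raw_config(unsigned event, unsigned umask,
                           unsigned inv, unsigned cmask)
{
    assert(event <= 0xff && umask <= 0xff && inv <= 1 && cmask <= 0xff);

    return ((uint64_t)event << EVTSEL_EVENT_SHIFT) |
           ((uint64_t)umask << EVTSEL_UMASK_SHIFT) |
           ((uint64_t)inv   << EVTSEL_INV_SHIFT)   |
           ((uint64_t)cmask << EVTSEL_CMASK_SHIFT);
}

/*
 * Previous precise cycles alias (:pp), assuming UOPS_RETIRED.ALL is
 * event 0xc2, umask 0x01: count cycles with fewer than 16 uops retired.
 */
static uint64_t cycles_pp_config(void)
{
    return raw_config(0xc2, 0x01, 1, 16);
}

/*
 * New alias (:ppp), assuming INST_RETIRED.PREC_DIST is event 0xc0,
 * umask 0x01: same inverse-cmask trick on a different base event.
 */
static uint64_t cycles_ppp_config(void)
{
    return raw_config(0xc0, 0x01, 1, 16);
}
```

Under these assumptions the two configs differ only in the event select byte, which matches the commit's description that the mechanism "works the same as before" while the base event changes. The counter-1 scheduling restriction is a separate constraint and is not modeled here.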
..
mcheck	x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()	2015-11-01 11:26:14 +01:00
microcode	x86/microcode: Initialize the driver late when facilities are up	2015-11-23 10:39:49 +01:00
mtrr	x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del()	2015-08-28 10:09:28 +02:00
.gitignore
amd.c	x86/AMD: Fix last level cache topology for AMD Fam17h systems	2015-11-07 10:37:51 +01:00
bugs_64.c
bugs.c	x86/fpu: Move various internal function prototypes to fpu/internal.h	2015-05-19 15:47:48 +02:00
centaur.c	x86: Remove CONFIG_X86_OOSTORE	2014-03-11 10:16:18 -07:00
common.c	x86/cpu: Fix SMAP check in PVOPS environments	2015-11-19 11:07:49 +01:00
cpu.h	x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume	2015-07-21 07:51:38 +02:00
cyrix.c	x86: Delete non-required instances of include <linux/init.h>	2014-01-06 21:25:18 -08:00
hypervisor.c	hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests	2015-05-05 18:27:43 +01:00
intel_cacheinfo.c	perf/core, perf/x86: Change needlessly global functions and a variable to static	2015-09-28 08:09:52 +02:00
intel_pt.h	perf/x86/intel/pt: Clean up files of Intel Processor Trace	2015-08-12 11:43:22 +02:00
intel.c	x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield	2015-11-07 10:37:30 +01:00
Makefile	perf/x86: Add Intel cstate PMUs support	2015-10-06 17:31:51 +02:00
match.c	x86: align x86 arch with generic CPU modalias handling	2014-02-18 12:45:38 -08:00
mkcapflags.sh	x86/build: Fix mkcapflags.sh bash-ism	2015-02-19 02:21:00 +01:00
mshyperv.c	x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case	2015-09-30 07:44:15 +02:00
perf_event_amd_ibs.c	perf/x86/amd/ibs: Convert force_ibs_eilvt_setup() to void	2015-02-18 17:01:46 +01:00
perf_event_amd_iommu.c	cpumask: factor out show_cpumap into separate helper function	2014-11-07 11:45:00 -08:00
perf_event_amd_iommu.h	perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation	2013-06-19 13:04:53 +02:00
perf_event_amd_uncore.c	cpumask: factor out show_cpumap into separate helper function	2014-11-07 11:45:00 -08:00
perf_event_amd.c	perf/x86: Add 'index' param to get_event_constraint() callback	2015-04-02 17:33:10 +02:00
perf_event_intel_bts.c	perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems	2015-09-13 11:27:22 +02:00
perf_event_intel_cqm.c	perf/core: Robustify the perf_cgroup_from_task() RCU checks	2015-11-23 09:21:03 +01:00
perf_event_intel_cstate.c	perf/x86: Add Intel cstate PMUs support	2015-10-06 17:31:51 +02:00
perf_event_intel_ds.c	perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp	2016-01-06 11:15:32 +01:00
perf_event_intel_lbr.c	perf/x86: Add option to disable reading branch flags/cycles	2015-11-23 09:58:25 +01:00
perf_event_intel_pt.c	perf/x86/intel/pt: Add interface to stop Intel PT logging	2015-11-23 09:58:26 +01:00
perf_event_intel_rapl.c	perf/x86/intel: Fix __initconst declaration in the RAPL perf driver	2015-12-06 12:55:53 +01:00
perf_event_intel_uncore_nhmex.c	perf/x86/uncore: Fix coccinelle warnings	2014-08-13 07:51:09 +02:00
perf_event_intel_uncore_snb.c	perf/x86/intel/uncore: Fix multi-segment problem of perf_event_intel_uncore	2015-10-06 17:31:51 +02:00
perf_event_intel_uncore_snbep.c	perf/x86/intel/uncore: Fix multi-segment problem of perf_event_intel_uncore	2015-10-06 17:31:51 +02:00
perf_event_intel_uncore.c	perf/x86/intel/uncore: Fix multi-segment problem of perf_event_intel_uncore	2015-10-06 17:31:51 +02:00
perf_event_intel_uncore.h	perf/x86/intel/uncore: Fix multi-segment problem of perf_event_intel_uncore	2015-10-06 17:31:51 +02:00
perf_event_intel.c	perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp	2016-01-06 11:15:32 +01:00
perf_event_knc.c	x86: Replace __get_cpu_var uses	2014-08-26 13:45:49 -04:00
perf_event_msr.c	arch/x86/kernel/cpu/perf_event_msr.c: use sign_extend64() for sign extension	2015-11-06 17:50:42 -08:00
perf_event_p4.c	x86: Replace __get_cpu_var uses	2014-08-26 13:45:49 -04:00
perf_event_p6.c	perf/x86/intel/p6: Add userspace RDPMC quirk for PPro	2014-02-09 13:08:24 +01:00
perf_event.c	perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp	2016-01-06 11:15:32 +01:00
perf_event.h	perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp	2016-01-06 11:15:32 +01:00
perfctr-watchdog.c	perf/x86: Add support for Intel Xeon-Phi Knights Corner PMU	2012-10-04 13:32:37 +02:00
powerflags.c	update AMD powerflags comments	2013-05-28 12:02:10 +02:00
proc.c	x86: Replace cpu__mask() with topology__cpumask()	2015-05-27 15:22:17 +02:00
rdrand.c	x86, rdrand: When nordrand is specified, disable RDSEED as well	2014-05-11 20:25:20 -07:00
scattered.c	x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag	2015-09-23 09:57:24 +02:00
topology.c	x86: delete __cpuinit usage from all x86 files	2013-07-14 19:36:56 -04:00
transmeta.c	x86: Delete non-required instances of include <linux/init.h>	2014-01-06 21:25:18 -08:00
umc.c	x86: Delete non-required instances of include <linux/init.h>	2014-01-06 21:25:18 -08:00
vmware.c	x86: Correctly detect hypervisor	2013-08-05 06:35:33 -07:00