linux_dsm_epyc7002/arch/x86/kernel/cpu
Andi Kleen 724697648e perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp
Add a new 'three-p' precise level, that uses INST_RETIRED.PREC_DIST as
base. The basic mechanism of abusing the inverse cmask to get all
cycles works the same as before.

PREC_DIST is available on Sandy Bridge or later. It had some problems
on Sandy Bridge, so we only use it on IvyBridge and later. I tested it
on Broadwell and Skylake.

PREC_DIST has special support for avoiding shadow effects, which can
give better results compare to UOPS_RETIRED. The drawback is that
PREC_DIST can only schedule on counter 1, but that is ok for cycle
sampling, as there is normally no need to do multiple cycle sampling
runs in parallel. It is still possible to run perf top in parallel, as
that doesn't use precise mode. Also of course the multiplexing can
still allow parallel operation.

:pp stays with the previous event.

Example:

Sample a loop with 10 sqrt with old cycles:pp

	  0.14 │10:   sqrtps %xmm1,%xmm0     <--------------
	  9.13 │      sqrtps %xmm1,%xmm0
	 11.58 │      sqrtps %xmm1,%xmm0
	 11.51 │      sqrtps %xmm1,%xmm0
	  6.27 │      sqrtps %xmm1,%xmm0
	 10.38 │      sqrtps %xmm1,%xmm0
	 12.20 │      sqrtps %xmm1,%xmm0
	 12.74 │      sqrtps %xmm1,%xmm0
	  5.40 │      sqrtps %xmm1,%xmm0
	 10.14 │      sqrtps %xmm1,%xmm0
	 10.51 │    ↑ jmp    10

We expect all 10 sqrt to get roughly the sample number of samples.

But you can see that the instruction directly after the JMP is
systematically underestimated in the result, due to sampling shadow
effects.

With the new PREC_DIST based sampling this problem is gone and all
instructions show up roughly evenly:

	  9.51 │10:   sqrtps %xmm1,%xmm0
	 11.74 │      sqrtps %xmm1,%xmm0
	 11.84 │      sqrtps %xmm1,%xmm0
	  6.05 │      sqrtps %xmm1,%xmm0
	 10.46 │      sqrtps %xmm1,%xmm0
	 12.25 │      sqrtps %xmm1,%xmm0
	 12.18 │      sqrtps %xmm1,%xmm0
	  5.26 │      sqrtps %xmm1,%xmm0
	 10.13 │      sqrtps %xmm1,%xmm0
	 10.43 │      sqrtps %xmm1,%xmm0
	  0.16 │    ↑ jmp    10

Even with PREC_DIST there is still sampling skid and the result is not
completely even, but systematic shadow effects are significantly
reduced.

The improvements are mainly expected to make a difference in high IPC
code. With low IPC it should be similar.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: hpa@zytor.com
Link: http://lkml.kernel.org/r/1448929689-13771-2-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-01-06 11:15:32 +01:00
..
mcheck x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init() 2015-11-01 11:26:14 +01:00
microcode x86/microcode: Initialize the driver late when facilities are up 2015-11-23 10:39:49 +01:00
mtrr x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del() 2015-08-28 10:09:28 +02:00
.gitignore
amd.c x86/AMD: Fix last level cache topology for AMD Fam17h systems 2015-11-07 10:37:51 +01:00
bugs_64.c
bugs.c x86/fpu: Move various internal function prototypes to fpu/internal.h 2015-05-19 15:47:48 +02:00
centaur.c x86: Remove CONFIG_X86_OOSTORE 2014-03-11 10:16:18 -07:00
common.c x86/cpu: Fix SMAP check in PVOPS environments 2015-11-19 11:07:49 +01:00
cpu.h x86/cpu: Restore MSR_IA32_ENERGY_PERF_BIAS after resume 2015-07-21 07:51:38 +02:00
cyrix.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
hypervisor.c hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests 2015-05-05 18:27:43 +01:00
intel_cacheinfo.c perf/core, perf/x86: Change needlessly global functions and a variable to static 2015-09-28 08:09:52 +02:00
intel_pt.h perf/x86/intel/pt: Clean up files of Intel Processor Trace 2015-08-12 11:43:22 +02:00
intel.c x86/cpu/intel: Enable X86_FEATURE_NONSTOP_TSC_S3 for Merrifield 2015-11-07 10:37:30 +01:00
Makefile perf/x86: Add Intel cstate PMUs support 2015-10-06 17:31:51 +02:00
match.c x86: align x86 arch with generic CPU modalias handling 2014-02-18 12:45:38 -08:00
mkcapflags.sh x86/build: Fix mkcapflags.sh bash-ism 2015-02-19 02:21:00 +01:00
mshyperv.c x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case 2015-09-30 07:44:15 +02:00
perf_event_amd_ibs.c perf/x86/amd/ibs: Convert force_ibs_eilvt_setup() to void 2015-02-18 17:01:46 +01:00
perf_event_amd_iommu.c cpumask: factor out show_cpumap into separate helper function 2014-11-07 11:45:00 -08:00
perf_event_amd_iommu.h perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation 2013-06-19 13:04:53 +02:00
perf_event_amd_uncore.c cpumask: factor out show_cpumap into separate helper function 2014-11-07 11:45:00 -08:00
perf_event_amd.c perf/x86: Add 'index' param to get_event_constraint() callback 2015-04-02 17:33:10 +02:00
perf_event_intel_bts.c perf/x86/intel/bts: Disallow use by unprivileged users on paranoid systems 2015-09-13 11:27:22 +02:00
perf_event_intel_cqm.c perf/core: Robustify the perf_cgroup_from_task() RCU checks 2015-11-23 09:21:03 +01:00
perf_event_intel_cstate.c perf/x86: Add Intel cstate PMUs support 2015-10-06 17:31:51 +02:00
perf_event_intel_ds.c perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp 2016-01-06 11:15:32 +01:00
perf_event_intel_lbr.c perf/x86: Add option to disable reading branch flags/cycles 2015-11-23 09:58:25 +01:00
perf_event_intel_pt.c perf/x86/intel/pt: Add interface to stop Intel PT logging 2015-11-23 09:58:26 +01:00
perf_event_intel_rapl.c perf/x86/intel: Fix __initconst declaration in the RAPL perf driver 2015-12-06 12:55:53 +01:00
perf_event_intel_uncore_nhmex.c perf/x86/uncore: Fix coccinelle warnings 2014-08-13 07:51:09 +02:00
perf_event_intel_uncore_snb.c perf/x86/intel/uncore: Fix multi-segment problem of perf_event_intel_uncore 2015-10-06 17:31:51 +02:00
perf_event_intel_uncore_snbep.c perf/x86/intel/uncore: Fix multi-segment problem of perf_event_intel_uncore 2015-10-06 17:31:51 +02:00
perf_event_intel_uncore.c perf/x86/intel/uncore: Fix multi-segment problem of perf_event_intel_uncore 2015-10-06 17:31:51 +02:00
perf_event_intel_uncore.h perf/x86/intel/uncore: Fix multi-segment problem of perf_event_intel_uncore 2015-10-06 17:31:51 +02:00
perf_event_intel.c perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp 2016-01-06 11:15:32 +01:00
perf_event_knc.c x86: Replace __get_cpu_var uses 2014-08-26 13:45:49 -04:00
perf_event_msr.c arch/x86/kernel/cpu/perf_event_msr.c: use sign_extend64() for sign extension 2015-11-06 17:50:42 -08:00
perf_event_p4.c x86: Replace __get_cpu_var uses 2014-08-26 13:45:49 -04:00
perf_event_p6.c perf/x86/intel/p6: Add userspace RDPMC quirk for PPro 2014-02-09 13:08:24 +01:00
perf_event.c perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp 2016-01-06 11:15:32 +01:00
perf_event.h perf/x86: Use INST_RETIRED.PREC_DIST for cycles: ppp 2016-01-06 11:15:32 +01:00
perfctr-watchdog.c perf/x86: Add support for Intel Xeon-Phi Knights Corner PMU 2012-10-04 13:32:37 +02:00
powerflags.c update AMD powerflags comments 2013-05-28 12:02:10 +02:00
proc.c x86: Replace cpu_**_mask() with topology_**_cpumask() 2015-05-27 15:22:17 +02:00
rdrand.c x86, rdrand: When nordrand is specified, disable RDSEED as well 2014-05-11 20:25:20 -07:00
scattered.c x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag 2015-09-23 09:57:24 +02:00
topology.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
transmeta.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
umc.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
vmware.c x86: Correctly detect hypervisor 2013-08-05 06:35:33 -07:00