linux_dsm_epyc7002/kernel
Yan, Zheng 21509084f9 perf/x86/intel: Handle multiple records in the PEBS buffer
When the PEBS interrupt threshold is larger than one record and the
machine supports multiple PEBS events, the records of these events are
mixed up and we need to demultiplex them.

Demuxing the records is hard because the hardware is deficient. The
hardware has two issues that, when combined, create impossible
scenarios to demux.

The first issue is that the 'status' field of the PEBS record is a copy
of the GLOBAL_STATUS MSR at PEBS assist time. To see why this is a
problem let us first describe the regular PEBS cycle:

A) the CTRn value reaches 0:
  - the corresponding bit in GLOBAL_STATUS gets set
  - we start arming the hardware assist
  < some unspecified amount of time later -- this could cover multiple
    events of interest >

B) the hardware assist is armed, any next event will trigger it

C) a matching event happens:
  - the hardware assist triggers and generates a PEBS record
    this includes a copy of GLOBAL_STATUS at this moment
  - if we auto-reload we (re)set CTRn
  - we clear the relevant bit in GLOBAL_STATUS

Now consider the following chain of events:

  A0, B0, A1, C0

The event generated for counter 0 will include a status with counter 1
set, even though its not at all related to the record. A similar thing
can happen with a !PEBS event if it just happens to overflow at the
right moment.

The second issue is that the hardware will only emit one record for two
or more counters if the event that triggers the assist is 'close'. The
'close' can be several cycles. In some cases even the complete assist,
if the event is something that doesn't need retirement.

For instance, consider this chain of events:

  A0, B0, A1, B1, C01

Where C01 is an event that triggers both hardware assists, we will
generate but a single record, but again with both counters listed in the
status field.

This time the record pertains to both events.

Note that these two cases are different but undistinguishable with the
data as generated. Therefore demuxing records with multiple PEBS bits
(we can safely ignore status bits for !PEBS counters) is impossible.

Furthermore we cannot emit the record to both events because that might
cause a data leak -- the events might not have the same privileges -- so
what this patch does is discard such events.

The assumption/hope is that such discards will be rare.

Here lists some possible ways you may get high discard rate.

  - when you count the same thing multiple times. But it is not a useful
    configuration.
  - you can be unfortunate if you measure with a userspace only PEBS
    event along with either a kernel or unrestricted PEBS event. Imagine
    the event triggering and setting the overflow flag right before
    entering the kernel. Then all kernel side events will end up with
    multiple bits set.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Kan Liang <kan.liang@intel.com>
[ Changelog improvements. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@infradead.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1430940834-8964-4-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-07 16:08:45 +02:00
..
bpf bpf: fix 64-bit divide 2015-04-27 23:11:49 -04:00
configs x86: Add "make tinyconfig" to configure the tiniest possible kernel 2014-08-08 16:30:24 -07:00
debug debug: prevent entering debug mode on panic/exception. 2015-02-19 12:39:03 -06:00
events perf/x86/intel: Handle multiple records in the PEBS buffer 2015-06-07 16:08:45 +02:00
gcov gcov: fix softlockups 2015-04-17 09:04:08 -04:00
irq genirq: Set IRQCHIP_SKIP_SET_WAKE flag for dummy_irq_chip 2015-04-24 20:57:06 +02:00
livepatch Merge branch 'for-4.1/core-noarch' into for-linus 2015-04-13 23:57:20 +02:00
locking sched: Handle priority boosted tasks proper in setscheduler() 2015-05-08 11:53:55 +02:00
power Merge back earlier suspend/hibernate material for v4.1. 2015-04-10 12:01:59 +02:00
printk TTY/Serial patches for 4.1-rc1 2015-04-21 09:33:10 -07:00
rcu rcu: Control grace-period delays directly from value 2015-04-14 19:33:59 -07:00
sched Merge branch 'perf/urgent' into perf/core, before applying dependent patches 2015-05-27 09:17:21 +02:00
time ktime: Fix ktime_divns to do signed division 2015-05-13 10:19:35 +02:00
trace tracing: Make ftrace_print_array_seq compute buf_len 2015-05-06 23:03:23 -04:00
.gitignore
acct.c acct: check FMODE_CAN_WRITE 2015-04-11 22:27:55 -04:00
async.c kernel/async.c: switch to pr_foo() 2014-10-09 22:26:04 -04:00
audit_tree.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
audit_watch.c VFS: audit: d_backing_inode() annotations 2015-04-15 15:06:55 -04:00
audit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
audit.h Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-04-22 14:49:23 -07:00
auditfilter.c Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit 2015-02-11 20:07:47 -08:00
auditsc.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-04-26 17:22:07 -07:00
backtracetest.c
bounds.c page-cgroup: get rid of NR_PCG_FLAGS 2014-08-08 15:57:18 -07:00
capability.c kernel: conditionally support non-root users, groups and capabilities 2015-04-15 16:35:22 -07:00
cgroup_freezer.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
cgroup.c cgroup: remove use of seq_printf return value 2015-04-15 16:35:25 -07:00
compat.c all arches, signal: move restart_block to struct task_struct 2015-02-12 18:54:12 -08:00
configs.c
context_tracking.c context_tracking: Export context_tracking_user_enter/exit 2015-03-09 15:43:00 +01:00
cpu_pm.c
cpu.c Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-04-14 13:36:04 -07:00
cpuset.c kernel, cpuset: remove exception for __GFP_THISNODE 2015-04-14 16:49:03 -07:00
crash_dump.c crash_dump: Make is_kdump_kernel() accessible from modules 2014-08-25 15:42:19 -07:00
cred.c kernel: conditionally support non-root users, groups and capabilities 2015-04-15 16:35:22 -07:00
delayacct.c delayacct: Remove braindamaged type conversions 2014-07-23 10:18:06 -07:00
dma.c
elfcore.c
exec_domain.c Remove rest of exec domains. 2015-04-12 21:03:31 +02:00
exit.c Remove execution domain support 2015-04-12 20:58:24 +02:00
extable.c ftrace/x86/extable: Add is_ftrace_trampoline() function 2014-11-19 15:25:26 -05:00
fork.c oprofile: reduce mmap_sem hold for mm->exe_file 2015-04-17 09:04:11 -04:00
freezer.c freezer: remove obsolete comments in __thaw_task() 2014-10-21 23:44:20 +02:00
futex_compat.c
futex.c Linux 34.0-rc1 2015-02-24 08:41:07 +01:00
groups.c kernel: conditionally support non-root users, groups and capabilities 2015-04-15 16:35:22 -07:00
hung_task.c kernel/hung_task.c: change hung_task.c to use for_each_process_thread() 2015-04-15 16:35:22 -07:00
irq_work.c percpu: Convert remaining __get_cpu_var uses in 3.18-rcX 2014-10-29 11:18:18 -04:00
jump_label.c
kallsyms.c kernel/kallsyms.c: use __seq_open_private() 2014-10-14 02:18:16 +02:00
kcmp.c kcmp: fix standard comparison bug 2014-09-10 15:42:12 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/mcs: Better differentiate between MCS variants 2015-01-14 15:07:32 +01:00
Kconfig.preempt
kexec.c kexec: allocate the kexec control page with KEXEC_CONTROL_MEMORY_GFP 2015-04-23 16:52:01 +02:00
kmod.c usermodehelper: kill the kmod_thread_locker logic 2014-12-10 17:41:17 -08:00
kprobes.c kprobes: makes kprobes/enabled works correctly for optimized kprobes. 2015-02-13 21:21:42 -08:00
ksysfs.c
kthread.c kernel/kthread.c: partial revert of 81c98869fa ("kthread: ensure locality of task_struct allocations") 2014-10-09 22:25:51 -04:00
latencytop.c
Makefile modsign: change default key details 2015-04-30 09:35:41 -07:00
module_signing.c
module-internal.h
module.c Quentin opened a can of worms by adding extable entry checking to modpost, 2015-04-22 09:49:24 -07:00
notifier.c rcu: Make SRCU optional by using CONFIG_SRCU 2015-01-06 11:04:29 -08:00
nsproxy.c bury struct proc_ns in fs/proc 2014-12-04 14:34:54 -05:00
padata.c padata: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:38 -08:00
panic.c livepatch: kernel: add TAINT_LIVEPATCH 2014-12-22 15:40:48 +01:00
params.c params: handle quotes properly for values not of form foo="bar". 2015-04-15 13:31:23 +09:30
pid_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-16 15:53:03 -08:00
pid.c fork: report pid reservation failure properly 2015-04-17 09:04:06 -04:00
profile.c profile: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:38 -08:00
ptrace.c ptrace: ptrace_detach() can no longer race with SIGKILL 2015-04-17 09:04:06 -04:00
range.c kernel: avoid overflow in cmp_range 2015-01-17 10:02:23 +13:00
reboot.c kernel/reboot.c: add orderly_reboot for graceful reboot 2015-04-15 16:35:23 -07:00
relay.c VFS: kernel/: d_inode() annotations 2015-04-15 15:06:55 -04:00
resource.c kernel/resource.c: remove deprecated __check_region() and friends 2015-04-15 16:35:22 -07:00
seccomp.c seccomp: cap SECCOMP_RET_ERRNO data to MAX_ERRNO 2015-02-17 14:34:55 -08:00
signal.c signal: remove warning about using SI_TKILL in rt_[tg]sigqueueinfo 2015-04-17 09:04:06 -04:00
smp.c smp: Fix error case handling in smp_call_function_*() 2015-04-19 13:19:23 -07:00
smpboot.c smpboot: Add common code for notification from dying CPU 2015-03-11 13:20:25 -07:00
smpboot.h
softirq.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-09 15:24:03 -08:00
stacktrace.c stacktrace: introduce snprint_stack_trace for buffer output 2014-12-13 12:42:48 -08:00
stop_machine.c
sys_ni.c kernel: conditionally support non-root users, groups and capabilities 2015-04-15 16:35:22 -07:00
sys.c prctl: avoid using mmap_sem for exe_file serialization 2015-04-17 09:04:07 -04:00
sysctl_binary.c kernel: add panic_on_warn 2014-12-10 17:41:10 -08:00
sysctl.c kernel/sysctl.c: detect overflows when converting to int 2015-04-17 09:04:08 -04:00
system_certificates.S
system_keyring.c KEYS: validate certificate trust only with builtin keys 2014-07-17 09:35:17 -04:00
task_work.c
taskstats.c netlink: make nlmsg_end() and genlmsg_end() void 2015-01-18 01:03:45 -05:00
test_kprobes.c kernel/test_kprobes.c: use current logging functions 2014-08-08 15:57:18 -07:00
torture.c torture: Address race in module cleanup 2014-09-16 13:41:06 -07:00
tracepoint.c
tsacct.c sched: Make task->start_time nanoseconds based 2014-07-23 10:18:05 -07:00
uid16.c groups: Consolidate the setgroups permission checks 2014-12-05 17:19:27 -06:00
up.c
user_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2014-12-17 12:31:40 -08:00
user-return-notifier.c scheduler: Replace __get_cpu_var with this_cpu_ptr 2014-08-26 13:45:45 -04:00
user.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2014-12-17 12:31:40 -08:00
utsname_sysctl.c
utsname.c copy address of proc_ns_ops into ns_common 2014-12-04 14:34:47 -05:00
watchdog.c watchdog: fix double lock in watchdog_nmi_enable_all 2015-05-19 10:57:03 -07:00
workqueue_internal.h
workqueue.c workqueue: Reorder sysfs code 2015-04-06 11:16:04 -04:00