linux_dsm_epyc7002/arch/x86/kernel
Don Zickus 13beacee81 perf/x86/p4: Fix counter corruption when using lots of perf groups
On a P4 box stressing perf with:

   ./perf record -o perf.data ./perf stat -v ./perf bench all

it was noticed that a slew of unknown NMIs would pop out rather quickly.

Painfully debugging this ancient platform, led me to notice cross cpu counter
corruption.

The P4 machine is special in that it has 18 counters, half are used for cpu0
and the other half is for cpu1 (or all 18 if hyperthreading is disabled).  But
the splitting of the counters has to be actively managed by the software.

In this particular bug, one of the cpu0 specific counters was being used by
cpu1 and caused all sorts of random unknown nmis.

I am not entirely sure on the corruption path, but what happens is:

 o perf schedules a group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   but for a different cpu, so it 'swaps' the config bits and returns the
   updated 'assign' array with a _new_ index.
 o perf schedules another group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
   an earlier part of the 'assign' array (and hasn't been committed yet).
 o perf commits the transaction using the wrong index and corrupts the other cpu

The [BUG!!] is because the 'hwc->config' is updated but not the 'hwc->idx'.  So
the check for 'p4_should_swap_ts()' is correct the first time around but
incorrect the second time around (because hwc->config was updated in between).

I think the spirit of perf was to not modify anything until all the
transactions had a chance to 'test' if they would succeed, and if so, commit
atomically.  However, P4 breaks this spirit by touching the hwc->config
element.

So my fix is to continue the un-perf like breakage, by assigning hwc->idx to -1
on swap to tell follow up group scheduling to find a new index.

Of course if the transaction fails rolling this back will be difficult, but
that is not different than how the current code works. :-)  And I wasn't sure
how much effort to cleanup the code I should do for a platform that is almost
10 years old by now.

Hence the lazy fix.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391024270-19469-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:17:23 +01:00
..
acpi ACPI and power management updates for 3.14-rc1 2014-01-24 15:51:02 -08:00
apic Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-31 08:59:46 -08:00
cpu perf/x86/p4: Fix counter corruption when using lots of perf groups 2014-02-09 13:17:23 +01:00
kprobes
.gitignore
alternative.c
amd_gart_64.c
amd_nb.c x86/AMD/NB: Fix amd_set_subcaches() parameter type 2014-01-25 08:50:09 +01:00
apb_timer.c
aperture_64.c
apm_32.c
asm-offsets_32.c
asm-offsets_64.c
asm-offsets.c
audit_64.c
bootflag.c
check.c x86/mm: memblock: switch to use NUMA_NO_NODE 2014-01-21 16:19:47 -08:00
cpuid.c
crash_dump_32.c
crash_dump_64.c
crash.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
devicetree.c Merge remote-tracking branch 'grant/devicetree/next' into for-next 2013-11-07 10:34:46 -06:00
doublefault.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
dumpstack_32.c
dumpstack_64.c
dumpstack.c x86/dumpstack: Fix printk_address for direct addresses 2013-11-12 21:06:06 +01:00
e820.c x86/mm: memblock: switch to use NUMA_NO_NODE 2014-01-21 16:19:47 -08:00
early_printk.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
early-quirks.c x86/early quirk: use gen6 stolen detection for VLV 2013-11-14 09:32:11 +01:00
entry_32.S ftrace/x86: Load ftrace_ops in parameter not the variable holding it 2014-01-09 13:24:29 -08:00
entry_64.S ftrace/x86: Load ftrace_ops in parameter not the variable holding it 2014-01-09 13:24:29 -08:00
ftrace.c
head32.c
head64.c x86, trace: Register exception handler to trace IDT 2013-11-08 14:15:45 -08:00
head_32.S
head_64.S
head.c
hpet.c
hw_breakpoint.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
i386_ksyms_32.c
i387.c x86: move fpu_counter into ARCH specific thread_struct 2013-11-13 12:09:13 +09:00
i8237.c
i8253.c
i8259.c
io_delay.c
ioport.c
iosf_mbi.c arch: x86: New MailBox support driver for Intel SOC's 2014-01-08 14:36:29 -08:00
irq_32.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 10:20:12 +09:00
irq_64.c
irq_work.c
irq.c x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable() 2014-01-30 16:40:13 -08:00
irqinit.c x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs 2014-01-12 13:13:02 +01:00
jump_label.c
kdebugfs.c
kgdb.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
ksysfs.c x86: ksysfs.c build fix 2014-01-03 14:37:13 +00:00
kvm.c Second batch of KVM updates. Some minor x86 fixes, 2014-01-31 08:37:32 -08:00
kvmclock.c pvclock: detect watchdog reset at pvclock read 2013-11-06 09:48:43 +02:00
ldt.c
machine_kexec_32.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
machine_kexec_64.c
Makefile Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 12:09:31 -08:00
mmconf-fam10h_64.c
module.c mm/arch: use NUMA_NO_NODE 2013-11-13 12:09:05 +09:00
mpparse.c
msr.c
nmi_selftest.c
nmi.c x86/nmi: Push duration printk() to irq context 2014-02-09 13:17:22 +01:00
paravirt_patch_32.c
paravirt_patch_64.c
paravirt-spinlocks.c
paravirt.c
pci-calgary_64.c
pci-dma.c
pci-iommu_table.c
pci-nommu.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
pci-swiotlb.c
pcspeaker.c
perf_regs.c
preempt.S
probe_roms.c
process_32.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
process_64.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:55:56 +09:00
process.c
ptrace.c
pvclock.c hung_task: add method to reset detector 2013-11-06 09:49:02 +02:00
quirks.c x86/quirks: Add workaround for AMD F16h Erratum792 2014-01-25 08:44:19 +01:00
reboot_fixups_32.c
reboot.c x86/apic, doc: Justification for disabling IO APIC before Local APIC 2013-12-04 19:33:21 -08:00
relocate_kernel_32.S
relocate_kernel_64.S
resource.c
rtc.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
setup_percpu.c
setup.c x86: revert wrong memblock current limit setting 2014-01-27 21:02:38 -08:00
signal.c
smp.c
smpboot.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-01-20 12:11:41 -08:00
stacktrace.c
step.c
sys_x86_64.c
syscall_32.c
syscall_64.c
sysfb_efi.c
sysfb_simplefb.c
sysfb.c
tboot.c
tce_64.c
test_nx.c
test_rodata.c
time.c
tls.c
tls.h
topology.c
trace_clock.c
tracepoint.c
traps.c x86/traps: Clean up error exception handler definitions 2013-12-12 14:46:42 +01:00
tsc_msr.c x86, tsc, apic: Unbreak static (MSR) calibration when CONFIG_X86_LOCAL_APIC=n 2014-01-16 13:00:21 -08:00
tsc_sync.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
tsc.c sched/x86/tsc: Initialize multiplier to 0 2014-01-23 14:48:36 +01:00
uprobes.c
verify_cpu.S
vm86_32.c
vmlinux.lds.S
vsmp_64.c x86, asmlinkage, paravirt: Make paravirt thunks global 2014-01-29 22:17:17 -08:00
vsyscall_64.c
vsyscall_emu_64.S
vsyscall_trace.h
x86_init.c PCI: Drop "irq" param from *_restore_msi_irqs() 2013-12-13 08:44:30 -07:00
x8664_ksyms_64.c
xsave.c x86, xsave: Support eager-only xsave features, add MPX support 2013-12-06 17:17:42 -08:00