mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git (synced 2024-12-28 11:18:45 +07:00)
13beacee81
On a P4 box stressing perf with:

    ./perf record -o perf.data ./perf stat -v ./perf bench all

it was noticed that a slew of unknown NMIs would pop out rather quickly.

Painfully debugging this ancient platform led me to notice cross-CPU counter corruption.

The P4 machine is special in that it has 18 counters: half are used for cpu0 and the other half for cpu1 (or all 18 if hyperthreading is disabled). But the splitting of the counters has to be actively managed by the software.

In this particular bug, one of the cpu0-specific counters was being used by cpu1, which caused all sorts of random unknown NMIs.

I am not entirely sure of the corruption path, but what happens is:

  o perf schedules a group with p4_pmu_schedule_events()
  o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused but for a different cpu, so it 'swaps' the config bits and returns the updated 'assign' array with a _new_ index.
  o perf schedules another group with p4_pmu_schedule_events()
  o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused (the same one as above) but for the _same_ cpu [BUG!!], so it updates the 'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in an earlier part of the 'assign' array (and hasn't been committed yet).
  o perf commits the transaction using the wrong index and corrupts the other cpu

The [BUG!!] is because 'hwc->config' is updated but 'hwc->idx' is not. So the check for 'p4_should_swap_ts()' is correct the first time around but incorrect the second time around (because hwc->config was updated in between).

I think the spirit of perf was to not modify anything until all the transactions had a chance to 'test' if they would succeed, and if so, commit atomically. However, P4 breaks this spirit by touching the hwc->config element.

So my fix is to continue the un-perf-like breakage by assigning hwc->idx to -1 on swap, to tell follow-up group scheduling to find a new index.

Of course, if the transaction fails, rolling this back will be difficult, but that is no different from how the current code works. :-) And I wasn't sure how much effort I should put into cleaning up the code for a platform that is almost 10 years old by now.

Hence the lazy fix.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391024270-19469-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
||
---|---|---|
acpi | ||
apic | ||
cpu | ||
kprobes | ||
.gitignore | ||
alternative.c | ||
amd_gart_64.c | ||
amd_nb.c | ||
apb_timer.c | ||
aperture_64.c | ||
apm_32.c | ||
asm-offsets_32.c | ||
asm-offsets_64.c | ||
asm-offsets.c | ||
audit_64.c | ||
bootflag.c | ||
check.c | ||
cpuid.c | ||
crash_dump_32.c | ||
crash_dump_64.c | ||
crash.c | ||
devicetree.c | ||
doublefault.c | ||
dumpstack_32.c | ||
dumpstack_64.c | ||
dumpstack.c | ||
e820.c | ||
early_printk.c | ||
early-quirks.c | ||
entry_32.S | ||
entry_64.S | ||
ftrace.c | ||
head32.c | ||
head64.c | ||
head_32.S | ||
head_64.S | ||
head.c | ||
hpet.c | ||
hw_breakpoint.c | ||
i386_ksyms_32.c | ||
i387.c | ||
i8237.c | ||
i8253.c | ||
i8259.c | ||
io_delay.c | ||
ioport.c | ||
iosf_mbi.c | ||
irq_32.c | ||
irq_64.c | ||
irq_work.c | ||
irq.c | ||
irqinit.c | ||
jump_label.c | ||
kdebugfs.c | ||
kgdb.c | ||
ksysfs.c | ||
kvm.c | ||
kvmclock.c | ||
ldt.c | ||
machine_kexec_32.c | ||
machine_kexec_64.c | ||
Makefile | ||
mmconf-fam10h_64.c | ||
module.c | ||
mpparse.c | ||
msr.c | ||
nmi_selftest.c | ||
nmi.c | ||
paravirt_patch_32.c | ||
paravirt_patch_64.c | ||
paravirt-spinlocks.c | ||
paravirt.c | ||
pci-calgary_64.c | ||
pci-dma.c | ||
pci-iommu_table.c | ||
pci-nommu.c | ||
pci-swiotlb.c | ||
pcspeaker.c | ||
perf_regs.c | ||
preempt.S | ||
probe_roms.c | ||
process_32.c | ||
process_64.c | ||
process.c | ||
ptrace.c | ||
pvclock.c | ||
quirks.c | ||
reboot_fixups_32.c | ||
reboot.c | ||
relocate_kernel_32.S | ||
relocate_kernel_64.S | ||
resource.c | ||
rtc.c | ||
setup_percpu.c | ||
setup.c | ||
signal.c | ||
smp.c | ||
smpboot.c | ||
stacktrace.c | ||
step.c | ||
sys_x86_64.c | ||
syscall_32.c | ||
syscall_64.c | ||
sysfb_efi.c | ||
sysfb_simplefb.c | ||
sysfb.c | ||
tboot.c | ||
tce_64.c | ||
test_nx.c | ||
test_rodata.c | ||
time.c | ||
tls.c | ||
tls.h | ||
topology.c | ||
trace_clock.c | ||
tracepoint.c | ||
traps.c | ||
tsc_msr.c | ||
tsc_sync.c | ||
tsc.c | ||
uprobes.c | ||
verify_cpu.S | ||
vm86_32.c | ||
vmlinux.lds.S | ||
vsmp_64.c | ||
vsyscall_64.c | ||
vsyscall_emu_64.S | ||
vsyscall_trace.h | ||
x86_init.c | ||
x8664_ksyms_64.c | ||
xsave.c |