linux_dsm_epyc7002/arch/x86/kernel
Thomas Gleixner 506a66f374 Revert "x86/apic: Ignore secondary threads if nosmt=force"
Dave Hansen reported, that it's outright dangerous to keep SMT siblings
disabled completely so they are stuck in the BIOS and wait for SIPI.

The reason is that Machine Check Exceptions are broadcasted to siblings and
the soft disabled sibling has CR4.MCE = 0. If a MCE is delivered to a
logical core with CR4.MCE = 0, it asserts IERR#, which shuts down or
reboots the machine. The MCE chapter in the SDM contains the following
blurb:

    Because the logical processors within a physical package are tightly
    coupled with respect to shared hardware resources, both logical
    processors are notified of machine check errors that occur within a
    given physical processor. If machine-check exceptions are enabled when
    a fatal error is reported, all the logical processors within a physical
    package are dispatched to the machine-check exception handler. If
    machine-check exceptions are disabled, the logical processors enter the
    shutdown state and assert the IERR# signal. When enabling machine-check
    exceptions, the MCE flag in control register CR4 should be set for each
    logical processor.

Reverting the commit which ignores siblings at enumeration time solves only
half of the problem. The core cpuhotplug logic needs to be adjusted as
well.

This thoughtful engineered mechanism also turns the boot process on all
Intel HT enabled systems into a MCE lottery. MCE is enabled on the boot CPU
before the secondary CPUs are brought up. Depending on the number of
physical cores the window in which this situation can happen is smaller or
larger. On a HSW-EX it's about 750ms:

MCE is enabled on the boot CPU:

[    0.244017] mce: CPU supports 22 MCE banks

The corresponding sibling #72 boots:

[    1.008005] .... node  #0, CPUs:    #72

That means if an MCE hits on physical core 0 (logical CPUs 0 and 72)
between these two points the machine is going to shutdown. At least it's a
known safe state.

It's obvious that the early boot can be hit by an MCE as well and then runs
into the same situation because MCEs are not yet enabled on the boot CPU.
But after enabling them on the boot CPU, it does not make any sense to
prevent the kernel from recovering.

Adjust the nosmt kernel parameter documentation as well.

Reverts: 2207def700 ("x86/apic: Ignore secondary threads if nosmt=force")
Reported-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Luck <tony.luck@intel.com>
2018-07-02 11:25:28 +02:00
..
acpi Revert "x86/apic: Ignore secondary threads if nosmt=force" 2018-07-02 11:25:28 +02:00
apic Revert "x86/apic: Ignore secondary threads if nosmt=force" 2018-07-02 11:25:28 +02:00
cpu x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings 2018-06-22 14:52:13 +02:00
fpu Merge commit 'upstream-x86-entry' into WIP.x86/mm 2017-12-17 12:58:53 +01:00
kprobes kprobes/x86: Prohibit probing on exception masking instructions 2018-05-13 19:52:55 +02:00
.gitignore
alternative.c x86/paravirt: Remove 'noreplace-paravirt' cmdline option 2018-01-31 10:37:45 +01:00
amd_gart_64.c x86/dma/amd_gart: Use dma_direct_{alloc,free}() 2018-03-20 10:01:57 +01:00
amd_nb.c x86/amd_nb: Add support for Raven Ridge CPUs 2018-05-13 09:00:27 -07:00
apb_timer.c Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-25 14:30:04 -08:00
aperture_64.c x86/gart: Exclude GART aperture from vmcore 2018-01-11 15:09:24 +01:00
apm_32.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-04 19:17:47 -07:00
asm-offsets_32.c Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
asm-offsets_64.c Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
asm-offsets.c Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
audit_64.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
bootflag.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
check.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cpuid.c x86/cpuid: Allow cpuid_read() to schedule 2018-03-27 12:01:48 +02:00
crash_dump_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
crash_dump_64.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
crash.c kexec_file, x86: move re-factored code to generic side 2018-04-13 17:10:27 -07:00
devicetree.c Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-04-02 16:15:32 -07:00
doublefault.c x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss 2017-12-17 13:59:55 +01:00
dumpstack_32.c x86/dumpstack: Unify show_regs() 2018-03-08 12:04:59 +01:00
dumpstack_64.c x86/dumpstack: Unify show_regs() 2018-03-08 12:04:59 +01:00
dumpstack.c x86/dumpstack: Explain the reasoning for the prologue and buffer size 2018-04-26 16:15:28 +02:00
e820.c x86: Remove pr_fmt duplicate logging prefixes 2018-05-13 21:25:18 +02:00
early_printk.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
early-quirks.c x86/early-quirks: Rename duplicate define of dev_err 2018-05-13 20:04:35 +02:00
ebda.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
eisa.c x86/eisa: Add missing include 2017-08-31 21:34:48 +02:00
espfix_64.c x86/espfix: Document use of _PAGE_GLOBAL 2018-04-09 18:27:33 +02:00
ftrace_32.S x86/retpoline/ftrace: Convert ftrace assembler indirect jumps 2018-01-12 00:14:30 +01:00
ftrace_64.S Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-01-28 12:19:23 -08:00
ftrace.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
head32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
head64.c x86/mm: Mark __pgtable_l5_enabled __initdata 2018-05-19 11:56:58 +02:00
head_32.S Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
head_64.S x86/mm: Comment _PAGE_GLOBAL mystery 2018-04-12 09:05:58 +02:00
hpet.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
hw_breakpoint.c
i8237.c x86/i8237: Register device based on FADT legacy boot flag 2018-04-27 16:44:29 +02:00
i8253.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
i8259.c Merge branch 'linus' into x86/apic, to resolve conflicts 2017-11-07 10:51:10 +01:00
idt.c x86/idt: Simplify the idt_setup_apic_and_irq_gates() 2018-06-06 13:38:01 +02:00
io_delay.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ioport.c x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() 2018-04-02 20:16:12 +02:00
irq_32.c x86/retpoline/irq32: Convert assembler indirect jumps 2018-01-12 00:14:32 +01:00
irq_64.c x86/irq/64: Print the offending IP in the stack overflow warning 2017-12-17 13:59:53 +01:00
irq_work.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq.c Drivers: hv: vmbus: Implement Direct Mode for stimer0 2018-03-06 09:57:17 -08:00
irqinit.c x86/apic: Simplify init_bsp_APIC() usage 2018-02-13 17:30:38 +01:00
itmt.c x86/headers: Remove duplicate #includes 2017-12-12 11:32:24 +01:00
jailhouse.c x86: Convert x86_platform_ops to timespec64 2018-05-19 14:03:14 +02:00
jump_label.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kdebugfs.c x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use memremap() for RAM mappings 2017-07-18 11:37:58 +02:00
kexec-bzimage64.c kexec_file: do not add extra alignment to efi memmap 2018-04-20 17:18:36 -07:00
kgdb.c
ksysfs.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
kvm.c kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME 2018-05-17 19:12:13 +02:00
kvmclock.c x86: Convert x86_platform_ops to timespec64 2018-05-19 14:03:14 +02:00
ldt.c x86/ldt: Fix support_pte_mask filtering in map_ldt_struct() 2018-04-16 11:20:34 -07:00
livepatch.c
machine_kexec_32.c x86/kexec: Avoid double free_page() upon do_kexec_load() failure 2018-05-13 19:50:06 +02:00
machine_kexec_64.c x86/mm: Stop pretending pgtable_l5_enabled is a variable 2018-05-19 11:56:57 +02:00
Makefile Merge branch 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-04-02 17:18:45 -07:00
mmconf-fam10h_64.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
module.c x86: Treat R_X86_64_PLT32 as R_X86_64_PC32 2018-02-22 09:01:10 -08:00
mpparse.c Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-02-14 17:02:15 -08:00
msr.c x86/msr: Remove bogus cleanup from the error path 2016-12-25 10:47:41 +01:00
nmi_selftest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nmi.c locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:01:08 +02:00
paravirt_patch_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
paravirt_patch_64.c x86/paravirt: Dont patch flush_tlb_single 2017-12-17 14:27:52 +01:00
paravirt-spinlocks.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
paravirt.c x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]() 2018-02-15 01:15:52 +01:00
pci-calgary_64.c x86/dma: Remove dma_alloc_coherent_gfp_flags() 2018-03-20 10:01:58 +01:00
pci-dma.c x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag 2018-05-28 12:48:25 +02:00
pci-iommu_table.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pci-swiotlb.c x86/dma: Use generic swiotlb_ops 2018-03-20 10:01:57 +01:00
pcspeaker.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
perf_regs.c perf/x86: Store user space frame-pointer value on a sample 2018-05-25 08:11:12 +02:00
platform-quirks.c x86/i8237: Register device based on FADT legacy boot flag 2018-04-27 16:44:29 +02:00
pmem.c resource: Provide resource struct in resource walk callback 2017-11-07 15:35:57 +01:00
probe_roms.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
process_32.c x86/dumpstack: Add a show_ip() function 2018-04-26 16:15:27 +02:00
process_64.c x86/mm: Drop TS_COMPAT on 64-bit exec() syscall 2018-05-19 12:31:05 +02:00
process.c x86/speculation: Rework speculative_store_bypass_update() 2018-05-17 17:09:19 +02:00
ptrace.c signal: Ensure every siginfo we send has all bits initialized 2018-04-25 10:40:51 -05:00
pvclock.c x86: Convert x86_platform_ops to timespec64 2018-05-19 14:03:14 +02:00
quirks.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot_fixups_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c x86/apic: Fix restoring boot IRQ mode in reboot and kexec/kdump 2018-02-17 11:47:45 +01:00
relocate_kernel_32.S
relocate_kernel_64.S x86/kexec: Make kexec (mostly) work in 5-level paging mode 2018-01-31 08:39:40 +01:00
resource.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
rtc.c x86: Convert x86_platform_ops to timespec64 2018-05-19 14:03:14 +02:00
setup_percpu.c x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table 2018-03-01 09:48:27 +01:00
setup.c x86/speculation/l1tf: Make sure the first page is always reserved 2018-06-20 19:10:00 +02:00
signal_compat.c signal: Add TRAP_UNK si_code for undiagnosted trap exceptions 2018-04-25 10:40:56 -05:00
signal.c x86: Add support for restartable sequences 2018-06-06 11:58:32 +02:00
smp.c x86/tracing: Disentangle pagefault and resched IPI tracing key 2017-08-29 11:42:29 +02:00
smpboot.c x86/topology: Provide topology_smt_supported() 2018-06-21 14:20:57 +02:00
stacktrace.c Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-01-03 16:41:07 -08:00
step.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sys_x86_64.c compat: Move compat_timespec/ timeval to compat_time.h 2018-04-19 13:29:54 +02:00
sysfb_efi.c
sysfb_simplefb.c x86/sysfb: Fix lfb_size calculation 2016-11-16 09:38:23 +01:00
sysfb.c
tboot.c x86/pti: Make unpoison of pgd for trusted boot work for real 2018-01-11 23:36:59 +01:00
tce_64.c
time.c x86/time: Unconditionally register legacy timer interrupt 2018-01-14 20:18:23 +01:00
tls.c x86/ldt: Make the LDT mapping RO 2017-12-23 21:13:01 +01:00
tls.h
topology.c
trace_clock.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tracepoint.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
traps.c signal: Ensure every siginfo we send has all bits initialized 2018-04-25 10:40:51 -05:00
tsc_msr.c x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs 2016-11-18 10:58:31 +01:00
tsc_sync.c Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 19:07:38 -08:00
tsc.c x86/tsc: Fix mark_tsc_unstable() 2018-05-02 16:10:40 +02:00
umip.c signal: Ensure every siginfo we send has all bits initialized 2018-04-25 10:40:51 -05:00
unwind_frame.c x86/unwind: Disable unwinder warnings on 32-bit 2017-10-10 12:49:49 +02:00
unwind_guess.c x86/unwind: Add the ORC unwinder 2017-07-26 13:18:20 +02:00
unwind_orc.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
uprobes.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-04 19:17:47 -07:00
verify_cpu.S x86/boot: Annotate verify_cpu() as a callable function 2017-09-28 09:39:03 +02:00
vm86_32.c x86/vm86/32: Fix POPF emulation 2018-03-14 09:21:01 +01:00
vmlinux.lds.S x86/build: Remove no-op macro VMLINUX_SYMBOL() 2018-05-13 15:11:34 +02:00
vsmp_64.c x86/apic: Remove unused callbacks 2017-09-25 20:51:58 +02:00
x86_init.c xen, mm: allow deferred page initialization for xen pv domains 2018-04-11 10:28:38 -07:00