linux_dsm_epyc7002/arch/x86/kernel/cpu/mcheck
Srivatsa S. Bhat 7a0c819d28 x86/mce: Rework cmci_rediscover() to play well with CPU hotplug
Dave Jones reports that offlining a CPU leads to this trace:

numa_remove_cpu cpu 1 node 0: mask now 0,2-3
smpboot: CPU 1 is now offline
BUG: using smp_processor_id() in preemptible [00000000] code:
cpu-offline.sh/10591
caller is cmci_rediscover+0x6a/0xe0
Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
Call Trace:
 [<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
 [<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
 [<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
 [<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
 [<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
 [<ffffffff8104c2e3>] cpu_notify+0x23/0x50
 [<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
 [<ffffffff815ef082>] _cpu_down+0x302/0x350
 [<ffffffff815ef106>] cpu_down+0x36/0x50
 [<ffffffff815f1c9d>] store_online+0x8d/0xd0
 [<ffffffff813edc48>] dev_attr_store+0x18/0x30
 [<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
 [<ffffffff811adfb2>] vfs_write+0xa2/0x170
 [<ffffffff811ae16c>] sys_write+0x4c/0xa0
 [<ffffffff81613019>] system_call_fastpath+0x16/0x1b

However, a look at cmci_rediscover shows that it can be simplified quite
a bit, apart from solving the above issue. It invokes functions that
take spin locks with interrupts disabled, and hence it can run in atomic
context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
is already dead and out of the cpu_online_mask. So take these points into
account and simplify the code, and thereby also fix the above issue.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2013-04-02 14:04:01 -07:00
..
Makefile ACPI, APEI, Generic Hardware Error Source memory error support 2010-05-19 22:41:16 -04:00
mce_amd.c x86, AMD: Change Boris' email address 2012-10-30 10:05:50 +01:00
mce_intel.c x86/mce: Rework cmci_rediscover() to play well with CPU hotplug 2013-04-02 14:04:01 -07:00
mce-apei.c x86/mce Add validation check before GHES error is recorded 2012-04-20 16:02:05 -07:00
mce-inject.c x86: mce: Serialize mce injection 2012-08-03 11:45:56 -07:00
mce-internal.h x86, MCA: Finish mca_config conversion 2012-10-26 14:37:58 +02:00
mce-severity.c x86, MCA: Finish mca_config conversion 2012-10-26 14:37:58 +02:00
mce.c x86/mce: Rework cmci_rediscover() to play well with CPU hotplug 2013-04-02 14:04:01 -07:00
p5.c taint: add explicit flag to show whether lock dep is still OK. 2013-01-21 17:17:57 +10:30
therm_throt.c Disintegrate asm/system.h for X86 2012-03-28 18:11:12 +01:00
threshold.c x86: Call idle notifier after irq_enter() 2011-12-11 10:31:38 -08:00
winchip.c taint: add explicit flag to show whether lock dep is still OK. 2013-01-21 17:17:57 +10:30