linux_dsm_epyc7002/include
Thomas Gleixner 46c498c2cd stop_machine: Mark per cpu stopper enabled early
commit 14e568e78 (stop_machine: Use smpboot threads) introduced the
following regression:

Before this commit the stopper enabled bit was set in the online
notifier.

CPU0				CPU1
cpu_up
				cpu online
hotplug_notifier(ONLINE)
  stopper(CPU1)->enabled = true;
...
stop_machine()

The conversion to smpboot threads moved the enablement to the wakeup
path of the parked thread. The majority of users seem to have the
following working order:

CPU0				CPU1
cpu_up
				cpu online
unpark_threads()
  wakeup(stopper[CPU1])
....
				stopper thread runs
				  stopper(CPU1)->enabled = true;
stop_machine()

But Konrad and Sander have observed:

CPU0				CPU1
cpu_up
				cpu online
unpark_threads()
  wakeup(stopper[CPU1])
....
stop_machine()
				stopper thread runs
				  stopper(CPU1)->enabled = true;

Now the stop machinery kicks CPU0 into the stop loop, where it gets
stuck forever because the queue code saw stopper(CPU1)->enabled ==
false, so CPU0 waits for CPU1 to enter stomp_machine, but the CPU1
stopper work got discarded due to enabled == false.

Add a pre_unpark function to the smpboot thread descriptor and call it
before waking the thread.

This fixes the problem at hand, but the stop_machine code should be
more robust. The stopper->enabled flag smells fishy at best.

Thanks to Konrad for going through a loop of debug patches and
providing the information to decode this issue.

Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302261843240.22263@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-02-26 22:25:17 +01:00
..
acpi PCI changes for the v3.8 merge window: 2012-12-13 12:14:47 -08:00
asm-generic This implements the cputime accounting on full dynticks CPUs. 2013-02-05 13:10:33 +01:00
clocksource
crypto
drm Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel 2013-01-11 07:52:48 +10:00
keys
linux stop_machine: Mark per cpu stopper enabled early 2013-02-26 22:25:17 +01:00
math-emu
media
memory
misc
net ipv6: rename datagram_send_ctl and datagram_recv_ctl 2013-01-31 13:53:08 -05:00
pcmcia
ras
rdma UAPI: Remove empty Kbuild files 2013-01-02 17:36:10 -08:00
rxrpc
scsi SCSI misc on 20121212 2012-12-13 19:20:31 -08:00
sound Merge remote-tracking branch 'asoc/fix/cs4271' into tmp 2013-01-10 12:22:11 +00:00
target target: Introduce TCM_NO_SENSE 2013-01-10 20:06:08 -08:00
trace Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-19 17:49:41 -08:00
uapi Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-02-19 17:49:41 -08:00
video Merge branch 'omap-for-v3.8/fixes-for-merge-window' into omap-for-v3.8/fixes-for-merge-window-v2 2012-12-16 11:28:10 -08:00
xen Bugfixes: 2012-12-18 12:26:54 -08:00
Kbuild UAPI: Remove empty Kbuild files 2013-01-02 17:36:10 -08:00