linux_dsm_epyc7002/include/linux/smpboot.h
Thomas Gleixner 46c498c2cd stop_machine: Mark per cpu stopper enabled early
commit 14e568e78 (stop_machine: Use smpboot threads) introduced the
following regression:

Before this commit the stopper enabled bit was set in the online
notifier.

CPU0				CPU1
cpu_up
				cpu online
hotplug_notifier(ONLINE)
  stopper(CPU1)->enabled = true;
...
stop_machine()

The conversion to smpboot threads moved the enablement to the wakeup
path of the parked thread. The majority of users seem to have the
following working order:

CPU0				CPU1
cpu_up
				cpu online
unpark_threads()
  wakeup(stopper[CPU1])
....
				stopper thread runs
				  stopper(CPU1)->enabled = true;
stop_machine()

But Konrad and Sander have observed:

CPU0				CPU1
cpu_up
				cpu online
unpark_threads()
  wakeup(stopper[CPU1])
....
stop_machine()
				stopper thread runs
				  stopper(CPU1)->enabled = true;

Now the stop machinery kicks CPU0 into the stop loop, where it gets
stuck forever because the queue code saw stopper(CPU1)->enabled ==
false, so CPU0 waits for CPU1 to enter stomp_machine, but the CPU1
stopper work got discarded due to enabled == false.

Add a pre_unpark function to the smpboot thread descriptor and call it
before waking the thread.

This fixes the problem at hand, but the stop_machine code should be
more robust. The stopper->enabled flag smells fishy at best.

Thanks to Konrad for going through a loop of debug patches and
providing the information to decode this issue.

Reported-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1302261843240.22263@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2013-02-26 22:25:17 +01:00

53 lines
1.9 KiB
C

#ifndef _LINUX_SMPBOOT_H
#define _LINUX_SMPBOOT_H
#include <linux/types.h>
struct task_struct;
/* Cookie handed to the thread_fn*/
struct smpboot_thread_data;
/**
* struct smp_hotplug_thread - CPU hotplug related thread descriptor
* @store: Pointer to per cpu storage for the task pointers
* @list: List head for core management
* @thread_should_run: Check whether the thread should run or not. Called with
* preemption disabled.
* @thread_fn: The associated thread function
* @create: Optional setup function, called when the thread gets
* created (Not called from the thread context)
* @setup: Optional setup function, called when the thread gets
* operational the first time
* @cleanup: Optional cleanup function, called when the thread
* should stop (module exit)
* @park: Optional park function, called when the thread is
* parked (cpu offline)
* @unpark: Optional unpark function, called when the thread is
* unparked (cpu online)
* @pre_unpark: Optional unpark function, called before the thread is
* unparked (cpu online). This is not guaranteed to be
* called on the target cpu of the thread. Careful!
* @selfparking: Thread is not parked by the park function.
* @thread_comm: The base name of the thread
*/
struct smp_hotplug_thread {
struct task_struct __percpu **store;
struct list_head list;
int (*thread_should_run)(unsigned int cpu);
void (*thread_fn)(unsigned int cpu);
void (*create)(unsigned int cpu);
void (*setup)(unsigned int cpu);
void (*cleanup)(unsigned int cpu, bool online);
void (*park)(unsigned int cpu);
void (*unpark)(unsigned int cpu);
void (*pre_unpark)(unsigned int cpu);
bool selfparking;
const char *thread_comm;
};
int smpboot_register_percpu_thread(struct smp_hotplug_thread *plug_thread);
void smpboot_unregister_percpu_thread(struct smp_hotplug_thread *plug_thread);
int smpboot_thread_schedule(void);
#endif