mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-16 04:37:07 +07:00
md: add checkings before flush md_misc_wq
Coly reported possible circular locking dependencyi with LOCKDEP enabled, quote the below info from the detailed report [1]. [ 1607.673903] Chain exists of: [ 1607.673903] kn->count#256 --> (wq_completion)md_misc --> (work_completion)(&rdev->del_work) [ 1607.673903] [ 1607.827946] Possible unsafe locking scenario: [ 1607.827946] [ 1607.898780] CPU0 CPU1 [ 1607.952980] ---- ---- [ 1608.007173] lock((work_completion)(&rdev->del_work)); [ 1608.069690] lock((wq_completion)md_misc); [ 1608.149887] lock((work_completion)(&rdev->del_work)); [ 1608.242563] lock(kn->count#256); [ 1608.283238] [ 1608.283238] *** DEADLOCK *** [ 1608.283238] [ 1608.354078] 2 locks held by kworker/5:0/843: [ 1608.405152] #0: ffff8889eecc9948 ((wq_completion)md_misc){+.+.}, at: process_one_work+0x42b/0xb30 [ 1608.512399] #1: ffff888a1d3b7e10 ((work_completion)(&rdev->del_work)){+.+.}, at: process_one_work+0x42b/0xb30 [ 1608.632130] Since works (rdev->del_work and mddev->del_work) are queued in md_misc_wq, then lockdep_map lock is held if either of them are running, then both of them try to hold kernfs lock by call kobject_del. Then if new_dev_store or array_state_store are triggered by write to the related sysfs node, so the write operation gets kernfs lock, but need the lockdep_map because all of them would trigger flush_workqueue(md_misc_wq) finally, then the same lockdep_map lock is needed. To suppress the lockdep warnning, we should flush the workqueue in case the related work is pending. And several works are attached to md_misc_wq, so we need to check which work should be checked: 1. for __md_stop_writes, the purpose of call flush workqueue is ensure sync thread is started if it was starting, so check mddev->del_work is pending or not since md_start_sync is attached to mddev->del_work. 2. __md_stop flushes md_misc_wq to ensure event_work is done, check the event_work is enough. Assume raid_{ctr,dtr} -> md_stop -> __md_stop doesn't need the kernfs lock. 3. both new_dev_store (holds kernfs lock) and ADD_NEW_DISK ioctl (holds the bdev->bd_mutex) call flush_workqueue to ensure md_delayed_delete has completed, this case will be handled in next patch. 4. md_open flushes workqueue to ensure the previous md is disappeared, but it holds bdev->bd_mutex then try to flush workqueue, so it is better to check mddev->del_work as well to avoid potential lock issue, this will be done in another patch. [1]: https://marc.info/?l=linux-raid&m=158518958031584&w=2 Cc: Coly Li <colyli@suse.de> Reported-by: Coly Li <colyli@suse.de> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
This commit is contained in:
parent
91bf5ec342
commit
21e0958ec9
@ -6147,7 +6147,8 @@ static void md_clean(struct mddev *mddev)
|
||||
static void __md_stop_writes(struct mddev *mddev)
|
||||
{
|
||||
set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
|
||||
flush_workqueue(md_misc_wq);
|
||||
if (work_pending(&mddev->del_work))
|
||||
flush_workqueue(md_misc_wq);
|
||||
if (mddev->sync_thread) {
|
||||
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
|
||||
md_reap_sync_thread(mddev);
|
||||
@ -6200,7 +6201,8 @@ static void __md_stop(struct mddev *mddev)
|
||||
md_bitmap_destroy(mddev);
|
||||
mddev_detach(mddev);
|
||||
/* Ensure ->event_work is done */
|
||||
flush_workqueue(md_misc_wq);
|
||||
if (mddev->event_work.func)
|
||||
flush_workqueue(md_misc_wq);
|
||||
spin_lock(&mddev->lock);
|
||||
mddev->pers = NULL;
|
||||
spin_unlock(&mddev->lock);
|
||||
|
Loading…
Reference in New Issue
Block a user