linux_dsm_epyc7002/fs/ocfs2/dlm
piaojun ee8f7fcbe6 ocfs2/dlm: continue to purge recovery lockres when recovery master goes down
We found a situation in which the dlm blocks when recovery masters go
down one after another, as described below.  To solve this problem, we
should purge the recovery lock as soon as we detect that the recovery
master has gone down.

N3                      N2                   N1(reco master)
                        go down
                                             pick up recovery lock and
                                             begin recovering for N2

                                             go down

pick up recovery
lock failed, then
purge it:
dlm_purge_lockres
  ->DROPPING_REF is set

send deref to N1 failed,
recovery lock is not purged

find N1 has gone down, begin
recovering for N1, but
blocked in dlm_do_recovery
as DROPPING_REF is set:
dlm_do_recovery
  ->dlm_pick_recovery_master
    ->dlmlock
      ->dlm_get_lock_resource
        ->__dlm_wait_on_lockres_flags(tmpres,
                                      DLM_LOCK_RES_DROPPING_REF);

Fixes: 8c03439681 ("ocfs2/dlm: clear DROPPING_REF flag when the master goes down")
Link: http://lkml.kernel.org/r/578453AF.8030404@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jiufei Xue <xuejiufei@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-08-02 17:31:41 -04:00
dlmapi.h ocfs2/trivial: Remove trailing whitespaces 2010-01-25 19:20:51 -08:00
dlmast.c o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper 2015-02-10 14:30:30 -08:00
dlmcommon.h ocfs2/dlm: continue to purge recovery lockres when recovery master goes down 2016-08-02 17:31:41 -04:00
dlmconvert.c ocfs2/dlm: move lock to the tail of grant queue while doing in-place convert 2016-03-25 16:37:42 -07:00
dlmconvert.h [PATCH] OCFS2: The Second Oracle Cluster Filesystem 2006-01-03 11:45:47 -08:00
dlmdebug.c ocfs2/dlm: fix memory leak of dlm_debug_ctxt 2016-07-26 16:19:19 -07:00
dlmdebug.h ocfs2/dlm: fix memory leak of dlm_debug_ctxt 2016-07-26 16:19:19 -07:00
dlmdomain.c ocfs2/dlm: fix a variable overflow problem in dlmdomain.c 2016-03-15 16:55:16 -07:00
dlmdomain.h ocfs2: dlm: dlmdomain: remove unused function 2015-02-10 14:30:29 -08:00
dlmlock.c ocfs2: remove NULL assignments on static 2014-06-04 16:53:53 -07:00
dlmmaster.c ocfs2/dlm: continue to purge recovery lockres when recovery master goes down 2016-08-02 17:31:41 -04:00
dlmrecovery.c ocfs2/dlm: continue to purge recovery lockres when recovery master goes down 2016-08-02 17:31:41 -04:00
dlmthread.c ocfs2/dlm: continue to purge recovery lockres when recovery master goes down 2016-08-02 17:31:41 -04:00
dlmunlock.c ocfs2/dlm: return appropriate value when dlm_grab() returns NULL 2016-01-14 16:00:49 -08:00
Makefile ocfs2: remove versioning information 2014-01-21 16:19:41 -08:00