linux_dsm_epyc7002/fs/ocfs2/dlm
Joseph Qi ac7cf246df ocfs2/dlm: fix race between convert and recovery
There is a race window between dlmconvert_remote and
dlm_move_lockres_to_recovery_list, which will cause a lock with
OCFS2_LOCK_BUSY in grant list, thus system hangs.

dlmconvert_remote
{
        spin_lock(&res->spinlock);
        list_move_tail(&lock->list, &res->converting);
        lock->convert_pending = 1;
        spin_unlock(&res->spinlock);

        status = dlm_send_remote_convert_request();
        >>>>>> race window, master has queued ast and return DLM_NORMAL,
               and then down before sending ast.
               this node detects master down and calls
               dlm_move_lockres_to_recovery_list, which will revert the
               lock to grant list.
               Then OCFS2_LOCK_BUSY won't be cleared as new master won't
               send ast any more because it thinks already be authorized.

        spin_lock(&res->spinlock);
        lock->convert_pending = 0;
        if (status != DLM_NORMAL)
                dlm_revert_pending_convert(res, lock);
        spin_unlock(&res->spinlock);
}

In this case, check if res->state has DLM_LOCK_RES_RECOVERING bit set
(res is still in recovering) or res master changed (new master has
finished recovery), reset the status to DLM_RECOVERING, then it will
retry convert.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reported-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Tariq Saeed <tariq.x.saeed@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
..
dlmapi.h ocfs2/trivial: Remove trailing whitespaces 2010-01-25 19:20:51 -08:00
dlmast.c o2dlm: fix NULL pointer dereference in o2dlm_blocking_ast_wrapper 2015-02-10 14:30:30 -08:00
dlmcommon.h ocfs2: fix a tiny race that leads file system read-only 2016-03-15 16:55:16 -07:00
dlmconvert.c ocfs2/dlm: fix race between convert and recovery 2016-03-25 16:37:42 -07:00
dlmconvert.h [PATCH] OCFS2: The Second Oracle Cluster Filesystem 2006-01-03 11:45:47 -08:00
dlmdebug.c ocfs2: fix snprintf format specifier in dlmdebug.c 2015-02-10 14:30:29 -08:00
dlmdebug.h ocfs2/dlm: Cleanup dlmdebug.c 2010-12-22 18:34:44 -08:00
dlmdomain.c ocfs2/dlm: fix a variable overflow problem in dlmdomain.c 2016-03-15 16:55:16 -07:00
dlmdomain.h ocfs2: dlm: dlmdomain: remove unused function 2015-02-10 14:30:29 -08:00
dlmlock.c ocfs2: remove NULL assignments on static 2014-06-04 16:53:53 -07:00
dlmmaster.c ocfs2: fix a tiny race that leads file system read-only 2016-03-15 16:55:16 -07:00
dlmrecovery.c ocfs2: fix a tiny race that leads file system read-only 2016-03-15 16:55:16 -07:00
dlmthread.c ocfs2: fix a tiny race that leads file system read-only 2016-03-15 16:55:16 -07:00
dlmunlock.c ocfs2/dlm: return appropriate value when dlm_grab() returns NULL 2016-01-14 16:00:49 -08:00
Makefile ocfs2: remove versioning information 2014-01-21 16:19:41 -08:00