linux_dsm_epyc7002/fs/ocfs2
Junxiao Bi ded2cf7141 ocfs2: dlm: fix recovery hung
There is a race window in dlm_do_recovery() between dlm_remaster_locks()
and dlm_reset_recovery() when the recovery master nearly finish the
recovery process for a dead node.  After the master sends FINALIZE_RECO
message in dlm_remaster_locks(), another node may become the recovery
master for another dead node, and then send the BEGIN_RECO message to
all the nodes included the old master, in the handler of this message
dlm_begin_reco_handler() of old master, dlm->reco.dead_node and
dlm->reco.new_master will be set to the second dead node and the new
master, then in dlm_reset_recovery(), these two variables will be reset
to default value.  This will cause new recovery master can not finish
the recovery process and hung, at last the whole cluster will hung for
recovery.

old recovery master:                                 new recovery master:
dlm_remaster_locks()
                                                  become recovery master for
                                                  another dead node.
                                                  dlm_send_begin_reco_message()
dlm_begin_reco_handler()
{
 if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
  return -EAGAIN;
 }
 dlm_set_reco_master(dlm, br->node_idx);
 dlm_set_reco_dead_node(dlm, br->dead_node);
}
dlm_reset_recovery()
{
 dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
 dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
}
                                                  will hang in dlm_remaster_locks() for
                                                  request dlm locks info

Before send FINALIZE_RECO message, recovery master should set
DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done,
this can break the race windows as the BEGIN_RECO messages will not be
handled before DLM_RECO_STATE_FINALIZE flag is cleared.

A similar race may happen between new recovery master and normal node
which is in dlm_finalize_reco_handler(), also fix it.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-03 16:20:54 -07:00
..
cluster Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
dlm ocfs2: dlm: fix recovery hung 2014-04-03 16:20:54 -07:00
dlmfs ocfs2: remove versioning information 2014-01-21 16:19:41 -08:00
acl.c ocfs2: use generic posix ACL infrastructure 2014-01-25 23:58:21 -05:00
acl.h ocfs2: use generic posix ACL infrastructure 2014-01-25 23:58:21 -05:00
alloc.c ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file 2014-04-03 16:20:53 -07:00
alloc.h ocfs2: Add ocfs2_trim_fs for SSD trim support. 2011-05-23 23:37:18 -07:00
aops.c ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file 2014-04-03 16:20:53 -07:00
aops.h ocfs2: change ip_unaligned_aio to of type mutex from atomit_t 2014-04-03 16:20:53 -07:00
blockcheck.c ocfs2: kill endianness abuses in blockcheck.c 2012-05-29 23:28:35 -04:00
blockcheck.h
buffer_head_io.c ocfs2: return ENOMEM when sb_getblk() fails 2013-11-13 12:09:00 +09:00
buffer_head_io.h
dcache.c ocfs2: needs ->d_lock to poke in ->d_parent->d_inode from ->d_revalidate() 2013-09-29 22:02:20 -04:00
dcache.h
dir.c ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file 2014-04-03 16:20:53 -07:00
dir.h [readdir] convert ocfs2 2013-06-29 12:57:02 +04:00
dlmglue.c ocfs2: pass ocfs2_cluster_connection to ocfs2_this_node 2014-01-21 16:19:41 -08:00
dlmglue.h
export.c fs: encode_fh: return FILEID_INVALID if invalid fid_type 2013-02-26 02:46:10 -05:00
export.h
extent_map.c ocfs2: fix the end cluster offset of FIEMAP 2013-09-11 15:56:53 -07:00
extent_map.h ocfs2: Implement llseek() 2011-07-25 14:58:15 -07:00
file.c ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file 2014-04-03 16:20:53 -07:00
file.h ->permission() sanitizing: don't pass flags to ->permission() 2011-07-20 01:43:24 -04:00
heartbeat.c
heartbeat.h
inode.c ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file 2014-04-03 16:20:53 -07:00
inode.h ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file 2014-04-03 16:20:53 -07:00
ioctl.c ocfs2: adjust minlen with discard_granularity in the FITRIM ioctl 2014-01-21 16:19:42 -08:00
ioctl.h
journal.c ocfs2: use i_size_read() to access i_size 2013-09-11 15:56:30 -07:00
journal.h ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file 2014-04-03 16:20:53 -07:00
Kconfig
localalloc.c ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters 2014-02-06 13:48:51 -08:00
localalloc.h ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters 2014-02-06 13:48:51 -08:00
locks.c
locks.h
Makefile ocfs2: remove versioning information 2014-01-21 16:19:41 -08:00
mmap.c kill f_vfsmnt 2013-02-26 02:46:10 -05:00
mmap.h
move_extents.c ocfs2: remove redundant ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits() 2014-01-21 16:19:42 -08:00
move_extents.h Ocfs2/move_extents: move/defrag extents within a certain range. 2011-05-25 15:17:12 +08:00
namei.c ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file 2014-04-03 16:20:53 -07:00
namei.h
ocfs1_fs_compat.h
ocfs2_fs.h Revert wrong fixes for common misspellings 2011-04-26 23:31:11 -07:00
ocfs2_ioctl.h Ocfs2/move_extents: Adding new ioctl code 'OCFS2_IOC_MOVE_EXT' to ocfs2. 2011-05-25 15:17:08 +08:00
ocfs2_lockid.h
ocfs2_lockingver.h
ocfs2_trace.h ocfs2: lighten up allocate transaction 2013-09-11 15:56:28 -07:00
ocfs2.h ocfs2: add clustername to cluster connection 2014-01-21 16:19:41 -08:00
quota_global.c ocfs2: fix quota file corruption 2014-03-04 07:55:48 -08:00
quota_local.c ocfs2: fix quota file corruption 2014-03-04 07:55:48 -08:00
quota.h
refcounttree.c ocfs2: use generic posix ACL infrastructure 2014-01-25 23:58:21 -05:00
refcounttree.h ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page 2013-08-13 17:57:49 -07:00
reservations.c
reservations.h
resize.c ocfs2: do not call brelse() if group_bh is not initialized in ocfs2_group_add() 2013-11-13 12:09:01 +09:00
resize.h
slot_map.c ocfs2: Clean up messages in the fs 2011-07-24 10:34:54 -07:00
slot_map.h
stack_o2cb.c ocfs2: pass ocfs2_cluster_connection to ocfs2_this_node 2014-01-21 16:19:41 -08:00
stack_user.c ocfs2: fix sparse non static symbol warning 2014-01-21 16:19:42 -08:00
stackglue.c ocfs2: check if cluster name exists before deref 2014-03-28 13:56:58 -07:00
stackglue.h ocfs2: pass ocfs2_cluster_connection to ocfs2_this_node 2014-01-21 16:19:41 -08:00
suballoc.c ocfs2: remove redundant ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits() 2014-01-21 16:19:42 -08:00
suballoc.h ocfs2: remove redundant ocfs2_alloc_dinode_update_counts() and ocfs2_block_group_set_bits() 2014-01-21 16:19:42 -08:00
super.c ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file 2014-04-03 16:20:53 -07:00
super.h treewide: use __printf not __attribute__((format(printf,...))) 2011-10-31 17:30:54 -07:00
symlink.c ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path 2013-02-26 02:46:12 -05:00
symlink.h ocfs: simplify symlink handling 2012-05-29 23:28:40 -04:00
sysfile.c ocfs2: remove kfree() redundant null checks 2013-02-21 17:22:19 -08:00
sysfile.h
uptodate.c
uptodate.h
xattr.c ocfs2: use generic posix ACL infrastructure 2014-01-25 23:58:21 -05:00
xattr.h ocfs2: use generic posix ACL infrastructure 2014-01-25 23:58:21 -05:00