linux_dsm_epyc7002/fs/dlm
David Teigland b790c3b7c3 [DLM] can miss clearing resend flag
A long, complicated sequence of events, beginning with the RESEND flag not
being cleared on an lkb, can result in an unlock never completing.

- lkb on waiters list for remote lookup
- the remote node is both the dir node and the master node, so
  it optimizes the lookup into a request and sends a request
  reply back
- the request reply is saved on the requestqueue to be processed
  after recovery
- recovery runs dlm_recover_waiters_pre() which sets RESEND flag
  so the lookup will be resent after recovery
- end of recovery: process_requestqueue() takes the saved request reply,
  which takes the lkb off the waiters list, _without_ clearing
  the RESEND flag
- end of recovery: dlm_recover_waiters_post() doesn't do anything
  with the now completed lookup lkb (would usually clear RESEND)
- later, the node unmounts, unlocks this lkb that still has RESEND
  flag set
- the lkb is on the waiters list again, now for unlock, when recovery
  occurs; dlm_recover_waiters_pre() sees the lkb for unlock with RESEND
  set, but doesn't do anything since the master still exists
- end of recovery: dlm_recover_waiters_post() takes this lkb off
  the waiters list because it has the RESEND flag set, then reports
  an error because unlocks are never supposed to be handled in
  recover_waiters_post().
- later, the unlock reply is received, doesn't find the lkb on
  the waiters list because recover_waiters_post() has wrongly
  removed it.
- the unlock operation has been lost, and we're left with a
  stray granted lock
- unmount spins waiting for the unlock to complete
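
The sequence above can be replayed in a small stand-alone C model. This is
only a sketch: every name prefixed with model_ is hypothetical, the struct is
not the kernel's lkb, and the functions only mimic the behaviour described in
the list rather than reproduce the real dlm code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not the kernel's types. */
enum model_wait { MODEL_WAIT_NONE, MODEL_WAIT_LOOKUP, MODEL_WAIT_UNLOCK };

struct model_lkb {
    enum model_wait wait_type; /* MODEL_WAIT_NONE: not on the waiters list */
    bool resend;               /* stands in for the RESEND flag */
    bool granted;              /* lock is currently granted */
};

/* Pre-fix removal: takes the lkb off the waiters list, leaves RESEND alone. */
void model_remove_from_waiters(struct model_lkb *lkb)
{
    assert(lkb != NULL);
    lkb->wait_type = MODEL_WAIT_NONE;
}

/* Start of recovery: mark a pending lookup for resend. An unlock is left
 * untouched because, in this scenario, its master still exists. */
void model_recover_waiters_pre(struct model_lkb *lkb)
{
    if (lkb->wait_type == MODEL_WAIT_LOOKUP)
        lkb->resend = true;
}

/* End of recovery: take RESEND-flagged lkbs off the waiters list.
 * Returns -1 when it finds an unlock, which should never be handled here. */
int model_recover_waiters_post(struct model_lkb *lkb)
{
    if (lkb->wait_type == MODEL_WAIT_NONE || !lkb->resend)
        return 0;

    enum model_wait was = lkb->wait_type;
    model_remove_from_waiters(lkb);
    lkb->resend = false;
    return was == MODEL_WAIT_UNLOCK ? -1 : 0;
}

/* Unlock reply: completes only if the lkb is still waiting for an unlock. */
bool model_receive_unlock_reply(struct model_lkb *lkb)
{
    if (lkb->wait_type != MODEL_WAIT_UNLOCK)
        return false;
    model_remove_from_waiters(lkb);
    lkb->granted = false;
    return true;
}

/* Replays the sequence; returns true if the unlock ends up lost. */
bool model_unlock_is_lost(void)
{
    struct model_lkb lkb = { MODEL_WAIT_LOOKUP, false, false };

    model_recover_waiters_pre(&lkb);   /* first recovery sets RESEND */
    model_remove_from_waiters(&lkb);   /* saved request reply is processed */
    lkb.granted = true;                /* the optimized request completed */
    model_recover_waiters_post(&lkb);  /* lkb not on the list: no-op */

    lkb.wait_type = MODEL_WAIT_UNLOCK; /* unmount unlocks, RESEND is stale */
    model_recover_waiters_pre(&lkb);   /* second recovery, master still exists */
    int err = model_recover_waiters_post(&lkb);        /* wrongly removes it */
    bool completed = model_receive_unlock_reply(&lkb); /* lkb not found */

    return err == -1 && !completed && lkb.granted;
}
```

In this model, model_unlock_is_lost() ends with the lkb off the waiters list
and the lock still granted, which matches the stray granted lock described
above.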

The visible evidence of this problem is a node where gfs umount is
spinning, the dlm waiters list is empty, and the dlm locks list shows
a granted lock.

The fix is simply to clear the RESEND flag when taking an lkb off the
waiters list.
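
A minimal sketch of that idea, written as stand-alone C rather than as the
actual patch to lock.c: the flag bit, the struct and the function name below
are all assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for the RESEND flag. */
#define SKETCH_FLAG_RESEND 0x1u

/* Hypothetical, cut-down lkb; the real structure has many more fields. */
struct sketch_lkb {
    uint32_t flags;
    int wait_type; /* 0 means "not on the waiters list" */
};

/* Take the lkb off the waiters list and clear RESEND in the same step, so a
 * stale flag cannot survive into a later operation on the same lkb.
 * Returns false if the lkb was not on the waiters list. */
bool sketch_remove_from_waiters(struct sketch_lkb *lkb)
{
    assert(lkb != NULL);
    if (lkb->wait_type == 0)
        return false;

    lkb->wait_type = 0;
    lkb->flags &= ~SKETCH_FLAG_RESEND; /* the one-line fix, in spirit */
    return true;
}
```

Clearing the flag inside the removal helper, instead of in each caller, means
every path that takes an lkb off the waiters list gets the same behaviour,
including the saved-reply path that previously skipped it.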

Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2007-02-05 13:37:50 -05:00
..
ast.c [DLM] down conversion clearing flags 2006-08-23 16:07:31 -04:00
ast.h [DLM] The core of the DLM for GFS2/CLVM 2006-01-18 09:30:29 +00:00
config.c [DLM] expose dlm_config_info fields in configfs 2007-02-05 13:36:43 -05:00
config.h [DLM] add config entry to enable log_debug 2007-02-05 13:36:40 -05:00
debug_fs.c [GFS2] inode_diet: Replace inode.u.generic_ip with inode.i_private (gfs) 2006-09-28 08:32:24 -04:00
dir.c [DLM] Update DLM to the latest patch level 2006-01-20 08:47:07 +00:00
dir.h [DLM] The core of the DLM for GFS2/CLVM 2006-01-18 09:30:29 +00:00
dlm_internal.h [DLM] fix user unlocking 2007-02-05 13:36:55 -05:00
Kconfig [DLM] Fix DLM config 2006-11-30 10:35:41 -05:00
lock.c [DLM] can miss clearing resend flag 2007-02-05 13:37:50 -05:00
lock.h [DLM] dump rsb and locks on assert 2006-08-21 09:50:09 -04:00
lockspace.c [DLM] rename dlm_config_info fields 2007-02-05 13:36:37 -05:00
lockspace.h [DLM] dlm: user locks 2006-07-13 09:25:34 -04:00
lowcomms-sctp.c [DLM] Use workqueues for dlm lowcomms 2007-02-05 13:36:52 -05:00
lowcomms-tcp.c [DLM] Make sock_sem into a mutex 2007-02-05 13:37:44 -05:00
lowcomms.h [DLM] Clean up lowcomms 2006-12-07 09:25:13 -05:00
lvb_table.h [DLM] The core of the DLM for GFS2/CLVM 2006-01-18 09:30:29 +00:00
main.c [DLM] Clean up lowcomms 2006-12-07 09:25:13 -05:00
Makefile [DLM] Add support for tcp communications 2006-11-30 10:35:00 -05:00
member.c [DLM] fix aborted recovery during node removal 2006-11-30 10:35:13 -05:00
member.h [DLM] The core of the DLM for GFS2/CLVM 2006-01-18 09:30:29 +00:00
memory.c [PATCH] slab: remove kmem_cache_t 2006-12-07 08:39:25 -08:00
memory.h [DLM] Remove range locks from the DLM 2006-02-23 09:56:38 +00:00
midcomms.c [DLM] rename dlm_config_info fields 2007-02-05 13:36:37 -05:00
midcomms.h [DLM] The core of the DLM for GFS2/CLVM 2006-01-18 09:30:29 +00:00
rcom.c [DLM] rename dlm_config_info fields 2007-02-05 13:36:37 -05:00
rcom.h [DLM] The core of the DLM for GFS2/CLVM 2006-01-18 09:30:29 +00:00
recover.c [DLM] fix master recovery 2007-02-05 13:36:58 -05:00
recover.h [DLM] The core of the DLM for GFS2/CLVM 2006-01-18 09:30:29 +00:00
recoverd.c [DLM] change some log_error to log_debug 2007-02-05 13:36:34 -05:00
recoverd.h [DLM] The core of the DLM for GFS2/CLVM 2006-01-18 09:30:29 +00:00
requestqueue.c [DLM] fix add_requestqueue checking nodes list 2006-11-30 10:37:00 -05:00
requestqueue.h [DLM] fix requestqueue race 2006-11-30 10:35:10 -05:00
user.c [DLM] fix user unlocking 2007-02-05 13:36:55 -05:00
user.h [DLM] dlm: user locks 2006-07-13 09:25:34 -04:00
util.c [DLM] fix old rcom messages 2007-02-05 13:35:50 -05:00
util.h [DLM] The core of the DLM for GFS2/CLVM 2006-01-18 09:30:29 +00:00