linux_dsm_epyc7002/fs/ocfs2/cluster
Yang Zhang fc2af28bd9 ocfs2/cluster: close a race that fence can't be triggered
When some nodes of cluster face with TCP connection fault, ocfs2 will
pick up a quorum to continue to work and other nodes will be fenced by
resetting host.

In order to decide which node should be fenced, ocfs2 leverages
o2quo_state::qs_holds.  If that variable is reduced to zero, then a try
to decide if fence local node is performed.  However, under a specific
scenario that local node is not disconnected from others at the same
time, above method has a problem to reduce ::qs_holds to zero.

Because, o2net 90s idle timer corresponding to different nodes is
triggered one after another.

  node 2			node 3
  90s idle timer elapses
  clear ::qs_conn_bm
  set hold
				40s is passed
				90 idle timer elapses
				clear ::qs_conn_bm
				set hold
  still up timer elapses
  clear hold (NOT to zero )
  90s idle timer elapses AGAIN
				still up timer elapses.
				clear hold
				still up timer elapses

To solve this issue, a node which has already be evicted from
::qs_conn_bm can't set hold again and again invoked from idle timer.

Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373F1F3F93B@H3CMLB12-EX.srv.huawei-3com.com
Signed-off-by: Yang Zhang <zhang.yangB@h3c.com>
Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-01-31 17:18:35 -08:00
..
heartbeat.c ocfs2/cluster: make config_item_type const 2017-10-19 16:15:18 +02:00
heartbeat.h ocfs2: cleanup unused func declaration and assignment 2017-11-15 18:21:01 -08:00
Makefile ocfs2: remove versioning information 2014-01-21 16:19:41 -08:00
masklog.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
masklog.h ocfs2: reduce object size of mlog uses 2015-06-24 17:49:39 -07:00
netdebug.c ocfs2: free 'dummy_sc' in sc_fop_release() to prevent memory leak 2017-07-06 16:24:30 -07:00
nodemanager.c ocfs2: subsystem.su_mutex is required while accessing the item->ci_parent 2017-11-15 18:21:01 -08:00
nodemanager.h ocfs2/cluster: Make fence method configurable - v2 2009-12-02 16:49:26 -08:00
ocfs2_heartbeat.h ocfs2: warn the user on a dead timeout mismatch 2006-06-29 15:45:35 -07:00
ocfs2_nodemanager.h ocfs2/dlm: Add message DLM_QUERY_REGION 2010-10-09 10:26:23 -07:00
quorum.c ocfs2/cluster: close a race that fence can't be triggered 2018-01-31 17:18:35 -08:00
quorum.h [PATCH] OCFS2: The Second Oracle Cluster Filesystem 2006-01-03 11:45:46 -08:00
sys.c VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms. 2014-03-24 12:21:00 +10:30
sys.h [PATCH] OCFS2: The Second Oracle Cluster Filesystem 2006-01-03 11:45:46 -08:00
tcp_internal.h ocfs2/cluster: neaten a member of o2net_msg_handler 2018-01-31 17:18:34 -08:00
tcp.c cfs2: switch to sock_recvmsg() 2017-12-02 20:38:04 -05:00
tcp.h ocfs2: o2net: set tcp user timeout to max value 2014-08-29 16:28:16 -07:00