linux_dsm_epyc7002/fs/ocfs2/cluster
Junxiao Bi e0cbb79805 ocfs2: o2hb: add negotiate timer
This series of patches is to fix the issue that when storage down, all
nodes will fence self due to write timeout.

With this patch set, all nodes will keep going until storage back
online, except if the following issue happens, then all nodes will do as
before to fence self.

1. io error got
2. network between nodes down
3. nodes panic

This patch (of 6):

When storage down, all nodes will fence self due to write timeout.  The
negotiate timer is designed to avoid this, with it node will wait until
storage up again.

Negotiate timer working in the following way:

1. The timer expires before write timeout timer, its timeout is half
   of write timeout now.  It is re-queued along with write timeout timer.
   If expires, it will send NEGO_TIMEOUT message to master node(node with
   lowest node number).  This message does nothing but marks a bit in a
   bitmap recording which nodes are negotiating timeout on master node.

2. If storage down, nodes will send this message to master node, then
   when master node finds its bitmap including all online nodes, it sends
   NEGO_APPROVL message to all nodes one by one, this message will
   re-queue write timeout timer and negotiate timer.  For any node doesn't
   receive this message or meets some issue when handling this message, it
   will be fenced.  If storage up at any time, o2hb_thread will run and
   re-queue all the timer, nothing will be affected by these two steps.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Gang He <ghe@suse.com>
Cc: rwxybh <rwxybh@126.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-27 14:49:37 -07:00
..
heartbeat.c ocfs2: o2hb: add negotiate timer 2016-05-27 14:49:37 -07:00
heartbeat.h ocfs2: fix deadlock between o2hb thread and o2net_wq 2014-10-09 22:25:47 -04:00
Makefile ocfs2: remove versioning information 2014-01-21 16:19:41 -08:00
masklog.c ocfs2: remove __mlog_cpu_guess 2015-06-24 17:49:39 -07:00
masklog.h ocfs2: reduce object size of mlog uses 2015-06-24 17:49:39 -07:00
netdebug.c fs/ocfs2/cluster/netdebug.c: use seq_open_private() not seq_open() 2014-10-09 22:25:47 -04:00
nodemanager.c configfs: switch ->default groups to a linked list 2016-03-06 16:11:24 +01:00
nodemanager.h ocfs2/cluster: Make fence method configurable - v2 2009-12-02 16:49:26 -08:00
ocfs2_heartbeat.h ocfs2: warn the user on a dead timeout mismatch 2006-06-29 15:45:35 -07:00
ocfs2_nodemanager.h ocfs2/dlm: Add message DLM_QUERY_REGION 2010-10-09 10:26:23 -07:00
quorum.c ocfs2: quorum: add a log for node not fenced 2014-08-29 16:28:17 -07:00
quorum.h [PATCH] OCFS2: The Second Oracle Cluster Filesystem 2006-01-03 11:45:46 -08:00
sys.c VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms. 2014-03-24 12:21:00 +10:30
sys.h [PATCH] OCFS2: The Second Oracle Cluster Filesystem 2006-01-03 11:45:46 -08:00
tcp_internal.h ocfs2: fix wrong comment 2015-02-10 14:30:28 -08:00
tcp.c ocfs2/cluster: block BH in TCP callbacks 2016-05-19 11:36:49 -07:00
tcp.h ocfs2: o2net: set tcp user timeout to max value 2014-08-29 16:28:16 -07:00