linux_dsm_epyc7002/drivers/block
Lars Ellenberg 31d646042d drbd: disallow promotion during resync handshake, avoid deadlock and hard reset
We already serialize connection state changes,
and other, non-connection state changes (role changes)
while we are establishing a connection.

But if a resync handshake is triggered
on an already established connection (by primary --force or similar),
until now we just had to be "lucky".

Consider this sequence (e.g. deployment scenario):
create-md; up;
  -> Connected Secondary/Secondary Inconsistent/Inconsistent
then do a racy primary --force on both peers.

 block drbd0: drbd_sync_handshake:
 block drbd0: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:25590 flags:0
 block drbd0: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:25590 flags:0
 block drbd0: peer( Unknown -> Secondary ) conn( WFReportParams -> Connected ) pdsk( DUnknown -> Inconsistent )
 block drbd0: peer( Secondary -> Primary ) pdsk( Inconsistent -> UpToDate )
  *** HERE things go wrong. ***
 block drbd0: role( Secondary -> Primary )
 block drbd0: drbd_sync_handshake:
 block drbd0: self 0000000000000005:0000000000000000:0000000000000000:0000000000000000 bits:25590 flags:0
 block drbd0: peer C90D2FC716D232AB:0000000000000004:0000000000000000:0000000000000000 bits:25590 flags:0
 block drbd0: Becoming sync target due to disk states.
 block drbd0: Writing the whole bitmap, full sync required after drbd_sync_handshake.
 block drbd0: Remote failed to finish a request within 6007ms > ko-count (2) * timeout (30 * 0.1s)
 drbd s0: peer( Primary -> Unknown ) conn( Connected -> Timeout ) pdsk( UpToDate -> DUnknown )

The problem here is that the local promotion happens before the sync handshake
triggered by the remote promotion has completed.  Some assumptions elsewhere
become wrong, and when the expected resync handshake is then received and
processed, we get stuck in a deadlock, from which we can only recover by reboot :-(

Fix: if we know the peer has good data,
and our own disk is present, but NOT good,
and there is no resync going on yet,
we expect a sync handshake to happen "soon".
So reject a racy promotion with SS_IN_TRANSIENT_STATE.
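
The condition described above can be sketched as a small predicate. This is
only an illustration under stated assumptions, not the code from the patch:
the enum names, the struct, both function names, and the numeric return
values are hypothetical stand-ins for the driver's real state types. Only
the name SS_IN_TRANSIENT_STATE is taken from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver's state types. */
enum role { ROLE_SECONDARY, ROLE_PRIMARY };
enum disk_state { DISK_DISKLESS, DISK_INCONSISTENT, DISK_UP_TO_DATE };
enum conn_state {
	CONN_CONNECTED,
	CONN_WF_BITMAP_T,
	CONN_WF_SYNC_UUID,
	CONN_SYNC_TARGET
};

/* Return codes; the numeric values here are illustrative only. */
enum state_rv { SS_SUCCESS = 1, SS_IN_TRANSIENT_STATE = -1 };

struct dev_state {
	enum role role;
	enum disk_state disk;	/* local disk state */
	enum disk_state pdsk;	/* peer disk state */
	enum conn_state conn;
};

/* True if a sync handshake is expected "soon" for this state. */
bool resync_handshake_expected(const struct dev_state *s)
{
	bool peer_has_good_data = (s->pdsk == DISK_UP_TO_DATE);
	bool own_disk_present_not_good =
		(s->disk != DISK_DISKLESS && s->disk != DISK_UP_TO_DATE);
	bool no_resync_yet = (s->conn != CONN_SYNC_TARGET);

	return peer_has_good_data && own_disk_present_not_good && no_resync_yet;
}

/* Reject a promotion that races with the expected handshake. */
enum state_rv check_promotion(const struct dev_state *s, enum role new_role)
{
	assert(s != NULL);

	bool promoting = (s->role != ROLE_PRIMARY && new_role == ROLE_PRIMARY);

	if (promoting && resync_handshake_expected(s))
		return SS_IN_TRANSIENT_STATE;
	return SS_SUCCESS;
}
```

In this sketch, a Secondary with an Inconsistent local disk and an UpToDate
peer is refused promotion while still in CONN_CONNECTED, and is allowed to
promote once the state has reached CONN_SYNC_TARGET.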

Result:
 ... as above ...
 block drbd0: peer( Secondary -> Primary ) pdsk( Inconsistent -> UpToDate )
  *** local promotion being postponed until ... ***
 block drbd0: drbd_sync_handshake:
 block drbd0: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:25590 flags:0
 block drbd0: peer 77868BDA836E12A5:0000000000000004:0000000000000000:0000000000000000 bits:25590 flags:0
  ...
 block drbd0: conn( WFBitMapT -> WFSyncUUID )
 block drbd0: updated sync uuid 85D06D0E8887AD44:0000000000000000:0000000000000000:0000000000000000
 block drbd0: conn( WFSyncUUID -> SyncTarget )
  *** ... after the resync handshake ***
 block drbd0: role( Secondary -> Primary )

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-13 21:43:07 -06:00
aoe mm: rename _count, field of the struct page, to _refcount 2016-05-19 19:12:14 -07:00
drbd drbd: disallow promotion during resync handshake, avoid deadlock and hard reset 2016-06-13 21:43:07 -06:00
mtip32xx drivers: use req op accessor 2016-06-07 13:41:38 -06:00
paride paride: make 'verbose' parameter an 'int' again 2016-03-15 16:55:16 -07:00
rsxx block, fs, mm, drivers: use bio set/get op accessors 2016-06-07 13:41:38 -06:00
xen-blkback block: add a separate operation type for secure erase 2016-06-09 09:52:25 -06:00
zram block, fs, mm, drivers: use bio set/get op accessors 2016-06-07 13:41:38 -06:00
amiflop.c block: drop owner assignment from platform_drivers 2014-10-20 16:20:18 +02:00
ataflop.c Merge branch 'for-3.16/core' of git://git.kernel.dk/linux-block into next 2014-06-02 09:29:34 -07:00
brd.c block, fs, mm, drivers: use bio set/get op accessors 2016-06-07 13:41:38 -06:00
cciss_cmd.h cciss: use new doorbell-bit-5 reset method 2011-05-06 08:23:55 -06:00
cciss_scsi.c scsi: Do not set cmd_per_lun to 1 in the host template 2015-05-31 18:06:28 -07:00
cciss_scsi.h cciss: add cciss_tape_cmds module paramter 2011-05-06 08:23:59 -06:00
cciss.c SCSI misc on 20160113 2016-01-13 19:37:36 -08:00
cciss.h cciss: Adds simple mode functionality 2011-08-08 11:40:15 +02:00
cryptoloop.c block: cryptoloop - Use new skcipher interface 2016-01-27 20:35:43 +08:00
DAC960.c block: use pci_zalloc_consistent 2014-08-08 15:57:28 -07:00
DAC960.h
floppy.c block, fs, mm, drivers: use bio set/get op accessors 2016-06-07 13:41:38 -06:00
hd.c block: hd: remove deprecated IRQF_DISABLED 2014-10-01 08:16:07 -06:00
Kconfig cpqarray: remove it from the kernel 2016-03-14 09:06:01 -06:00
loop.c block, drivers: add REQ_OP_FLUSH operation 2016-06-07 13:41:38 -06:00
loop.h block: loop: support DIO & AIO 2015-09-23 11:01:16 -06:00
Makefile drivers:block: cpqarray clean up 2016-03-15 15:59:47 -07:00
mg_disk.c mg_disk: fix enum REQ_OP_ kbuild error 2016-06-08 15:01:16 -06:00
nbd.c block, drivers: add REQ_OP_FLUSH operation 2016-06-07 13:41:38 -06:00
null_blk.c null_blk: add lightnvm null_blk device to the nullb_list 2016-03-18 18:10:37 -07:00
osdblk.c block, drivers: add REQ_OP_FLUSH operation 2016-06-07 13:41:38 -06:00
pktcdvd.c block, fs, mm, drivers: use bio set/get op accessors 2016-06-07 13:41:38 -06:00
ps3disk.c block, drivers: add REQ_OP_FLUSH operation 2016-06-07 13:41:38 -06:00
ps3vram.c block: change ->make_request_fn() and users to return a queue cookie 2015-11-07 10:40:46 -07:00
rbd_types.h rbd: get rid of RBD_MAX_SEG_NAME_LEN 2012-12-17 08:37:29 -06:00
rbd.c drivers: use req op accessor 2016-06-07 13:41:38 -06:00
skd_main.c block, drivers: add REQ_OP_FLUSH operation 2016-06-07 13:41:38 -06:00
skd_s1120.h skd: fix formatting in skd_s1120.h 2013-11-08 09:10:30 -07:00
smart1,2.h fix typos 'comamnd' -> 'command' in comments 2011-02-02 11:31:21 +01:00
sunvdc.c sunvdc: reconnect ldc after vds service domain restarts 2014-12-11 18:52:45 -08:00
swim3.c powerpc: Move Power Macintosh drivers to generic byteswappers 2015-03-23 14:29:40 +11:00
swim_asm.S
swim.c block: drop owner assignment from platform_drivers 2014-10-20 16:20:18 +02:00
sx8.c sx8: use real time for the command seconds 2015-12-23 08:42:59 -07:00
umem.c block, drivers, cgroup: use op_is_write helper instead of checking for REQ_WRITE 2016-06-07 13:41:38 -06:00
umem.h
virtio_blk.c block, drivers: add REQ_OP_FLUSH operation 2016-06-07 13:41:38 -06:00
xen-blkfront.c block: add a separate operation type for secure erase 2016-06-09 09:52:25 -06:00
xsysace.c block: systemace: Remove .owner field for driver 2014-08-21 20:37:54 -05:00
z2ram.c block: remove struct request buffer member 2014-04-15 14:03:02 -06:00