linux_dsm_epyc7002/drivers/block/drbd
Lars Ellenberg 2681f7f6ce drbd: fix potential protocol error and resulting disconnect/reconnect
When we notice a disk failure on the receiving side,
we stop sending it new incoming writes.

Depending on exact timing of various events, the same transfer log epoch
could end up containing both replicated (before we noticed the failure)
and local-only requests (after we noticed the failure).

The sanity checks in tl_release(), called when receiving a
P_BARRIER_ACK, check that the ack'ed transfer log epoch matches
the expected epoch, and the number of contained writes matches
the number of ack'ed writes.

In this case, they counted both replicated and local-only writes,
but the peer only acknowledges those it has seen.  We get a mismatch,
resulting in a protocol error and disconnect/reconnect cycle.

Messages logged are of the form:
  "BAD! BarrierAck #%u received with n_writes=%u, expected n_writes=%u!\n"

A similar issue can also be triggered when starting a resync while
having a healthy replication link, by invalidating one side, forcing a
full sync, or attaching to a diskless node.

Fix this by closing the current epoch if the state changes in a way
that would change the replication intent of the next write.

Epochs now contain either only non-replicated,
or only replicated writes.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2013-01-21 22:58:36 +01:00
drbd_actlog.c drbd: don't try to clear bits once the disk has failed 2012-11-09 14:11:42 +01:00
drbd_bitmap.c drbd: use copy_highpage 2012-11-09 14:22:26 +01:00
drbd_int.h drbd: fixup after wait_even_lock_irq() addition to generic code 2012-11-30 21:20:15 +01:00
drbd_interval.c Merge branch 'drbd-8.4_ed6' into for-3.8-drivers-drbd-8.4_ed6 2012-11-09 14:20:23 +01:00
drbd_interval.h drbd: Iterate over all overlapping intervals in a tree 2011-10-14 16:47:37 +02:00
drbd_main.c drbd: Remove obsolete check 2012-12-06 12:09:55 +01:00
drbd_nl.c drbd: Fix drbdsetup wait-connect, wait-sync etc... commands 2012-12-06 13:04:34 +01:00
drbd_nla.c drbd: Split off netlink mandatory attribute handling into separate file 2012-11-08 16:57:45 +01:00
drbd_nla.h drbd: Split off netlink mandatory attribute handling into separate file 2012-11-08 16:57:45 +01:00
drbd_proc.c drbd: introduce stop-sector to online verify 2012-11-09 14:05:32 +01:00
drbd_receiver.c drbd: close race between drbd_set_role and drbd_connect 2012-12-06 13:00:33 +01:00
drbd_req.c drbd: fix potential protocol error and resulting disconnect/reconnect 2013-01-21 22:58:36 +01:00
drbd_req.h drbd: fix potential protocol error and resulting disconnect/reconnect 2013-01-21 22:58:36 +01:00
drbd_state.c drbd: fix potential protocol error and resulting disconnect/reconnect 2013-01-21 22:58:36 +01:00
drbd_state.h drbd: Improved logging of state changes 2012-11-08 16:45:06 +01:00
drbd_strings.c drbd: Allow volumes to become primary only on one side 2012-11-04 00:16:31 +01:00
drbd_vli.h Fix common misspellings 2011-03-31 11:26:23 -03:00
drbd_worker.c drbd: Broadcast sync progress no more often than once per second 2012-11-09 14:11:43 +01:00
drbd_wrappers.h drbd: Split off netlink mandatory attribute handling into separate file 2012-11-08 16:57:45 +01:00
Kconfig drbd: update Kconfig to match current dependencies 2012-12-06 13:08:29 +01:00
Makefile drbd: Split off netlink mandatory attribute handling into separate file 2012-11-08 16:57:45 +01:00