Commit Graph

220 Commits

Author SHA1 Message Date
Lars Ellenberg
e5f891b223 drbd: gather detailed timing statistics for drbd_requests
Record (in jiffies) how much time a request spends in which stages.
Followup commits will use and present this additional timing information
so we can better locate and tackle the root causes of latency spikes,
or present the backlog for asynchronous replication.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:35:11 +02:00
Lars Ellenberg
e37d2438d8 drbd: track meta data IO intent, start and submit time
For diagnostic purposes, track intent, start time
and latest submit time of meta data IO.

Move separate members from struct drbd_device
into the embeded struct drbd_md_io.
s/md_io_(page|in_use)/md_io.\1/

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:35:10 +02:00
Lars Ellenberg
b9ed7080d7 drbd: consistently use list_add_tail for peer_request tracking
Keep the epoch entry lists (active_ee, read_ee, sync_ee, ...)
consistently "oldest first".  That way finding the oldest not yet
successfully processed request is simply list_first_entry_or_null.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:35:08 +02:00
Lars Ellenberg
15e26f6a3c drbd: add drbd_queue_work_if_unqueued helper
We sometimes do
    if (list_empty(&w.list))
	drbd_queue_work(&q, &w.list);

Removal (list_del_init) may happen outside all locks, after all
pending work entries have been moved to an on-stack local work list.

For not dynamically allocated, but embeded, work structs,
we must avoid to re-add until it really was removed.

Move that list_empty check inside the spin_lock(&q->q_lock)
within the helper function, and change to list_empty_careful().

This may have been the reason for a list_add corruption
inside drbd_queue_work().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:35:07 +02:00
Lars Ellenberg
7f34f61490 drbd: drbd_rs_number_requests: fix unit mismatch in comparison
We try to limit the number of "in-flight" resync requests.
One condition for that is the amount of requested data should not exceed
half of what can be covered by our "max-buffers" setting.

However we compared number of 4k pages with number of in-flight 512 Byte
sectors, and this extra throttle triggered much earlier than intended.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:35:06 +02:00
Lars Ellenberg
506afb6248 drbd: improve resync request throttling due to sendbuf size
If we throttle resync because the socket sendbuffer is filling up,
tell TCP about it, so it may expand the sendbuffer for us.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:35:04 +02:00
Lars Ellenberg
aaaba34576 drbd: implement csums-after-crash-only
Checksum based resync trades CPU cycles for network bandwidth,
in situations where we expect much of the to-be-resynced blocks
to be actually identical on both sides already.

In a "network hickup" scenario, it won't help:
all to-be-resynced blocks will typically be different.

The use case is for the resync of *potentially* different blocks
after crash recovery -- the crash recovery had marked larger areas
(those covered by the activity log) as need-to-be-resynced,
just in case. Most of those blocks will be identical.

This option makes it possible to configure checksum based resync,
but only actually use it for the first resync after primary crash.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:35:00 +02:00
Lars Ellenberg
4dd726f029 drbd: get rid of drbd_queue_work_front
The last user was al_write_transaction, if called with "delegate",
and the last user to call it with "delegate = true" was the receiver
thread, which has no need to delegate, but can call it himself.

Finally drop the delegate parameter, drop the extra
w_al_write_transaction callback, and drop drbd_queue_work_front.

Do not (yet) change dequeue_work_item to dequeue_work_batch, though.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:34:56 +02:00
Lars Ellenberg
ac0acb9e39 drbd: use drbd_device_post_work() in more places
This replaces the md_sync_work member of struct drbd_device
by a new MD_SYNC "work bit" in device->flags.

This replaces the resync_start_work member of struct drbd_device
by a new RS_START "work bit" in device->flags.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:34:55 +02:00
Lars Ellenberg
e334f55095 drbd: make sure disk cleanup happens in worker context
The recent fix to put_ldev() (correct ordering of access to local_cnt
and state.disk; memory barrier in __drbd_set_state) guarantees
that the cleanup happens exactly once.

However it does not yet guarantee that the cleanup happens from worker
context, the last put_ldev() may still happen from atomic context,
which must not happen: blkdev_put() may sleep.

Fix this by scheduling the cleanup to the worker instead,
using a couple more bits in device->flags and a new helper,
drbd_device_post_work().

Generalized the "resync progress" work to cover these new work bits.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:34:55 +02:00
Lars Ellenberg
5ab7d2c005 drbd: fix resync finished detection
This fixes one recent regresion,
and one long existing bug.

The bug:
drbd_try_clear_on_disk_bm() assumed that all "count" bits have to be
accounted in the resync extent corresponding to the start sector.

Since we allow application requests to cross our "extent" boundaries,
this assumption is no longer true, resulting in possible misaccounting,
scary messages
("BAD! sector=12345s enr=6 rs_left=-7 rs_failed=0 count=58 cstate=..."),
and potentially, if the last bit to be cleared during resync would
reside in previously misaccounted resync extent, the resync would never
be recognized as finished, but would be "stalled" forever, even though
all blocks are in sync again and all bits have been cleared...

The regression was introduced by
    drbd: get rid of atomic update on disk bitmap works

For an "empty" resync (rs_total == 0), we must not "finish" the
resync on the SyncSource before the SyncTarget knows all relevant
information (sync uuid).  We need to wait for the full round-trip,
the SyncTarget will then explicitly notify us.

Also for normal, non-empty resyncs (rs_total > 0), the resync-finished
condition needs to be tested before the schedule() in wait_for_work, or
it is likely to be missed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:34:50 +02:00
Lars Ellenberg
a80ca1ae81 drbd: fix a race stopping the worker thread
We may implicitly call drbd_send() from inside wait_for_work(),
via maybe_send_barrier().

If the "stop" signal was send just before that, drbd_send() would call
flush_signals(), and we would run an unbounded schedule() afterwards.

Fix: check for thread_state == RUNNING before we schedule()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:34:50 +02:00
Lars Ellenberg
c7a58db4e9 drbd: get rid of atomic update on disk bitmap works
Just trigger the occasional lazy bitmap write-out during resync
from the central wait_for_work() helper.

Previously, during resync, bitmap pages would be written out separately,
synchronously, one at a time, at least 8 times each (every 512 bytes
worth of bitmap cleared).

Now we trigger "merge friendly" bulk write out of all cleared pages
every two seconds during resync, and once the resync is finished.
Most pages will be written out only once.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 18:34:49 +02:00
Lars Ellenberg
44a4d55184 drbd: refactor use of first_peer_device()
Reduce the number of calls to first_peer_device(). Instead, call
first_peer_device() just once to assign a local variable peer_device.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2014-07-10 15:22:22 +02:00
Lars Ellenberg
f9c78128f8 drbd: always implicitly close last epoch when idle
Once our sender thread needs to wait_for_work(),
and actually needs to schedule(), just before we do that,
we already check if it is useful to implicitly close the last epoch.

The condition was too strict: only implicitly close the epoch,
if there have been no new (write) requests at all.

The assumption was that if there were new requests, they would
always be communicated one way or another, and would send necessary
epoch separating barriers explicitly.

This is not always true, e.g. when becoming diskless,
or while explicitly starting a full resync.

The last communicated epoch could stay open for a long time,
locking down corresponding activity log extents.

It is safe to always implicitly send that last barrier, as soon as we
determin that there cannot be more requests in the last communicated
epoch, even if there have been (uncommunicated) new requests in new
epochs meanwhile.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
074f4afeb2 drbd: fix a race between start_resync and send_and_submit
In the drbd make request function, specifically in
drbd_send_and_submit(), we decide whether we want to send the actual
write request, or only a "set this block out of sync" information.

We do so based on the current connection state, while holding the req_lock.
The connection state is not supposed to change while holding the req_lock.

But in drbd_start_resync, we did change that state anyways,
while only holding the global_state_lock, which is enough to change
sync-after dependencies (paused vs active resync), but
not good enough to change the connection state.

Fix: in drbd_start_resync, first grab the req_lock to serialize with
drbd_send_and_submit(), before grabbing the global_state_lock
to be able to evaluate the sync-after dependencies.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
2f632aeb53 drbd: prepare sending side for REQ_DISCARD
Note that I do NOT call __drbd_chk_io_error for failed REQ_DISCARD.
That may be wrong, though, or needs to differ between EOPNOTSUPP and
other errors...

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
a0fb3c47a1 drbd: prepare receiving side for REQ_DISCARD
If the receiver needs to serve a discard request on a queue that does
not announce to be discard cabable, it falls back to do synchronous
blkdev_issue_zeroout().

We expect only "reasonably" large (up to one activity log extent?)
discard requests.

We do this to not to not block the receiver for too long in this
fallback code path, and to not set/clear too many bits inside one
spinlock_irq_save() in drbd_set_in_sync/drbd_set_out_of_sync,

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:55 -06:00
Lars Ellenberg
0e49d7b014 drbd: fix potential distributed deadlock during verify or resync
If max-buffers and socket buffer sizes are "too small" for the chosen
resync rate, this could lead potentially lead to a distributed deadlock,
which may or may not resolve itself via the "ko-count" and request
timeout mechanism, or could be resolved by forced disconnect.

One option to deal with this is proper configuration:
use larger max-buffer and socket buffers settings,
or reduce the resync rate.

But even with bad configuration we should not deadlock,
but "gracefully" recover.

The issue is avoided by using only up to max-buffers/2 for resync
requests, and by using max-buffers not as a hard limit for data buffer
allocations, but as a throttle threshold only.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Lars Ellenberg
6377b92350 drbd: resync: fix too large bursts for very slow rates
While merging adjacent dirty blocks into resync requests,
the resync rate throttle was disregarded.
For very low resync rates, the effective rate may have exceeded
the intended rate by a larger margin.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2014-04-30 13:46:54 -06:00
Andreas Gruenbacher
6780139c0a drbd: Use the right peer device
in w_e_ (peer request) callbacks and in peer request I/O completion handlers

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:50:47 +01:00
Andreas Gruenbacher
6db7e50a8a drbd: In the worker thread, process drbd_work instead of drbd_device_work items
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:50:44 +01:00
Andreas Gruenbacher
d448a2e1e3 drbd: Turn w_make_ov_request and make_resync_request into "normal" functions
These functions are not used as drbd_work callbacks.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:50:43 +01:00
Andreas Gruenbacher
4d010392f4 drbd: Make w_make_resync_request() static
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:50:42 +01:00
Andreas Gruenbacher
a8cd15ba79 drbd: struct drbd_peer_request: Use drbd_work instead of drbd_device_work
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:50:41 +01:00
Andreas Gruenbacher
84b8c06b65 drbd: Create a dedicated struct drbd_device_work
drbd_device_work is a work item that has a reference to a device,
while drbd_work is a more generic work item that does not carry
a reference to a device.

All callbacks get a pointer to a drbd_work instance, those callbacks
that expect a drbd_device_work use the container_of macro to get it.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:50:39 +01:00
Andreas Gruenbacher
8682eae9b4 drbd: Rename w_prev_work_done -> w_complete
Also move it to drbd_receiver.c and make it static.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:50:38 +01:00
Andreas Gruenbacher
69a227731a drbd: Pass a peer device to a number of fuctions
These functions actually operate on a peer device, or
need a peer device.

drbd_prepare_command(), drbd_send_command(), drbd_send_sync_param()
drbd_send_uuids(), drbd_gen_and_send_sync_uuid(), drbd_send_sizes()
drbd_send_state(), drbd_send_current_state(), and drbd_send_state_req()
drbd_send_sr_reply(), drbd_send_ack(), drbd_send_drequest(),
drbd_send_drequest_csum(), drbd_send_ov_request(), drbd_send_dblock()
drbd_send_block(), drbd_send_out_of_sync(), recv_dless_read()
drbd_drain_block(), receive_bitmap_plain(), recv_resync_read()
read_in_block(), read_for_csum(), drbd_alloc_pages(), drbd_alloc_peer_req()
need_peer_seq(), update_peer_seq(), wait_for_and_update_peer_seq()
drbd_sync_handshake(), drbd_asb_recover_{0,1,2}p(), drbd_connected()
drbd_disconnected(), decode_bitmap_c() and recv_bm_rle_bits()

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:50:31 +01:00
Andreas Gruenbacher
79a3c8d38c drbd: drbd_csum_bio(), drbd_csum_ee(): Remove unused device argument
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:49:36 +01:00
Andreas Gruenbacher
0500813fe0 drbd: Move conf_mutex from connection to resource
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:46:46 +01:00
Andreas Gruenbacher
0b0ba1efc7 drbd: Add explicit device parameter to D_ASSERT
The implicit dependency on a variable inside the macro is problematic.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:45:04 +01:00
Andreas Gruenbacher
1ec861ebd0 drbd: Replace and remove the obsolete conn_() macros
With the polymorphic drbd_() macros, we no longer need the connection
specific variants.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:45:03 +01:00
Andreas Gruenbacher
d018017102 drbd: Remove the terrible DEV hack
DRBD was using dev_err() and similar all over the code; instead of having to
write dev_err(disk_to_dev(device->vdisk), ...) to convert a drbd_device into a
kernel device, a DEV macro was used which implicitly references the device
variable.  This is terrible; introduce separate drbd_err() and similar macros
with an explicit device parameter instead.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:45:01 +01:00
Andreas Gruenbacher
c06ece6ba6 drbd: Turn connection->volumes into connection->peer_devices
Let connection->peer_devices point to peer devices; connection->volumes was
pointing to devices.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:45:00 +01:00
Andreas Gruenbacher
05a10ec790 drbd: Improve some function and variable naming
Rename functions
conn_destroy() -> drbd_destroy_connection(),
drbd_minor_destroy() -> drbd_destroy_device()
drbd_adm_add_minor() -> drbd_adm_add_minor()
drbd_adm_delete_minor() -> drbd_adm_del_minor()

Rename global variable minors to drbd_devices

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:44:52 +01:00
Andreas Gruenbacher
a6b32bc3ce drbd: Introduce "peer_device" object between "device" and "connection"
In a setup where a device (aka volume) can replicate to multiple peers and one
connection can be shared between multiple devices, we need separate objects to
represent devices on peer nodes and network connections.

As a first step to introduce multiple connections per device, give each
drbd_device object a single drbd_peer_device object which connects it to a
drbd_connection object.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:44:51 +01:00
Andreas Gruenbacher
bde89a9e15 drbd: Rename drbd_tconn -> drbd_connection
sed -i -e 's:all_tconn:connections:g' -e 's:tconn:connection:g'

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:44:47 +01:00
Andreas Gruenbacher
b30ab7913b drbd: Rename "mdev" to "device"
sed -i -e 's:mdev:device:g'

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:42:24 +01:00
Andreas Gruenbacher
5476169793 drbd: Rename struct drbd_conf -> struct drbd_device
sed -i -e 's:\<drbd_conf\>:drbd_device:g'

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:36:44 +01:00
Andreas Gruenbacher
a3603a6e3b drbd: Split off on-the-wire protocol definitions
Keep the protocol definitions separate from the kernel code; they are useful in
their own right.

Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:27:49 +01:00
Rashika Kheria
a186e47856 drivers: block: Mark the function as static in drbd_worker.c
Mark functions drbd_endio_read_sec_final(), drbd_send_barrier(),
need_to_send_barrier(), dequeue_work_batch(), dequeue_work_item() and
wait_for_work() as static in drbd/drbd_worker.c because they are not
used outside this file.

This eliminates the following warnings in drbd/drbd_worker.c:
drivers/block/drbd/drbd_worker.c:99:6: warning: no previous prototype for ‘drbd_endio_read_sec_final’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_worker.c:1276:5: warning: no previous prototype for ‘drbd_send_barrier’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_worker.c:1774:6: warning: no previous prototype for ‘need_to_send_barrier’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_worker.c:1798:6: warning: no previous prototype for ‘dequeue_work_batch’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_worker.c:1806:6: warning: no previous prototype for ‘dequeue_work_item’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_worker.c:1815:6: warning: no previous prototype for ‘wait_for_work’ [-Wmissing-prototypes]

Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
2014-02-17 16:19:39 +01:00
Kent Overstreet
7988613b0e block: Convert bio_for_each_segment() to bvec_iter
More prep work for immutable biovecs - with immutable bvecs drivers
won't be able to use the biovec directly, they'll need to use helpers
that take into account bio->bi_iter.bi_bvec_done.

This updates callers for the new usage without changing the
implementation yet.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Paul Clements <Paul.Clements@steeleye.com>
Cc: Jim Paris <jim@jtan.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Nagalakshmi Nandigama <Nagalakshmi.Nandigama@lsi.com>
Cc: Sreekanth Reddy <Sreekanth.Reddy@lsi.com>
Cc: support@lsi.com
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Quoc-Son Anh <quoc-sonx.anh@intel.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-m68k@lists.linux-m68k.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: drbd-user@lists.linbit.com
Cc: nbd-general@lists.sourceforge.net
Cc: cbe-oss-dev@lists.ozlabs.org
Cc: xen-devel@lists.xensource.com
Cc: virtualization@lists.linux-foundation.org
Cc: linux-raid@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: DL-MPTFusionLinux@lsi.com
Cc: linux-scsi@vger.kernel.org
Cc: devel@driverdev.osuosl.org
Cc: linux-fsdevel@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: linux-mm@kvack.org
Acked-by: Geoff Levand <geoff@infradead.org>
2013-11-23 22:33:49 -08:00
Lars Ellenberg
a3f8f7dc7a drbd: validate resync_after dependency on attach already
We validated resync_after dependencies, if changed via disk-options.
But we did not validate them when first created via attach.
We also did not check or cleanup dependencies that used to be correct,
but now point to meanwhile removed minor devices.

If the drbd_resync_after_valid() validation in disk-options tried to
follow a dependency chain in this way, this could lead to NULL pointer
dereference.

Validate resync_after settings in drbd_adm_attach() already, as well as
in drbd_adm_disk_opts(), and and only reject dependency loops.
Depending on non-existing disks is allowed and equivalent to no dependency.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:25 -06:00
Philipp Reisner
a700471bf3 drbd: abort start of resync early, if it raced with connection breakage
We've seen a spurious full resync, because a connection breakage
raced with drbd_start_resync(, C_SYNC_TARGET),
and the resulting state change request intended to start the resync
ended up looking like a local invalidate.

Fix:
Double check the state inside the lock,
and don't even request that state change,
if we had connection or IO problems.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-28 10:10:24 -06:00
Lars Ellenberg
56392d2f40 drbd: Clarify when activity log I/O is delegated to the worker thread
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22 18:14:00 -06:00
Lars Ellenberg
c04ccaa669 drbd: read meta data early, base on-disk offsets on super block
We used to calculate all on-disk meta data offsets, and then compare
the stored offsets, basically treating them as magic numbers.

Now with the activity log striping, the activity log size is no longer
fixed.  We need to first read the super block, then base the activity
log and bitmap offsets on the stored offsets/al stripe settings.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-03-22 18:13:59 -06:00
Philipp Reisner
328e0f125b drbd: Broadcast sync progress no more often than once per second
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09 14:11:43 +01:00
Lars Ellenberg
edc9f5eb7a drbd: always write bitmap on detach
If we detach due to local read-error (which sets a bit in the bitmap),
stay Primary, and then re-attach (which re-reads the bitmap from disk),
we potentially lost the "out-of-sync" (or, "bad block") information in
the bitmap.

Always (try to) write out the changed bitmap pages before going diskless.

That way, we don't lose the bit for the bad block,
the next resync will fetch it from the peer, and rewrite
it locally, which may result in block reallocation in some
lower layer (or the hardware), and thereby "heal" the bad blocks.

If the bitmap writeout errors out as well, we will (again: try to)
mark the "we need a full sync" bit in our super block,
if it was a READ error; writes are covered by the activity log already.

If that superblock does not make it to disk either, we are sorry.

Maybe we just lost an entire disk or controller (or iSCSI connection),
and there actually are no bad blocks at all, so we don't need to
re-fetch from the peer, there is no "auto-healing" necessary.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09 14:11:41 +01:00
Lars Ellenberg
8747d30af9 drbd: a few more GFP_KERNEL -> GFP_NOIO
This has not yet been observed, but conceivably, when using GFP_KERNEL
allocations from drbd_md_sync(), drbd_flush_after_epoch() or
receive_SyncParam(), we could trigger additional IO to our own device,
or an other device in a criss-cross setup, and end up in a local
deadlock, or potentially a distributed deadlock in a criss-cross setup
involving the peer blocked in a similar way waiting for us to make
progress.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09 14:11:40 +01:00
Lars Ellenberg
a506c13a4d drbd: use list_move_tail instead of list_del/list_add_tail
Using list_move_tail() instead of list_del() + list_add_tail().

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
2012-11-09 14:11:39 +01:00