linux_dsm_epyc7002/drivers/md
Mikulas Patocka b21555786f dm snapshot: rework COW throttling to fix deadlock
Commit 721b1d98fb ("dm snapshot: Fix excessive memory usage and
workqueue stalls") introduced a semaphore to limit the maximum number of
in-flight kcopyd (COW) jobs.

The implementation of this throttling mechanism is prone to a deadlock:

1. One or more threads write to the origin device causing COW, which is
   performed by kcopyd.

2. At some point some of these threads might reach the s->cow_count
   semaphore limit and block in down(&s->cow_count), holding a read lock
   on _origins_lock.

3. Someone tries to acquire a write lock on _origins_lock, e.g.,
   snapshot_ctr(), which blocks because the threads at step (2) already
   hold a read lock on it.

4. A COW operation completes and kcopyd runs dm-snapshot's completion
   callback, which ends up calling pending_complete().
   pending_complete() tries to resubmit any deferred origin bios. This
   requires acquiring a read lock on _origins_lock, which blocks.

   This happens because the read-write semaphore implementation gives
   priority to writers, meaning that as soon as a writer tries to enter
   the critical section, no readers will be allowed in, until all
   writers have completed their work.

   So, pending_complete() waits for the writer at step (3) to acquire
   and release the lock. This writer waits for the readers at step (2)
   to release the read lock and those readers wait for
   pending_complete() (the kcopyd thread) to signal the s->cow_count
   semaphore: DEADLOCK.

The above was thoroughly analyzed and documented by Nikos Tsironis as
part of his initial proposal for fixing this deadlock, see:
https://www.redhat.com/archives/dm-devel/2019-October/msg00001.html

Fix this deadlock by reworking COW throttling so that it waits without
holding any locks. Add a variable 'in_progress' that counts how many
kcopyd jobs are running. A function wait_for_in_progress() will sleep if
'in_progress' is over the limit. It drops _origins_lock in order to
avoid the deadlock.

Reported-by: Guruswamy Basavaiah <guru2018@gmail.com>
Reported-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Tested-by: Nikos Tsironis <ntsironis@arrikto.com>
Fixes: 721b1d98fb ("dm snapshot: Fix excessive memory usage and workqueue stalls")
Cc: stable@vger.kernel.org # v5.0+
Depends-on: 4a3f111a73a8c ("dm snapshot: introduce account_start_copy() and account_end_copy()")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-10-10 09:46:05 -04:00
bcache for-5.4/block-2019-09-16 2019-09-17 16:57:47 -07:00
persistent-data dm space map common: remove check for impossible sm_find_free() return value 2019-08-26 15:39:53 -04:00
dm-bio-prison-v1.c dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-bio-prison-v1.h
dm-bio-prison-v2.c dm: adjust structure members to improve alignment 2018-06-08 11:53:14 -04:00
dm-bio-prison-v2.h
dm-bio-record.h block: replace bi_bdev with a gendisk pointer and partitions index 2017-08-23 12:49:55 -06:00
dm-bufio.c dm bufio: introduce a global cache replacement 2019-09-13 17:00:21 -04:00
dm-builtin.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dm-cache-background-tracker.c dm cache background tracker: fix sparse warning 2018-04-30 15:40:40 -04:00
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c dm cache metadata: Fix loading discard bitset 2019-04-18 16:18:25 -04:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c dm: remove unnecessary unlikely() around WARN_ON_ONCE() 2018-10-16 14:34:59 -04:00
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c dm cache: add support for discard passdown to the origin device 2019-03-05 14:53:52 -05:00
dm-clone-metadata.c dm: add clone target 2019-09-12 09:32:31 -04:00
dm-clone-metadata.h dm: add clone target 2019-09-12 09:32:31 -04:00
dm-clone-target.c dm clone: Make __hash_find static 2019-10-08 14:04:54 -04:00
dm-core.h dm: disable DISCARD if the underlying storage no longer supports it 2019-04-04 15:33:59 -04:00
dm-crypt.c dm crypt: omit parsing of the encapsulated cipher 2019-09-03 16:46:16 -04:00
dm-delay.c dm delay: fix a crash when invalid device is specified 2019-04-26 11:29:32 -04:00
dm-dust.c dm dust: use dust block size for badblocklist index 2019-08-21 11:27:17 -04:00
dm-era-target.c treewide: Add SPDX license identifier for more missed files 2019-05-21 10:50:45 +02:00
dm-exception-store.c
dm-exception-store.h - Improve DM snapshot target's scalability by using finer grained 2019-05-16 15:55:48 -07:00
dm-flakey.c block: Kill gfp_t argument of blkdev_report_zones() 2019-07-11 20:04:37 -06:00
dm-init.c docs: device-mapper: move it to the admin-guide 2019-07-15 11:03:01 -03:00
dm-integrity.c block: centralize PI remapping logic to the block layer 2019-09-17 20:03:49 -06:00
dm-io.c dm: Use kzalloc for all structs with embedded biosets/mempools 2018-06-05 08:47:43 -06:00
dm-ioctl.c dm: introduce DM_GET_TARGET_VERSION 2019-09-16 10:18:01 -04:00
dm-kcopyd.c dm kcopyd: always complete failed jobs 2019-08-15 15:57:39 -04:00
dm-linear.c block: Kill gfp_t argument of blkdev_report_zones() 2019-07-11 20:04:37 -06:00
dm-log-userspace-base.c dm: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c dm log writes: fix incorrect comment about the logged sequence example 2019-07-09 14:13:33 -04:00
dm-log.c
dm-mpath.c dm mpath: always free attached_handler_name in parse_path() 2019-04-30 16:51:30 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-raid1.c dm raid1: use struct_size() with kzalloc() 2019-08-26 11:05:32 -04:00
dm-raid.c dm raid: fix updating of max_discard_sectors limit 2019-09-11 16:18:23 -04:00
dm-region-hash.c - Error path bug fix for overflow tests (Dan) 2018-06-12 18:28:00 -07:00
dm-round-robin.c
dm-rq.c block: Delay default elevator initialization 2019-09-05 19:52:34 -06:00
dm-rq.h dm: remove unused _rq_tio_cache and _rq_cache 2019-03-05 14:48:50 -05:00
dm-service-time.c dm mpath selector: more evenly distribute ties 2018-01-29 13:44:58 -05:00
dm-snap-persistent.c dm bufio: move dm-bufio.h to include/linux/ 2018-04-03 15:04:23 -04:00
dm-snap-transient.c
dm-snap.c dm snapshot: rework COW throttling to fix deadlock 2019-10-10 09:46:05 -04:00
dm-stats.c dm stats: use struct_size() helper 2019-09-04 09:39:22 -04:00
dm-stats.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
dm-stripe.c dax: Introduce a ->copy_to_iter dax operation 2018-05-22 23:18:31 -07:00
dm-switch.c dm switch: use struct_size() in kzalloc() 2019-03-05 14:48:51 -05:00
dm-sysfs.c dm: remove legacy request-based IO path 2018-10-11 11:36:09 -04:00
dm-table.c dm: make dm_table_find_target return NULL 2019-08-23 10:13:12 -04:00
dm-target.c dm mpath: fix missing call of path selector type->end_io 2019-04-25 15:38:52 -04:00
dm-thin-metadata.c dm thin metadata: check if in fail_io mode when setting needs_check 2019-07-02 15:50:08 -04:00
dm-thin-metadata.h dm thin: fix passdown_double_checking_shared_status() 2019-01-15 16:10:41 -05:00
dm-thin.c dm thin: add sanity checks to thin-pool and external snapshot creation 2019-03-05 14:53:49 -05:00
dm-uevent.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
dm-uevent.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
dm-unstripe.c dm: Check for device sector overflow if CONFIG_LBDAF is not set 2018-12-18 09:02:26 -05:00
dm-verity-fec.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
dm-verity-fec.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
dm-verity-target.c dm verity: add root hash pkcs#7 signature verification 2019-08-23 10:13:14 -04:00
dm-verity-verify-sig.c dm verity: add root hash pkcs#7 signature verification 2019-08-23 10:13:14 -04:00
dm-verity-verify-sig.h dm verity: add root hash pkcs#7 signature verification 2019-08-23 10:13:14 -04:00
dm-verity.h dm verity: add root hash pkcs#7 signature verification 2019-08-23 10:13:14 -04:00
dm-writecache.c dm writecache: skip writecache_wait for pmem mode 2019-09-05 13:22:05 -04:00
dm-zero.c
dm-zoned-metadata.c dm zoned: fix potential NULL dereference in dmz_do_reclaim() 2019-08-21 11:29:30 -04:00
dm-zoned-reclaim.c dm zoned: fix a few typos 2019-08-15 15:57:43 -04:00
dm-zoned-target.c dm zoned: fix invalid memory access 2019-08-26 10:33:58 -04:00
dm-zoned.h dm zoned: add SPDX license identifiers 2019-08-15 15:57:42 -04:00
dm.c dm: make dm_table_find_target return NULL 2019-08-23 10:13:12 -04:00
dm.h dm: make dm_table_find_target return NULL 2019-08-23 10:13:12 -04:00
Kconfig dm: add clone target 2019-09-12 09:32:31 -04:00
Makefile dm: add clone target 2019-09-12 09:32:31 -04:00
md-bitmap.c md-bitmap: create and destroy wb_info_pool with the change of bitmap 2019-06-20 16:36:00 -07:00
md-bitmap.h md: Avoid namespace collision with bitmap API 2018-08-01 15:49:39 -07:00
md-cluster.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 45 2019-05-24 17:27:12 +02:00
md-cluster.h md-cluster: introduce resync_info_get interface for sanity check 2018-10-18 09:36:35 -07:00
md-faulty.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 47 2019-05-24 17:27:13 +02:00
md-linear.c md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone 2019-09-03 14:49:28 -07:00
md-linear.h Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md 2017-11-14 16:07:26 -08:00
md-multipath.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 47 2019-05-24 17:27:13 +02:00
md-multipath.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
md.c md: add feature flag MD_FEATURE_RAID0_LAYOUT 2019-09-13 13:10:06 -07:00
md.h md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone 2019-09-03 14:49:28 -07:00
raid0.c md: add feature flag MD_FEATURE_RAID0_LAYOUT 2019-09-13 13:10:06 -07:00
raid0.h md/raid0: avoid RAID0 data corruption due to layout confusion. 2019-09-13 13:10:05 -07:00
raid1-10.c md: raid1-10: Unify r{1,10}bio_pool_free 2019-06-15 01:37:35 -06:00
raid1.c md/raid1: fail run raid1 array when active disk less than one 2019-09-03 14:52:03 -07:00
raid1.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00
raid5-cache.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 2019-06-05 17:36:37 +02:00
raid5-log.h raid5: set write hint for PPL 2019-03-12 10:15:18 -07:00
raid5-ppl.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 2019-06-05 17:36:37 +02:00
raid5.c raid5: remove STRIPE_OPS_REQ_PENDING 2019-09-13 13:14:39 -07:00
raid5.h raid5: use bio_end_sector in r5_next_bio 2019-09-13 13:14:43 -07:00
raid10.c md: allow last device to be forcibly removed from RAID1/RAID10. 2019-08-07 10:25:02 -07:00
raid10.h md: convert to bioset_init()/mempool_init() 2018-05-30 15:33:32 -06:00