linux_dsm_epyc7002/drivers/md
Eivind Sarto cf170f3fa4 raid5: avoid release list until last reference of the stripe
The (lockless) release_list reduces lock contention, but there is excessive
queueing and dequeuing of stripes on this list.  A stripe will currently be
queued on the release_list with a stripe reference count > 1.  This can cause
the raid5 kernel thread(s) to dequeue the stripe and decrement the refcount
without doing any other useful processing of the stripe.  There are two cases
when the stripe can be put on the release_list multiple times before it is
actually handled by the kernel thread(s).
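Some background on the lockless release_list. In raid5 it is built on the kernel's llist. llist_add() pushes an entry with a compare-and-swap loop and returns whether the list was empty beforehand, which tells the releasing context whether the raid5 thread needs a wakeup. The thread then detaches the whole list in one step. The sketch below is a userspace C11 analogue under those assumptions. struct lnode, struct lhead, linit(), lpush() and ldel_all() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical list node, standing in for the linkage in a stripe_head. */
struct lnode {
    struct lnode *next;
    int id;
};

/* Hypothetical lock-free LIFO head, loosely modelled on the kernel's llist. */
struct lhead {
    _Atomic(struct lnode *) first;
};

static void linit(struct lhead *h)
{
    atomic_init(&h->first, NULL);
}

/* Push n without taking a lock. Returns true if the list was empty before
 * the push, i.e. the consumer thread may be idle and should be woken. */
static bool lpush(struct lhead *h, struct lnode *n)
{
    assert(n != NULL);
    struct lnode *old = atomic_load(&h->first);
    do {
        n->next = old;  /* link to the current head; redone on each retry */
    } while (!atomic_compare_exchange_weak(&h->first, &old, n));
    return old == NULL;
}

/* Consumer side: detach every queued node with one atomic exchange. */
static struct lnode *ldel_all(struct lhead *h)
{
    return atomic_exchange(&h->first, NULL);
}
```

Every push is a CAS on a shared cache line, and each dequeued entry costs the consumer a wakeup and some processing. That is why queueing a stripe that still has other references outstanding is pure overhead.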
1) make_request() activates the stripe processing in 4k increments.  When a
   write request is large enough to span multiple chunks of a stripe_head, the
   first 4k chunk adds the stripe to the plug list.  The next 4k chunk that is
   processed for the same stripe puts the stripe on the release_list with a
   refcount=2.  This can cause the kernel thread to process and decrement the
   stripe before the stripe is unplugged, which again will put it back on the
   release_list.
2) Whenever IO is scheduled on a stripe (pre-read and/or write), the stripe
   refcount is set to the number of active IO (for each chunk).  The stripe is
   released as each IO completes, and can be queued and dequeued multiple times
   on the release_list until its refcount finally reaches zero.
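Case 2 can be illustrated with a toy single-threaded model in which all names are hypothetical. The model assumes the worst case, where the kernel thread drains the release_list between IO completions. It counts how often a stripe holding one reference per in-flight IO gets queued under the old behaviour and under a policy that queues only the last reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a stripe: one reference per in-flight IO. */
struct model_stripe {
    int count;     /* reference count */
    int enqueues;  /* times the stripe was placed on the release list */
};

/* Old behaviour: every put goes through the release list, and the consumer
 * thread performs the decrement after dequeuing (worst case: the list is
 * drained between completions, so each put is a separate enqueue). */
static void put_old(struct model_stripe *s)
{
    assert(s->count > 0);
    s->enqueues++;
    s->count--;
}

/* Patched behaviour: references are dropped directly while count > 1;
 * only the final reference is routed through the release list. */
static void put_new(struct model_stripe *s)
{
    assert(s->count > 0);
    if (s->count > 1) {
        s->count--;
        return;
    }
    s->enqueues++;
    s->count--;
}

/* Complete nr_io IOs on one stripe and report how many enqueues occurred. */
static int simulate(int nr_io, bool patched)
{
    struct model_stripe s = { .count = nr_io, .enqueues = 0 };
    for (int i = 0; i < nr_io; i++) {
        if (patched)
            put_new(&s);
        else
            put_old(&s);
    }
    assert(s.count == 0);
    return s.enqueues;
}
```

In this model a stripe with N outstanding IOs is queued N times under the old behaviour and once under the new one. Real interleavings fall somewhere in between, and the measured figures are the ones reported in the commit message.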

This simple patch will ensure a stripe is only queued on the release_list when
its refcount=1 and is ready to be handled by the kernel thread(s).  I added some
instrumentation to raid5 and counted the number of times stripes were queued on
the release_list for a variety of write IO sizes.  Without this patch the number
of times stripes got queued on the release_list was 100-500% higher than with
the patch.  The excess queuing will increase with the IO size.  The patch also
improved throughput by 5-10%.
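The refcount=1 rule amounts to "drop the reference directly unless it is the last one". The kernel provides atomic_add_unless(v, a, u) for this kind of test. It adds a to v unless v equals u, and returns non-zero if the add happened. The sketch below is a userspace C11 analogue. add_unless() and put_ref_needs_queue() are hypothetical helpers, not the code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace analogue of atomic_add_unless(): add a to *v unless *v == u.
 * Returns true if the add was performed. */
static bool add_unless(atomic_int *v, int a, int u)
{
    int cur = atomic_load(v);
    while (cur != u) {
        if (atomic_compare_exchange_weak(v, &cur, cur + a))
            return true;
        /* A failed CAS reloaded cur with the current value; retry. */
    }
    return false;
}

/* Hypothetical release path. Returns false if a non-final reference was
 * dropped directly, or true if the caller holds the last reference and
 * should hand the stripe to the release list (count is left at 1; in this
 * model the consumer thread performs the final decrement). */
static bool put_ref_needs_queue(atomic_int *count)
{
    assert(atomic_load(count) > 0);  /* debug sanity check only */
    if (add_unless(count, -1, 1))
        return false;
    return true;
}
```

Using a single conditional atomic avoids a separate read-then-decrement sequence, which could race with another context dropping its reference at the same moment.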

Signed-off-by: Eivind Sarto <esarto@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-05-29 16:59:46 +10:00
bcache bcache: remove nested function usage 2014-03-18 12:39:28 -07:00
persistent-data dm transaction manager: fix corruption due to non-atomic transaction commit 2014-03-27 16:56:23 -04:00
bitmap.c md/bitmap: don't abuse i_writecount for bitmap files. 2014-04-09 12:26:59 +10:00
bitmap.h kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly 2013-12-11 15:28:36 -08:00
dm-bio-prison.c dm: add cache target 2013-03-01 22:45:51 +00:00
dm-bio-prison.h dm: add cache target 2013-03-01 22:45:51 +00:00
dm-bio-record.h dm: Refactor for new bio cloning/splitting 2013-11-23 22:33:55 -08:00
dm-bufio.c Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
dm-bufio.h dm snapshot: use dm-bufio prefetch 2014-01-14 23:23:03 -05:00
dm-builtin.c dm sysfs: fix a module unload race 2014-01-14 23:23:04 -05:00
dm-cache-block-types.h dm cache: remove remainder of distinct discard block size 2014-03-27 16:56:23 -04:00
dm-cache-metadata.c dm cache: fix a lock-inversion 2014-04-04 14:53:05 -04:00
dm-cache-metadata.h dm cache: fix a lock-inversion 2014-04-04 14:53:05 -04:00
dm-cache-policy-cleaner.c dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy-internal.h dm cache: add remove_cblock method to policy interface 2013-11-11 11:37:50 -05:00
dm-cache-policy-mq.c dm cache mq: fix memory allocation failure for large cache devices 2014-02-28 12:18:29 -05:00
dm-cache-policy.c dm cache: add policy name to status output 2014-01-16 13:44:11 -05:00
dm-cache-policy.h dm cache: add policy name to status output 2014-01-16 13:44:11 -05:00
dm-cache-target.c dm cache: fix writethrough mode quiescing in cache_map 2014-05-01 16:14:24 -04:00
dm-crypt.c dm crypt: fix cpu hotplug crash by removing per-cpu structure 2014-05-14 16:11:35 -04:00
dm-delay.c Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
dm-era-target.c dm: take care to copy the space map roots before locking the superblock 2014-03-27 16:56:23 -04:00
dm-exception-store.c dm: replace simple_strtoul 2012-07-27 15:07:59 +01:00
dm-exception-store.h dm snapshot: test chunk size against both origin and snapshot 2010-08-12 04:13:51 +01:00
dm-flakey.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-io.c dm io: fix I/O to multiple destinations 2014-02-17 11:00:05 -05:00
dm-ioctl.c dm: allow remove to be deferred 2013-11-09 18:20:22 -05:00
dm-kcopyd.c dm: stop using WQ_NON_REENTRANT 2013-08-23 09:02:13 -04:00
dm-linear.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-log-userspace-base.c dm log userspace: allow mark requests to piggyback on flush requests 2014-01-21 23:46:27 -05:00
dm-log-userspace-transfer.c connector: add portid to unicast in addition to broadcasting 2014-02-07 15:40:17 -08:00
dm-log-userspace-transfer.h
dm-log.c dm: use memweight() 2012-07-30 17:25:16 -07:00
dm-mpath.c dm mpath: fix lock order inconsistency in multipath_ioctl 2014-05-14 16:12:17 -04:00
dm-mpath.h
dm-path-selector.c md: Add module.h to all files using it implicitly 2011-10-31 19:31:18 -04:00
dm-path-selector.h
dm-queue-length.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-raid1.c dm raid1: fix immutable biovec related BUG when retrying read bio 2014-02-18 10:48:57 -05:00
dm-raid.c MD: Remember the last sync operation that was performed 2013-06-26 12:38:24 +10:00
dm-region-hash.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-round-robin.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-service-time.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-snap-persistent.c dm snapshot: fix metadata corruption 2014-03-03 17:58:13 -05:00
dm-snap-transient.c md: Add in export.h for files using EXPORT_SYMBOL 2011-10-31 19:31:19 -04:00
dm-snap.c Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block 2014-01-30 11:19:05 -08:00
dm-stats.c dm stats: initialize read-only module parameter 2013-12-10 19:13:21 -05:00
dm-stats.h dm: add statistics support 2013-09-05 20:46:06 -04:00
dm-stripe.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-switch.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
dm-sysfs.c dm sysfs: fix a module unload race 2014-01-14 23:23:04 -05:00
dm-table.c dm table: add dm_table_run_md_queue_async 2014-03-27 16:56:24 -04:00
dm-target.c dm: allow error target to replace bio-based and request-based targets 2013-09-05 20:46:05 -04:00
dm-thin-metadata.c dm: take care to copy the space map roots before locking the superblock 2014-03-27 16:56:23 -04:00
dm-thin-metadata.h dm thin: ensure user takes action to validate data and metadata consistency 2014-03-05 15:25:35 -05:00
dm-thin.c dm thin: add timeout to stop out-of-data-space mode holding IO forever 2014-05-14 16:11:37 -04:00
dm-uevent.c md: Add in export.h for files using EXPORT_SYMBOL 2011-10-31 19:31:19 -04:00
dm-uevent.h
dm-verity.c dm verity: fix biovecs hash calculation regression 2014-04-15 12:19:24 -04:00
dm-zero.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm.c dm table: add dm_table_run_md_queue_async 2014-03-27 16:56:24 -04:00
dm.h dm table: add dm_table_run_md_queue_async 2014-03-27 16:56:24 -04:00
faulty.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
Kconfig dm: add era target 2014-03-27 16:56:23 -04:00
linear.c block: Introduce new bio_split() 2013-11-23 22:33:57 -08:00
linear.h md/linear: typedef removal: linear_conf_t -> struct linear_conf 2011-10-11 16:48:54 +11:00
Makefile dm: add era target 2014-03-27 16:56:23 -04:00
md.c md: md_clear_badblocks should return an error code on failure. 2014-05-29 16:59:46 +10:00
md.h md/bitmap: don't abuse i_writecount for bitmap files. 2014-04-09 12:26:59 +10:00
multipath.c block: Abstract out bvec iterator 2013-11-23 22:33:47 -08:00
multipath.h md/multipath: typedef removal: multipath_conf_t -> struct mpconf 2011-10-11 16:48:57 +11:00
raid0.c block: Introduce new bio_split() 2013-11-23 22:33:57 -08:00
raid0.h md: add proper merge_bvec handling to RAID0 and Linear. 2012-03-19 12:46:39 +11:00
raid1.c md/raid1: r1buf_pool_alloc: free allocate pages when subsequent allocation fails. 2014-04-09 14:42:23 +10:00
raid1.h raid1: Rewrite the implementation of iobarrier. 2013-11-19 15:19:18 +11:00
raid5.c raid5: avoid release list until last reference of the stripe 2014-05-29 16:59:46 +10:00
raid5.h md update for 3.13. 2013-11-20 13:05:25 -08:00
raid10.c md/raid10: call wait_barrier() for each request submitted. 2014-05-06 09:49:26 +10:00
raid10.h MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1) 2013-02-26 11:55:30 +11:00