linux_dsm_epyc7002/drivers/md
Robert Becker 1e50915fe0 raid: improve MD/raid10 handling of correctable read errors.
We've noticed severe lasting performance degradation of our raid
arrays when we have drives that yield large amounts of media errors.
The raid10 module will queue each failed read for retry, and also
will attempt call fix_read_error() to perform the read recovery.
Read recovery is performed while the array is frozen, so repeated
recovery attempts can degrade the performance of the array for
extended periods of time.

With this patch I propose adding a per md device max number of
corrected read attempts.  Each rdev will maintain a count of
read correction attempts in the rdev->read_errors field (not
used currently for raid10). When we enter fix_read_error()
we'll check to see when the last read error occurred, and
divide the read error count by 2 for every hour since the
last read error. If at that point our read error count
exceeds the read error threshold, we'll fail the raid device.

In addition in this patch I add sysfs nodes (get/set) for
the per md max_read_errors attribute, the rdev->read_errors
attribute, and added some printk's to indicate when
fix_read_error fails to repair an rdev.

For testing I used debugfs->fail_make_request to inject
IO errors to the rdev while doing IO to the raid array.

Signed-off-by: Robert Becker <Rob.Becker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-12-14 12:51:41 +11:00
..
raid6test md: drivers/md/unroll.pl replaced with awk analog 2009-10-16 16:25:19 +11:00
.gitignore gitignore: misc files 2006-01-01 22:21:50 +01:00
bitmap.c md/bitmap: update dirty flag when bitmap bits are explicitly set. 2009-12-14 12:51:41 +11:00
bitmap.h md: Support write-intent bitmaps with externally managed metadata. 2009-12-14 12:51:41 +11:00
dm-bio-record.h dm: preserve bi_io_vec when resubmitting bios 2009-04-02 19:55:23 +01:00
dm-crypt.c tree-wide: fix a very frequent spelling mistake 2009-11-09 09:40:54 +01:00
dm-delay.c dm table: pass correct dev area size to device_area_is_valid 2009-07-23 20:30:42 +01:00
dm-exception-store.c dm snapshot: allow chunk size to be less than page size 2009-10-16 23:18:22 +01:00
dm-exception-store.h dm snapshot: use unsigned integer chunk size 2009-10-16 23:18:17 +01:00
dm-io.c dm io: retry after barrier error 2009-06-22 10:12:26 +01:00
dm-ioctl.c Driver-Core: extend devnode callbacks to provide permissions 2009-09-19 12:50:38 -07:00
dm-kcopyd.c dm kcopyd: fix callback race 2009-04-09 00:27:17 +01:00
dm-linear.c dm table: pass correct dev area size to device_area_is_valid 2009-07-23 20:30:42 +01:00
dm-log-userspace-base.c dm log: userspace fix incorrect luid cast in userspace_ctr 2009-10-16 23:18:15 +01:00
dm-log-userspace-transfer.c dm/connector: Only process connector packages from privileged processes 2009-10-02 10:54:10 -07:00
dm-log-userspace-transfer.h dm log: userspace add luid to distinguish between concurrent log instances 2009-09-04 20:40:34 +01:00
dm-log.c dm log: fix create_log_context to use logical_block_size of log device 2009-06-22 10:12:33 +01:00
dm-mpath.c [SCSI] scsi_dh: Change the scsidh_activate interface to be asynchronous 2009-12-04 12:00:46 -06:00
dm-mpath.h dm mpath: remove is_active from struct dm_path 2008-10-10 13:36:58 +01:00
dm-path-selector.c dm: path selector use module refcount directly 2009-04-02 19:55:27 +01:00
dm-path-selector.h dm mpath: add start_io and nr_bytes to path selectors 2009-06-22 10:12:27 +01:00
dm-queue-length.c dm mpath: add queue length load balancer 2009-06-22 10:12:27 +01:00
dm-raid1.c bio: first step in sanitizing the bio->bi_rw flag testing 2009-09-11 14:33:31 +02:00
dm-region-hash.c dm raid1: keep retrying alloc if mempool_alloc failed 2009-06-22 10:12:13 +01:00
dm-round-robin.c dm mpath: add start_io and nr_bytes to path selectors 2009-06-22 10:12:27 +01:00
dm-service-time.c dm mpath: add service time load balancer 2009-06-22 10:12:28 +01:00
dm-snap-persistent.c dm snapshot: use unsigned integer chunk size 2009-10-16 23:18:17 +01:00
dm-snap-transient.c dm snapshot: move status to exception store 2009-04-02 19:55:35 +01:00
dm-snap.c dm snapshot: use unsigned integer chunk size 2009-10-16 23:18:17 +01:00
dm-stripe.c block: Optimal I/O limit wrapper 2009-09-14 08:24:52 +02:00
dm-sysfs.c dm: sysfs add suspended attribute 2009-06-22 10:12:29 +01:00
dm-table.c dm stripe: expose correct io hints 2009-09-04 20:40:25 +01:00
dm-target.c dm target: remove struct tt_internal 2009-04-02 19:55:28 +01:00
dm-uevent.c md: replace remaining __FUNCTION__ occurrences 2008-04-28 08:58:42 -07:00
dm-uevent.h dm: uevent generate events 2007-10-20 02:01:26 +01:00
dm-zero.c dm: consolidate target deregistration error handling 2009-01-06 03:04:58 +00:00
dm.c dm: dec_pending needs locking to save error value 2009-10-16 23:18:15 +01:00
dm.h dm: remove queue next_ordered workaround for barriers 2009-07-23 20:30:40 +01:00
faulty.c md: Move check for bitmap presence to personality code. 2009-06-18 08:49:23 +10:00
Kconfig Merge branch 'dmaengine' into async-tx-next 2009-09-08 17:55:21 -07:00
linear.c md: support barrier requests on all personalities. 2009-12-14 12:49:49 +11:00
linear.h md/linear: use call_rcu to free obsolete 'conf' structures. 2009-06-18 08:49:42 +10:00
Makefile md: drivers/md/unroll.pl replaced with awk analog 2009-10-16 16:25:19 +11:00
md.c raid: improve MD/raid10 handling of correctable read errors. 2009-12-14 12:51:41 +11:00
md.h raid: improve MD/raid10 handling of correctable read errors. 2009-12-14 12:51:41 +11:00
mktables.c md/raid6: move raid6 data processing to raid6_pq.ko 2009-03-31 15:09:39 +11:00
multipath.c md: support barrier requests on all personalities. 2009-12-14 12:49:49 +11:00
multipath.h md: remove mddev_to_conf "helper" macro 2009-06-16 16:54:21 +10:00
raid0.c md: support barrier requests on all personalities. 2009-12-14 12:49:49 +11:00
raid0.h md: remove mddev_to_conf "helper" macro 2009-06-16 16:54:21 +10:00
raid1.c md: move offset, daemon_sleep and chunksize out of bitmap structure 2009-12-14 12:51:41 +11:00
raid1.h md/raid1: add takeover support for raid5->raid1 2009-12-14 12:51:41 +11:00
raid5.c md/raid5: don't complete make_request on barrier until writes are scheduled 2009-12-14 12:51:40 +11:00
raid5.h md: fix problems with RAID6 calculations for DDF. 2009-10-16 16:27:34 +11:00
raid6algos.c md: remove sparse warning:symbol XXX was not declared. 2009-12-14 12:49:47 +11:00
raid6altivec.uc md: drivers/md/unroll.pl replaced with awk analog 2009-10-16 16:25:19 +11:00
raid6int.uc md: drivers/md/unroll.pl replaced with awk analog 2009-10-16 16:25:19 +11:00
raid6mmx.c md/raid6: move raid6 data processing to raid6_pq.ko 2009-03-31 15:09:39 +11:00
raid6recov.c md/raid6: move raid6 data processing to raid6_pq.ko 2009-03-31 15:09:39 +11:00
raid6sse1.c md/raid6: move raid6 data processing to raid6_pq.ko 2009-03-31 15:09:39 +11:00
raid6sse2.c md/raid6: move raid6 data processing to raid6_pq.ko 2009-03-31 15:09:39 +11:00
raid6x86.h md: fix typo in FSF address 2009-03-31 14:57:37 +11:00
raid10.c raid: improve MD/raid10 handling of correctable read errors. 2009-12-14 12:51:41 +11:00
raid10.h md: remove mddev_to_conf "helper" macro 2009-06-16 16:54:21 +10:00
unroll.awk md: drivers/md/unroll.pl replaced with awk analog 2009-10-16 16:25:19 +11:00