linux_dsm_epyc7002/drivers/md
Xiao Ni fe272570d0 md: Set prev_flush_start and flush_bio in an atomic way
commit dc5d17a3c39b06aef866afca19245a9cfb533a79 upstream.

One customer reported a crash caused by a flush request. A warning is
triggered before the crash.

        /* new request after previous flush is completed */
        if (ktime_after(req_start, mddev->prev_flush_start)) {
                WARN_ON(mddev->flush_bio);
                mddev->flush_bio = bio;
                bio = NULL;
        }

This WARN_ON is triggered. md_flush_request uses a spin lock to protect
prev_flush_start and flush_bio, but md_submit_flush_data takes no lock.
Because the compiler may reorder the write instructions,
md_submit_flush_data can set flush_bio to NULL before it updates
prev_flush_start.

For example, flush bio1 sets flush_bio to NULL first in
md_submit_flush_data. An interrupt, or VMware causing an extended stall,
then occurs between the updates of flush_bio and prev_flush_start. Because
flush_bio is NULL, flush bio2 can take the lock and be submitted to the
underlying disks. Flush bio1 then updates prev_flush_start after the
interrupt or extended stall.

Then flush bio3 enters md_flush_request. Its start time req_start is
later than prev_flush_start, and flush_bio is not NULL (flush bio2 hasn't
finished), so the WARN_ON triggers. It then calls INIT_WORK again.
INIT_WORK() re-initializes the list pointers in the work_struct, which can
corrupt the work list and queue the work_struct a second time. A corrupted
work list can lead to invalid work items being used and a crash in
process_one_work.

We need to make sure that only one flush bio is handled at a time. So
take the spin lock in md_submit_flush_data so that prev_flush_start and
flush_bio are updated atomically.
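
The locking rule described above can be sketched in userspace. This is a
minimal model, not the kernel patch: struct flush_state, flush_try_claim,
flush_complete and the FLUSH_* return codes are hypothetical names, a C11
atomic_flag stands in for mddev->lock, plain integers stand in for ktime_t,
and the claim step records start_flush itself as a simplification. The
point it illustrates is that the completion path updates prev_flush_start
and clears flush_bio under the same lock the claim path takes, so a
claimer can never observe one update without the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the flush-related fields of struct mddev. */
struct flush_state {
	atomic_flag lock;         /* models mddev->lock */
	int64_t start_flush;      /* start time of the flush in flight */
	int64_t prev_flush_start; /* start time of the last completed flush */
	void *flush_bio;          /* non-NULL while a flush is in flight */
};

enum { FLUSH_BUSY, FLUSH_CLAIMED, FLUSH_COVERED };

static void flush_lock(struct flush_state *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		; /* spin until the lock is free */
}

static void flush_unlock(struct flush_state *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

static void flush_state_init(struct flush_state *s)
{
	atomic_flag_clear(&s->lock);
	s->start_flush = 0;
	s->prev_flush_start = 0;
	s->flush_bio = NULL;
}

/*
 * Models the claim step of md_flush_request. Returns FLUSH_BUSY when a
 * flush is in flight and the caller would have to wait, FLUSH_CLAIMED
 * when the caller becomes the in-flight flush, and FLUSH_COVERED when a
 * flush that started after req_start has already completed.
 */
static int flush_try_claim(struct flush_state *s, void *bio,
			   int64_t req_start, int64_t now)
{
	int ret = FLUSH_COVERED;

	flush_lock(s);
	if (s->flush_bio && req_start >= s->prev_flush_start) {
		ret = FLUSH_BUSY;
	} else if (req_start > s->prev_flush_start) {
		assert(s->flush_bio == NULL); /* mirrors the WARN_ON */
		s->flush_bio = bio;
		s->start_flush = now; /* simplification: set at claim time */
		ret = FLUSH_CLAIMED;
	}
	flush_unlock(s);
	return ret;
}

/*
 * Models the end of md_submit_flush_data with the fix applied: both
 * fields change inside one critical section.
 */
static void flush_complete(struct flush_state *s)
{
	flush_lock(s);
	s->prev_flush_start = s->start_flush;
	s->flush_bio = NULL;
	flush_unlock(s);
}
```

In this model a second request that arrives while a flush is in flight
gets FLUSH_BUSY until flush_complete has published both fields, which is
the "only one flush bio at a time" property the commit is after.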

Reviewed-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-10 09:29:22 +01:00
bcache bcache: only check feature sets when sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES 2021-02-03 23:28:39 +01:00
persistent-data
dm-bio-prison-v1.c
dm-bio-prison-v1.h
dm-bio-prison-v2.c
dm-bio-prison-v2.h
dm-bio-record.h
dm-bufio.c dm integrity: fix flush with external metadata device 2021-01-19 18:27:22 +01:00
dm-builtin.c
dm-cache-background-tracker.c
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c
dm-clone-metadata.c
dm-clone-metadata.h
dm-clone-target.c
dm-core.h
dm-crypt.c dm crypt: fix copy and paste bug in crypt_alloc_req_aead 2021-01-27 11:54:52 +01:00
dm-delay.c
dm-dust.c
dm-ebs-target.c
dm-era-target.c
dm-exception-store.c
dm-exception-store.h
dm-flakey.c
dm-historical-service-time.c
dm-init.c
dm-integrity.c dm integrity: conditionally disable "recalculate" feature 2021-01-27 11:54:55 +01:00
dm-io.c
dm-ioctl.c dm ioctl: fix error return code in target_message 2020-12-30 11:53:36 +01:00
dm-kcopyd.c
dm-linear.c
dm-log-userspace-base.c
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c
dm-log.c
dm-mpath.c
dm-mpath.h
dm-path-selector.c
dm-path-selector.h
dm-queue-length.c
dm-raid1.c
dm-raid.c dm raid: fix discard limits for raid1 2021-01-19 18:27:21 +01:00
dm-region-hash.c
dm-round-robin.c
dm-rq.c
dm-rq.h
dm-service-time.c
dm-snap-persistent.c
dm-snap-transient.c
dm-snap.c dm snapshot: flush merged data before committing metadata 2021-01-19 18:27:21 +01:00
dm-stats.c
dm-stats.h
dm-stripe.c
dm-switch.c
dm-sysfs.c
dm-table.c dm: avoid filesystem lookup in dm_get_dev_t() 2021-01-27 11:54:54 +01:00
dm-target.c
dm-thin-metadata.c
dm-thin-metadata.h
dm-thin.c
dm-uevent.c
dm-uevent.h
dm-unstripe.c
dm-verity-fec.c
dm-verity-fec.h
dm-verity-target.c dm verity: skip verity work if I/O error when system is shutting down 2021-01-06 14:56:56 +01:00
dm-verity-verify-sig.c
dm-verity-verify-sig.h
dm-verity.h
dm-writecache.c
dm-zero.c
dm-zoned-metadata.c
dm-zoned-reclaim.c
dm-zoned-target.c
dm-zoned.h
dm.c dm: eliminate potential source of excessive kernel log noise 2021-01-19 18:27:33 +01:00
dm.h
Kconfig dm integrity: select CRYPTO_SKCIPHER 2021-01-27 11:54:57 +01:00
Makefile
md-autodetect.c
md-bitmap.c
md-bitmap.h
md-cluster.c md/cluster: fix deadlock when node is doing resync job 2020-12-30 11:54:25 +01:00
md-cluster.h
md-faulty.c
md-linear.c
md-linear.h
md-multipath.c
md-multipath.h
md.c md: Set prev_flush_start and flush_bio in an atomic way 2021-02-10 09:29:22 +01:00
md.h Revert "md: change mddev 'chunk_sectors' from int to unsigned" 2020-12-14 19:33:01 +01:00
raid0.c Revert "md: add md_submit_discard_bio() for submitting discard bio" 2020-12-09 20:46:01 -08:00
raid0.h
raid1-10.c
raid1.c
raid1.h
raid5-cache.c
raid5-log.h
raid5-ppl.c
raid5.c
raid5.h
raid10.c md/raid10: initialize r10_bio->read_slot before use. 2021-01-06 14:56:49 +01:00
raid10.h