linux_dsm_epyc7002/block
Douglas Anderson dbc3117d4c block, bfq: NULL out the bic when it's no longer valid
In reboot tests on several devices we were seeing a "use after free"
when slub_debug or KASAN was enabled.  The kernel complained about:

  Unable to handle kernel paging request at virtual address 6b6b6c2b

...which is a classic sign of use after free under slub_debug.  The
stack crawl in kgdb looked like:

 0  test_bit (addr=<optimized out>, nr=<optimized out>)
 1  bfq_bfqq_busy (bfqq=<optimized out>)
 2  bfq_select_queue (bfqd=<optimized out>)
 3  __bfq_dispatch_request (hctx=<optimized out>)
 4  bfq_dispatch_request (hctx=<optimized out>)
 5  0xc056ef00 in blk_mq_do_dispatch_sched (hctx=0xed249440)
 6  0xc056f728 in blk_mq_sched_dispatch_requests (hctx=0xed249440)
 7  0xc0568d24 in __blk_mq_run_hw_queue (hctx=0xed249440)
 8  0xc0568d94 in blk_mq_run_work_fn (work=<optimized out>)
 9  0xc024c5c4 in process_one_work (worker=0xec6d4640, work=0xed249480)
 10 0xc024cff4 in worker_thread (__worker=0xec6d4640)

Digging in kgdb, it could be found that, though bfqq looked fine,
bfqq->bic had been freed.

Through further digging, I postulated that perhaps it is illegal to
access a "bic" (AKA an "icq") after bfq_exit_icq() had been called
because the "bic" can be freed at some point in time after this call
is made.  I confirmed that there certainly were cases where the exact
crashing code path would access the "bic" after bfq_exit_icq() had
been called.  Sspecifically I set the "bfqq->bic" to (void *)0x7 and
saw that the bic was 0x7 at the time of the crash.

To understand a bit more about why this crash was fairly uncommon (I
saw it only once in a few hundred reboots), you can see that much of
the time bfq_exit_icq_fbqq() fully frees the bfqq and thus it can't
access the ->bic anymore.  The only case it doesn't is if
bfq_put_queue() sees a reference still held.

However, even in the case when bfqq isn't freed, the crash is still
rare.  Why?  I tracked what happened to the "bic" after the exit
routine.  It doesn't get freed right away.  Rather,
put_io_context_active() eventually called put_io_context() which
queued up freeing on a workqueue.  The freeing then actually happened
later than that through call_rcu().  Despite all these delays, some
extra debugging showed that all the hoops could be jumped through in
time and the memory could be freed causing the original crash.  Phew!

To make a long story short, assuming it truly is illegal to access an
icq after the "exit_icq" callback is finished, this patch is needed.

Cc: stable@vger.kernel.org
Reviewed-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-28 07:44:19 -06:00
..
partitions block/partitions/ldm: Convert a kernel-doc header into a non-kernel-doc header 2019-05-31 15:12:34 -06:00
badblocks.c block: switch all files cleared marked as GPLv2 to SPDX tags 2019-04-30 16:11:57 -06:00
bfq-cgroup.c block: rename CONFIG_DEBUG_BLK_CGROUP to CONFIG_BFQ_CGROUP_DEBUG 2019-06-20 10:32:35 -06:00
bfq-iosched.c block, bfq: NULL out the bic when it's no longer valid 2019-06-28 07:44:19 -06:00
bfq-iosched.h block, bfq: detect wakers and unconditionally inject their I/O 2019-06-25 09:07:34 -06:00
bfq-wf2q.c block: switch all files cleared marked as GPLv2 or later to SPDX tags 2019-04-30 16:11:59 -06:00
bio-integrity.c block/bio-integrity: use struct_size() in kmalloc() 2019-05-16 08:48:48 -06:00
bio.c block: Remove unused code 2019-06-27 07:34:25 -06:00
blk-cgroup.c blk-cgroup: move struct blkg_stat to bfq 2019-06-20 10:32:34 -06:00
blk-core.c block: update print_req_error() 2019-06-20 13:03:51 -06:00
blk-exec.c block: add SPDX tags to block layer files missing licensing information 2019-04-30 16:12:03 -06:00
blk-flush.c block: switch all files cleared marked as GPLv2 to SPDX tags 2019-04-30 16:11:57 -06:00
blk-integrity.c for-5.2/block-20190507 2019-05-07 18:14:36 -07:00
blk-ioc.c block: remove the queue_lock indirection 2018-11-15 12:17:28 -07:00
blk-iolatency.c blk-iolatency: only account submitted bios 2019-06-20 03:29:56 -06:00
blk-lib.c block: fix 32 bit overflow in __blkdev_issue_discard() 2018-11-14 08:17:18 -07:00
blk-map.c block: remove the bi_phys_segments field in struct bio 2019-06-20 10:29:22 -06:00
blk-merge.c block: untangle the end of blk_bio_segment_split 2019-06-20 10:29:22 -06:00
blk-mq-cpumap.c blk-mq: Document the blk_mq_hw_queue_to_node() arguments 2019-05-31 15:12:34 -06:00
blk-mq-debugfs-zoned.c block: Cleanup license notice 2019-01-17 21:21:40 -07:00
blk-mq-debugfs.c block: use blk_op_str() in blk-mq-debugfs.c 2019-06-20 13:03:51 -06:00
blk-mq-debugfs.h blk-mq: no need to check return value of debugfs_create functions 2019-06-13 03:00:30 -06:00
blk-mq-pci.c block: Fix blk_mq_*_map_queues() kernel-doc headers 2019-05-31 15:12:34 -06:00
blk-mq-rdma.c block: Fix blk_mq_*_map_queues() kernel-doc headers 2019-05-31 15:12:34 -06:00
blk-mq-sched.c block: remove the bi_phys_segments field in struct bio 2019-06-20 10:29:22 -06:00
blk-mq-sched.h block: remove the bi_phys_segments field in struct bio 2019-06-20 10:29:22 -06:00
blk-mq-sysfs.c for-5.2/block-20190507 2019-05-07 18:14:36 -07:00
blk-mq-tag.c block: add SPDX tags to block layer files missing licensing information 2019-04-30 16:12:03 -06:00
blk-mq-tag.h Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block 2017-11-14 15:32:19 -08:00
blk-mq-virtio.c block: Fix blk_mq_*_map_queues() kernel-doc headers 2019-05-31 15:12:34 -06:00
blk-mq.c block: remove the bi_phys_segments field in struct bio 2019-06-20 10:29:22 -06:00
blk-mq.h blk-mq: free hw queue's resource in hctx's release handler 2019-05-04 07:24:05 -06:00
blk-pm.c block: remove the queue_lock indirection 2018-11-15 12:17:28 -07:00
blk-pm.h block: remove the queue_lock indirection 2018-11-15 12:17:28 -07:00
blk-rq-qos.c block: Fix rq_qos_wait() kernel-doc header 2019-05-31 15:12:34 -06:00
blk-rq-qos.h block: add SPDX tags to block layer files missing licensing information 2019-04-30 16:12:03 -06:00
blk-settings.c block: force an unlimited segment size on queues with a virt boundary 2019-05-23 10:25:26 -06:00
blk-softirq.c block: remove a few unused exports 2018-11-15 12:13:25 -07:00
blk-stat.c block: add SPDX tags to block layer files missing licensing information 2019-04-30 16:12:03 -06:00
blk-stat.h block: deactivate blk_stat timer in wbt_disable_default() 2018-12-12 06:47:51 -07:00
blk-sysfs.c block: free sched's request pool in blk_cleanup_queue 2019-06-06 22:39:39 -06:00
blk-throttle.c block: Fix throtl_pending_timer_fn() kernel-doc header 2019-05-31 15:12:34 -06:00
blk-timeout.c block: add SPDX tags to block layer files missing licensing information 2019-04-30 16:12:03 -06:00
blk-wbt.c block: add SPDX tags to block layer files missing licensing information 2019-04-30 16:12:03 -06:00
blk-wbt.h block: remove external dependency on wbt_flags 2018-07-09 09:07:54 -06:00
blk-zoned.c block: add SPDX tags to block layer files missing licensing information 2019-04-30 16:12:03 -06:00
blk.h block: mark blk_rq_bio_prep as inline 2019-06-20 10:29:22 -06:00
bounce.c block: remove the i argument to bio_for_each_segment_all 2019-04-30 09:26:13 -06:00
bsg-lib.c block: Fix bsg_setup_queue() kernel-doc header 2019-05-31 15:12:34 -06:00
bsg.c block: switch all files cleared marked as GPLv2 to SPDX tags 2019-04-30 16:11:57 -06:00
cmdline-parser.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat_ioctl.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elevator.c block: free sched's request pool in blk_cleanup_queue 2019-06-06 22:39:39 -06:00
genhd.c block: genhd: Use struct_size() helper 2019-06-15 01:46:09 -06:00
ioctl.c block: add SPDX tags to block layer files missing licensing information 2019-04-30 16:12:03 -06:00
ioprio.c block: add SPDX tags to block layer files missing licensing information 2019-04-30 16:12:03 -06:00
Kconfig block: force select mq-deadline for zoned block devices 2019-06-13 03:00:31 -06:00
Kconfig.iosched block: rename CONFIG_DEBUG_BLK_CGROUP to CONFIG_BFQ_CGROUP_DEBUG 2019-06-20 10:32:35 -06:00
kyber-iosched.c block: remove the bi_phys_segments field in struct bio 2019-06-20 10:29:22 -06:00
Makefile block: remove legacy IO schedulers 2018-11-07 13:42:32 -07:00
mq-deadline.c block: remove the bi_phys_segments field in struct bio 2019-06-20 10:29:22 -06:00
opal_proto.h block: switch all files cleared marked as GPLv2 to SPDX tags 2019-04-30 16:11:57 -06:00
partition-generic.c block: fix use-after-free on gendisk 2019-04-22 09:48:12 -06:00
scsi_ioctl.c block: switch all files cleared marked as GPLv2 to SPDX tags 2019-04-30 16:11:57 -06:00
sed-opal.c block: switch all files cleared marked as GPLv2 to SPDX tags 2019-04-30 16:11:57 -06:00
t10-pi.c block: switch all files cleared marked as GPLv2 to SPDX tags 2019-04-30 16:11:57 -06:00