linux_dsm_epyc7002/block
Tejun Heo 70748bba55 blk-iocost: fix weight updates of inner active iocgs
commit e9f4eee9a0023ba22db9560d4cc6ee63f933dae8 upstream.

When the weight of an active iocg is updated, weight_updated() is called
which in turn calls __propagate_weights() to update the active and inuse
weights so that the effective hierarchical weights are update accordingly.

The current implementation is incorrect for inner active nodes. For an
active leaf iocg, inuse can be any value between 1 and active and the
difference represents how much the iocg is donating. When weight is updated,
as long as inuse is clamped between 1 and the new weight, we're alright and
this is what __propagate_weights() currently implements.

However, that's not how an active inner node's inuse is set. An inner node's
inuse is solely determined by the ratio between the sums of inuse's and
active's of its children - ie. they're results of propagating the leaves'
active and inuse weights upwards. __propagate_weights() incorrectly applies
the same clamping as for a leaf when an active inner node's weight is
updated. Consider a hierarchy which looks like the following with saturating
workloads in AA and BB.

     R
   /   \
  A     B
  |     |
 AA     BB

1. For both A and B, active=100, inuse=100, hwa=0.5, hwi=0.5.

2. echo 200 > A/io.weight

3. __propagate_weights() update A's active to 200 and leave inuse at 100 as
   it's already between 1 and the new active, making A:active=200,
   A:inuse=100. As R's active_sum is updated along with A's active,
   A:hwa=2/3, B:hwa=1/3. However, because the inuses didn't change, the
   hwi's remain unchanged at 0.5.

4. The weight of A is now twice that of B but AA and BB still have the same
   hwi of 0.5 and thus are doing the same amount of IOs.

Fix it by making __propgate_weights() always calculate the inuse of an
active inner iocg based on the ratio of child_inuse_sum to child_active_sum.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Schatzberg <dschatzberg@fb.com>
Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org # v5.4+
Link: https://lore.kernel.org/r/YJsxnLZV1MnBcqjj@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-19 10:13:11 +02:00
..
partitions block-5.10-2020-10-12 2020-10-13 12:12:44 -07:00
badblocks.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
bfq-cgroup.c bfq: fix blkio cgroup leakage v4 2020-08-18 07:48:08 -07:00
bfq-iosched.c bfq: Avoid false bfq queue merging 2021-03-04 11:37:18 +01:00
bfq-iosched.h bfq: fix blkio cgroup leakage v4 2020-08-18 07:48:08 -07:00
bfq-wf2q.c bfq: fix blkio cgroup leakage v4 2020-08-18 07:48:08 -07:00
bio-integrity.c block: make function __bio_integrity_free() static 2020-07-02 12:38:18 -06:00
bio.c block: only update parent bi_status when bio fail 2021-04-16 11:43:21 +02:00
blk-cgroup-rwstat.c blk-cgroup: Fix the recursive blkg rwstat 2021-03-30 14:31:48 +02:00
blk-cgroup-rwstat.h blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT 2019-11-07 12:28:13 -07:00
blk-cgroup.c blk-cgroup: Use cond_resched() when destroy blkgs 2021-02-13 13:55:13 +01:00
blk-core.c scsi: block: Do not accept any requests while suspended 2021-01-12 20:18:17 +01:00
blk-crypto-fallback.c block: rename generic_make_request to submit_bio_noacct 2020-07-01 07:27:24 -06:00
blk-crypto-internal.h block: make blk_crypto_rq_bio_prep() able to fail 2020-10-05 10:47:43 -06:00
blk-crypto.c block: warn if !__GFP_DIRECT_RECLAIM in bio_crypt_set_ctx() 2020-10-05 10:47:43 -06:00
blk-exec.c block: add a blk_account_io_merge_bio helper 2020-05-27 05:21:23 -06:00
blk-flush.c block: mark flush request as IDLE when it is really finished 2020-11-13 14:24:16 -07:00
blk-integrity.c block: remove the unused blk_integrity_merge_bio export 2020-10-06 07:29:53 -06:00
blk-ioc.c block: remove retry loop in ioc_release_fn() 2020-07-16 10:22:15 -06:00
blk-iocost.c blk-iocost: fix weight updates of inner active iocgs 2021-05-19 10:13:11 +02:00
blk-iolatency.c block: Remove redundant 'return' statement 2020-10-08 07:59:48 -06:00
blk-lib.c block: add a bdev_is_partition helper 2020-09-25 08:18:57 -06:00
blk-map.c block: fix bmd->is_null_mapped initialization 2020-09-23 09:18:39 -06:00
blk-merge.c block: recalculate segment count for multi-segment discards correctly 2021-03-30 14:32:06 +02:00
blk-mq-cpumap.c blk-mq: remove the calling of local_memory_node() 2020-10-20 07:08:17 -06:00
blk-mq-debugfs-zoned.c block: Cleanup license notice 2019-01-17 21:21:40 -07:00
blk-mq-debugfs.c blk-mq-debugfs: Add decode for BLK_MQ_F_TAG_HCTX_SHARED 2021-01-19 18:27:29 +01:00
blk-mq-debugfs.h blk-mq: no need to check return value of debugfs_create functions 2019-06-13 03:00:30 -06:00
blk-mq-pci.c block: Fix blk_mq_*_map_queues() kernel-doc headers 2019-05-31 15:12:34 -06:00
blk-mq-rdma.c block: Fix blk_mq_*_map_queues() kernel-doc headers 2019-05-31 15:12:34 -06:00
blk-mq-sched.c blk-mq: get rid of the dead flush handle code path 2020-10-09 12:35:39 -06:00
blk-mq-sched.h block-5.10-2020-10-12 2020-10-13 12:12:44 -07:00
blk-mq-sysfs.c blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue 2020-10-09 12:46:28 -06:00
blk-mq-tag.c block-mq: fix comments in blk_mq_queue_tag_busy_iter 2020-09-29 08:11:00 -06:00
blk-mq-tag.h blk-mq: Relocate hctx_may_queue() 2020-09-03 15:20:47 -06:00
blk-mq-virtio.c blk-mq: Fix typo in comment 2020-03-17 20:55:21 +01:00
blk-mq.c scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT 2021-01-12 20:18:17 +01:00
blk-mq.h blk-mq: test QUEUE_FLAG_HCTX_ACTIVE for sbitmap_shared in hctx_may_queue 2021-02-03 23:28:44 +01:00
blk-pm.c scsi: block: Fix a race in the runtime power management code 2021-01-06 14:56:50 +01:00
blk-pm.h scsi: block: Do not accept any requests while suspended 2021-01-12 20:18:17 +01:00
blk-rq-qos.c Revert "blk-rq-qos: remove redundant finish_wait to rq_qos_wait." 2020-07-15 09:33:37 -06:00
blk-rq-qos.h blk-rq-qos: fix first node deletion of rq_qos_del() 2019-10-15 10:13:13 -06:00
blk-settings.c blk-settings: align max_sectors on "logical_block_size" boundary 2021-03-04 11:38:22 +01:00
blk-stat.c blk-stat: make q->stats->lock irqsafe 2020-09-01 16:48:46 -06:00
blk-stat.h block: deactivate blk_stat timer in wbt_disable_default() 2018-12-12 06:47:51 -07:00
blk-sysfs.c blk-mq: move cancel of hctx->run_work to the front of blk_exit_queue 2020-10-09 12:46:28 -06:00
blk-throttle.c blk-throttle: Re-use the throtl_set_slice_end() 2020-10-08 08:01:38 -06:00
blk-timeout.c block: blk-timeout: delete duplicated word 2020-07-31 16:29:47 -06:00
blk-wbt.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
blk-wbt.h blk-wbt: remove wbt_update_limits 2020-05-29 16:30:39 -06:00
blk-zoned.c block: Fix REQ_OP_ZONE_RESET_ALL handling 2021-03-30 14:31:51 +02:00
blk.h block: move blk_mq_sched_try_merge to blk-merge.c 2020-10-06 07:29:53 -06:00
bounce.c block: make bio_crypt_clone() able to fail 2020-10-05 10:47:43 -06:00
bsg-lib.c block: drop double zeroing 2020-09-23 09:18:13 -06:00
bsg.c bsg: free the request before return error code 2021-03-04 11:37:42 +01:00
cmdline-parser.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elevator.c block: fix comment and add lockdep assert 2020-10-09 12:34:06 -06:00
genhd.c block: Suppress uevent for hidden device when removed 2021-03-30 14:31:52 +02:00
ioctl.c block: return -EBUSY when there are open partitions in blkdev_reread_part 2021-04-28 13:39:59 +02:00
ioprio.c block: grant IOPRIO_CLASS_RT to CAP_SYS_NICE 2020-09-01 19:38:33 -06:00
Kconfig blk-wbt: Remove obsolete multiqueue I/O scheduling comment 2020-09-01 16:49:26 -06:00
Kconfig.iosched treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
keyslot-manager.c block/keyslot-manager: prevent crash when num_slots=1 2020-11-20 11:52:52 -07:00
kyber-iosched.c blk-mq: Use pointers for blk_mq_tags bitmap tags 2020-09-03 15:20:47 -06:00
Makefile blk-mq: merge blk-softirq.c into blk-mq.c 2020-06-24 09:15:56 -06:00
mq-deadline.c blk-mq, elevator: Count requests per hctx to improve performance 2020-09-03 15:20:47 -06:00
opal_proto.h block: sed-opal: Change the check condition for regular session validity 2020-03-12 08:00:10 -06:00
scsi_ioctl.c drivers-5.10-2020-10-12 2020-10-13 13:04:41 -07:00
sed-opal.c block: sed-opal: Change the check condition for regular session validity 2020-03-12 08:00:10 -06:00
t10-pi.c block: Allow t10-pi to be modular 2020-01-06 20:59:04 -07:00