linux_dsm_epyc7002/include
Tejun Heo a051661ca6 blkcg: implement per-blkg request allocation
Currently, request_queue has one request_list to allocate requests
from regardless of blkcg of the IO being issued.  When the unified
request pool is used up, cfq proportional IO limits become meaningless
- whoever grabs the next request being freed wins the race regardless
of the configured weights.

This can be easily demonstrated by creating a blkio cgroup with a very
low weight, putting a program that issues a lot of random direct IOs
there, and running sequential IO from a different cgroup.  As soon as
the request pool is used up, the sequential IO bandwidth crashes.

This patch implements per-blkg request_list.  Each blkg has its own
request_list and any IO allocates its request from the matching blkg
making blkcgs completely isolated in terms of request allocation.

* Root blkcg uses the request_list embedded in each request_queue,
  which was renamed to @q->root_rl from @q->rq.  While making blkcg rl
  handling a bit hairier, this enables avoiding most overhead for root
  blkcg.

* Queue fullness is properly per request_list but bdi isn't blkcg
  aware yet, so congestion state currently just follows the root
  blkcg.  As writeback isn't aware of blkcg yet, this works okay for
  async congestion but readahead may get the wrong signals.  It's
  better than blkcg completely collapsing with shared request_list but
  needs to be improved with future changes.

* After this change, each block cgroup gets a full request pool making
  resource consumption of each cgroup higher.  This makes allowing
  non-root users to create cgroups less desirable; however, note that
  allowing non-root users to directly manage cgroups is already
  severely broken regardless of this patch - each block cgroup
  consumes kernel memory and skews IO weight (IO weights are not
  hierarchical).
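
The isolation described above can be illustrated with a small userspace
model.  This is only a sketch under stated assumptions, not the kernel
code from this patch: every toy_* name below is hypothetical, the pool
is reduced to a counter with a fixed capacity, and a NULL group is
assumed to stand for the root blkcg, whose pool is embedded in the
queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a request pool; NOT the kernel's struct request_list. */
struct toy_request_list {
    int in_flight;   /* requests currently handed out */
    int capacity;    /* maximum requests this pool may hand out */
};

/* Toy queue: the root group's pool is embedded in the queue itself. */
struct toy_queue {
    struct toy_request_list root_rl;
};

/* Toy per-(cgroup, queue) group carrying its own pool. */
struct toy_blkg {
    struct toy_request_list rl;
};

static void toy_rl_init(struct toy_request_list *rl, int capacity)
{
    assert(capacity > 0);
    rl->in_flight = 0;
    rl->capacity = capacity;
}

/* Pick the pool for an IO; a NULL group selects the queue's embedded root pool. */
static struct toy_request_list *toy_get_rl(struct toy_queue *q,
                                           struct toy_blkg *blkg)
{
    return blkg ? &blkg->rl : &q->root_rl;
}

/* Take one request from the pool; returns false once the pool is exhausted. */
static bool toy_alloc_request(struct toy_request_list *rl)
{
    if (rl->in_flight >= rl->capacity)
        return false;
    rl->in_flight++;
    return true;
}

/* Return one request to the pool; a no-op if nothing is outstanding. */
static void toy_free_request(struct toy_request_list *rl)
{
    if (rl->in_flight > 0)
        rl->in_flight--;
}
```

In this model, a group that exhausts its own pool only makes its own
toy_alloc_request() calls fail; allocations made through another
group's pool, or through the root pool, still succeed.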

v2: queue-sysfs.txt updated and patch description updated as suggested
    by Vivek.

v3: blk_get_rl() wasn't checking error return from
    blkg_lookup_create() and may cause oops on lookup failure.  Fix it
    by falling back to root_rl on blkg lookup failures.  This problem
    was spotted by Rakesh Iyer <rni@google.com>.

v4: Updated to accommodate 458f27a982 "block: Avoid missed wakeup in
    request waitqueue".  blk_drain_queue() now wakes up waiters on all
    blkg->rl on the target queue.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-06-26 18:42:49 -04:00
..
acpi ACPI: Add _PLD support 2012-05-11 17:03:12 -07:00
asm-generic Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-06-01 10:34:35 -07:00
crypto
drm introduce SIZE_MAX 2012-05-31 17:49:26 -07:00
keys
linux blkcg: implement per-blkg request allocation 2012-06-26 18:42:49 -04:00
math-emu
media [media] patch for Asus My Cinema PS3-100 (1043:48cd) 2012-05-20 16:05:02 -03:00
memory
misc
mtd UBI: amend commentaries WRT dtype 2012-05-20 20:25:59 +03:00
net cipso: handle CIPSO options correctly when NetLabel is disabled 2012-06-01 14:18:29 -04:00
pcmcia
rdma Merge branches 'core', 'cxgb4', 'ipath', 'iser', 'lockdep', 'mlx4', 'nes', 'ocrdma', 'qib' and 'raw-qp' into for-linus 2012-05-21 09:00:47 -07:00
rxrpc
scsi [SCSI] fcoe, bnx2fc, libfcoe: SW FCoE and bnx2fc use FCoE Syfs 2012-05-23 09:43:13 +01:00
sound ASoC: Last minute updates 2012-05-22 02:58:55 +02:00
target target: Add MI_REPORT_TARGET_PGS ext. header + implict_trans_secs attribute 2012-05-17 00:45:58 -07:00
trace mm: vmscan: remove reclaim_mode_t 2012-05-29 16:22:19 -07:00
video fbdev updates for 3.5 2012-06-01 16:57:51 -07:00
xen xen: do not map the same GSI twice in PVHVM guests. 2012-05-21 14:11:36 -04:00
Kbuild