The flag on_st in the bfq_entity data structure is true if the entity
is on a service tree or is in service. Yet the name of the field,
confusingly, does not mention the second, very important case. Extend
the name to mention the second case too.
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In bfq_bfqq_move(), the bfq_queue, say Q, to be moved to a new group
may happen to be deactivated in the scheduling data structures of the
source group (and then activated in the destination group). If Q is
referred only by the data structures in the source group when the
deactivation happens, then Q is freed upon the deactivation.
This commit addresses this issue by getting an extra reference before
the possible deactivation, and releasing this extra reference after Q
has been moved.
Tested-by: Chris Evich <cevich@redhat.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
BFQ maintains an ordered list, implemented with an RB tree, of
head-request positions of non-empty bfq_queues. This position tree,
inherited from CFQ, is used to find bfq_queues that contain I/O close
to each other. BFQ merges these bfq_queues into a single shared queue,
if this boosts throughput on the device at hand.
There is however a special-purpose bfq_queue that does not participate
in queue merging, the oom bfq_queue. Yet, also this bfq_queue could be
wrongly added to the position tree. So bfqq_find_close() could return
the oom bfq_queue, which is a source of further troubles in an
out-of-memory situation. This commit prevents the oom bfq_queue from
being inserted into the position tree.
Tested-by: Patrick Dung <patdung100@gmail.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 478de3380c ("block, bfq: deschedule empty bfq_queues not
referred by any process") fixed commit 3726112ec7 ("block, bfq:
re-schedule empty queues if they deserve I/O plugging") by
descheduling an empty bfq_queue when it remains with not process
reference. Yet, this still left a case uncovered: an empty bfq_queue
with not process reference that remains in service. This happens for
an in-service sync bfq_queue that is deemed to deserve I/O-dispatch
plugging when it remains empty. Yet no new requests will arrive for
such a bfq_queue if no process sends requests to it any longer. Even
worse, the bfq_queue may happen to be prematurely freed while still in
service (because there may remain no reference to it any longer).
This commit solves this problem by preventing I/O dispatch from being
plugged for the in-service bfq_queue, if the latter has no process
reference (the bfq_queue is then prevented from remaining in service).
Fixes: 3726112ec7 ("block, bfq: re-schedule empty queues if they deserve I/O plugging")
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Reported-by: Patrick Dung <patdung100@gmail.com>
Tested-by: Patrick Dung <patdung100@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl4vOqAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgppYBD/wLczY7hyjF2loc71MC9HloUq3BVbATktM3
OF6wRbyxbeiOj/7Px0lE0M67tQbnEoIP26gS03fd6e7HE19//gmzGuB3Z2R2CJ5q
XKkTamqz0pcPcX5FdDO5JFQZf27/1Qs3g7Nkr7FjVcR2XQ8PFv5B/FLMhse4frJI
k92Sj0V1OwdNtMXozKqno/7xPwL/kQKWoF6aFDgO27xLfsFmi8Wbgf/CslOOTHIN
vAUaz3Cue6V17M5y98wD4nwpjG7Ve+aY1i6oFPBE7Az9TA0xoiBA/tNPKW7iS10C
GEP1aoI6lpgkxAzvyR29K1ayjzV11hEIig3rNIWxNfmCSGaawttWXAPEi7jU5u2D
ZXbzUJxKnfeg8yrAj0CTcKLA9i4v1cZXPCUXqMO2+wHEWgmxq2IWuWjSl/V4fn3Y
zgTPBngDM4Gx3fAqvD8SVfCW7xwI4VRP+da58WCFOjwnOgYSouxS7RnCtm+yPUbk
Es6m2XBb+3ycaJPT58LcXPrnTJWZeRincs3MfFJeTXRn5T7IzlBjKdIvQiQSHQXo
caZzWHEJW827+wfQFNreXpk5KPi+D6boeziYe96UcII8L5qVw3N0X5hOpr6IRhkX
hn2CUb/CmY6bl8PJJPVc4ygqgiavyvynJu+A0uJvFSjvXX6jjXNEsSJ6bz8aBxdm
4rmgPFTlqA==
=yJZi
-----END PGP SIGNATURE-----
Merge tag 'for-5.6/block-2020-01-27' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
"This may be the most quiet round we've had in years. I'm not
complaining. Really not a lot to detail here, outside of spelling and
documentation improvements/fixes, we have:
- Allow t10-pi to be modular (Herbert)
- Remove dead code in bfq (Alex)
- Mark zone management requests with REQ_SYNC (Chaitanya)
- BFQ division improvement (Wen)
- Small series improving plugging (Pavel)"
* tag 'for-5.6/block-2020-01-27' of git://git.kernel.dk/linux-block:
partitions/ldm: fix spelling mistake "to" -> "too"
block, bfq: improve arithmetic division in bfq_delta()
block/bfq: remove unused bfq_class_rt which never used
block: mark zone-mgmt bios with REQ_SYNC
blk-mq: Document functions for sending request
block: Allow t10-pi to be modular
blk-mq: optimise blk_mq_flush_plug_list()
list: introduce list_for_each_continue()
blk-mq: optimise rq sort function
Host-aware SMR drives can be used with the commands to explicitly manage
zone state, but they can also be used as normal disks. In the former
case it makes perfect sense to allow partitions on them, in the latter
it does not, just like for host managed devices. Add a check to
add_partition to allow partitions on host aware devices, but give
up any zone management capabilities in that case, which also catches
the previously missed case of adding a partition vs just scanning it.
Because sd can rescan the attribute at runtime it needs to check if
a disk has partitions, for which a new helper is added to genhd.h.
Fixes: 5eac3eb30c ("block: Remove partition support for zoned block devices")
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is a spelling mistake in a ldm_error message. Fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
do_div() does a 64-by-32 division. Use div64_ul() instead of it
if the divisor is unsigned long, to avoid truncation to 32-bit.
And as a nice side effect also cleans up the function a bit.
Signed-off-by: Wen Yang <wenyang@linux.alibaba.com>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@fb.com>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This macro is never used after introduced from commit aee69d78de
("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
Better to remove it.
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Logical block size has type unsigned short. That means that it can be at
most 32768. However, there are architectures that can run with 64k pages
(for example arm64) and on these architectures, it may be possible to
create block devices with 64k block size.
For exmaple (run this on an architecture with 64k pages):
Mount will fail with this error because it tries to read the superblock using 2-sector
access:
device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
EXT4-fs (dm-0): unable to read superblock
This patch changes the logical block size from unsigned short to unsigned
int to avoid the overflow.
Cc: stable@vger.kernel.org
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 429120f3df starts to take account of segment's start dma address
when computing max segment size, and data type of 'unsigned long'
is used to do that. However, the segment mask may be 0xffffffff, so
the figured out segment size may be overflowed in case of zero physical
address on 32bit arch.
Fix the issue by returning queue_max_segment_size() directly when that
happens.
Fixes: 429120f3df ("block: fix splitting segments on boundary masks")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Christoph Hellwig <hch@lst.de>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 85a8ce62c2 ("block: add bio_truncate to fix guard_bio_eod")
adds bio_truncate() for handling bio EOD. However, bio_truncate()
doesn't use the passed 'op' parameter from guard_bio_eod's callers.
So bio_trunacate() may retrieve wrong 'op', and zering pages may
not be done for READ bio.
Fixes this issue by moving guard_bio_eod() after bio_set_op_attrs()
in submit_bh_wbc() so that bio_truncate() can always retrieve correct
op info.
Meantime remove the 'op' parameter from guard_bio_eod() because it isn't
used any more.
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixes: 85a8ce62c2 ("block: add bio_truncate to fix guard_bio_eod")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Fold in kerneldoc and bio_op() change.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In the current implementation, final zone-mgmt request is issued with
submit_bio_wait() which marks the bio REQ_SYNC. This is needed since
immediate action is expected for zone-mgmt requests as these are
blocking operations. This also bypasses the scheduler in the
blk_mq_make_request() and dispatches the request directly into the
hw ctx.
This patch marks all the chained bios REQ_SYNC so that we can have
above-mentioned behavior for non-final bios also.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add or improve documentation for function regarding creating and sending
IO requests to the hardware.
Signed-off-by: André Almeida <andrealmeid@collabora.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Currently t10-pi can only be built into the block layer which via
crc-t10dif pulls in a whole chunk of the Crypto API. In fact all
users of t10-pi work as modules and there is no reason for it to
always be built-in.
This patch adds a new hidden option for t10-pi that is selected
automatically based on BLK_DEV_INTEGRITY and whether the users
of t10-pi are built-in or not.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We ran into a problem with a mpt3sas based controller, where we would
see random (and hard to reproduce) file corruption). The issue seemed
specific to this controller, but wasn't specific to the file system.
After a lot of debugging, we find out that it's caused by segments
spanning a 4G memory boundary. This shouldn't happen, as the default
setting for segment boundary masks is 4G.
Turns out there are two issues in get_max_segment_size():
1) The default segment boundary mask is bypassed
2) The segment start address isn't taken into account when checking
segment boundary limit
Fix these two issues by removing the bypass of the segment boundary
check even if the mask is set to the default value, and taking into
account the actual start address of the request when checking if a
segment needs splitting.
Cc: stable@vger.kernel.org # v5.1+
Reviewed-by: Chris Mason <clm@fb.com>
Tested-by: Chris Mason <clm@fb.com>
Fixes: dcebd75592 ("block: use bio_for_each_bvec() to compute multi-page bvec count")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Dropped const on the page pointer, ppc page_to_phys() doesn't mark the
page as const...
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some filesystem, such as vfat, may send bio which crosses device boundary,
and the worse thing is that the IO request starting within device boundaries
can contain more than one segment past EOD.
Commit dce30ca9e3 ("fs: fix guard_bio_eod to check for real EOD errors")
tries to fix this issue by returning -EIO for this situation. However,
this way lets fs user code lose chance to handle -EIO, then sync_inodes_sb()
may hang for ever.
Also the current truncating on last segment is dangerous by updating the
last bvec, given bvec table becomes not immutable any more, and fs bio
users may not retrieve the truncated pages via bio_for_each_segment_all() in
its .end_io callback.
Fixes this issue by supporting multi-segment truncating. And the
approach is simpler:
- just update bio size since block layer can make correct bvec with
the updated bio size. Then bvec table becomes really immutable.
- zero all truncated segments for read bio
Cc: Carlos Maiolino <cmaiolino@redhat.com>
Cc: linux-fsdevel@vger.kernel.org
Fixed-by: dce30ca9e3 ("fs: fix guard_bio_eod to check for real EOD errors")
Reported-by: syzbot+2b9e54155c8c25d8d165@syzkaller.appspotmail.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These were added to blkdev_ioctl() in linux-5.5 but not
blkdev_compat_ioctl, so add them now.
Cc: <stable@vger.kernel.org> # v4.4+
Fixes: bbd3e06436 ("block: add an API for Persistent Reservations")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fold in followup patch from Arnd with missing pr.h header include.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These were added to blkdev_ioctl() in linux-5.5 but not
blkdev_compat_ioctl, so add them now.
Fixes: e876df1fe0 ("block: add zone open, close and finish ioctl support")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These were added to blkdev_ioctl() in v4.20 but not blkdev_compat_ioctl,
so add them now.
Cc: <stable@vger.kernel.org> # v4.20+
Fixes: 72cd87576d ("block: Introduce BLKGETZONESZ ioctl")
Fixes: 65e4e3eee8 ("block: Introduce BLKGETNRZONES ioctl")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
These were added to blkdev_ioctl() but not blkdev_compat_ioctl,
so add them now.
Cc: <stable@vger.kernel.org> # v4.10+
Fixes: 3ed05a987e ("blk-zoned: implement ioctls")
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When I doing fuzzy test, get the memleak report:
BUG: memory leak
unreferenced object 0xffff88837af80000 (size 4096):
comm "memleak", pid 3557, jiffies 4294817681 (age 112.499s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
20 00 00 00 10 01 00 00 00 00 00 00 01 00 00 00 ...............
backtrace:
[<000000001c894df8>] bio_alloc_bioset+0x393/0x590
[<000000008b139a3c>] bio_copy_user_iov+0x300/0xcd0
[<00000000a998bd8c>] blk_rq_map_user_iov+0x2f1/0x5f0
[<000000005ceb7f05>] blk_rq_map_user+0xf2/0x160
[<000000006454da92>] sg_common_write.isra.21+0x1094/0x1870
[<00000000064bb208>] sg_write.part.25+0x5d9/0x950
[<000000004fc670f6>] sg_write+0x5f/0x8c
[<00000000b0d05c7b>] __vfs_write+0x7c/0x100
[<000000008e177714>] vfs_write+0x1c3/0x500
[<0000000087d23f34>] ksys_write+0xf9/0x200
[<000000002c8dbc9d>] do_syscall_64+0x9f/0x4f0
[<00000000678d8e9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
If __blk_rq_map_user_iov() is failed in blk_rq_map_user_iov(),
the bio(s) which is allocated before this failing will leak. The
refcount of the bio(s) is init to 1 and increased to 2 by calling
bio_get(), but __blk_rq_unmap_user() only decrease it to 1, so
the bio cannot be freed. Fix it by calling blk_rq_unmap_user().
Reviewed-by: Bob Liu <bob.liu@oracle.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Avoid that running test nvme/012 from the blktests suite triggers the
following false positive lockdep complaint:
============================================
WARNING: possible recursive locking detected
5.0.0-rc3-xfstests-00015-g1236f7d60242 #841 Not tainted
--------------------------------------------
ksoftirqd/1/16 is trying to acquire lock:
000000000282032e (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0
but task is already holding lock:
00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&fq->mq_flush_lock)->rlock);
lock(&(&fq->mq_flush_lock)->rlock);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by ksoftirqd/1/16:
#0: 00000000cbadcbc2 (&(&fq->mq_flush_lock)->rlock){..-.}, at: flush_end_io+0x4e/0x1d0
stack backtrace:
CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.0.0-rc3-xfstests-00015-g1236f7d60242 #841
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
dump_stack+0x67/0x90
__lock_acquire.cold.45+0x2b4/0x313
lock_acquire+0x98/0x160
_raw_spin_lock_irqsave+0x3b/0x80
flush_end_io+0x4e/0x1d0
blk_mq_complete_request+0x76/0x110
nvmet_req_complete+0x15/0x110 [nvmet]
nvmet_bio_done+0x27/0x50 [nvmet]
blk_update_request+0xd7/0x2d0
blk_mq_end_request+0x1a/0x100
blk_flush_complete_seq+0xe5/0x350
flush_end_io+0x12f/0x1d0
blk_done_softirq+0x9f/0xd0
__do_softirq+0xca/0x440
run_ksoftirqd+0x24/0x50
smpboot_thread_fn+0x113/0x1e0
kthread+0x121/0x140
ret_from_fork+0x3a/0x50
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch fixes the following sparse warnings:
block/bsg-lib.c:269:19: warning: incorrect type in initializer (different base types)
block/bsg-lib.c:269:19: expected int sts
block/bsg-lib.c:269:19: got restricted blk_status_t [usertype]
block/bsg-lib.c:286:16: warning: incorrect type in return expression (different base types)
block/bsg-lib.c:286:16: expected restricted blk_status_t
block/bsg-lib.c:286:16: got int [assigned] sts
Cc: Martin Wilck <mwilck@suse.com>
Fixes: d46fe2cb2d ("block: drop device references in bsg_queue_rq()")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of using list_del_init() in a loop, that generates a lot of
unnecessary memory read/writes, iterate from the first request of a
batch and cut out a sublist with list_cut_before().
Apart from removing the list node initialisation part, this is more
register-friendly, and the assembly uses the stack less intensively.
list_empty() at the beginning is done with hope, that the compiler can
optimise out the same check in the following list_splice_init().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Check "!=" in multi-layer comparisons. The same memory usage, fewer
instructions, and 2 from 4 jumps are replaced with SETcc.
Note, that list_sort() doesn't differ 0 and <0.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Non-mq devs do not honor REQ_NOWAIT so give a chance to the caller to repeat
request gracefully on -EAGAIN error.
The problem is well reproduced using io_uring:
mkfs.ext4 /dev/ram0
mount /dev/ram0 /mnt
# Preallocate a file
dd if=/dev/zero of=/mnt/file bs=1M count=1
# Start fio with io_uring and get -EIO
fio --rw=write --ioengine=io_uring --size=1M --direct=1 --name=job --filename=/mnt/file
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When over-budget IOs are force-issued through root cgroup,
iocg_kick_delay() adjusts the async delay accordingly but doesn't
actually schedule async throttle for the issuing task. This bug is
pretty well masked because sooner or later the offending threads are
gonna get directly throttled on regular IOs or have async delay
scheduled by mem_cgroup_throttle_swaprate().
However, it can affect control quality on filesystem metadata heavy
operations. Let's fix it by invoking blkcg_schedule_throttle() when
iocg_kick_delay() says async delay is needed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 7caa47151a ("blkcg: implement blk-iocost")
Cc: stable@vger.kernel.org
Reported-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3y54EQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqJuD/93LZmzS5UEWrNLkRaAaCyAy40MPxuXRZEp
42yk7cvAT4OcCr+W6nkAgG6IHGRXOz8QvOzt0P5/HfugpNlB2oz5a/6+TiTtcZTt
YNt0Z4yuBMU5SXIIxc3lUMcJGxslzOr+L+9ZXD4u5UqIdG1fSrECAexSCrlmmTwu
Fx02TakDc/bbUYDfLAQD1+/Z066rp1ZWDkjXqA4kUvbFzt8F7qEOc1Evq47SuR7d
Iw0bM3LVASXwTq2lRc1bFFL2glku6wwkccjwdyjSrQmK4+8LhF396fQGtXuj0Mrs
OzuWhaOoGhan7dpj1D8e4tqugflQy9rv9bcy6Z9PjBY+VauuFdgPr3iFcwPaPbXm
17ir4y7xJJxXlhZl/Bn06KIB2h+nLWDIaundFys5JnMmTiZvWIgSJ6Q3gWtMxgfH
zWZLMw/UtRAmjHhLqvGsMaBTfgKX5ATpMbfGeZeXheVtVaOgGTunXunT56o7oRHB
q4XWZqbydsYyHBUhgSzhBr03i67wbotxtebqg9VZ0UD8XM4iM8Kor/DleK03oUqD
DsltKF66NAGNeOcV3TNzJuXHyF6S/vZdO7JdFHY29+pdljoTj5GB88+W9CbhwQRe
WiKVpq7sAe/bh0wtqrD+QCByjSNSVU62kVgRhfqms47804j/vNqNvOKaC5UWTd0I
2LG4jfSbeg==
=hmxJ
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20191212' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- stable fix for the bi_size overflow. Not a corruption issue, but a
case wher we could merge but disallowed (Andreas)
- NVMe pull request via Keith, with various fixes.
- MD pull request from Song.
- Merge window regression fix for the rq passthrough stats (Logan)
- Remove unused blkcg_drain_queue() function (Guoqing)
* tag 'for-linus-20191212' of git://git.kernel.dk/linux-block:
blk-cgroup: remove blkcg_drain_queue
block: fix NULL pointer dereference in account statistics with IDE
md: make sure desc_nr less than MD_SB_DISKS
md: raid1: check rdev before reference in raid1_sync_request func
raid5: need to set STRIPE_HANDLE for batch head
block: fix "check bi_size overflow before merge"
nvme/pci: Fix read queue count
nvme/pci Limit write queue sizes to possible cpus
nvme/pci: Fix write and poll queue types
nvme/pci: Remove last_cq_head
nvme: Namepace identification descriptor list is optional
nvme-fc: fix double-free scenarios on hw queues
nvme: else following return is not needed
nvme: add error message on mismatching controller ids
nvme_fc: add module to ops template to allow module references
nvmet-loop: Avoid preallocating big SGL for data
nvme-fc: Avoid preallocating big SGL for data
nvme-rdma: Avoid preallocating big SGL for data
Since blk_drain_queue had already been removed, so this function
is not needed anymore.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The IDE driver creates some passthru requests which never get
submitted to the block layer in such a way that blk_account_io_start()
gets called. However, the driver still calls __blk_mq_end_request() in
ide_end_rq() which will call blk_account_io_completion() which tries
to dereferences req->part which is never set. See ide_prep_sense() for
an example of where these requests come from.
To fix this, blk_account_io_completion() and blk_account_io_done()
should do nothing if req->part is not set.
The back trace of this bug is:
BUG: kernel NULL pointer dereference, address: 000002ac
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
*pde = 00000000
Oops: 0002 [#1]
CPU: 0 PID: 237 Comm: kworker/0:1H Not tainted
5.4.0-rc2-00011-g48d9b0d43105e #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1
04/01/2014
Workqueue: kblockd drive_rq_insert_work
EIP: blk_account_io_completion+0x7a/0xf0
Code: 89 54 24 08 31 d2 89 4c 24 04 31 c9 c7 04 24 02 00 00 00 c1 ee
09 e8 f5 21 a6 ff e8 70 5c a7 ff 8b 53 60 8d 04 bd 00 00 00 00 <01> b4
02 ac 02 00 00 8b 9a 88 02 00 00 85 db 74 11 85 d2 74 51 8b
EAX: 00000000 EBX: f5b80000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: f3031e70 ESP: f3031e54
DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010046
CR0: 80050033 CR2: 000002ac CR3: 03c25000 CR4: 000406d0
Call Trace:
<IRQ>
blk_update_request+0x85/0x420
ide_end_rq+0x38/0xa0
ide_complete_rq+0x3d/0x70
cdrom_newpc_intr+0x258/0xba0
ide_intr+0x135/0x250
__handle_irq_event_percpu+0x3e/0x250
handle_irq_event_percpu+0x1f/0x50
handle_irq_event+0x32/0x60
handle_level_irq+0x6c/0x110
handle_irq+0x72/0xa0
</IRQ>
do_IRQ+0x45/0xad
common_interrupt+0x115/0x11c
Fixes: 48d9b0d431 ("block: account statistics for passthrough requests")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This partially reverts commit e3a5d8e386.
Commit e3a5d8e386 ("check bi_size overflow before merge") adds a bio_full
check to __bio_try_merge_page. This will cause __bio_try_merge_page to fail
when the last bi_io_vec has been reached. Instead, what we want here is only
the bi_size overflow check.
Fixes: e3a5d8e386 ("block: check bi_size overflow before merge")
Cc: stable@vger.kernel.org # v5.4+
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except
at places where these are defined. Later patches will remove the unused
definition of FIELD_SIZEOF().
This patch is generated using following script:
EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h"
git grep -l -e "\bFIELD_SIZEOF\b" | while read file;
do
if [[ "$file" =~ $EXCLUDE_FILES ]]; then
continue
fi
sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file;
done
Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com>
Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com
Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David Miller <davem@davemloft.net> # for net
7c20f11680 ("bio-integrity: stop abusing bi_end_io") moves
bio_integrity_free from bio_uninit() to bio_integrity_verify_fn()
and bio_endio(). This way looks wrong because bio may be freed
without calling bio_endio(), for example, blk_rq_unprep_clone() is
called from dm_mq_queue_rq() when the underlying queue of dm-mpath
is busy.
So memory leak of bio integrity data is caused by commit 7c20f11680.
Fixes this issue by re-adding bio_integrity_free() to bio_uninit().
Fixes: 7c20f11680 ("bio-integrity: stop abusing bi_end_io")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by Justin Tee <justin.tee@broadcom.com>
Add commit log, and simplify/fix the original patch wroten by Justin.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio->bi_blkg will be NULL when the issue of the request
has bypassed the block layer as shown in the following oops:
Internal error: Oops: 96000005 [#1] SMP
CPU: 17 PID: 2996 Comm: scsi_id Not tainted 5.4.0 #4
Call trace:
percpu_counter_add_batch+0x38/0x4c8
bfqg_stats_update_legacy_io+0x9c/0x280
bfq_insert_requests+0xbac/0x2190
blk_mq_sched_insert_request+0x288/0x670
blk_execute_rq_nowait+0x140/0x178
blk_execute_rq+0x8c/0x140
sg_io+0x604/0x9c0
scsi_cmd_ioctl+0xe38/0x10a8
scsi_cmd_blk_ioctl+0xac/0xe8
sd_ioctl+0xe4/0x238
blkdev_ioctl+0x590/0x20e0
block_ioctl+0x60/0x98
do_vfs_ioctl+0xe0/0x1b58
ksys_ioctl+0x80/0xd8
__arm64_sys_ioctl+0x40/0x78
el0_svc_handler+0xc4/0x270
so ensure its validity before using it.
Fixes: fd41e60331 ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The current zone revalidation code has a major problem in that it
doesn't update the zone size and q->nr_zones atomically, leading
to a short window where an out of bounds access to the zone arrays
is possible.
To fix this move the setting of the zone size into the crticial
sections blk_revalidate_disk_zones so that it gets updated together
with the zone bitmaps and q->nr_zones. This also slightly simplifies
the caller as it deducts the zone size from the report_zones.
This change also allows to check for a power of two zone size in generic
code.
Reported-by: Hans Holmberg <hans@owltronix.com>
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
bio based drivers only need to update q->nr_zones. Do that manually
instead of overloading blk_revalidate_disk_zones to keep that function
simpler for the next round of changes that will rely even more on the
request based functionality.
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Allocate the conventional zone bitmap and the sequential zone locking
bitmap only when we find a zone of the respective type. This avoids
wasting memory on the conventional zone bitmap for devices that only
have sequential zones, and will also prepare for other future changes.
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Invert the meaning of seq_zones_bitmap by keeping a bitmap of
conventional zones. This allows not having a bitmap for devices
that do not have conventional zones.
Reviewed-by: Javier González <javier@javigon.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Simplify the arguments to blkdev_nr_zones by passing a gendisk instead
of the block_device and capacity. This also removes the need for
__blkdev_nr_zones as all callers are outside the fast path and can
deal with the additional branch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need support
for time64_t.
In completely unrelated work, I spent time on cleaning up parts of this
file in the past, moving things out into drivers instead.
After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the rest
of it and move it all into drivers.
This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which is
the scsi ioctl handlers. I have patches for those as well, but they need
more testing or possibly a rewrite.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJdsHCdAAoJEJpsee/mABjZtYkP/1JGl3jFv3Iq/5BCdPkaePP1
RtMJRNfURgK3GeuHUui330PvVjI/pLWXU/VXMK2MPTASpJLzYz3uCaZrpVWEMpDZ
+ImzGmgJkITlW1uWU3zOcQhOxTyb1hCZ0Ci+2xn9QAmyOL7prXoXCXDWv3h6iyiF
lwG+nW+HNtyx41YG+9bRfKNoG0ZJ+nkJ70BV6u0acQHXWn7Xuupa9YUmBL87hxAL
6dlJfLTJg6q8QSv/Q6LxslfWk2Ti8OOJZOwtFM5R8Bgl0iUcvshiRCKfv/3t9jXD
dJNvF1uq8z+gracWK49Qsfq5dnZ2ZxHFUo9u0NjbCrxNvWH/sdvhbaUBuJI75seH
VIznCkdxFhrqitJJ8KmxANxG08u+9zSKjSlxG2SmlA4qFx/AoStoHwQXcogJscNb
YIXYKmWBvwPzYu09QFAXdHFPmZvp/3HhMWU6o92lvDhsDwzkSGt3XKhCJea4DCaT
m+oCcoACqSWhMwdbJOEFofSub4bY43s5iaYuKes+c8O261/Dwg6v/pgIVez9mxXm
TBnvCsotq5m8wbwzv99eFqGeJH8zpDHrXxEtRR5KQqMqjLq/OQVaEzmpHZTEuK7n
e/V/PAKo2/V63g4k6GApQXDxnjwT+m0aWToWoeEzPYXS6KmtWC91r4bWtslu3rdl
bN65armTm7bFFR32Avnu
=lgCl
-----END PGP SIGNATURE-----
Merge tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground
Pull removal of most of fs/compat_ioctl.c from Arnd Bergmann:
"As part of the cleanup of some remaining y2038 issues, I came to
fs/compat_ioctl.c, which still has a couple of commands that need
support for time64_t.
In completely unrelated work, I spent time on cleaning up parts of
this file in the past, moving things out into drivers instead.
After Al Viro reviewed an earlier version of this series and did a lot
more of that cleanup, I decided to try to completely eliminate the
rest of it and move it all into drivers.
This series incorporates some of Al's work and many patches of my own,
but in the end stops short of actually removing the last part, which
is the scsi ioctl handlers. I have patches for those as well, but they
need more testing or possibly a rewrite"
* tag 'compat-ioctl-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (42 commits)
scsi: sd: enable compat ioctls for sed-opal
pktcdvd: add compat_ioctl handler
compat_ioctl: move SG_GET_REQUEST_TABLE handling
compat_ioctl: ppp: move simple commands into ppp_generic.c
compat_ioctl: handle PPPIOCGIDLE for 64-bit time_t
compat_ioctl: move PPPIOCSCOMPRESS to ppp_generic
compat_ioctl: unify copy-in of ppp filters
tty: handle compat PPP ioctls
compat_ioctl: move SIOCOUTQ out of compat_ioctl.c
compat_ioctl: handle SIOCOUTQNSD
af_unix: add compat_ioctl support
compat_ioctl: reimplement SG_IO handling
compat_ioctl: move WDIOC handling into wdt drivers
fs: compat_ioctl: move FITRIM emulation into file systems
gfs2: add compat_ioctl support
compat_ioctl: remove unused convert_in_user macro
compat_ioctl: remove last RAID handling code
compat_ioctl: remove /dev/raw ioctl translation
compat_ioctl: remove PCI ioctl translation
compat_ioctl: remove joystick ioctl translation
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3YA5sQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgplFxEACM7CwrWsullPX6b3j62NW6VepU5JQdzwVW
S+bmLpb8Z2I4wzEnaVuWAY5hEhGaS9NFtQLdBG0W0YOzH7sweNmL38dZfCE4+oFj
ZwytpXQQhAQUwkgANJCpNfzDymHduPsTz7RYqRr1plmhna1KC/dnhuMwg8lVOBf5
myWjqcCHxxoQn6KFqcX9/Azz29ZrgzV28lOnZdiw9yoTjraBmS/ymx4woaa3pc2v
UNw0Cgx53vHENJzEL9FNSxc0ENZq/bQhpDolnc2AlPGy9+vPg4afMitJb60KTT7r
HpDcLGkYAIKLrfk8DUmFW8lZhWsxTchXvK2+zwQV7nXMcdUgGN/G3HTIdvWEHFv8
oGbPB8cfdA2vNC9QAybwWEum/S0H/GfYsBVplNCUCdFXE7yj1cbKD5dPfCyIvmPz
BjgMae5vH/KoH+vNdZ8NL5oFz2eFC3rLxa/Ss78pcEoBdiiV3WQHPv9MBmn/OQ/v
CeUAM7omyWpbv3lcByNzIOkeeO3m6Ne28EpEMc2pzLnDPu2btvSyetdO488DE+7O
MNfApZULVX91W7jWnhM5GR+1SJTdEXZnoxnFV+J/j4deog5vUR7Dt1VkujpUILfL
7jMl3erF6C53wNrc465z8iLRp1ZM+aTpwatXXRfucNXeomExKK9zF+/+O1ACckUB
jWDCR9NTcw==
=e5Lx
-----END PGP SIGNATURE-----
Merge tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block
Pull disk revalidation updates from Jens Axboe:
"This continues the work that Jan Kara started to thoroughly cleanup
and consolidate how we handle rescans and revalidations"
* tag 'for-5.5/disk-revalidate-20191122' of git://git.kernel.dk/linux-block:
block: move clearing bd_invalidated into check_disk_size_change
block: remove (__)blkdev_reread_part as an exported API
block: fix bdev_disk_changed for non-partitioned devices
block: move rescan_partitions to fs/block_dev.c
block: merge invalidate_partitions into rescan_partitions
block: refactor rescan_partitions
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3YAiAQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpsJRD/wNfUGWVdIckw7iiFNuuipKBEy0Nd2VLt0B
I+pVW/YjDsG2oxWXWPs5Nxc7ca2A8EzRXcWP0xEjBfOCcBh/9mULi1flkLRoWKcq
v/OuTVif3ATvgJcwNkbMcoi0bYA/VwKi2dWC6ALhDDmZhyMTLeE362oIeOUNNnl6
GM8CGZHaRfmBzcH5t+WnxiS6rBlt5iwFJ35EvZo3GMXGGiLGlryxEXPAwZrf4haA
Z4atNinKcNXhb80LWHo23aK3bpnaumwKP4BPuLEyvnjS4iU8SeYTXy+w5yq1BE+h
HBP5s3no/mPiBAG8b6EZXqOJUGlN596AQfNLu7vCR78tmImZF0jKRFsHEAaKXf+B
1yRgZi7J+gV0qzK/Ufulg43vItk5/sTzEuV9YLfCpKTr14MFcWw908BAqaI5Kk1K
e8uGqnb2KbZOLTW4QdPvpWg3eYtqEoluSoZUQ5elHxqQZ4MSZ1lK78FF1TeaW/pw
sYH+v6rsWoVjEcFSwGoaaOMravzU4MKtavNAZrTJwKZx7qCqkwmi3R1k8WF6KsSV
rTRAzUC1wpTdSOm1MYPMMKM/h5+BJRSJ/RjljOF4fXLnvpD5q0lequCWjrrEzc6c
HPRKIgSBq7S620A19QD8UxwvZJ8bOivESqr0bux29v1Vpf7vJBrRMng8nLUrXfJs
jdma5mK1UA==
=/G9l
-----END PGP SIGNATURE-----
Merge tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block
Pull zoned block device update from Jens Axboe:
"Enhancements and improvements to the zoned device support"
* tag 'for-5.5/zoned-20191122' of git://git.kernel.dk/linux-block:
scsi: sd_zbc: Remove set but not used variable 'buflen'
block: rework zone reporting
scsi: sd_zbc: Cleanup sd_zbc_alloc_report_buffer()
null_blk: Add zone_nr_conv to features
null_blk: clean up report zones
null_blk: clean up the block device operations
block: Remove partition support for zoned block devices
block: Simplify report zones execution
block: cleanup the !zoned case in blk_revalidate_disk_zones
block: Enhance blk_revalidate_disk_zones()
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl3WxrEQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpuH5D/9qQKfIIuQDUNO4Xx+dIHimTDCrfiEOeO9e
CRaMuSj+yMxLDMwfX8RnDmR17H3ZVoiIY1CT24U9ZkA5iDjeAH4xmzkH30US7LR7
/64YVZTxB0OrWppRK8RiIhaJJZDQ6+HPUQsn6PRaLVuFHi2unMoTQnj/ZQKz03QA
Pl8Xx7qBtH1JwYCzQ21f/uryAcNg9eWabRLN2f1uiOXLmvRxOfh6Z/iaezlaZlmL
qeJdcdLjjvOgOPwEOfNjfS6pd+XBz3gdEhn0l+11nHITxWZmVBwsWTKyUQlCmKnl
yuCWDVyx5d6zCnlrLYG0l2Fn2lr9SwAkdkq3YAKV03hA/6s6P9q9bm31VvOf828x
7gmr4YVz68y7H9bM0QAHCvDpjll0aIEUw6XFzSOCDtZ9B6/pppYQWzMU71J05eyF
8DOKv2M2EVNLUjf6u0RDyolnWGU0kIjt5ryWE3OsGcezAVa2wYstgUJTKbrn1YgT
j+4KTpaI+sg8GKDFauvxcSa6gwoRp6jweFNW+7vC090/shXmrGmVLOnQZKRuHho/
O4W8y/1/deM8CCIAETpiNxA8RV5U/EZygrFGDFc7yzTtVDGHY356M/B4Bmm2qkVu
K3WgeZp8Fc0lH0QF6Pp9ZlBkZEpGNCAPVsPkXIsxQXbctftkn3KY//uIubfpFEB1
PpHSicvkww==
=HYYq
-----END PGP SIGNATURE-----
Merge tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
"Due to more granular branches, this one is small and will be followed
with other core branches that add specific features. I meant to just
have a core and drivers branch, but external dependencies we ended up
adding a few more that are also core.
The changes are:
- Fixes and improvements for the zoned device support (Ajay, Damien)
- sed-opal table writing and datastore UID (Revanth)
- blk-cgroup (and bfq) blk-cgroup stat fixes (Tejun)
- Improvements to the block stats tracking (Pavel)
- Fix for overruning sysfs buffer for large number of CPUs (Ming)
- Optimization for small IO (Ming, Christoph)
- Fix typo in RWH lifetime hint (Eugene)
- Dead code removal and documentation (Bart)
- Reduction in memory usage for queue and tag set (Bart)
- Kerneldoc header documentation (André)
- Device/partition revalidation fixes (Jan)
- Stats tracking for flush requests (Konstantin)
- Various other little fixes here and there (et al)"
* tag 'for-5.5/block-20191121' of git://git.kernel.dk/linux-block: (48 commits)
Revert "block: split bio if the only bvec's length is > SZ_4K"
block: add iostat counters for flush requests
block,bfq: Skip tracing hooks if possible
block: sed-opal: Introduce SUM_SET_LIST parameter and append it using 'add_token_u64'
blk-cgroup: cgroup_rstat_updated() shouldn't be called on cgroup1
block: Don't disable interrupts in trigger_softirq()
sbitmap: Delete sbitmap_any_bit_clear()
blk-mq: Delete blk_mq_has_free_tags() and blk_mq_can_queue()
block: split bio if the only bvec's length is > SZ_4K
block: still try to split bio if the bvec crosses pages
blk-cgroup: separate out blkg_rwstat under CONFIG_BLK_CGROUP_RWSTAT
blk-cgroup: reimplement basic IO stats using cgroup rstat
blk-cgroup: remove now unused blkg_print_stat_{bytes|ios}_recursive()
blk-throtl: stop using blkg->stat_bytes and ->stat_ios
bfq-iosched: stop using blkg->stat_bytes and ->stat_ios
bfq-iosched: relocate bfqg_*rwstat*() helpers
block: add zone open, close and finish ioctl support
block: add zone open, close and finish operations
block: Simplify REQ_OP_ZONE_RESET_ALL handling
block: Remove REQ_OP_ZONE_RESET plugging
...
We really don't need this, as the slow path will do the right thing
anyway.
This reverts commit 6952a7f844.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Requests that triggers flushing volatile writeback cache to disk (barriers)
have significant effect to overall performance.
Block layer has sophisticated engine for combining several flush requests
into one. But there is no statistics for actual flushes executed by disk.
Requests which trigger flushes usually are barriers - zero-size writes.
This patch adds two iostat counters into /sys/class/block/$dev/stat and
/proc/diskstats - count of completed flush requests and their total time.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In function 'activate_lsp', rather than hard-coding the short atom
header(0x83), we need to let the function 'add_short_atom_header' append
the header based on the parameter being appended.
The parameter has been defined in Section 3.1.2.1 of
https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage-Opal_Feature_Set_Single_User_Mode_v1-00_r1-00-Final.pdf
Reviewed-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Revanth Rajashekar <revanth.rajashekar@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
trigger_softirq() is always invoked as a SMP-function call which is
always invoked with disables interrupts.
Don't disable interrupt in trigger_softirq() because interrupts are
already disabled.
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>