Commit Graph

66312 Commits

Author SHA1 Message Date
Linus Torvalds
472e5b056f pipe: remove pipe_wait() and fix wakeup race with splice
The pipe splice code still used the old model of waiting for pipe IO by
using a non-specific "pipe_wait()" that waited for any pipe event to
happen, which depended on all pipe IO being entirely serialized by the
pipe lock.  So by checking the state you were waiting for, and then
adding yourself to the wait queue before dropping the lock, you were
guaranteed to see all the wakeups.

Strictly speaking, the actual wakeups were not done under the lock, but
the pipe_wait() model still worked, because since the waiter held the
lock when checking whether it should sleep, it would always see the
current state, and the wakeup was always done after updating the state.

However, commit 0ddad21d3e ("pipe: use exclusive waits when reading or
writing") split the single wait-queue into two, and in the process also
made the "wait for event" code wait for _two_ wait queues, and that then
showed a race with the wakers that were not serialized by the pipe lock.

It's only splice that used that "pipe_wait()" model, so the problem
wasn't obvious, but Josef Bacik reports:

 "I hit a hang with fstest btrfs/187, which does a btrfs send into
  /dev/null. This works by creating a pipe, the write side is given to
  the kernel to write into, and the read side is handed to a thread that
  splices into a file, in this case /dev/null.

  The box that was hung had the write side stuck here [pipe_write] and
  the read side stuck here [splice_from_pipe_next -> pipe_wait].

  [ more details about pipe_wait() scenario ]

  The problem is we're doing the prepare_to_wait, which sets our state
  each time, however we can be woken up either with reads or writes. In
  the case above we race with the WRITER waking us up, and re-set our
  state to INTERRUPTIBLE, and thus never break out of schedule"

Josef had a patch that avoided the issue in pipe_wait() by just making
it set the state only once, but the deeper problem is that pipe_wait()
depends on a level of synchonization by the pipe mutex that it really
shouldn't.  And the whole "wait for any pipe state change" model really
isn't very good to begin with.

So rather than trying to work around things in pipe_wait(), remove that
legacy model of "wait for arbitrary pipe event" entirely, and actually
create functions that wait for the pipe actually being readable or
writable, and can do so without depending on the pipe lock serializing
everything.

Fixes: 0ddad21d3e ("pipe: use exclusive waits when reading or writing")
Link: https://lore.kernel.org/linux-fsdevel/bfa88b5ad6f069b2b679316b9e495a970130416c.1601567868.git.josef@toxicpanda.com/
Reported-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-01 19:14:36 -07:00
Linus Torvalds
90fb702791 autofs: use __kernel_write() for the autofs pipe writing
autofs got broken in some configurations by commit 13c164b1a1
("autofs: switch to kernel_write") because there is now an extra LSM
permission check done by security_file_permission() in rw_verify_area().

autofs is one if the few places that really does want the much more
limited __kernel_write(), because the write is an internal kernel one
that shouldn't do any user permission checks (it also doesn't need the
file_start_write/file_end_write logic, since it's just a pipe).

There are a couple of other cases like that - accounting, core dumping,
and splice - but autofs stands out because it can be built as a module.

As a result, we need to export this internal __kernel_write() function
again.

We really don't want any other module to use this, but we don't have a
"EXPORT_SYMBOL_FOR_AUTOFS_ONLY()".  But we can mark it GPL-only to at
least approximate that "internal use only" for licensing.

While in this area, make autofs pass in NULL for the file position
pointer, since it's always a pipe, and we now use a NULL file pointer
for streaming file descriptors (see file_ppos() and commit 438ab720c6:
"vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files")

This effectively reverts commits 9db9775224 ("fs: unexport
__kernel_write") and 13c164b1a1 ("autofs: switch to kernel_write").

Fixes: 13c164b1a1 ("autofs: switch to kernel_write")
Reported-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-29 17:18:34 -07:00
Linus Torvalds
fb0155a09b NFS client bugfixes for Linux 5.9
Highlights include:
 
 Bugfixes:
 - NFSv4.2: copy_file_range needs to invalidate caches on success
 - NFSv4.2: Fix security label length not being reset
 - pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read
 - pNFS/flexfiles: Fix signed/unsigned type issues with mirror indices
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl9yHBYACgkQZwvnipYK
 APLKCA//Sppmzm+kFDmZ6iWplwdoIq7rnIMG7eKKGD754dDvOtYNIw9D9yOIY5G6
 eVdvQ10m6vA8Dp8AxaWK9qacMXljmOX8szz+Bf1NcIe2F6X/waO3zMoud8Rd9Ja4
 PigAbAW6Gs0gohL3wg+jh5N5JlaDcZ0Dri3QWdqGaHjhrKV9MW9h0BpBCx9YCPkL
 FFgk+I+524rGQnkHvCWbclww4428e+MSYdeJE+c4wrIx/HCz3iJ60AFA0SIAw7FV
 6qMtxN4/kqfdIrA074xcreMdkucxe3lNl7ujT1T6dum2OwERq+WyzkwoirqNguJM
 X71CXU9IE8rw72ATWMoba961i4HITp05ZbVg7yXZrrRkAEljyHhr67R/1RRSlxQm
 ZrPOICrCoXKHRFTbNL7Sb+xeTGbuZQkbcwGXnUYdTIO3JQ6PRIEFb/y8yuuT+EPG
 KWk2vM+QM9036qfBWjbAZMOpwDB4oiVkBgzNM8FGcebiV1FANQ1by7oMaQsH1NLm
 WY0M0KFY2wdv3ovGT7oUOEbtoxD993HuuLdIWxTRHFjRPgg8WKTFnf4BIeZtMjY8
 oRvN83hEjWszuTEuuEukUdsqLTftv7rNhxrotoh9WfeSXvJDB6PF0y55UmZ6WuKE
 wRQQLxC9Om+E3HidxgOolqKxD6d4OOY3XJWzH3As7sJEgQyE/5o=
 =cNi/
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - NFSv4.2: copy_file_range needs to invalidate caches on success

   - NFSv4.2: Fix security label length not being reset

   - pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly
     on read

   - pNFS/flexfiles: Fix signed/unsigned type issues with mirror
     indices"

* tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pNFS/flexfiles: Be consistent about mirror index types
  pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read
  NFSv4.2: fix client's attribute cache management for copy_file_range
  nfs: Fix security label length not being reset
2020-09-28 11:05:56 -07:00
Linus Torvalds
692495baa3 io_uring-5.9-2020-09-25
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9upV4QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpuPrD/9K1UQLv38K2nPYclLymOi+GIsukpjgwzdY
 SM38GNXU5vYkFhylH/bXfckNQ/gja0/whNpcr/UVCgTWleMnss9UiZaCgysyuIOL
 vnBxT4yDZIxtkOwF/790NiwV2FrsmrLFdNZU4LkmfbmmrAlNtjOElKyJsOyNNMzJ
 UMzHH2Z1vvUwKz+Yq4fPyZCJbpNHN6ABwkSXY/Nz8oWsKfw728fZztLsP57gOtkl
 yYVFO2z1n7VaWp5ZzVFYG51DFuMCIDXgN6mMlaKfnQ6auQZxjR+jg69HSRKLjIx7
 ZEG1gl/DzwH1+751P7HnuI3U7BtBYolyErHW4j4a6Ko4XX8PPhG9ODKOmsEMPrEq
 gCUGcGgWUsEyz+pyullTEt7ea/oLGJ5N86qtNOdviXETZZTghm47QlzxFWr1/GWy
 wH++ctBf/Ekk0dbCBF6mJiqDl/PrVSDSClTVhsGJESEmk4BOoC9zd9zT/EfsiR9m
 vA8wLE2g1/5oU+irQ0Dlc/EENVWISiigOFFvTPJJjma9iGXAW3kV2/aYW6DKZSwM
 w/va7zTlzt89O+L0AT+Rg8btaiTiaZcs3op1AFa1z5Gut3b5YhWL/e95wlaOI6Nv
 Tudm4GX06BaN1QdUDV9g0Pr5iNOaCvuOArjNOU3j7ySusJxiJ8GdA3WFqJ/XUlIV
 pne8hC/+7A==
 =mBw7
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Two fixes for regressions in this cycle, and one that goes to 5.8
  stable:

   - fix leak of getname() retrieved filename

   - remove plug->nowait assignment, fixing a regression with btrfs

   - fix for async buffered retry"

* tag 'io_uring-5.9-2020-09-25' of git://git.kernel.dk/linux-block:
  io_uring: ensure async buffered read-retry is setup properly
  io_uring: don't unconditionally set plug->nowait = true
  io_uring: ensure open/openat2 name is cleaned on cancelation
2020-09-26 11:13:51 -07:00
Jens Axboe
f38c7e3abf io_uring: ensure async buffered read-retry is setup properly
A previous commit for fixing up short reads botched the async retry
path, so we ended up going to worker threads more often than we should.
Fix this up, so retries work the way they originally were intended to.

Fixes: 227c0c9673 ("io_uring: internally retry short reads")
Reported-by: Hao_Xu <haoxu@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 15:39:13 -06:00
Jens Axboe
62c774ed48 io_uring: don't unconditionally set plug->nowait = true
This causes all the bios to be submitted with REQ_NOWAIT, which can be
problematic on either btrfs or on file systems that otherwise use a mix
of block devices where only some of them support it.

For now, just remove the setting of plug->nowait = true.

Reported-by: Dan Melnic <dmm@fb.com>
Reported-by: Brian Foster <bfoster@redhat.com>
Fixes: b63534c41e ("io_uring: re-issue block requests that failed because of resources")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 09:01:53 -06:00
Jens Axboe
f3cd485050 io_uring: ensure open/openat2 name is cleaned on cancelation
If we cancel these requests, we'll leak the memory associated with the
filename. Add them to the table of ops that need cleaning, if
REQ_F_NEED_CLEANUP is set.

Cc: stable@vger.kernel.org
Fixes: e62753e4e2 ("io_uring: call statx directly")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 07:41:46 -06:00
Linus Torvalds
bffac4b543 for-5.9-rc6-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl9q8PUACgkQxWXV+ddt
 WDsHZg//YF3Rfeo7/zaRsfUPvNoKDcM69TW+HROJXu4+rYlOukyuh5T+wboRU1Ft
 7ymiR18idPYbtOczmH1Pqw+3wyOr39WafcvAnndoUguXJHsUrriBNqkthQICt0CG
 hUUiofedaB+j+ti7AYGhF/tkqjd8LkCj8SGEz4cSUFCheIHR+ajFwFmx1Sw6NGJV
 h9SdKfbBpIqIpoExFhprNFlxdaKN9rlhYY+zXZYeCBdU6r89CkuLqxZ79GzaU0N7
 PG7FxuuJXvyHhta2a6p8hnEp7perOG22OTXJhzXd5JXiNCfZ/w4SfhH/aPO/3t5V
 x42hO+FvloVSLS3woZqkBsCgCIe0a3QOT0YxZiM+1cwSgg8mVw4UBEB3PIgkfOVT
 LawMbcgSh1evsSazru8gujm4f8RVxpSxxWfhhRwjXtyB8K89e22yBa9Lwfj04SH7
 O5O7VrLDDnHsQWinsEf4Rl6byA13jUCgI5eUxZ5B7Au0Pm9uMexDh3lvgE0W0ucY
 UvD8qAetu2NNZD68gZp597uHPrwu+Lr+VumIh4wF6doeShlIkbf/d+ntOgW9ey1S
 WFSh7sUdKg5pVf6KJQ4yc3aBA6un5lv9LnvPJOwc9HyMUj/cYuxywxWf1YMr5umv
 7/6CkufYjTAmEERQeqE1I6UIgUiWkS9nIisB8BLbYkrMR9Wi1bs=
 =tFUE
 -----END PGP SIGNATURE-----

Merge tag 'for-5.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "syzkaller started to hit us with reports, here's a fix for one type
  (stack overflow when printing checksums on read error).

  The other patch is a fix for sysfs object, we have a test for that and
  it leads to a crash."

* tag 'for-5.9-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix put of uninitialized kobject after seed device delete
  btrfs: fix overflow when copying corrupt csums for a message
2020-09-23 14:32:23 -07:00
Linus Torvalds
805c6d3c19 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "No common topic, just assorted fixes"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fuse: fix the ->direct_IO() treatment of iov_iter
  fs: fix cast in fsparam_u32hex() macro
  vboxsf: Fix the check for the old binary mount-arguments struct
2020-09-22 15:08:41 -07:00
Linus Torvalds
0baca07006 io_uring-5.9-2020-09-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9qLpQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpk/qD/0dj9STzEMkUsbl2XA5oifF2NVn6VHMidJ3
 Ukdhoy4ihh2UFBFO2VZv2UNZ7o4Zt53TA3ha+fB0EL7I23g86XTOItTWd+JHOGpI
 M11JejYTxcSUzPVrPfd/2PJ/Tqx+ld4ojTxH8noS4hx7FgueSuRR80UU5gfLGAmr
 e7A7vHD8tr9ZoqNcyVVCYa0/80gUbxh1wYOMvqaE6dSPITe96keGKmmk8hRA8kQo
 SBfbZeEqf2oErlM0dTVOd34rZbQQyRuMpDmLuc/g6RNMFVPyBqEvQmGwqOtWNe4q
 RFS9/imQA1Wi1OD15NoDx0C7BGovmT53xfXpnqI3lXzywxSDGhGVQd0E8Udp6zha
 xszrFlQEqS4OFZrHK6B+tnJBFFBZ8jN0K3ZlHpO8QH83OGvyr2k/RokoHFWMTSYh
 +5pHRd+6p7o8traQ6h0MJXmacIxZ0hQdJPuawRjAnziBgRhMV2FMLAXgYHtWl0AD
 wUiBWUEIV9PP0phu78X2TxvB9L7CPjuv7orJ8Q5dBSkQc7i33ESYMe8Mix85CFm+
 SQcazoQE7VLL175TN/FdDDKkBeyAsob9TjeEazb04Vywy0vHW+MGrSOescCBDLF7
 RRDRE0E12Ur9BTVTBi/MJsXT2xtufxN2YU368ZX78RYwgI4r9lx4LZZDte3h9/gs
 xEPXk5vuzg==
 =ImBG
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-09-22' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "A few fixes - most of them regression fixes from this cycle, but also
  a few stable heading fixes, and a build fix for the included demo tool
  since some systems now actually have gettid() available"

* tag 'io_uring-5.9-2020-09-22' of git://git.kernel.dk/linux-block:
  io_uring: fix openat/openat2 unified prep handling
  io_uring: mark statx/files_update/epoll_ctl as non-SQPOLL
  tools/io_uring: fix compile breakage
  io_uring: don't use retry based buffered reads for non-async bdev
  io_uring: don't re-setup vecs/iter in io_resumit_prep() is already there
  io_uring: don't run task work on an exiting task
  io_uring: drop 'ctx' ref on task work cancelation
  io_uring: grab any needed state during defer prep
2020-09-22 14:36:50 -07:00
Anand Jain
b5ddcffa37 btrfs: fix put of uninitialized kobject after seed device delete
The following test case leads to NULL kobject free error:

  mount seed /mnt
  add sprout to /mnt
  umount /mnt
  mount sprout to /mnt
  delete seed

  kobject: '(null)' (00000000dd2b87e4): is not initialized, yet kobject_put() is being called.
  WARNING: CPU: 1 PID: 15784 at lib/kobject.c:736 kobject_put+0x80/0x350
  RIP: 0010:kobject_put+0x80/0x350
  ::
  Call Trace:
  btrfs_sysfs_remove_devices_dir+0x6e/0x160 [btrfs]
  btrfs_rm_device.cold+0xa8/0x298 [btrfs]
  btrfs_ioctl+0x206c/0x22a0 [btrfs]
  ksys_ioctl+0xe2/0x140
  __x64_sys_ioctl+0x1e/0x29
  do_syscall_64+0x96/0x150
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f4047c6288b
  ::

This is because, at the end of the seed device-delete, we try to remove
the seed's devid sysfs entry. But for the seed devices under the sprout
fs, we don't initialize the devid kobject yet. So add a kobject state
check, which takes care of the bug.

Fixes: 668e48af7a ("btrfs: sysfs, add devid/dev_state kobject and device attributes")
CC: stable@vger.kernel.org # 5.6+
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-22 15:57:52 +02:00
Jens Axboe
4eb8dded6b io_uring: fix openat/openat2 unified prep handling
A previous commit unified how we handle prep for these two functions,
but this means that we check the allowed context (SQPOLL, specifically)
later than we should. Move the ring type checking into the two parent
functions, instead of doing it after we've done some setup work.

Fixes: ec65fea5a8 ("io_uring: deduplicate io_openat{,2}_prep()")
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21 07:51:03 -06:00
Jens Axboe
6ca56f8459 io_uring: mark statx/files_update/epoll_ctl as non-SQPOLL
These will naturally fail when attempted through SQPOLL, but either
with -EFAULT or -EBADF. Make it explicit that these are not workable
through SQPOLL and return -EINVAL, just like other ops that need to
use ->files.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21 07:51:00 -06:00
Jens Axboe
f5cac8b156 io_uring: don't use retry based buffered reads for non-async bdev
Some block devices, like dm, bubble back -EAGAIN through the completion
handler. We check for this in io_read(), but don't honor it for when
we have copied the iov. Return -EAGAIN for this case before retrying,
to force punt to io-wq.

Fixes: bcf5a06304 ("io_uring: support true async buffered reads, if file provides it")
Reported-by: Zorro Lang <zlang@redhat.com>
Tested-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21 07:50:56 -06:00
Jens Axboe
8f3d749685 io_uring: don't re-setup vecs/iter in io_resumit_prep() is already there
If we already have mapped the necessary data for retry, then don't set
it up again. It's a pointless operation, and we leak the iovec if it's
a large (non-stack) vec.

Fixes: b63534c41e ("io_uring: re-issue block requests that failed because of resources")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-21 07:50:54 -06:00
Johannes Thumshirn
35be8851d1 btrfs: fix overflow when copying corrupt csums for a message
Syzkaller reported a buffer overflow in btree_readpage_end_io_hook()
when loop mounting a crafted image:

  detected buffer overflow in memcpy
  ------------[ cut here ]------------
  kernel BUG at lib/string.c:1129!
  invalid opcode: 0000 [#1] PREEMPT SMP KASAN
  CPU: 1 PID: 26 Comm: kworker/u4:2 Not tainted 5.9.0-rc4-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Workqueue: btrfs-endio-meta btrfs_work_helper
  RIP: 0010:fortify_panic+0xf/0x20 lib/string.c:1129
  RSP: 0018:ffffc90000e27980 EFLAGS: 00010286
  RAX: 0000000000000022 RBX: ffff8880a80dca64 RCX: 0000000000000000
  RDX: ffff8880a90860c0 RSI: ffffffff815dba07 RDI: fffff520001c4f22
  RBP: ffff8880a80dca00 R08: 0000000000000022 R09: ffff8880ae7318e7
  R10: 0000000000000000 R11: 0000000000077578 R12: 00000000ffffff6e
  R13: 0000000000000008 R14: ffffc90000e27a40 R15: 1ffff920001c4f3c
  FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000557335f440d0 CR3: 000000009647d000 CR4: 00000000001506e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   memcpy include/linux/string.h:405 [inline]
   btree_readpage_end_io_hook.cold+0x206/0x221 fs/btrfs/disk-io.c:642
   end_bio_extent_readpage+0x4de/0x10c0 fs/btrfs/extent_io.c:2854
   bio_endio+0x3cf/0x7f0 block/bio.c:1449
   end_workqueue_fn+0x114/0x170 fs/btrfs/disk-io.c:1695
   btrfs_work_helper+0x221/0xe20 fs/btrfs/async-thread.c:318
   process_one_work+0x94c/0x1670 kernel/workqueue.c:2269
   worker_thread+0x64c/0x1120 kernel/workqueue.c:2415
   kthread+0x3b5/0x4a0 kernel/kthread.c:292
   ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
  Modules linked in:
  ---[ end trace b68924293169feef ]---
  RIP: 0010:fortify_panic+0xf/0x20 lib/string.c:1129
  RSP: 0018:ffffc90000e27980 EFLAGS: 00010286
  RAX: 0000000000000022 RBX: ffff8880a80dca64 RCX: 0000000000000000
  RDX: ffff8880a90860c0 RSI: ffffffff815dba07 RDI: fffff520001c4f22
  RBP: ffff8880a80dca00 R08: 0000000000000022 R09: ffff8880ae7318e7
  R10: 0000000000000000 R11: 0000000000077578 R12: 00000000ffffff6e
  R13: 0000000000000008 R14: ffffc90000e27a40 R15: 1ffff920001c4f3c
  FS:  0000000000000000(0000) GS:ffff8880ae700000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f95b7c4d008 CR3: 000000009647d000 CR4: 00000000001506e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

The overflow happens, because in btree_readpage_end_io_hook() we assume
that we have found a 4 byte checksum instead of the real possible 32
bytes we have for the checksums.

With the fix applied:

[   35.726623] BTRFS: device fsid 815caf9a-dc43-4d2a-ac54-764b8333d765 devid 1 transid 5 /dev/loop0 scanned by syz-repro (215)
[   35.738994] BTRFS info (device loop0): disk space caching is enabled
[   35.738998] BTRFS info (device loop0): has skinny extents
[   35.743337] BTRFS warning (device loop0): loop0 checksum verify failed on 1052672 wanted 0xf9c035fc8d239a54 found 0x67a25c14b7eabcf9 level 0
[   35.743420] BTRFS error (device loop0): failed to read chunk root
[   35.745899] BTRFS error (device loop0): open_ctree failed

Reported-by: syzbot+e864a35d361e1d4e29a5@syzkaller.appspotmail.com
Fixes: d5178578bc ("btrfs: directly call into crypto framework for checksumming")
CC: stable@vger.kernel.org # 5.4+
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-21 12:39:21 +02:00
Tobias Klauser
9ca48e20ec fs/fs-writeback.c: adjust dirtytime_interval_handler definition to match prototype
Commit 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
changed ctl_table.proc_handler to take a kernel pointer.  Adjust the
definition of dirtytime_interval_handler to match its prototype in
linux/writeback.h which fixes the following sparse error/warning:

fs/fs-writeback.c:2189:50: warning: incorrect type in argument 3 (different address spaces)
fs/fs-writeback.c:2189:50:    expected void *
fs/fs-writeback.c:2189:50:    got void [noderef] __user *buffer
fs/fs-writeback.c:2184:5: error: symbol 'dirtytime_interval_handler' redeclared with different type (incompatible argument 3 (different address spaces)):
fs/fs-writeback.c:2184:5:    int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... )
fs/fs-writeback.c: note: in included file:
./include/linux/writeback.h:374:5: note: previously declared as:
./include/linux/writeback.h:374:5:    int extern [addressable] [signed] [toplevel] dirtytime_interval_handler( ... )

Fixes: 32927393dc ("sysctl: pass kernel pointers to ->proc_handler")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Link: https://lkml.kernel.org/r/20200907093140.13434-1-tklauser@distanz.ch
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-19 13:13:39 -07:00
Trond Myklebust
b9df46d08a pNFS/flexfiles: Be consistent about mirror index types
A mirror index is always of type u32.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-18 09:25:33 -04:00
Trond Myklebust
ee15c7b53e pNFS/flexfiles: Ensure we initialise the mirror bsizes correctly on read
While it is true that reading from an unmirrored source always uses
index 0, that is no longer true for mirrored sources when we fail over.

Fixes: 563c53e73b ("NFS: Fix flexfiles read failover")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-18 09:21:10 -04:00
Al Viro
933a3752ba fuse: fix the ->direct_IO() treatment of iov_iter
the callers rely upon having any iov_iter_truncate() done inside
->direct_IO() countered by iov_iter_reexpand().

Reported-by: Qian Cai <cai@redhat.com>
Tested-by: Qian Cai <cai@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-09-17 17:26:56 -04:00
Olga Kornievskaia
16abd2a0c1 NFSv4.2: fix client's attribute cache management for copy_file_range
After client is done with the COPY operation, it needs to invalidate
its pagecache (as it did no reading or writing of the data locally)
and it needs to invalidate it's attributes just like it would have
for a read on the source file and write on the destination file.

Once the linux server started giving out read delegations to
read+write opens, the destination file of the copy_file range
started having delegations and not doing syncup on close of the
file leading to xfstest failures for generic/430,431,432,433,565.

v2: changing cache_validity needs to be protected by the i_lock.

Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Fixes: 2e72448b07 ("NFS: Add COPY nfs operation")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-16 12:25:14 -04:00
Jeffrey Mitchell
d33030e2ee nfs: Fix security label length not being reset
nfs_readdir_page_filler() iterates over entries in a directory, reusing
the same security label buffer, but does not reset the buffer's length.
This causes decode_attr_security_label() to return -ERANGE if an entry's
security label is longer than the previous one's. This error, in
nfs4_decode_dirent(), only gets passed up as -EAGAIN, which causes another
failed attempt to copy into the buffer. The second error is ignored and
the remaining entries do not show up in ls, specifically the getdents64()
syscall.

Reproduce by creating multiple files in NFS and giving one of the later
files a longer security label. ls will not see that file nor any that are
added afterwards, though they will exist on the backend.

In nfs_readdir_page_filler(), reset security label buffer length before
every reuse

Signed-off-by: Jeffrey Mitchell <jeffrey.mitchell@starlab.io>
Fixes: b4487b9354 ("nfs: Fix getxattr kernel panic and memory overflow")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-16 12:25:14 -04:00
Linus Torvalds
fc4f28bb3d for-5.9-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl9fju4ACgkQxWXV+ddt
 WDukWw//d3D33s1SnHI2ooub6VllM5xC/VCB5hfpHdeuLfhIR4iFrXlKCwFNuyuF
 A1CgKu7W4eMkQ9E6dKIvYhRQj9ELcHWxS1LRUkuXVGojvPVQAbGhUibjcmTy6xoi
 t4KkvdI0t/puEwRTrF2ooInWzdbiJvzCuCHLWkX4wK0IAkaVt+IK2ZJuSTCZhUEv
 Gx5CSLqP6zq/XD/03yWQvILE3MoM9BOc28EQ345dhEu+SWWpdhzy4W825A8xUe3Q
 bThcK8vJgeLUo/KVtUJNLoFTl+2BpxzaiHWqh+3plGjQ1qtZyDgFhZOoP0JEJb/f
 qEnyN2XbY26p6nj2J+34Euz5OX65ot6bePx6v/h5jY+UjNa8mpZD6d4Y+OII0XW0
 kCLPyaNc8P6IkmAYzSLQPmrjCoRgR/2we8HRBylTWaeEcvpduZs/5YlJNO9q59Et
 voclNx6kk4AdFjSxXHwERlSA3v4+pkq41bjnbpv7a9FGRBUJGVfJFCNukuhCx5Oi
 nUKg7jBJEZNhE2DOA0MKnrZTBnufkLCiH+f/7pn+ypPdw1dna/ix9Suwbz4+UElx
 TtwGLF99CQCX09ieJBVWB90a3xgQBIZLaqu1HtZ1u7dzKkutl+uKMcnfEEtLSfJy
 qMLHHsRQj/iwytQHSdWboL6xQCUSkSefPTmmNfstaKSxeq6w0Q0=
 =pdp0
 -----END PGP SIGNATURE-----

Merge tag 'for-5.9-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fix from David Sterba:
 "One of the recent lockdep fixes introduced a bug that breaks the
  search ioctl, which is used by some applications (bees, compsize). The
  patch made it to stable trees so we need this fixup to make it work
  again"

* tag 'for-5.9-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix wrong address when faulting in pages in the search ioctl
2020-09-14 15:41:58 -07:00
Jens Axboe
6200b0ae4e io_uring: don't run task work on an exiting task
This isn't safe, and isn't needed either. We are guaranteed that any
work we queue is on a live task (and will be run), or it goes to
our backup io-wq threads if the task is exiting.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-14 10:22:15 -06:00
Jens Axboe
87ceb6a6b8 io_uring: drop 'ctx' ref on task work cancelation
If task_work ends up being marked for cancelation, we go through a
cancelation helper instead of the queue path. In converting task_work to
always hold a ctx reference, this path was missed. Make sure that
io_req_task_cancel() puts the reference that is being held against the
ctx.

Fixes: 6d816e088c ("io_uring: hold 'ctx' reference around task_work queue + execute")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-14 10:22:14 -06:00
Filipe Manana
1c78544eaa btrfs: fix wrong address when faulting in pages in the search ioctl
When faulting in the pages for the user supplied buffer for the search
ioctl, we are passing only the base address of the buffer to the function
fault_in_pages_writeable(). This means that after the first iteration of
the while loop that searches for leaves, when we have a non-zero offset,
stored in 'sk_offset', we try to fault in a wrong page range.

So fix this by adding the offset in 'sk_offset' to the base address of the
user supplied buffer when calling fault_in_pages_writeable().

Several users have reported that the applications compsize and bees have
started to operate incorrectly since commit a48b73eca4 ("btrfs: fix
potential deadlock in the search ioctl") was added to stable trees, and
these applications make heavy use of the search ioctls. This fixes their
issues.

Link: https://lore.kernel.org/linux-btrfs/632b888d-a3c3-b085-cdf5-f9bb61017d92@lechevalier.se/
Link: https://github.com/kilobyte/compsize/issues/34
Fixes: a48b73eca4 ("btrfs: fix potential deadlock in the search ioctl")
CC: stable@vger.kernel.org # 4.4+
Tested-by: A L <mail@lechevalier.se>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-14 17:27:16 +02:00
Jens Axboe
202700e18a io_uring: grab any needed state during defer prep
Always grab work environment for deferred links. The assumption that we
will be running it always from the task in question is false, as exiting
tasks may mean that we're deferring this one to a thread helper. And at
that point it's too late to grab the work environment.

Fixes: debb85f496 ("io_uring: factor out grab_env() from defer_prep()")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-13 14:47:06 -06:00
Linus Torvalds
20a7b6be05 Driver core fixes for 5.9-rc5
Here are some small driver core and debugfs fixes for 5.9-rc5
 
 Included in here are:
 	- firmware loader memory leak fix
 	- firmware loader testing fixes for non-EFI systems
 	- device link locking fixes found by lockdep
 	- kobject_del() bugfix that has been affecting some callers
 	- debugfs minor fix
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCX13Zhw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylfNQCfX4Bx9J1aLGr0/MOBwlXXEycChE0AmwQ9rXa7
 u5Przdz+fMr1mLyNBaY5
 =4kmo
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some small driver core and debugfs fixes for 5.9-rc5

  Included in here are:

   - firmware loader memory leak fix

   - firmware loader testing fixes for non-EFI systems

   - device link locking fixes found by lockdep

   - kobject_del() bugfix that has been affecting some callers

   - debugfs minor fix

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'driver-core-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  test_firmware: Test platform fw loading on non-EFI systems
  PM: <linux/device.h>: fix @em_pd kernel-doc warning
  kobject: Drop unneeded conditional in __kobject_del()
  driver core: Fix device_pm_lock() locking for device links
  MAINTAINERS: Add the security document to SECURITY CONTACT
  driver code: print symbolic error code
  debugfs: Fix module state check condition
  kobject: Restore old behaviour of kobject_del(NULL)
  firmware_loader: fix memory leak for paged buffer
2020-09-13 09:02:59 -07:00
Linus Torvalds
edf6b0e1e4 for-5.9-rc4-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl9bj/UACgkQxWXV+ddt
 WDs1HhAAgAvJVM2WJuMCjQhQiKljFjRT1a0Kbsp+9ayw5Q225t5S5kCMWrsA6mXF
 /9bGRmmELm/Nr5pSH9hp5Bhbke0vNV+Y9XiRQXpegla4LLMF4MulVgADRIL3WoxO
 ZAtNmZUokkjvB0CkzDuI7PqrF67TXLqV2hlctZo0p5SAFFgLaELyIYC6uAaO9Qo/
 +EAAK+7oJyzWcUp44APu90wBbF79umwNVKEEkDfc6bwiA2Cut1JGzvPWgGvvQnta
 fAd114LFViKg05GXcbnx4NxHYtf9tKHjDk9yYWssR+uV6vo/pWwAkDwYxXm/LzA4
 Zv8QK5uvng1fW4eq9QkN3KflIDn+YhaH1jgwNcgyS+ZCdqZR1Mi949f+6Nj1fXt2
 NeXOx3nhtqgNthKQNvHSMVJZrPjV3bdzOz+bULA+hMvTkr5gJy+ToAs30SLxGF5Y
 BCJEE6b5M5Jnb+UHEBMuoxubBfmPHkY8LxfDzVWDLESsKcW2eYyeJyJXx4DNe/v9
 O7Z5pcku+7R9LOlYQEzKeSuiYMqYLtmQtcNXyFBysksikjFJBWNgENna1LmgvmRH
 j6fC5S9h4sIxzyKQkJgihIDt/a3f9WnhsoHw8EIn62tfdOIvMcT/xWq9YYgWaOjZ
 H9040WXvEAFVcDn4cQ22DNgV+toJMpe0pLg6UXe7VtESUtbwMFM=
 =JTfF
 -----END PGP SIGNATURE-----

Merge tag 'for-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes:

   - regression fix for a crash after failed snapshot creation

   - one more lockep fix: use nofs allocation when allocating missing
     device

   - fix reloc tree leak on degraded mount

   - make some extent buffer alignment checks less strict to mount
     filesystems created by btrfs-convert"

* tag 'for-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix NULL pointer dereference after failure to create snapshot
  btrfs: free data reloc tree on failed mount
  btrfs: require only sector size alignment for parent eb bytenr
  btrfs: fix lockdep splat in add_missing_dev
2020-09-12 12:28:39 -07:00
Linus Torvalds
5a3c558a9f A fix for lookup on DFS link when cifsacl or modefromsid used
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl9c/IoACgkQiiy9cAdy
 T1GB8Av/XCQ1NqifphSf9/AJ9GJRIJ9fl5pdoPDDPPpY88XGUBJ21frXJCm5Uv/5
 MXmNltZnIPMqBa+K6QvMR2WjmGlE28zVaapiIXJK+ckaMClukvn09FOCKWmhVkgi
 0m1LwLOCL92TslRxkmW7wI1IFdNFCDdcehQ9xvjkaCCkpkiZ6ec4cV68fvF9nQ0P
 CD5I7E4m5KZiMDIyF1DGZvPEp/fVv5Wu8Fmvi/4DdUCMF8EXmBYrbCax6+SfqMEs
 3ut3pJEiUPdaojZQKb2s0u4vf5zqLYI7azdplqeyEqACDvSmDDhiXyIddbLrT/It
 OkfiffaRLXJADLEQrH1E1uVbABa7pY44LhorxV1J0+4T/v33kItbRcwLfi9LeOpn
 jIIBXYzRP7Pmj2ymH9irf1F18z8O97gcQf8HiR8e6WaSC1Mf8GJLTcwZRsiR4J5H
 4no9vDkL+T+lXLUGwnW8aRe10w2gRoIZeQxhkkWUZJZnf2x0NgecljckFOZaMQff
 ycRefbVh
 =JaQw
 -----END PGP SIGNATURE-----

Merge tag '5.9-rc4-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fix from Steve French:
 "A fix for lookup on DFS link when cifsacl or modefromsid is used"

* tag '5.9-rc4-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix DFS mount with cifsacl/modefromsid
2020-09-12 11:48:04 -07:00
Linus Torvalds
581cb3a26b f2fs-for-5.9-rc5
This introduces some bug fixes including 1) SMR drive fix, 2) infinite loop
 when building free node ids, 3) EOF at DIO read.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl9Y0skACgkQQBSofoJI
 UNKggA//a8cCkIx0wp1BEabPQ8x8TrHkiOVZX9zTbiaLkI9MLYweF0YFjbCx/i/t
 41r6JbTxWLiM+GRI+u9kOrkBOUgTHAZThIBcWbfYKR5Jz7XdijhsuNYqRf8n33ya
 u4PtT2AiMW3gt8gJML08eI/U+/rBBpI6fkc8Sza03iTCLjJiWwvZ99h2rPSfgmIw
 UdyRuof3mPgUz12fQ8BoO/oQ5KhKgA7kxMsCc4UJE9VX/hBYamTfIEF1AmnxlyKn
 HYhkXyuAmhEFEenBA+ts9YZXMubWHwcj4GlTkWTP6Ib+A3cIinHIqpzEaJnPQdWY
 JPG72L2TNg0Mtl8w+46zJa1mr6KvXTA5GNowX4OTTf2oHxeHJ4sxax3cmvrDOIfe
 7Ga8kGyi5OON1Hymwuyyz4KuKM9zvyUouX0i4wsiamm9kFQoxzNSbfsM5L0weCwM
 MWVOFhtOUhrQdJ+zpbtJpF8Ijj3nuOoUpw3/hUhl3GM+J4bzE8MdzRmzB3JjkVCr
 4mfYuXCITPOH31Fu3obXEFI/9kgyZdxk6OHsXLZCCfRCvgRTDfk5KFQx1OIY3jk0
 I/yRjmvnOyD37GVXHyGZxjRplDCgCpTOL6HFk1GOLFUvuoQeOOqwwL0oa/J94ED0
 nE80dqs4e8c3/YmUqcgiOm4d4EkdwVzhE9RxdUq7s2SJgh5Eu/o=
 =gbal
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs fixes from Jaegeuk Kim:
 "Small bug fixes for:

   - SMR drive fix

   - infinite loop when building free node ids

   - EOF at DIO read"

* tag 'f2fs-for-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: Return EOF on unaligned end of file DIO read
  f2fs: fix indefinite loop scanning for free nid
  f2fs: Fix type of section block count variables
2020-09-10 13:12:46 -07:00
Linus Torvalds
ab29a807a7 NFS client bugfixes for Linux 5.9
Highlights include:
 
 Bugfixes:
 - Fix an NFS/RDMA resource leak
 - Fix the error handling during delegation recall
 - NFSv4.0 needs to return the delegation on a zero-stateid SETATTR
 - Stop printk reading past end of string
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl9ZFYAACgkQZwvnipYK
 APLg+RAArQ0J54M4vTg7avKhUEwIrAlPCFjHvZ5jtlXiY8JDT7Cy2lEo9W/pC9x2
 BiV02H6seKXq6vKUHIBgzVq0BdZBKeWQcOpoO/dfvWSPs9u+lxKlOEwcdsaXwdXz
 31u5HS4xHYg2SlYj+BcKGfVexcWVEVyPqqPvflGBZIlKfzQLHo9YY390deUHMC6o
 HrRXWADvpYXC1sJb3mtNtCojqr9a5A8Ty4clT19YvdwQL7cUt3HjjsOvJfbmB9S+
 fW5/u3sdWJ1nYoz8AxC+utIMNmtXFBUhW0Sg+TPWMJj8yG9rclAgTxbobhXyzGph
 j2ZamPhUtpcSYXBlwiQCm7GbUIItnzHgU6MSCs/nq8AeDc3WEx4qVONVqNvNr/sY
 1T3znylZpXCHvxLmDWzDGsW8XvZT1r86Lm6zrJCmjWm+eoSKBzeoENcXGsGGYuJu
 6NGz7pgQbYMb9t7VfOEFSxxt5w0wt7nRyhV1R7taBhm5B9XjF+BOmJBI0epQ1S7i
 XRIr7WqxT00wijWyunNCQZxi1aDMHVYZXPwaqkEHTwJqeDzCtmir+ajAnZQUgUId
 1MNiv8BDoN5YlPmj/gt+E3kbyj0Pu7M+09NvVEKqG7j8W80ltf6eb85XGrq+vp1E
 Y0lmDXElBdNo3AA+dBOmk+peoVv4bfoog5PymElaRiwRM25VCOM=
 =3fw2
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Fix an NFS/RDMA resource leak

 - Fix the error handling during delegation recall

 - NFSv4.0 needs to return the delegation on a zero-stateid SETATTR

 - Stop printk reading past end of string

* tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: stop printk reading past end of string
  NFS: Zero-stateid SETATTR should first return delegation
  NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall
  xprtrdma: Release in-flight MRs on disconnect
2020-09-09 11:14:20 -07:00
Gabriel Krisman Bertazi
20d0a107fb f2fs: Return EOF on unaligned end of file DIO read
Reading past end of file returns EOF for aligned reads but -EINVAL for
unaligned reads on f2fs.  While documentation is not strict about this
corner case, most filesystem returns EOF on this case, like iomap
filesystems.  This patch consolidates the behavior for f2fs, by making
it return EOF(0).

it can be verified by a read loop on a file that does a partial read
before EOF (A file that doesn't end at an aligned address).  The
following code fails on an unaligned file on f2fs, but not on
btrfs, ext4, and xfs.

  while (done < total) {
    ssize_t delta = pread(fd, buf + done, total - done, off + done);
    if (!delta)
      break;
    ...
  }

It is arguable whether filesystems should actually return EOF or
-EINVAL, but since iomap filesystems support it, and so does the
original DIO code, it seems reasonable to consolidate on that.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-08 20:31:33 -07:00
Sahitya Tummala
e2cab031ba f2fs: fix indefinite loop scanning for free nid
If the sbi->ckpt->next_free_nid is not NAT block aligned and if there
are free nids in that NAT block between the start of the block and
next_free_nid, then those free nids will not be scanned in scan_nat_page().
This results into mismatch between nm_i->available_nids and the sum of
nm_i->free_nid_count of all NAT blocks scanned. And nm_i->available_nids
will always be greater than the sum of free nids in all the blocks.
Under this condition, if we use all the currently scanned free nids,
then it will loop forever in f2fs_alloc_nid() as nm_i->available_nids
is still not zero but nm_i->free_nid_count of that partially scanned
NAT block is zero.

Fix this to align the nm_i->next_scan_nid to the first nid of the
corresponding NAT block.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-08 20:31:33 -07:00
Shin'ichiro Kawasaki
123aaf774f f2fs: Fix type of section block count variables
Commit da52f8ade4 ("f2fs: get the right gc victim section when section
has several segments") added code to count blocks of each section using
variables with type 'unsigned short', which has 2 bytes size in many
systems. However, the counts can be larger than the 2 bytes range and
type conversion results in wrong values. Especially when the f2fs
sections have blocks as many as USHRT_MAX + 1, the count is handled as 0.
This triggers eternal loop in init_dirty_segmap() at mount system call.
Fix this by changing the type of the variables to block_t.

Fixes: da52f8ade4 ("f2fs: get the right gc victim section when section has several segments")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2020-09-08 20:31:33 -07:00
Filipe Manana
2d892ccdc1 btrfs: fix NULL pointer dereference after failure to create snapshot
When trying to get a new fs root for a snapshot during the transaction
at transaction.c:create_pending_snapshot(), if btrfs_get_new_fs_root()
fails we leave "pending->snap" pointing to an error pointer, and then
later at ioctl.c:create_snapshot() we dereference that pointer, resulting
in a crash:

  [12264.614689] BUG: kernel NULL pointer dereference, address: 00000000000007c4
  [12264.615650] #PF: supervisor write access in kernel mode
  [12264.616487] #PF: error_code(0x0002) - not-present page
  [12264.617436] PGD 0 P4D 0
  [12264.618328] Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI
  [12264.619150] CPU: 0 PID: 2310635 Comm: fsstress Tainted: G        W         5.9.0-rc3-btrfs-next-67 #1
  [12264.619960] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
  [12264.621769] RIP: 0010:btrfs_mksubvol+0x438/0x4a0 [btrfs]
  [12264.622528] Code: bc ef ff ff (...)
  [12264.624092] RSP: 0018:ffffaa6fc7277cd8 EFLAGS: 00010282
  [12264.624669] RAX: 00000000fffffff4 RBX: ffff9d3e8f151a60 RCX: 0000000000000000
  [12264.625249] RDX: 0000000000000001 RSI: ffffffff9d56c9be RDI: fffffffffffffff4
  [12264.625830] RBP: ffff9d3e8f151b48 R08: 0000000000000000 R09: 0000000000000000
  [12264.626413] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffff4
  [12264.626994] R13: ffff9d3ede380538 R14: ffff9d3ede380500 R15: ffff9d3f61b2eeb8
  [12264.627582] FS:  00007f140d5d8200(0000) GS:ffff9d3fb5e00000(0000) knlGS:0000000000000000
  [12264.628176] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [12264.628773] CR2: 00000000000007c4 CR3: 000000020f8e8004 CR4: 00000000003706f0
  [12264.629379] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [12264.629994] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [12264.630594] Call Trace:
  [12264.631227]  btrfs_mksnapshot+0x7b/0xb0 [btrfs]
  [12264.631840]  __btrfs_ioctl_snap_create+0x16f/0x1a0 [btrfs]
  [12264.632458]  btrfs_ioctl_snap_create_v2+0xb0/0xf0 [btrfs]
  [12264.633078]  btrfs_ioctl+0x1864/0x3130 [btrfs]
  [12264.633689]  ? do_sys_openat2+0x1a7/0x2d0
  [12264.634295]  ? kmem_cache_free+0x147/0x3a0
  [12264.634899]  ? __x64_sys_ioctl+0x83/0xb0
  [12264.635488]  __x64_sys_ioctl+0x83/0xb0
  [12264.636058]  do_syscall_64+0x33/0x80
  [12264.636616]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

  (gdb) list *(btrfs_mksubvol+0x438)
  0x7c7b8 is in btrfs_mksubvol (fs/btrfs/ioctl.c:858).
  853		ret = 0;
  854		pending_snapshot->anon_dev = 0;
  855	fail:
  856		/* Prevent double freeing of anon_dev */
  857		if (ret && pending_snapshot->snap)
  858			pending_snapshot->snap->anon_dev = 0;
  859		btrfs_put_root(pending_snapshot->snap);
  860		btrfs_subvolume_release_metadata(root, &pending_snapshot->block_rsv);
  861	free_pending:
  862		if (pending_snapshot->anon_dev)

So fix this by setting "pending->snap" to NULL if we get an error from the
call to btrfs_get_new_fs_root() at transaction.c:create_pending_snapshot().

Fixes: 2dfb1e43f5 ("btrfs: preallocate anon block device at first phase of snapshot creation")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-07 21:18:35 +02:00
Josef Bacik
9e3aa80544 btrfs: free data reloc tree on failed mount
While testing a weird problem with -o degraded, I noticed I was getting
leaked root errors

  BTRFS warning (device loop0): writable mount is not allowed due to too many missing devices
  BTRFS error (device loop0): open_ctree failed
  BTRFS error (device loop0): leaked root -9-0 refcount 1

This is the DATA_RELOC root, which gets read before the other fs roots,
but is included in the fs roots radix tree.  Handle this by adding a
btrfs_drop_and_free_fs_root() on the data reloc root if it exists.  This
is ok to do here if we fail further up because we will only drop the ref
if we delete the root from the radix tree, and all other cleanup won't
be duplicated.

CC: stable@vger.kernel.org # 5.8+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-07 14:51:15 +02:00
Qu Wenruo
ea57788eb7 btrfs: require only sector size alignment for parent eb bytenr
[BUG]
A completely sane converted fs will cause kernel warning at balance
time:

  [ 1557.188633] BTRFS info (device sda7): relocating block group 8162107392 flags data
  [ 1563.358078] BTRFS info (device sda7): found 11722 extents
  [ 1563.358277] BTRFS info (device sda7): leaf 7989321728 gen 95 total ptrs 213 free space 3458 owner 2
  [ 1563.358280] 	item 0 key (7984947200 169 0) itemoff 16250 itemsize 33
  [ 1563.358281] 		extent refs 1 gen 90 flags 2
  [ 1563.358282] 		ref#0: tree block backref root 4
  [ 1563.358285] 	item 1 key (7985602560 169 0) itemoff 16217 itemsize 33
  [ 1563.358286] 		extent refs 1 gen 93 flags 258
  [ 1563.358287] 		ref#0: shared block backref parent 7985602560
  [ 1563.358288] 			(parent 7985602560 is NOT ALIGNED to nodesize 16384)
  [ 1563.358290] 	item 2 key (7985635328 169 0) itemoff 16184 itemsize 33
  ...
  [ 1563.358995] BTRFS error (device sda7): eb 7989321728 invalid extent inline ref type 182
  [ 1563.358996] ------------[ cut here ]------------
  [ 1563.359005] WARNING: CPU: 14 PID: 2930 at 0xffffffff9f231766

Then with transaction abort, and obviously failed to balance the fs.

[CAUSE]
That mentioned inline ref type 182 is completely sane, it's
BTRFS_SHARED_BLOCK_REF_KEY, it's some extra check making kernel to
believe it's invalid.

Commit 64ecdb647d ("Btrfs: add one more sanity check for shared ref
type") introduced extra checks for backref type.

One of the requirement is, parent bytenr must be aligned to node size,
which is not correct.

One example is like this:

0	1G  1G+4K		2G 2G+4K
	|   |///////////////////|//|  <- A chunk starts at 1G+4K
            |   |	<- A tree block get reserved at bytenr 1G+4K

Then we have a valid tree block at bytenr 1G+4K, but not aligned to
nodesize (16K).

Such chunk is not ideal, but current kernel can handle it pretty well.
We may warn about such tree block in the future, but should not reject
them.

[FIX]
Change the alignment requirement from node size alignment to sector size
alignment.

Also, to make our lives a little easier, also output @iref when
btrfs_get_extent_inline_ref_type() failed, so we can locate the item
easier.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205475
Fixes: 64ecdb647d ("Btrfs: add one more sanity check for shared ref type")
CC: stable@vger.kernel.org # 4.14+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
[ update comments and messages ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-07 14:51:05 +02:00
Josef Bacik
fccc0007b8 btrfs: fix lockdep splat in add_missing_dev
Nikolay reported a lockdep splat in generic/476 that I could reproduce
with btrfs/187.

  ======================================================
  WARNING: possible circular locking dependency detected
  5.9.0-rc2+ #1 Tainted: G        W
  ------------------------------------------------------
  kswapd0/100 is trying to acquire lock:
  ffff9e8ef38b6268 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x330

  but task is already holding lock:
  ffffffffa9d74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (fs_reclaim){+.+.}-{0:0}:
	 fs_reclaim_acquire+0x65/0x80
	 slab_pre_alloc_hook.constprop.0+0x20/0x200
	 kmem_cache_alloc_trace+0x3a/0x1a0
	 btrfs_alloc_device+0x43/0x210
	 add_missing_dev+0x20/0x90
	 read_one_chunk+0x301/0x430
	 btrfs_read_sys_array+0x17b/0x1b0
	 open_ctree+0xa62/0x1896
	 btrfs_mount_root.cold+0x12/0xea
	 legacy_get_tree+0x30/0x50
	 vfs_get_tree+0x28/0xc0
	 vfs_kern_mount.part.0+0x71/0xb0
	 btrfs_mount+0x10d/0x379
	 legacy_get_tree+0x30/0x50
	 vfs_get_tree+0x28/0xc0
	 path_mount+0x434/0xc00
	 __x64_sys_mount+0xe3/0x120
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
	 __mutex_lock+0x7e/0x7e0
	 btrfs_chunk_alloc+0x125/0x3a0
	 find_free_extent+0xdf6/0x1210
	 btrfs_reserve_extent+0xb3/0x1b0
	 btrfs_alloc_tree_block+0xb0/0x310
	 alloc_tree_block_no_bg_flush+0x4a/0x60
	 __btrfs_cow_block+0x11a/0x530
	 btrfs_cow_block+0x104/0x220
	 btrfs_search_slot+0x52e/0x9d0
	 btrfs_lookup_inode+0x2a/0x8f
	 __btrfs_update_delayed_inode+0x80/0x240
	 btrfs_commit_inode_delayed_inode+0x119/0x120
	 btrfs_evict_inode+0x357/0x500
	 evict+0xcf/0x1f0
	 vfs_rmdir.part.0+0x149/0x160
	 do_rmdir+0x136/0x1a0
	 do_syscall_64+0x33/0x40
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
	 __lock_acquire+0x1184/0x1fa0
	 lock_acquire+0xa4/0x3d0
	 __mutex_lock+0x7e/0x7e0
	 __btrfs_release_delayed_node.part.0+0x3f/0x330
	 btrfs_evict_inode+0x24c/0x500
	 evict+0xcf/0x1f0
	 dispose_list+0x48/0x70
	 prune_icache_sb+0x44/0x50
	 super_cache_scan+0x161/0x1e0
	 do_shrink_slab+0x178/0x3c0
	 shrink_slab+0x17c/0x290
	 shrink_node+0x2b2/0x6d0
	 balance_pgdat+0x30a/0x670
	 kswapd+0x213/0x4c0
	 kthread+0x138/0x160
	 ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Chain exists of:
    &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(fs_reclaim);
				 lock(&fs_info->chunk_mutex);
				 lock(fs_reclaim);
    lock(&delayed_node->mutex);

   *** DEADLOCK ***

  3 locks held by kswapd0/100:
   #0: ffffffffa9d74700 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
   #1: ffffffffa9d65c50 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x115/0x290
   #2: ffff9e8e9da260e0 (&type->s_umount_key#48){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0

  stack backtrace:
  CPU: 1 PID: 100 Comm: kswapd0 Tainted: G        W         5.9.0-rc2+ #1
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
  Call Trace:
   dump_stack+0x92/0xc8
   check_noncircular+0x12d/0x150
   __lock_acquire+0x1184/0x1fa0
   lock_acquire+0xa4/0x3d0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   __mutex_lock+0x7e/0x7e0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   ? __btrfs_release_delayed_node.part.0+0x3f/0x330
   ? lock_acquire+0xa4/0x3d0
   ? btrfs_evict_inode+0x11e/0x500
   ? find_held_lock+0x2b/0x80
   __btrfs_release_delayed_node.part.0+0x3f/0x330
   btrfs_evict_inode+0x24c/0x500
   evict+0xcf/0x1f0
   dispose_list+0x48/0x70
   prune_icache_sb+0x44/0x50
   super_cache_scan+0x161/0x1e0
   do_shrink_slab+0x178/0x3c0
   shrink_slab+0x17c/0x290
   shrink_node+0x2b2/0x6d0
   balance_pgdat+0x30a/0x670
   kswapd+0x213/0x4c0
   ? _raw_spin_unlock_irqrestore+0x46/0x60
   ? add_wait_queue_exclusive+0x70/0x70
   ? balance_pgdat+0x670/0x670
   kthread+0x138/0x160
   ? kthread_create_worker_on_cpu+0x40/0x40
   ret_from_fork+0x1f/0x30

This is because we are holding the chunk_mutex when we call
btrfs_alloc_device, which does a GFP_KERNEL allocation.  We don't want
to switch that to a GFP_NOFS lock because this is the only place where
it matters.  So instead use memalloc_nofs_save() around the allocation
in order to avoid the lockdep splat.

Reported-by: Nikolay Borisov <nborisov@suse.com>
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-09-07 14:50:57 +02:00
Ronnie Sahlberg
01ec372cef cifs: fix DFS mount with cifsacl/modefromsid
RHBZ: 1871246

If during cifs_lookup()/get_inode_info() we encounter a DFS link
and we use the cifsacl or modefromsid mount options we must suppress
any -EREMOTE errors that triggers or else we will not be able to follow
the DFS link and automount the target.

This fixes an issue with modefromsid/cifsacl where these mountoptions
would break DFS and we would no longer be able to access the share.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2020-09-06 23:59:53 -05:00
Linus Torvalds
a8205e3100 io_uring-5.9-2020-09-06
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9U/MMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpg4BEAC6TQ6ctc1yNTTWwiz4UrdIKMAWP7S7wepu
 k00A9+JjLLJBVdkz9rZ2Y/SyGe12qBM+riiRSn/gNTbkd4qq2rCY2d8U1vXXKyP5
 VDXPo12zsUD5WpqdEMfXUB4+DOQs0a3DAyDLT0+K/Qw0rpOQVZA26Ovn6GOh9+Kq
 KJCllynYkwUQ/7CXsdI13ktMI1HADFOx3149SlGkbuPOggwNQrGiLJAxyUOn/i+E
 uiHy8b8o3B2nun61+Y98q+MDISLf0xXYEbHeAsvETEy52ya2iadnMo1lwS5zHzM+
 p6jOBybHaM/wz5t1V44VTBvfog1KAUtp8K0gsxcB6Ezf7LhYfTVjtvfZqpwVz1sl
 txkxhnGEBURHpdr0aCtC15cIpbDGM85ymjP4RD0YlV7oyT0+Ufx7r9jJjLf7IhZO
 FMyEFmVwSPD/NdJE9ZNx0I2v/qdIzYwfeAix4Z4bXoe51BtPrf1uBgsGWVuqNVz/
 dKVf1vK1tPF8PgIiIW/o8GI4iF3RRVQcLwJGiAWMzBS11iniJLf9mUPRVl5Bxpeb
 YRd2chm+ppHC/IgtK44x7Ce6415hnbKehE2KUr43PHB7nIMNWcRsurxLizM7ZGei
 Gv2/K9PM3+4O/b1k20xLakz4Vw3Isk4W6/Flj9LoxheBo+CUG4Rsx5UyzFEB6apA
 uXxiF6ORLg==
 =v5QQ
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block

Pull more io_uring fixes from Jens Axboe:
 "Two followup fixes. One is fixing a regression from this merge window,
  the other is two commits fixing cancelation of deferred requests.

  Both have gone through full testing, and both spawned a few new
  regression test additions to liburing.

   - Don't play games with const, properly store the output iovec and
     assign it as needed.

   - Deferred request cancelation fix (Pavel)"

* tag 'io_uring-5.9-2020-09-06' of git://git.kernel.dk/linux-block:
  io_uring: fix linked deferred ->files cancellation
  io_uring: fix cancel of deferred reqs with ->files
  io_uring: fix explicit async read/write mapping for large segments
2020-09-06 12:10:27 -07:00
Pavel Begunkov
c127a2a1b7 io_uring: fix linked deferred ->files cancellation
While looking for ->files in ->defer_list, consider that requests there
may actually be links.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-05 16:02:42 -06:00
Pavel Begunkov
b7ddce3cbf io_uring: fix cancel of deferred reqs with ->files
While trying to cancel requests with ->files, it also should look for
requests in ->defer_list, otherwise it might end up hanging a thread.

Cancel all requests in ->defer_list up to the last request there with
matching ->files, that's needed to follow drain ordering semantics.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-05 15:59:51 -06:00
Linus Torvalds
9322c47b21 Fixes (2) for 5.9:
- Fix a broken metadata verifier that would incorrectly validate attr
 fork extents of a realtime file against the realtime volume.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl9RDTMACgkQ+H93GTRK
 tOseqA/8DTnwKRKuTXc6UboIXbgMwjGTJBv7xTD3KIDF3ls+Gmp2/RWHs7lfGRVY
 LVdvlgAwDLEe1al8aLaibhrsRCYX8MLvMp8dX6xoqMWCY0KF80VCpXXy+FXbwBmZ
 GQRqrThyjepseERJj15UC249p9ztBj4DAHQD0r2XD7JMKHBHo6cQ3eYY2NTzgqc3
 q0kYcP5xPZZ4R6UFUNkJpkeV8PKpkzWTXyMod0+e8h4njZoRAmHimj7Id9DMQ6HB
 ciHjZjNCd04Nu6JIpNBwlQE4epCPh79zX8hTQZf5nGNAh13CB9Wc2nHDvwCFBLxw
 HUdU5BpxMNWGujJ+T5X+RVbzA4VXXaL3iypag9EjXadg+vOV6XmPQv6Fa3WqkfBT
 GjEXB9+TG9ZuyxAObjP6yn1PRvob0l0iccAbtfnX5bE/mwqx1aDxN1QTfJrjG2Fv
 2EiUnqI+jxro66HdS5QU+W8ko7dG0tiQPMF3DCv7nfb3ZxEgfXiDrgbHyYbzdLJq
 pi5LqXBAfRHkDwRUzRU6G6pT7mplW31iTwh2AuiQogXnpPTWylb0XsUEVfYKpK9q
 Z0HzwRfHmwQSkIEDZ0rZxP3wH5ssniHyLdXh5FtfFcPSBTyDvflMqFpLmYmvDV/6
 MPfwCQAnMTrGv2GRJ8hbFiWd987bYTTfOKwJBEsFKDH01ZKYcV8=
 =5U8o
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Darrick Wong:
 "Fix a broken metadata verifier that would incorrectly validate attr
  fork extents of a realtime file against the realtime volume"

* tag 'xfs-5.9-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix xfs_bmap_validate_extent_raw when checking attr fork of rt files
2020-09-05 10:04:53 -07:00
Mikulas Patocka
b17164e258 xfs: don't update mtime on COW faults
When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the xfs filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.

This breaks building of the Linux kernel.  How to reproduce:

 1. extract the Linux kernel tree on dax-mounted xfs filesystem
 2. run make clean
 3. run make -j12
 4. run make -j12

at step 4, make would incorrectly rebuild the whole kernel (although it
was already built in step 3).

The reason for the breakage is that almost all object files depend on
objtool.  When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary.  The updated timestamp causes make to rebuild the whole
tree.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 10:00:05 -07:00
Mikulas Patocka
1ef6ea0efe ext2: don't update mtime on COW faults
When running in a dax mode, if the user maps a page with MAP_PRIVATE and
PROT_WRITE, the ext2 filesystem would incorrectly update ctime and mtime
when the user hits a COW fault.

This breaks building of the Linux kernel.  How to reproduce:

 1. extract the Linux kernel tree on dax-mounted ext2 filesystem
 2. run make clean
 3. run make -j12
 4. run make -j12

at step 4, make would incorrectly rebuild the whole kernel (although it
was already built in step 3).

The reason for the breakage is that almost all object files depend on
objtool.  When we run objtool, it takes COW page fault on its .data
section, and these faults will incorrectly update the timestamp of the
objtool binary.  The updated timestamp causes make to rebuild the whole
tree.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-09-05 10:00:05 -07:00
Jens Axboe
c183edff33 io_uring: fix explicit async read/write mapping for large segments
If we exceed UIO_FASTIOV, we don't handle the transition correctly
between an allocated vec for requests that are queued with IOSQE_ASYNC.
Store the iovec appropriately and re-set it in the iter iov in case
it changed.

Fixes: ff6165b2d7 ("io_uring: retain iov_iter state over io_read/io_write calls")
Reported-by: Nick Hill <nick@nickhill.org>
Tested-by: Norman Maurer <norman.maurer@googlemail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-05 09:02:47 -06:00
Chuck Lever
644c9f40cf NFS: Zero-stateid SETATTR should first return delegation
If a write delegation isn't available, the Linux NFS client uses
a zero-stateid when performing a SETATTR.

NFSv4.0 provides no mechanism for an NFS server to match such a
request to a particular client. It recalls all delegations for that
file, even delegations held by the client issuing the request. If
that client happens to hold a read delegation, the server will
recall it immediately, resulting in an NFS4ERR_DELAY/CB_RECALL/
DELEGRETURN sequence.

Optimize out this pipeline bubble by having the client return any
delegations it may hold on a file before it issues a
SETATTR(zero-stateid) on that file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-09-05 10:39:41 -04:00
Linus Torvalds
d849ca483d io_uring-5.9-2020-09-04
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl9SWN8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvtbD/4yI5dkopv6E2RHVuupFWmGlGoxLhPecnAZ
 UHbKU+LA/tzWWMA7gZuwzzDEK1QWT/KmctpGTI22SXUKQCtjGzO/qnRMfyJ34TdQ
 l4leYdw/QzUOHZG7dKVYUACHiaSxzQSallNuX1I9eM084KSXH3DgUkrwMLoew/8n
 WJHKN+oRhcppnLVDekaLXbZEI9idTnY+gs/Dg8TNsxNSeO6y51OOlKltaNfL+npQ
 dwlgMoolBYWHFozqgVyzIV7sU7fQ9QGppwBIfqBb1jEe9JU2ZymtlcDgfxUVpKcg
 W8/PCoVT60AGiMdjV0EBoQO09r+nvwAcRQUSlWJU7Dn/pcZmFoaJkyse+SnD0Dac
 cLTKhnhgMJSI4Zt3yQidFSNhz0Ouw15J8k7OTftn81zhtkHzPBgGnA7R6b7UUQsZ
 5lJvlZh5aFPNBFp9A0do5+f5/lUMhHkxDpFVmZo+ywPtoNHJeDL2+jzzFawJ8kqv
 IoFvVL8hl4DzqN+vShsJ40jH93+oITF/Jlq6kY8ILKtu42i5qAxpP0wUwycrN6Pz
 /YNTKPveCoPU7zaFDvMfbc7U56Ke6ma+lmtTn6q6JOWFvUAYh7SUY4JGzEMpxfxK
 QVyFMwXnCKhB66ZypJIFdbT4zqkTXmhxvu/Oz5txDv/uoytqT1o+zLHb3USi4Lw8
 89NyvBc0aQ==
 =NLOn
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.9-2020-09-04' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - EAGAIN with O_NONBLOCK retry fix

 - Two small fixes for registered files (Jiufei)

* tag 'io_uring-5.9-2020-09-04' of git://git.kernel.dk/linux-block:
  io_uring: no read/write-retry on -EAGAIN error and O_NONBLOCK marked file
  io_uring: set table->files[i] to NULL when io_sqe_file_register failed
  io_uring: fix removing the wrong file in __io_sqe_files_update()
2020-09-04 12:55:22 -07:00
Vladis Dronov
e3b9fc7eec debugfs: Fix module state check condition
The '#ifdef MODULE' check in the original commit does not work as intended.
The code under the check is not built at all if CONFIG_DEBUG_FS=y. Fix this
by using a correct check.

Fixes: 275678e7a9 ("debugfs: Check module state before warning in {full/open}_proxy_open()")
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Cc: stable <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200811150129.53343-1-vdronov@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-09-04 18:12:52 +02:00