linux_dsm_epyc7002/fs
Nikolay Borisov debd1c065d btrfs: Ensure replaced device doesn't have pending chunk allocation
Recent FITRIM work, namely bbbf7243d6 ("btrfs: combine device update
operations during transaction commit") combined the way certain
operations are recoded in a transaction. As a result an ASSERT was added
in dev_replace_finish to ensure the new code works correctly.
Unfortunately I got reports that it's possible to trigger the assert,
meaning that during a device replace it's possible to have an unfinished
chunk allocation on the source device.

This is supposed to be prevented by the fact that a transaction is
committed before finishing the replace oepration and alter acquiring the
chunk mutex. This is not sufficient since by the time the transaction is
committed and the chunk mutex acquired it's possible to allocate a chunk
depending on the workload being executed on the replaced device. This
bug has been present ever since device replace was introduced but there
was never code which checks for it.

The correct way to fix is to ensure that there is no pending device
modification operation when the chunk mutex is acquire and if there is
repeat transaction commit. Unfortunately it's not possible to just
exclude the source device from btrfs_fs_devices::dev_alloc_list since
this causes ENOSPC to be hit in transaction commit.

Fixing that in another way would need to add special cases to handle the
last writes and forbid new ones. The looped transaction fix is more
obvious, and can be easily backported. The runtime of dev-replace is
long so there's no noticeable delay caused by that.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 391cd9df81 ("Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-28 18:54:00 +02:00
..
9p Pull request for inlusion in 5.1 2019-03-17 09:10:56 -07:00
adfs
affs
afs afs: Fix in-progess ops to ignore server-level callback invalidation 2019-04-13 08:37:37 +01:00
autofs autofs: clear O_NONBLOCK on the pipe 2019-03-07 18:32:01 -08:00
befs
bfs bfs: extra sanity checking and static inode bitmap 2019-01-04 13:13:47 -08:00
btrfs btrfs: Ensure replaced device doesn't have pending chunk allocation 2019-05-28 18:54:00 +02:00
cachefiles fscache, cachefiles: remove redundant variable 'cache' 2018-11-30 16:00:58 +00:00
ceph ceph: fix ci->i_head_snapc leak 2019-04-23 21:37:54 +02:00
cifs cifs: fix page reference leak with readv/writev 2019-04-24 12:33:59 -05:00
coda
configfs
cramfs Make the Cramfs code more robust against filesystem corruptions, 2018-10-30 12:46:25 -07:00
crypto fscrypt updates for v5.1 2019-03-09 10:54:24 -08:00
debugfs debugfs: fix use-after-free on symlink traversal 2019-04-01 00:31:02 -04:00
devpts fs/devpts: always delete dcache dentry-s in dput() 2019-01-24 13:38:30 -05:00
dlm socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes 2019-02-03 11:17:31 -08:00
ecryptfs crypto: clarify name of WEAK_KEY request flag 2019-01-25 18:41:52 +08:00
efivarfs
efs
exportfs exportfs: do not read dentry after free 2018-11-23 09:08:17 -05:00
ext2 \n 2019-03-07 09:01:33 -08:00
ext4 Miscellaneous ext4 bug fixes for 5.1. 2019-03-24 13:41:37 -07:00
f2fs f2fs-for-5.1-rc1 2019-03-15 13:42:53 -07:00
fat fat: enable .splice_write to support splice on O_DIRECT file 2019-03-07 18:32:01 -08:00
freevxfs
fscache fscache: fix race between enablement and dropping of object 2018-11-30 15:57:31 +00:00
fuse Merge branch 'page-refs' (page ref overflow) 2019-04-14 15:09:40 -07:00
gfs2 We've only got three patches ready for this merge window: 2019-03-09 11:52:11 -08:00
hfs hfs: do not free node before using 2018-11-30 14:56:14 -08:00
hfsplus hfsplus: return file attributes on statx 2019-01-04 13:13:47 -08:00
hostfs
hpfs hpfs: fix spelling mistake "partion" -> "partition" 2019-03-12 09:58:03 -07:00
hugetlbfs hugetlbfs: fix memory leak for resv_map 2019-04-05 16:02:31 -10:00
isofs
jbd2 jbd2: jbd2_get_transaction does not need to return a value 2019-03-01 00:36:57 -05:00
jffs2 jffs2: fix use-after-free on symlink traversal 2019-04-01 00:31:02 -04:00
jfs
kernfs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
lockd NFS: fix mount/umount race in nlmclnt. 2019-03-18 22:39:34 -04:00
minix
nfs NFSv4.1 fix incorrect return value in copy_file_range 2019-04-11 15:23:48 -04:00
nfs_common
nfsd nfsd: wake blocked file lock waiters before sending callback 2019-04-22 15:38:41 -04:00
nilfs2 XArray: Change xa_insert to return -EBUSY 2019-02-06 13:12:15 -05:00
nls
notify fanotify: Allow copying of file handle to userspace 2019-03-19 09:29:07 +01:00
ntfs mm: convert totalram_pages and totalhigh_pages variables to atomic 2018-12-28 12:11:47 -08:00
ocfs2 ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock 2019-03-29 10:01:37 -07:00
omfs
openpromfs fs/openpromfs: Use of_node_name_eq for node name comparisons 2018-11-18 13:35:19 -08:00
orangefs Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 13:27:20 -07:00
overlayfs ovl: Do not lose security.capability xattr over metadata file copy-up 2019-02-13 11:14:46 +01:00
proc fs/proc/proc_sysctl.c: Fix a NULL pointer dereference 2019-04-26 09:18:05 -07:00
pstore pstore/ram: Avoid needless alloc during header write 2019-02-12 13:45:53 -08:00
qnx4
qnx6
quota quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON,OFF} quotactls. 2018-12-18 18:29:15 +01:00
ramfs
reiserfs reiserfs: remove workaround code for GCC 3.x 2018-10-31 08:54:14 -07:00
romfs
squashfs
sysfs Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-16 10:31:02 -07:00
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-11-10 08:02:40 -05:00
tracefs
ubifs ubifs: fix use-after-free on symlink traversal 2019-04-01 00:31:02 -04:00
udf udf: Propagate errors from udf_truncate_extents() 2019-03-18 16:30:02 +01:00
ufs
xfs xfs: serialize unaligned dio writes against all other dio writes 2019-03-26 08:37:55 -07:00
aio.c aio: use kmem_cache_free() instead of kfree() 2019-04-04 20:13:59 -04:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c a.out: remove core dumping support 2019-03-05 10:00:35 -08:00
binfmt_elf_fdpic.c
binfmt_elf.c fs/binfmt_elf.c: spread const a little 2019-03-07 18:32:01 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c exec: load_script: Do not exec truncated interpreter path 2019-02-18 16:49:36 -08:00
block_dev.c block: fix the return errno for direct IO 2019-04-11 21:22:21 -06:00
buffer.c fs: fix guard_bio_eod to check for real EOD errors 2019-02-28 13:59:41 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c media updates for v4.20-rc1 2018-10-29 14:29:58 -07:00
compat.c
coredump.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
d_path.c
dax.c fs/dax: Deposit pagetable even when installing zero page 2019-03-13 13:58:46 -07:00
dcache.c fs/dcache: Track & report number of negative dentries 2019-01-30 11:02:11 -08:00
dcookies.c
direct-io.c block: allow bio_for_each_segment_all() to iterate over multi-page bvec 2019-02-15 08:40:11 -07:00
drop_caches.c fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() 2019-02-01 15:46:24 -08:00
eventfd.c
eventpoll.c epoll: use rwlock in order to reduce ep_poll_callback() contention 2019-03-07 18:32:01 -08:00
exec.c exec: increase BINPRM_BUF_SIZE to 256 2019-03-07 18:32:01 -08:00
fcntl.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
fhandle.c
file_table.c fs: add fget_many() and fput_many() 2019-02-28 08:24:23 -07:00
file.c io_uring-2019-03-06 2019-03-08 14:48:40 -08:00
filesystems.c vfs: Implement a filesystem superblock creation/configuration context 2019-02-28 03:29:26 -05:00
fs_context.c vfs: Implement logging through fs_context 2019-02-28 03:29:37 -05:00
fs_parser.c fs: fs_parser: fix printk format warning 2019-03-29 10:01:38 -07:00
fs_pin.c
fs_struct.c
fs_types.c fs: common implementation of file type 2019-01-21 17:48:13 +01:00
fs-writeback.c writeback: synchronize sync(2) against cgroup writeback membership switches 2019-01-22 14:39:38 -07:00
inode.c fs/inode.c: inode_set_flags(): replace opencoded set_mask_bits() 2019-03-05 21:07:13 -08:00
internal.h vfs: Add configuration parser helpers 2019-02-28 03:28:53 -05:00
io_uring.c io_uring: remove 'state' argument from io_{read,write} path 2019-04-23 08:17:58 -06:00
ioctl.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
iomap.c block: add BIO_NO_PAGE_REF flag 2019-03-18 10:44:48 -06:00
Kconfig Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
Kconfig.binfmt
libfs.c
locks.c locks: wake any locks blocked on request before deadlock check 2019-03-25 08:36:24 -04:00
Makefile Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
mbcache.c
mount.h saner handling of temporary namespaces 2019-01-30 17:44:07 -05:00
mpage.c block: allow bio_for_each_segment_all() to iterate over multi-page bvec 2019-02-15 08:40:11 -07:00
namei.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
namespace.c Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-03-12 14:08:19 -07:00
no-block.c
nsfs.c
open.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-04-06 07:01:55 -10:00
pipe.c Merge branch 'page-refs' (page ref overflow) 2019-04-14 15:09:40 -07:00
pnode.c separate copying and locking mount tree on cross-userns copies 2019-01-30 17:14:50 -05:00
pnode.h separate copying and locking mount tree on cross-userns copies 2019-01-30 17:14:50 -05:00
posix_acl.c
proc_namespace.c
read_write.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-04-06 07:01:55 -10:00
readdir.c Remove 'type' argument from access_ok() function 2019-01-03 18:57:57 -08:00
select.c y2038: syscalls: rename y2038 compat syscalls 2019-02-07 00:13:27 +01:00
seq_file.c
signalfd.c signal: Distinguish between kernel_siginfo and siginfo 2018-10-03 16:47:43 +02:00
splice.c There tracing fixes: 2019-04-26 11:09:55 -07:00
stack.c
stat.c fs: move generic stat response attr handling to vfs_getattr_nosec 2019-02-01 01:55:45 -05:00
statfs.c vfs: add vfs_get_fsid() helper 2019-02-07 16:38:35 +01:00
super.c vfs: Add some logging to the core users of the fs_context log 2019-02-28 03:29:38 -05:00
sync.c
timerfd.c y2038: syscalls: rename y2038 compat syscalls 2019-02-07 00:13:27 +01:00
userfaultfd.c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-04-19 09:46:05 -07:00
utimes.c y2038: syscalls: rename y2038 compat syscalls 2019-02-07 00:13:27 +01:00
xattr.c