linux_dsm_epyc7002/fs
Josef Bacik 18c850fdc5 btrfs: open device without device_list_mutex
There's long existed a lockdep splat because we open our bdev's under
the ->device_list_mutex at mount time, which acquires the bd_mutex.
Usually this goes unnoticed, but if you do loopback devices at all
suddenly the bd_mutex comes with a whole host of other dependencies,
which results in the splat when you mount a btrfs file system.

======================================================
WARNING: possible circular locking dependency detected
5.8.0-0.rc3.1.fc33.x86_64+debug #1 Not tainted
------------------------------------------------------
systemd-journal/509 is trying to acquire lock:
ffff970831f84db0 (&fs_info->reloc_mutex){+.+.}-{3:3}, at: btrfs_record_root_in_trans+0x44/0x70 [btrfs]

but task is already holding lock:
ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

 -> #6 (sb_pagefaults){.+.+}-{0:0}:
       __sb_start_write+0x13e/0x220
       btrfs_page_mkwrite+0x59/0x560 [btrfs]
       do_page_mkwrite+0x4f/0x130
       do_wp_page+0x3b0/0x4f0
       handle_mm_fault+0xf47/0x1850
       do_user_addr_fault+0x1fc/0x4b0
       exc_page_fault+0x88/0x300
       asm_exc_page_fault+0x1e/0x30

 -> #5 (&mm->mmap_lock#2){++++}-{3:3}:
       __might_fault+0x60/0x80
       _copy_from_user+0x20/0xb0
       get_sg_io_hdr+0x9a/0xb0
       scsi_cmd_ioctl+0x1ea/0x2f0
       cdrom_ioctl+0x3c/0x12b4
       sr_block_ioctl+0xa4/0xd0
       block_ioctl+0x3f/0x50
       ksys_ioctl+0x82/0xc0
       __x64_sys_ioctl+0x16/0x20
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #4 (&cd->lock){+.+.}-{3:3}:
       __mutex_lock+0x7b/0x820
       sr_block_open+0xa2/0x180
       __blkdev_get+0xdd/0x550
       blkdev_get+0x38/0x150
       do_dentry_open+0x16b/0x3e0
       path_openat+0x3c9/0xa00
       do_filp_open+0x75/0x100
       do_sys_openat2+0x8a/0x140
       __x64_sys_openat+0x46/0x70
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #3 (&bdev->bd_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7b/0x820
       __blkdev_get+0x6a/0x550
       blkdev_get+0x85/0x150
       blkdev_get_by_path+0x2c/0x70
       btrfs_get_bdev_and_sb+0x1b/0xb0 [btrfs]
       open_fs_devices+0x88/0x240 [btrfs]
       btrfs_open_devices+0x92/0xa0 [btrfs]
       btrfs_mount_root+0x250/0x490 [btrfs]
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       vfs_kern_mount.part.0+0x71/0xb0
       btrfs_mount+0x119/0x380 [btrfs]
       legacy_get_tree+0x30/0x50
       vfs_get_tree+0x28/0xc0
       do_mount+0x8c6/0xca0
       __x64_sys_mount+0x8e/0xd0
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #2 (&fs_devs->device_list_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7b/0x820
       btrfs_run_dev_stats+0x36/0x420 [btrfs]
       commit_cowonly_roots+0x91/0x2d0 [btrfs]
       btrfs_commit_transaction+0x4e6/0x9f0 [btrfs]
       btrfs_sync_file+0x38a/0x480 [btrfs]
       __x64_sys_fdatasync+0x47/0x80
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #1 (&fs_info->tree_log_mutex){+.+.}-{3:3}:
       __mutex_lock+0x7b/0x820
       btrfs_commit_transaction+0x48e/0x9f0 [btrfs]
       btrfs_sync_file+0x38a/0x480 [btrfs]
       __x64_sys_fdatasync+0x47/0x80
       do_syscall_64+0x52/0xb0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9

 -> #0 (&fs_info->reloc_mutex){+.+.}-{3:3}:
       __lock_acquire+0x1241/0x20c0
       lock_acquire+0xb0/0x400
       __mutex_lock+0x7b/0x820
       btrfs_record_root_in_trans+0x44/0x70 [btrfs]
       start_transaction+0xd2/0x500 [btrfs]
       btrfs_dirty_inode+0x44/0xd0 [btrfs]
       file_update_time+0xc6/0x120
       btrfs_page_mkwrite+0xda/0x560 [btrfs]
       do_page_mkwrite+0x4f/0x130
       do_wp_page+0x3b0/0x4f0
       handle_mm_fault+0xf47/0x1850
       do_user_addr_fault+0x1fc/0x4b0
       exc_page_fault+0x88/0x300
       asm_exc_page_fault+0x1e/0x30

other info that might help us debug this:

Chain exists of:
  &fs_info->reloc_mutex --> &mm->mmap_lock#2 --> sb_pagefaults

Possible unsafe locking scenario:

     CPU0                    CPU1
     ----                    ----
 lock(sb_pagefaults);
                             lock(&mm->mmap_lock#2);
                             lock(sb_pagefaults);
 lock(&fs_info->reloc_mutex);

 *** DEADLOCK ***

3 locks held by systemd-journal/509:
 #0: ffff97083bdec8b8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x12e/0x4b0
 #1: ffff97083144d598 (sb_pagefaults){.+.+}-{0:0}, at: btrfs_page_mkwrite+0x59/0x560 [btrfs]
 #2: ffff97083144d6a8 (sb_internal){.+.+}-{0:0}, at: start_transaction+0x3f8/0x500 [btrfs]

stack backtrace:
CPU: 0 PID: 509 Comm: systemd-journal Not tainted 5.8.0-0.rc3.1.fc33.x86_64+debug #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
 dump_stack+0x92/0xc8
 check_noncircular+0x134/0x150
 __lock_acquire+0x1241/0x20c0
 lock_acquire+0xb0/0x400
 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 ? lock_acquire+0xb0/0x400
 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 __mutex_lock+0x7b/0x820
 ? btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 ? kvm_sched_clock_read+0x14/0x30
 ? sched_clock+0x5/0x10
 ? sched_clock_cpu+0xc/0xb0
 btrfs_record_root_in_trans+0x44/0x70 [btrfs]
 start_transaction+0xd2/0x500 [btrfs]
 btrfs_dirty_inode+0x44/0xd0 [btrfs]
 file_update_time+0xc6/0x120
 btrfs_page_mkwrite+0xda/0x560 [btrfs]
 ? sched_clock+0x5/0x10
 do_page_mkwrite+0x4f/0x130
 do_wp_page+0x3b0/0x4f0
 handle_mm_fault+0xf47/0x1850
 do_user_addr_fault+0x1fc/0x4b0
 exc_page_fault+0x88/0x300
 ? asm_exc_page_fault+0x8/0x30
 asm_exc_page_fault+0x1e/0x30
RIP: 0033:0x7fa3972fdbfe
Code: Bad RIP value.

Fix this by not holding the ->device_list_mutex at this point.  The
device_list_mutex exists to protect us from modifying the device list
while the file system is running.

However it can also be modified by doing a scan on a device.  But this
action is specifically protected by the uuid_mutex, which we are holding
here.  We cannot race with opening at this point because we have the
->s_mount lock held during the mount.  Not having the
->device_list_mutex here is perfectly safe as we're not going to change
the devices at this point.

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add some comments ]
Signed-off-by: David Sterba <dsterba@suse.com>
2020-07-27 12:55:46 +02:00
..
9p 9p: read only once on O_NONBLOCK 2020-03-27 09:29:56 +00:00
adfs docs: filesystems: fix renamed references 2020-04-20 15:45:22 -06:00
affs docs: filesystems: fix renamed references 2020-04-20 15:45:22 -06:00
afs afs: Fix interruption of operations 2020-07-15 15:49:04 -07:00
autofs autofs: switch to kernel_write 2020-07-08 08:27:56 +02:00
befs
bfs docs: filesystems: fix renamed references 2020-04-20 15:45:22 -06:00
btrfs btrfs: open device without device_list_mutex 2020-07-27 12:55:46 +02:00
cachefiles cachefiles: switch to kernel_write 2020-07-08 08:27:56 +02:00
ceph ceph: skip checking caps when session reconnecting and releasing reqs 2020-06-01 13:22:53 +02:00
cifs Revert "cifs: Fix the target file was deleted when rename failed." 2020-07-23 15:44:11 -05:00
coda docs: filesystems: convert coda.txt to ReST 2020-05-05 09:22:21 -06:00
configfs A fair amount of stuff this time around, dominated by yet another massive 2020-06-01 15:45:27 -07:00
cramfs docs: filesystems: fix renamed references 2020-04-20 15:45:22 -06:00
crypto fscrypt updates for 5.8 2020-06-01 12:10:17 -07:00
debugfs Merge 5.7-rc3 into driver-core-next 2020-04-27 09:34:55 +02:00
devpts
dlm dlm for 5.8 2020-06-05 16:43:16 -07:00
ecryptfs A fair amount of stuff this time around, dominated by yet another massive 2020-06-01 15:45:27 -07:00
efivarfs efi/efivars: Expose RT service availability via efivars abstraction 2020-07-09 10:14:29 +03:00
efs
erofs erofs: fix partially uninitialized misuse in z_erofs_onlinepage_fixup 2020-06-24 09:47:44 +08:00
exfat exfat: fix name_hash computation on big endian systems 2020-07-21 10:44:19 +09:00
exportfs
ext2 mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
ext4 This is the second round of ext4 commits for 5.8 merge window. It 2020-06-15 09:32:10 -07:00
f2fs f2fs-for-5.8-rc1 2020-06-09 11:28:59 -07:00
fat fat: improve the readahead for FAT entries 2020-06-04 19:06:25 -07:00
freevxfs
fscache Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-06-03 16:27:18 -07:00
fuse fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS 2020-07-15 14:18:20 +02:00
gfs2 gfs2: Rework read and page fault locking 2020-07-07 23:40:12 +02:00
hfs for-5.8/block-2020-06-01 2020-06-02 15:29:19 -07:00
hfsplus block: remove the error_sector argument to blkdev_issue_flush 2020-05-22 08:45:46 -06:00
hostfs hostfs: Use kasprintf() instead of fixed buffer formatting 2020-03-29 23:23:00 +02:00
hpfs hpfs: fix warning due to superfluous semicolon 2020-06-06 10:08:17 -07:00
hugetlbfs mmap locking API: convert mmap_sem API comments 2020-06-09 09:39:14 -07:00
iomap New code for 5.8: 2020-06-13 12:44:30 -07:00
isofs for-5.8/block-2020-06-01 2020-06-02 15:29:19 -07:00
jbd2 This is the second round of ext4 commits for 5.8 merge window. It 2020-06-15 09:32:10 -07:00
jffs2 jffs2: Replace zero-length array with flexible-array 2020-06-15 23:08:31 -05:00
jfs Replace zero-length array in JFS 2020-06-02 20:11:35 -07:00
kernfs mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
lockd
minix
nfs SUNRPC reverting d03727b248 ("NFSv4 fix CLOSE not waiting for direct IO compeletion") 2020-07-17 14:47:38 -04:00
nfs_common
nfsd nfsd4: fix NULL dereference in nfsd/clients display code 2020-07-22 16:47:14 -04:00
nilfs2 nilfs2: fix null pointer dereference at nilfs_segctor_do_construct() 2020-06-10 19:14:17 -07:00
nls treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
notify treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
ntfs Merge branch 'akpm' (patches from Andrew) 2020-06-02 12:21:36 -07:00
ocfs2 ocfs2: fix value of OCFS2_INVALID_SLOT 2020-06-26 00:27:37 -07:00
omfs fs: convert mpage_readpages to mpage_readahead 2020-06-02 10:59:07 -07:00
openpromfs
orangefs orangefs: a conversion and a cleanup... 2020-06-05 16:44:36 -07:00
overlayfs ovl: fix lookup of indexed hardlinks with metacopy 2020-07-16 07:24:47 +02:00
proc Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-07-03 23:20:14 -07:00
pstore Merge branch 'uaccess.__copy_from_user' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-06-01 16:18:46 -07:00
qnx4
qnx6 fs: convert mpage_readpages to mpage_readahead 2020-06-02 10:59:07 -07:00
quota sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
ramfs
reiserfs \n 2020-06-04 13:53:10 -07:00
romfs treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
squashfs squashfs: fix length field overlap check in metadata reading 2020-07-24 12:42:41 -07:00
sysfs RDMA 5.8 merge window pull request 2020-06-05 14:05:57 -07:00
sysv docs: filesystems: fix renamed references 2020-04-20 15:45:22 -06:00
tracefs
ubifs mm: remove the pgprot argument to __vmalloc 2020-06-02 10:59:11 -07:00
udf for-5.8/block-2020-06-01 2020-06-02 15:29:19 -07:00
ufs
unicode .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
vboxsf vboxsf: don't use the source name in the bdi name 2020-05-07 08:45:47 -06:00
verity fs-verity: remove unnecessary extern keywords 2020-05-12 16:44:00 -07:00
xfs xfs: fix use-after-free on CIL context on shutdown 2020-06-22 19:22:57 -07:00
zonefs zonefs: count pages after truncating the iterator 2020-07-20 17:59:31 +09:00
aio.c aio: Replace zero-length array with flexible-array 2020-06-15 23:08:25 -05:00
anon_inodes.c
attr.c
bad_inode.c fs: move the fiemap definitions out of fs.h 2020-06-03 23:16:55 -04:00
binfmt_aout.c exec: Rename flush_old_exec begin_new_exec 2020-05-07 16:55:47 -05:00
binfmt_elf_fdpic.c Merge branch 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-06-10 16:02:54 -07:00
binfmt_elf.c Merge branch 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-06-10 16:02:54 -07:00
binfmt_em86.c Merge branch 'akpm' (patches from Andrew) 2020-06-04 19:18:29 -07:00
binfmt_flat.c Merge branch 'uaccess.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-06-10 16:02:54 -07:00
binfmt_misc.c Merge branch 'akpm' (patches from Andrew) 2020-06-04 19:18:29 -07:00
binfmt_script.c Merge branch 'akpm' (patches from Andrew) 2020-06-04 19:18:29 -07:00
block_dev.c block: make function 'kill_bdev' static 2020-06-18 09:24:35 -06:00
buffer.c fs/buffer.c: use attach/detach_page_private 2020-06-02 10:59:07 -07:00
char_dev.c vfs: allow unprivileged whiteout creation 2020-05-14 16:44:23 +02:00
compat_binfmt_elf.c Split the old READ_IMPLIES_EXEC workaround from executable PT_GNU_STACK 2020-06-05 13:45:21 -07:00
compat.c
coredump.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
d_path.c
dax.c dax,iomap: Add helper dax_iomap_zero() to zero a range 2020-04-02 19:15:03 -07:00
dcache.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next 2020-06-03 16:27:18 -07:00
dcookies.c
direct-io.c for-5.8-part2-tag 2020-06-14 09:47:25 -07:00
drop_caches.c sysctl: pass kernel pointers to ->proc_handler 2020-04-27 02:07:40 -04:00
eventfd.c eventfd: convert to f_op->read_iter() 2020-05-06 22:33:43 -04:00
eventpoll.c epoll: call final ep_events_available() check under the lock 2020-05-14 10:00:35 -07:00
exec.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
fcntl.c
fhandle.c
file_table.c Revert "fs: Do not check if there is a fsnotify watcher on pseudo inodes" 2020-06-29 09:40:55 -07:00
file.c fix multiplication overflow in copy_fdtable() 2020-05-19 18:29:36 -04:00
filesystems.c fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() 2020-04-10 15:36:22 -07:00
fs_context.c vfs: don't parse "silent" option 2020-05-14 16:44:25 +02:00
fs_parser.c fs_parse: remove pr_notice() about each validation 2020-04-02 09:35:26 -07:00
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c A lot of bug fixes and cleanups for ext4, including: 2020-06-05 16:19:28 -07:00
fsopen.c
inode.c AFS Changes 2020-06-05 16:26:36 -07:00
internal.h A lot of bug fixes and cleanups for ext4, including: 2020-06-05 16:19:28 -07:00
io_uring.c io_uring: missed req_init_async() for IOSQE_ASYNC 2020-07-23 11:20:55 -06:00
io-wq.c io_uring: cancel all task's requests on exit 2020-06-15 08:51:34 -06:00
io-wq.h io_uring: cancel by ->task not pid 2020-06-15 08:51:38 -06:00
ioctl.c fs: remove the access_ok() check in ioctl_fiemap 2020-06-03 23:16:55 -04:00
Kconfig treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
Kconfig.binfmt treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
libfs.c block: remove the error_sector argument to blkdev_issue_flush 2020-05-22 08:45:46 -06:00
locks.c Highlights: 2020-06-11 10:33:13 -07:00
Makefile
mbcache.c
mount.h proc/mounts: add cursor 2020-05-14 16:44:24 +02:00
mpage.c fs: convert mpage_readpages to mpage_readahead 2020-06-02 10:59:07 -07:00
namei.c vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK 2020-06-08 11:04:19 -07:00
namespace.c fuse: reject options on reconfigure via fsconfig(2) 2020-07-14 14:45:41 +02:00
no-block.c
nsfs.c nsproxy: attach to namespaces via pidfds 2020-05-13 11:41:22 +02:00
open.c Merge branch 'akpm' (patches from Andrew) 2020-06-02 12:21:36 -07:00
pipe.c Notifications over pipes + Keyring notifications 2020-06-13 09:56:21 -07:00
pnode.c propagate_one(): mnt_set_mountpoint() needs mount_lock 2020-04-27 10:37:14 -04:00
pnode.h
posix_acl.c vfs: clean up posix_acl_permission() logic aroudn MAY_NOT_BLOCK 2020-06-08 11:04:19 -07:00
proc_namespace.c Merge branch 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-06-04 13:54:34 -07:00
read_write.c fs: remove __vfs_read 2020-07-08 08:27:57 +02:00
readdir.c readdir.c: get rid of the last __put_user(), drop now-useless access_ok() 2020-05-01 20:29:54 -04:00
select.c pselect6() and friends: take handling the combined 6th/7th args into helper 2020-05-29 19:10:42 -04:00
seq_file.c fs/seq_file.c: seq_read: Update pr_info_ratelimited 2020-06-04 19:06:25 -07:00
signalfd.c
splice.c Notifications over pipes + Keyring notifications 2020-06-13 09:56:21 -07:00
stack.c
stat.c New code for 5.8: 2020-06-02 19:45:12 -07:00
statfs.c
super.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-06-10 16:09:11 -07:00
sync.c overlayfs update for 5.8 2020-06-09 15:40:50 -07:00
timerfd.c
userfaultfd.c mmap locking API: convert mmap_sem comments 2020-06-09 09:39:14 -07:00
utimes.c utimensat: AT_EMPTY_PATH support 2020-05-14 16:44:24 +02:00
xattr.c xattr: fix uninitialized out-param 2020-04-09 15:33:09 -04:00