linux_dsm_epyc7002/fs/btrfs
Filipe Manana a6703c7115 btrfs: fix crash after non-aligned direct IO write with O_DSYNC
Whenever we attempt to do a non-aligned direct IO write with O_DSYNC, we
end up triggering an assertion and crashing. Example reproducer:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdj
  MNT=/mnt/sdj

  mkfs.btrfs -f $DEV > /dev/null
  mount $DEV $MNT

  # Do a direct IO write with O_DSYNC into a non-aligned range...
  xfs_io -f -d -s -c "pwrite -S 0xab -b 64K 1111 64K" $MNT/foobar

  umount $MNT

When running the reproducer an assertion fails and produces the following
trace:

  [ 2418.403134] assertion failed: !current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA, in fs/btrfs/space-info.c:1467
  [ 2418.403745] ------------[ cut here ]------------
  [ 2418.404306] kernel BUG at fs/btrfs/ctree.h:3286!
  [ 2418.404862] invalid opcode: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC PTI
  [ 2418.405451] CPU: 1 PID: 64705 Comm: xfs_io Tainted: G      D           5.10.15-btrfs-next-87 #1
  [ 2418.406026] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
  [ 2418.407228] RIP: 0010:assertfail.constprop.0+0x18/0x26 [btrfs]
  [ 2418.407835] Code: e6 48 c7 (...)
  [ 2418.409078] RSP: 0018:ffffb06080d13c98 EFLAGS: 00010246
  [ 2418.409696] RAX: 000000000000006c RBX: ffff994c1debbf08 RCX: 0000000000000000
  [ 2418.410302] RDX: 0000000000000000 RSI: 0000000000000027 RDI: 00000000ffffffff
  [ 2418.410904] RBP: ffff994c21770000 R08: 0000000000000000 R09: 0000000000000000
  [ 2418.411504] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000010000
  [ 2418.412111] R13: ffff994c22198400 R14: ffff994c21770000 R15: 0000000000000000
  [ 2418.412713] FS:  00007f54fd7aff00(0000) GS:ffff994d35200000(0000) knlGS:0000000000000000
  [ 2418.413326] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 2418.413933] CR2: 000056549596d000 CR3: 000000010b928003 CR4: 0000000000370ee0
  [ 2418.414528] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [ 2418.415109] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [ 2418.415669] Call Trace:
  [ 2418.416254]  btrfs_reserve_data_bytes.cold+0x22/0x22 [btrfs]
  [ 2418.416812]  btrfs_check_data_free_space+0x4c/0xa0 [btrfs]
  [ 2418.417380]  btrfs_buffered_write+0x1b0/0x7f0 [btrfs]
  [ 2418.418315]  btrfs_file_write_iter+0x2a9/0x770 [btrfs]
  [ 2418.418920]  new_sync_write+0x11f/0x1c0
  [ 2418.419430]  vfs_write+0x2bb/0x3b0
  [ 2418.419972]  __x64_sys_pwrite64+0x90/0xc0
  [ 2418.420486]  do_syscall_64+0x33/0x80
  [ 2418.420979]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  [ 2418.421486] RIP: 0033:0x7f54fda0b986
  [ 2418.421981] Code: 48 c7 c0 (...)
  [ 2418.423019] RSP: 002b:00007ffc40569c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
  [ 2418.423547] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f54fda0b986
  [ 2418.424075] RDX: 0000000000010000 RSI: 000056549595e000 RDI: 0000000000000003
  [ 2418.424596] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000400
  [ 2418.425119] R10: 0000000000000400 R11: 0000000000000246 R12: 00000000ffffffff
  [ 2418.425644] R13: 0000000000000400 R14: 0000000000010000 R15: 0000000000000000
  [ 2418.426148] Modules linked in: btrfs blake2b_generic (...)
  [ 2418.429540] ---[ end trace ef2aeb44dc0afa34 ]---

1) At btrfs_file_write_iter() we set current->journal_info to
   BTRFS_DIO_SYNC_STUB;

2) We then call __btrfs_direct_write(), which calls btrfs_direct_IO();

3) We can't do the direct IO write because it starts at a non-aligned
   offset (1111). So at btrfs_direct_IO() we return -EINVAL (coming from
   check_direct_IO() which does the alignment check), but we leave
   current->journal_info set to BTRFS_DIO_SYNC_STUB - we only clear it
   at btrfs_dio_iomap_begin(), because we assume we always get there;

4) Then at __btrfs_direct_write() we see that the attempt to do the
   direct IO write was not successful, 0 bytes written, so we fallback
   to a buffered write by calling btrfs_buffered_write();

5) There we call btrfs_check_data_free_space() which in turn calls
   btrfs_alloc_data_chunk_ondemand() and that calls
   btrfs_reserve_data_bytes() with flush == BTRFS_RESERVE_FLUSH_DATA;

6) Then at btrfs_reserve_data_bytes() we have current->journal_info set to
   BTRFS_DIO_SYNC_STUB, therefore not NULL, and flush has the value
   BTRFS_RESERVE_FLUSH_DATA, triggering the second assertion:

  int btrfs_reserve_data_bytes(struct btrfs_fs_info *fs_info, u64 bytes,
                               enum btrfs_reserve_flush_enum flush)
  {
      struct btrfs_space_info *data_sinfo = fs_info->data_sinfo;
      int ret;

      ASSERT(flush == BTRFS_RESERVE_FLUSH_DATA ||
             flush == BTRFS_RESERVE_FLUSH_FREE_SPACE_INODE);
      ASSERT(!current->journal_info || flush != BTRFS_RESERVE_FLUSH_DATA);
  (...)

So fix that by setting the journal to NULL whenever check_direct_IO()
returns a failure.

This bug only affects 5.10 kernels, and the regression was introduced in
5.10-rc1 by commit 0eb79294db ("btrfs: dio iomap DSYNC workaround").
The bug does not exist in 5.11 kernels due to commit ecfdc08b8cc65d
("btrfs: remove dio iomap DSYNC workaround"), which depends on a large
patchset that went into the merge window for 5.11. So this is a fix only
for 5.10.x stable kernels, as there are people hitting this bug.

Fixes: 0eb79294db ("btrfs: dio iomap DSYNC workaround")
CC: stable@vger.kernel.org # 5.10 (and only 5.10)
Acked-by: David Sterba <dsterba@suse.com>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1181605
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-02-23 15:53:25 +01:00
..
tests btrfs: fix missing delalloc new bit for new delalloc ranges 2020-11-13 22:15:59 +01:00
acl.c
async-thread.c
async-thread.h
backref.c btrfs: do not double free backref nodes on error 2021-01-27 11:54:52 +01:00
backref.h btrfs: rename BTRFS_ROOT_REF_COWS to BTRFS_ROOT_SHAREABLE 2020-05-25 11:25:35 +02:00
block-group.c btrfs: fix possible free space tree corruption with online conversion 2021-02-03 23:28:40 +01:00
block-group.h btrfs: convert block group refcount to refcount_t 2020-07-27 12:55:42 +02:00
block-rsv.c btrfs: print the block rsv type when we fail our reservation 2020-11-05 13:02:05 +01:00
block-rsv.h
btrfs_inode.h btrfs: fix deadlock when cloning inline extent and low on free metadata space 2021-01-17 14:16:54 +01:00
check-integrity.c btrfs: check-integrity: remove unnecessary failure messages during memory allocation 2020-07-27 12:55:21 +02:00
check-integrity.h
compression.c btrfs: compression: move declarations to header 2020-10-07 12:06:55 +02:00
compression.h btrfs: compression: move declarations to header 2020-10-07 12:06:55 +02:00
ctree.c btrfs: cleanup cow block on error 2020-10-07 12:17:59 +02:00
ctree.h btrfs: fix backport of 2175bf57dc in 5.10.13 2021-02-23 15:53:25 +01:00
delalloc-space.c btrfs: add btrfs_reserve_data_bytes and use it 2020-10-07 12:06:52 +02:00
delalloc-space.h btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
delayed-inode.c btrfs: sink total_data parameter in setup_items_for_insert 2020-10-07 12:13:18 +02:00
delayed-inode.h
delayed-ref.c
delayed-ref.h
dev-replace.c btrfs: fix deadlock when cloning inline extent and low on free metadata space 2021-01-17 14:16:54 +01:00
dev-replace.h
dir-item.c
discard.c btrfs: merge critical sections of discard lock in workfn 2021-01-19 18:27:24 +01:00
discard.h
disk-io.c btrfs: print the actual offset in btrfs_root_name 2021-01-27 11:55:06 +01:00
disk-io.h btrfs: add a helper to read the tree_root commit root for backref lookup 2020-10-26 15:04:57 +01:00
export.c btrfs: simplify iget helpers 2020-05-25 11:25:37 +02:00
export.h
extent_io.c btrfs: prevent NULL pointer dereference in extent_io_tree_panic 2021-01-19 18:27:17 +01:00
extent_io.h btrfs: remove struct extent_io_ops 2020-10-07 12:13:25 +02:00
extent_map.c
extent_map.h
extent-io-tree.h btrfs: remove struct extent_io_ops 2020-10-07 12:13:25 +02:00
extent-tree.c btrfs: don't get an EINTR during drop_snapshot for reloc 2021-01-27 11:54:52 +01:00
file-item.c btrfs: make btrfs_find_ordered_sum take btrfs_inode 2020-10-07 12:12:19 +02:00
file.c btrfs: fix missing delalloc new bit for new delalloc ranges 2020-11-13 22:15:59 +01:00
free-space-cache.c btrfs: free-space-cache: use unaligned helpers to access data 2020-10-07 12:13:23 +02:00
free-space-cache.h btrfs: let btrfs_return_cluster_to_free_space() return void 2020-07-27 12:55:21 +02:00
free-space-tree.c btrfs: fix possible free space tree corruption with online conversion 2021-02-03 23:28:40 +01:00
free-space-tree.h
inode-item.c
inode-map.c btrfs: make btrfs_delalloc_reserve_space take btrfs_inode 2020-07-27 12:55:36 +02:00
inode-map.h
inode.c btrfs: fix crash after non-aligned direct IO write with O_DSYNC 2021-02-23 15:53:25 +01:00
ioctl.c btrfs: fix deadlock when cloning inline extent and low on free metadata space 2021-01-17 14:16:54 +01:00
Kconfig btrfs: switch to iomap for direct IO 2020-10-07 12:06:57 +02:00
locking.c btrfs: add nesting tags to the locking helpers 2020-10-07 12:12:16 +02:00
locking.h btrfs: introduce BTRFS_NESTING_NEW_ROOT for adding new roots 2020-10-07 12:12:17 +02:00
lzo.c
Makefile
misc.h
ordered-data.c btrfs: remove inode argument from btrfs_start_ordered_extent 2020-10-07 12:13:22 +02:00
ordered-data.h btrfs: remove inode argument from btrfs_start_ordered_extent 2020-10-07 12:13:22 +02:00
orphan.c
print-tree.c btrfs: print the actual offset in btrfs_root_name 2021-01-27 11:55:06 +01:00
print-tree.h btrfs: print the actual offset in btrfs_root_name 2021-01-27 11:55:06 +01:00
props.c btrfs: simplify iget helpers 2020-05-25 11:25:37 +02:00
props.h
qgroup.c btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan 2021-01-19 18:27:24 +01:00
qgroup.h btrfs: qgroup: export qgroups in sysfs 2020-07-27 12:55:37 +02:00
raid56.c btrfs: raid56: remove out label in __raid56_parity_recover 2020-07-27 12:55:44 +02:00
raid56.h
rcu-string.h
reada.c btrfs: fix readahead hang and use-after-free after removing a device 2020-10-26 15:03:59 +01:00
ref-verify.c btrfs: ref-verify: fix memory leak in btrfs_ref_tree_mod 2020-11-05 13:03:39 +01:00
ref-verify.h
reflink.c btrfs: fix deadlock when cloning inline extent and low on free metadata space 2021-01-17 14:16:54 +01:00
reflink.h
relocation.c btrfs: reloc: fix wrong file extent type check to avoid false ENOENT 2021-01-19 18:27:17 +01:00
root-tree.c btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations 2020-10-07 12:12:13 +02:00
scrub.c btrfs: scrub: update message regarding read-only status 2020-11-05 13:02:58 +01:00
send.c btrfs: send: fix invalid clone operations when cloning from the same file and root 2021-01-27 11:54:53 +01:00
send.h btrfs: send: avoid copying file data 2020-10-07 12:13:17 +02:00
space-info.c btrfs: shrink delalloc pages instead of full inodes 2021-01-17 14:16:54 +01:00
space-info.h btrfs: add btrfs_reserve_data_bytes and use it 2020-10-07 12:06:52 +02:00
struct-funcs.c btrfs: use unaligned helpers for stack and header set/get helpers 2020-10-07 12:13:23 +02:00
super.c btrfs: fix transaction leak and crash after RO remount caused by qgroup rescan 2021-01-19 18:27:24 +01:00
sysfs.c btrfs: do not create raid sysfs entries under any locks 2020-10-07 12:13:19 +02:00
sysfs.h btrfs: split and refactor btrfs_sysfs_remove_devices_dir 2020-10-07 12:12:21 +02:00
transaction.c btrfs: update last_byte_to_unpin in switch_commit_roots 2020-12-30 11:54:13 +01:00
transaction.h btrfs: dio iomap DSYNC workaround 2020-10-07 12:06:57 +02:00
tree-checker.c btrfs: tree-checker: check if chunk item end overflows 2021-01-19 18:27:22 +01:00
tree-checker.h
tree-defrag.c btrfs: remove unused btrfs_root::defrag_trans_start 2020-07-27 12:55:28 +02:00
tree-log.c btrfs: skip unnecessary searches for xattrs when logging an inode 2021-01-17 14:16:53 +01:00
tree-log.h btrfs: make fast fsyncs wait only for writeback 2020-10-07 12:06:56 +02:00
ulist.c
ulist.h
uuid-tree.c btrfs: simplify root lookup by id 2020-05-25 11:25:36 +02:00
volumes.c btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch 2021-02-03 23:28:40 +01:00
volumes.h btrfs: fix lockdep warning due to seqcount_mutex on 32bit arch 2021-02-03 23:28:40 +01:00
xattr.c btrfs: skip unnecessary searches for xattrs when logging an inode 2021-01-17 14:16:53 +01:00
xattr.h
zlib.c
zstd.c