linux_dsm_epyc7002/fs/btrfs
Qu Wenruo a94d1d0cb3 btrfs: Flush before reflinking any extent to prevent NOCOW write falling back to COW without data reservation
[BUG]
The following script can cause unexpected fsync failure:

  #!/bin/bash

  dev=/dev/test/test
  mnt=/mnt/btrfs

  mkfs.btrfs -f $dev -b 512M > /dev/null
  mount $dev $mnt -o nospace_cache

  # Prealloc one extent
  xfs_io -f -c "falloc 8k 64m" $mnt/file1
  # Fill the remaining data space
  xfs_io -f -c "pwrite 0 -b 4k 512M" $mnt/padding
  sync

  # Write into the prealloc extent
  xfs_io -c "pwrite 1m 16m" $mnt/file1

  # Reflink then fsync, fsync would fail due to ENOSPC
  xfs_io -c "reflink $mnt/file1 8k 0 4k" -c "fsync" $mnt/file1
  umount $dev

The fsync fails with ENOSPC, and the last page of the buffered write is
lost.

[CAUSE]
This is caused by:
- Btrfs' back reference only has extent level granularity
  So write into shared extent must be COWed even only part of the extent
  is shared.

So for above script we have:
- fallocate
  Create a preallocated extent where we can do NOCOW write.

- fill all the remaining data and unallocated space

- buffered write into preallocated space
  As we have not enough space available for data and the extent is not
  shared (yet) we fall into NOCOW mode.

- reflink
  Now part of the large preallocated extent is shared, later write
  into that extent must be COWed.

- fsync triggers writeback
  But now the extent is shared and therefore we must fallback into COW
  mode, which fails with ENOSPC since there's not enough space to
  allocate data extents.

[WORKAROUND]
The workaround is to ensure any buffered write in the related extents
(not just the reflink source range) get flushed before reflink/dedupe,
so that NOCOW writes succeed that happened before reflinking succeed.

The workaround is expensive, we could do it better by only flushing
NOCOW range, but that needs extra accounting for NOCOW range.
For now, fix the possible data loss first.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-07-01 13:35:00 +02:00
..
tests for-5.2-tag 2019-05-07 11:34:19 -07:00
acl.c btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter 2019-04-29 19:02:44 +02:00
async-thread.c btrfs: simplify workqueue name when allocating 2019-02-25 14:13:24 +01:00
async-thread.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
backref.c btrfs: fiemap: preallocate ulists for btrfs_check_shared 2019-07-01 13:34:53 +02:00
backref.h btrfs: fiemap: preallocate ulists for btrfs_check_shared 2019-07-01 13:34:53 +02:00
btrfs_inode.h Btrfs: improve performance on fsync of files with multiple hardlinks 2019-04-29 19:02:52 +02:00
check-integrity.c btrfs: Fix typos in comments and strings 2018-12-17 14:51:50 +01:00
check-integrity.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
compression.c for-5.2-rc1-tag 2019-05-20 09:52:35 -07:00
compression.h btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
ctree.c btrfs: ctree: Dump the leaf before BUG_ON in btrfs_set_item_key_safe 2019-04-29 19:02:52 +02:00
ctree.h btrfs: remove mapping tree structures indirection 2019-07-01 13:34:56 +02:00
dedupe.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
delayed-inode.c btrfs: get fs_info from eb in btrfs_leaf_free_space 2019-04-29 19:02:30 +02:00
delayed-inode.h Btrfs: delayed-inode: use rb_first_cached for ins_root and del_root 2018-10-15 17:23:33 +02:00
delayed-ref.c btrfs: remove unused parameter fs_info from btrfs_add_delayed_extent_op 2019-04-29 19:02:51 +02:00
delayed-ref.h btrfs: remove unused parameter fs_info from btrfs_add_delayed_extent_op 2019-04-29 19:02:51 +02:00
dev-replace.c btrfs: remove mapping tree structures indirection 2019-07-01 13:34:56 +02:00
dev-replace.h btrfs: get fs_info from trans in btrfs_run_dev_replace 2019-04-29 19:02:43 +02:00
dir-item.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
disk-io.c btrfs: use u8 for raid_array members 2019-07-01 13:34:57 +02:00
disk-io.h btrfs: get fs_info from trans in btrfs_create_tree 2019-04-29 19:02:41 +02:00
export.c btrfs: Remove 'objectid' member from struct btrfs_root 2018-10-15 17:23:25 +02:00
export.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
extent_io.c btrfs: Use newly introduced btrfs_lock_and_flush_ordered_range 2019-07-01 13:34:59 +02:00
extent_io.h btrfs: Remove bio_offset argument from submit_bio_hook 2019-04-29 19:02:47 +02:00
extent_map.c btrfs: Optimize unallocated chunks discard 2019-04-29 19:02:38 +02:00
extent_map.h btrfs: Remove impossible condition from mergable_maps 2019-02-25 14:13:21 +01:00
extent-tree.c btrfs: extent-tree: Add trace events for space info numbers update 2019-07-01 13:34:58 +02:00
file-item.c btrfs: Document btrfs_csum_one_bio 2019-04-29 19:02:52 +02:00
file.c btrfs: Return EAGAIN if we can't start no snpashot write in check_can_nocow 2019-07-01 13:34:59 +02:00
free-space-cache.c btrfs: remove mapping tree structures indirection 2019-07-01 13:34:56 +02:00
free-space-cache.h btrfs: get fs_info from block group in btrfs_find_space_cluster 2019-04-29 19:02:46 +02:00
free-space-tree.c btrfs: get fs_info from block group in search_free_space_info 2019-04-29 19:02:46 +02:00
free-space-tree.h btrfs: get fs_info from block group in search_free_space_info 2019-04-29 19:02:46 +02:00
inode-item.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
inode-map.c btrfs: prune unused includes 2018-08-06 13:12:43 +02:00
inode-map.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
inode.c btrfs: Use newly introduced btrfs_lock_and_flush_ordered_range 2019-07-01 13:34:59 +02:00
ioctl.c btrfs: Flush before reflinking any extent to prevent NOCOW write falling back to COW without data reservation 2019-07-01 13:35:00 +02:00
Kconfig btrfs: add SPDX header to Kconfig 2018-04-12 16:29:55 +02:00
locking.c btrfs: trace: Introduce trace events for all btrfs tree locking events 2019-04-29 19:02:43 +02:00
locking.h btrfs: merge btrfs_set_lock_blocking_rw with it's caller 2019-02-25 14:13:28 +01:00
lzo.c btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
Makefile btrfs: Remove custom crc32c init code 2018-03-26 15:09:39 +02:00
math.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
ordered-data.c btrfs: Always use a cached extent_state in btrfs_lock_and_flush_ordered_range 2019-07-01 13:34:59 +02:00
ordered-data.h btrfs: add new helper btrfs_lock_and_flush_ordered_range 2019-07-01 13:34:59 +02:00
orphan.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
print-tree.c btrfs: get fs_info from eb in btrfs_leaf_free_space 2019-04-29 19:02:30 +02:00
print-tree.h btrfs: print-tree: debugging output enhancement 2018-04-20 19:18:16 +02:00
props.c btrfs: use the existing reserved items for our first prop for inheritance 2019-05-09 11:18:14 +02:00
props.h btrfs: delete unused function btrfs_set_prop_trans 2019-04-29 19:02:54 +02:00
qgroup.c btrfs: qgroup: Check bg while resuming relocation to avoid NULL pointer dereference 2019-05-28 18:54:10 +02:00
qgroup.h btrfs: qgroup: Move reserved data accounting from btrfs_delayed_ref_head to btrfs_qgroup_extent_record 2019-02-25 14:13:39 +01:00
raid56.c block: remove the i argument to bio_for_each_segment_all 2019-04-30 09:26:13 -06:00
raid56.h btrfs: constify map parameter for nr_parity_stripes and nr_data_stripes 2019-07-01 13:34:58 +02:00
rcu-string.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
reada.c btrfs: start readahead also in seed devices 2019-06-14 17:33:46 +02:00
ref-verify.c Wimplicit-fallthrough patches for 5.2-rc1 2019-05-07 12:48:10 -07:00
ref-verify.h btrfs: ref-verify: Use btrfs_ref to refactor btrfs_ref_tree_mod() 2019-04-29 19:02:49 +02:00
relocation.c btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON() 2019-05-28 18:54:10 +02:00
root-tree.c Btrfs: do not abort transaction at btrfs_update_root() after failure to COW path 2019-05-09 11:25:27 +02:00
scrub.c btrfs: read number of data stripes from map only once 2019-07-01 13:34:58 +02:00
send.c Btrfs: incremental send, fix emission of invalid clone operations 2019-05-28 18:54:10 +02:00
send.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
struct-funcs.c btrfs: prune unused includes 2018-08-06 13:12:43 +02:00
super.c btrfs: Remove unused variable mode in btrfs_mount 2019-07-01 13:34:55 +02:00
sysfs.c btrfs: sysfs: don't leak memory when failing add fsid 2019-05-16 14:31:12 +02:00
sysfs.h btrfs: drop extra enum initialization where using defaults 2018-12-17 14:51:43 +01:00
transaction.c Btrfs: remove no longer used member num_dirty_bgs from transaction 2019-04-29 19:02:43 +02:00
transaction.h Btrfs: remove no longer used member num_dirty_bgs from transaction 2019-04-29 19:02:43 +02:00
tree-checker.c btrfs: tree-checker: Check if the file extent end overflows 2019-07-01 13:34:55 +02:00
tree-checker.h btrfs: get fs_info from eb in btrfs_check_chunk_valid 2019-04-29 19:02:39 +02:00
tree-defrag.c btrfs: open code now trivial btrfs_set_lock_blocking 2019-02-25 14:13:27 +01:00
tree-log.c Btrfs: fix race updating log root item during fsync 2019-05-28 19:26:46 +02:00
tree-log.h btrfs: get fs_info from trans in btrfs_set_log_full_commit 2019-04-29 19:02:41 +02:00
ulist.c btrfs: replace GPL boilerplate by SPDX -- sources 2018-04-12 16:29:51 +02:00
ulist.h btrfs: replace GPL boilerplate by SPDX -- headers 2018-04-12 16:29:46 +02:00
uuid-tree.c btrfs: remove unused parameter fs_info from btrfs_extend_item 2019-04-29 19:02:50 +02:00
volumes.c btrfs: Add comments on locking of several device-related fields 2019-07-01 13:34:59 +02:00
volumes.h btrfs: Add comments on locking of several device-related fields 2019-07-01 13:34:59 +02:00
xattr.c Btrfs: fix failure to persist compression property xattr deletion on fsync 2019-06-17 16:37:17 +02:00
xattr.h btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter 2019-04-29 19:02:44 +02:00
zlib.c btrfs: change set_level() to bound the level passed in 2019-02-25 14:13:32 +01:00
zstd.c btrfs: correct zstd workspace manager lock to use spin_lock_bh() 2019-05-28 18:54:09 +02:00