linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-24 13:20:52 +07:00

Filipe Manana ac05ca913e Btrfs: fix race between using extent maps and merging them

We have a few cases where we allow an extent map that is in an extent map tree to be merged with other extents in the tree. Such cases include the unpinning of an extent after the respective ordered extent completed or after logging an extent during a fast fsync. This can lead to subtle and dangerous problems because when doing the merge some other task might be using the same extent map and as a consequence see an inconsistent state of the extent map - for example it sees the new length but has seen the old start offset.

With luck this triggers a BUG_ON(), and not some silent bug, such as the following one in __do_readpage():

  $ cat -n fs/btrfs/extent_io.c
  3061  static int __do_readpage(struct extent_io_tree *tree,
  3062                           struct page *page,
  (...)
  3127                  em = __get_extent_map(inode, page, pg_offset, cur,
  3128                                        end - cur + 1, get_extent, em_cached);
  3129                  if (IS_ERR_OR_NULL(em)) {
  3130                          SetPageError(page);
  3131                          unlock_extent(tree, cur, end);
  3132                          break;
  3133                  }
  3134                  extent_offset = cur - em->start;
  3135                  BUG_ON(extent_map_end(em) <= cur);
  (...)

Consider the following example scenario, where we end up hitting the BUG_ON() in __do_readpage().

We have an inode with a size of 8KiB and 2 extent maps:

  extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on disk by a previous transaction

  extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet persisted but writeback started for it already. The extent map is pinned since there's writeback and an ordered extent in progress, so it can not be merged with extent map A yet

The following sequence of steps leads to the BUG_ON():

1) The ordered extent for extent B completes, the respective page gets its writeback bit cleared and the extent map is unpinned, at that point it is not yet merged with extent map A because it's in the list of modified extents;

2) Due to memory pressure, or some other reason, the MM subsystem releases the page corresponding to extent B - btrfs_releasepage() is called and returns 1, meaning the page can be released as it's not dirty, not under writeback anymore and the extent range is not locked in the inode's iotree. However the extent map is not released, either because we are not in a context that allows memory allocations to block or because the inode's size is smaller than 16MiB - in this case our inode has a size of 8KiB;

3) Task B needs to read extent B and ends up in __do_readpage() through the btrfs_readpage() callback. At __do_readpage() it gets a reference to extent map B;

4) Task A, doing a fast fsync, calls clear_em_logging() against extent map B while holding the write lock on the inode's extent map tree - this results in try_merge_map() being called and since it's possible to merge extent map B with extent map A now (the extent map B was removed from the list of modified extents), the merging begins - it sets extent map B's start offset to 0 (was 4KiB), but before it increments the map's length to 8KiB (4KiB + 4KiB), task B is at:

  BUG_ON(extent_map_end(em) <= cur);

The call to extent_map_end() sees the extent map has a start of 0 and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so the BUG_ON() is triggered.

So it's dangerous to modify an extent map that is in the tree, because some other task might have got a reference to it before and still be using it, and needs to see a consistent map while using it. Generally this is very rare since most paths that look up and use extent maps also have the file range locked in the inode's iotree. The fsync path is pretty much the only exception where we don't do it to avoid serialization with concurrent reads.

Fix this by not allowing an extent map to be merged if it's being used by tasks other than the one attempting to merge the extent map (when the reference count of the extent map is greater than 2).

Reported-by: ryusuke1925 <st13s20@gm.ibaraki-ct.ac.jp>
Reported-by: Koki Mitani <koki.mitani.xg@hco.ntt.co.jp>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

2020-02-12 17:16:46 +01:00
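
The torn read and the refcount guard described in the commit message above can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions, not the kernel code: struct toy_extent_map and every toy_* helper are hypothetical names invented here, the merge is split into two explicit steps so the intermediate state can be inspected, and the only detail taken from the commit message is that a merge is refused when the reference count is greater than 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KIB 1024ULL

/* Hypothetical stand-in for an extent map: only the fields the scenario uses. */
struct toy_extent_map {
	uint64_t start;	/* file offset covered by the map */
	uint64_t len;	/* length in bytes */
	int refs;	/* reference count (plain int, no atomics in this toy) */
};

/* End offset of the map, computed the same way the scenario describes. */
static inline uint64_t toy_extent_map_end(const struct toy_extent_map *em)
{
	return em->start + em->len;
}

/* First half of merging 'prev' into 'em': the start offset moves back. */
static inline void toy_merge_step_start(struct toy_extent_map *em,
					const struct toy_extent_map *prev)
{
	em->start = prev->start;
}

/* Second half of the merge: the length grows by the length of 'prev'. */
static inline void toy_merge_step_len(struct toy_extent_map *em,
				      const struct toy_extent_map *prev)
{
	em->len += prev->len;
}

/* Mirrors the condition inside BUG_ON(extent_map_end(em) <= cur). */
static inline bool toy_reader_check_fails(const struct toy_extent_map *em,
					  uint64_t cur)
{
	return toy_extent_map_end(em) <= cur;
}

/*
 * Guarded merge. It refuses to touch 'em' when refs > 2, the threshold
 * named in the commit message, so a concurrent user never observes the
 * half-updated state. Returns true if the merge happened.
 */
static inline bool toy_try_merge(struct toy_extent_map *em,
				 const struct toy_extent_map *prev)
{
	assert(em && prev);
	if (em->refs > 2)
		return false;			/* someone else is using 'em' */
	if (toy_extent_map_end(prev) != em->start)
		return false;			/* not adjacent, nothing to merge */
	toy_merge_step_start(em, prev);
	toy_merge_step_len(em, prev);
	return true;
}
```

With extent A at offset 0 and extent B at offset 4KiB, both 4KiB long, calling toy_merge_step_start() alone leaves B with a start of 0 and a length of 4KiB, so toy_reader_check_fails(&b, 4 * KIB) is true. That is the window in which the reader hits the BUG_ON(). With the guard in place and a reader holding an extra reference, toy_try_merge() returns false and B is left untouched.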
..
tests	btrfs: Correctly handle empty trees in find_first_clear_extent_bit	2020-01-31 14:01:29 +01:00
acl.c	btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter	2019-04-29 19:02:44 +02:00
async-thread.c	btrfs: add __pure attribute to functions	2019-11-18 12:46:52 +01:00
async-thread.h	btrfs: add __pure attribute to functions	2019-11-18 12:46:52 +01:00
backref.c	Btrfs: fix deadlock between fiemap and transaction commits	2019-07-30 18:25:12 +02:00
backref.h	btrfs: fiemap: preallocate ulists for btrfs_check_shared	2019-07-01 13:34:53 +02:00
block-group.c	btrfs: take overcommit into account in inc_block_group_ro	2020-01-31 14:02:01 +01:00
block-group.h	btrfs: Move and unexport btrfs_rmap_block	2020-01-23 17:24:34 +01:00
block-rsv.c	btrfs: use btrfs_try_granting_tickets in update_global_rsv	2019-09-09 14:59:19 +02:00
block-rsv.h	btrfs: migrate the global_block_rsv helpers to block-rsv.c	2019-07-02 12:30:55 +02:00
btrfs_inode.h	Btrfs: remove unnecessary delalloc mutex for inodes	2019-11-18 17:51:46 +01:00
check-integrity.c	btrfs: remove superfluous BUG_ON() in integrity checks	2020-01-20 16:40:52 +01:00
check-integrity.h	btrfs: replace GPL boilerplate by SPDX -- headers	2018-04-12 16:29:46 +02:00
compression.c	btrfs: get rid of at_offset parameter to btrfs_lookup_bio_sums()	2020-01-20 16:40:54 +01:00
compression.h	btrfs: compression: remove ops pointer from workspace_manager	2019-11-18 12:46:59 +01:00
ctree.c	Btrfs: fix race between adding and putting tree mod seq elements and nodes	2020-01-31 14:01:20 +01:00
ctree.h	Btrfs: fix race between adding and putting tree mod seq elements and nodes	2020-01-31 14:01:20 +01:00
delalloc-space.c	Btrfs: remove unnecessary delalloc mutex for inodes	2019-11-18 17:51:46 +01:00
delalloc-space.h	btrfs: migrate the delalloc space stuff to it's own home	2019-07-04 17:26:17 +02:00
delayed-inode.c	btrfs: use refcount_inc_not_zero in kill_all_nodes	2019-11-18 12:46:51 +01:00
delayed-inode.h	Btrfs: delayed-inode: use rb_first_cached for ins_root and del_root	2018-10-15 17:23:33 +02:00
delayed-ref.c	Btrfs: fix race between adding and putting tree mod seq elements and nodes	2020-01-31 14:01:20 +01:00
delayed-ref.h	btrfs: migrate the delayed refs rsv code	2019-07-04 17:26:17 +02:00
dev-replace.c	btrfs: sysfs, add devid/dev_state kobject and device attributes	2020-01-23 17:24:36 +01:00
dev-replace.h	btrfs: add __pure attribute to functions	2019-11-18 12:46:52 +01:00
dir-item.c	btrfs: remove unused parameter fs_info from btrfs_extend_item	2019-04-29 19:02:50 +02:00
discard.c	btrfs: add correction to handle -1 edge case in async discard	2020-01-20 16:41:01 +01:00
discard.h	btrfs: have multiple discard lists	2020-01-20 16:41:00 +01:00
disk-io.c	Btrfs: fix race between adding and putting tree mod seq elements and nodes	2020-01-31 14:01:20 +01:00
disk-io.h	btrfs: drop create parameter to btrfs_get_extent()	2020-01-20 16:40:55 +01:00
export.c	btrfs: drop unused parameter is_new from btrfs_iget	2019-11-18 12:46:52 +01:00
export.h	btrfs: replace GPL boilerplate by SPDX -- headers	2018-04-12 16:29:46 +02:00
extent_io.c	btrfs: drop the -EBUSY case in __extent_writepage_io	2020-01-31 14:02:11 +01:00
extent_io.h	btrfs: drop create parameter to btrfs_get_extent()	2020-01-20 16:40:55 +01:00
extent_map.c	Btrfs: fix race between using extent maps and merging them	2020-02-12 17:16:46 +01:00
extent_map.h	btrfs: remove extent_map::bdev	2019-11-18 23:43:44 +01:00
extent-io-tree.h	btrfs: move the failrec tree stuff into extent-io-tree.h	2019-11-18 12:46:47 +01:00
extent-tree.c	btrfs: calculate discard delay based on number of extents	2020-01-20 16:40:59 +01:00
file-item.c	btrfs: safely advance counter when looking up bio csums	2020-01-20 16:41:01 +01:00
file.c	btrfs: drop create parameter to btrfs_get_extent()	2020-01-20 16:40:55 +01:00
free-space-cache.c	btrfs: ensure removal of discardable_* in free_bitmap()	2020-01-20 16:41:01 +01:00
free-space-cache.h	btrfs: have multiple discard lists	2020-01-20 16:41:00 +01:00
free-space-tree.c	btrfs: rename btrfs_block_group_cache	2019-11-18 17:51:51 +01:00
free-space-tree.h	btrfs: rename btrfs_block_group_cache	2019-11-18 17:51:51 +01:00
inode-item.c	btrfs: Make btrfs_find_name_in_ext_backref return struct btrfs_inode_extref	2019-09-09 14:59:16 +02:00
inode-map.c	btrfs: keep track of which extents have been discarded	2020-01-20 16:40:57 +01:00
inode-map.h	btrfs: replace GPL boilerplate by SPDX -- headers	2018-04-12 16:29:46 +02:00
inode.c	btrfs: do not do delalloc reservation under page lock	2020-01-31 14:02:15 +01:00
ioctl.c	btrfs: drop create parameter to btrfs_get_extent()	2020-01-20 16:40:55 +01:00
Kconfig	btrfs: add Kconfig dependency for BLAKE2B	2019-12-09 17:56:06 +01:00
locking.c	btrfs: document extent buffer locking	2019-11-18 17:51:50 +01:00
locking.h	btrfs: move btrfs_unlock_up_safe to other locking functions	2019-11-18 12:46:49 +01:00
lzo.c	btrfs: compression: inline free_workspace	2019-11-18 12:46:59 +01:00
Makefile	btrfs: add the beginning of async discard, discard workqueue	2020-01-20 16:40:57 +01:00
misc.h	btrfs: add 64bit safe helper for power of two checks	2019-11-18 12:46:50 +01:00
ordered-data.c	btrfs: make btrfs_ordered_extent naming consistent with btrfs_file_extent_item	2020-01-20 16:40:54 +01:00
ordered-data.h	btrfs: make btrfs_ordered_extent naming consistent with btrfs_file_extent_item	2020-01-20 16:40:54 +01:00
orphan.c	btrfs: replace GPL boilerplate by SPDX -- sources	2018-04-12 16:29:51 +02:00
print-tree.c	btrfs: Remove unneeded semicolon	2020-01-20 16:40:55 +01:00
print-tree.h	btrfs: print-tree: debugging output enhancement	2018-04-20 19:18:16 +02:00
props.c	btrfs: props: remove unnecessary hash_init()	2019-11-18 12:46:55 +01:00
props.h	btrfs: delete unused function btrfs_set_prop_trans	2019-04-29 19:02:54 +02:00
qgroup.c	btrfs: qgroup: return ENOTCONN instead of EINVAL when quotas are not enabled	2020-01-20 16:40:50 +01:00
qgroup.h	btrfs: rename btrfs_block_group_cache	2019-11-18 17:51:51 +01:00
raid56.c	btrfs: remove pointless local variable in lock_stripe_add()	2019-11-18 12:47:00 +01:00
raid56.h	btrfs: constify map parameter for nr_parity_stripes and nr_data_stripes	2019-07-01 13:34:58 +02:00
rcu-string.h	btrfs: replace GPL boilerplate by SPDX -- headers	2018-04-12 16:29:46 +02:00
reada.c	btrfs: rename btrfs_block_group_cache	2019-11-18 17:51:51 +01:00
ref-verify.c	btrfs: ref-verify: fix memory leaks	2020-02-12 17:16:31 +01:00
ref-verify.h	btrfs: ref-verify: Use btrfs_ref to refactor btrfs_ref_tree_mod()	2019-04-29 19:02:49 +02:00
relocation.c	btrfs: make btrfs_ordered_extent naming consistent with btrfs_file_extent_item	2020-01-20 16:40:54 +01:00
root-tree.c	btrfs: do not delete mismatched root refs	2020-01-08 14:44:24 +01:00
scrub.c	btrfs: handle empty block_group removal for async discard	2020-01-20 16:40:57 +01:00
send.c	Btrfs: send, fix emission of invalid clone operations within the same file	2020-01-31 14:02:19 +01:00
send.h	btrfs: replace GPL boilerplate by SPDX -- headers	2018-04-12 16:29:46 +02:00
space-info.c	btrfs: take overcommit into account in inc_block_group_ro	2020-01-31 14:02:01 +01:00
space-info.h	btrfs: take overcommit into account in inc_block_group_ro	2020-01-31 14:02:01 +01:00
struct-funcs.c	btrfs: tie extent buffer and it's token together	2019-09-09 14:59:16 +02:00
super.c	btrfs: do not zero f_bavail if we have available space	2020-02-02 18:49:32 +01:00
sysfs.c	btrfs: sysfs, add devid/dev_state kobject and device attributes	2020-01-23 17:24:36 +01:00
sysfs.h	btrfs: sysfs, add devid/dev_state kobject and device attributes	2020-01-23 17:24:36 +01:00
transaction.c	btrfs: set trans->drity in btrfs_commit_transaction	2020-01-23 17:24:37 +01:00
transaction.h	btrfs: Rename btrfs_join_transaction_nolock	2019-11-18 12:46:54 +01:00
tree-checker.c	btrfs: tree-checker: Verify location key for DIR_ITEM/DIR_INDEX	2020-01-20 16:40:56 +01:00
tree-checker.h	btrfs: get fs_info from eb in btrfs_check_chunk_valid	2019-04-29 19:02:39 +02:00
tree-defrag.c	btrfs: open code now trivial btrfs_set_lock_blocking	2019-02-25 14:13:27 +01:00
tree-log.c	Btrfs: fix infinite loop during fsync after rename operations	2020-01-23 17:24:37 +01:00
tree-log.h	btrfs: get fs_info from trans in btrfs_set_log_full_commit	2019-04-29 19:02:41 +02:00
ulist.c	btrfs: replace GPL boilerplate by SPDX -- sources	2018-04-12 16:29:51 +02:00
ulist.h	btrfs: replace GPL boilerplate by SPDX -- headers	2018-04-12 16:29:46 +02:00
uuid-tree.c	btrfs: handle ENOENT in btrfs_uuid_tree_iterate	2019-12-13 14:10:45 +01:00
volumes.c	btrfs: Fix split-brain handling when changing FSID to metadata uuid	2020-01-23 17:24:39 +01:00
volumes.h	btrfs: sysfs, add devid/dev_state kobject and device attributes	2020-01-23 17:24:36 +01:00
xattr.c	Btrfs: fix failure to persist compression property xattr deletion on fsync	2019-06-17 16:37:17 +02:00
xattr.h	btrfs: cleanup btrfs_setxattr_trans and drop transaction parameter	2019-04-29 19:02:44 +02:00
zlib.c	btrfs: compression: inline free_workspace	2019-11-18 12:46:59 +01:00
zstd.c	btrfs: compression: inline free_workspace	2019-11-18 12:46:59 +01:00