linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-19 21:46:47 +07:00

Author	SHA1	Message	Date
Naohiro Aota	cb2f96f8ab	btrfs: introduce extent allocation policy This commit introduces extent allocation policy for btrfs. This policy controls how btrfs allocate an extents from block groups. There is no functional change introduced with this commit. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:49 +01:00
Naohiro Aota	6aafb30384	btrfs: parameterize dev_extent_min for chunk allocation Currently, we ignore a device whose available space is less than "BTRFS_STRIPE_LEN * dev_stripes". This is a lower limit for current allocation policy (to maximize the number of stripes). This commit parameterizes dev_extent_min, so that other policies can set their own lower limitat to ignore a device with insufficient space. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:49 +01:00
Naohiro Aota	dce580ca40	btrfs: factor out create_chunk() Factor out create_chunk() from __btrfs_alloc_chunk(). This function finally creates a chunk. There is no functional changes. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:49 +01:00
Naohiro Aota	5badf512ec	btrfs: factor out decide_stripe_size() Factor out decide_stripe_size() from __btrfs_alloc_chunk(). This function calculates the actual stripe size to allocate. decide_stripe_size() handles the common case to round down the 'ndevs' to 'devs_increment' and check the upper and lower limitation of 'ndevs'. decide_stripe_size_regular() decides the size of a stripe and the size of a chunk. The policy is to maximize the number of stripes. This commit has no functional changes. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:49 +01:00
Naohiro Aota	560156cb25	btrfs: factor out gather_device_info() Factor out gather_device_info() from __btrfs_alloc_chunk(). This function iterates over devices list and gather information about devices. This commit also introduces "max_avail" and "dev_extent_min" to fold the same calculation to one variable. This commit has no functional changes. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:49 +01:00
Naohiro Aota	27c314d5ca	btrfs: factor out init_alloc_chunk_ctl Factor out init_alloc_chunk_ctl() from __btrfs_alloc_chunk(). This function initialises parameters of "struct alloc_chunk_ctl" for allocation. init_alloc_chunk_ctl() handles a common part of the initialisation to load the RAID parameters from btrfs_raid_array. init_alloc_chunk_ctl_policy_regular() decides some parameters for its allocation. The last "else" case in the original code is moved to __btrfs_alloc_chunk() to handle the error case in the common code. Replace the BUG_ON with ASSERT() and error return at the same time. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:48 +01:00
Naohiro Aota	4f2bafe8a4	btrfs: introduce alloc_chunk_ctl Introduce "struct alloc_chunk_ctl" to wrap needed parameters for the chunk allocation. This will be used to split __btrfs_alloc_chunk() into smaller functions. This commit folds a number of local variables in __btrfs_alloc_chunk() into one "struct alloc_chunk_ctl ctl". There is no functional change. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:48 +01:00
Naohiro Aota	3b4ffa4088	btrfs: refactor find_free_dev_extent_start() Factor out two functions from find_free_dev_extent_start(). dev_extent_search_start() decides the starting position of the search. dev_extent_hole_check() checks if a hole found is suitable for device extent allocation. These functions also have the switch-cases to change the allocation behavior depending on the policy. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:48 +01:00
Naohiro Aota	c4a816c67c	btrfs: introduce chunk allocation policy Introduce chunk allocation policy for btrfs. This policy controls how chunks and device extents are allocated from devices. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:48 +01:00
Naohiro Aota	b25c19f49e	btrfs: handle invalid profile in chunk allocation Do not BUG_ON() when an invalid profile is passed to __btrfs_alloc_chunk(). Instead return -EINVAL with ASSERT() to catch a bug in the development stage. Suggested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:48 +01:00
Naohiro Aota	52d40aba68	btrfs: change full_search to bool in find_free_extent_update_loop While the "full_search" variable defined in find_free_extent() is bool, but the full_search argument of find_free_extent_update_loop() is defined as int. Let's trivially fix the argument type. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:47 +01:00
Qu Wenruo	daf475c915	btrfs: qgroup: Remove the unnecesaary spin lock for qgroup_rescan_running After the previous patch, qgroup_rescan_running is protected by btrfs_fs_info::qgroup_rescan_lock, thus no need for the extra spinlock. Suggested-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:47 +01:00
Qu Wenruo	d61acbbf54	btrfs: qgroup: ensure qgroup_rescan_running is only set when the worker is at least queued [BUG] There are some reports about btrfs wait forever to unmount itself, with the following call trace: INFO: task umount:4631 blocked for more than 491 seconds. Tainted: G X 5.3.8-2-default #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. umount D 0 4631 3337 0x00000000 Call Trace: ([<00000000174adf7a>] __schedule+0x342/0x748) [<00000000174ae3ca>] schedule+0x4a/0xd8 [<00000000174b1f08>] schedule_timeout+0x218/0x420 [<00000000174af10c>] wait_for_common+0x104/0x1d8 [<000003ff804d6994>] btrfs_qgroup_wait_for_completion+0x84/0xb0 [btrfs] [<000003ff8044a616>] close_ctree+0x4e/0x380 [btrfs] [<0000000016fa3136>] generic_shutdown_super+0x8e/0x158 [<0000000016fa34d6>] kill_anon_super+0x26/0x40 [<000003ff8041ba88>] btrfs_kill_super+0x28/0xc8 [btrfs] [<0000000016fa39f8>] deactivate_locked_super+0x68/0x98 [<0000000016fcb198>] cleanup_mnt+0xc0/0x140 [<0000000016d6a846>] task_work_run+0xc6/0x110 [<0000000016d04f76>] do_notify_resume+0xae/0xb8 [<00000000174b30ae>] system_call+0xe2/0x2c8 [CAUSE] The problem happens when we have called qgroup_rescan_init(), but not queued the worker. It can be caused mostly by error handling. Qgroup ioctl thread \| Unmount thread ----------------------------------------+----------------------------------- \| btrfs_qgroup_rescan() \| \|- qgroup_rescan_init() \| \| \|- qgroup_rescan_running = true; \| \| \| \|- trans = btrfs_join_transaction() \| \| Some error happened \| \| \| \|- btrfs_qgroup_rescan() returns error \| But qgroup_rescan_running == true; \| \| close_ctree() \| \|- btrfs_qgroup_wait_for_completion() \| \|- running == true; \| \|- wait_for_completion(); btrfs_qgroup_rescan_worker is never queued, thus no one is going to wake up close_ctree() and we get a deadlock. All involved qgroup_rescan_init() callers are: - btrfs_qgroup_rescan() The example above. It's possible to trigger the deadlock when error happened. - btrfs_quota_enable() Not possible. Just after qgroup_rescan_init() we queue the work. - btrfs_read_qgroup_config() It's possible to trigger the deadlock. It only init the work, the work queueing happens in btrfs_qgroup_rescan_resume(). Thus if error happened in between, deadlock is possible. We shouldn't set fs_info->qgroup_rescan_running just in qgroup_rescan_init(), as at that stage we haven't yet queued qgroup rescan worker to run. [FIX] Set qgroup_rescan_running before queueing the work, so that we ensure the rescan work is queued when we wait for it. Fixes: `8d9eddad19` ("Btrfs: fix qgroup rescan worker initialization") Signed-off-by: Jeff Mahoney <jeffm@suse.com> [ Change subject and cause analyse, use a smaller fix ] Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:47 +01:00
Andy Shevchenko	807fc790aa	btrfs: switch to use new generic UUID API There are new types and helpers that are supposed to be used in new code. As a preparation to get rid of legacy types and API functions do the conversion here. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:47 +01:00
Qu Wenruo	b3ff8f1d38	btrfs: Don't submit any btree write bio if the fs has errors [BUG] There is a fuzzed image which could cause KASAN report at unmount time. BUG: KASAN: use-after-free in btrfs_queue_work+0x2c1/0x390 Read of size 8 at addr ffff888067cf6848 by task umount/1922 CPU: 0 PID: 1922 Comm: umount Tainted: G W 5.0.21 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: dump_stack+0x5b/0x8b print_address_description+0x70/0x280 kasan_report+0x13a/0x19b btrfs_queue_work+0x2c1/0x390 btrfs_wq_submit_bio+0x1cd/0x240 btree_submit_bio_hook+0x18c/0x2a0 submit_one_bio+0x1be/0x320 flush_write_bio.isra.41+0x2c/0x70 btree_write_cache_pages+0x3bb/0x7f0 do_writepages+0x5c/0x130 __writeback_single_inode+0xa3/0x9a0 writeback_single_inode+0x23d/0x390 write_inode_now+0x1b5/0x280 iput+0x2ef/0x600 close_ctree+0x341/0x750 generic_shutdown_super+0x126/0x370 kill_anon_super+0x31/0x50 btrfs_kill_super+0x36/0x2b0 deactivate_locked_super+0x80/0xc0 deactivate_super+0x13c/0x150 cleanup_mnt+0x9a/0x130 task_work_run+0x11a/0x1b0 exit_to_usermode_loop+0x107/0x130 do_syscall_64+0x1e5/0x280 entry_SYSCALL_64_after_hwframe+0x44/0xa9 [CAUSE] The fuzzed image has a completely screwd up extent tree: leaf 29421568 gen 8 total ptrs 6 free space 3587 owner EXTENT_TREE refs 2 lock (w:0 r:0 bw:0 br:0 sw:0 sr:0) lock_owner 0 current 5938 item 0 key (12587008 168 4096) itemoff 3942 itemsize 53 extent refs 1 gen 9 flags 1 ref#0: extent data backref root 5 objectid 259 offset 0 count 1 item 1 key (12591104 168 8192) itemoff 3889 itemsize 53 extent refs 1 gen 9 flags 1 ref#0: extent data backref root 5 objectid 271 offset 0 count 1 item 2 key (12599296 168 4096) itemoff 3836 itemsize 53 extent refs 1 gen 9 flags 1 ref#0: extent data backref root 5 objectid 259 offset 4096 count 1 item 3 key (29360128 169 0) itemoff 3803 itemsize 33 extent refs 1 gen 9 flags 2 ref#0: tree block backref root 5 item 4 key (29368320 169 1) itemoff 3770 itemsize 33 extent refs 1 gen 9 flags 2 ref#0: tree block backref root 5 item 5 key (29372416 169 0) itemoff 3737 itemsize 33 extent refs 1 gen 9 flags 2 ref#0: tree block backref root 5 Note that leaf 29421568 doesn't have its backref in the extent tree. Thus extent allocator can re-allocate leaf 29421568 for other trees. In short, the bug is caused by: - Existing tree block gets allocated to log tree This got its generation bumped. - Log tree balance cleaned dirty bit of offending tree block It will not be written back to disk, thus no WRITTEN flag. - Original owner of the tree block gets COWed Since the tree block has higher transid, no WRITTEN flag, it's reused, and not traced by transaction::dirty_pages. - Transaction aborted Tree blocks get cleaned according to transaction::dirty_pages. But the offending tree block is not recorded at all. - Filesystem unmount All pages are assumed to be are clean, destroying all workqueue, then call iput(btree_inode). But offending tree block is still dirty, which triggers writeback, and causes use-after-free bug. The detailed sequence looks like this: - Initial status eb: 29421568, header=WRITTEN bflags_dirty=0, page_dirty=0, gen=8, not traced by any dirty extent_iot_tree. - New tree block is allocated Since there is no backref for 29421568, it's re-allocated as new tree block. Keep in mind that tree block 29421568 is still referred by extent tree. - Tree block 29421568 is filled for log tree eb: 29421568, header=0 bflags_dirty=1, page_dirty=1, gen=9 << (gen bumped) traced by btrfs_root::dirty_log_pages - Some log tree operations Since the fs is using node size 4096, the log tree can easily go a level higher. - Log tree needs balance Tree block 29421568 gets all its content pushed to right, thus now it is empty, and we don't need it. btrfs_clean_tree_block() from __push_leaf_right() get called. eb: 29421568, header=0 bflags_dirty=0, page_dirty=0, gen=9 traced by btrfs_root::dirty_log_pages - Log tree write back btree_write_cache_pages() goes through dirty pages ranges, but since page of tree block 29421568 gets cleaned already, it's not written back to disk. Thus it doesn't have WRITTEN bit set. But ranges in dirty_log_pages are cleared. eb: 29421568, header=0 bflags_dirty=0, page_dirty=0, gen=9 not traced by any dirty extent_iot_tree. - Extent tree update when committing transaction Since tree block 29421568 has transid equal to running trans, and has no WRITTEN bit, should_cow_block() will use it directly without adding it to btrfs_transaction::dirty_pages. eb: 29421568, header=0 bflags_dirty=1, page_dirty=1, gen=9 not traced by any dirty extent_iot_tree. At this stage, we're doomed. We have a dirty eb not tracked by any extent io tree. - Transaction gets aborted due to corrupted extent tree Btrfs cleans up dirty pages according to transaction::dirty_pages and btrfs_root::dirty_log_pages. But since tree block 29421568 is not tracked by neither of them, it's still dirty. eb: 29421568, header=0 bflags_dirty=1, page_dirty=1, gen=9 not traced by any dirty extent_iot_tree. - Filesystem unmount Since all cleanup is assumed to be done, all workqueus are destroyed. Then iput(btree_inode) is called, expecting no dirty pages. But tree 29421568 is still dirty, thus triggering writeback. Since all workqueues are already freed, we cause use-after-free. This shows us that, log tree blocks + bad extent tree can cause wild dirty pages. [FIX] To fix the problem, don't submit any btree write bio if the filesytem has any error. This is the last safe net, just in case other cleanup haven't caught catch it. Link: https://github.com/bobfuzzer/CVE/tree/master/CVE-2019-19377 CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:46 +01:00
Marcos Paulo de Souza	faf8f7b957	btrfs: ioctl: resize: only show message if size is changed There is no point to inform the user about size change if there's none. Update the message to conform to a commonly used format where the path and devid are printed and also print old and new sizes. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Marcos Paulo de Souza <marcos@mpdesouza.com> Reviewed-by: David Sterba <dsterba@suse.com> [ enhance message ] Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:46 +01:00
Anand Jain	b82582d668	btrfs: slightly simplify global block reserve calculations In btrfs_update_global_block_rsv the lines: num_bytes = block_rsv->size - block_rsv->reserved; block_rsv->reserved += num_bytes; imply: block_rsv->reserved = block_rsv->size; Assign block_rsv->size to block_rsv->reserved directly and reorder lines so they match the other branch. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:46 +01:00
David Sterba	56e9f6ea32	btrfs: merge unlocking to common exit block in btrfs_commit_transaction The tree_log_mutex and reloc_mutex locks are properly nested so we can simplify error handling and add labels for them. This reduces line count of the function. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:46 +01:00
David Sterba	15b6e8a83e	btrfs: reduce pointer intdirections in btree_readpage_end_io_hook All we need to read is checksum size from fs_info superblock, and fs_info is provided by extent buffer so we can get rid of the wild pointer indirections from page/inode/root. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:45 +01:00
David Sterba	b79ce3dddd	btrfs: adjust delayed refs message level The message seems to be for debugging and has little value for users. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:45 +01:00
David Sterba	1db45a35f0	btrfs: replace u_long type cast with unsigned long We don't use the u_XX types anywhere, though they're defined. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:45 +01:00
David Sterba	eeb6f17200	btrfs: raid56: simplify sort_parity_stripes Remove trivial comprator and open coded swap of two values. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:45 +01:00
David Sterba	7e8f19e50e	btrfs: adjust message level for unrecognized mount option An unrecognized option is a failure that should get user/administrator attention, the info level is often below what gets logged, so make it error. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:45 +01:00
David Sterba	42c9d0b524	btrfs: simplify parameters of btrfs_set_disk_extent_flags All callers pass extent buffer start and length so the extent buffer itself should work fine. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:45 +01:00
David Sterba	c4ac754198	btrfs: open code trivial helper btrfs_header_chunk_tree_uuid The helper btrfs_header_chunk_tree_uuid follows naming convention of other struct accessors but does something compeletly different. As the offsetof calculation is clear in the context of extent buffer operations we can remove it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:44 +01:00
David Sterba	9a8658e33d	btrfs: open code trivial helper btrfs_header_fsid The helper btrfs_header_fsid follows naming convention of other struct accessors but does something compeletly different. As the offsetof calculation is clear in the context of extent buffer operations we can remove it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:44 +01:00
David Sterba	75fb2e9e49	btrfs: move mapping of block for discard to its caller There's a simple forwarded call based on the operation that would better fit the caller btrfs_map_block that's until now a trivial wrapper. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:44 +01:00
David Sterba	ee787f9550	btrfs: use struct_size to calculate size of raid hash table The struct_size macro does the same calculation and is safe regarding overflows. Though we're not expecting them to happen, use the helper for clarity. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:44 +01:00
Nikolay Borisov	dcc3eb9638	btrfs: convert snapshot/nocow exlcusion to drew lock This patch removes all haphazard code implementing nocow writers exclusion from pending snapshot creation and switches to using the drew lock to ensure this invariant still holds. 'Readers' are snapshot creators from create_snapshot and 'writers' are nocow writers from buffered write path or btrfs_setsize. This locking scheme allows for multiple snapshots to happen while any nocow writers are blocked, since writes to page cache in the nocow path will make snapshots inconsistent. So for performance reasons we'd like to have the ability to run multiple concurrent snapshots and also favors readers in this case. And in case there aren't pending snapshots (which will be the majority of the cases) we rely on the percpu's writers counter to avoid cacheline contention. The main gain from using the drew lock is it's now a lot easier to reason about the guarantees of the locking scheme and whether there is some silent breakage lurking. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:44 +01:00
Nikolay Borisov	2992df7326	btrfs: Implement DREW lock A (D)ouble (R)eader (W)riter (E)xclustion lock is a locking primitive that allows to have multiple readers or multiple writers but not multiple readers and writers holding it concurrently. The code is factored out from the existing open-coded locking scheme used to exclude pending snapshots from nocow writers and vice-versa. Current implementation actually favors Readers (that is snapshot creaters) to writers (nocow writers of the filesystem). The API provides lock/unlock/trylock for reads and writes. Formal specification for TLA+ provided by Valentin Schneider is at https://lore.kernel.org/linux-btrfs/2dcaf81c-f0d3-409e-cb29-733d8b3b4cc9@arm.com/ Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:43 +01:00
Johannes Thumshirn	fd8efa818c	btrfs: simplify error handling in __btrfs_write_out_cache() The error cleanup gotos in __btrfs_write_out_cache() needlessly jump back making the code less readable then needed. Flatten them out so no back-jump is necessary and the read flow is uninterrupted. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:43 +01:00
Johannes Thumshirn	1afb648e94	btrfs: use standard debug config option to enable free-space-cache debug prints free-space-cache.c has it's own set of DEBUG ifdefs which need to be turned on instead of the global CONFIG_BTRFS_DEBUG to print debug messages about failed block-group writes. Switch this over to CONFIG_BTRFS_DEBUG so we always see these messages when running a debug kernel. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:43 +01:00
Johannes Thumshirn	7a195f6db9	btrfs: make the uptodate argument of io_ctl_add_pages() boolean Make the uptodate argument of io_ctl_add_pages() boolean. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:43 +01:00
Johannes Thumshirn	831fa14f1e	btrfs: use inode from io_ctl in io_ctl_prepare_pages io_ctl_prepare_pages() gets a 'struct btrfs_io_ctl' as well as a 'struct inode', but btrfs_io_ctl::inode points to the same struct inode as this is assgined in io_ctl_init(). Use the inode form io_ctl to reduce the arguments of io_ctl_prepare_pages. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:43 +01:00
Marcos Paulo de Souza	949964c928	btrfs: add new BTRFS_IOC_SNAP_DESTROY_V2 ioctl This ioctl will be responsible for deleting a subvolume using its id. This can be used when a system has a file system mounted from a subvolume, rather than the root file system, like below: / @subvol1/ @subvol2/ @subvol_default/ If only @subvol_default is mounted, we have no path to reach @subvol1 and @subvol2, thus no way to delete them. Current subvolume delete ioctl takes a file handle point as argument, and if @subvol_default is mounted, we can't reach @subvol1 and @subvol2 from the same mount point. This patch introduces a new ioctl BTRFS_IOC_SNAP_DESTROY_V2 that takes the extended structure with flags to allow to delete subvolume using subvolid. Now, we can use this new ioctl specifying the subvolume id and refer to the same mount point. It doesn't matter which subvolume was mounted, since we can reach to the desired one using the subvolume id, and then delete it. The full path to the subvolume id is resolved internally and access is verified as if the subvolume was accessed by path. The volume args v2 structure is extended to use the existing union for subvolume id specification, that's valid in case the BTRFS_SUBVOL_SPEC_BY_ID is set. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:42 +01:00
Marcos Paulo de Souza	c0c907a47d	btrfs: export helpers for subvolume name/id resolution The functions will be used outside of export.c and super.c to allow resolving subvolume name from a given id, eg. for subvolume deletion by id ioctl. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ split from the next patch ] Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:42 +01:00
David Sterba	748449cdbe	btrfs: use ioctl args support mask for device delete When the device remove v2 ioctl was added, the full support mask was added to sanity check the flags. However this would allow to let the subvolume related flags to be accepted. This is not supposed to happen. Use the correct support mask, which means that now any of BTRFS_SUBVOL_CREATE_ASYNC, BTRFS_SUBVOL_RDONLY or BTRFS_SUBVOL_QGROUP_INHERIT will be rejected as ENOTSUPP. Though this is a user-visible change, specifying subvolume flags for device deletion does not make sense and there are hopefully no applications doing that. Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:42 +01:00
David Sterba	673990dba3	btrfs: use ioctl args support mask for subvolume create/delete Using the defined mask instead of flag enumeration in the ioctl handler is preferred. No functional changes. Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:42 +01:00
Jules Irenge	5ce48d0f0e	btrfs: Add missing lock annotation for release_extent_buffer() Sparse reports a warning at release_extent_buffer() warning: context imbalance in release_extent_buffer() - unexpected unlock The root cause is the missing annotation at release_extent_buffer() Add the missing __releases(&eb->refs_lock) annotation Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Jules Irenge <jbi.octave@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:42 +01:00
Josef Bacik	75ec1db871	btrfs: set update the uuid generation as soon as possible In my EIO stress testing I noticed I was getting forced to rescan the uuid tree pretty often, which was weird. This is because my error injection stuff would sometimes inject an error after log replay but before we loaded the UUID tree. If log replay committed the transaction it wouldn't have updated the uuid tree generation, but the tree was valid and didn't change, so there's no reason to not update the generation here. Fix this by setting the BTRFS_FS_UPDATE_UUID_TREE_GEN bit immediately after reading all the fs roots if the uuid tree generation matches the fs generation. Then any transaction commits that happen during mount won't screw up our uuid tree state, forcing us to do needless uuid rescans. Fixes: `70f8017547` ("Btrfs: check UUID tree during mount if required") CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:41 +01:00
Josef Bacik	c94bec2c61	btrfs: bail out of uuid tree scanning if we're closing In doing my fsstress+EIO stress testing I started running into issues where umount would get stuck forever because the uuid checker was chewing through the thousands of subvolumes I had created. We shouldn't block umount on this, simply bail if we're unmounting the fs. We need to make sure we don't mark the UUID tree as ok, so we only set that bit if we made it through the whole rescan operation, but otherwise this is completely safe. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:41 +01:00
Nikolay Borisov	97f4dd09da	btrfs: make btrfs_check_uuid_tree private to disk-io.c It's used only during filesystem mount as such it can be made private to disk-io.c file. Also use the occasion to move btrfs_uuid_rescan_kthread as btrfs_check_uuid_tree is its sole caller. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:41 +01:00
Nikolay Borisov	560b7a4aa2	btrfs: call btrfs_check_uuid_tree_entry directly in btrfs_uuid_tree_iterate btrfs_uuid_tree_iterate is called from only once place and its 2nd argument is always btrfs_check_uuid_tree_entry. Simplify btrfs_uuid_tree_iterate's signature by removing its 2nd argument and directly calling btrfs_check_uuid_tree_entry. Also move the latter into uuid-tree.h. No functional changes. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:41 +01:00
David Sterba	c17af96554	btrfs: raid56: simplify tracking of Q stripe presence There are temporary variables tracking the index of P and Q stripes, but none of them is really used as such, merely for determining if the Q stripe is present. This leads to compiler warnings with -Wunused-but-set-variable and has been reported several times. fs/btrfs/raid56.c: In function ‘finish_rmw’: fs/btrfs/raid56.c:1199:6: warning: variable ‘p_stripe’ set but not used [-Wunused-but-set-variable] 1199 \| int p_stripe = -1; \| ^~~~~~~~ fs/btrfs/raid56.c: In function ‘finish_parity_scrub’: fs/btrfs/raid56.c:2356:6: warning: variable ‘p_stripe’ set but not used [-Wunused-but-set-variable] 2356 \| int p_stripe = -1; \| ^~~~~~~~ Replace the two variables with one that has a clear meaning and also get rid of the warnings. The logic that verifies that there are only 2 valid cases is unchanged. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:41 +01:00
ethanwu	b25b0b871f	btrfs: backref, use correct count to resolve normal data refs With the following patches: - btrfs: backref, only collect file extent items matching backref offset - btrfs: backref, not adding refs from shared block when resolving normal backref - btrfs: backref, only search backref entries from leaves of the same root we only collect the normal data refs we want, so the imprecise upper bound total_refs of that EXTENT_ITEM could now be changed to the count of the normal backref entry we want to search. Background and how the patches fit together: Btrfs has two types of data backref. For BTRFS_EXTENT_DATA_REF_KEY type of backref, we don't have the exact block number. Therefore, we need to call resolve_indirect_refs. It uses btrfs_search_slot to locate the leaf block. Then we need to walk through the leaves to search for the EXTENT_DATA items that have disk bytenr matching the extent item (add_all_parents). When resolving indirect refs, we could take entries that don't belong to the backref entry we are searching for right now. For that reason when searching backref entry, we always use total refs of that EXTENT_ITEM rather than individual count. For example: item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize extent refs 24 gen 7302 flags DATA shared data backref parent 394985472 count 10 #1 extent data backref root 257 objectid 260 offset 1048576 count 3 #2 extent data backref root 256 objectid 260 offset 65536 count 6 #3 extent data backref root 257 objectid 260 offset 65536 count 5 #4 For example, when searching backref entry #4, we'll use total_refs 24, a very loose loop ending condition, instead of total_refs = 5. But using total_refs = 24 is not accurate. Sometimes, we'll never find all the refs from specific root. As a result, the loop keeps on going until we reach the end of that inode. The first 3 patches, handle 3 different types refs we might encounter. These refs do not belong to the normal backref we are searching, and hence need to be skipped. This patch changes the total_refs to correct number so that we could end loop as soon as we find all the refs we want. btrfs send uses backref to find possible clone sources, the following is a simple test to compare the results with and without this patch: $ btrfs subvolume create /sub1 $ for i in `seq 1 163840`; do dd if=/dev/zero of=/sub1/file bs=64K count=1 seek=$((i-1)) conv=notrunc oflag=direct done $ btrfs subvolume snapshot /sub1 /sub2 $ for i in `seq 1 163840`; do dd if=/dev/zero of=/sub1/file bs=4K count=1 seek=$(((i-1)*16+10)) conv=notrunc oflag=direct done $ btrfs subvolume snapshot -r /sub1 /snap1 $ time btrfs send /snap1 \| btrfs receive /volume2 Without this patch: real 69m48.124s user 0m50.199s sys 70m15.600s With this patch: real 1m59.683s user 0m35.421s sys 2m42.684s Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: ethanwu <ethanwu@synology.com> [ add patchset cover letter with background and numbers ] Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:40 +01:00
ethanwu	cfc0eed0ec	btrfs: backref, only search backref entries from leaves of the same root We could have some nodes/leaves in subvolume whose owner are not the that subvolume. In this way, when we resolve normal backrefs of that subvolume, we should avoid collecting those references from these blocks. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: ethanwu <ethanwu@synology.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:40 +01:00
ethanwu	ed58f2e66e	btrfs: backref, don't add refs from shared block when resolving normal backref All references from the block of SHARED_DATA_REF belong to that shared block backref. For example: item 11 key (40831553536 EXTENT_ITEM 4194304) itemoff 15460 itemsize 95 extent refs 24 gen 7302 flags DATA extent data backref root 257 objectid 260 offset 65536 count 5 extent data backref root 258 objectid 265 offset 0 count 9 shared data backref parent 394985472 count 10 Block 394985472 might be leaf from root 257, and the item obejctid and (file_pos - file_extent_item::offset) in that leaf just happens to be 260 and 65536 which is equal to the first extent data backref entry. Before this patch, when we resolve backref: root 257 objectid 260 offset 65536 we will add those refs in block 394985472 and wrongly treat those as the refs we want. Fix this by checking if the leaf we are processing is shared data backref, if so, just skip this leaf. Shared data refs added into preftrees.direct have all entry value = 0 (root_id = 0, key = NULL, level = 0) except parent entry. Other refs from indirect tree will have key value and root id != 0, and these values won't be changed when their parent is resolved and added to preftrees.direct. Therefore, we could reuse the preftrees.direct and search ref with all values = 0 except parent is set to avoid getting those resolved refs block. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: ethanwu <ethanwu@synology.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:40 +01:00
ethanwu	7ac8b88ee6	btrfs: backref, only collect file extent items matching backref offset When resolving one backref of type EXTENT_DATA_REF, we collect all references that simply reference the EXTENT_ITEM even though their (file_pos - file_extent_item::offset) are not the same as the btrfs_extent_data_ref::offset we are searching for. This patch adds additional check so that we only collect references whose (file_pos - file_extent_item::offset) == btrfs_extent_data_ref::offset. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: ethanwu <ethanwu@synology.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:40 +01:00
Johannes Thumshirn	9da2b242e2	btrfs: remove buffer_heads form super block mirror integrity checking The integrity checking code for the super block mirrors is the last remaining user of buffer_heads, change it to using plain bios as well. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:40 +01:00
Johannes Thumshirn	59aaad503f	btrfs: remove buffer_heads from btrfsic_process_written_block() Now that the last caller of btrfsic_process_written_block() with buffer_heads is gone, remove the buffer_head processing path from it as well. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:40 +01:00
Johannes Thumshirn	61ecc5fc18	btrfs: remove btrfsic_submit_bh() Now that the last use of btrfsic_submit_bh() is gone as the super block is now written using bios, remove the function as well. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:39 +01:00
Johannes Thumshirn	314b6dd0ee	btrfs: use bios instead of buffer_heads from super block writeout Similar to the superblock read path, change the write path to using bios and pages instead of buffer_heads. This allows us to skip over the buffer_head code, for writing the superblock to disk. This is based on a patch originally authored by Nikolay Borisov. Co-developed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:39 +01:00
Johannes Thumshirn	8f32380d3f	btrfs: use the page cache for super block reading Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have some drawbacks, like not being able to propagate errors from the lower layers. Directly use the page cache for reading the super blocks from disk or invalidating an on-disk super block. We have to use the page cache so to avoid races between mkfs and udev. See also `6f60cbd3ae` ("btrfs: access superblock via pagecache in scan_one_device"). This patch unwraps the buffer head API and does not change the way the super block is actually read. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:39 +01:00
Johannes Thumshirn	6fbceb9fa4	btrfs: reduce scope of btrfs_scratch_superblocks() btrfs_scratch_superblocks() isn't used anywhere outside volumes.c so remove it from the header file and mark it as static. Also move it above it's callers so we don't need a forward declaration. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:39 +01:00
Johannes Thumshirn	c514c9b10b	btrfs: don't kmap() pages from block devices Block device mappings are never in highmem so kmap() / kunmap() calls for pages from block devices are unneeded. Use page_address() instead of kmap() to get to the virtual addreses. While we're at it, read_cache_page_gfp() doesn't return NULL on error, only an ERR_PTR, so use IS_ERR() to check for errors. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:39 +01:00
Nikolay Borisov	f6d9abbc1f	btrfs: Export btrfs_release_disk_super Preparatory patch for removal of buffer_head usage in btrfs. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:38 +01:00
Filipe Manana	55ffaabe23	Btrfs: avoid unnecessary splits when setting bits on an extent io tree When attempting to set bits on a range of an exent io tree that already has those bits set we can end up splitting an extent state record, use the preallocated extent state record, insert it into the red black tree, do another search on the red black tree, merge the preallocated extent state record with the previous extent state record, remove that previous record from the red black tree and then free it. This is all unnecessary work that consumes time. This happens specifically at the following case at __set_extent_bit(): $ cat -n fs/btrfs/extent_io.c 957 static int __must_check 958 __set_extent_bit(struct extent_io_tree tree, u64 start, u64 end, (...) 1044 / 1045 * \| ---- desired range ---- \| 1046 * \| state \| 1047 * or 1048 * \| ------------- state -------------- \| 1049 * (...) 1060 if (state->start < start) { 1061 if (state->state & exclusive_bits) { 1062 failed_start = start; 1063 err = -EEXIST; 1064 goto out; 1065 } 1066 1067 prealloc = alloc_extent_state_atomic(prealloc); 1068 BUG_ON(!prealloc); 1069 err = split_state(tree, state, prealloc, start); 1070 if (err) 1071 extent_io_tree_panic(tree, err); 1072 1073 prealloc = NULL; So if our extent state represents a range from 0 to 1MiB for example, and we want to set bits in the range 128KiB to 256KiB for example, and that extent state record already has all those bits set, we end up splitting that record, so we end up with extent state records in the tree which represent the ranges from 0 to 128KiB and from 128KiB to 1MiB. This is temporary because a subsequent iteration in that function will end up merging the records. The splitting requires using the preallocated extent state record, so a future iteration that needs to do another split will need to allocate another extent state record in an atomic context, something not ideal that we try to avoid as much as possible. The splitting also requires an insertion in the red black tree, and a subsequent merge will require a deletion from the red black tree and freeing an extent state record. This change just skips the splitting of an extent state record when it already has all the bits the we need to set. Setting a bit that is already set for a range is very common in the inode's 'file_extent_tree' extent io tree for example, where we keep setting the EXTENT_DIRTY bit every time we replace an extent. This change also fixes a bug that happens after the recent patchset from Josef that avoids having implicit holes after a power failure when not using the NO_HOLES feature, more specifically the patch with the subject: "btrfs: introduce the inode->file_extent_tree" This patch introduced an extent io tree per inode to keep track of completed ordered extents and figure out at any time what is the safe value for the inode's disk_i_size. This assumes that for contiguous ranges in a file we always end up with a single extent state record in the io tree, but that is not the case, as there is a short time window where we can have two extent state records representing contiguous ranges. When this happens we end setting up an incorrect value for the inode's disk_i_size, resulting in data loss after a clean unmount of the filesystem. The following example explains how this can happen. Suppose we have an inode with an i_size and a disk_i_size of 1MiB, so in the inode's file_extent_tree we have a single extent state record that represents the range [0, 1MiB) with the EXTENT_DIRTY bit set. Then the following steps happen: 1) A buffered write against file range [512KiB, 768KiB) is made. At this point delalloc was not flushed yet; 2) Deduplication from some other inode into this inode's range [128KiB, 256KiB) is made. This causes btrfs_inode_set_file_extent_range() to be called, from btrfs_insert_clone_extent(), to mark the range [128KiB, 256KiB) with EXTENT_DIRTY in the inode's file_extent_tree; 3) When btrfs_inode_set_file_extent_range() calls set_extent_bits(), we end up at __set_extent_bit(). In the first iteration of that function's loop we end up in the following branch: $ cat -n fs/btrfs/extent_io.c 957 static int __must_check 958 __set_extent_bit(struct extent_io_tree tree, u64 start, u64 end, (...) 1044 /* 1045 * \| ---- desired range ---- \| 1046 * \| state \| 1047 * or 1048 * \| ------------- state -------------- \| 1049 * (...) 1060 if (state->start < start) { 1061 if (state->state & exclusive_bits) { 1062 *failed_start = start; 1063 err = -EEXIST; 1064 goto out; 1065 } 1066 1067 prealloc = alloc_extent_state_atomic(prealloc); 1068 BUG_ON(!prealloc); 1069 err = split_state(tree, state, prealloc, start); 1070 if (err) 1071 extent_io_tree_panic(tree, err); 1072 1073 prealloc = NULL; (...) 1089 goto search_again; This splits the state record into two, one for range [0, 128KiB) and another for the range [128KiB, 1MiB). Both already have the EXTENT_DIRTY bit set. Then we jump to the 'search_again' label, where we unlock the the spinlock protecting the extent io tree before jumping to the 'again' label to perform the next iteration; 4) In the meanwhile, delalloc is flushed, the ordered extent for the range [512KiB, 768KiB) is created and when it completes, at btrfs_finish_ordered_io(), it calls btrfs_inode_safe_disk_i_size_write() with a value of 0 for its 'new_size' argument; 5) Before the deduplication task currently at __set_extent_bit() moves to the next iteration, the task finishing the ordered extent calls find_first_extent_bit() through btrfs_inode_safe_disk_i_size_write() and gets 'start' set to 0 and 'end' set to 128KiB - because at this moment the io tree has two extent state records, one representing the range [0, 128KiB) and another representing the range [128KiB, 1MiB), both with EXTENT_DIRTY set. Then we set 'isize' to: isize = min(isize, end + 1) = min(1MiB, 128KiB - 1 + 1) = 128KiB Then we set the inode's disk_i_size to 128KiB (isize). After a clean unmount of the filesystem and mounting it again, we have the file with a size of 128KiB, and effectively lost all the data it had before in the range from 128KiB to 1MiB. This change fixes that issue too, as we never end up splitting extent state records when they already have all the bits we want set. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:38 +01:00
Josef Bacik	ab9b2c7b32	btrfs: handle logged extent failure properly If we're allocating a logged extent we attempt to insert an extent record for the file extent directly. We increase space_info->bytes_reserved, because the extent entry addition will call btrfs_update_block_group(), which will convert the ->bytes_reserved to ->bytes_used. However if we fail at any point while inserting the extent entry we will bail and leave space on ->bytes_reserved, which will trigger a WARN_ON() on umount. Fix this by pinning the space if we fail to insert, which is what happens in every other failure case that involves adding the extent entry. CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:38 +01:00
Qu Wenruo	e19221180d	btrfs: relocation: Remove is_cowonly_root() This function is only used in read_fs_root(), which is just a wrapper of btrfs_get_fs_root(). For all the mentioned essential roots except log root tree, btrfs_get_fs_root() has its own quick path to grab them from fs_info directly, thus no need for key.offset modification. For subvolume trees, btrfs_get_fs_root() with key.offset == -1 is completely fine. For log trees and log root tree, it's impossible to hit them, as for relocation all backrefs are fetched from commit root, which never records log tree blocks. Log tree blocks either get freed in regular transaction commit, or replayed at mount time. At runtime we should never hit an backref for log tree in extent tree. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:38 +01:00
Nikolay Borisov	fe119a6eeb	btrfs: switch to per-transaction pinned extents This commit flips the switch to start tracking/processing pinned extents on a per-transaction basis. It mostly replaces all references from btrfs_fs_info::(pinned_extents\|freed_extents[]) to btrfs_transaction::pinned_extents. Two notable modifications that warrant explicit mention are changing clean_pinned_extents to get a reference to the previously running transaction. The other one is removal of call to btrfs_destroy_pinned_extent since transactions are going to be cleaned in btrfs_cleanup_one_transaction. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:38 +01:00
Nikolay Borisov	45bb5d6ae9	btrfs: Factor out pinned extent clean up in btrfs_delete_unused_bgs Next patch is going to refactor how pinned extents are tracked which will necessitate changing this code. To ease that work and contain the changes factor the code now in preparation, this will also help review. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:37 +01:00
Nikolay Borisov	f2fb72983b	btrfs: Mark pinned log extents as excluded In preparation to making pinned extents per-transaction ensure that log such extents are always excluded from caching. To achieve this in addition to marking them via btrfs_pin_extent_for_log_replay they also need to be marked with btrfs_add_excluded_extent to prevent log tree extent buffer being loaded by the free space caching thread. That's required since log tree blocks are not recorded in the extent tree, hence they always look free. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:37 +01:00
Nikolay Borisov	6b45f64172	btrfs: Pass transaction handle to write_pinned_extent_entries Preparation for refactoring pinned extents tracking. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:37 +01:00
Nikolay Borisov	6690d07126	btrfs: Make pin_down_extent take transaction handle All callers have a reference to a transaction handle so pass it to pin_down_extent. This is the final step before switching pinned extent tracking to a per-transaction basis. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:37 +01:00
Nikolay Borisov	9fce570454	btrfs: Make btrfs_pin_extent_for_log_replay take transaction handle Preparation for refactoring pinned extents tracking. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:37 +01:00
Nikolay Borisov	7bfc100705	btrfs: Make btrfs_pin_reserved_extent take transaction handle btrfs_pin_reserved_extent is now only called with a valid transaction so exploit the fact to take a transaction. This is preparation for tracking pinned extents on a per-transaction basis. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:37 +01:00
Nikolay Borisov	10e958d523	btrfs: Call btrfs_pin_reserved_extent only during active transaction Calling btrfs_pin_reserved_extent makes sense only with a valid transaction since pinned extents are processed from transaction commit in btrfs_finish_extent_commit. In case of error it's sufficient to adjust the reserved counter to account for log tree extents allocated in the last transaction. This commit moves btrfs_pin_reserved_extent to be called only with valid transaction handle and otherwise uses the newly introduced unaccount_log_buffer to adjust "reserved". If this is not done if a failure occurs before transaction is committed WARN_ON are going to be triggered on unmount. This was especially pronounced with generic/475 test. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:36 +01:00
Nikolay Borisov	6787bb9f35	btrfs: Introduce unaccount_log_buffer This function correctly adjusts the reserved bytes occupied by a log tree extent buffer. It will be used instead of calling btrfs_pin_reserved_extent. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:36 +01:00
Nikolay Borisov	b25c36f84b	btrfs: Make btrfs_pin_extent take trans handle Preparation for switching pinned extent tracking to a per-transaction basis. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:36 +01:00
Nikolay Borisov	f603bb94ab	btrfs: Perform pinned cleanup directly in btrfs_destroy_delayed_refs Having btrfs_destroy_delayed_refs call btrfs_pin_extent is problematic for making pinned extents tracking per-transaction since btrfs_trans_handle cannot be passed to btrfs_pin_extent in this context. Additionally delayed refs heads pinned in btrfs_destroy_delayed_refs are going to be handled very closely, in btrfs_destroy_pinned_extent. To enable btrfs_pin_extent to take btrfs_trans_handle simply open code it in btrfs_destroy_delayed_refs and call btrfs_error_unpin_extent_range on the range. This enables us to do less work in btrfs_destroy_pinned_extent and leaves btrfs_pin_extent being called in contexts which have a valid btrfs_trans_handle. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:36 +01:00
Anand Jain	25864778bc	btrfs: sysfs, unify handler name of devinfo/missing The devinfo attribute handlers were added in `668e48af7a` ("btrfs: sysfs, add devid/dev_state kobject and device attributes") and the name should contain _devinfo_, there's one that does not conform, so unify it with the rest. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:36 +01:00
Anand Jain	f3cd2c5811	btrfs: sysfs, rename device_link add/remove functions Since commit `668e48af7a` ("btrfs: sysfs, add devid/dev_state kobject and device attributes"), the functions btrfs_sysfs_add_device_link() and btrfs_sysfs_rm_device_link() do more than just adding and removing the device link as its name indicated. Rename them to be more specific that's about the directory with the attirbutes Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:35 +01:00
Anand Jain	1f6087e69c	btrfs: sysfs, use btrfs_sysfs_remove_fsid to celanup errors in add_fsid We have one simple function btrfs_sysfs_remove_fsid() to undo btrfs_sysfs_add_fsid(), which also does proper checks before releasing objects. One difference, if btrfs_sysfs_remove_fsid is used that now we also call kobject_del() which was missing before. This was tested (with kobject debug turned on) and no change in behaviour was found. This is a cleanup patch. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:35 +01:00
David Sterba	f657a31c86	btrfs: sink argument tree to __do_readpage The tree pointer can be safely read from the inode, use it and drop the redundant argument. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:35 +01:00
David Sterba	b6660e80f1	btrfs: sink arugment tree to contiguous_readpages The tree pointer can be safely read from the inode, use it and drop the redundant argument. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:35 +01:00
David Sterba	0d44fea77e	btrfs: sink argument tree to __extent_read_full_page The tree pointer can be safely read from the inode, use it and drop the redundant argument. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:35 +01:00
David Sterba	71ad38b44e	btrfs: sink argument tree to extent_read_full_page The tree pointer can be safely read from the page's inode, use it and drop the redundant argument. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:35 +01:00
David Sterba	b272ae22ac	btrfs: drop argument tree from btrfs_lock_and_flush_ordered_range The tree pointer can be safely read from the inode so we can drop the redundant argument from btrfs_lock_and_flush_ordered_range. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:34 +01:00
David Sterba	ae6957ebbf	btrfs: add assertions for tree == inode->io_tree to extent IO helpers Add assertions to all helpers that get tree as argument and verify that it's the same that can be obtained from the inode or from its pages. In followup patches the redundant arguments and assertions will be removed one by one. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:34 +01:00
David Sterba	0ceb34bf46	btrfs: drop argument tree from submit_extent_page Now that we're sure the tree from argument is same as the one we can get from the page's inode io_tree, drop the redundant argument. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:34 +01:00
David Sterba	45b08405b9	btrfs: remove extent_page_data::tree All functions that set up extent_page_data::tree set it to the inode io_tree. That's passed down the callstack that accesses either the same inode or its pages. In the end submit_extent_page can pull the tree out of the page and we don't have to store it in the structure. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:34 +01:00
David Sterba	bf31f87f71	btrfs: add wrapper for transaction abort predicate The status of aborted transaction can change between calls and it needs to be accessed by READ_ONCE. Add a helper that also wraps the unlikely hint. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:34 +01:00
David Sterba	b908c334e7	btrfs: move root node locking helpers to locking.c The helpers are related to locking so move them there, update comments. Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:33 +01:00
Josef Bacik	0024652895	btrfs: rename btrfs_put_fs_root and btrfs_grab_fs_root We are now using these for all roots, rename them to btrfs_put_root() and btrfs_grab_root(); Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:33 +01:00
Josef Bacik	bd647ce385	btrfs: add a leak check for roots Now that we're going to start relying on getting ref counting right for roots, add a list to track allocated roots and print out any roots that aren't freed up at free_fs_info time. Hide this behind CONFIG_BTRFS_DEBUG because this will just be used for developers to verify they aren't breaking things. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:33 +01:00
Josef Bacik	8260edba67	btrfs: make the init of static elements in fs_info separate In adding things like eb leak checking and root leak checking there were a lot of weird corner cases that come from the fact that 1) We do not init the fs_info until we get to open_ctree time in the normal case and 2) The test infrastructure half-init's the fs_info for things that it needs. This makes it really annoying to make changes because you have to add init in two different places, have special cases for testing fs_info's that may not have certain things initialized, and cases for fs_info's that didn't make it to open_ctree and thus are not fully set up. Fix this by extracting out the non-allocating init of the fs info into it's own public function and use that to make sure we're all getting consistent views of an allocated fs_info. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:33 +01:00
Josef Bacik	ae18c37ad5	btrfs: move fs_info init work into it's own helper function open_ctree mixes initialization of fs stuff and fs_info stuff, which makes it confusing when doing things like adding the root leak detection. Make a separate function that inits all the static structures inside of the fs_info needed for the fs to operate, and then call that before we start setting up the fs_info to be mounted. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:33 +01:00
Josef Bacik	141386e1a5	btrfs: free more things in btrfs_free_fs_info Things like the percpu_counters, the mapping_tree, and the csum hash can all be freed at btrfs_free_fs_info time, since the helpers all check if the structure has been initialized already. This significantly cleans up the error cases in open_ctree. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:32 +01:00
Josef Bacik	bc44d7c4b2	btrfs: push btrfs_grab_fs_root into btrfs_get_fs_root Now that all callers of btrfs_get_fs_root are subsequently calling btrfs_grab_fs_root and handling dropping the ref when they are done appropriately, go ahead and push btrfs_grab_fs_root up into btrfs_get_fs_root. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:32 +01:00
Josef Bacik	81f096edf0	btrfs: use btrfs_put_fs_root to free roots always If we are going to track leaked roots we need to free them all the same way, so don't kfree() roots directly, use btrfs_put_fs_root. Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:32 +01:00
Josef Bacik	4c78e9f596	btrfs: hold a ref on the root in open_ctree We lookup the fs_root and put it in our fs_info directly, we should hold a ref on this root for the lifetime of the fs_info. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:32 +01:00
Josef Bacik	0d4b046301	btrfs: export and rename free_fs_info We're going to start freeing roots and doing other complicated things in free_fs_info, so we need to move it to disk-io.c and export it in order to use things lik btrfs_put_fs_root(). Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:32 +01:00
Josef Bacik	fbb0ce40d6	btrfs: hold a ref on the root in btrfs_check_uuid_tree_entry We lookup the uuid of arbitrary subvolumes, hold a ref on the root while we're doing this. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:31 +01:00
Josef Bacik	ca2037fba6	btrfs: hold a ref on the root in btrfs_recover_log_trees We replay the log into arbitrary fs roots, hold a ref on the root while we're doing this. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:31 +01:00
Josef Bacik	5119cfc36f	btrfs: hold a ref on the root in create_pending_snapshot We create the snapshot and then use it for a bunch of things, we need to hold a ref on it while we're messing with it. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:31 +01:00
Josef Bacik	5168489a07	btrfs: hold a ref on the root in get_subvol_name_from_objectid We lookup the name of a subvol which means we'll cross into different roots. Hold a ref while we're doing the look ups in the fs_root we're searching. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:31 +01:00
Josef Bacik	6f9a3da5da	btrfs: hold a ref on the root in btrfs_ioctl_send We lookup all the clone roots and the parent root for send, so we need to hold refs on all of these roots while we're processing them. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:31 +01:00
Josef Bacik	fd79d43b34	btrfs: hold a ref on the root in scrub_print_warning_inode We look up the root for the bytenr that is failing, so we need to hold a ref on the root for that operation. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:30 +01:00
Josef Bacik	0b2dee5cff	btrfs: hold a ref for the root in btrfs_find_orphan_roots We lookup roots for every orphan item we have, we need to hold a ref on the root while we're doing this work. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:30 +01:00
Josef Bacik	9f583209f2	btrfs: push grab_fs_root into read_fs_root All of relocation uses read_fs_root to lookup fs roots, so push the btrfs_grab_fs_root() up into that helper and remove the individual calls. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2020-03-23 17:01:30 +01:00

1 2 3 4 5 ...

8807 Commits