linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-25 11:11:13 +07:00

Author	SHA1	Message	Date
Anand Jain	12b9bf0b94	btrfs: write_dev_flush does not return ENOMEM anymore Since commit "btrfs: btrfs_io_bio_alloc never fails, skip error handling" write_dev_flush will not return ENOMEM in the sending part. We do not need to check for it in the callers. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ updated changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:26:04 +02:00
David Sterba	c5e4c3d750	btrfs: sink gfp parameter to btrfs_io_bio_alloc We can hardcode GFP_NOFS to btrfs_io_bio_alloc, although it means we change it back from GFP_KERNEL in scrub. I'd rather save a few stack bytes from not passing the gfp flags in the remaining, more imporatant, contexts and the bio allocating API now looks more consistent. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:26:04 +02:00
David Sterba	e4f5690386	btrfs: btrfs_io_bio_alloc never fails, skip error handling Update direct callers of btrfs_io_bio_alloc that do error handling, that we can now remove. Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:26:02 +02:00
David Sterba	4b5faeac46	btrfs: use generic slab for for btrfs_transaction Observing the number of slab objects of btrfs_transaction, there's just one active on an almost quiescent filesystem, and the number of objects goes to about ten when sync is in progress. Then the nubmer goes down to 1. This matches the expectations of the transaction lifetime. For such use the separate slab cache is not justified, as we do not reuse objects frequently. For the shortlived transaction, the generic slab (size 512) should be ok. We can optimistically expect that the 512 slabs are not all used (fragmentation) and there are free slots to take when we do the allocation, compared to potentially allocating a whole new page for the separate slab. We'll lose the stats about the object use, which could be added later if we really need them. Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:26:01 +02:00
Jeff Layton	3189ff7786	btrfs: btrfs_wait_tree_block_writeback can be void return Nothing checks its return value. Is it safe to skip checking return value of btrfs_wait_tree_block_writeback? Liu Bo: I think yes, it's used in walk_log_tree which is called in two places, free_log_tree and log replay. For free_log_tree, it waits for any running writeback of the extent buffer under freeing to finish in case we need to access the eb pointer from page->private, and it's OK to not check the return value, while for log replay, it's doesn't wait because wc->wait is not set. So neither cares about the writeback error. Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> [ added more explanation to changelog, from Liu Bo ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:26:01 +02:00
David Sterba	c9fed2bb61	btrfs: remove unused member list from btrfs_end_io_wq The end io work queue items have been tracked by the work queues since "Btrfs: Add async worker threads for pre and post IO checksumming" (`8b71284292`) (2008). Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:25:59 +02:00
David Sterba	b297c9f68f	btrfs: remove unused member list from async_submit_bio The list used to track checksums in the early version (2.6.29), but I was able not pinpoint the commit that stopped using it. Everything apparently works without it for a long time. Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:25:59 +02:00
Josef Bacik	c6100a4b4e	Btrfs: replace tree->mapping with tree->private_data For extent_io tree's we have carried the address_mapping of the inode around in the io tree in order to pull the inode back out for calling into various tree ops hooks. This works fine when everything that has an extent_io_tree has an inode. But we are going to remove the btree_inode, so we need to change this. Instead just have a generic void * for private data that we can initialize with, and have all the tree ops use that instead. This had a lot of cascading changes but should be relatively straightforward. Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: David Sterba <dsterba@suse.com> [ minor reordering of the callback prototypes ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:25:58 +02:00
Nikolay Borisov	a5ed45f822	btrfs: Convert fs_info->free_chunk_space to atomic64_t The ->free_chunk_space variable is used to track the unallocated space and access to it is protected by a spinlock, which is not used for anything else. Make the code a bit self-explanatory by switching the variable to an atomic64_t type and kill the spinlock. Signed-off-by: Nikolay Borisov <nborisov@suse.com> [ not a performance critical code, use of atomic type is ok ] Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:25:58 +02:00
Anand Jain	401b41e5a8	btrfs: add framework to handle device flush error as a volume This adds comments to the flush error handling part of the code, and hopes to maintain the same logic with a framework which can be used to handle the errors at the volume level. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-06-19 18:25:58 +02:00
Jens Axboe	8f66439eec	Linux 4.12-rc5 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU 4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+ 2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4= =m2Gx -----END PGP SIGNATURE----- Merge tag 'v4.12-rc5' into for-4.13/block We've already got a few conflicts and upcoming work depends on some of the changes that have gone into mainline as regression fixes for this series. Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream trees to continue working on 4.13 changes. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2017-06-12 08:30:13 -06:00
Linus Torvalds	66cea28a94	Merge branch 'for-linus-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Some fixes that Dave Sterba collected. We've been hitting an early enospc problem on production machines that Omar tracked down to an old int->u64 mistake. I waited a bit on this pull to make sure it was really the problem from production, but it's on ~2100 hosts now and I think we're good. Omar also noticed a commit in the queue would make new early ENOSPC problems. I pulled that out for now, which is why the top three commits are younger than the rest. Otherwise these are all fixes, some explaining very old bugs that we've been poking at for a while" * 'for-linus-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix delalloc accounting leak caused by u32 overflow Btrfs: clear EXTENT_DEFRAG bits in finish_ordered_io btrfs: tree-log.c: Wrong printk information about namelen btrfs: fix race with relocation recovery and fs_root setup btrfs: fix memory leak in update_space_info failure path btrfs: use correct types for page indices in btrfs_page_exists_in_range btrfs: fix incorrect error return ret being passed to mapping_set_error btrfs: Make flush bios explicitely sync btrfs: fiemap: Cache and merge fiemap extent before submit it to user	2017-06-10 11:06:05 -07:00
Christoph Hellwig	4e4cbee93d	block: switch bios to blk_status_t Replace bi_error with a new bi_status to allow for a clear conversion. Note that device mapper overloaded bi_error with a private value, which we'll have to keep arround at least for now and thus propagate to a proper blk_status_t value. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-06-09 09:27:32 -06:00
Jan Kara	8d91012528	btrfs: Make flush bios explicitely sync Commit `b685d3d65a` "block: treat REQ_FUA and REQ_PREFLUSH as synchronous" removed REQ_SYNC flag from WRITE_{FUA\|PREFLUSH\|...} definitions. generic_make_request_checks() however strips REQ_FUA and REQ_PREFLUSH flags from a bio when the storage doesn't report volatile write cache and thus write effectively becomes asynchronous which can lead to performance regressions Fix the problem by making sure all bios which are synchronous are properly marked with REQ_SYNC. CC: David Sterba <dsterba@suse.com> CC: linux-btrfs@vger.kernel.org Fixes: `b685d3d65a` Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-05-16 15:42:01 +02:00
Linus Torvalds	1176032cb1	Merge branch 'for-linus-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "This has fixes and cleanups Dave Sterba collected for the merge window. The biggest functional fixes are between btrfs raid5/6 and scrub, and raid5/6 and device replacement. Some of our pending qgroup fixes are included as well while I bash on the rest in testing. We also have the usual set of cleanups, including one that makes __btrfs_map_block() much more maintainable, and conversions from atomic_t to refcount_t" * 'for-linus-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (71 commits) btrfs: fix the gfp_mask for the reada_zones radix tree Btrfs: fix reported number of inode blocks Btrfs: send, fix file hole not being preserved due to inline extent Btrfs: fix extent map leak during fallocate error path Btrfs: fix incorrect space accounting after failure to insert inline extent Btrfs: fix invalid attempt to free reserved space on failure to cow range btrfs: Handle delalloc error correctly to avoid ordered extent hang btrfs: Fix metadata underflow caused by btrfs_reloc_clone_csum error btrfs: check if the device is flush capable btrfs: delete unused member nobarriers btrfs: scrub: Fix RAID56 recovery race condition btrfs: scrub: Introduce full stripe lock for RAID56 btrfs: Use ktime_get_real_ts for root ctime Btrfs: handle only applicable errors returned by btrfs_get_extent btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option btrfs: use q which is already obtained from bdev_get_queue Btrfs: switch to div64_u64 if with a u64 divisor Btrfs: update scrub_parity to use u64 stripe_len Btrfs: enable repair during read for raid56 profile btrfs: use clear_page where appropriate ...	2017-05-10 08:33:17 -07:00
Chris Mason	9bcaaea741	btrfs: fix the gfp_mask for the reada_zones radix tree Commits `cc8385b59e` and `7ef70b4d99` added preallocation for the reada radix trees and also switched them over to GFP_KERNEL for the default gfp mask. Since we're doing radix tree insertions under spinlocks, we need to make sure the mask doesn't allow sleeping. This fix keeps the radix preallocation but switches back to the original gfp_mask. Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2017-05-04 16:56:11 -07:00
Linus Torvalds	694752922b	Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block Pull block layer updates from Jens Axboe: - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ was initially a fork of CFQ, but subsequently changed to implement fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant to be used on desktop type single drives, providing good fairness. From Paolo. - Add Kyber IO scheduler. This is a full multiqueue aware scheduler, using a scalable token based algorithm that throttles IO based on live completion IO stats, similary to blk-wbt. From Omar. - A series from Jan, moving users to separately allocated backing devices. This continues the work of separating backing device life times, solving various problems with hot removal. - A series of updates for lightnvm, mostly from Javier. Includes a 'pblk' target that exposes an open channel SSD as a physical block device. - A series of fixes and improvements for nbd from Josef. - A series from Omar, removing queue sharing between devices on mostly legacy drivers. This helps us clean up other bits, if we know that a queue only has a single device backing. This has been overdue for more than a decade. - Fixes for the blk-stats, and improvements to unify the stats and user windows. This both improves blk-wbt, and enables other users to register a need to receive IO stats for a device. From Omar. - blk-throttle improvements from Shaohua. This provides a scalable framework for implementing scalable priotization - particularly for blk-mq, but applicable to any type of block device. The interface is marked experimental for now. - Bucketized IO stats for IO polling from Stephen Bates. This improves efficiency of polled workloads in the presence of mixed block size IO. - A few fixes for opal, from Scott. - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics. From a variety of folks, mostly Sagi and James Smart. - A series from Bart, improving our exposed info and capabilities from the blk-mq debugfs support. - A series from Christoph, cleaning up how handle WRITE_ZEROES. - A series from Christoph, cleaning up the block layer handling of how we track errors in a request. On top of being a nice cleanup, it also shrinks the size of struct request a bit. - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was never used by platforms, and the latter has outlived it's usefulness. - Various little bug fixes and cleanups from a wide variety of folks. * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits) block: hide badblocks attribute by default blk-mq: unify hctx delay_work and run_work block: add kblock_mod_delayed_work_on() blk-mq: unify hctx delayed_run_work and run_work nbd: fix use after free on module unload MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler blk-mq-sched: alloate reserved tags out of normal pool mtip32xx: use runtime tag to initialize command header scsi: Implement blk_mq_ops.show_rq() blk-mq: Add blk_mq_ops.show_rq() blk-mq: Show operation, cmd_flags and rq_flags names blk-mq: Make blk_flags_show() callers append a newline character blk-mq: Move the "state" debugfs attribute one level down blk-mq: Unregister debugfs attributes earlier blk-mq: Only unregister hctxs for which registration succeeded blk-mq-debugfs: Rename functions for registering and unregistering the mq directory blk-mq: Let blk_mq_debugfs_register() look up the queue name blk-mq: Register <dev>/queue/mq after having registered <dev>/queue ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset ide-pm: always pass 0 error to __blk_end_request_all ..	2017-05-01 10:39:57 -07:00
Jan Kara	9e11ceee23	btrfs: Convert to separately allocated bdi Allocate struct backing_dev_info separately instead of embedding it inside superblock. This unifies handling of bdi among users. CC: Chris Mason <clm@fb.com> CC: Josef Bacik <jbacik@fb.com> CC: David Sterba <dsterba@suse.com> CC: linux-btrfs@vger.kernel.org Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-20 12:09:55 -06:00
Anand Jain	c2a9c7ab47	btrfs: check if the device is flush capable The block layer call chain from submit_bio will check if the write cache is enabled for the given queue before submitting the flush. This will add a code to fail fast if its not. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> [ updated changelog to reflect current code stat, blkdev_issue_flush is not used yet ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-04-18 16:13:27 +02:00
Anand Jain	13e88e1560	btrfs: delete unused member nobarriers The last consumer of nobarriers is removed by the commit [1] and sync won't fail with EOPNOTSUPP anymore. Thus, now when write cache is write through it just return success without actually transpiring such a request to the block device/lun. [1] commit `b25de9d6da` block: remove BIO_EOPNOTSUPP And, as the device/lun write cache state may change dynamically saving such as state won't help either. So deleting the member nobarriers. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-04-18 16:12:07 +02:00
David Sterba	d48d71aa99	btrfs: remove redundant parameter from btree_readahead_hook We can read fs_info from eb. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-04-18 14:07:25 +02:00
David Sterba	7ef70b4d99	btrfs: preallocate radix tree node for global readahead tree We can preallocate the node so insertion does not have to do that under the lock. The GFP flags for the global radix tree are initialized to GFP_NOFS & ~__GFP_DIRECT_RECLAIM but we can use GFP_KERNEL, because readahead is optional and not on any critical writeout path. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-04-18 14:07:25 +02:00
Elena Reshetova	0700cea7c8	btrfs: convert btrfs_root.refs from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-04-18 14:07:23 +02:00
Elena Reshetova	6df8cdf5bd	btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-04-18 14:07:23 +02:00
Elena Reshetova	9b64f57ddf	btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-04-18 14:07:23 +02:00
Linus Torvalds	fe8e12b503	Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We have three small fixes queued up in my for-linus-4.11 branch" * 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix an integer overflow check btrfs: Change qgroup_meta_rsv to 64bit Btrfs: bring back repair during read	2017-03-31 17:58:48 -07:00
Goldwyn Rodrigues	ce0dcee626	btrfs: Change qgroup_meta_rsv to 64bit Using an int value is causing qg->reserved to become negative and exclusive -EDQUOT to be reached prematurely. This affects exclusive qgroups only. TEST CASE: DEVICE=/dev/vdb MOUNTPOINT=/mnt SUBVOL=$MOUNTPOINT/tmp umount $SUBVOL umount $MOUNTPOINT mkfs.btrfs -f $DEVICE mount /dev/vdb $MOUNTPOINT btrfs quota enable $MOUNTPOINT btrfs subvol create $SUBVOL umount $MOUNTPOINT mount /dev/vdb $MOUNTPOINT mount -o subvol=tmp $DEVICE $SUBVOL btrfs qgroup limit -e 3G $SUBVOL btrfs quota rescan /mnt -w for i in `seq 1 44000`; do dd if=/dev/zero of=/mnt/tmp/test_$i bs=10k count=1 if [[ $? > 0 ]]; then btrfs qgroup show -pcref $SUBVOL exit 1 fi done Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com> [ add reproducer to changelog ] Signed-off-by: David Sterba <dsterba@suse.com>	2017-03-29 14:29:08 +02:00
Linus Torvalds	bbe08c0a43	Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull more btrfs updates from Chris Mason: "Btrfs round two. These are mostly a continuation of Dave Sterba's collection of cleanups, but Filipe also has some bug fixes and performance improvements" * 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (69 commits) btrfs: add dummy callback for readpage_io_failed and drop checks btrfs: drop checks for mandatory extent_io_ops callbacks btrfs: document existence of extent_io ops callbacks btrfs: let writepage_end_io_hook return void btrfs: do proper error handling in btrfs_insert_xattr_item btrfs: handle allocation error in update_dev_stat_item btrfs: remove BUG_ON from __tree_mod_log_insert btrfs: derive maximum output size in the compression implementation btrfs: use predefined limits for calculating maximum number of pages for compression btrfs: export compression buffer limits in a header btrfs: merge nr_pages input and output parameter in compress_pages btrfs: merge length input and output parameter in compress_pages btrfs: constify name of subvolume in creation helpers btrfs: constify buffers used by compression helpers btrfs: constify input buffer of btrfs_csum_data btrfs: constify device path passed to relevant helpers btrfs: make btrfs_inode_resume_unlocked_dio take btrfs_inode btrfs: make btrfs_inode_block_unlocked_dio take btrfs_inode btrfs: Make btrfs_add_nondir take btrfs_inode btrfs: Make btrfs_add_link take btrfs_inode ...	2017-03-02 16:03:00 -08:00
Chris Mason	e9f467d028	Merge branch 'for-chris-4.11-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.11	2017-02-28 14:35:09 -08:00
David Sterba	20a7db8ab3	btrfs: add dummy callback for readpage_io_failed and drop checks Make extent_io_ops::readpage_io_failed_hook callback mandatory and define a dummy function for btrfs_extent_io_ops. As the failed IO callback is not performance critical, the branch vs extra trade off does not hurt. Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-28 14:29:24 +01:00
David Sterba	4d53dddbec	btrfs: document existence of extent_io ops callbacks Some of the callbacks defined in btree_extent_io_ops and btrfs_extent_io_ops do always exist so we don't need to check the existence before each call. This patch just reorders the definition and documents which are mandatory/optional. Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-28 14:29:24 +01:00
David Sterba	9ed573674a	btrfs: constify input buffer of btrfs_csum_data The function does not modify the input buffer, also update a typecast in one caller. Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-28 14:26:07 +01:00
Nikolay Borisov	fc4f21b1d8	btrfs: Make get_extent_t take btrfs_inode In addition to changing the signature, this patch also switches all the functions which are used as an argument to also take btrfs_inode. Namely those are: btrfs_get_extent and btrfs_get_extent_filemap. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-28 11:30:11 +01:00
Linus Torvalds	9003ed1fed	Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "This has a series of fixes and cleanups that Dave Sterba has been collecting. There is a pretty big variety here, cleaning up internal APIs and fixing corner cases" * 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (124 commits) Btrfs: use the correct type when creating cow dio extent Btrfs: fix deadlock between dedup on same file and starting writeback btrfs: use btrfs_debug instead of pr_debug in transaction abort btrfs: btrfs_truncate_free_space_cache always allocates path btrfs: free-space-cache, clean up unnecessary root arguments btrfs: convert btrfs_inc_block_group_ro to accept fs_info btrfs: flush_space always takes fs_info->fs_root btrfs: pass fs_info to (more) routines that are only called with extent_root btrfs: qgroup: Move half of the qgroup accounting time out of commit trans btrfs: remove unused parameter from adjust_slots_upwards btrfs: remove unused parameters from __btrfs_write_out_cache btrfs: remove unused parameter from cleanup_write_cache_enospc btrfs: remove unused parameter from __add_inode_ref btrfs: remove unused parameter from clone_copy_inline_extent btrfs: remove unused parameters from btrfs_cmp_data btrfs: remove unused parameter from __add_inline_refs btrfs: remove unused parameters from scrub_setup_wr_ctx btrfs: remove unused parameter from create_snapshot btrfs: remove unused parameter from init_first_rw_device btrfs: remove unused parameter from __btrfs_alloc_chunk ...	2017-02-25 14:53:58 -08:00
Filipe Manana	a9b9477db2	Btrfs: fix use-after-free due to wrong order of destroying work queues Before we destroy all work queues (and wait for their tasks to complete) we were destroying the work queues used for metadata I/O operations, which can result in a use-after-free problem because most tasks from all work queues do metadata I/O operations. For example, the tasks from the caching workers work queue (fs_info->caching_workers), which is destroyed only after the work queue used for metadata reads (fs_info->endio_meta_workers) is destroyed, do metadata reads, which result in attempts to queue tasks into the later work queue, triggering a use-after-free with a trace like the following: [23114.613543] general protection fault: 0000 [#1] PREEMPT SMP [23114.614442] Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c btrfs xor raid6_pq dm_flakey dm_mod crc32c_generic acpi_cpufreq tpm_tis tpm_tis_core tpm ppdev parport_pc parport i2c_piix4 processor sg evdev i2c_core psmouse pcspkr serio_raw button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last unloaded: scsi_debug] [23114.616932] CPU: 9 PID: 4537 Comm: kworker/u32:8 Not tainted 4.9.0-rc7-btrfs-next-36+ #1 [23114.616932] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [23114.616932] Workqueue: btrfs-cache btrfs_cache_helper [btrfs] [23114.616932] task: ffff880221d45780 task.stack: ffffc9000bc50000 [23114.616932] RIP: 0010:[<ffffffffa037c1bf>] [<ffffffffa037c1bf>] btrfs_queue_work+0x2c/0x190 [btrfs] [23114.616932] RSP: 0018:ffff88023f443d60 EFLAGS: 00010246 [23114.616932] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000102 [23114.616932] RDX: ffffffffa0419000 RSI: ffff88011df534f0 RDI: ffff880101f01c00 [23114.616932] RBP: ffff88023f443d80 R08: 00000000000f7000 R09: 000000000000ffff [23114.616932] R10: ffff88023f443d48 R11: 0000000000001000 R12: ffff88011df534f0 [23114.616932] R13: ffff880135963868 R14: 0000000000001000 R15: 0000000000001000 [23114.616932] FS: 0000000000000000(0000) GS:ffff88023f440000(0000) knlGS:0000000000000000 [23114.616932] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [23114.616932] CR2: 00007f0fb9f8e520 CR3: 0000000001a0b000 CR4: 00000000000006e0 [23114.616932] Stack: [23114.616932] ffff880101f01c00 ffff88011df534f0 ffff880135963868 0000000000001000 [23114.616932] ffff88023f443da0 ffffffffa03470af ffff880149b37200 ffff880135963868 [23114.616932] ffff88023f443db8 ffffffff8125293c ffff880149b37200 ffff88023f443de0 [23114.616932] Call Trace: [23114.616932] <IRQ> [23114.616932] [<ffffffffa03470af>] end_workqueue_bio+0xd5/0xda [btrfs] [23114.616932] [<ffffffff8125293c>] bio_endio+0x54/0x57 [23114.616932] [<ffffffffa0377929>] btrfs_end_bio+0xf7/0x106 [btrfs] [23114.616932] [<ffffffff8125293c>] bio_endio+0x54/0x57 [23114.616932] [<ffffffff8125955f>] blk_update_request+0x21a/0x30f [23114.616932] [<ffffffffa0022316>] scsi_end_request+0x31/0x182 [scsi_mod] [23114.616932] [<ffffffffa00235fc>] scsi_io_completion+0x1ce/0x4c8 [scsi_mod] [23114.616932] [<ffffffffa001ba9d>] scsi_finish_command+0x104/0x10d [scsi_mod] [23114.616932] [<ffffffffa002311f>] scsi_softirq_done+0x101/0x10a [scsi_mod] [23114.616932] [<ffffffff8125fbd9>] blk_done_softirq+0x82/0x8d [23114.616932] [<ffffffff814c8a4b>] __do_softirq+0x1ab/0x412 [23114.616932] [<ffffffff8105b01d>] irq_exit+0x49/0x99 [23114.616932] [<ffffffff81035135>] smp_call_function_single_interrupt+0x24/0x26 [23114.616932] [<ffffffff814c7ec9>] call_function_single_interrupt+0x89/0x90 [23114.616932] <EOI> [23114.616932] [<ffffffffa0023262>] ? scsi_request_fn+0x13a/0x2a1 [scsi_mod] [23114.616932] [<ffffffff814c5966>] ? _raw_spin_unlock_irq+0x2c/0x4a [23114.616932] [<ffffffff814c596c>] ? _raw_spin_unlock_irq+0x32/0x4a [23114.616932] [<ffffffff814c5966>] ? _raw_spin_unlock_irq+0x2c/0x4a [23114.616932] [<ffffffffa0023262>] scsi_request_fn+0x13a/0x2a1 [scsi_mod] [23114.616932] [<ffffffff8125590e>] __blk_run_queue_uncond+0x22/0x2b [23114.616932] [<ffffffff81255930>] __blk_run_queue+0x19/0x1b [23114.616932] [<ffffffff8125ab01>] blk_queue_bio+0x268/0x282 [23114.616932] [<ffffffff81258f44>] generic_make_request+0xbd/0x160 [23114.616932] [<ffffffff812590e7>] submit_bio+0x100/0x11d [23114.616932] [<ffffffff81298603>] ? __this_cpu_preempt_check+0x13/0x15 [23114.616932] [<ffffffff812a1805>] ? __percpu_counter_add+0x8e/0xa7 [23114.616932] [<ffffffffa03bfd47>] btrfsic_submit_bio+0x1a/0x1d [btrfs] [23114.616932] [<ffffffffa0377db2>] btrfs_map_bio+0x1f4/0x26d [btrfs] [23114.616932] [<ffffffffa0348a33>] btree_submit_bio_hook+0x74/0xbf [btrfs] [23114.616932] [<ffffffffa03489bf>] ? btrfs_wq_submit_bio+0x160/0x160 [btrfs] [23114.616932] [<ffffffffa03697a9>] submit_one_bio+0x6b/0x89 [btrfs] [23114.616932] [<ffffffffa036f5be>] read_extent_buffer_pages+0x170/0x1ec [btrfs] [23114.616932] [<ffffffffa03471fa>] ? free_root_pointers+0x64/0x64 [btrfs] [23114.616932] [<ffffffffa0348adf>] readahead_tree_block+0x3f/0x4c [btrfs] [23114.616932] [<ffffffffa032e115>] read_block_for_search.isra.20+0x1ce/0x23d [btrfs] [23114.616932] [<ffffffffa032fab8>] btrfs_search_slot+0x65f/0x774 [btrfs] [23114.616932] [<ffffffffa036eff1>] ? free_extent_buffer+0x73/0x7e [btrfs] [23114.616932] [<ffffffffa0331ba4>] btrfs_next_old_leaf+0xa1/0x33c [btrfs] [23114.616932] [<ffffffffa0331e4f>] btrfs_next_leaf+0x10/0x12 [btrfs] [23114.616932] [<ffffffffa0336aa6>] caching_thread+0x22d/0x416 [btrfs] [23114.616932] [<ffffffffa037bce9>] btrfs_scrubparity_helper+0x187/0x3b6 [btrfs] [23114.616932] [<ffffffffa037c036>] btrfs_cache_helper+0xe/0x10 [btrfs] [23114.616932] [<ffffffff8106cf96>] process_one_work+0x273/0x4e4 [23114.616932] [<ffffffff8106d6db>] worker_thread+0x1eb/0x2ca [23114.616932] [<ffffffff8106d4f0>] ? rescuer_thread+0x2b6/0x2b6 [23114.616932] [<ffffffff81072a81>] kthread+0xd5/0xdd [23114.616932] [<ffffffff810729ac>] ? __kthread_unpark+0x5a/0x5a [23114.616932] [<ffffffff814c6257>] ret_from_fork+0x27/0x40 [23114.616932] Code: 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 f4 48 8b 46 70 a8 04 74 09 48 8b 5f 08 48 85 db 75 03 48 8b 1f 49 89 5c 24 68 <83> 7b 64 ff 74 04 f0 ff 43 58 49 83 7c 24 08 00 74 2c 4c 8d 6b [23114.616932] RIP [<ffffffffa037c1bf>] btrfs_queue_work+0x2c/0x190 [btrfs] [23114.616932] RSP <ffff88023f443d60> [23114.689493] ---[ end trace 6e48b6bc707ca34b ]--- [23114.690166] Kernel panic - not syncing: Fatal exception in interrupt [23114.691283] Kernel Offset: disabled [23114.691918] ---[ end Kernel panic - not syncing: Fatal exception in interrupt The following diagram shows the sequence of operations that lead to the use-after-free problem from the above trace: CPU 1 CPU 2 CPU 3 caching_thread() close_ctree() btrfs_stop_all_workers() btrfs_destroy_workqueue( fs_info->endio_meta_workers) btrfs_search_slot() read_block_for_search() readahead_tree_block() read_extent_buffer_pages() submit_one_bio() btree_submit_bio_hook() btrfs_bio_wq_end_io() --> sets the bio's bi_end_io callback to end_workqueue_bio() --> bio is submitted bio completes and its bi_end_io callback is invoked --> end_workqueue_bio() --> attempts to queue a task on fs_info->endio_meta_workers btrfs_destroy_workqueue( fs_info->caching_workers) So fix this by destroying the queues used for metadata I/O tasks only after destroying all the other queues. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>	2017-02-24 00:38:56 +00:00
Filipe Manana	5cdd7db6c5	Btrfs: fix assertion failure when freeing block groups at close_ctree() At close_ctree() we free the block groups and then only after we wait for any running worker kthreads to finish and shutdown the workqueues. This behaviour is racy and it triggers an assertion failure when freeing block groups because while we are doing it we can have for example a block group caching kthread running, and in that case the block group's reference count can still be greater than 1 by the time we assert its reference count is 1, leading to an assertion failure: [19041.198004] assertion failed: atomic_read(&block_group->count) == 1, file: fs/btrfs/extent-tree.c, line: 9799 [19041.200584] ------------[ cut here ]------------ [19041.201692] kernel BUG at fs/btrfs/ctree.h:3418! [19041.202830] invalid opcode: 0000 [#1] PREEMPT SMP [19041.203929] Modules linked in: btrfs xor raid6_pq dm_flakey dm_mod crc32c_generic ppdev sg psmouse acpi_cpufreq pcspkr parport_pc evdev tpm_tis parport tpm_tis_core i2c_piix4 i2c_core tpm serio_raw processor button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio e1000 scsi_mod floppy [last unloaded: btrfs] [19041.208082] CPU: 6 PID: 29051 Comm: umount Not tainted 4.9.0-rc7-btrfs-next-36+ #1 [19041.208082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [19041.208082] task: ffff88015f028980 task.stack: ffffc9000ad34000 [19041.208082] RIP: 0010:[<ffffffffa03e319e>] [<ffffffffa03e319e>] assfail.constprop.41+0x1c/0x1e [btrfs] [19041.208082] RSP: 0018:ffffc9000ad37d60 EFLAGS: 00010286 [19041.208082] RAX: 0000000000000061 RBX: ffff88015ecb4000 RCX: 0000000000000001 [19041.208082] RDX: ffff88023f392fb8 RSI: ffffffff817ef7ba RDI: 00000000ffffffff [19041.208082] RBP: ffffc9000ad37d60 R08: 0000000000000001 R09: 0000000000000000 [19041.208082] R10: ffffc9000ad37cb0 R11: ffffffff82f2b66d R12: ffff88023431d170 [19041.208082] R13: ffff88015ecb40c0 R14: ffff88023431d000 R15: ffff88015ecb4100 [19041.208082] FS: 00007f44f3d42840(0000) GS:ffff88023f380000(0000) knlGS:0000000000000000 [19041.208082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [19041.208082] CR2: 00007f65d623b000 CR3: 00000002166f2000 CR4: 00000000000006e0 [19041.208082] Stack: [19041.208082] ffffc9000ad37d98 ffffffffa035989f ffff88015ecb4000 ffff88015ecb5630 [19041.208082] ffff88014f6be000 0000000000000000 00007ffcf0ba6a10 ffffc9000ad37df8 [19041.208082] ffffffffa0368cd4 ffff88014e9658e0 ffffc9000ad37e08 ffffffff811a634d [19041.208082] Call Trace: [19041.208082] [<ffffffffa035989f>] btrfs_free_block_groups+0x17f/0x392 [btrfs] [19041.208082] [<ffffffffa0368cd4>] close_ctree+0x1c5/0x2e1 [btrfs] [19041.208082] [<ffffffff811a634d>] ? evict_inodes+0x132/0x141 [19041.208082] [<ffffffffa034356d>] btrfs_put_super+0x15/0x17 [btrfs] [19041.208082] [<ffffffff8118fc32>] generic_shutdown_super+0x6a/0xeb [19041.208082] [<ffffffff8119004f>] kill_anon_super+0x12/0x1c [19041.208082] [<ffffffffa0343370>] btrfs_kill_super+0x16/0x21 [btrfs] [19041.208082] [<ffffffff8118fad1>] deactivate_locked_super+0x3b/0x68 [19041.208082] [<ffffffff8118fb34>] deactivate_super+0x36/0x39 [19041.208082] [<ffffffff811a9946>] cleanup_mnt+0x58/0x76 [19041.208082] [<ffffffff811a99a2>] __cleanup_mnt+0x12/0x14 [19041.208082] [<ffffffff81071573>] task_work_run+0x6f/0x95 [19041.208082] [<ffffffff81001897>] prepare_exit_to_usermode+0xa3/0xc1 [19041.208082] [<ffffffff81001a23>] syscall_return_slowpath+0x16e/0x1d2 [19041.208082] [<ffffffff814c607d>] entry_SYSCALL_64_fastpath+0xab/0xad [19041.208082] Code: c7 ae a0 3e a0 48 89 e5 e8 4e 74 d4 e0 0f 0b 55 89 f1 48 c7 c2 0b a4 3e a0 48 89 fe 48 c7 c7 a4 a6 3e a0 48 89 e5 e8 30 74 d4 e0 <0f> 0b 55 31 d2 48 89 e5 e8 d5 b9 f7 ff 5d c3 48 63 f6 55 31 c9 [19041.208082] RIP [<ffffffffa03e319e>] assfail.constprop.41+0x1c/0x1e [btrfs] [19041.208082] RSP <ffffc9000ad37d60> [19041.279264] ---[ end trace 23330586f16f064d ]--- This started happening as of kernel 4.8, since commit `f3bca8028b` ("Btrfs: add ASSERT for block group's memory leak") introduced these assertions. So fix this by freeing the block groups only after waiting for all worker kthreads to complete and shutdown the workqueues. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>	2017-02-24 00:38:27 +00:00
David Sterba	3d3a126a81	btrfs: remove unused parameter from btrfs_check_super_valid None of the checks need to know the ro/rw status as they're all not changing the superblock. Moreover, we can access the sb flags directly if we'd need to decide by the ro/rw status. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-17 12:03:52 +01:00
David Sterba	eece6a9cf6	btrfs: merge two superblock writing helpers write_all_supers and write_ctree_super are almost equal, the parameter 'trans' is unused so we can drop it and have just one helper. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-17 12:03:51 +01:00
David Sterba	b75f506243	btrfs: remove unused parameter from write_dev_supers The barriers are handled by the caller. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-17 12:03:51 +01:00
David Sterba	7c302b49dd	btrfs: remove unused parameter from clean_tree_block Added but never needed. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-17 12:03:51 +01:00
David Sterba	e27f62652b	btrfs: remove unused parameter from check_async_write Added but never used. Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2017-02-17 12:03:51 +01:00
Jan Kara	efa7c9f97e	block: Get rid of blk_get_backing_dev_info() blk_get_backing_dev_info() is now a simple dereference. Remove that function and simplify some code around that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-02 08:21:32 -07:00
Linus Torvalds	087a76d390	Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "Jeff Mahoney and Dave Sterba have a really nice set of cleanups in here, and Christoph pitched in corrections/improvements to make btrfs use proper helpers for bio walking instead of doing it by hand. There are some key fixes as well, including some long standing bugs that took forever to track down in btrfs_drop_extents and during balance" * 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (77 commits) btrfs: limit async_work allocation and worker func duration Revert "Btrfs: adjust len of writes if following a preallocated extent" Btrfs: don't WARN() in btrfs_transaction_abort() for IO errors btrfs: opencode chunk locking, remove helpers btrfs: remove root parameter from transaction commit/end routines btrfs: split btrfs_wait_marked_extents into normal and tree log functions btrfs: take an fs_info directly when the root is not used otherwise btrfs: simplify btrfs_wait_cache_io prototype btrfs: convert extent-tree tracepoints to use fs_info btrfs: root->fs_info cleanup, access fs_info->delayed_root directly btrfs: root->fs_info cleanup, add fs_info convenience variables btrfs: root->fs_info cleanup, update_block_group{,flags} btrfs: root->fs_info cleanup, lock/unlock_chunks btrfs: root->fs_info cleanup, btrfs_calc_{trans,trunc}_metadata_size btrfs: pull node/sector/stripe sizes out of root and into fs_info btrfs: root->fs_info cleanup, io_ctl_init btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere btrfs: struct reada_control.root -> reada_control.fs_info btrfs: struct btrfsic_state->root should be an fs_info btrfs: alloc_reserved_file_extent trace point should use extent_root ...	2016-12-16 10:53:01 -08:00
Chris Mason	5f52a2c512	Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10 Patches queued up by Filipe: The most important change is still the fix for the extent tree corruption that happens due to balance when qgroups are enabled (a regression introduced in 4.7 by a fix for a regression from the last qgroups rework). This has been hitting SLE and openSUSE users and QA very badly, where transactions keep getting aborted when running delayed references leaving the root filesystem in RO mode and nearly unusable. There are fixes here that allow us to run xfstests again with the integrity checker enabled, which has been impossible since 4.8 (apparently I'm the only one running xfstests with the integrity checker enabled, which is useful to validate dirtied leafs, like checking if there are keys out of order, etc). The rest are just some trivial fixes, most of them tagged for stable, and two cleanups. Signed-off-by: Chris Mason <clm@fb.com>	2016-12-13 09:14:42 -08:00
David Sterba	34441361c4	btrfs: opencode chunk locking, remove helpers The helpers are trivial and we don't use them consistently. Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:07:00 +01:00
Jeff Mahoney	3a45bb207e	btrfs: remove root parameter from transaction commit/end routines Now we only use the root parameter to print the root objectid in a tracepoint. We can use the root parameter from the transaction handle for that. It's also used to join the transaction with async commits, so we remove the comment that it's just for checking. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:07:00 +01:00
Jeff Mahoney	2ff7e61e0d	btrfs: take an fs_info directly when the root is not used otherwise There are loads of functions in btrfs that accept a root parameter but only use it to obtain an fs_info pointer. Let's convert those to just accept an fs_info pointer directly. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:59 +01:00
Jeff Mahoney	ccdf9b305a	btrfs: root->fs_info cleanup, access fs_info->delayed_root directly This results in btrfs_assert_delayed_root_empty and btrfs_destroy_delayed_inode taking an fs_info instead of a root. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:59 +01:00
Jeff Mahoney	0b246afa62	btrfs: root->fs_info cleanup, add fs_info convenience variables In routines where someptr->fs_info is referenced multiple times, we introduce a convenience variable. This makes the code considerably more readable. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:59 +01:00
Jeff Mahoney	3796d33535	btrfs: root->fs_info cleanup, lock/unlock_chunks Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:58 +01:00
Jeff Mahoney	da17066c40	btrfs: pull node/sector/stripe sizes out of root and into fs_info We track the node sizes per-root, but they never vary from the values in the superblock. This patch messes with the 80-column style a bit, but subsequent patches to factor out root->fs_info into a convenience variable fix it up again. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:58 +01:00
Jeff Mahoney	fb456252d3	btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:58 +01:00
Jeff Mahoney	6bccf3ab1e	btrfs: call functions that always use the same root with fs_info instead There are many functions that are always called with the same root argument. Rather than passing the same root every time, we can pass an fs_info pointer instead and have the function get the root pointer itself. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:57 +01:00
Jeff Mahoney	5b4aacefb8	btrfs: call functions that overwrite their root parameter with fs_info There are 11 functions that accept a root parameter and immediately overwrite it. We can pass those an fs_info pointer instead. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-12-06 16:06:57 +01:00
Wang Xiaoguang	1d57ee9416	btrfs: improve delayed refs iterations This issue was found when I tried to delete a heavily reflinked file, when deleting such files, other transaction operation will not have a chance to make progress, for example, start_transaction() will blocked in wait_current_trans(root) for long time, sometimes it even triggers soft lockups, and the time taken to delete such heavily reflinked file is also very large, often hundreds of seconds. Using perf top, it reports that: PerfTop: 7416 irqs/sec kernel:99.8% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs) --------------------------------------------------------------------------------------- 84.37% [btrfs] [k] __btrfs_run_delayed_refs.constprop.80 11.02% [kernel] [k] delay_tsc 0.79% [kernel] [k] _raw_spin_unlock_irq 0.78% [kernel] [k] _raw_spin_unlock_irqrestore 0.45% [kernel] [k] do_raw_spin_lock 0.18% [kernel] [k] __slab_alloc It seems __btrfs_run_delayed_refs() took most cpu time, after some debug work, I found it's select_delayed_ref() causing this issue, for a delayed head, in our case, it'll be full of BTRFS_DROP_DELAYED_REF nodes, but select_delayed_ref() will firstly try to iterate node list to find BTRFS_ADD_DELAYED_REF nodes, obviously it's a disaster in this case, and waste much time. To fix this issue, we introduce a new ref_add_list in struct btrfs_delayed_ref_head, then in select_delayed_ref(), if this list is not empty, we can directly use nodes in this list. With this patch, it just took about 10~15 seconds to delte the same file. Now using perf top, it reports that: PerfTop: 2734 irqs/sec kernel:99.5% exact: 0.0% [4000Hz cpu-clock], (all, 4 CPUs) ---------------------------------------------------------------------------------------- 20.74% [kernel] [k] _raw_spin_unlock_irqrestore 16.33% [kernel] [k] __slab_alloc 5.41% [kernel] [k] lock_acquired 4.42% [kernel] [k] lock_acquire 4.05% [kernel] [k] lock_release 3.37% [kernel] [k] _raw_spin_unlock_irq For normal files, this patch also gives help, at least we do not need to iterate whole list to found BTRFS_ADD_DELAYED_REF nodes. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com> Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-11-30 13:45:21 +01:00
Domagoj Tršan	0b5e3dafb6	btrfs: change btrfs_csum_final result param type to u8 csum member of struct btrfs_super_block has array type of u8. It makes sense that function btrfs_csum_final should be also declared to accept u8 . I changed the declaration of method void btrfs_csum_final(u32 crc, char result); to void btrfs_csum_final(u32 crc, u8 *result); Signed-off-by: Domagoj Tršan <domagoj.trsan@gmail.com> [ changed cast to u8 at several call sites ] Signed-off-by: David Sterba <dsterba@suse.com>	2016-11-30 13:45:18 +01:00
David Sterba	b159fa2808	btrfs: remove constant parameter to memset_extent_buffer and rename it The only memset we do is to 0, so sink the parameter to the function and simplify all calls. Rename the function to reflect the behaviour. Signed-off-by: David Sterba <dsterba@suse.com>	2016-11-30 13:45:17 +01:00
David Sterba	d24ee97b96	btrfs: use new helpers to set uuids in eb Signed-off-by: David Sterba <dsterba@suse.com>	2016-11-30 13:45:17 +01:00
David Sterba	62d1f9fe97	btrfs: remove trivial helper btrfs_find_tree_block During the time, the function has been shrunk to the point that it just calls find_extent_buffer, just passing the parameters. Signed-off-by: David Sterba <dsterba@suse.com>	2016-11-30 13:45:16 +01:00
David Sterba	fc2e901f26	btrfs: reada, sink start parameter to btree_readahead_hook Originally, the eb and start were passed separately in case eb is NULL. Since the readahead has been refactored in 4.6, this is not true anymore and we can get rid of the parameter. Signed-off-by: David Sterba <dsterba@suse.com>	2016-11-30 13:45:16 +01:00
Filipe Manana	f177d73949	Btrfs: fix emptiness check for dirtied extent buffers at check_leaf() We can not simply use the owner field from an extent buffer's header to get the id of the respective tree when the extent buffer is from a relocation tree. When we create the root for a relocation tree we leave (on purpose) the owner field with the same value as the subvolume's tree root (we do this at ctree.c:btrfs_copy_root()). So we must ignore extent buffers from relocation trees, which have the BTRFS_HEADER_FLAG_RELOC flag set, because otherwise we will always consider the extent buffer as not being the root of the tree (the root of original subvolume tree is always different from the root of the respective relocation tree). This lead to assertion failures when running with the integrity checker enabled (CONFIG_BTRFS_FS_CHECK_INTEGRITY=y) such as the following: [ 643.393409] BTRFS critical (device sdg): corrupt leaf, non-root leaf's nritems is 0: block=38506496, root=260, slot=0 [ 643.397609] BTRFS info (device sdg): leaf 38506496 total ptrs 0 free space 3995 [ 643.407075] assertion failed: 0, file: fs/btrfs/disk-io.c, line: 4078 [ 643.408425] ------------[ cut here ]------------ [ 643.409112] kernel BUG at fs/btrfs/ctree.h:3419! [ 643.409773] invalid opcode: 0000 [#1] PREEMPT SMP [ 643.410447] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq ppdev psmouse acpi_cpufreq parport_pc evdev parport tpm_tis tpm_tis_core pcspkr serio_raw i2c_piix4 sg tpm i2c_core button processor loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring scsi_mod virtio e1000 floppy [ 643.414356] CPU: 11 PID: 32726 Comm: btrfs Not tainted 4.8.0-rc8-btrfs-next-35+ #1 [ 643.414356] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 643.414356] task: ffff880145e95b00 task.stack: ffff88014826c000 [ 643.414356] RIP: 0010:[<ffffffffa0352759>] [<ffffffffa0352759>] assfail.constprop.41+0x1c/0x1e [btrfs] [ 643.414356] RSP: 0018:ffff88014826fa28 EFLAGS: 00010292 [ 643.414356] RAX: 0000000000000039 RBX: ffff88014e2d7c38 RCX: 0000000000000001 [ 643.414356] RDX: ffff88023f4d2f58 RSI: ffffffff81806c63 RDI: 00000000ffffffff [ 643.414356] RBP: ffff88014826fa28 R08: 0000000000000001 R09: 0000000000000000 [ 643.414356] R10: ffff88014826f918 R11: ffffffff82f3c5ed R12: ffff880172910000 [ 643.414356] R13: ffff880233992230 R14: ffff8801a68a3310 R15: fffffffffffffff8 [ 643.414356] FS: 00007f9ca305e8c0(0000) GS:ffff88023f4c0000(0000) knlGS:0000000000000000 [ 643.414356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 643.414356] CR2: 00007f9ca3071000 CR3: 000000015d01b000 CR4: 00000000000006e0 [ 643.414356] Stack: [ 643.414356] ffff88014826fa50 ffffffffa02d655a 000000000000000a ffff88014e2d7c38 [ 643.414356] 0000000000000000 ffff88014826faa8 ffffffffa02b72f3 ffff88014826fab8 [ 643.414356] 00ffffffa03228e4 0000000000000000 0000000000000000 ffff8801bbd4e000 [ 643.414356] Call Trace: [ 643.414356] [<ffffffffa02d655a>] btrfs_mark_buffer_dirty+0xdf/0xe5 [btrfs] [ 643.414356] [<ffffffffa02b72f3>] btrfs_copy_root+0x18a/0x1d1 [btrfs] [ 643.414356] [<ffffffffa0322921>] create_reloc_root+0x72/0x1ba [btrfs] [ 643.414356] [<ffffffffa03267c2>] btrfs_init_reloc_root+0x7b/0xa7 [btrfs] [ 643.414356] [<ffffffffa02d9e44>] record_root_in_trans+0xdf/0xed [btrfs] [ 643.414356] [<ffffffffa02db04e>] btrfs_record_root_in_trans+0x50/0x6a [btrfs] [ 643.414356] [<ffffffffa030ad2b>] create_subvol+0x472/0x773 [btrfs] [ 643.414356] [<ffffffffa030b406>] btrfs_mksubvol+0x3da/0x463 [btrfs] [ 643.414356] [<ffffffffa030b406>] ? btrfs_mksubvol+0x3da/0x463 [btrfs] [ 643.414356] [<ffffffff810781ac>] ? preempt_count_add+0x65/0x68 [ 643.414356] [<ffffffff811a6e97>] ? __mnt_want_write+0x62/0x77 [ 643.414356] [<ffffffffa030b55d>] btrfs_ioctl_snap_create_transid+0xce/0x187 [btrfs] [ 643.414356] [<ffffffffa030b67d>] btrfs_ioctl_snap_create+0x67/0x81 [btrfs] [ 643.414356] [<ffffffffa030ecfd>] btrfs_ioctl+0x508/0x20dd [btrfs] [ 643.414356] [<ffffffff81293e39>] ? __this_cpu_preempt_check+0x13/0x15 [ 643.414356] [<ffffffff81155eca>] ? handle_mm_fault+0x976/0x9ab [ 643.414356] [<ffffffff81091300>] ? arch_local_irq_save+0x9/0xc [ 643.414356] [<ffffffff8119a2b0>] vfs_ioctl+0x18/0x34 [ 643.414356] [<ffffffff8119a8e8>] do_vfs_ioctl+0x581/0x600 [ 643.414356] [<ffffffff814b9552>] ? entry_SYSCALL_64_fastpath+0x5/0xa8 [ 643.414356] [<ffffffff81093fe9>] ? trace_hardirqs_on_caller+0x17b/0x197 [ 643.414356] [<ffffffff8119a9be>] SyS_ioctl+0x57/0x79 [ 643.414356] [<ffffffff814b9565>] entry_SYSCALL_64_fastpath+0x18/0xa8 [ 643.414356] [<ffffffff81091b08>] ? trace_hardirqs_off_caller+0x3f/0xaa [ 643.414356] Code: 89 83 88 00 00 00 31 c0 5b 41 5c 41 5d 5d c3 55 89 f1 48 c7 c2 98 bc 35 a0 48 89 fe 48 c7 c7 05 be 35 a0 48 89 e5 e8 13 46 dd e0 <0f> 0b 55 89 f1 48 c7 c2 9f d3 35 a0 48 89 fe 48 c7 c7 7a d5 35 [ 643.414356] RIP [<ffffffffa0352759>] assfail.constprop.41+0x1c/0x1e [btrfs] [ 643.414356] RSP <ffff88014826fa28> [ 643.468267] ---[ end trace 6a1b3fb1a9d7d6e3 ]--- This can be easily reproduced by running xfstests with the integrity checker enabled. Fixes: `1ba98d086f` (Btrfs: detect corruption when non-root leaf has zero item) Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>	2016-11-23 20:24:35 +00:00
Liu Bo	ef85b25e98	Btrfs: fix BUG_ON in btrfs_mark_buffer_dirty This can only happen with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y. Commit `1ba98d0` ("Btrfs: detect corruption when non-root leaf has zero item") assumes that a leaf is its root when leaf->bytenr == btrfs_root_bytenr(root), however, we should not use btrfs_root_bytenr(root) since it's mainly got updated during committing transaction. So the check can fail when doing COW on this leaf while it is a root. This changes to use "if (leaf == btrfs_root_node(root))" instead, just like how we check whether leaf is a root in __btrfs_cow_block(). Fixes: `1ba98d086f` (Btrfs: detect corruption when non-root leaf has zero item) Cc: stable@vger.kernel.org # 4.8+ Reported-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Filipe Manana <fdmanana@suse.com>	2016-11-23 20:23:20 +00:00
Christoph Hellwig	70fd76140a	block,fs: use REQ_* flags directly Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-01 09:43:26 -06:00
Christoph Hellwig	67f055c798	btrfs: use op_is_sync to check for synchronous requests Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-01 09:43:26 -06:00
Chris Mason	d9ed71e545	Merge branch 'fst-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.9 Signed-off-by: Chris Mason <clm@fb.com>	2016-10-12 13:16:00 -07:00
Omar Sandoval	6675df311d	Btrfs: catch invalid free space trees There are two separate issues that can lead to corrupted free space trees. 1. The free space tree bitmaps had an endianness issue on big-endian systems which is fixed by an earlier patch in this series. 2. btrfs-progs before v4.7.3 modified filesystems without updating the free space tree. To catch both of these issues at once, we need to force the free space tree to be rebuilt. To do so, add a FREE_SPACE_TREE_VALID compat_ro bit. If the bit isn't set, we know that it was either produced by a broken big-endian kernel or may have been corrupted by btrfs-progs. This also provides us with a way to add rudimentary read-write support for the free space tree to btrfs-progs: it can just clear this bit and have the kernel rebuild the free space tree. Cc: stable@vger.kernel.org # 4.5+ Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-10-03 18:52:14 +02:00
Omar Sandoval	f8d468a15c	Btrfs: fix mount -o clear_cache,space_cache=v2 We moved the code for creating the free space tree the first time that it's enabled, but didn't move the clearing code along with it. This breaks my (undocumented) intention that `mount -o clear_cache,space_cache=v2` would clear the free space tree and then recreate it. Fixes: `511711af91` ("btrfs: don't run delayed references while we are creating the free space tree") Cc: stable@vger.kernel.org # 4.5+ Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com> Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-10-03 18:52:14 +02:00
Jeff Mahoney	ab8d0fc48d	btrfs: convert pr_* to btrfs_* where possible For many printks, we want to know which file system issued the message. This patch converts most pr_* calls to use the btrfs_* versions instead. In some cases, this means adding plumbing to allow call sites access to an fs_info pointer. fs/btrfs/check-integrity.c is left alone for another day. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-09-26 19:37:04 +02:00
Jeff Mahoney	62e855771d	btrfs: convert printk(KERN_* to use pr_* calls This patch converts printk(KERN_* style messages to use the pr_* versions. One side effect is that anything that was KERN_DEBUG is now automatically a dynamic debug message. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-09-26 18:08:44 +02:00
Jeff Mahoney	5d163e0e68	btrfs: unsplit printed strings CodingStyle chapter 2: "[...] never break user-visible strings such as printk messages, because that breaks the ability to grep for them." This patch unsplits user-visible strings. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-09-26 18:08:44 +02:00
Liu Bo	6b722c1747	Btrfs: improve check_node to avoid reading corrupted nodes We need to check items in a node to make sure that we're reading a valid one, otherwise we could get various crashes while processing delayed_refs. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-09-26 18:05:28 +02:00
Josef Bacik	8436ea91a1	Btrfs: kill the start argument to read_extent_buffer_pages Nobody uses this, it makes no sense to do partial reads of extent buffers. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-09-26 17:59:49 +02:00
Josef Bacik	afcdd129e0	Btrfs: add a flags field to btrfs_fs_info We have a lot of random ints in btrfs_fs_info that can be put into flags. This is mostly equivalent with the exception of how we deal with quota going on or off, now instead we set a flag when we are turning it on or off and deal with that appropriately, rather than just having a pending state that the current quota_enabled gets set to. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-09-26 17:59:49 +02:00
Liu Bo	c79a175175	Btrfs: fix memory leak of block group cache While processing delayed refs, we may update block group's statistics and attach it to cur_trans->dirty_bgs, and later writing dirty block groups will process the list, which happens during btrfs_commit_transaction(). For whatever reason, the transaction is aborted and dirty_bgs is not processed in cleanup_transaction(), we end up with memory leak of these dirty block group cache. Since btrfs_start_dirty_block_groups() doesn't make it go to the commit critical section, this also adds the cleanup work inside it. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-09-26 17:59:49 +02:00
Linus Torvalds	28687b935e	Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We've queued up a few different fixes in here. These range from enospc corners to fsync and quota fixes, and a few targeted at error handling for corrupt metadata/fuzzing" * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix lockdep warning on deadlock against an inode's log mutex Btrfs: detect corruption when non-root leaf has zero item Btrfs: check btree node's nritems btrfs: don't create or leak aliased root while cleaning up orphans Btrfs: fix em leak in find_first_block_group btrfs: do not background blkdev_put() Btrfs: clarify do_chunk_alloc()'s return value btrfs: fix fsfreeze hang caused by delayed iputs deal btrfs: update btrfs_space_info's bytes_may_use timely btrfs: divide btrfs_update_reserved_bytes() into two functions btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster() btrfs: qgroup: Fix qgroup incorrectness caused by log replay btrfs: relocation: Fix leaking qgroups numbers on data extents btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent() btrfs: waiting on qgroup rescan should not always be interruptible btrfs: properly track when rescan worker is running btrfs: flush_space: treat return value of do_chunk_alloc properly Btrfs: add ASSERT for block group's memory leak btrfs: backref: Fix soft lockup in __merge_refs function Btrfs: fix memory leak of reloc_root	2016-08-26 20:22:01 -07:00
Liu Bo	1ba98d086f	Btrfs: detect corruption when non-root leaf has zero item Right now we treat leaf which has zero item as a valid one because we could have an empty tree, that is, a root that is also a leaf without any item, however, in the same case but when the leaf is not a root, we can end up with hitting the BUG_ON(1) in btrfs_extend_item() called by setup_inline_extent_backref(). This makes us check the situation as a corruption if leaf is not its own root. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2016-08-25 03:58:31 -07:00
Liu Bo	053ab70f06	Btrfs: check btree node's nritems When btree node (level = 1) has nritems which equals to zero, we can end up with panic due to insert_ptr()'s BUG_ON(slot > nritems); where slot is 1 and nritems is 0, as copy_for_split() calls insert_ptr(.., path->slots[1] + 1, ...); A invalid value results in the whole mess, this adds the check for btree's node nritems so that we stop reading block when when something is wrong. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2016-08-25 03:58:30 -07:00
Jeff Mahoney	35bbb97fc8	btrfs: don't create or leak aliased root while cleaning up orphans commit `909c3a22da` (Btrfs: fix loading of orphan roots leading to BUG_ON) avoids the BUG_ON but can add an aliased root to the dead_roots list or leak the root. Since we've already been loading roots into the radix tree, we should use it before looking the root up on disk. Cc: <stable@vger.kernel.org> # 4.5 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2016-08-25 03:58:29 -07:00
Wang Xiaoguang	9e7cc91a6d	btrfs: fix fsfreeze hang caused by delayed iputs deal When running fstests generic/068, sometimes we got below deadlock: xfs_io D ffff8800331dbb20 0 6697 6693 0x00000080 ffff8800331dbb20 ffff88007acfc140 ffff880034d895c0 ffff8800331dc000 ffff880032d243e8 fffffffeffffffff ffff880032d24400 0000000000000001 ffff8800331dbb38 ffffffff816a9045 ffff880034d895c0 ffff8800331dbba8 Call Trace: [<ffffffff816a9045>] schedule+0x35/0x80 [<ffffffff816abab2>] rwsem_down_read_failed+0xf2/0x140 [<ffffffff8118f5e1>] ? __filemap_fdatawrite_range+0xd1/0x100 [<ffffffff8134f978>] call_rwsem_down_read_failed+0x18/0x30 [<ffffffffa06631fc>] ? btrfs_alloc_block_rsv+0x2c/0xb0 [btrfs] [<ffffffff810d32b5>] percpu_down_read+0x35/0x50 [<ffffffff81217dfc>] __sb_start_write+0x2c/0x40 [<ffffffffa067f5d5>] start_transaction+0x2a5/0x4d0 [btrfs] [<ffffffffa067f857>] btrfs_join_transaction+0x17/0x20 [btrfs] [<ffffffffa068ba34>] btrfs_evict_inode+0x3c4/0x5d0 [btrfs] [<ffffffff81230a1a>] evict+0xba/0x1a0 [<ffffffff812316b6>] iput+0x196/0x200 [<ffffffffa06851d0>] btrfs_run_delayed_iputs+0x70/0xc0 [btrfs] [<ffffffffa067f1d8>] btrfs_commit_transaction+0x928/0xa80 [btrfs] [<ffffffffa0646df0>] btrfs_freeze+0x30/0x40 [btrfs] [<ffffffff81218040>] freeze_super+0xf0/0x190 [<ffffffff81229275>] do_vfs_ioctl+0x4a5/0x5c0 [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70 [<ffffffff810038cf>] ? syscall_trace_enter_phase1+0x11f/0x140 [<ffffffff81229409>] SyS_ioctl+0x79/0x90 [<ffffffff81003c12>] do_syscall_64+0x62/0x110 [<ffffffff816acbe1>] entry_SYSCALL64_slow_path+0x25/0x25 >From this warning, freeze_super() already holds SB_FREEZE_FS, but btrfs_freeze() will call btrfs_commit_transaction() again, if btrfs_commit_transaction() finds that it has delayed iputs to handle, it'll start_transaction(), which will try to get SB_FREEZE_FS lock again, then deadlock occurs. The root cause is that in btrfs, sync_filesystem(sb) does not make sure all metadata is updated. There still maybe some codes adding delayed iputs, see below sample race window: CPU1 \| CPU2 \|-> freeze_super() \| \|-> sync_filesystem(sb); \| \| \|-> cleaner_kthread() \| \| \|-> btrfs_delete_unused_bgs() \| \| \|-> btrfs_remove_chunk() \| \| \|-> btrfs_remove_block_group() \| \| \|-> btrfs_add_delayed_iput() \| \| \|-> sb->s_writers.frozen = SB_FREEZE_FS; \| \|-> sb_wait_write(sb, SB_FREEZE_FS); \| \| acquire SB_FREEZE_FS lock. \| \| \| \|-> btrfs_freeze() \| \|-> btrfs_commit_transaction() \| \|-> btrfs_run_delayed_iputs() \| \| will handle delayed iputs, \| \| that means start_transaction() \| \| will be called, which will try \| \| to get SB_FREEZE_FS lock. \| To fix this issue, introduce a "int fs_frozen" to record internally whether fs has been frozen. If fs has been frozen, we can not handle delayed iputs. Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add comment to btrfs_freeze ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2016-08-25 03:58:26 -07:00
Jeff Mahoney	d06f23d6a9	btrfs: waiting on qgroup rescan should not always be interruptible We wait on qgroup rescan completion in three places: file system shutdown, the quota disable ioctl, and the rescan wait ioctl. If the user sends a signal while we're waiting, we continue happily along. This is expected behavior for the rescan wait ioctl. It's racy in the shutdown path but mostly works due to other unrelated synchronization points. In the quota disable path, it Oopses the kernel pretty much immediately. Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2016-08-25 03:58:20 -07:00
Jeff Mahoney	d2c609b834	btrfs: properly track when rescan worker is running The qgroup_flags field is overloaded such that it reflects the on-disk status of qgroups and the runtime state. The BTRFS_QGROUP_STATUS_FLAG_RESCAN flag is used to indicate that a rescan operation is in progress, but if the file system is unmounted while a rescan is running, the rescan operation is paused. If the file system is then mounted read-only, the flag will still be present but the rescan operation will not have been resumed. When we go to umount, btrfs_qgroup_wait_for_completion will see the flag and interpret it to mean that the rescan worker is still running and will wait for a completion that will never come. This patch uses a separate flag to indicate when the worker is running. The locking and state surrounding the qgroup rescan worker needs a lot of attention beyond this patch but this is enough to avoid a hung umount. Cc: <stable@vger.kernel.org> # v4.4+ Signed-off-by; Jeff Mahoney <jeffm@suse.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2016-08-25 03:58:19 -07:00
Liu Bo	1c1ea4f781	Btrfs: fix memory leak of reloc_root When some critical errors occur and FS would be flipped into RO, if we have an on-going balance, we can end up with a memory leak of root->reloc_root since btrfs_drop_snapshots() bails out without freeing reloc_root at the very early start. However, we're not able to free reloc_root in btrfs_drop_snapshots() because its caller, merge_reloc_roots(), still needs to access it to cleanup reloc_root's rbtree. This makes us free reloc_root when we're going to free fs/file roots. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2016-08-25 03:58:15 -07:00
Jens Axboe	1eff9d322a	block: rename bio bi_rw to bi_opf Since commit `63a4cc2486`, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-07 14:41:02 -06:00
Linus Torvalds	d58b0d980f	Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull more btrfs updates from Chris Mason: "This is part two of my btrfs pull, which is some cleanups and a batch of fixes. Most of the code here is from Jeff Mahoney, making the pointers we pass around internally more consistent and less confusing overall. I noticed a small problem right before I sent this out yesterday, so I fixed it up and re-tested overnight" * 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (40 commits) Btrfs: fix __MAX_CSUM_ITEMS btrfs: btrfs_abort_transaction, drop root parameter btrfs: add btrfs_trans_handle->fs_info pointer btrfs: btrfs_relocate_chunk pass extent_root to btrfs_end_transaction btrfs: convert nodesize macros to static inlines btrfs: introduce BTRFS_MAX_ITEM_SIZE btrfs: cleanup, remove prototype for btrfs_find_root_ref btrfs: copy_to_sk drop unused root parameter btrfs: simpilify btrfs_subvol_inherit_props btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root btrfs: tests, require fs_info for root btrfs: tests, move initialization into tests/ btrfs: btrfs_test_opt and friends should take a btrfs_fs_info btrfs: prefix fsid to all trace events btrfs: plumb fs_info into btrfs_work btrfs: remove obsolete part of comment in statfs btrfs: hide test-only member under ifdef btrfs: Ratelimit "no csum found" info message btrfs: Add ratelimit to btrfs printing Btrfs: fix unexpected balance crash due to BUG_ON ...	2016-08-04 19:56:16 -04:00
Linus Torvalds	d05d7f4079	Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...	2016-07-26 15:03:07 -07:00
Jeff Mahoney	f5ee5c9ac5	btrfs: tests, use BTRFS_FS_STATE_DUMMY_FS_INFO instead of dummy root Now that we have a dummy fs_info associated with each test that uses a root, we don't need the DUMMY_ROOT bit anymore. This lets us make choices without needing an actual root like in e.g. btrfs_find_create_tree_block. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-07-26 13:54:19 +02:00
Jeff Mahoney	7c0260ee09	btrfs: tests, require fs_info for root This allows the upcoming patchset to push nodesize and sectorsize into fs_info. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-07-26 13:53:18 +02:00
Jeff Mahoney	3cdde2240d	btrfs: btrfs_test_opt and friends should take a btrfs_fs_info btrfs_test_opt and friends only use the root pointer to access the fs_info. Let's pass the fs_info directly in preparation to eliminate similar patterns all over btrfs. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-07-26 13:53:16 +02:00
Jeff Mahoney	cb001095ca	btrfs: plumb fs_info into btrfs_work In order to provide an fsid for trace events, we'll need a btrfs_fs_info pointer. The most lightweight way to do that for btrfs_work structures is to associate it with the __btrfs_workqueue structure. Each queued btrfs_work structure has a workqueue associated with it, so that's a natural fit. It's a privately defined structures, so we add accessors to retrieve the fs_info pointer. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-07-26 13:53:15 +02:00
Nikolay Borisov	fba4b69771	btrfs: Fix slab accounting flags BTRFS is using a variety of slab caches to satisfy internal needs. Those slab caches are always allocated with the SLAB_RECLAIM_ACCOUNT, meaning allocations from the caches are going to be accounted as SReclaimable. At the same time btrfs is not registering any shrinkers whatsoever, thus preventing memory from the slabs to be shrunk. This means those caches are not in fact reclaimable. To fix this remove the SLAB_RECLAIM_ACCOUNT on all caches apart from the inode cache, since this one is being freed by the generic VFS super_block shrinker. Also set the transaction related caches as SLAB_TEMPORARY, to better document the lifetime of the objects (it just translates to SLAB_RECLAIM_ACCOUNT). Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-07-26 13:52:25 +02:00
Liu Bo	876d2cf141	Btrfs: fix double free of fs root I got this warning while mounting a btrfs image, [ 3020.509606] ------------[ cut here ]------------ [ 3020.510107] WARNING: CPU: 3 PID: 5581 at lib/idr.c:1051 ida_remove+0xca/0x190 [ 3020.510853] ida_remove called for id=42 which is not allocated. [ 3020.511466] Modules linked in: [ 3020.511802] CPU: 3 PID: 5581 Comm: mount Not tainted 4.7.0-rc5+ #274 [ 3020.512438] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014 [ 3020.513385] 0000000000000286 0000000021295d86 ffff88006c66b8f0 ffffffff8182ba5a [ 3020.514153] 0000000000000000 0000000000000009 ffff88006c66b930 ffffffff810e0ed7 [ 3020.514928] 0000041b00000000 ffffffff8289a8c0 ffff88007f437880 0000000000000000 [ 3020.515717] Call Trace: [ 3020.515965] [<ffffffff8182ba5a>] dump_stack+0xc9/0x13f [ 3020.516487] [<ffffffff810e0ed7>] __warn+0x147/0x160 [ 3020.517005] [<ffffffff810e0f4f>] warn_slowpath_fmt+0x5f/0x80 [ 3020.517572] [<ffffffff8182e6ca>] ida_remove+0xca/0x190 [ 3020.518075] [<ffffffff813a2bcc>] free_anon_bdev+0x2c/0x60 [ 3020.518609] [<ffffffff81657a9f>] free_fs_root+0x13f/0x160 [ 3020.519138] [<ffffffff8165c679>] btrfs_get_fs_root+0x379/0x3d0 [ 3020.519710] [<ffffffff81e6e975>] ? __mutex_unlock_slowpath+0x155/0x2c0 [ 3020.520366] [<ffffffff816615b1>] open_ctree+0x2e91/0x3200 [ 3020.520965] [<ffffffff8161ede2>] btrfs_mount+0x1322/0x15b0 [ 3020.521536] [<ffffffff81e60e74>] ? kmemleak_alloc_percpu+0x44/0x170 [ 3020.522167] [<ffffffff8115f5e1>] ? lockdep_init_map+0x61/0x210 [ 3020.522780] [<ffffffff813a4f59>] mount_fs+0x49/0x2c0 [ 3020.523305] [<ffffffff813d840c>] vfs_kern_mount+0xac/0x1b0 [ 3020.523872] [<ffffffff8161dee1>] btrfs_mount+0x421/0x15b0 [ 3020.524402] [<ffffffff81e60e74>] ? kmemleak_alloc_percpu+0x44/0x170 [ 3020.525045] [<ffffffff8115f5e1>] ? lockdep_init_map+0x61/0x210 [ 3020.525657] [<ffffffff8115f5e1>] ? lockdep_init_map+0x61/0x210 [ 3020.526289] [<ffffffff813a4f59>] mount_fs+0x49/0x2c0 [ 3020.526803] [<ffffffff813d840c>] vfs_kern_mount+0xac/0x1b0 [ 3020.527365] [<ffffffff813dc27a>] do_mount+0x41a/0x1770 [ 3020.527899] [<ffffffff812e800d>] ? strndup_user+0x6d/0xc0 [ 3020.528447] [<ffffffff812e7f68>] ? memdup_user+0x78/0xb0 [ 3020.528987] [<ffffffff813ddad0>] SyS_mount+0x150/0x160 [ 3020.529493] [<ffffffff81e72b7c>] entry_SYSCALL_64_fastpath+0x1f/0xbd It turns out that we free fs root twice, btrfs_init_fs_root() calls free_anon_bdev(root->anon_dev) and later then btrfs_get_fs_root() cals free_fs_root which does another free_anon_bdev() and it ends up with the above warning. Instead of reset root->anon_dev to 0 after free_anon_bdev(), we can let btrfs_init_fs_root() return directly since its callers have already done the free job by calling free_fs_root(). Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-07-26 13:52:25 +02:00
Chandan Rajendra	b7f67055d2	Btrfs: Force stripesize to the value of sectorsize Btrfs code currently assumes stripesize to be same as sectorsize. However Btrfs-progs (until commit df05c7ed455f519e6e15e46196392e4757257305) has been setting btrfs_super_block->stripesize to a value of 4096. This commit makes sure that the value of btrfs_super_block->stripesize is a power of 2. Later, it unconditionally sets btrfs_root->stripesize to sectorsize. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Chris Mason <clm@fb.com>	2016-06-23 10:44:42 -07:00
Chandan Rajendra	dd5c93111d	Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize Older btrfs-progs/mkfs.btrfs sets 4096 as the stripesize. Hence restricting stripesize to be equal to sectorsize would cause super block validation to return an error on architectures where PAGE_SIZE is not equal to 4096. Hence as a workaround, this commit allows stripesize to be set to 4096 bytes. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-17 18:32:49 +02:00
Zygo Blaxell	90c711ab38	btrfs: avoid blocking open_ctree from cleaner_kthread This fixes a problem introduced in commit `2f3165ecf1` "btrfs: don't force mounts to wait for cleaner_kthread to delete one or more subvolumes". open_ctree eventually calls btrfs_replay_log which in turn calls btrfs_commit_super which tries to lock the cleaner_mutex, causing a recursive mutex deadlock during mount. Instead of playing whack-a-mole trying to keep up with all the functions that may want to lock cleaner_mutex, put all the cleaner_mutex lockers back where they were, and attack the problem more directly: keep cleaner_kthread asleep until the filesystem is mounted. When filesystems are mounted read-only and later remounted read-write, open_ctree did not set fs_info->open and neither does anything else. Set this flag in btrfs_remount so that neither btrfs_delete_unused_bgs nor cleaner_kthread get confused by the common case of "/" filesystem read-only mount followed by read-write remount. Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-17 18:32:40 +02:00
Liu Bo	c871b0f2fd	Btrfs: check if extent buffer is aligned to sectorsize Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer via alloc_extent_buffer(). An unaligned eb can have more pages than it should have, which ends up extent buffer's leak or some corrupted content in extent buffer. This adds a warning to let us quickly know what was happening. Now that alloc_extent_buffer() no more returns NULL, this changes its caller and callers of its caller to match with the new error handling. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2016-06-17 18:32:40 +02:00
Chris Mason	719da39a61	Merge branch 'misc-fixes-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.7	2016-06-08 14:36:12 -07:00
Mike Christie	81a75f6781	btrfs: use bio fields for op and flags The bio REQ_OP and bi_rw rq_flag_bits are now always setup, so there is no need to pass around the rq_flag_bits bits too. btrfs users should should access the bio insead. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Mike Christie	37226b2111	btrfs: use bio op accessors This should be the easier cases to convert btrfs to bio_set_op_attrs/bio_op. They are mostly just cut and replace type of changes. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Mike Christie	2a222ca992	fs: have submit_bh users pass in op and flags separately This has submit_bh users pass in the operation and flags separately, so submit_bh_wbc can setup the bio op and bi_rw flags on the bio that is submitted. Signed-off-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00
Mike Christie	4e49ea4a3d	block/fs/drivers: remove rw argument from submit_bio This has callers of submit_bio/submit_bio_wait set the bio->bi_rw instead of passing it in. This makes that use the same as generic_make_request and how we set the other bio fields. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixed up fs/ext4/crypto.c Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:41:38 -06:00

1 2 3 4 5 ...

1113 Commits