linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-26 00:30:53 +07:00

Author	SHA1	Message	Date
jon ernst	e2cbd58741	ext4: silence sparse check warning for function ext4_trim_extent This fixes the following sparse warning: CHECK fs/ext4/mballoc.c fs/ext4/mballoc.c:5019:9: warning: context imbalance in 'ext4_trim_extent' - unexpected unlock Signed-off-by: "Jon Ernst" <jonernst07@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-12 23:01:28 -04:00
Theodore Ts'o	40c406c74e	ext4: COLLAPSE_RANGE only works on extent-based files Unfortunately, we weren't checking to make sure of this the inode was extent-based before attempt operate on it. Hilarity ensues. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Namjae Jeon <namjae.jeon@samsung.com>	2014-04-12 22:53:53 -04:00
Zheng Liu	847c6c422a	ext4: fix byte order problems introduced by the COLLAPSE_RANGE patches This commit tries to fix some byte order issues that is found by sparse check. $ make M=fs/ext4 C=2 CF=-D__CHECK_ENDIAN__ ... CHECK fs/ext4/extents.c fs/ext4/extents.c:5232:41: warning: restricted __le32 degrades to integer fs/ext4/extents.c:5236:52: warning: bad assignment (-=) to restricted __le32 fs/ext4/extents.c:5258:45: warning: bad assignment (-=) to restricted __le32 fs/ext4/extents.c:5303:28: warning: restricted __le32 degrades to integer fs/ext4/extents.c:5318:18: warning: incorrect type in assignment (different base types) fs/ext4/extents.c:5318:18: expected unsigned int [unsigned] [usertype] ex_start fs/ext4/extents.c:5318:18: got restricted __le32 [usertype] ee_block fs/ext4/extents.c:5319:24: warning: restricted __le32 degrades to integer fs/ext4/extents.c:5334:31: warning: incorrect type in assignment (different base types) ... Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-12 12:45:55 -04:00
Theodore Ts'o	6e6358fc3c	ext4: use i_size_read in ext4_unaligned_aio() We haven't taken i_mutex yet, so we need to use i_size_read(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-04-12 12:45:25 -04:00
Lukas Czerner	0790b31b69	fs: disallow all fallocate operation on active swapfile Currently some file system have IS_SWAPFILE check in their fallocate implementations and some do not. However we should really prevent any fallocate operation on swapfile so move the check to vfs and remove the redundant checks from the file systems fallocate implementations. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-12 10:05:37 -04:00
Lukas Czerner	23fffa925e	fs: move falloc collapse range check into the filesystem methods Currently in do_fallocate in collapse range case we're checking whether offset + len is not bigger than i_size. However there is nothing which would prevent i_size from changing so the check is pointless. It should be done in the file system itself and the file system needs to make sure that i_size is not going to change. The i_size check for the other fallocate modes are also done in the filesystems. As it is now we can easily crash the kernel by having two processes doing truncate and fallocate collapse range at the same time. This can be reproduced on ext4 and it is theoretically possible on xfs even though I was not able to trigger it with this simple test. This commit removes the check from do_fallocate and adds it to the file system. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2014-04-12 09:56:41 -04:00
Lukas Czerner	9ef06cec7c	ext4: remove unnecessary check for APPEND and IMMUTABLE All the checks IS_APPEND and IS_IMMUTABLE for the fallocate operation on the inode are done in vfs. No need to do this again in ext4. Remove it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-12 09:47:00 -04:00
Theodore Ts'o	622cad1325	ext4: move ext4_update_i_disksize() into mpage_map_and_submit_extent() The function ext4_update_i_disksize() is used in only one place, in the function mpage_map_and_submit_extent(). Move its code to simplify the code paths, and also move the call to ext4_mark_inode_dirty() into the i_data_sem's critical region, to be consistent with all of the other places where we update i_disksize. That way, we also keep the raw_inode's i_disksize protected, to avoid the following race: CPU #1 CPU #2 down_write(&i_data_sem) Modify i_disk_size up_write(&i_data_sem) down_write(&i_data_sem) Modify i_disk_size Copy i_disk_size to on-disk inode up_write(&i_data_sem) Copy i_disk_size to on-disk inode Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org	2014-04-11 10:35:17 -04:00
Younger Liu	c57ab39b96	ext4: return ENOMEM rather than EIO when find_###_page() fails Return ENOMEM rather than EIO when find_get_page() fails in ext4_mb_get_buddy_page_lock() and find_or_create_page() fails in ext4_mb_load_buddy(). Signed-off-by: Younger Liu <younger.liucn@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-10 23:03:43 -04:00
Namjae Jeon	1ce01c4a19	ext4: fix COLLAPSE_RANGE test failure in data journalling mode When mounting ext4 with data=journal option, xfstest shared/002 and shared/004 are currently failing as checksum computed for testfile does not match with the checksum computed in other journal modes. In case of data=journal mode, a call to filemap_write_and_wait_range will not flush anything to disk as buffers are not marked dirty in write_end. In collapse range this call is followed by a call to truncate_pagecache_range. Due to this, when checksum is computed, a portion of file is re-read from disk which replace valid data with NULL bytes and hence the reason for the difference in checksum. Calling ext4_force_commit before filemap_write_and_wait_range solves the issue as it will mark the buffers dirty during commit transaction which can be later synced by a call to filemap_write_and_wait_range. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-10 22:58:20 -04:00
Theodore Ts'o	87f7e41636	ext4: update PF_MEMALLOC handling in ext4_write_inode() The special handling of PF_MEMALLOC callers in ext4_write_inode() shouldn't be necessary as there shouldn't be any. Warn about it. Also update comment before the function as it seems somewhat outdated. (Changes modeled on an ext3 patch posted by Jan Kara to the linux-ext4 mailing list on Februaryt 28, 2014, which apparently never went into the ext3 tree.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz>	2014-04-08 11:38:28 -04:00
Jan Kara	ec4cb1aa2b	ext4: fix jbd2 warning under heavy xattr load When heavily exercising xattr code the assertion that jbd2_journal_dirty_metadata() shouldn't return error was triggered: WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260() CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G W 3.10.0-ceph-00049-g68d04c9 #1 Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011 ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968 ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80 0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978 Call Trace: [<ffffffff816311b0>] dump_stack+0x19/0x1b [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0 [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20 [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260 [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140 [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0 [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910 [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0 [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140 [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50 [<ffffffff811935ce>] generic_setxattr+0x6e/0x90 [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0 [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0 [<ffffffff8119421e>] setxattr+0x13e/0x1e0 [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60 [<ffffffff8118c65c>] ? fget_light+0x3c/0x130 [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60 [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70 [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100 [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b The reason for the warning is that buffer_head passed into jbd2_journal_dirty_metadata() didn't have journal_head attached. This is caused by the following race of two ext4_xattr_release_block() calls: CPU1 CPU2 ext4_xattr_release_block() ext4_xattr_release_block() lock_buffer(bh); /* False / if (BHDR(bh)->h_refcount == cpu_to_le32(1)) } else { le32_add_cpu(&BHDR(bh)->h_refcount, -1); unlock_buffer(bh); lock_buffer(bh); / True */ if (BHDR(bh)->h_refcount == cpu_to_le32(1)) get_bh(bh); ext4_free_blocks() ... jbd2_journal_forget() jbd2_journal_unfile_buffer() -> JH is gone error = ext4_handle_dirty_xattr_block(handle, inode, bh); -> triggers the warning We fix the problem by moving ext4_handle_dirty_xattr_block() under the buffer lock. Sadly this cannot be done in nojournal mode as that function can call sync_dirty_buffer() which would deadlock. Luckily in nojournal mode the race is harmless (we only dirty already freed buffer) and thus for nojournal mode we leave the dirtying outside of the buffer lock. Reported-by: Sage Weil <sage@inktank.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-04-07 10:54:21 -04:00
Matthew Wilcox	9503c67c93	ext4: note the error in ext4_end_bio() ext4_end_bio() currently throws away the error that it receives. Chances are this is part of a spate of errors, one of which will end up getting the error returned to userspace somehow, but we shouldn't take that risk. Also print out the errno to aid in debug. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org	2014-04-07 10:54:20 -04:00
Azat Khuzhin	007649375f	ext4: initialize multi-block allocator before checking block descriptors With EXT4FS_DEBUG ext4_count_free_clusters() will call ext4_read_block_bitmap() without s_group_info initialized, so we need to initialize multi-block allocator before. And dependencies that must be solved, to allow this: - multi-block allocator needs in group descriptors - need to install s_op before initializing multi-block allocator, because in ext4_mb_init_backend() new inode is created. - initialize number of group desc blocks (s_gdb_count) otherwise number of clusters returned by ext4_free_clusters_after_init() is not correct. (see ext4_bg_num_gdb_nometa()) Here is the stack backtrace: (gdb) bt #0 ext4_get_group_info (group=0, sb=0xffff880079a10000) at ext4.h:2430 #1 ext4_validate_block_bitmap (sb=sb@entry=0xffff880079a10000, desc=desc@entry=0xffff880056510000, block_group=block_group@entry=0, bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:358 #2 0xffffffff81232202 in ext4_wait_block_bitmap (sb=sb@entry=0xffff880079a10000, block_group=block_group@entry=0, bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:476 #3 0xffffffff81232eaf in ext4_read_block_bitmap (sb=sb@entry=0xffff880079a10000, block_group=block_group@entry=0) at balloc.c:489 #4 0xffffffff81232fc0 in ext4_count_free_clusters (sb=sb@entry=0xffff880079a10000) at balloc.c:665 #5 0xffffffff81259ffa in ext4_check_descriptors (first_not_zeroed=<synthetic pointer>, sb=0xffff880079a10000) at super.c:2143 #6 ext4_fill_super (sb=sb@entry=0xffff880079a10000, data=<optimized out>, data@entry=0x0 <irq_stack_union>, silent=silent@entry=0) at super.c:3851 ... Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-07 10:54:20 -04:00
Kazuya Mio	4adb6ab3e0	ext4: FIBMAP ioctl causes BUG_ON due to handle EXT_MAX_BLOCKS When we try to get 2^32-1 block of the file which has the extent (ee_block=2^32-2, ee_len=1) with FIBMAP ioctl, it causes BUG_ON in ext4_ext_put_gap_in_cache(). To avoid the problem, ext4_map_blocks() needs to check the file logical block number. ext4_ext_put_gap_in_cache() called via ext4_map_blocks() cannot handle 2^32-1 because the maximum file logical block number is 2^32-2. Note that ext4_ind_map_blocks() returns -EIO when the block number is invalid. So ext4_map_blocks() should also return the same errno. Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-04-07 10:53:28 -04:00
Chen Gang	666525dfbd	ext4: fix 64-bit number truncation warning '0x7FDEADBEEF' will be truncated to 32-bit number under unicore32. Need append 'ULL' for it. The related warning (with allmodconfig under unicore32): CC [M] fs/ext4/extents_status.o fs/ext4/extents_status.c: In function "__es_remove_extent": fs/ext4/extents_status.c:813: warning: integer constant is too large for "long" type Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-07 10:18:56 -04:00
Linus Torvalds	24e7ea3bea	Major changes for 3.14 include support for the newly added ZERO_RANGE and COLLAPSE_RANGE fallocate operations, and scalability improvements in the jbd2 layer and in xattr handling when the extended attributes spill over into an external block. Other than that, the usual clean ups and minor bug fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTPbD2AAoJENNvdpvBGATwDmUQANSfGYIQazB8XKKgtNTMiG/Y Ky7n1JzN9lTX/6nMsqQnbfCweLRmxqpWUBuyKDRHUi8IG0/voXSTFsAOOgz0R15A ERRRWkVvHixLpohuL/iBdEMFHwNZYPGr3jkm0EIgzhtXNgk5DNmiuMwvHmCY27kI kdNZIw9fip/WRNoFLDBGnLGC37aanoHhCIbVlySy5o9LN1pkC8BgXAYV0Rk19SVd bWCudSJEirFEqWS5H8vsBAEm/ioxTjwnNL8tX8qms6orZ6h8yMLFkHoIGWPw3Q15 a0TSUoMyav50Yr59QaDeWx9uaPQVeK41wiYFI2rZOnyG2ts0u0YXs/nLwJqTovgs rzvbdl6cd3Nj++rPi97MTA7iXK96WQPjsDJoeeEgnB0d/qPyTk6mLKgftzLTNgSa ZmWjrB19kr6CMbebMC4L6eqJ8Fr66pCT8c/iue8wc4MUHi7FwHKH64fqWvzp2YT/ +165dqqo2JnUv7tIp6sUi1geun+bmDHLZFXgFa7fNYFtcU3I+uY1mRr3eMVAJndA 2d6ASe/KhQbpVnjKJdQ8/b833ZS3p+zkgVPrd68bBr3t7gUmX91wk+p1ct6rUPLr 700F+q/pQWL8ap0pU9Ht/h3gEJIfmRzTwxlOeYyOwDseqKuS87PSB3BzV3dDunSU DrPKlXwIgva7zq5/S0Vr =4s1Z -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Major changes for 3.14 include support for the newly added ZERO_RANGE and COLLAPSE_RANGE fallocate operations, and scalability improvements in the jbd2 layer and in xattr handling when the extended attributes spill over into an external block. Other than that, the usual clean ups and minor bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (42 commits) ext4: fix premature freeing of partial clusters split across leaf blocks ext4: remove unneeded test of ret variable ext4: fix comment typo ext4: make ext4_block_zero_page_range static ext4: atomically set inode->i_flags in ext4_set_inode_flags() ext4: optimize Hurd tests when reading/writing inodes ext4: kill i_version support for Hurd-castrated file systems ext4: each filesystem creates and uses its own mb_cache fs/mbcache.c: doucple the locking of local from global data fs/mbcache.c: change block and index hash chain to hlist_bl_node ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate ext4: refactor ext4_fallocate code ext4: Update inode i_size after the preallocation ext4: fix partial cluster handling for bigalloc file systems ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents ext4: only call sync_filesystm() when remounting read-only fs: push sync_filesystem() down to the file system's remount_fs() jbd2: improve error messages for inconsistent journal heads jbd2: minimize region locked by j_list_lock in jbd2_journal_forget() jbd2: minimize region locked by j_list_lock in journal_get_create_access() ...	2014-04-04 15:39:39 -07:00
Linus Torvalds	7df934526c	Merge branch 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull renameat2 system call from Miklos Szeredi: "This adds a new syscall, renameat2(), which is the same as renameat() but with a flags argument. The purpose of extending rename is to add cross-rename, a symmetric variant of rename, which exchanges the two files. This allows interesting things, which were not possible before, for example atomically replacing a directory tree with a symlink, etc... This also allows overlayfs and friends to operate on whiteouts atomically. Andy Lutomirski also suggested a "noreplace" flag, which disables the overwriting behavior of rename. These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only implemented for ext4 as an example and for testing" * 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ext4: add cross rename support ext4: rename: split out helper functions ext4: rename: move EMLINK check up ext4: rename: create ext4_renament structure for local vars vfs: add cross-rename vfs: lock_two_nondirectories: allow directory args security: add flags to rename hooks vfs: add RENAME_NOREPLACE flag vfs: add renameat2 syscall vfs: rename: use common code for dir and non-dir vfs: rename: move d_move() up vfs: add d_is_dir()	2014-04-04 14:03:05 -07:00
Johannes Weiner	91b0abe36a	mm + fs: store shadow entries in page cache Reclaim will be leaving shadow entries in the page cache radix tree upon evicting the real page. As those pages are found from the LRU, an iput() can lead to the inode being freed concurrently. At this point, reclaim must no longer install shadow pages because the inode freeing code needs to ensure the page tree is really empty. Add an address_space flag, AS_EXITING, that the inode freeing code sets under the tree lock before doing the final truncate. Reclaim will check for this flag before installing shadow pages. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Greg Thelen <gthelen@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jan Kara <jack@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Metin Doslu <metin@citusdata.com> Cc: Michel Lespinasse <walken@google.com> Cc: Ozgun Erdogan <ozgun@citusdata.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <klamm@yandex-team.ru> Cc: Ryan Mallon <rmallon@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-03 16:21:01 -07:00
Dan Carpenter	45d4f85504	fs/direct-io.c: remove some left over checks We know that "ret > 0" is true here. These tests were left over from commit `02afc27fae` ('direct-io: Handle O_(D)SYNC AIO') and aren't needed any more. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-04-03 16:20:57 -07:00
Eric Whitney	ad6599ab3a	ext4: fix premature freeing of partial clusters split across leaf blocks Xfstests generic/311 and shared/298 fail when run on a bigalloc file system. Kernel error messages produced during the tests report that blocks to be freed are already on the to-be-freed list. When e2fsck is run at the end of the tests, it typically reports bad i_blocks and bad free blocks counts. The bug that causes these failures is located in ext4_ext_rm_leaf(). Code at the end of the function frees a partial cluster if it's not shared with an extent remaining in the leaf. However, if all the extents in the leaf have been removed, the code dereferences an invalid extent pointer (off the front of the leaf) when the check for sharing is made. This generally has the effect of unconditionally freeing the partial cluster, which leads to the observed failures when the partial cluster is shared with the last extent in the next leaf. Fix this by attempting to free the cluster only if extents remain in the leaf. Any remaining partial cluster will be freed if possible when the next leaf is processed or when leaf removal is complete. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-04-01 19:49:30 -04:00
Miklos Szeredi	bd42998a6b	ext4: add cross rename support Implement RENAME_EXCHANGE flag in renameat2 syscall. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:44 +02:00
Miklos Szeredi	bd1af145b9	ext4: rename: split out helper functions Cross rename (exchange source and dest) will need to call some of these helpers for both source and dest, while overwriting rename currently only calls them for one or the other. This also makes the code easier to follow. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:44 +02:00
Miklos Szeredi	0d7d5d678b	ext4: rename: move EMLINK check up Move checking i_nlink from after ext4_get_first_dir_block() to before. The check doesn't rely on the result of that function and the function only fails on fs corruption, so the order shouldn't matter. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:44 +02:00
Miklos Szeredi	c0d268c366	ext4: rename: create ext4_renament structure for local vars Need to split up ext4_rename() into helpers but there are too many local variables involved, so create a new structure. This also, apparently, makes the generated code size slightly smaller. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz>	2014-04-01 17:08:43 +02:00
Miklos Szeredi	0a7c3937a1	vfs: add RENAME_NOREPLACE flag If this flag is specified and the target of the rename exists then the rename syscall fails with EEXIST. The VFS does the existence checking, so it is trivial to enable for most local filesystems. This patch only enables it in ext4. For network filesystems the VFS check is not enough as there may be a race between a remote create and the rename, so these filesystems need to handle this flag in their ->rename() implementations to ensure atomicity. Andy writes about why this is useful: "The trivial answer: to eliminate the race condition from 'mv -i'. Another answer: there's a common pattern to atomically create a file with contents: open a temporary file, write to it, optionally fsync it, close it, then link(2) it to the final name, then unlink the temporary file. The reason to use link(2) is because it won't silently clobber the destination. This is annoying: - It requires an extra system call that shouldn't be necessary. - It doesn't work on (IMO sensible) filesystems that don't support hard links (e.g. vfat). - It's not atomic -- there's an intermediate state where both files exist. - It's ugly. The new rename flag will make this totally sensible. To be fair, on new enough kernels, you can also use O_TMPFILE and linkat to achieve the same thing even more cleanly." Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reviewed-by: J. Bruce Fields <bfields@redhat.com>	2014-04-01 17:08:43 +02:00
Lukas Czerner	e5b30416f3	ext4: remove unneeded test of ret variable Currently in ext4_fallocate() and ext4_zero_range() we're testing ret variable along with new_size. However in ext4_fallocate() we just tested ret before and in ext4_zero_range() if will always be zero when we get there so there is no need to test it in both cases. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-04-01 00:59:21 -04:00
Theodore Ts'o	00a1a053eb	ext4: atomically set inode->i_flags in ext4_set_inode_flags() Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-03-30 17:02:06 -07:00
Matthew Wilcox	e04027e887	ext4: fix comment typo Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-24 15:15:07 -04:00
Matthew Wilcox	94350ab5c3	ext4: make ext4_block_zero_page_range static It's only called within inode.c, so make it static, remove its prototype from ext4.h and move it above all of its callers so it doesn't need a prototype within inode.c. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-24 15:09:16 -04:00
Theodore Ts'o	5f16f3225b	ext4: atomically set inode->i_flags in ext4_set_inode_flags() Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org	2014-03-24 14:43:12 -04:00
Theodore Ts'o	ed3654eb98	ext4: optimize Hurd tests when reading/writing inodes Set a in-memory superblock flag to indicate whether the file system is designed to support the Hurd. Also, add a sanity check to make sure the 64-bit feature is not set for Hurd file systems, since i_file_acl_high conflicts with a Hurd-specific field. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-24 14:09:06 -04:00
Theodore Ts'o	c4f6570605	ext4: kill i_version support for Hurd-castrated file systems The Hurd file system uses uses the inode field which is now used for i_version for its translator block. This means that ext2 file systems that are formatted for GNU Hurd can't be used to support NFSv4. Given that Hurd file systems don't support extents, and a huge number of modern file system features, this is no great loss. If we don't do this, the attempt to update the i_version field will stomp over the translator block field, which will cause file system corruption for Hurd file systems. This can be replicated via: mke2fs -t ext2 -o hurd /dev/vdc mount -t ext4 /dev/vdc /vdc touch /vdc/bug0000 umount /dev/vdc e2fsck -f /dev/vdc Addresses-Debian-Bug: #738758 Reported-By: Gabriele Giacone <1o5g4r8o@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-20 00:32:57 -04:00
T Makphaibulchoke	9c191f701c	ext4: each filesystem creates and uses its own mb_cache This patch adds new interfaces to create and destory cache, ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and remove the cache creation and destory calls from ex4_init_xattr() and ext4_exitxattr() in fs/ext4/xattr.c. fs/ext4/super.c has been changed so that when a filesystem is mounted a cache is allocated and attched to its ext4_sb_info structure. fs/mbcache.c has been changed so that only one slab allocator is allocated and used by all mbcache structures. Signed-off-by: T. Makphaibulchoke <tmac@hp.com>	2014-03-18 19:24:49 -04:00
Lukas Czerner	b8a8684502	ext4: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as xfs ioctl XFS_IOC_ZERO_RANGE. It can be used to convert a range of file to zeros preferably without issuing data IO. Blocks should be preallocated for the regions that span holes in the file, and the entire range is preferable converted to unwritten extents This can be also used to preallocate blocks past EOF in the same way as with fallocate. Flag FALLOC_FL_KEEP_SIZE which should cause the inode size to remain the same. Also add appropriate tracepoints. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-18 18:05:35 -04:00
Lukas Czerner	0e8b6879f3	ext4: refactor ext4_fallocate code Move block allocation out of the ext4_fallocate into separate function called ext4_alloc_file_blocks(). This will allow us to use the same allocation code for other allocation operations such as zero range which is commit in the next patch. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-18 18:03:51 -04:00
Lukas Czerner	f282ac19d8	ext4: Update inode i_size after the preallocation Currently in ext4_fallocate we would update inode size, c_time and sync the file with every partial allocation which is entirely unnecessary. It is true that if the crash happens in the middle of truncate we might end up with unchanged i size, or c_time which I do not think is really a problem - it does not mean file system corruption in any way. Note that xfs is doing things the same way e.g. update all of the mentioned after the allocation is done. This commit moves all the updates after the allocation is done. In addition we also need to change m_time as not only inode has been change bot also data regions might have changed (unwritten extents). However m_time will be only updated when i_size changed. Also we do not need to be paranoid about changing the c_time only if the actual allocation have happened, we can change it even if we try to allocate only to find out that there are already block allocated. It's not really a big deal and it will save us some additional complexity. Also use ext4_debug, instead of ext4_warning in #ifdef EXT4FS_DEBUG section. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>- -- v3: Do not remove the code to set EXT4_INODE_EOFBLOCKS flag fs/ext4/extents.c \| 96 ++++++++++++++++++++++++------------------------------- 1 file changed, 42 insertions(+), 54 deletions(-)	2014-03-18 17:44:35 -04:00
Eric Whitney	c063449394	ext4: fix partial cluster handling for bigalloc file systems Commit `9cb00419fa`, which enables hole punching for bigalloc file systems, exposed a bug introduced by commit `6ae06ff51e` in an earlier release. When run on a bigalloc file system, xfstests generic/013, 068, 075, 083, 091, 100, 112, 127, 263, 269, and 270 fail with e2fsck errors or cause kernel error messages indicating that previously freed blocks are being freed again. The latter commit optimizes the selection of the starting extent in ext4_ext_rm_leaf() when hole punching by beginning with the extent supplied in the path argument rather than with the last extent in the leaf node (as is still done when truncating). However, the code in rm_leaf that initially sets partial_cluster to track cluster sharing on extent boundaries is only guaranteed to run if rm_leaf starts with the last node in the leaf. Consequently, partial_cluster is not correctly initialized when hole punching, and a cluster on the boundary of a punched region that should be retained may instead be deallocated. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org	2014-03-13 23:34:16 -04:00
Eric Whitney	31cf0f2c31	ext4: delete path dealloc code in ext4_ext_handle_uninitialized_extents Code deallocating the extent path referenced by an argument to ext4_ext_handle_uninitialized_extents was made redundant with identical code in its one caller, ext4_ext_map_blocks, by commit `3779473246`. Allocating and deallocating the path in the same function also makes the code clearer. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-13 23:14:46 -04:00
Theodore Ts'o	38c03b3439	ext4: only call sync_filesystm() when remounting read-only This is the only time it is required for ext4. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-13 22:49:42 -04:00
Theodore Ts'o	02b9984d64	fs: push sync_filesystem() down to the file system's remount_fs() Previously, the no-op "mount -o mount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly documented or guaraunteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org	2014-03-13 10:14:33 -04:00
Theodore Ts'o	66a4cb187b	jbd2: improve error messages for inconsistent journal heads Fix up error messages printed when the transaction pointers in a journal head are inconsistent. This improves the error messages which are printed when running xfstests generic/068 in data=journal mode. See the bug report at: https://bugzilla.kernel.org/show_bug.cgi?id=60786 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-12 16:38:03 -04:00
Jan Kara	10542c229a	ext4: Speedup WB_SYNC_ALL pass called from sync(2) When doing filesystem wide sync, there's no need to force transaction commit (or synchronously write inode buffer) separately for each inode because ext4_sync_fs() takes care of forcing commit at the end (VFS takes care of flushing buffer cache, respectively). Most of the time this slowness doesn't manifest because previous WB_SYNC_NONE writeback doesn't leave much to write but when there are processes aggressively creating new files and several filesystems to sync, the sync slowness can be noticeable. In the following test script sync(1) takes around 6 minutes when there are two ext4 filesystems mounted on a standard SATA drive. After this patch sync takes a couple of seconds so we have about two orders of magnitude improvement. function run_writers { for (( i = 0; i < 10; i++ )); do mkdir $1/dir$i for (( j = 0; j < 40000; j++ )); do dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null done & done } for dir in "$@"; do run_writers $dir done sleep 40 time sync Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-03-04 10:50:50 -05:00
Namjae Jeon	9eb79482a9	ext4: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for Ext4. The semantics of this flag are following: 1) It collapses the range lying between offset and length by removing any data blocks which are present in this range and than updates all the logical offsets of extents beyond "offset + len" to nullify the hole created by removing blocks. In short, it does not leave a hole. 2) It should be used exclusively. No other fallocate flag in combination. 3) Offset and length supplied to fallocate should be fs block size aligned in case of xfs and ext4. 4) Collaspe range does not work beyond i_size. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Tested-by: Dongsu Park <dongsu.park@profitbricks.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-23 15:18:59 -05:00
Lukas Czerner	a633f5a319	ext4: translate fallocate mode bits to strings Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-22 06:18:17 -05:00
Darrick J. Wong	a9b8241594	ext4: merge uninitialized extents Allow for merging uninitialized extents. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-20 21:17:35 -05:00
Maxim Patlasov	e251f9bca9	ext4: avoid exposure of stale data in ext4_punch_hole() While handling punch-hole fallocate, it's useless to truncate page cache before removing the range from extent tree (or block map in indirect case) because page cache can be re-populated (by read-ahead or read(2) or mmap-ed read) immediately after truncating page cache, but before updating extent tree (or block map). In that case the user will see stale data even after fallocate is completed. Until the problem of data corruption resulting from pages backed by already freed blocks is fully resolved, the simple thing we can do now is to add another truncation of pagecache after punch hole is done. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>	2014-02-20 16:58:05 -05:00
Eric Whitney	ce140cdd9c	ext4: silence warnings in extent status tree debugging code Adjust the conversion specifications in a few optionally compiled debug messages to match the return type of ext4_es_status(). Also, make a couple of minor grammatical message edits while we're at it. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-20 16:09:12 -05:00
Eric Sandeen	dc9ddd984d	ext4: remove unused ac_ex_scanned When looking at a bug report with: > kernel: EXT4-fs: 0 scanned, 0 found I thought wow, 0 scanned, that's odd? But it's not odd; it's printing a variable that is initialized to 0 and never touched again. It's never been used since the original merge, so I don't really even know what the original intent was, either. If anyone knows how to hook it up, speak now via patch, otherwise just yank it so it's not making a confusing situation more confusing in kernel logs. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-20 13:32:10 -05:00
Theodore Ts'o	e861b5e9a4	ext4: avoid possible overflow in ext4_map_blocks() The ext4_map_blocks() function returns the number of blocks which satisfying the caller's request. This number of blocks requested by the caller is specified by an unsigned integer, but the return value of ext4_map_blocks() is a signed integer (to accomodate error codes per the kernel's standard error signalling convention). Historically, overflows could never happen since mballoc() will refuse to allocate more than 2048 blocks at a time (which is something we should fix), and if the blocks were already allocated, the fact that there would be some number of intervening metadata blocks pretty much guaranteed that there could never be a contiguous region of data blocks that was greater than 231 blocks. However, this is now possible if there is a file system which is a bit bigger than 8TB, and is created using the new mke2fs hugeblock feature, which can create a perfectly contiguous file. In that case, if a userspace program attempted to call fallocate() on this already fully allocated file, it's possible that ext4_map_blocks() could return a number large enough that it would overflow a signed integer, resulting in a ext4 thinking that the ext4_map_blocks() call had failed with some strange error code. Since ext4_map_blocks() is always free to return a smaller number of blocks than what was requested by the caller, fix this by capping the number of blocks that ext4_map_blocks() will ever try to map to 231 - 1. In practice this should never get hit, except by someone deliberately trying to provke the above-described bug. Thanks to the PaX team for asking whethre this could possibly happen in some off-line discussions about using some static code checking technology they are developing to find bugs in kernel code. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2014-02-20 12:54:05 -05:00

1 2 3 4 5 ...

2197 Commits