linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-21 23:30:37 +07:00

Author	SHA1	Message	Date
Jaegeuk Kim	687de7f101	f2fs: avoid IO split due to mixed WB_SYNC_ALL and WB_SYNC_NONE If two threads try to flush dirty pages in different inodes respectively, f2fs_write_data_pages() will produce WRITE and WRITE_SYNC one at a time, resulting in a lot of 4KB seperated IOs. So, this patch gives higher priority to WB_SYNC_ALL IOs and gathers write IOs with a big WRITE_SYNC'ed bio. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:06 -07:00
Jaegeuk Kim	ef095d19e8	f2fs: write small sized IO to hot log It would better split small and large IOs separately in order to get more consecutive big writes. The default threshold is set to 64KB, but configurable by sysfs/min_hot_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:05 -07:00
Chao Yu	a7eeb82385	f2fs: use bitmap in discard_entry This patch changes to use bitmap instead of extent in struct discard_entry to indicate discard range in one segment, for fragmented space, this implementation can save memory footprint. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:04 -07:00
Chao Yu	f099405fc8	f2fs: clean up destroy_discard_cmd_control Remove unneeded parameter and simply change flow in destroy_discard_cmd_control. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:04 -07:00
Chao Yu	5f32366a29	f2fs: count discard command entry Adds to count discard command entry and show the number in debugfs, also fix to add cost of discard command cache into total comsumed memory footprint. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:03 -07:00
Chao Yu	8b8dd65f72	f2fs: show issued flush/discard count Show historical count of flush command and discard command. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-04-05 11:05:02 -07:00
Jaegeuk Kim	c13ff37e35	f2fs: relax node version check for victim data in gc - has_not_enough_free_secs node_secs: 0 dent_secs: 0 freed:0 free_segments:103 reserved:104 - f2fs_gc - get_victim_by_default alloc_mode 0, gc_mode 1, max_search 2672, offset 4654, ofs_unit 1 - do_garbage_collect start_segno 3976, end_segno 3977 type 0 - is_alive nid 22797, blkaddr 2131882, ofs_in_node 0, version 0x8/0x0 - gc_data_segment 766, segno 3976, block 512/426 not alive So, this patch fixes subtle corrupted case where node version does not match to summary version which results in infinite loop by gc. Reported-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-29 17:34:40 -07:00
Jaegeuk Kim	796dbbfe4e	f2fs: start SSR much eariler to avoid FG_GC This patch initiates SSR much eariler, resulting in less FG_GC. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-29 17:34:39 -07:00
Jaegeuk Kim	7a20b8a61e	f2fs: allocate node and hot data in the beginning of partition In order to give more spatial locality, this patch changes the block allocation policy which assigns beginning of partition for small and hot data/node blocks. In order to do this, we set noheap allocation by default and introduce another mount option, heap, to reset it back. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-29 17:34:37 -07:00
Jaegeuk Kim	c541a51b8c	f2fs: fix wrong max cost initialization This patch fixes missing increased max cost caused by a patch that we increased cose of data segments in greedy algorithm. Cc: <stable@vger.kernel.org> # v4.10+ Fixes: `b9cd20619` "f2fs: node segment is prior to data segment selected victim" Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-28 15:49:27 -07:00
Yunlei He	59c9081bc8	f2fs: allow write page cache when writting cp This patch allow write data to normal file when writting new checkpoint. We relax three limitations for write_begin path: 1. data allocation 2. node allocation 3. variables in checkpoint Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-25 00:19:37 -07:00
Chao Yu	22588f8773	f2fs: don't reserve additional space in xattr block In this patch, we change xattr block disk layout as below: Before: xattr node block layout +---------------------------------------------+---------------+-------------+ \| node block xattr entries \| reserved \| node footer \| \| 4068 Bytes \| 4 Bytes \| 24 Bytes \| In memory layout +--------------------+---------------------------------+--------------------+ \| inline xattr \| node block xattr entries \| reserved \| \| 200 Bytes \| 4068 Bytes \| 4 Bytes \| After: xattr node block layout +-------------------------------------------------------------+-------------+ \| node block xattr entries \| node footer \| \| 4072 Bytes \| 24 Bytes \| In memory layout +--------------------+---------------------------------+--------------------+ \| inline xattr \| node block xattr entries \| reserved \| \| 200 Bytes \| 4072 Bytes \| 4 Bytes \| With this change, we don't need to reserve additional space in node block, just keep reserved space in logical in-memory layout. So that it would help to enlarge valid free space of xattr node block. As tested, generic/026 shows max stored xattr entires number increases from 531 to 532 when inline_xattr option is enabled. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:53 -04:00
Chao Yu	89e9eabd7d	f2fs: clean up xattr operation 1. don't allocate redundant memory in read_all_xattrs. 2. introduce RESERVED_XATTR_SIZE for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:52 -04:00
Chao Yu	99f4b917b0	f2fs: don't track volatile file in dirty inode list Don't track volatile file in dirty inode list, otherwise with data_flush option, background thread will entry into endless loop for flushing journal file's pages. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:51 -04:00
Chao Yu	648d50ba12	f2fs: show the max number of volatile operations This patch adds to show the max number of volatile operations which are conducting concurrently. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:50 -04:00
Chao Yu	30a61ddf81	f2fs: fix race condition in between free nid allocator/initializer In below concurrent case, allocated nid can be loaded into free nid cache and be allocated again. Thread A Thread B - f2fs_create - f2fs_new_inode - alloc_nid - __insert_nid_to_list(ALLOC_NID_LIST) - f2fs_balance_fs_bg - build_free_nids - __build_free_nids - scan_nat_page - add_free_nid - __lookup_nat_cache - f2fs_add_link - init_inode_metadata - new_inode_page - new_node_page - set_node_addr - alloc_nid_done - __remove_nid_from_list(ALLOC_NID_LIST) - __insert_nid_to_list(FREE_NID_LIST) This patch makes nat cache lookup and free nid list operation being atomical to avoid this race condition. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:50 -04:00
Yunlei He	5f4c3dec22	f2fs: use set_page_private marcro in f2fs_trace_pid Use set_page_private marcro instead of operte page struct directly Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:49 -04:00
Chao Yu	9897159a7b	f2fs: fix recording invalid last_victim When doing garbage collection, we try to record segment offset which locates at next one of last victim, using it as the start offset in next searching. But in some corner cases, recorded offset may cross the end of main segment area, it will cause incorrectly searching in dirty_segmap bitmap. This patch adds modular operation to avoid this issue. Reported-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-24 15:10:48 -04:00
Kinglong Mee	8f73cbb7d4	f2fs: more reasonable mem_size calculating of ino_entry Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:37 -04:00
Kinglong Mee	70874fb34f	f2fs: calculate the f2fs_stat_info into base_mem The memory size of f2fs_stat_info also should be calculated. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:36 -04:00
Kinglong Mee	684ca7e55d	f2fs: avoid stat_inc_atomic_write for non-atomic file After filemap_write_and_wait_range fail, the FI_ATOMIC_FILE flags is removed, so that f2fs should not increase the stat of atomic_write. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:36 -04:00
Kinglong Mee	c6f89dfd52	f2fs: sanity check of crc_offset from raw checkpoint The crc_offset towards or beyond the end of block is wrong, sanity check it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:34 -04:00
Kinglong Mee	d03ba4cc3f	f2fs: cleanup the disk level filename updating As discuss with Jaegeuk and Chao, "Once checkpoint is done, f2fs doesn't need to update there-in filename at all." The disk-level filename is used only one case, 1. create a file A under a dir 2. sync A 3. godown 4. umount 5. mount (roll_forward) Only the rename/cross_rename changes the filename, if it happens, a. between step 1 and 2, the sync A will caused checkpoint, so that, the roll_forward at step 5 never happens. b. after step 2, the roll_forward happens, file A will roll forward to the result as after step 1. So that, any updating the disk filename is useless, just cleanup it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:33 -04:00
Chao Yu	346fe752c4	f2fs: cover update_free_nid_bitmap with nid_list_lock free_nid_bitmap and free_nid_count in update_free_nid_bitmap should be updated atomically, use nid_list_lock cover them to avoid race in concurrent scenario. Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:32 -04:00
Kinglong Mee	a83d50bc16	f2fs: fix bad prefetchw of NULL page For f2fs_read_data_pages, the f2fs_mpage_readpages gets "page == NULL", so that, the prefetchw(&page->flags) is operated on NULL. Fixes: `f1e8866016` ("f2fs: expose f2fs_mpage_readpages") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:31 -04:00
Kinglong Mee	bd4667cb4b	f2fs: clear FI_DATA_EXIST flag in truncate_inline_inode Clear FI_DATA_EXIST flag atomically in truncate_inline_inode, and the return value from truncate_inline_inode isn't used, remove it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:31 -04:00
Kinglong Mee	d756386172	f2fs: move mnt_want_write_file after arguments checking It's needless of mnt_want_write_file for arguments checking. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:30 -04:00
Kinglong Mee	46e82fb1b5	f2fs: check new size by inode_newsize_ok in f2fs_insert_range The inode_newsize_ok is better than only checking the maxbytes, eg. the rlimit etc. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:29 -04:00
Kinglong Mee	3cecfa5f67	f2fs: avoid copy date to user-space if move file range fail If move file range return error, the data copied to user-space is duplicate. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:28 -04:00
Kinglong Mee	087d3d8bae	f2fs: drop duplicate new_size assign in f2fs_zero_range Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:28 -04:00
Fan Li	8a6aa32502	f2fs: adjust the way of calculating nat block use a slightly simpler expression to calculate nat block with nid. Signed-off-by: Fan Li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:27 -04:00
Jaegeuk Kim	14b44d238c	f2fs: add fault injection on f2fs_truncate Inject a fault during f2fs_truncate(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:26 -04:00
Sheng Yong	1941d7bcb4	f2fs: check range before defragment This patch checks the parameter range passed by ioctl to void that range exceeds the max_file_blocks limit. Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:25 -04:00
Sheng Yong	b0beab5016	f2fs: use parameter max_items instead of PIDVEC_SIZE Signed-off-by: Sheng Yong <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:24 -04:00
Yunlei He	3d6a650feb	f2fs: add a punch discard command function This patch add a function to punch discard command if one segment reuse before discard. Split this segment from multi-segments discard range, and discard the left bigger range. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:23 -04:00
Jaegeuk Kim	c81abe34fe	f2fs: allocate a bio for discarding when actually issuing it Let's allocate a bio when issuing discard commands later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:23 -04:00
Yunlei He	a29d0e0bc0	f2fs: skip writeback meta pages if cp_mutex acquire failed Skip writeback meta pages if cp_mutex lock acquire failed, cp will flush dirty pages instead. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:22 -04:00
Jaegeuk Kim	5ce4738a02	f2fs: show more precise message on orphan recovery failure This case is not caused by fsck.f2fs. User needs to retry mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:21 -04:00
Kinglong Mee	fc7d5bc427	f2fs: remove dead macro PGOFS_OF_NEXT_DNODE Fixes: `3cf4574705` ("f2fs: introduce get_next_page_offset to speed up SEEK_DATA") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:21 -04:00
Kinglong Mee	0b28b71e29	f2fs: drop duplicate radix tree lookup of nat_entry_set The nat entry is listed from the set list for freeing, it's duplicate to do radix tree lookup again. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> [Jaegeuk Kim: remove unnecessary f2fs_bug_on] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:20 -04:00
Kinglong Mee	20fda56b01	f2fs: make sure trace all f2fs_issue_flush The root device's issue flush trace is missing, add it and tracing the result from submit. Fixes `d50aaeec90` ("f2fs: show actual device info in tracepoints") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:19 -04:00
Chao Yu	8ff0971f15	f2fs: don't allow volatile writes for non-regular file Now f2fs only supports volatile writes for journal db regular file. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:18 -04:00
Jaegeuk Kim	e811898c97	f2fs: don't allow atomic writes for not regular files The atomic writes only supports regular files for database. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:17 -04:00
Jaegeuk Kim	8c242db9b8	f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer When I forced to enable atomic operations intentionally, I could hit the below panic, since we didn't clear page->private in f2fs_invalidate_page called by file truncation. The panic occurs due to NULL mapping having page->private. BUG: unable to handle kernel paging request at ffffffffffffffff IP: drop_buffers+0x38/0xe0 PGD 5d00c067 PUD 5d00e067 PMD 0 CPU: 3 PID: 1648 Comm: fsstress Tainted: G D OE 4.10.0+ #5 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 task: ffff9151952863c0 task.stack: ffffaaec40db4000 RIP: 0010:drop_buffers+0x38/0xe0 RSP: 0018:ffffaaec40db74c8 EFLAGS: 00010292 Call Trace: ? page_referenced+0x8b/0x170 try_to_free_buffers+0xc5/0xe0 try_to_release_page+0x49/0x50 shrink_page_list+0x8bc/0x9f0 shrink_inactive_list+0x1dd/0x500 ? shrink_active_list+0x2c0/0x430 shrink_node_memcg+0x5eb/0x7c0 shrink_node+0xe1/0x320 do_try_to_free_pages+0xef/0x2e0 try_to_free_pages+0xe9/0x190 __alloc_pages_slowpath+0x390/0xe70 __alloc_pages_nodemask+0x291/0x2b0 alloc_pages_current+0x95/0x140 __page_cache_alloc+0xc4/0xe0 pagecache_get_page+0xab/0x2a0 grab_cache_page_write_begin+0x20/0x40 get_read_data_page+0x2e6/0x4c0 [f2fs] ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs] ? truncate_data_blocks_range+0x238/0x2b0 [f2fs] get_lock_data_page+0x30/0x190 [f2fs] __exchange_data_block+0xaaf/0xf40 [f2fs] f2fs_fallocate+0x418/0xd00 [f2fs] vfs_fallocate+0x157/0x220 SyS_fallocate+0x48/0x80 Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> [Chao Yu: use INMEM_INVALIDATE for better tracing] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 22:34:10 -04:00
Jaegeuk Kim	aa51d08a11	f2fs: build stat_info before orphan inode recovery f2fs_sync_fs() -> write_checkpoint() calls stat_inc_cp_count(sbi->stat_info), which needs stat_info allocation. Otherwise, we can hit: [254042.598623] ? count_shadow_nodes+0xa0/0xa0 [254042.598633] f2fs_sync_fs+0x65/0xd0 [f2fs] [254042.598645] f2fs_balance_fs_bg+0xe4/0x1c0 [f2fs] [254042.598657] f2fs_write_node_pages+0x34/0x1a0 [f2fs] [254042.598664] ? pagevec_lookup_entries+0x1e/0x30 [254042.598673] do_writepages+0x1e/0x30 [254042.598682] __writeback_single_inode+0x45/0x330 [254042.598688] writeback_single_inode+0xd7/0x190 [254042.598694] write_inode_now+0x86/0xa0 [254042.598699] iput+0x122/0x200 [254042.598709] f2fs_fill_super+0xd4a/0x14d0 [f2fs] [254042.598717] mount_bdev+0x184/0x1c0 [254042.598934] ? f2fs_commit_super+0x100/0x100 [f2fs] [254042.599142] f2fs_mount+0x15/0x20 [f2fs] [254042.599349] mount_fs+0x39/0x160 [254042.599554] ? __alloc_percpu+0x15/0x20 [254042.599759] vfs_kern_mount+0x67/0x110 [254042.599972] do_mount+0x1bb/0xc80 [254042.600175] ? memdup_user+0x42/0x60 [254042.600380] SyS_mount+0x83/0xd0 [254042.600583] entry_SYSCALL_64_fastpath+0x1e/0xad Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Kinglong Mee	10a875f82b	f2fs: fix the fault of calculating blkstart twice When the zone type is BLK_ZONE_TYPE_CONVENTIONAL, the blkstart is calculated twice. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Kinglong Mee	e2f0e962ac	f2fs: fix the fault of checking F2FS_LINK_MAX for rename inode The parent directory's nlink will change, not the inode. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	4f1bca9f0d	f2fs: don't allow to get pino when filename is encrypted After renaming an encrypted file, we have no way to get its encrypted filename from its dentry. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	8c1b3c0fb6	f2fs: fix wrong error injection for evict_inode The previous one was not a proper location to inject an error, since there is no point to get errors. Instead, we can emulate EIO during truncation, and the below logic should handle it correctly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Kinglong Mee	10047f537c	f2fs: le32_to_cpu for ckpt->cp_pack_total_block_count Fixes: `22ad0b6ab4` ("f2fs: add bitmaps for empty or full NAT blocks") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	b71deadbc4	f2fs: le16_to_cpu for xattr->e_value_size This patch fixes missing le16 conversion, reported by kbuild test robot. Fixes: `5f35a2cd5` ("f2fs: Don't update the xattr data that same as the exist") Reviewed-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Jaegeuk Kim	4f295443bf	f2fs: don't need to invalidate wrong node page If f2fs_new_inode() is failed, the bad inode will invalidate 0'th node page during f2fs_evict_inode(), which doesn't need to do. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Yunlei He	a78aaa2c3c	f2fs: fix an error return value in truncate_partial_data_page This patch fix a error return value in truncate_partial_data_page Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-21 16:52:16 -04:00
Chao Yu	7041d5d286	f2fs: combine nat_bits and free_nid_bitmap cache Both nat_bits cache and free_nid_bitmap cache provide same functionality as a intermediate cache between free nid cache and disk, but with different granularity of indicating free nid range, and different persistence policy. nat_bits cache provides better persistence ability, and free_nid_bitmap provides better granularity. In this patch we combine advantage of both caches, so finally policy of the intermediate cache would be: - init: load free nid status from nat_bits into free_nid_bitmap - lookup: scan free_nid_bitmap before load NAT blocks - update: update free_nid_bitmap in real-time - persistence: udpate and persist nat_bits in checkpoint This patch also resolves performance regression reported by lkp-robot. commit: `4ac912427c` ("f2fs: introduce free nid bitmap") d00030cf9cd0bb96fdccc41e33d3c91dcbb672ba ("f2fs: use __set{__clear}_bit_le") 1382c0f3f9d3f936c8bc42ed1591cf7a593ef9f7 ("f2fs: combine nat_bits and free_nid_bitmap cache") `4ac912427c` d00030cf9cd0bb96fdccc41e33 1382c0f3f9d3f936c8bc42ed15 ---------------- -------------------------- -------------------------- %stddev %change %stddev %change %stddev \ \| \ \| \ 77863 ± 0% +2.1% 79485 ± 1% +50.8% 117404 ± 0% aim7.jobs-per-min 231.63 ± 0% -2.0% 227.01 ± 1% -33.6% 153.80 ± 0% aim7.time.elapsed_time 231.63 ± 0% -2.0% 227.01 ± 1% -33.6% 153.80 ± 0% aim7.time.elapsed_time.max 896604 ± 0% -0.8% 889221 ± 3% -20.2% 715260 ± 1% aim7.time.involuntary_context_switches 2394 ± 1% +4.6% 2503 ± 1% +3.7% 2481 ± 2% aim7.time.maximum_resident_set_size 6240 ± 0% -1.5% 6145 ± 1% -14.1% 5360 ± 1% aim7.time.system_time 1111357 ± 3% +1.9% 1132509 ± 2% -6.2% 1041932 ± 2% aim7.time.voluntary_context_switches ... Signed-off-by: Chao Yu <yuchao0@huawei.com> Tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:18 -04:00
Chao Yu	586d1492f3	f2fs: skip scanning free nid bitmap of full NAT blocks This patch adds to account free nids for each NAT blocks, and while scanning all free nid bitmap, do check count and skip lookuping in full NAT block. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:17 -04:00
Jaegeuk Kim	23380b8568	f2fs: use __set{__clear}_bit_le This patch uses __set{__clear}_bit_le for highter speed. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:16 -04:00
Jaegeuk Kim	9f7e4a2c49	f2fs: declare static functions This is to avoid build warning reported by kbuild test robot. Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:15 -04:00
Jaegeuk Kim	720037f939	f2fs: don't overwrite node block by SSR This patch fixes that SSR can overwrite previous warm node block consisting of a node chain since the last checkpoint. Fixes: `5b6c6be2d8` ("f2fs: use SSR for warm node as well") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-03-20 10:00:14 -04:00
Linus Torvalds	590dce2d49	Merge branch 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs 'statx()' update from Al Viro. This adds the new extended stat() interface that internally subsumes our previous stat interfaces, and allows user mode to specify in more detail what kind of information it wants. It also allows for some explicit synchronization information to be passed to the filesystem, which can be relevant for network filesystems: is the cached value ok, or do you need open/close consistency, or what? From David Howells. Andreas Dilger points out that the first version of the extended statx interface was posted June 29, 2010: https://www.spinics.net/lists/linux-fsdevel/msg33831.html * 'rebased-statx' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: statx: Add a system call to make enhanced file info available	2017-03-03 11:38:56 -08:00
David Howells	a528d35e8b	statx: Add a system call to make enhanced file info available Add a system call to make extended file information available, including file creation and some attribute flags where available through the underlying filesystem. The getattr inode operation is altered to take two additional arguments: a u32 request_mask and an unsigned int flags that indicate the synchronisation mode. This change is propagated to the vfs_getattr() function. Functions like vfs_stat() are now inline wrappers around new functions vfs_statx() and vfs_statx_fd() to reduce stack usage. ======== OVERVIEW ======== The idea was initially proposed as a set of xattrs that could be retrieved with getxattr(), but the general preference proved to be for a new syscall with an extended stat structure. A number of requests were gathered for features to be included. The following have been included: (1) Make the fields a consistent size on all arches and make them large. (2) Spare space, request flags and information flags are provided for future expansion. (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an __s64). (4) Creation time: The SMB protocol carries the creation time, which could be exported by Samba, which will in turn help CIFS make use of FS-Cache as that can be used for coherency data (stx_btime). This is also specified in NFSv4 as a recommended attribute and could be exported by NFSD [Steve French]. (5) Lightweight stat: Ask for just those details of interest, and allow a netfs (such as NFS) to approximate anything not of interest, possibly without going to the server [Trond Myklebust, Ulrich Drepper, Andreas Dilger] (AT_STATX_DONT_SYNC). (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks its cached attributes are up to date [Trond Myklebust] (AT_STATX_FORCE_SYNC). And the following have been left out for future extension: (7) Data version number: Could be used by userspace NFS servers [Aneesh Kumar]. Can also be used to modify fill_post_wcc() in NFSD which retrieves i_version directly, but has just called vfs_getattr(). It could get it from the kstat struct if it used vfs_xgetattr() instead. (There's disagreement on the exact semantics of a single field, since not all filesystems do this the same way). (8) BSD stat compatibility: Including more fields from the BSD stat such as creation time (st_btime) and inode generation number (st_gen) [Jeremy Allison, Bernd Schubert]. (9) Inode generation number: Useful for FUSE and userspace NFS servers [Bernd Schubert]. (This was asked for but later deemed unnecessary with the open-by-handle capability available and caused disagreement as to whether it's a security hole or not). (10) Extra coherency data may be useful in making backups [Andreas Dilger]. (No particular data were offered, but things like last backup timestamp, the data version number and the DOS archive bit would come into this category). (11) Allow the filesystem to indicate what it can/cannot provide: A filesystem can now say it doesn't support a standard stat feature if that isn't available, so if, for instance, inode numbers or UIDs don't exist or are fabricated locally... (This requires a separate system call - I have an fsinfo() call idea for this). (12) Store a 16-byte volume ID in the superblock that can be returned in struct xstat [Steve French]. (Deferred to fsinfo). (13) Include granularity fields in the time data to indicate the granularity of each of the times (NFSv4 time_delta) [Steve French]. (Deferred to fsinfo). (14) FS_IOC_GETFLAGS value. These could be translated to BSD's st_flags. Note that the Linux IOC flags are a mess and filesystems such as Ext4 define flags that aren't in linux/fs.h, so translation in the kernel may be a necessity (or, possibly, we provide the filesystem type too). (Some attributes are made available in stx_attributes, but the general feeling was that the IOC flags were to ext[234]-specific and shouldn't be exposed through statx this way). (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer, Michael Kerrisk]. (Deferred, probably to fsinfo. Finding out if there's an ACL or seclabal might require extra filesystem operations). (16) Femtosecond-resolution timestamps [Dave Chinner]. (A __reserved field has been left in the statx_timestamp struct for this - if there proves to be a need). (17) A set multiple attributes syscall to go with this. =============== NEW SYSTEM CALL =============== The new system call is: int ret = statx(int dfd, const char filename, unsigned int flags, unsigned int mask, struct statx buffer); The dfd, filename and flags parameters indicate the file to query, in a similar way to fstatat(). There is no equivalent of lstat() as that can be emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags. There is also no equivalent of fstat() as that can be emulated by passing a NULL filename to statx() with the fd of interest in dfd. Whether or not statx() synchronises the attributes with the backing store can be controlled by OR'ing a value into the flags argument (this typically only affects network filesystems): (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this respect. (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise its attributes with the server - which might require data writeback to occur to get the timestamps correct. (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a network filesystem. The resulting values should be considered approximate. mask is a bitmask indicating the fields in struct statx that are of interest to the caller. The user should set this to STATX_BASIC_STATS to get the basic set returned by stat(). It should be noted that asking for more information may entail extra I/O operations. buffer points to the destination for the data. This must be 256 bytes in size. ====================== MAIN ATTRIBUTES RECORD ====================== The following structures are defined in which to return the main attribute set: struct statx_timestamp { __s64 tv_sec; __s32 tv_nsec; __s32 __reserved; }; struct statx { __u32 stx_mask; __u32 stx_blksize; __u64 stx_attributes; __u32 stx_nlink; __u32 stx_uid; __u32 stx_gid; __u16 stx_mode; __u16 __spare0[1]; __u64 stx_ino; __u64 stx_size; __u64 stx_blocks; __u64 __spare1[1]; struct statx_timestamp stx_atime; struct statx_timestamp stx_btime; struct statx_timestamp stx_ctime; struct statx_timestamp stx_mtime; __u32 stx_rdev_major; __u32 stx_rdev_minor; __u32 stx_dev_major; __u32 stx_dev_minor; __u64 __spare2[14]; }; The defined bits in request_mask and stx_mask are: STATX_TYPE Want/got stx_mode & S_IFMT STATX_MODE Want/got stx_mode & ~S_IFMT STATX_NLINK Want/got stx_nlink STATX_UID Want/got stx_uid STATX_GID Want/got stx_gid STATX_ATIME Want/got stx_atime{,_ns} STATX_MTIME Want/got stx_mtime{,_ns} STATX_CTIME Want/got stx_ctime{,_ns} STATX_INO Want/got stx_ino STATX_SIZE Want/got stx_size STATX_BLOCKS Want/got stx_blocks STATX_BASIC_STATS [The stuff in the normal stat struct] STATX_BTIME Want/got stx_btime{,_ns} STATX_ALL [All currently available stuff] stx_btime is the file creation time, stx_mask is a bitmask indicating the data provided and __spares[] are where as-yet undefined fields can be placed. Time fields are structures with separate seconds and nanoseconds fields plus a reserved field in case we want to add even finer resolution. Note that times will be negative if before 1970; in such a case, the nanosecond fields will also be negative if not zero. The bits defined in the stx_attributes field convey information about a file, how it is accessed, where it is and what it does. The following attributes map to FS__FL flags and are the same numerical value: STATX_ATTR_COMPRESSED File is compressed by the fs STATX_ATTR_IMMUTABLE File is marked immutable STATX_ATTR_APPEND File is append-only STATX_ATTR_NODUMP File is not to be dumped STATX_ATTR_ENCRYPTED File requires key to decrypt in fs Within the kernel, the supported flags are listed by: KSTAT_ATTR_FS_IOC_FLAGS [Are any other IOC flags of sufficient general interest to be exposed through this interface?] New flags include: STATX_ATTR_AUTOMOUNT Object is an automount trigger These are for the use of GUI tools that might want to mark files specially, depending on what they are. Fields in struct statx come in a number of classes: (0) stx_dev_, stx_blksize. These are local system information and are always available. (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino, stx_size, stx_blocks. These will be returned whether the caller asks for them or not. The corresponding bits in stx_mask will be set to indicate whether they actually have valid values. If the caller didn't ask for them, then they may be approximated. For example, NFS won't waste any time updating them from the server, unless as a byproduct of updating something requested. If the values don't actually exist for the underlying object (such as UID or GID on a DOS file), then the bit won't be set in the stx_mask, even if the caller asked for the value. In such a case, the returned value will be a fabrication. Note that there are instances where the type might not be valid, for instance Windows reparse points. (2) stx_rdev_*. This will be set only if stx_mode indicates we're looking at a blockdev or a chardev, otherwise will be 0. (3) stx_btime. Similar to (1), except this will be set to 0 if it doesn't exist. ======= TESTING ======= The following test program can be used to test the statx system call: samples/statx/test-statx.c Just compile and run, passing it paths to the files you want to examine. The file is built automatically if CONFIG_SAMPLES is enabled. Here's some example output. Firstly, an NFS directory that crosses to another FSID. Note that the AUTOMOUNT attribute is set because transiting this directory will cause d_automount to be invoked by the VFS. [root@andromeda ~]# /tmp/test-statx -A /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:26 Inode: 1703937 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------) Secondly, the result of automounting on that directory. [root@andromeda ~]# /tmp/test-statx /warthog/data statx(/warthog/data) = 0 results=7ff Size: 4096 Blocks: 8 IO Block: 1048576 directory Device: 00:27 Inode: 2 Links: 125 Access: (3777/drwxrwxrwx) Uid: 0 Gid: 4041 Access: 2016-11-24 09:02:12.219699527+0000 Modify: 2016-11-17 10:44:36.225653653+0000 Change: 2016-11-17 10:44:36.225653653+0000 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-03-02 20:51:15 -05:00
Ingo Molnar	174cd4b1e5	sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h> Fix up affected files that include this signal functionality via sched.h. Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-03-02 08:42:32 +01:00
Linus Torvalds	25c4e6c3f0	for-f2fs-4.11 This round introduces several interesting features such as on-disk NAT bitmaps, IO alignment, and a discard thread. And it includes a couple of major bug fixes as below. == Enhancement == - introduce on-disk bitmaps to avoid scanning NAT blocks when getting free nids - support IO alignment to prepare open-channel SSD integration in future - introduce a discard thread to avoid long latency during checkpoint and fstrim - use SSR for warm node and enable inline_xattr by default - introduce in-memory bitmaps to check FS consistency for debugging - improve write_begin by avoiding needless read IO == Bug fix == - fix broken zone_reset behavior for SMR drive - fix wrong victim selection policy during GC - fix missing behavior when preparing discard commands - fix bugs in atomic write support and fiemap - workaround to handle multiple f2fs_add_link calls having same name And it includes a bunch of clean-up patches as well. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYtmdOAAoJEEAUqH6CSFDSs0UP/AzngT37xVIhVBD13J9oHIuv rFA/eHVGRJmU1xc4SG1bghKm45xq8rwUX7irarfvLLc5aL+6VPGSdaRBykUr4A5N MN/bgK//EPp7If8EF+8PpY+9x7g67i0mtz5iD8dDrK+bUKV/IDKV1LWw5pR3g/g6 RwMH0dUVOiD/HJ5iFp1ykTdVPe4vFY013uVmyPxUq+nCBlqlQm1nOvrGjF/HeYyX kqcD2LEc79GPfS5ebQIKfCfLE0rsWVnnS6YaqlDNCD5/oRim71CUtA4MPTYv29vp R/SebWlayEm+u68+uQUu6AyIk/1IdP0+AtRuQd/VxuteoyXmkTMHER662DqN4F8J npPdNrbNdlzwuAP77avy+hplqbD19yUa7o7Fl1No5rfheT3CiNTSj2uoriyEAffH 1AM6tES7S7n5ttrXOr9iOxrK0u/vuaf7fbKVtK+RI09hwzdvyGB5HUdQB0iP/XR+ obw8dru79ISMVZ9YuDhSfjI5ohAcfthfuqgjUt2RAfDv19IRsg5eayAp3T6nUfEX AGQbV/52dkO9svZztMbcBW95zmqkE0cMeX66KIMCPXNuDiE474t8k115K6kHpFwP e4Kx+mTSNhR1LEAaVdmCjbLb0gVrumVHTdjaZopnxTFmE70u/M6h1vY90m1LkReF ZDK5mhfMmGzU4wkvbgP8 =tw8c -----END PGP SIGNATURE----- Merge tag 'for-f2fs-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This round introduces several interesting features such as on-disk NAT bitmaps, IO alignment, and a discard thread. And it includes a couple of major bug fixes as below. Enhancements: - introduce on-disk bitmaps to avoid scanning NAT blocks when getting free nids - support IO alignment to prepare open-channel SSD integration in future - introduce a discard thread to avoid long latency during checkpoint and fstrim - use SSR for warm node and enable inline_xattr by default - introduce in-memory bitmaps to check FS consistency for debugging - improve write_begin by avoiding needless read IO Bug fixes: - fix broken zone_reset behavior for SMR drive - fix wrong victim selection policy during GC - fix missing behavior when preparing discard commands - fix bugs in atomic write support and fiemap - workaround to handle multiple f2fs_add_link calls having same name ... and it includes a bunch of clean-up patches as well" * tag 'for-f2fs-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (97 commits) f2fs: avoid to flush nat journal entries f2fs: avoid to issue redundant discard commands f2fs: fix a plint compile warning f2fs: add f2fs_drop_inode tracepoint f2fs: Fix zoned block device support f2fs: remove redundant set_page_dirty() f2fs: fix to enlarge size of write_io_dummy mempool f2fs: fix memory leak of write_io_dummy mempool during umount f2fs: fix to update F2FS_{CP_}WB_DATA count correctly f2fs: use MAX_FREE_NIDS for the free nids target f2fs: introduce free nid bitmap f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint f2fs: update the comment of default nr_pages to skipping f2fs: drop the duplicate pval in f2fs_getxattr f2fs: Don't update the xattr data that same as the exist f2fs: kill __is_extent_same f2fs: avoid bggc->fggc when enough free segments are avaliable after cp f2fs: select target segment with closer temperature in SSR mode f2fs: show simple call stack in fault injection message f2fs: no need lock_op in f2fs_write_inline_data ...	2017-03-01 15:55:04 -08:00
Jaegeuk Kim	900f736251	f2fs: avoid to flush nat journal entries This patch adds a missing condition which flushes nat journal entries unnecessarily introduced by: f2fs: add bitmaps for empty or full NAT blocks Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-28 09:57:13 -08:00
Jaegeuk Kim	8b107f5b97	f2fs: avoid to issue redundant discard commands If segs_per_sec is over 1 like under SMR, previously f2fs issues discard commands redundantly on the same section, since we didn't move end position for the previous discard command. E.g., start end \| \| prefree_bitmap = [01111100111100] And, after issue discard for this section, end start \| \| prefree_bitmap = [01111100111100] Select this section again by searching from (end + 1), start end \| \| prefree_bitmap = [01111100111100] Fixes: `36abef4e79` ("f2fs: introduce mode=lfs mount option") Cc: <stable@vger.kernel.org> Cc: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 13:44:08 -08:00
Hou Pengyang	37e79cd31c	f2fs: fix a plint compile warning fix such pclint warning: ... Loss of precision (arg. no. 2) (unsigned long long to unsigned int)) Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:51:21 -08:00
Hou Pengyang	b8d96a30b6	f2fs: add f2fs_drop_inode tracepoint Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:51:20 -08:00
Masato Suzuki	7bb3a371d1	f2fs: Fix zoned block device support The introduction of the multi-device feature partially broke the support for zoned block devices. In the function f2fs_scan_devices, sbi->devs allocation and initialization is skipped in the case of a single device mount. This result in no device information structure being allocated for the device. This is fine if the device is a regular device, but in the case of a zoned block device, the device zone type array is not initialized, which causes the function __f2fs_issue_discard_zone to fail as get_blkz_type is unable to determine the zone type of a section. Fix this by always allocating and initializing the sbi->devs device information array even in the case of a single device if that device is zoned. For this particular case, make sure to obtain a reference on the single device so that the call to blkdev_put() in destroy_device_list operates as expected. Fixes: `3c62be17d4` ("f2fs: support multiple devices") Cc: <stable@vger.kernel.org> # v4.10 Signed-off-by: Masato Suzuki <masato.suzuki@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:40:12 -08:00
Yunlei He	4fcf589ad4	f2fs: remove redundant set_page_dirty() This patch remove redundant set_page_dirty in truncate_blocks Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:40:11 -08:00
Chao Yu	a3ebfe4fd8	f2fs: fix to enlarge size of write_io_dummy mempool It needs to double cache size of write_io_dummy mempool, otherwise we may run out of cache in scenraio of Data/Node IOs were issued concurrently. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:40:10 -08:00
Chao Yu	b6895e8f99	f2fs: fix memory leak of write_io_dummy mempool during umount Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:40:09 -08:00
Chao Yu	540faedb00	f2fs: fix to update F2FS_{CP_}WB_DATA count correctly We should only account F2FS_{CP_}WB_DATA IOs for write path, fix it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:40:08 -08:00
Kinglong Mee	f0cdbfe6ef	f2fs: use MAX_FREE_NIDS for the free nids target F2FS has define MAX_FREE_NIDS for maximum of cached free nids target. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:48 -08:00
Chao Yu	4ac912427c	f2fs: introduce free nid bitmap In scenario of intensively node allocation, free nids will be ran out soon, then it needs to stop to load free nids by traversing NAT blocks, in worse case, if NAT blocks does not be cached in memory, it generates IOs which slows down our foreground operations. In order to speed up node allocation, in this patch we introduce a new free_nid_bitmap array, so there is an bitmap table for each NAT block, Once the NAT block is loaded, related bitmap cache will be switched on, and bitmap will be set during traversing nat entries in NAT block, later we can query and update nid usage status in memory completely. With such implementation, I expect performance of node allocation can be improved in the long-term after filesystem image is mounted. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:47 -08:00
Kinglong Mee	ced2c7ea8e	f2fs: new helper cur_cp_crc() getting crc in f2fs_checkpoint There are four places that getting the crc value in f2fs_checkpoint, just add a new helper cur_cp_crc for them. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:47 -08:00
Kinglong Mee	727ebb091e	f2fs: update the comment of default nr_pages to skipping Fixes: `2c237ebaa4` ("f2fs: avoid writing node/metapages during writes") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:46 -08:00
Kinglong Mee	0ec4a5b647	f2fs: drop the duplicate pval in f2fs_getxattr Fixes: `ba38c27eb9` ("f2fs: enhance lookup xattr") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:45 -08:00
Kinglong Mee	5f35a2cd5b	f2fs: Don't update the xattr data that same as the exist f2fs removes the old xattr data and appends the new data although the new data is same as the exist. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:44 -08:00
Chao Yu	317e130096	f2fs: kill __is_extent_same Since commit `ee6d182f2a` ("f2fs: remove syncing inode page in all the cases") delayed inode element updating from inode cache to node page cache, so once largest cached extent is updated, we can make inode dirty immediately instead of checking and updating it in the end of extent cache update. The above commit didn't clean up unneeded codes in extent_cache.c, let's finish the job in this patch. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:43 -08:00
Hou Pengyang	19f4e688f8	f2fs: avoid bggc->fggc when enough free segments are avaliable after cp We use has_not_enough_free_secs to check if there are enough free segments, (free_sections(sbi) + freed) <= (node_secs + 2 * dent_secs + imeta_secs + reserved_sections(sbi) + needed); Under scenario with large number of dirty nodes, these nodes would be flushed during cp, as a result, right side of the inequality would be decreased, while left side stays unchanged if these nodes are flushed in SSR way, which means there are enough free segments after this cp. For this case, we just do a bggc instead of fggc. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 10:07:37 -08:00
Chao Yu	d27c3d89db	f2fs: select target segment with closer temperature in SSR mode In SSR mode, we can allocate target segment which has different temperature type from the type of current block, in order to avoid mixing coldest and hottest data/node as much as possible, change SSR allocation policy to select closer temperature for current block prior. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:56 -08:00
Chao Yu	55523519bc	f2fs: show simple call stack in fault injection message Previously kernel message can show that in which function we do the injection, but unfortunately, most of the caller are the same, for tracking more information of injection path, it needs to show upper caller's name. This patch supports that ability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:55 -08:00
Yunlei He	dd7b2333e6	f2fs: no need lock_op in f2fs_write_inline_data Similar as f2fs_write_inode, f2fs_write_inline_data just mark inode page dirty, so it's no need to write inline data under read lock of cp_rwsem. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:55 -08:00
Jaegeuk Kim	22ad0b6ab4	f2fs: add bitmaps for empty or full NAT blocks This patches adds bitmaps to represent empty or full NAT blocks containing free nid entries. If we can find valid crc\|cp_ver in the last block of checkpoint pack, we'll use these bitmaps when building free nids. In order to avoid checkpointing burden, up-to-date bitmaps will be flushed only during umount time. So, normally we can get this gain, but when power-cut happens, we rely on fsck.f2fs which recovers this bitmap again. After this patch, we build free nids from nid #0 at mount time to make more full NAT blocks, but in runtime, we check empty NAT blocks to load free nids without loading any NAT pages from disk. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:54 -08:00
Yunlei He	5e8256ac2e	f2fs: replace rw semaphore extent_tree_lock with mutex lock This patch replace rw semaphore extent_tree_lock with mutex lock for no read cases with this lock. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:53 -08:00
Kinglong Mee	3f2be04304	f2fs: avoid m_flags overlay when allocating more data blocks When more than one data blocks are allocated, the F2FS_MAP_UNWRITTEN/MAPPED flags will be overlapped by F2FS_MAP_NEW at the later times. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:52 -08:00
Hou Pengyang	6bfaf7b150	f2fs: remove unsafe bitmap checking proc A: proc B: - writeback_sb_inodes - __writeback_single_inode - do_writepages - f2fs_write_node_pages - f2fs_balance_fs_bg - write_checkpoint - build_free_nids - flush_nat_entries - __build_free_nids - __flush_nat_entry_set - ra_meta_pages - get_next_nat_page - current_nat_addr - set_to_next_nat [do nat_bitmap checking] - f2fs_change_bit For proc A, nat_bitmap and nat_bitmap_mir would be compared without lock_op and nm_i->nat_tree_lock, while proc B is changing nat_bitmap/nat_bitmap_ver in cp. So it is normal for nat_bitmap/nat_bitmap diffrence under such scenario. This patch fix this by removing the monitoring point. [Fix: `599a09b` f2fs: check in-memory nat version bitmap] Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:51 -08:00
Hou Pengyang	e15882b6c6	f2fs: init local extent_info to avoid stale stack info in tp To avoid such stale(fops, blk, len) info in f2fs_lookup_extent_tree_end tp dio-23095 [005] ...1 17878.856859: f2fs_lookup_extent_tree_end: dev = (259,30), ino = 856, pgofs = 0, ext_info(fofs: 3441207040, blk: 4294967232, len: 3481143808) Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:50 -08:00
Yunlong Song	77190e1f31	f2fs: remove unnecessary condition check for write_checkpoint in f2fs_gc Since has_not_enough_free_secs(sbi, 0, 0) must be true if has_not_enough_ free_secs(sbi, sec_freed, 0) is true, write_checkpoint is sure to execute in both conditions. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:50 -08:00
Jaegeuk Kim	9259228571	f2fs: check discard alignment only for SEQWRITE zones For converntional zones, we don't need to align discard commands to exact zone size. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:46 -08:00
Jaegeuk Kim	40465257ac	f2fs: wait for discard completion after submission We don't need to wait for each discard commands when unmounting the image. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:39 -08:00
Jaegeuk Kim	47b8980816	f2fs: much larger batched trim_fs job We have a kernel thread to issue discard commands, so we can increase the number of batched discard sections. By default, now it becomes 4GB range. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:30 -08:00
Jaegeuk Kim	ad4d307fce	f2fs: avoid very large discard command This patch adds MAX_DISCARD_BLOCKS() to avoid issuing too much large single discard command. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-27 09:59:20 -08:00
Dave Jiang	11bac80004	mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf ->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [arnd@arndb.de: fix ARM build] Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-02-24 17:46:54 -08:00
Jaegeuk Kim	70d625cbdb	f2fs: do SSR for node segments more aggresively This patch gives more SSR chances for node blocks. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 10:01:41 -08:00
Jaegeuk Kim	c192f7a477	f2fs: find data segments across all the types Previously, if type is CURSEG_HOT_DATA, we only check CURSEG_HOT_DATA only. This patch fixes to search all the different types for SSR. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 10:01:08 -08:00
Jaegeuk Kim	d0db7703ac	f2fs: do SSR in higher priority Let's check SSR in prior to LFS allocation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 09:39:53 -08:00
Yunlong Song	035e97adab	f2fs: do SSR for data when there is enough free space In allocate_segment_by_default(), need_SSR() already detected it's time to do SSR. So, let's try to find victims for data segments more aggressively in time. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 09:39:52 -08:00
Hou Pengyang	b9cd20619e	f2fs: node segment is prior to data segment selected victim As data segment gc may lead dnode dirty, so the greedy cost for data segment should be valid blocks * 2, that is data segment is prior to node segment. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 09:39:40 -08:00
Yunlong Song	3436c4bdb3	f2fs: put allocate_segment after refresh_sit_entry SIT information should be updated before segment allocation, since SSR needs latest valid block information. Current code does not update the old_blkaddr info in sit_entry, so adjust the allocate_segment to its proper location. Commit `5e443818fa` ("f2fs: handle dirty segments inside refresh_sit_entry") puts it into wrong location. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-24 09:37:30 -08:00
Hou Pengyang	e93b986525	f2fs: add ovp valid_blocks check for bg gc victim to fg_gc For foreground gc, greedy algorithm should be adapted, which makes this formula work well: (2 * (100 / config.overprovision + 1) + 6) But currently, we fg_gc have a prior to select bg_gc victim segments to gc first, these victims are selected by cost-benefit algorithm, we can't guarantee such segments have the small valid blocks, which may destroy the f2fs rule, on the worstest case, would consume all the free segments. This patch fix this by add a filter in check_bg_victims, if segment's has # of valid blocks over overprovision ratio, skip such segments. Cc: <stable@vger.kernel.org> Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:28:20 -08:00
Jaegeuk Kim	86d54795c9	f2fs: do not wait for writeback in write_begin Otherwise we can get livelock like below. [79880.428136] dbench D 0 18405 18404 0x00000000 [79880.428139] Call Trace: [79880.428142] __schedule+0x219/0x6b0 [79880.428144] schedule+0x36/0x80 [79880.428147] schedule_timeout+0x243/0x2e0 [79880.428152] ? update_sd_lb_stats+0x16b/0x5f0 [79880.428155] ? ktime_get+0x3c/0xb0 [79880.428157] io_schedule_timeout+0xa6/0x110 [79880.428161] __lock_page+0xf7/0x130 [79880.428164] ? unlock_page+0x30/0x30 [79880.428167] pagecache_get_page+0x16b/0x250 [79880.428171] grab_cache_page_write_begin+0x20/0x40 [79880.428182] f2fs_write_begin+0xa2/0xdb0 [f2fs] [79880.428192] ? f2fs_mark_inode_dirty_sync+0x16/0x30 [f2fs] [79880.428197] ? kmem_cache_free+0x79/0x200 [79880.428203] ? __mark_inode_dirty+0x17f/0x360 [79880.428206] generic_perform_write+0xbb/0x190 [79880.428213] ? file_update_time+0xa4/0xf0 [79880.428217] __generic_file_write_iter+0x19b/0x1e0 [79880.428226] f2fs_file_write_iter+0x9c/0x180 [f2fs] [79880.428231] __vfs_write+0xc5/0x140 [79880.428235] vfs_write+0xb2/0x1b0 [79880.428238] SyS_write+0x46/0xa0 [79880.428242] entry_SYSCALL_64_fastpath+0x1e/0xad Fixes: cae96a5c8ab6 ("f2fs: check io submission more precisely") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:23:27 -08:00
Yunlei He	05eeb118a0	f2fs: replace __get_victim by dirty_segments in FG_GC In FG_GC process, it will search victim section twice. This will cause some dirty section with less valid blocks skip garbage collection. section # 26425 : valid blocks # 3 142.037567: get_victim_by_default: victim 26425 : valid blocks # 3 142.037585: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244 142.039494: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19022 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24 142.070247: new_curseg: Debug: alloc new segment 26746 142.244341: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26054 ofs_unit = 1, pre_victim_secno = 26054, prefree = 0, free = 243 142.254475: do_garbage_collect: Debug: FG_GC, seg_freed = 1 142.293131: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23466 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244 142.319001: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23467 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244 142.368879: get_victim_by_default: victim 26425 : valid blocks # 3 142.368894: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244 142.378127: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19612 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24 142.416917: new_curseg: Debug: alloc new segment 26054 142.656794: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 25404 ofs_unit = 1, pre_victim_secno = 25404, prefree = 0, free = 243 142.662139: do_garbage_collect: Debug: FG_GC, seg_freed = 1 142.684159: new_curseg: Debug: alloc new segment 25197 142.685059: get_victim_by_default: victim 26425 : valid blocks # 3 142.685079: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 243 142.701427: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26238 ofs_unit = 1, pre_victim_secno = 26238, prefree = 0, free = 243 142.707105: do_garbage_collect: Debug: FG_GC, seg_freed = 1 142.802444: f2fs_get_victim: dev = (259,30), type = Warm DATA, policy = (Background GC, SSR-mode, Greedy), victim = 23473 ofs_unit = 1, pre_victim_secno = -1, prefree = 0, free = 244 142.804422: get_victim_by_default: victim 26425 : valid blocks # 3 142.804443: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244 142.851567: f2fs_get_victim: dev = (259,30), type = Hot DATA, policy = (Background GC, SSR-mode, Greedy), victim = 19092 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 24 142.865014: new_curseg: Debug: alloc new segment 26238 143.082245: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26307 ofs_unit = 1, pre_victim_secno = 26307, prefree = 0, free = 244 143.088252: do_garbage_collect: Debug: FG_GC, seg_freed = 1 143.128307: new_curseg: Debug: alloc new segment 25404 143.181846: get_victim_by_default: victim 26425 : valid blocks # 3 143.181872: f2fs_get_victim: dev = (259,30), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 26425 ofs_unit = 1, pre_victim_secno = 26425, prefree = 0, free = 244 Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:23:26 -08:00
Jaegeuk Kim	88c5c13a50	f2fs: fix multiple f2fs_add_link() calls having same name It turns out a stakable filesystem like sdcardfs in AOSP can trigger multiple vfs_create() to lower filesystem. In that case, f2fs will add multiple dentries having same name which breaks filesystem consistency. Until upper layer fixes, let's work around by f2fs, which shows actually not much performance regression. Cc: <stable@vger.kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:23:25 -08:00
Jaegeuk Kim	d50aaeec90	f2fs: show actual device info in tracepoints This patch shows actual device information in the tracepoints. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:23:24 -08:00
Jaegeuk Kim	5b6c6be2d8	f2fs: use SSR for warm node as well We have had node chains, but haven't used it so far due to stale node blocks. Now, we have crc\|cp_ver in node footer and give random cp_ver at format time, we can start to use it again. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 11:23:22 -08:00
Chao Yu	39133a5015	f2fs: enable inline_xattr by default In android, since SElinux is enable, security policy will be appliedd for each file, it stores in inode as an xattr entry, so it will take one 4k size node block additionally for each file. Let's enable inline_xattr by default in order to save storage space. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:21:49 -08:00
Chao Yu	23cf7212a1	f2fs: introduce noinline_xattr mount option This patch introduces new mount option 'noinline_xattr', so we can disable inline xattr functionality which is already set as a default mount option. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:21:48 -08:00
Jaegeuk Kim	25cc5d3b9d	f2fs: avoid reading NAT page by get_node_info We've not seen this buggy case for a long time, so it's time to avoid this unnecessary get_node_info() call which reading NAT page to cache nat entry. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:21:47 -08:00
Jaegeuk Kim	9b064f7d0c	f2fs: remove build_free_nids() during checkpoint Let's avoid build_free_nids() in checkpoint path. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:53 -08:00
Chao Yu	d260081ccf	f2fs: change recovery policy of xattr node block Currently, if we call fsync after updating the xattr date belongs to the file, f2fs needs to trigger checkpoint to keep xattr data consistent. But, this policy cause low performance as checkpoint will block most foreground operations and cause unneeded and unrelated IOs around checkpoint. This patch will reuse regular file recovery policy for xattr node block, so, we change to write xattr node block tagged with fsync flag to warm area instead of cold area, and during recovery, we search warm node chain for fsynced xattr block, and do the recovery. So, for below application IO pattern, performance can be improved obviously: - touch file - create/update/delete xattr entry in file - fsync file Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:52 -08:00
Bhumika Goyal	2ad0ef846b	f2fs: super: constify fscrypt_operations structure Declare fscrypt_operations structure as const as it is only stored in the s_cop field of a super_block structure. This field is of type const, so fscrypt_operations structure having this property can be made const too. File size before: fs/f2fs/super.o text data bss dec hex filename 54131 31355 184 85670 14ea6 fs/f2fs/super.o File size after: fs/f2fs/super.o text data bss dec hex filename 54227 31259 184 85670 14ea6 fs/f2fs/super.o Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:51 -08:00
Jaegeuk Kim	1200abb26f	f2fs: show checkpoint version at mount time If we mounted f2fs successfully, let's show current checkpoint version. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:50 -08:00
Jaegeuk Kim	7f54f51f46	f2fs: remove preflush for nobarrier case This patch removes REQ_PREFLUSH in the nobarrier case. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:48 -08:00
Jaegeuk Kim	942fd3192f	f2fs: check last page index in cached bio to decide submission If the cached bio has the last page's index, then we need to submit it. Otherwise, we don't need to submit it and can wait for further IO merges. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:48 -08:00
Jaegeuk Kim	d68f735b3b	f2fs: check io submission more precisely This patch check IO submission more precisely than previous rough check. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:47 -08:00
Jaegeuk Kim	f566bae846	f2fs: call internal __write_data_page directly This patch introduces __write_data_page to call it by f2fs_write_cache_pages directly.. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:46 -08:00
Jaegeuk Kim	e7c75ab099	f2fs: avoid out-of-order execution of atomic writes We need to flush data writes before flushing last node block writes by using FUA with PREFLUSH. We don't need to guarantee precedent node writes since if those are not written, we can't reach to the last node block when scanning node block chain during roll-forward recovery. Afterwards f2fs_wait_on_page_writeback guarantees all the IO submission to disk, which builds a valid node block chain. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:10:35 -08:00
Jaegeuk Kim	faa24895ac	f2fs: move write_node_page above fsync_node_pages This patch just moves write_node_page and introduces an inner function. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:09:43 -08:00
Jaegeuk Kim	c1b221078b	f2fs: move flush tracepoint This patch moves the tracepoint location for flush command. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-23 10:08:43 -08:00
Jaegeuk Kim	a00861dbca	f2fs: show # of APPEND and UPDATE inodes This patch shows cached # of APPEND and UPDATE inode entries. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:54:53 -08:00
DongOh Shin	cac5a3d8f5	f2fs: fix 446 coding style warnings in f2fs.h 1) Nine coding style warnings below have been resolved: "Missing a blank line after declarations" 2) 435 coding style warnings below have been resolved: "function definition argument 'x' should also have an identifier name" 3) Two coding style warnings below have been resolved: "macros should not use a trailing semicolon" Signed-off-by: DongOh Shin <doscode.kr@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:55 -08:00
DongOh Shin	c64ab12e36	f2fs: fix 3 coding style errors in f2fs.h Two coding style errors below have been resolved: "Macros with complex values should be enclosed in parentheses" And a coding style error below has been resolved: "space prohibited before that ',' (ctx:WxW)" Signed-off-by: DongOh Shin <doscode.kr@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:55 -08:00
Jaegeuk Kim	8ed5974552	f2fs: declare missing static function We missed two functions declared as static functions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:54 -08:00
Kaixu Xia	0cc0dec2b6	f2fs: show the fault injection mount option This patch shows the fault injection mount option in f2fs_show_options(). Signed-off-by: Kaixu Xia <xiakaixu@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:53 -08:00
Chao Yu	73545817c9	f2fs: fix null pointer dereference when issuing flush in ->fsync We only allocate flush merge control structure sbi::sm_info::fcc_info when flush_merge option is on, but in f2fs_issue_flush we still try to access member of the control structure without that option, it incurs panic as show below, fix it. Call Trace: __remove_ino_entry+0xa9/0xc0 [f2fs] f2fs_do_sync_file.isra.27+0x214/0x6d0 [f2fs] f2fs_sync_file+0x18/0x20 [f2fs] vfs_fsync_range+0x3d/0xb0 __do_page_fault+0x261/0x4d0 do_fsync+0x3d/0x70 SyS_fsync+0x10/0x20 do_syscall_64+0x6e/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f18ce260de0 RSP: 002b:00007ffdd4589258 EFLAGS: 00000246 ORIG_RAX: 000000000000004a RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f18ce260de0 RDX: 0000000000000006 RSI: 00000000016c0360 RDI: 0000000000000003 RBP: 00000000016c0360 R08: 000000000000ffff R09: 000000000000001f R10: 00007ffdd4589020 R11: 0000000000000246 R12: 00000000016c0100 R13: 0000000000000000 R14: 00000000016c1f00 R15: 00000000016c0100 Code: fb 81 e3 00 08 00 00 48 89 45 a0 0f 1f 44 00 00 31 c0 85 db 75 27 41 81 e7 00 04 00 00 74 0c 41 8b 45 20 85 c0 0f 85 81 00 00 00 <f0> 41 ff 45 20 4c 89 e7 e8 f8 e9 ff ff f0 41 ff 4d 20 48 83 c4 RIP: f2fs_issue_flush+0x5b/0x170 [f2fs] RSP: ffffc90003b5fd78 CR2: 0000000000000020 ---[ end trace a09314c24f037648 ]--- Reported-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:52 -08:00
Chao Yu	dba79f38bc	f2fs: fix to avoid overflow when left shifting page offset We use following method to calculate size with current page index: size = index << PAGE_SHIFT If type of index has only 32-bits size, left shifting will incur overflow, which makes result incorrect. So let's cast index with 64-bits type to avoid such issue. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:51 -08:00
Chao Yu	ba38c27eb9	f2fs: enhance lookup xattr Previously, in getxattr we will load all entries both in inline xattr and xattr node block, and then do the lookup in all entries, but our lookup flow shows low efficiency, since if we can lookup and hit in inline xattr of inode page cache first, we don't need to load and lookup xattr node block, which can obviously save cpu time and IO latency. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: initialize NULL to avoid warning] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:51 -08:00
Wei Fang	b86e33075e	f2fs: fix a dead loop in f2fs_fiemap() A dead loop can be triggered in f2fs_fiemap() using the test case as below: ... fd = open(); fallocate(fd, 0, 0, 4294967296); ioctl(fd, FS_IOC_FIEMAP, fiemap_buf); ... It's caused by an overflow in __get_data_block(): ... bh->b_size = map.m_len << inode->i_blkbits; ... map.m_len is an unsigned int, and bh->b_size is a size_t which is 64 bits on 64 bits archtecture, type conversion from an unsigned int to a size_t will result in an overflow. In the above-mentioned case, bh->b_size will be zero, and f2fs_fiemap() will call get_data_block() at block 0 again an again. Fix this by adding a force conversion before left shift. Signed-off-by: Wei Fang <fangwei1@huawei.com> Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:49 -08:00
Jaegeuk Kim	dc91de78e5	f2fs: do not preallocate blocks which has wrong buffer Sheng Yong reports needless preallocation if write(small_buffer, large_size) is called. In that case, f2fs preallocates large_size, but vfs returns early due to small_buffer size. Let's detect it before preallocation phase in f2fs. Reported-by: Sheng Yong <shengyong1@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:48 -08:00
Jaegeuk Kim	dcc9165dbf	f2fs: show # of on-going flush and discard bios This patch adds stat information for flush and discard commands. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:47 -08:00
Jaegeuk Kim	1546996348	f2fs: add a kernel thread to issue discard commands asynchronously This patch adds a kernel thread to issue discard commands. It proposes three states, D_PREP, D_SUBMIT, and D_DONE to identify current bio status. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 20:24:45 -08:00
Jaegeuk Kim	0b54fb8458	f2fs: factor out discard command info into discard_cmd_control This patch adds discard_cmd_control with the existing discarding controls. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:53 -08:00
Jaegeuk Kim	d4adb30f25	f2fs: reorganize stat information This patch modifies stat information more clearly. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:52 -08:00
Jaegeuk Kim	b01a92019c	f2fs: clean up flush/discard command namings This patch simply cleans up the names for flush/discard commands. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:51 -08:00
Chao Yu	ae27d62e6b	f2fs: check in-memory sit version bitmap This patch adds a mirror for sit version bitmap, and use it to detect in-memory bitmap corruption which may be caused by bit-transition of cache or memory overflow. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:50 -08:00
Chao Yu	599a09b2c1	f2fs: check in-memory nat version bitmap This patch adds a mirror for nat version bitmap, and use it to detect in-memory bitmap corruption which may be caused by bit-transition of cache or memory overflow. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:49 -08:00
Chao Yu	355e78913c	f2fs: check in-memory block bitmap This patch adds a mirror for valid block bitmap, and use it to detect in-memory bitmap corruption which may be caused by bit-transition of cache or memory overflow. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:49 -08:00
Chao Yu	5fe457430e	f2fs: introduce FI_ATOMIC_COMMIT This patch introduces a new flag to indicate inode status of doing atomic write committing, so that, we can keep atomic write status for inode during atomic committing, then we can skip GCing pages of atomic write inode, that avoids random GCed datas being mixed with current transaction, so isolation of transaction can be kept. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:48 -08:00
Chao Yu	939afa943c	f2fs: clean up with list_{first, last}_entry Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:47 -08:00
Jaegeuk Kim	25290fa559	f2fs: return fs_trim if there is no candidate If there is no candidate to submit discard command during f2fs_trim_fs, let's return without checkpoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 18:48:40 -08:00
Jaegeuk Kim	0333ad4e4f	f2fs: avoid needless checkpoint in f2fs_trim_fs The f2fs_trim_fs() doesn't need to do checkpoint if there are newly allocated data blocks only which didn't change the critical checkpoint data such as nat and sit entries. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-02-22 13:16:36 -08:00
Linus Torvalds	6c24337f22	Various cleanups for the file system encryption feature. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlirP6wACgkQ8vlZVpUN gaMwpQgApR67CxzlstxYjZpWPAqC8McJ2FBDX+mCOle5Vkc1WQDklwr0oCfQThTj eDSFRhNfIvyPh0DJ589PxBCsWOqN5h6Si7hD5ZinomVNI+IL89OytaU5EV2OpWaW iKdJgO9Tm8U7LuY6FOIoVdX57kUXVdkWoj61rC056B1SNhnNiVeofi7lYDM8Ix4q IGSQ9W24iQKmCk4hCwgObhJBRK9RnlOH0GLUmpMaS+jnfnj/uePwdxWEFsPuCOob 8acAJ49lr55kjIw79E0BAyWxhEZ2aiArHk8PaWynT/DyNq3ftcapPlpftoeba8vo glBJRX70QxPvt0iHEp0ykfExkhWhFA== =Joki -----END PGP SIGNATURE----- Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Various cleanups for the file system encryption feature" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: constify struct fscrypt_operations fscrypt: properly declare on-stack completion fscrypt: split supp and notsupp declarations into their own headers fscrypt: remove redundant assignment of res fscrypt: make fscrypt_operations.key_prefix a string fscrypt: remove unused 'mode' member of fscrypt_ctx ext4: don't allow encrypted operations without keys fscrypt: make test_dummy_encryption require a keyring key fscrypt: factor out bio specific functions fscrypt: pass up error codes from ->get_context() fscrypt: remove user-triggerable warning messages fscrypt: use EEXIST when file already uses different policy fscrypt: use ENOTDIR when setting encryption policy on nondirectory fscrypt: use ENOKEY when file cannot be created w/o key	2017-02-20 18:22:31 -08:00
Eric Biggers	6f69f0ed61	fscrypt: constify struct fscrypt_operations Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Richard Weinberger <richard@nod.at>	2017-02-08 10:59:57 -05:00
Eric Biggers	46f47e4800	fscrypt: split supp and notsupp declarations into their own headers Previously, each filesystem configured without encryption support would define all the public fscrypt functions to their notsupp_* stubs. This list of #defines had to be updated in every filesystem whenever a change was made to the public fscrypt functions. To make things more maintainable now that we have three filesystems using fscrypt, split the old header fscrypto.h into several new headers. fscrypt_supp.h contains the real declarations and is included by filesystems when configured with encryption support, whereas fscrypt_notsupp.h contains the inline stubs and is included by filesystems when configured without encryption support. fscrypt_common.h contains common declarations needed by both. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-02-06 23:26:43 -05:00
Jaegeuk Kim	4e6a8d9b22	f2fs: relax async discard commands more This patch relaxes async discard commands to avoid waiting its end_io during checkpoint. Instead of waiting them during checkpoint, it will be done when actually reusing them. Test on initial partition of nvme drive. # time fstrim /mnt/test Before : 6.158s After : 4.822s Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	bb95d9ab2a	f2fs: drop exist_data for inline_data when truncated to 0 A test program gets the SEEK_DATA with two values between a new created file and the exist file on f2fs filesystem. F2FS filesystem, (the first "test1" is a new file) SEEK_DATA size != 0 (offset = 8192) SEEK_DATA size != 0 (offset = 4096) PNFS filesystem, (the first "test1" is a new file) SEEK_DATA size != 0 (offset = 4096) SEEK_DATA size != 0 (offset = 4096) int main(int argc, char *argv) { char filename = argv[1]; int offset = 1, i = 0, fd = -1; if (argc < 2) { printf("Usage: %s f2fsfilename\n", argv[0]); return -1; } /* if (!access(filename, F_OK) \|\| errno != ENOENT) { printf("Needs a new file for test, %m\n"); return -1; }/ fd = open(filename, O_RDWR \| O_CREAT, 0777); if (fd < 0) { printf("Create test file %s failed, %m\n", filename); return -1; } for (i = 0; i < 20; i++) { offset = 1 << i; ftruncate(fd, 0); lseek(fd, offset, SEEK_SET); write(fd, "test", 5); / Get the alloc size by seek data equal zero*/ if (lseek(fd, 0, SEEK_DATA)) { printf("SEEK_DATA size != 0 (offset = %d)\n", offset); break; } } close(fd); return 0; } Reported-and-Tested-by: Kinglong Mee <kinglongmee@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	363fa4e078	f2fs: don't allow encrypted operations without keys This patch fixes the renaming bug on encrypted filenames, which was pointed by (ext4: don't allow encrypted operations without keys) Cc: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	26a28a0c1e	f2fs: show the max number of atomic operations This patch adds to show the max number of atomic operations which are conducting concurrently. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	ec91538dcc	f2fs: get io size bit from mount option This patch adds to set io_size_bits from mount option. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	0a595ebaaa	f2fs: support IO alignment for DATA and NODE writes This patch implements IO alignment by filling dummy blocks in DATA and NODE write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs, we can eliminate underlying dummy page problem which FTL conducts in order to close MLC or TLC partial written pages. Note that, - it requires "-o mode=lfs". - IO size should be power of 2, not exceed BIO_MAX_PAGES, 256. - read IO is still 4KB. - do checkpoint at fsync, if dummy NODE page was written. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	554b5125f5	f2fs: add submit_bio tracepoint This patch adds final submit_bio() tracepoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Jaegeuk Kim	9d52a504db	f2fs: reassign new segment for mode=lfs Otherwise we can remain wrong curseg->next_blkoff, resulting in fsck failure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Yunlei He	650d3c4e56	f2fs: fix a missing discard prefree segments If userspace issue a fstrim with a range not involve prefree segments, it will reuse these segments without discard. This patch fix it. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Geliang Tang	ed0b56209f	f2fs: use rb_entry_safe Use rb_entry_safe() instead of open-coding it. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Yunlei He	746e240392	f2fs: add a case of no need to read a page in write begin If the range we write cover the whole valid data in the last page, we do not need to read it. Signed-off-by: Yunlei He <heyunlei@huawei.com> [Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Yunlei He	7855eba4d6	f2fs: fix a problem of using memory after free This patch fix a problem of using memory after free in function __try_merge_extent_node. Fixes: `0f825ee6e8` ("f2fs: add new interfaces for extent tree") Cc: <stable@vger.kernel.org> Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:01 +09:00
Dan Carpenter	07fe8d4440	f2fs: remove unneeded condition We checked that "inode" is not an error pointer earlier so there is no need to check again here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:00 +09:00
Chao Yu	5c9e418436	f2fs: don't cache nat entry if out of memory If we run out of memory, in cache_nat_entry, it's better to avoid loop for allocating memory to cache nat entry, so in low memory scenario, for read path of node block, I expect this can avoid unneeded latency. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:00 +09:00
Yunlei He	fed2466848	f2fs: remove unused values in recover_fsync_data This patch remove unused values in function recover_fsync_data Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2017-01-29 12:46:00 +09:00
Damien Le Moal	f99e86485c	block: Rename blk_queue_zone_size and bdev_zone_size All block device data fields and functions returning a number of 512B sectors are by convention named xxx_sectors while names in the form xxx_size are generally used for a number of bytes. The blk_queue_zone_size and bdev_zone_size functions were not following this convention so rename them. No functional change is introduced by this patch. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Collapsed the two patches, they were nonsensically split and broke bisection. Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-12 07:58:32 -07:00
Eric Biggers	a5d431eff2	fscrypt: make fscrypt_operations.key_prefix a string There was an unnecessary amount of complexity around requesting the filesystem-specific key prefix. It was unclear why; perhaps it was envisioned that different instances of the same filesystem type could use different key prefixes, or that key prefixes could be binary. However, neither of those things were implemented or really make sense at all. So simplify the code by making key_prefix a const char *. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2017-01-08 01:03:41 -05:00
Eric Biggers	54475f531b	fscrypt: use ENOKEY when file cannot be created w/o key As part of an effort to clean up fscrypt-related error codes, make attempting to create a file in an encrypted directory that hasn't been "unlocked" fail with ENOKEY. Previously, several error codes were used for this case, including ENOENT, EACCES, and EPERM, and they were not consistent between and within filesystems. ENOKEY is a better choice because it expresses that the failure is due to lacking the encryption key. It also matches the error code returned when trying to open an encrypted regular file without the key. I am not aware of any users who might be relying on the previous inconsistent error codes, which were never documented anywhere. This failure case will be exercised by an xfstest. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-31 16:26:20 -05:00
Linus Torvalds	231753ef78	Merge uncontroversial parts of branch 'readlink' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull partial readlink cleanups from Miklos Szeredi. This is the uncontroversial part of the readlink cleanup patch-set that simplifies the default readlink handling. Miklos and Al are still discussing the rest of the series. * git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: vfs: make generic_readlink() static vfs: remove ".readlink = generic_readlink" assignments vfs: default to generic_readlink() vfs: replace calling i_op->readlink with vfs_readlink() proc/self: use generic_readlink ecryptfs: use vfs_get_link() bad_inode: add missing i_op initializers	2016-12-17 19:16:12 -08:00
Linus Torvalds	5084fdf081	This merge request includes the dax-4.0-iomap-pmd branch which is needed for both ext4 and xfs dax changes to use iomap for DAX. It also includes the fscrypt branch which is needed for ubifs encryption work as well as ext4 encryption and fscrypt cleanups. Lots of cleanups and bug fixes, especially making sure ext4 is robust against maliciously corrupted file systems --- especially maliciously corrupted xattr blocks and a maliciously corrupted superblock. Also fix ext4 support for 64k block sizes so it works well on ppcle. Fixed mbcache so we don't miss some common xattr blocks that can be merged. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAlhQQVEACgkQ8vlZVpUN gaN9TQgAoCD+V4kJjMCFhiV8u6QR3hqD6bOZbggo5wJf4CHglWkmrbAmc3jANOgH CKsXDRRjxuDjPXf1ukB1i4M7ArLYjkbbzKdsu7lismoJLS+w8uwUKSNdep+LYMjD alxUcf5DCzLlUmdOdW4yE22L+CwRfqfs8IpBvKmJb7DrAKiwJVA340ys6daBGuu1 63xYx0QIyPzq0xjqLb6TVf88HUI4NiGVXmlm2wcrnYd5966hEZd/SztOZTVCVWOf Z0Z0fGQ1WJzmaBB9+YV3aBi+BObOx4m2PUprIa531+iEW02E+ot5Xd4vVQFoV/r4 NX3XtoBrT1XlKagy2sJLMBoCavqrKw== =j4KP -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "This merge request includes the dax-4.0-iomap-pmd branch which is needed for both ext4 and xfs dax changes to use iomap for DAX. It also includes the fscrypt branch which is needed for ubifs encryption work as well as ext4 encryption and fscrypt cleanups. Lots of cleanups and bug fixes, especially making sure ext4 is robust against maliciously corrupted file systems --- especially maliciously corrupted xattr blocks and a maliciously corrupted superblock. Also fix ext4 support for 64k block sizes so it works well on ppcle. Fixed mbcache so we don't miss some common xattr blocks that can be merged" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (86 commits) dax: Fix sleep in atomic contex in grab_mapping_entry() fscrypt: Rename FS_WRITE_PATH_FL to FS_CTX_HAS_BOUNCE_BUFFER_FL fscrypt: Delay bounce page pool allocation until needed fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page() fscrypt: Cleanup fscrypt_{decrypt,encrypt}_page() fscrypt: Never allocate fscrypt_ctx on in-place encryption fscrypt: Use correct index in decrypt path. fscrypt: move the policy flags and encryption mode definitions to uapi header fscrypt: move non-public structures and constants to fscrypt_private.h fscrypt: unexport fscrypt_initialize() fscrypt: rename get_crypt_info() to fscrypt_get_crypt_info() fscrypto: move ioctl processing more fully into common code fscrypto: remove unneeded Kconfig dependencies MAINTAINERS: fscrypto: recommend linux-fsdevel for fscrypto patches ext4: do not perform data journaling when data is encrypted ext4: return -ENOMEM instead of success ext4: reject inodes with negative size ext4: remove another test in ext4_alloc_file_blocks() Documentation: fix description of ext4's block_validity mount option ext4: fix checks for data=ordered and journal_async_commit options ...	2016-12-14 09:17:42 -08:00
Linus Torvalds	09cb6464fe	for-f2fs-4.10 This patch series contains several performance tuning patches regarding to the IO submission flow, in addition to supporting new features such as a ZBC-base drive and multiple devices. It also includes some major bug fixes such as: - checkpoint version control - fdatasync-related roll-forward recovery routine - memory boundary or null-pointer access in corner cases - missing error cases It has various minor clean-up patches as well. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJYTx44AAoJEEAUqH6CSFDSnAQP/jeYJq5Zd0bweEF5g00Ec1Qg qNKQ57e9EHDRaDLBUmHHEaCEPRL0bw6SOUUWWqzGA07KcsIK+Yb/dGAyIcuV7WMl PjntVbYm4yARDYBHGupdOCzFSkzr8gDalb+98jJnoGUonsftljhES9jedQ1NjAms GFPHDNtirZM/r0bjKkYKjpqJ6FCxFxcGPfb/GtohDajIpohWfKZiemaXGTgtYR4d iBVek16h+Hprz90ycZBY69uz0TdAwu/gb+htMVBrAdExHWvlFzgp35OIywiAB/YX 3QD/x4t2HqOBaNYiiOAY4ukVW/Yyqa/ZAzbm+m5B5CAcFYiWXMy+cMXUY9HJJ/K0 wdvi//Avtvgpp2PVZFn2pASx14vgMFylBzuNgKpP6MPdtWTEL33jT7VYs9Nuz45E dgZ9IpiDt4DeTRuZ4mPO5iH7bVHPvAVV80bpXzirCCzDeNZ1EFFIQzXh/2UAmCxI twPXGBIYul0aIl9JkWAyhCZSd3XDSqedpfPudknjhzM9Xb1H5X0QJco7f/UwsWXH WxV6lHr1Q7UH96wJ7x/GAqj8ArOAASRV18+K51dqU+DWHnFPpBArJe39FVf8NGWs Fz1ZmlWBQ0ZgzvLkGa80llhjalXIEy/JabMrpy6VrzQGxHdmW4cVxe4dJ3710WxX VysJUcNMRKxMUTWOKsxp =Boum -----END PGP SIGNATURE----- Merge tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch series contains several performance tuning patches regarding to the IO submission flow, in addition to supporting new features such as a ZBC-base drive and multiple devices. It also includes some major bug fixes such as: - checkpoint version control - fdatasync-related roll-forward recovery routine - memory boundary or null-pointer access in corner cases - missing error cases It has various minor clean-up patches as well" * tag 'for-f2fs-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (66 commits) f2fs: fix a missing size change in f2fs_setattr f2fs: fix to access nullified flush_cmd_control pointer f2fs: free meta pages if sanity check for ckpt is failed f2fs: detect wrong layout f2fs: call sync_fs when f2fs is idle Revert "f2fs: use percpu_counter for # of dirty pages in inode" f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage f2fs: do not activate auto_recovery for fallocated i_size f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack f2fs: fix 32-bit build f2fs: set ->owner for debugfs status file's file_operations f2fs: fix incorrect free inode count in ->statfs f2fs: drop duplicate header timer.h f2fs: fix wrong AUTO_RECOVER condition f2fs: do not recover i_size if it's valid f2fs: fix fdatasync f2fs: fix to account total free nid correctly f2fs: fix an infinite loop when flush nodes in cp f2fs: don't wait writeback for datas during checkpoint f2fs: fix wrong written_valid_blocks counting ...	2016-12-14 09:07:36 -08:00
Linus Torvalds	36869cb93d	Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...	2016-12-13 10:19:16 -08:00
Yunlei He	c0ed4405a9	f2fs: fix a missing size change in f2fs_setattr This patch fix a missing size change in f2fs_setattr Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-12-12 11:09:05 -08:00
David Gstir	bd7b829038	fscrypt: Cleanup page locking requirements for fscrypt_{decrypt,encrypt}_page() Rename the FS_CFLG_INPLACE_ENCRYPTION flag to FS_CFLG_OWN_PAGES which, when set, indicates that the fs uses pages under its own control as opposed to writeback pages which require locking and a bounce buffer for encryption. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-11 16:26:12 -05:00
Eric Biggers	db717d8e26	fscrypto: move ioctl processing more fully into common code Multiple bugs were recently fixed in the "set encryption policy" ioctl. To make it clear that fscrypt_process_policy() and fscrypt_get_policy() implement ioctls and therefore their implementations must take standard security and correctness precautions, rename them to fscrypt_ioctl_set_policy() and fscrypt_ioctl_get_policy(). Make the latter take in a struct file * to make it consistent with the former. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-12-11 16:26:07 -05:00
Miklos Szeredi	dfeef68862	vfs: remove ".readlink = generic_readlink" assignments If .readlink == NULL implies generic_readlink(). Generated by: to_del="\.readlink.=.generic_readlink" for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; done Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-12-09 16:45:04 +01:00
Jaegeuk Kim	5eba8c5d1f	f2fs: fix to access nullified flush_cmd_control pointer f2fs_sync_file() remount_ro - f2fs_readonly - destroy_flush_cmd_control - f2fs_issue_flush - no fcc pointer! So, this patch doesn't free fcc in this case, but just stop its kernel thread which sends flush commands. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-12-07 18:56:50 -08:00
Jaegeuk Kim	a2125ff7dd	f2fs: free meta pages if sanity check for ckpt is failed This fixes missing freeing meta pages in the error case. Tested-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-12-07 14:38:16 -08:00
Jaegeuk Kim	2040fce83f	f2fs: detect wrong layout Previous mkfs.f2fs allows small partition inappropriately, so f2fs should detect that as well. Refer this in f2fs-tools. mkfs.f2fs: detect small partition by overprovision ratio and # of segments Reported-and-Tested-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-12-07 14:37:33 -08:00
Jaegeuk Kim	f455c8a5f0	f2fs: call sync_fs when f2fs is idle The sync_fs in f2fs_balance_fs_bg must avoid interrupting current user requests. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-12-05 11:44:07 -08:00
Jaegeuk Kim	204706c7ac	Revert "f2fs: use percpu_counter for # of dirty pages in inode" This reverts commit `1beba1b3a9`. The perpcu_counter doesn't provide atomicity in single core and consume more DRAM. That incurs fs_mark test failure due to ENOMEM. Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-12-05 11:43:59 -08:00
Chao Yu	0002b61bda	f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-29 15:43:00 -08:00
Jaegeuk Kim	26787236b3	f2fs: do not activate auto_recovery for fallocated i_size If a file needs to keep its i_size by fallocate, we need to turn off auto recovery during roll-forward recovery. This will resolve the below scenario. 1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync" 2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync" 3. md5sum /mnt/f2fs/file; 4. godown /mnt/f2fs/ 5. umount /mnt/f2fs/ 6. mount -t f2fs /dev/sdx /mnt/f2fs 7. md5sum /mnt/f2fs/file Reported-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-29 15:42:58 -08:00
Jaegeuk Kim	8508e44ae9	f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack We don't guarantee cp_addr is fixed by cp_version. This is to sync with f2fs-tools. Cc: stable@vger.kernel.org Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-28 13:39:58 -08:00
Arnd Bergmann	19c526515f	f2fs: fix 32-bit build The addition of multiple-device support broke CONFIG_BLK_DEV_ZONED on 32-bit machines because of a 64-bit division: fs/f2fs/f2fs.o: In function `__issue_discard_async': extent_cache.c:(.text.__issue_discard_async+0xd4): undefined reference to `__aeabi_uldivmod' Fortunately, bdev_zone_size() is guaranteed to return a power-of-two number, so we can replace the % operator with a cheaper bit mask. Fixes: 792b84b74b54 ("f2fs: support multiple devices") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:09 -08:00
Nicolai Stange	05e6ea2685	f2fs: set ->owner for debugfs status file's file_operations The struct file_operations instance serving the f2fs/status debugfs file lacks an initialization of its ->owner. This means that although that file might have been opened, the f2fs module can still get removed. Any further operation on that opened file, releasing included, will cause accesses to unmapped memory. Indeed, Mike Marshall reported the following: BUG: unable to handle kernel paging request at ffffffffa0307430 IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90 <...> Call Trace: [] __fput+0xdf/0x1d0 [] ____fput+0xe/0x10 [] task_work_run+0x8e/0xc0 [] do_exit+0x2ae/0xae0 [] ? __audit_syscall_entry+0xae/0x100 [] ? syscall_trace_enter+0x1ca/0x310 [] do_group_exit+0x44/0xc0 [] SyS_exit_group+0x14/0x20 [] do_syscall_64+0x61/0x150 [] entry_SYSCALL64_slow_path+0x25/0x25 <...> ---[ end trace f22ae883fa3ea6b8 ]--- Fixing recursive fault but reboot is needed! Fix this by initializing the f2fs/status file_operations' ->owner with THIS_MODULE. This will allow debugfs to grab a reference to the f2fs module upon any open on that file, thus preventing it from getting removed. Fixes: `902829aa0b` ("f2fs: move proc files to debugfs") Reported-by: Mike Marshall <hubcap@omnibond.com> Reported-by: Martin Brandenburg <martin@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:08 -08:00
Chao Yu	b08b12d2dd	f2fs: fix incorrect free inode count in ->statfs While calculating inode count that we can create at most in the left space, we should consider space which data/node blocks occupied, since we create data/node mixly in main area. So fix the wrong calculation in ->statfs. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:07 -08:00
Geliang Tang	b4ceec2921	f2fs: drop duplicate header timer.h Drop duplicate header timer.h from segment.c. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:06 -08:00
Jaegeuk Kim	97dd26ad83	f2fs: fix wrong AUTO_RECOVER condition If i_size is not aligned to the f2fs's block size, we should not skip inode update during fsync. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:05 -08:00
Jaegeuk Kim	3a3a5ead7b	f2fs: do not recover i_size if it's valid If i_size is already valid during roll_forward recovery, we should not update it according to the block alignment. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:04 -08:00
Chao Yu	281518c694	f2fs: fix fdatasync For below two cases, we can't guarantee data consistence: a) 1. xfs_io "pwrite 0 4195328" "fsync" 2. xfs_io "pwrite 4195328 1024" "fdatasync" 3. godown 4. umount & mount --> isize we updated before fdatasync won't be recovered b) 1. xfs_io "pwrite -S 0xcc 0 4202496" "fsync" 2. xfs_io "fpunch 4194304 4096" "fdatasync" 3. godown 4. umount & mount --> dnode we punched before fdatasync won't be recovered The reason is that normally fdatasync won't be aware of modification of metadata in file, e.g. isize changing, dnode updating, so in ->fsync we will skip flushing node pages for above cases, result in making fdatasynced file being lost during recovery. Currently we have introduced DIRTY_META global list in sbi for tracking dirty inode selectively, so in fdatasync we can choose to flush nodes depend on dirty state of current inode in the list. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:03 -08:00
Chao Yu	04d47e6738	f2fs: fix to account total free nid correctly Thread A Thread B Thread C - f2fs_create - f2fs_new_inode - f2fs_lock_op - alloc_nid alloc last nid - f2fs_unlock_op - f2fs_create - f2fs_new_inode - f2fs_lock_op - alloc_nid as node count still not be increased, we will loop in alloc_nid - f2fs_write_node_pages - f2fs_balance_fs_bg - f2fs_sync_fs - write_checkpoint - block_operations - f2fs_lock_all - f2fs_lock_op While creating new inode, we do not allocate and account nid atomically, so that when there is almost no free nids left, we may encounter deadloop like above stack. In order to avoid that, reuse nm_i::available_nids for accounting free nids and make nid allocation and counting being atomical during node creation. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:01 -08:00
Yunlei He	d40a43af0a	f2fs: fix an infinite loop when flush nodes in cp Thread A Thread B - write_checkpoint - block_operations -blk_start_plug -sync_node_pages - f2fs_do_sync_file - fsync_node_pages - f2fs_wait_on_page_writeback Thread A wait for global F2FS_DIRTY_NODES decreased to zero, it start a plug list, some requests have been added to this list. Thread B lock one dirty node page, and wait this page write back. But this page has been in plug list of thread A with PG_writeback flag. Thread A keep on running and its plug list has no chance to finish, so it seems a deadlock between cp and fsync path. This patch add a wait on page write back before set node page dirty to avoid this problem. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Pengyang Hou <houpengyang@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:16:00 -08:00
Chao Yu	36951b38d1	f2fs: don't wait writeback for datas during checkpoint Normally, while committing checkpoint, we will wait on all pages to be writebacked no matter the page is data or metadata, so in scenario where there are lots of data IO being submitted with metadata, we may suffer long latency for waiting writeback during checkpoint. Indeed, we only care about persistence for pages with metadata, but not pages with data, as file system consistent are only related to metadate, so in order to avoid encountering long latency in above scenario, let's recognize and reference metadata in submitted IOs, wait writeback only for metadatas. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:59 -08:00
Jaegeuk Kim	c79b7ff1d3	f2fs: fix wrong written_valid_blocks counting Previously, written_valid_blocks was got by ckpt->valid_block_count. But if the last checkpoint has some NEW_ADDR due to power-cut, we can get wrong value. Fix it to get the number from actual written block count from sit entries. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:58 -08:00
Jaegeuk Kim	7702bdbe50	f2fs: avoid BG_GC in f2fs_balance_fs If many threads hit has_not_enough_free_secs() in f2fs_balance_fs() at the same time, all the threads would do FG_GC or BG_GC. In this critical path, we totally don't need to do BG_GC at all. Let's avoid that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:57 -08:00
Jaegeuk Kim	c040ff9d69	f2fs: fix redundant block allocation In direct_IO path of f2fs_file_write_iter(), 1. f2fs_preallocate_blocks(F2FS_GET_BLOCK_PRE_DIO) -> allocate LBA X 2. f2fs_direct_IO() -> return 0; Then, f2fs_write_data_page() will allocate another LBA X+1. This makes EIO triggered by HM-SMR. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:50 -08:00
Jaegeuk Kim	a7de608691	f2fs: use err for f2fs_preallocate_blocks This patch has no functional change. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:14 -08:00
Jaegeuk Kim	3c62be17d4	f2fs: support multiple devices This patch implements multiple devices support for f2fs. Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big volume under one f2fs instance. Internal block management is very simple, but we will modify block allocation and background GC policy to boost IO speed by exploiting them accoording to each device speed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:13 -08:00
Jaegeuk Kim	e57e9ae5b1	f2fs: allow dio read for LFS mode We can allow dio reads for LFS mode, while doing buffered writes for dio writes. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:12 -08:00
Jaegeuk Kim	6ae1be13e8	f2fs: revert segment allocation for direct IO Now we don't need to be too much careful about storage alignment for dio, since its speed becomes quite fast and we'd better avoid any misalignment first. Revert: `38aa0889b2` (f2fs: align direct_io'ed data to section) Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-25 10:15:02 -08:00
Yunlei He	2061471128	f2fs: return directly if block has been removed from the victim If one block has been to written to a new place, just return in move data process. This patch check it again with holding page lock. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:31 -08:00
Chao Yu	d47b871595	Revert "f2fs: do not recover from previous remained wrong dnodes" i_times of inode will be set with current system time which can be configured through 'date', so it's not safe to judge dnode block as garbage data or unchanged inode depend on i_times. Now, we have used enhanced 'cp_ver + cp' crc method to verify valid dnode block, so I expect recoverying invalid dnode is almost not possible. This reverts commit `807b1e1c8e`. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:30 -08:00
Jaegeuk Kim	b4b9d34c85	f2fs: remove checkpoint in f2fs_freeze The generic freeze_super() calls sync_filesystems() before f2fs_freeze(). So, basically we don't need to do checkpoint in f2fs_freeze(). But, in xfs/068, it triggers circular locking problem below due to gc_mutex for checkpoint. ====================================================== [ INFO: possible circular locking dependency detected ] 4.9.0-rc1+ #132 Tainted: G OE ------------------------------------------------------- 1. wait for __sb_start_write() by [<ffffffff9845f353>] dump_stack+0x85/0xc2 [<ffffffff980e80bf>] print_circular_bug+0x1cf/0x230 [<ffffffff980eb4d0>] __lock_acquire+0x19e0/0x1bc0 [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs] [<ffffffff9826bdd0>] __sb_start_write+0x130/0x200 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs] [<ffffffffc08c7c3b>] f2fs_drop_inode+0x9b/0x160 [f2fs] [<ffffffff98289991>] iput+0x171/0x2c0 [<ffffffffc08cfccf>] f2fs_sync_inode_meta+0x3f/0xf0 [f2fs] [<ffffffffc08cfe04>] block_operations+0x84/0x110 [f2fs] [<ffffffffc08cff78>] write_checkpoint+0xe8/0xf20 [f2fs] [<ffffffff980e979d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffff9803e9d9>] ? sched_clock+0x9/0x10 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc08c6df5>] f2fs_sync_fs+0x85/0x190 [f2fs] [<ffffffff982a4f90>] ? do_fsync+0x70/0x70 [<ffffffff982a4f90>] ? do_fsync+0x70/0x70 [<ffffffff982a4fb0>] sync_fs_one_sb+0x20/0x30 [<ffffffff9826ca3e>] iterate_supers+0xae/0x100 [<ffffffff982a50b5>] sys_sync+0x55/0x90 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6 2. wait for sbi->gc_mutex by [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220 [<ffffffff989063d6>] mutex_lock_nested+0x76/0x3f0 [<ffffffffc08c6de9>] f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc08c7a6c>] f2fs_freeze+0x1c/0x20 [f2fs] [<ffffffff9826b6ef>] freeze_super+0xcf/0x190 [<ffffffff9827eebc>] do_vfs_ioctl+0x53c/0x6a0 [<ffffffff9827f099>] SyS_ioctl+0x79/0x90 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:29 -08:00
Jaegeuk Kim	bdb7d964c4	f2fs: assign segments correctly for direct_io Previously, we assigned CURSEG_WARM_DATA for direct_io, but if we have two or four logs, we do not use that type at all. Let's fix it. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:28 -08:00
Chao Yu	9f0552e078	f2fs: fix wrong i_atime recovery Shouldn't update in-memory i_atime with on-disk i_mtime of inode when recovering inode. Shuoran found this bug which is hidden for a long time, honour is belong to him. Signed-off-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:26 -08:00
Chao Yu	60dcedc997	f2fs: record inode updating status correctly We should record updating status of inode only for living inode, for those unlinked inode it needs to clear its ino cache, otherwise after the ino was been reused, it will cause unneeded node page writing during ->fsync. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:25 -08:00
Damien Le Moal	126606c7a9	f2fs: Trace reset zone events Similarly to the regular discard, trace zone reset events. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:24 -08:00
Damien Le Moal	f46e8809e8	f2fs: Reset sequential zones on zoned block devices When a zoned block device is mounted, discarding sections contained in sequential zones must reset the zone write pointer. For sections contained in conventional zones, the regular discard is used if the drive supports it. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:23 -08:00
Damien Le Moal	178053e2f1	f2fs: Cache zoned block devices zone type With the zoned block device feature enabled, section discard need to do a zone reset for sections contained in sequential zones, and a regular discard (if supported) for sections stored in conventional zones. Avoid the need for a costly report zones to obtain a section zone type when discarding it by caching the types of the device zones in the super block information. This cache is initialized at mount time for mounts with the zoned block device feature enabled. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:22 -08:00
Damien Le Moal	3adc57e977	f2fs: Do not allow adaptive mode for host-managed zoned block devices The LFS mode is mandatory for host-managed zoned block devices as update in place optimizations are not possible for segments in sequential zones. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:20 -08:00
Damien Le Moal	96ba2decb4	f2fs: Always enable discard for zoned blocks devices Zone write pointer reset acts as discard for zoned block devices. So if the zoned block device feature is enabled, always declare that discard is enabled, even if the device does not actually support the command. For the same reason, prevent the use the "nodicard" mount option. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:19 -08:00
Damien Le Moal	0ab0299835	f2fs: Suppress discard warning message for zoned block devices For zoned block devices, discard is replaced by zone reset. So do not warn if the device does not supports discard. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:18 -08:00
Damien Le Moal	d1b959c877	f2fs: Check zoned block feature for host-managed zoned block devices The F2FS_FEATURE_BLKZONED feature indicates that the drive was formatted with zone alignment optimization. This is optional for host-aware devices, but mandatory for host-managed zoned block devices. So check that the feature is set in this latter case. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:17 -08:00
Damien Le Moal	0bfd7a091c	f2fs: Use generic zoned block device terminology SMR stands for "Shingled Magnetic Recording" which makes sense only for hard disk drives (spinning rust). The ZBC/ZAC standards enable management of SMR disks, but solid state drives may also support those standards. So rename the HMSMR feature to BLKZONED to avoid a HDD centric terminology. For the same reason, rename f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:16 -08:00
Damien Le Moal	487df616de	f2fs: Add missing break in switch-case Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:14 -08:00
Jaegeuk Kim	099228000e	f2fs: avoid infinite loop in the EIO case on recover_orphan_inodes This patch should fix an infinite loop case below. F2FS-fs : inject IO error in f2fs_read_end_io+0xf3/0x120 [f2fs] F2FS-fs (nvme0n1p1): recover_orphan_inode: orphan failed (ino=39ac1a), run fsck to fix. ... [<ffffffffc0b11ede>] sync_meta_pages+0xae/0x270 [f2fs] [<ffffffffc0b288dd>] ? flush_sit_entries+0x8d/0x960 [f2fs] [<ffffffffc0b13801>] write_checkpoint+0x361/0xf20 [f2fs] [<ffffffffb40e979d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffffc0b0a199>] ? f2fs_sync_fs+0x79/0x190 [f2fs] [<ffffffffc0b0a1a5>] f2fs_sync_fs+0x85/0x190 [f2fs] [<ffffffffc0b2560e>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs] [<ffffffffc0b216c4>] f2fs_write_node_pages+0x34/0x320 [f2fs] [<ffffffffb41dff21>] do_writepages+0x21/0x30 [<ffffffffb429edb1>] __writeback_single_inode+0x61/0x760 [<ffffffffb490a937>] ? _raw_spin_unlock+0x27/0x40 [<ffffffffb42a0805>] writeback_single_inode+0xd5/0x190 [<ffffffffb42a0959>] write_inode_now+0x99/0xc0 [<ffffffffb4289a16>] iput+0x1f6/0x2c0 [<ffffffffc0b0e3be>] f2fs_fill_super+0xe0e/0x1300 [f2fs] [<ffffffffb426c394>] ? sget_userns+0x4f4/0x530 [<ffffffffb426c692>] mount_bdev+0x182/0x1b0 [<ffffffffc0b0d5b0>] ? f2fs_commit_super+0x100/0x100 [f2fs] [<ffffffffc0b0a375>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffffb426d038>] mount_fs+0x38/0x170 [<ffffffffb428ec9b>] vfs_kern_mount+0x6b/0x160 [<ffffffffb4291d9e>] do_mount+0x1be/0xd60 [<ffffffffb4291a57>] ? copy_mount_options+0xb7/0x220 [<ffffffffb4292c54>] SyS_mount+0x94/0xd0 [<ffffffffb490b345>] entry_SYSCALL_64_fastpath+0x23/0xc6 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:13 -08:00
Chao Yu	ed6bd4b146	f2fs: report error of f2fs_fill_dentries Report error of f2fs_fill_dentries to ->iterate_shared, otherwise when error ocurrs, user may just list part of dirents in target directory without any hints. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:12 -08:00
Arnd Bergmann	230436b3ef	f2fs: hide a maybe-uninitialized warning gcc is unsure about the use of last_ofs_in_node, which might happen without a prior initialization: fs/f2fs//git/arm-soc/fs/f2fs/data.c: In function ‘f2fs_map_blocks’: fs/f2fs/data.c:799:54: warning: ‘last_ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) { As pointed out by Chao Yu, the code is actually correct as 'prealloc' is only set if the last_ofs_in_node has been set, the two always get updated together. This initializes last_ofs_in_node to dn.ofs_in_node for each new dnode at the start of the 'next_block' loop, which at that point is a correct initialization as well. I assume that compilers that correctly track the contents of the variables and do not warn about the condition also figure out that they can eliminate the extra assignment here. Fixes: `46008c6d42` ("f2fs: support in batch multi blocks preallocation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:11 -08:00
Jaegeuk Kim	35782b233f	f2fs: remove percpu_count due to performance regression This patch removes percpu_count usage due to performance regression in iozone. Fixes: `523be8a6b3` ("f2fs: use percpu_counter for page counters") Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:10 -08:00
Jaegeuk Kim	18340edc8d	f2fs: make clean inodes when flushing inode page This patch tries to make more clean inodes when flushing dirty inodes in checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:09 -08:00
Jaegeuk Kim	7c45729a4d	f2fs: keep dirty inodes selectively for checkpoint This is to avoid no free segment bug during checkpoint caused by a number of dirty inodes. The case was reported by Chao like this. 1. mount with lazytime option 2. fill 4k file until disk is full 3. sync filesystem 4. read all files in the image 5. umount In this case, we actually don't need to flush dirty inode to inode page during checkpoint. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:08 -08:00
Jaegeuk Kim	664ba972df	f2fs: use BIO_MAX_PAGES for bio allocation We don't need to allocate bio partially in order to maximize sequential writes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:07 -08:00
Jaegeuk Kim	3e7b5bbbef	f2fs: declare static function for __build_free_nids This patch avoids build warning. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:06 -08:00
Jaegeuk Kim	15d0435455	f2fs: call f2fs_balance_fs for setattr If inode becomes dirty, we need to check the # of dirty inodes whether or not further checkpoint would be required. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:05 -08:00
Jaegeuk Kim	b9610bdfcb	f2fs: count dirty inodes to flush node pages during checkpoint If there are a lot of dirty inodes, we need to flush all of them when doing checkpoint. So, we need to count this for enough free space. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:04 -08:00
Chao Yu	02110a4fd5	f2fs: avoid casted negative value as shrink count This patch makes sure it returns a positive value instead of a probable casted negative value as shrink count. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:03 -08:00
Chao Yu	3a2ad5672b	f2fs: don't interrupt free nids building during nid allocation Let build_free_nids support sync/async methods, in allocation flow of nids, we use synchronuous method, so that we can avoid looping in alloc_nid when free memory is low; in unblock_operations and f2fs_balance_fs_bg we use asynchronuous method in where low memory condition can interrupt us. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:02 -08:00
Jaegeuk Kim	eb0aa4b807	f2fs: clean up free nid list operations This patch cleans up to use consistent free nid list ops. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:01 -08:00
Chao Yu	b8559dc242	f2fs: split free nid list During free nid allocation, in order to do preallocation, we will tag free nid entry as allocated one and still leave it in free nid list, for other allocators who want to grab free nids, it needs to traverse the free nid list for lookup. It becomes overhead in scenario of allocating free nid intensively by multithreads. This patch splits free nid list to two list: {free,alloc}_nid_list, to keep free nids and preallocated free nids separately, after that, traverse latency will be gone, besides split nid_cnt for separate statistic. Additionally, introduce __insert_nid_to_list and __remove_nid_from_list for cleanup. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: modify f2fs_bug_on to avoid needless branches] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:11:00 -08:00
Chao Yu	a11b9f65ea	f2fs: clear nlink if fail to add_link We don't need to keep incomplete created inode in cache, so if we fail to add link into directory during new inode creation, it's better to set nlink of inode to zero, then we can evict inode immediately. Otherwise release of nid belong to inode will be delayed until inode cache is being shrunk, it may cause a seemingly endless loop while allocating free nids in time of testing generic/269 case of fstest suit. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: add update_inode_page to fix kernel panic] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:59 -08:00
Eric Biggers	0c0b471e43	f2fs: fix sparse warnings f2fs contained a number of endianness conversion bugs. Also, one function should have been 'static'. Found with sparse by running 'make C=2 CF=-D__CHECK_ENDIAN__ fs/f2fs/' Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:57 -08:00
Chao Yu	9de6927975	f2fs: fix error handling in fsync_node_pages In fsync_node_pages, if f2fs was taged with CP_ERROR_FLAG, make sure bio cache was flushed before return. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:56 -08:00
Chao Yu	b691d98fdd	f2fs: fix to update largest extent under lock In order to avoid racing problem, make largest extent cache being updated under lock. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:55 -08:00
Chao Yu	58736fa60f	f2fs: be aware of extent beyond EOF in fiemap f2fs can support fallocating blocks beyond file size without changing the size, but ->fiemap of f2fs was restricted and can't detect these extents fallocated past EOF, now relieve the restriction. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:54 -08:00
Chao Yu	6f2d8ed654	f2fs: don't miss any f2fs_balance_fs cases In f2fs_map_blocks, let f2fs_balance_fs detects node page modification with dn.node_changed to avoid miss some corner cases. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:53 -08:00
Chao Yu	9434fcde1f	f2fs: add missing f2fs_balance_fs in f2fs_zero_range f2fs_balance_fs should be called in between node page updating, otherwise node page count will exceeded far beyond watermark of triggering foreground garbage collection, result in facing high risk of hitting LFS allocation failure. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:52 -08:00
Chao Yu	933439c8f3	f2fs: give a chance to detach from dirty list If there is no dirty pages in inode, we should give a chance to detach the inode from global dirty list, otherwise it needs to call another unnecessary .writepages for detaching. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:51 -08:00
Chao Yu	2dd15654ac	f2fs: fix to release discard entries during checkpoint In f2fs_fill_super, if there is any IO error occurs during recovery, cached discard entries will be leaked, in order to avoid this, make write_checkpoint() handle memory release by itself, besides, move clear_prefree_segments to write_checkpoint for readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:50 -08:00
Chao Yu	2411cf5bef	f2fs: exclude free nids building and allocation During nid allocation, it needs to exclude building and allocating flow of free nids, this is because while building free nid cache, there are two steps: a) load free nids from unused nat entries in NAT pages, b) update free nid cache by checking nat journal. The two steps should be atomical, otherwise an used nid can be allocated as free one after a) and before b). This patch adds missing lock which covers build_free_nids in unlock_operation and f2fs_balance_fs_bg to avoid that. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:49 -08:00
Jaegeuk Kim	e87f7329bb	f2fs: fix overflow due to condition check order In the last ilen case, i was already increased, resulting in accessing out- of-boundary entry of do_replace and blkaddr. Fix to check ilen first to exit the loop. Fixes: 2aa8fbb9693020 ("f2fs: refactor __exchange_data_block for speed up") Cc: stable@vger.kernel.org # 4.8+ Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-11-23 12:10:48 -08:00
David Gstir	9c4bb8a3a9	fscrypt: Let fs select encryption index/tweak Avoid re-use of page index as tweak for AES-XTS when multiple parts of same page are encrypted. This will happen on multiple (partial) calls of fscrypt_encrypt_page on same page. page->index is only valid for writeback pages. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-13 20:18:16 -05:00
David Gstir	7821d4dd45	fscrypt: Enable partial page encryption Not all filesystems work on full pages, thus we should allow them to hand partial pages to fscrypt for en/decryption. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-11-13 18:55:21 -05:00
Jens Axboe	7637241e65	writeback: add wbc_to_write_flags() Add wbc_to_write_flags(), which returns the write modifier flags to use, based on a struct writeback_control. No functional changes in this patch, but it prepares us for factoring other wbc fields for write type. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-11-02 10:24:03 -06:00
Christoph Hellwig	70fd76140a	block,fs: use REQ_* flags directly Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-01 09:43:26 -06:00
Christoph Hellwig	ef295ecf09	block: better op and flags encoding Now that we don't need the common flags to overflow outside the range of a 32-bit type we can encode them the same way for both the bio and request fields. This in addition allows us to place the operation first (and make some room for more ops while we're at it) and to stop having to shift around the operation values. In addition this allows passing around only one value in the block layer instead of two (and eventuall also in the file systems, but we can do that later) and thus clean up a lot of code. Last but not least this allows decreasing the size of the cmd_flags field in struct request to 32-bits. Various functions passing this value could also be updated, but I'd like to avoid the churn for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-10-28 08:48:16 -06:00
Linus Torvalds	1a1891d762	This includes fixing a bug which references a wrong pointer, sum_page, in f2fs_gc. It was newly introduced in 4.9-rc1. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJYBoL+AAoJEEAUqH6CSFDSQEoP/irufW4HDUszwKCxISLGksBN /i85zfbL9pY11Ci78qW4N5pegd2ouKk9WtUdrtXwT6y8Y+CWs9Gx5FaYbdA8aj7T nQBfRlfN/zKIJkRqqqXh/YRTMyQfUticur0dWZ00l7wpGAxA6Xhe+QDV1T7rxQxZ W8Ne9jcAD+SLu8G5Ci8zOTDcK+q6SWeXFNFtM1MPqr/S86PXiTRmlWFNcubACBi9 iqLl5moD++oYBJSU6sqxaKXvf27GhJMZUGp+upYT9lEIGF98vyhw+yXT64HYDvrW lMJspz80m5CRT3Pz7nxG+1d6ctONbXwDjyBPeEINUlHi4DN07UJZCJUnUWyMYM8D JffKq4DGhmsh0CJoX5pKjzxDhuHtrWwN+d/gMuUZrgC7oecXs0U5O2WR49oMGCs8 MMh6T/MUh1fWBI7F1Y3+OvyshU5XA9LCcNguu0T/RSKsjvqxkYTlpS/Zo1U/mpqv ar34mlvLXlLQNj8gTYDNybq/ufBfS2Fzq+juWbXCaVZ4OV0rsu66D9aUInQ66Iy6 1EqjlLaqAwTF+DJsyGeAxNEn6tMPiqMVhK9LTT9dz5Uk5vK4FdvuDxqeFuwwTnCi xYHdsZn/yhs7Nn6akwOclJayegDKiV3OVEG4ZyUHrXVR+njFTJyW20VcuUvN9RCn AxHW5MI9jaU/U+O4Opbb =1MpQ -----END PGP SIGNATURE----- Merge tag 'for-f2fs-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs bugfix from Jaegeuk Kim: "This fixes a bug which referenced the wrong pointer, sum_page, in f2fs_gc. It was newly introduced in 4.9-rc1. * tag 'for-f2fs-4.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: fix wrong sum_page pointer in f2fs_gc	2016-10-18 14:15:23 -07:00
Jaegeuk Kim	de0dcc40f6	f2fs: fix wrong sum_page pointer in f2fs_gc This patch fixes using a wrong pointer for sum_page in f2fs_gc. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-10-12 16:23:36 -07:00
Michal Hocko	5114a97a8b	fs: use mapping_set_error instead of opencoded set_bit The mapping_set_error() helper sets the correct AS_ flag for the mapping so there is no reason to open code it. Use the helper directly. [akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO] Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:33 -07:00
Linus Torvalds	101105b171	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: ">rename2() work from Miklos + current_time() from Deepa" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Replace current_fs_time() with current_time() fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps fs: Replace CURRENT_TIME with current_time() for inode timestamps fs: proc: Delete inode time initializations in proc_alloc_inode() vfs: Add current_time() api vfs: add note about i_op->rename changes to porting fs: rename "rename2" i_op to "rename" vfs: remove unused i_op->rename fs: make remaining filesystems use .rename2 libfs: support RENAME_NOREPLACE in simple_rename() fs: support RENAME_NOREPLACE for local filesystems ncpfs: fix unused variable warning	2016-10-10 20:16:43 -07:00
Al Viro	3873691e5a	Merge remote-tracking branch 'ovl/rename2' into for-linus	2016-10-10 23:02:51 -04:00
Linus Torvalds	97d2116708	Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs xattr updates from Al Viro: "xattr stuff from Andreas This completes the switch to xattr_handler ->get()/->set() from ->getxattr/->setxattr/->removexattr" * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: Remove {get,set,remove}xattr inode operations xattr: Stop calling {get,set,remove}xattr inode operations vfs: Check for the IOP_XATTR flag in listxattr xattr: Add __vfs_{get,set,remove}xattr helpers libfs: Use IOP_XATTR flag for empty directory handling vfs: Use IOP_XATTR flag for bad-inode handling vfs: Add IOP_XATTR inode operations flag vfs: Move xattr_resolve_name to the front of fs/xattr.c ecryptfs: Switch to generic xattr handlers sockfs: Get rid of getxattr iop sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names kernfs: Switch to generic xattr handlers hfs: Switch to generic xattr handlers jffs2: Remove jffs2_{get,set,remove}xattr macros xattr: Remove unnecessary NULL attribute name check	2016-10-10 17:11:50 -07:00
Linus Torvalds	abb5a14fa2	Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted misc bits and pieces. There are several single-topic branches left after this (rename2 series from Miklos, current_time series from Deepa Dinamani, xattr series from Andreas, uaccess stuff from from me) and I'd prefer to send those separately" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits) proc: switch auxv to use of __mem_open() hpfs: support FIEMAP cifs: get rid of unused arguments of CIFSSMBWrite() posix_acl: uapi header split posix_acl: xattr representation cleanups fs/aio.c: eliminate redundant loads in put_aio_ring_file fs/internal.h: add const to ns_dentry_operations declaration compat: remove compat_printk() fs/buffer.c: make __getblk_slow() static proc: unsigned file descriptors fs/file: more unsigned file descriptors fs: compat: remove redundant check of nr_segs cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] cifs: don't use memcpy() to copy struct iov_iter get rid of separate multipage fault-in primitives fs: Avoid premature clearing of capabilities fs: Give dentry to inode_change_ok() instead of inode fuse: Propagate dentry down to inode_change_ok() ceph: Propagate dentry down to inode_change_ok() xfs: Propagate dentry down to inode_change_ok() ...	2016-10-10 13:04:49 -07:00
Al Viro	e55f1d1d13	Merge remote-tracking branch 'jk/vfs' into work.misc	2016-10-08 11:06:08 -04:00
Andreas Gruenbacher	fd50ecaddf	vfs: Remove {get,set,remove}xattr inode operations These inode operations are no longer used; remove them. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-10-07 21:48:36 -04:00
Linus Torvalds	2eee010d09	Lots of bug fixes and cleanups. -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJX9pA6AAoJEPL5WVaVDYGj7fwH/0YcdQWBg0O5d7iXFnTcimh9 fiYkqKniBWQhgBAOFPMoNPRIW4tyeQmTtu8Rywx2Hr+v4lzJvuOaT18NDANdq/pp u5eDrnJ4R+uqPJlgxVOzopLVJ6I2glgSSRdvAKYxwTYcv8F88ObzVfsJ4M415gPq cbEKF+JT3l5hTGENR5sqmYvHYaNfOFkOqt4gulPtgk1eshy+BH/05M+qBSeA5a6k srdon0pFRoUV68m+T4G8FqOZxdybeT5Yx6X0GJf0eQJoX7IaiQTPcDrXzlrbDBbN rrzbpwsDeDKtgSOckbarCBroZKdToHFekfnOJ7IPWYq8IwYTSnZKFCWIRKO6z38= =IvhS -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Lots of bug fixes and cleanups" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: remove unused variable ext4: use journal inode to determine journal overhead ext4: create function to read journal inode ext4: unmap metadata when zeroing blocks ext4: remove plugging from ext4_file_write_iter() ext4: allow unlocked direct IO when pages are cached ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY fscrypto: use standard macros to compute length of fname ciphertext ext4: do not unnecessarily null-terminate encrypted symlink data ext4: release bh in make_indexed_dir ext4: Allow parallel DIO reads ext4: allow DAX writeback for hole punch jbd2: fix lockdep annotation in add_transaction_credits() blockgroup_lock.h: simplify definition of NR_BG_LOCKS blockgroup_lock.h: remove debris from bgl_lock_ptr() conversion fscrypto: make filename crypto functions return 0 on success fscrypto: rename completion callbacks to reflect usage fscrypto: remove unnecessary includes fscrypto: improved validation when loading inode encryption metadata ext4: fix memory leak when symlink decryption fails ...	2016-10-07 15:15:33 -07:00

... 3 4 5 6 7 ...

2013 Commits