linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 03:05:21 +07:00

Author	SHA1	Message	Date
Jaegeuk Kim	73faec4d99	f2fs: add mount option to select fault injection ratio This patch adds a mount option to select fault ratio. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-05-07 10:32:22 -07:00
Jaegeuk Kim	0414b004a8	f2fs: introduce f2fs_kmalloc to wrap kmalloc This patch adds f2fs_kmalloc. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-05-07 10:32:20 -07:00
Chao Yu	da011cc0da	f2fs: move node pages only in victim section during GC For foreground GC, we cache node blocks in victim section and set them dirty, then we call sync_node_pages to flush these node pages, but meanwhile, those node pages which does not locate in victim section will be flushed together, so more bandwidth and continuous free space would be occupied. So for this condition, it's better to leave those unrelated node page in cache for further write hit, and let CP or VM to flush them afterward. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-27 14:10:42 -07:00
Jaegeuk Kim	608514deba	f2fs: set fsync mark only for the last dnode In order to give atomic writes, we should consider power failure during sync_node_pages in fsync. So, this patch marks fsync flag only in the last dnode block. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-26 14:24:59 -07:00
Jaegeuk Kim	5268137564	f2fs: split sync_node_pages with fsync_node_pages This patch splits the existing sync_node_pages into (f)sync_node_pages. The fsync_node_pages is used for f2fs_sync_file only. Acked-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-26 14:24:48 -07:00
Chao Yu	675f10bde6	f2fs: fix to convert inline directory correctly With below serials, we will lose parts of dirents: 1) mount f2fs with inline_dentry option 2) echo 1 > /sys/fs/f2fs/sdX/dir_level 3) mkdir dir 4) touch 180 files named [1-180] in dir 5) touch 181 in dir 6) echo 3 > /proc/sys/vm/drop_caches 7) ll dir ls: cannot access 2: No such file or directory ls: cannot access 4: No such file or directory ls: cannot access 5: No such file or directory ls: cannot access 6: No such file or directory ls: cannot access 8: No such file or directory ls: cannot access 9: No such file or directory ... total 360 drwxr-xr-x 2 root root 4096 Feb 19 15:12 ./ drwxr-xr-x 3 root root 4096 Feb 19 15:11 ../ -rw-r--r-- 1 root root 0 Feb 19 15:12 1 -rw-r--r-- 1 root root 0 Feb 19 15:12 10 -rw-r--r-- 1 root root 0 Feb 19 15:12 100 -????????? ? ? ? ? ? 101 -????????? ? ? ? ? ? 102 -????????? ? ? ? ? ? 103 ... The reason is: when doing the inline dir conversion, we didn't consider that directory has hierarchical hash structure which can be configured through sysfs interface 'dir_level'. By default, dir_level of directory inode is 0, it means we have one bucket in hash table located in first level, all dirents will be hashed in this bucket, so it has no problem for us to do the duplication simply between inline dentry page and converted normal dentry page. However, if we configured dir_level with the value N (greater than 0), it will expand the bucket number of first level hash table by 2^N - 1, it hashs dirents into different buckets according their hash value, if we still move all dirents to first bucket, it makes incorrent locating for inline dirents, the result is, although we can iterate all dirents through ->readdir, we can't stat some of them in ->lookup which based on hash table searching. This patch fixes this issue by rehashing dirents into correct position when converting inline directory. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-15 08:49:47 -07:00
Jaegeuk Kim	6781eabba1	f2fs: give -EINVAL for norecovery and rw mount Once detecting something to recover, f2fs should stop mounting, given norecovery and rw mount options. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-15 08:49:47 -07:00
Jaegeuk Kim	df728b0f69	f2fs: recover superblock at RW remounts This patch adds a sbi flag, SBI_NEED_SB_WRITE, which indicates it needs to recover superblock when (re)mounting as RW. This is set only when f2fs is mounted as RO. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-04-15 08:49:47 -07:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Keith Mok	43b6573bac	f2fs: use cryptoapi crc32 functions The crc function is done bit by bit. Optimize this by use cryptoapi crc32 function which is backed by h/w acceleration. Signed-off-by: Keith Mok <ek9852@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:43 -07:00
Jaegeuk Kim	0b81d07790	fs crypto: move per-file encryption from f2fs tree to fs/crypto This patch adds the renamed functions moved from the f2fs crypto files. 1. definitions for per-file encryption used by ext4 and f2fs. 2. crypto.c for encrypt/decrypt functions a. IO preparation: - fscrypt_get_ctx / fscrypt_release_ctx b. before IOs: - fscrypt_encrypt_page - fscrypt_decrypt_page - fscrypt_zeroout_range c. after IOs: - fscrypt_decrypt_bio_pages - fscrypt_pullback_bio_page - fscrypt_restore_control_page 3. policy.c supporting context management. a. For ioctls: - fscrypt_process_policy - fscrypt_get_policy b. For context permission - fscrypt_has_permitted_context - fscrypt_inherit_context 4. keyinfo.c to handle permissions - fscrypt_get_encryption_info - fscrypt_free_encryption_info 5. fname.c to support filename encryption a. general wrapper functions - fscrypt_fname_disk_to_usr - fscrypt_fname_usr_to_disk - fscrypt_setup_filename - fscrypt_free_filename b. specific filename handling functions - fscrypt_fname_alloc_buffer - fscrypt_fname_free_buffer 6. Makefile and Kconfig Cc: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Ildar Muslukhov <ildarm@google.com> Signed-off-by: Uday Savagaonkar <savagaon@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-17 21:19:33 -07:00
Yang Shi	59692b7c71	f2fs: mutex can't be used by down_write_nest_lock() f2fs_lock_all() calls down_write_nest_lock() to acquire a rw_sem and check a mutex, but down_write_nest_lock() is designed for two rw_sem accoring to the comment in include/linux/rwsem.h. And, other than f2fs, it is just called in mm/mmap.c with two rwsem. So, it looks it is used wrongly by f2fs. And, it causes the below compile warning on -rt kernel too. In file included from fs/f2fs/xattr.c:25:0: fs/f2fs/f2fs.h: In function 'f2fs_lock_all': fs/f2fs/f2fs.h:962:34: warning: passing argument 2 of 'down_write_nest_lock' from incompatible pointer type [-Wincompatible-pointer-types] f2fs_down_write(&sbi->cp_rwsem, &sbi->cp_mutex); ^ fs/f2fs/f2fs.h:27:55: note: in definition of macro 'f2fs_down_write' #define f2fs_down_write(x, y) down_write_nest_lock(x, y) ^ In file included from include/linux/rwsem.h:22:0, from fs/f2fs/xattr.c:21: include/linux/rwsem_rt.h:138:20: note: expected 'struct rw_semaphore ' but argument is of type 'struct mutex ' static inline void down_write_nest_lock(struct rw_semaphore *sem, Signed-off-by: Yang Shi <yang.shi@linaro.org> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-03-02 10:22:14 -08:00
Chao Yu	406657dd18	f2fs: introduce f2fs_flush_merged_bios for cleanup Add a new helper f2fs_flush_merged_bios to clean up redundant codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-26 11:52:02 -08:00
Chao Yu	f28b3434af	f2fs: introduce f2fs_update_data_blkaddr for cleanup Add a new help f2fs_update_data_blkaddr to clean up redundant codes. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-26 11:52:01 -08:00
Chao Yu	4356e48e64	f2fs crypto: fix incorrect positioning for GCing encrypted data page For now, flow of GCing an encrypted data page: 1) try to grab meta page in meta inode's mapping with index of old block address of that data page 2) load data of ciphertext into meta page 3) allocate new block address 4) write the meta page into new block address 5) update block address pointer in direct node page. Other reader/writer will use f2fs_wait_on_encrypted_page_writeback to check and wait on GCed encrypted data cached in meta page writebacked in order to avoid inconsistence among data page cache, meta page cache and data on-disk when updating. However, we will use new block address updated in step 5) as an index to lookup meta page in inner bio buffer. That would be wrong, and we will never find the GCing meta page, since we use the old block address as index of that page in step 1). This patch fixes the issue by adjust the order of step 1) and step 3), and in step 1) grab page with index generated in step 3). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-26 11:51:58 -08:00
Chao Yu	7a9d75481b	f2fs: trace old block address for CoWed page This patch enables to trace old block address of CoWed page for better debugging. f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f0, oldaddr = 0xfe8ab, newaddr = 0xfee90 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4f8, oldaddr = 0xfe8b0, newaddr = 0xfee91 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 1, page_index = 0x1d4fa, oldaddr = 0xfe8ae, newaddr = 0xfee92 rw = WRITE_SYNC, type = NODE f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x96, oldaddr = 0xf049b, newaddr = 0x2bbe rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x97, oldaddr = 0xf049c, newaddr = 0x2bbf rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 134824, page_index = 0x98, oldaddr = 0xf049d, newaddr = 0x2bc0 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x47, oldaddr = 0xffffffff, newaddr = 0xf2631 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x48, oldaddr = 0xffffffff, newaddr = 0xf2632 rw = WRITE, type = DATA f2fs_submit_page_mbio: dev = (1,0), ino = 135260, page_index = 0x49, oldaddr = 0xffffffff, newaddr = 0xf2633 rw = WRITE, type = DATA Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:40:02 -08:00
Shawn Lin	984ec63c5a	f2fs: move sanity checking of cp into get_valid_checkpoint >From the function name of get_valid_checkpoint, it seems to return the valid cp or NULL for caller to check. If no valid one is found, f2fs_fill_super will print the err log. But if get_valid_checkpoint get one valid(the return value indicate that it's valid, however actually it is invalid after sanity checking), then print another similar err log. That seems strange. Let's keep sanity checking inside the procedure of geting valid cp. Another improvement we gained from this move is that even the large volume is supported, we check the cp in advanced to skip the following procedure if failing the sanity checking. Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:39:56 -08:00
Chao Yu	dfc08a12e4	f2fs: introduce f2fs_journal struct to wrap journal info Introduce a new structure f2fs_journal to wrap journal info in struct f2fs_summary_block for readability. struct f2fs_journal { union { __le16 n_nats; __le16 n_sits; }; union { struct nat_journal nat_j; struct sit_journal sit_j; struct f2fs_extra_info info; }; } __packed; struct f2fs_summary_block { struct f2fs_summary entries[ENTRIES_IN_SUM]; struct f2fs_journal journal; struct summary_footer footer; } __packed; Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 21:39:53 -08:00
Chao Yu	922ec355f8	f2fs crypto: avoid unneeded memory allocation when {en/de}crypting symlink This patch adopts f2fs with codes of ext4, it removes unneeded memory allocation in creating/accessing path of symlink. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	28bc106b23	f2fs: support revoking atomic written pages f2fs support atomic write with following semantics: 1. open db file 2. ioctl start atomic write 3. (write db file) * n 4. ioctl commit atomic write 5. close db file With this flow we can avoid file becoming corrupted when abnormal power cut, because we hold data of transaction in referenced pages linked in inmem_pages list of inode, but without setting them dirty, so these data won't be persisted unless we commit them in step 4. But we should still hold journal db file in memory by using volatile write, because our semantics of 'atomic write support' is incomplete, in step 4, we could fail to submit all dirty data of transaction, once partial dirty data was committed in storage, then after a checkpoint & abnormal power-cut, db file will be corrupted forever. So this patch tries to improve atomic write flow by adding a revoking flow, once inner error occurs in committing, this gives another chance to try to revoke these partial submitted data of current transaction, it makes committing operation more like aotmical one. If we're not lucky, once revoking operation was failed, EAGAIN will be reported to user for suggesting doing the recovery with held journal file, or retrying current transaction again. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	29b96b547e	f2fs: split drop_inmem_pages from commit_inmem_pages Split drop_inmem_pages from commit_inmem_pages for code readability, and prepare for the following modification. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	ce855a3bd0	f2fs crypto: f2fs_page_crypto() doesn't need a encryption context This patch adopts: ext4 crypto: ext4_page_crypto() doesn't need a encryption context Since ext4_page_crypto() doesn't need an encryption context (at least not any more), this allows us to simplify a number function signature and also allows us to avoid needing to allocate a context in ext4_block_write_begin(). It also means we no longer need a separate ext4_decrypt_one() function. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	24b8491251	f2fs: preallocate blocks for buffered aio writes This patch preallocates data blocks for buffered aio writes. With this patch, we can avoid redundant locking and unlocking of node pages given consecutive aio request. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	b439b103a6	f2fs: move dio preallocation into f2fs_file_write_iter This patch moves preallocation code for direct IOs into f2fs_file_write_iter. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	0c3a579758	f2fs: introduce f2fs_submit_merged_bio_cond f2fs use single bio buffer per type data (META/NODE/DATA) for caching writes locating in continuous block address as many as possible, after submitting, these writes may be still cached in bio buffer, so we have to flush cached writes in bio buffer by calling f2fs_submit_merged_bio. Unfortunately, in the scenario of high concurrency, bio buffer could be flushed by someone else before we submit it as below reasons: a) there is no space in bio buffer. b) add a request of different type (SYNC, ASYNC). c) add a discontinuous block address. For this condition, f2fs_submit_merged_bio will be devastating, because it could break the following merging of writes in bio buffer, split one big bio into two smaller one. This patch introduces f2fs_submit_merged_bio_cond which can do a conditional submitting with bio buffer, before submitting it will judge whether: - page in DATA type bio buffer is matching with specified page; - page in DATA type bio buffer is belong to specified inode; - page in NODE type bio buffer is belong to specified inode; If there is no eligible page in bio buffer, we will skip submitting step, result in gaining more chance to merge consecutive block IOs in bio cache. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	da85985c61	f2fs: speed up handling holes in fiemap This patch makes f2fs_map_blocks supporting returning next potential page offset which skips hole region in indirect tree of inode, and use it to speed up fiemap in handling big hole case. Test method: xfs_io -f /mnt/f2fs/file -c "pwrite 1099511627776 4096" time xfs_io -f /mnt/f2fs/file -c "fiemap -v" Before: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 3m3.518s user 0m0.000s sys 3m3.456s After: time xfs_io -f /mnt/f2fs/file -c "fiemap -v" /mnt/f2fs/file: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2147483647]: hole 2147483648 1: [2147483648..2147483655]: 81920..81927 8 0x1 real 0m0.008s user 0m0.000s sys 0m0.008s Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	3cf4574705	f2fs: introduce get_next_page_offset to speed up SEEK_DATA When seeking data in ->llseek, if we encounter a big hole which covers several dnode pages, we will try to seek data from index of page which is the first page of next dnode page, at most we could skip searching (ADDRS_PER_BLOCK - 1) pages. However it's still not efficient, because if our indirect/double-indirect pointer are NULL, there are no dnode page locate in the tree indirect/ double-indirect pointer point to, it's not necessary to search the whole region. This patch introduces get_next_page_offset to calculate next page offset based on current searching level and max searching level returned from get_dnode_of_data, with this, we could skip searching the entire area indirect or double-indirect node block is not exist. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	81ca7350ce	f2fs: remove unneeded pointer conversion There are redundant pointer conversion in following call stack: - at position a, inode was been converted to f2fs_file_info. - at position b, f2fs_file_info was been converted to inode again. - truncate_blocks(inode,..) - fi = F2FS_I(inode) ---a - ADDRS_PER_PAGE(node_page, fi) - addrs_per_inode(fi) - inode = &fi->vfs_inode ---b - f2fs_has_inline_xattr(inode) - fi = F2FS_I(inode) - is_inode_flag_set(fi,..) In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and addrs_per_inode to acept parameter with type of inode pointer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Shuoran Liu	8f1dbbbbdf	f2fs: introduce lifetime write IO statistics This patch introduces lifetime IO write statistics exposed to the sysfs interface. The write IO amount is obtained from block layer, accumulated in the file system and stored in the hot node summary of checkpoint. Signed-off-by: Shuoran Liu <liushuoran@huawei.com> Signed-off-by: Pengyang Hou <houpengyang@huawei.com> [Jaegeuk Kim: add sysfs documentation] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Hou Pengyang	201ef5e080	f2fs: improve shrink performance of extent nodes On the worst case, we need to scan the whole radix tree and every rb-tree to free the victimed extent_nodes when shrinking. Pengyang initially introduced a victim_list to record the victimed extent_nodes, and free these extent_nodes by just scanning a list. Later, Chao Yu enhances the original patch to improve memory footprint by removing victim list. The policy of lru list shrinking becomes: 1) lock lru list's lock 2) trylock extent tree's lock 3) remove extent node from lru list 4) unlock lru list's lock 5) do shrink 6) repeat 1) to 5) Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Jaegeuk Kim	fec1d6576c	f2fs: use wait_for_stable_page to avoid contention In write_begin, if storage supports stable_page, we don't need to wait for writeback to update its contents. This patch introduces to use wait_for_stable_page instead of wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	2304cb0c44	f2fs: export dirty_nats_ratio in sysfs This patch exports a new sysfs entry 'dirty_nat_ratio' to control threshold of dirty nat entries, if current ratio exceeds configured threshold, checkpoint will be triggered in f2fs_balance_fs_bg for flushing dirty nats. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Chao Yu	0fd785eb93	f2fs: relocate is_merged_page Operations in is_merged_page is related to inner bio cache, move it to data.c. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-02-22 16:07:23 -08:00
Linus Torvalds	f9a03ae123	Merge tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series adds two ioctls to control cached data and fragmented files. Most of the rest fixes missing error cases and bugs that we have not covered so far. Summary: Enhancements: - support an ioctl to execute online file defragmentation - support an ioctl to flush cached data - speed up shrinking of extent_cache entries - handle broken superblock - refector dirty inode management infra - revisit f2fs_map_blocks to handle more cases - reduce global lock coverage - add detecting user's idle time Major bug fixes: - fix data race condition on cached nat entries - fix error cases of volatile and atomic writes" * tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (87 commits) f2fs: should unset atomic flag after successful commit f2fs: fix wrong memory condition check f2fs: monitor the number of background checkpoint f2fs: detect idle time depending on user behavior f2fs: introduce time and interval facility f2fs: skip releasing nodes in chindless extent tree f2fs: use atomic type for node count in extent tree f2fs: recognize encrypted data in f2fs_fiemap f2fs: clean up f2fs_balance_fs f2fs: remove redundant calls f2fs: avoid unnecessary f2fs_balance_fs calls f2fs: check the page status filled from disk f2fs: introduce __get_node_page to reuse common code f2fs: check node id earily when readaheading node page f2fs: read isize while holding i_mutex in fiemap Revert "f2fs: check the node block address of newly allocated nid" f2fs: cover more area with nat_tree_lock f2fs: introduce max_file_blocks in sbi f2fs crypto: check CONFIG_F2FS_FS_XATTR for encrypted symlink f2fs: introduce zombie list for fast shrinking extent trees ...	2016-01-13 21:01:44 -08:00
Jaegeuk Kim	42190d2a86	f2fs: monitor the number of background checkpoint This patch adds to show the number of background checkpoint. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-11 15:56:42 -08:00
Jaegeuk Kim	d0239e1bf5	f2fs: detect idle time depending on user behavior This patch adds last time that user requested filesystem operations. This information is used to detect whether system is idle or not later. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-11 15:56:37 -08:00
Jaegeuk Kim	6beceb5427	f2fs: introduce time and interval facility This patch adds time and interval arrays to store some timing variables. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-11 15:36:27 -08:00
Chao Yu	68e3538510	f2fs: use atomic type for node count in extent tree 1. rename field in struct extent_tree from count to node_cnt for readability. 2. alter to use atomic type for node_cnt. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:57:11 -08:00
Jaegeuk Kim	2c4db1a6f6	f2fs: clean up f2fs_balance_fs This patch adds one parameter to clean up all the callers of f2fs_balance_fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:45:23 -08:00
Jaegeuk Kim	12719ae14e	f2fs: avoid unnecessary f2fs_balance_fs calls Only when node page is newly dirtied, it needs to check whether we need to do f2fs_gc. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-08 11:45:22 -08:00
Chao Yu	e0afc4d6d0	f2fs: introduce max_file_blocks in sbi Introduce max_file_blocks in sbi to store max block index of file in f2fs, it could be used to avoid unneeded calculation of max block index in runtime. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: fix overflow of sbi->max_file_blocks] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2016-01-03 21:40:04 -08:00
Jaegeuk Kim	137d09f002	f2fs: introduce zombie list for fast shrinking extent trees This patch removes refcount, and instead, adds zombie_list to shrink directly without radix tree traverse. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-31 15:39:22 -08:00
Jaegeuk Kim	c00ba55485	f2fs: monitor zombie_tree count This patch adds an entry to show the number of zombie extent_tree. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-31 15:33:00 -08:00
Jaegeuk Kim	ed3d12561a	f2fs: load largest extent all the time Otherwise, we can get mismatched largest extent information. One example is: 1. mount f2fs w/ extent_cache 2. make a small extent 3. umount 4. mount f2fs w/o extent_cache 5. update the largest extent 6. umount 7. mount f2fs w/ extent_cache 8. get the old extent made by #2 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:20 -08:00
Yunlei He	179448bfe4	f2fs: add a max block check for get_data_block_bmap This patch adds a max block check for get_data_block_bmap. Trinity test program will send a block number as parameter into ioctl_fibmap, which will be used in get_node_path(), when the block number large than f2fs max blocks, it will trigger kernel bug. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Xue Liu <liuxueliu.liu@huawei.com> [Jaegeuk Kim: fix missing condition, pointed by Chao Yu] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:17 -08:00
Chao Yu	6d5a1495ee	f2fs: let user being aware of IO error Sometimes we keep dumb when IO error occur in lower layer device, so user will not receive any error return value for some operation, but actually, the operation did not succeed. This sould be avoided, so this patch reports such kind of error to user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:15 -08:00
Chao Yu	c34f42e2cb	f2fs: report error of do_checkpoint do_checkpoint and write_checkpoint can fail due to reasons like triggering in a readonly fs or encountering IO error of storage device. So it's better to report such error info to user, let user be aware of failure of doing checkpoint. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:09 -08:00
Jaegeuk Kim	93bae099ea	f2fs: record node block allocation in dnode_of_data This patch introduces recording node block allocation in dnode_of_data. This information helps to figure out whether any node block is allocated during specific file operations. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:14:07 -08:00
Jaegeuk Kim	74fd8d9927	f2fs: speed up shrinking extent tree entries If there is no candidates for shrinking slab entries, we don't need to traverse any trees at all. Reviewed-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: fix missing initialization reported by Yunlei He] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-30 10:13:00 -08:00
Jaegeuk Kim	7441ccef33	f2fs: use atomic variable for total_extent_tree It would be better to use atomic variable for total_extent_tree. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-22 10:31:41 -08:00
Chao Yu	33fbd5100d	f2fs: stat dirty regular/symlink inodes Add to stat dirty regular and symlink inode for showing in debugfs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-17 09:53:19 -08:00
Chao Yu	343f40f0a7	f2fs: introduce new option for controlling data flush Add a new option 'data_flush' to enable data flush functionality. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-16 09:25:48 -08:00
Chao Yu	c227f91273	f2fs: record dirty status of regular/symlink inode Maintain regular/symlink inode which has dirty pages in global dirty list and record their total dirty pages count like the way of handling directory inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-16 08:58:12 -08:00
Chao Yu	e8240f656d	f2fs: don't grab super block buffer header all the time We have already got one copy of valid super block in memory, do not grab buffer header of super block all the time. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-16 08:58:06 -08:00
Chao Yu	2710fd7e00	f2fs: introduce dirty list node in inode info Add a new dirt list node member in inode info for linking the inode to global dirty list in superblock, instead of old implementation which allocate slab cache memory as an entry to inode. It avoids memory pressure due to slab cache allocation, and also makes codes more clean. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-15 13:24:19 -08:00
Chao Yu	a49324f127	f2fs: rename {add,remove,release}_dirty_inode to {add,remove,release}_ino_entry remove_dirty_dir_inode will be renamed to remove_dirty_inode as a generic function in following patch for removing directory/regular/symlink inode in global dirty list. Here rename ino management related functions for readability, also in order to avoid name conflict. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-15 13:23:43 -08:00
Al Viro	886f56f970	f2fs: it's umode_t, not mode_t... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 21:18:00 -05:00
Chao Yu	3519e3f992	f2fs: use sbi->blocks_per_seg to avoid unnecessary calculation Use sbi->blocks_per_seg directly to avoid unnecessary calculation when using 1 << sbi->log_blocks_per_seg. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 12:07:57 -08:00
Chao Yu	9006f2c93f	f2fs: kill f2fs_drop_largest_extent For direct IO, f2fs only allocate new address for the block which is not exist in the disk before, its mapping info should not exist in extent cache previously, so here we do not need to call f2fs_drop_largest_extent to drop related cache. Due to no more callers for f2fs_drop_largest_extent now, kill it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 12:07:57 -08:00
Chao Yu	04ef4b626c	f2fs: fix to enable missing ioctl interfaces in ->compat_ioctl In 64-bit kernel f2fs can supports 32-bit ioctl system call by identifying encoded code which is converted from 32-bit one to 64-bit one in ->compat_ioctl. When we introduced new interfaces in ->ioctl, we forgot to enable them in ->compat_ioctl, so enable them for fixing. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: fix wrongly added spaces together] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 11:52:34 -08:00
Chao Yu	d323d005ac	f2fs: support file defragment This patch introduces a new ioctl F2FS_IOC_DEFRAGMENT to support file defragment in a specified range of regular file. This ioctl can be used in very limited workload: if user expects high sequential read performance in randomly written file, this interface can be used for defragmentation, after that file can be written as continuous as possible in the device. Meanwhile, it has side-effect, it will make holes in segments where blocks located originally, so it's better to trigger GC to eliminate fragment in segments. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 11:52:33 -08:00
Chao Yu	787c7b8cb3	f2fs: report error of f2fs_create_root_stats f2fs_create_root_stats can fail due to no memory, report it to user. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-12-04 11:52:33 -08:00
Jaegeuk Kim	67f8cf3cee	f2fs: support fiemap for inline_data There is a FIEMAP_EXTENT_INLINE_DATA, pointed out by Marc. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-20 11:33:21 -07:00
Chao Yu	08b39fbd59	f2fs crypto: fix racing of accessing encrypted page among different competitors Since we use different page cache (normally inode's page cache for R/W and meta inode's page cache for GC) to cache the same physical block which is belong to an encrypted inode. Writeback of these two page cache should be exclusive, but now we didn't handle writeback state well, so there may be potential racing problem: a) kworker: f2fs_gc: - f2fs_write_data_pages - f2fs_write_data_page - do_write_data_page - write_data_page - f2fs_submit_page_mbio (page#1 in inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) - gc_data_segment - move_encrypted_block - pagecache_get_page (page#2 in meta inode's page cache was cached with the invalid datas of physical block located in new blkaddr) - f2fs_submit_page_mbio (page#1 was submitted, later, page#2 with invalid data will be submitted) b) f2fs_gc: - gc_data_segment - move_encrypted_block - f2fs_submit_page_mbio (page#1 in meta inode's page cache was queued in f2fs bio cache, and be ready to write to new blkaddr) user thread: - f2fs_write_begin - f2fs_submit_page_bio (we submit the request to block layer to update page#2 in inode's page cache with physical block located in new blkaddr, so here we may read gabbage data from new blkaddr since GC hasn't writebacked the page#1 yet) This patch fixes above potential racing problem for encrypted inode. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-13 09:52:34 -07:00
Chao Yu	ea1a29a0bd	f2fs: export ra_nid_pages to sysfs After finishing building free nid cache, we will try to readahead asynchronously 4 more pages for the next reloading, the count of readahead nid pages is fixed. In some case, like SMR drive, read less sectors with fixed count each time we trigger RA may be low efficient, since we will face high seeking overhead, so we'd better let user to configure this parameter from sysfs in specific workload. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:03:43 -07:00
Chao Yu	26879fb101	f2fs: support lower priority asynchronous readahead in ra_meta_pages Now, we use ra_meta_pages to reads continuous physical blocks as much as possible to improve performance of following reads. However, ra_meta_pages uses a synchronous readahead approach by submitting bio with READ, as READ is with high priority, it can not be used in the case of preloading blocks, and it's not sure when these RAed pages will be used. This patch supports asynchronous readahead in ra_meta_pages by tagging bio with READA flag in order to allow preloading. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:03:15 -07:00
Chao Yu	2b947003fa	f2fs: don't tag REQ_META for temporary non-meta pages In recovery or checkpoint flow, we grab pages temperarily in meta inode's mapping for caching temperary data, actually, datas in these pages were not meta data of f2fs, but still we tag them with REQ_META flag. However, lower device like eMMC may do some optimization for data of such type. So in order to avoid wrong optimization, we'd better remove such flag for temperary non-meta pages. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 14:01:46 -07:00
Jaegeuk Kim	a56c7c6fb3	f2fs: set GFP_NOFS for grab_cache_page For normal inodes, their pages are allocated with __GFP_FS, which can cause filesystem calls when reclaiming memory. This can incur a dead lock condition accordingly. So, this patch addresses this problem by introducing f2fs_grab_cache_page(.., bool for_write), which calls grab_cache_page_write_begin() with AOP_FLAG_NOFS. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 13:38:03 -07:00
Jaegeuk Kim	6e2c64ad7c	f2fs: fix SSA updates resulting in corruption The f2fs_collapse_range and f2fs_insert_range changes the block addresses directly. But that can cause uncovered SSA updates. In that case, we need to give up to change the block addresses and do buffered writes to keep filesystem consistency. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 13:38:02 -07:00
Jaegeuk Kim	c912a8298c	f2fs: add F2FS_GOING_DOWN_METAFLUSH to test power-failure This patch introduces F2FS_GOING_DOWN_METAFLUSH which flushes meta pages like SSA blocks and then blocks all the writes. This can be used by power-failure tests. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-12 13:37:54 -07:00
Jaegeuk Kim	60b99b486b	f2fs: introduce a periodic checkpoint flow This patch introduces a periodic checkpoint feature. Note that, this is not enforcing to conduct checkpoints very strictly in terms of trigger timing, instead just hope to help user experiences. The default value is 60 seconds. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:57 -07:00
Jaegeuk Kim	6aefd93b01	f2fs: introduce background_gc=sync mount option This patch introduce background_gc=sync enabling synchronous cleaning in background. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:57 -07:00
Chao Yu	456b88e4d1	f2fs: introduce a new ioctl F2FS_IOC_WRITE_CHECKPOINT This patch introduce a new ioctl for those users who want to trigger checkpoint from userspace through ioctl. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:56 -07:00
Chao Yu	d530d4d8e2	f2fs: support synchronous gc in ioctl This patch drops in batches gc triggered through ioctl, since user can easily control the gc by designing the loop around the ->ioctl. We support synchronous gc by forcing using FG_GC in f2fs_gc, so with it, user can make sure that in this round all blocks gced were persistent in the device until ioctl returned. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:56 -07:00
Chao Yu	5b7ee37414	f2fs: use atomic64_t for extent cache hit stat Our hit stat of extent cache will increase all the time until remount, and we use atomic_t type for the stat variable, so it may easily incur overflow when we query extent cache frequently in a long time running fs. So to avoid that, this patch uses atomic64_t for hit stat variables. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:55 -07:00
Jaegeuk Kim	39307a8e24	f2fs: use vmalloc to handle -ENOMEM error This patch introduces f2fs_kvmalloc to avoid -ENOMEM during mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:55 -07:00
Chao Yu	4abd3f5ac4	f2fs: introduce __try_update_largest_extent This patch adds a new helper __try_update_largest_extent for cleanup. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:53 -07:00
Chao Yu	9edcdabf36	f2fs: fix overflow of size calculation We have potential overflow issue when calculating size of object, when we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only 32-bits space in 32-bit architecture, left shifting will incur overflow, i.e: pgoff_t index = 0xFFFFFFFF; loff_t size = index << PAGE_CACHE_SHIFT; size: 0xFFFFF000 So we should cast index with 64-bits type to avoid this issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-10-09 16:20:50 -07:00
Chao Yu	19b2c30d3c	f2fs: update extent tree in batches This patch introduce a new helper f2fs_update_extent_tree_range which can do extent mapping update at a specified range. The main idea is: 1) punch all mapping info in extent node(s) which are at a specified range; 2) try to merge new extent mapping with adjacent node, or failing that, insert the mapping into extent tree as a new node. In order to see the benefit, I add a function for stating time stamping count as below: uint64_t rdtsc(void) { uint32_t lo, hi; __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi)); return (uint64_t)hi << 32 \| lo; } My test environment is: ubuntu, intel i7-3770, 16G memory, 256g micron ssd. truncation path: update extent cache from truncate_data_blocks_range non-truncataion path: update extent cache from other paths total: all update paths a) Removing 128MB file which has one extent node mapping whole range of file: 1. dd if=/dev/zero of=/mnt/f2fs/128M bs=1M count=128 2. sync 3. rm /mnt/f2fs/128M Before: total count average truncation: 7651022 32768 233.49 Patched: total count average truncation: 3321 33 100.64 b) fsstress: fsstress -d /mnt/f2fs -l 5 -n 100 -p 20 Test times: 5 times. Before: total count average truncation: 5812480.6 20911.6 277.95 non-truncation: 7783845.6 13440.8 579.12 total: 13596326.2 34352.4 395.79 Patched: total count average truncation: 1281283.0 3041.6 421.25 non-truncation: 7355844.4 13662.8 538.38 total: 8637127.4 16704.4 517.06 1) For the updates in truncation path: - we can see updating in batches leads total tsc and update count reducing explicitly; - besides, for a single batched updating, punching multiple extent nodes in a loop, result in executing more operations, so our average tsc increase intensively. 2) For the updates in non-truncation path: - there is a little improvement, that is because for the scenario that we just need to update in the head or tail of extent node, new interface optimize to update info in extent node directly, rather than removing original extent node for updating and then inserting that updated one into cache as new node. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-26 11:50:35 -07:00
Chao Yu	13ec7297e5	f2fs: fix to release inode correctly In following call stack, if unfortunately we lose all chances to truncate inode page in remove_inode_page, eventually we will add the nid allocated previously into free nid cache, this nid is with NID_NEW status and with NEW_ADDR in its blkaddr pointer: - f2fs_create - f2fs_add_link - __f2fs_add_link - init_inode_metadata - new_inode_page - new_node_page - set_node_addr(, NEW_ADDR) - f2fs_init_acl failed - remove_inode_page failed - handle_failed_inode - remove_inode_page failed - iput - f2fs_evict_inode - remove_inode_page failed - alloc_nid_failed cache a nid with valid blkaddr: NEW_ADDR This may not only cause resource leak of previous inode, but also may cause incorrect use of the previous blkaddr which is located in NO.nid node entry when this nid is reused by others. This patch tries to add this inode to orphan list if we fail to truncate inode, so that we can obtain a second chance to release it in orphan recovery flow. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-24 16:35:59 -07:00
Chao Yu	b01548919c	f2fs: handle f2fs_truncate error correctly This patch fixes to return error number of f2fs_truncate, so that we can handle the error correctly in callers. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-24 09:39:56 -07:00
Jaegeuk Kim	80c545055d	f2fs: use __GFP_NOFAIL to avoid infinite loop __GFP_NOFAIL can avoid retrying the whole path of kmem_cache_alloc and bio_alloc. And, it also fixes the use cases of GFP_ATOMIC correctly. Suggested-by: Chao Yu <chao2.yu@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-24 09:37:21 -07:00
Chao Yu	029e13cc32	f2fs: adjust showing of extent cache stat This patch alters to replace total hit stat with rbtree hit stat, and then adjust showing of extent cache stat: Hit Count: L1-1: for largest node hit count; L1-2: for last cached node hit count; L2: for extent node hit after lookuping in rbtree. Hit Ratio: ratio (hit count / total lookup count) Inner Struct Count: tree count, node count. Before: Extent Hit Ratio: 0 / 2 Extent Tree Count: 3 Extent Node Count: 2 Patched: Exten Cacache: - Hit Count: L1-1:4871 L1-2:2074 L2:208 - Hit Ratio: 1% (7153 / 550751) - Inner Struct Count: tree: 26560, node: 11824 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-21 22:45:16 -07:00
Chao Yu	91c481fff9	f2fs: add largest/cached stat in extent cache This patch adds to stat the hit count of largest/cached node for showing in debugfs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-21 22:45:15 -07:00
Chao Yu	e2b4e2bc88	f2fs: fix incorrect mapping for bmap The test step is like below: 1. touch file 2. truncate -s $((10241024)) file 3. fallocate -o 0 -l $((10241024)) file 4. fibmap.f2fs file Our result of fibmap.f2fs showed below is not correct: file_pos start_blk end_blk blks 0 -937166132 -937166132 1 4096 -937166132 -937166132 1 8192 -937166132 -937166132 1 12288 -937166132 -937166132 1 16384 -937166132 -937166132 1 20480 -937166132 -937166132 1 ... 1040384 -937166132 -937166132 1 1044480 -937166132 -937166132 1 This is because f2fs_map_blocks will return with no error when meeting a hole or preallocated block, the caller __get_data_block will map the uninitialized variable value to bh->b_blocknr. Unfortunately generic_block_bmap will neither check the return value of get_data() nor check mapping info of buffer_head, result in returning the random block address. After fixing the issue, our result shows correctly: file_pos start_blk end_blk blks 0 0 0 256 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-21 22:45:14 -07:00
Jaegeuk Kim	740432f835	f2fs: handle failed bio allocation As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the same time. So, we can't guarantee that bio is allocated all the time. " * When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be * able to allocate a bio. This is due to the mempool guarantees. To make this * work, callers must never allocate more than 1 bio at a time from this pool. * Callers that need to allocate more than 1 bio must always submit the * previously allocated bio for IO before attempting to allocate a new one. * Failure to do so can cause deadlocks under memory pressure. " Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-20 09:00:09 -07:00
Jaegeuk Kim	a6db67f06f	f2fs: increase the number of max hard links This patch increases the number of maximum hard links for one file. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-20 09:00:08 -07:00
Chao Yu	31696580bf	f2fs: shrink free_nids entries This patch introduces __count_free_nids/try_to_free_nids and registers them in slab shrinker for shrinking under memory pressure. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-20 09:00:06 -07:00
Chao Yu	8c14bfadea	f2fs: handle error of f2fs_iget correctly In recover_orphan_inode, whenever f2fs_iget fail, we will make kernel panic, but it's not reasonable, because f2fs_iget can fail due to a lot of reasons including out of memory. So we change error handling method as below: a) when finding no entry for the orphan inode, bug_on for catching bugs; b) for other reasons, report it to caller. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-14 16:02:14 -07:00
Chao Yu	decd36b6c4	f2fs: remove inmem radix tree Previously, we use radix tree to index all registered page entries for atomic file, but now we only use radix tree to see whether current page is indexed or not, since the other user of radix tree is gone in commit `042b7816aa` ("f2fs: remove unnecessary call to invalidate inmemory pages"). So in this patch, we try to use one more efficient way: Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private value to indicate page indexing status. By using this way, we can save memory and lookup time. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-11 11:31:14 -07:00
Fan Li	759af1c9c1	f2fs: use extent cache to optimize f2fs_reserve_block In some cases, we only need the block address when we call f2fs_reserve_block, other fields of struct dnode_of_data aren't necessary. We can try extent cache first for such cases in order to speed up the process. Signed-off-by: Fan li <fanofcode.li@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 22:15:42 -07:00
Chao Yu	e90c2d2850	f2fs: invalidate temporary meta page To avoid meeting garbage data in next free node block at the end of warm node chain when doing recovery, we will try to zero out that invalid block. If the device is not support discard, our way for zeroing out block is: grabbing a temporary zeroed page in meta inode, then, issue write request with this page. But, we forget to release that temporary page, so our memory usage will increase without gaining any hit ratio benefit, so it's better to free it for saving memory. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:19:21 -07:00
Jaegeuk Kim	edb27deea7	f2fs: handle error cases in commit_inmem_pages This patch adds to handle error cases in commit_inmem_pages. If an error occurs, it stops to write the pages and return the error right away. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:15 -07:00
Chao Yu	55f57d2c42	f2fs: fix double lock in handle_failed_inode In handle_failed_inode, there is a potential deadlock which can happen in below call path: - f2fs_create - f2fs_lock_op down_read(cp_rwsem) - f2fs_add_link - __f2fs_add_link - init_inode_metadata - f2fs_init_security failed - truncate_blocks failed - handle_failed_inode - f2fs_truncate - truncate_blocks(..,true) - write_checkpoint - block_operations - f2fs_lock_all down_write(cp_rwsem) - f2fs_lock_op down_read(cp_rwsem) So in this path, we pass parameter to f2fs_truncate to make sure cp_rwsem in truncate_blocks will not be locked again. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:09 -07:00
Chao Yu	727edac572	f2fs: use atomic_t to record hit ratio info of extent cache Variables for recording extent cache ratio info were updated without protection, this patch tries to alter them to atomic_t type for more accurate stat. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:06 -07:00
Chao Yu	d5e8f6c980	f2fs: stat inline xattr inode number This patch adds to stat the number of inline xattr inode for showing in debugfs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-05 08:08:05 -07:00
Chao Yu	c1c1b58359	f2fs: add new ioctl F2FS_IOC_GARBAGE_COLLECT When background gc is off, the only way to trigger gc is executing a force gc in some operations who wants to grab space in disk. The executing condition is limited: to execute force gc, we should wait for the time when there is almost no more free section for LFS allocation. This seems not reasonable for our user who wants to control triggering gc by himself. This patch introduces F2FS_IOC_GARBAGE_COLLECT interface for triggering garbage collection by using ioctl. It provides our users one more option to trigger gc. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:58 -07:00
Chao Yu	a28ef1f5ae	f2fs: maintain extent cache in separated file This patch moves extent cache related code from data.c into extent_cache.c since extent cache is independent feature, and its codes are not relate to others in data.c, it's better for us to maintain them in separated place. There is no functionality change, but several small coding style fixes including: * rename __drop_largest_extent to f2fs_drop_largest_extent for exporting; * rename misspelled word 'untill' to 'until'; * remove unneeded 'return' in the end of f2fs_destroy_extent_tree(). Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:58 -07:00
Jaegeuk Kim	3e72f72139	f2fs: use extent_cache by default We don't need to handle the duplicate extent information. The integrated rule is: - update on-disk extent with largest one tracked by in-memory extent_cache - destroy extent_tree for the truncation case - drop per-inode extent_cache by shrinker Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:56 -07:00
Jaegeuk Kim	554df79e52	f2fs: shrink extent_cache entries This patch registers shrinking extent_caches. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:55 -07:00
Jaegeuk Kim	2658e50de6	f2fs: introduce a shrinker for mounted fs This patch introduces a shrinker targeting to reduce memory footprint consumed by a number of in-memory f2fs data structures. In addition, it newly adds: - sbi->umount_mutex to avoid data races on shrinker and put_super - sbi->shruinker_run_no to not revisit objects Note that the basic implementation was copied from fs/ubifs/shrinker.c Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:54 -07:00
Jaegeuk Kim	c9b63bd01d	f2fs: avoid to use failed inode immediately Before iput is called, the inode number used by a bad inode can be reassigned to other new inode, resulting in any abnormal behaviors on the new inode. This should not happen for the new inode. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:53 -07:00
Chao Yu	5ac9f36fca	f2fs: fix to record dirty page count for symlink Dirty page can be exist in mapping of newly created symlink, but previously we did not maintain the counting of dirty page for symlink like we maintained for regular/directory, so the counting we lookuped should be wrong. This patch adds missed dirty page counting for symlink to fix this issue. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-08-04 14:09:52 -07:00
Chao Yu	c5bda1c8b1	f2fs: skip committing valid superblock In recovery procedure for superblock, we try to write data of valid superblock into invalid one for recovery, work should be finished here, but then still we will write the valid one with its original data. This operation is not needed. Let's skip doing this unnecessary work. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-06-08 11:20:51 -07:00
Chao Yu	528e34593d	f2fs: hide common code in f2fs_replace_block This patch clean up codes through: 1.rename f2fs_replace_block to __f2fs_replace_block(). 2.introduce new f2fs_replace_block() to include __f2fs_replace_block() and some common related codes around __f2fs_replace_block(). Then, newly introduced function f2fs_replace_block can be used by following patch. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-06-02 09:52:07 -07:00
Jaegeuk Kim	26bf3dc7e2	f2fs crypto: use per-inode tfm structure This patch applies the following ext4 patch: ext4 crypto: use per-inode tfm structure As suggested by Herbert Xu, we shouldn't allocate a new tfm each time we read or write a page. Instead we can use a single tfm hanging off the inode's crypt_info structure for all of our encryption needs for that inode, since the tfm can be used by multiple crypto requests in parallel. Also use cmpxchg() to avoid races that could result in crypt_info structure getting doubly allocated or doubly freed. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-06-01 16:21:04 -07:00
Chao Yu	381722d2ac	f2fs: introduce update_meta_page Add a help function update_meta_page() to update meta page with specified buffer. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-06-01 16:21:00 -07:00
Jaegeuk Kim	cfc4d971df	f2fs crypto: split f2fs_crypto_init/exit with two parts This patch splits f2fs_crypto_init/exit with two parts: base initialization and memory allocation. Firstly, f2fs module declares the base encryption memory pointers. Then, allocating internal memories is done at the first encrypted inode access. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-06-01 16:20:59 -07:00
Jaegeuk Kim	8bacf6deb0	f2fs crypto: use slab caches This patch integrates the below patch into f2fs. "ext4 crypto: use slab caches Use slab caches the ext4_crypto_ctx and ext4_crypt_info structures for slighly better memory efficiency and debuggability." Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-06-01 16:20:55 -07:00
Jaegeuk Kim	cbaf042a3c	f2fs crypto: add symlink encryption This patch implements encryption support for symlink. Signed-off-by: Uday Savagaonkar <savagaon@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:55 -07:00
Jaegeuk Kim	e7d5545285	f2fs crypto: add filename encryption for roll-forward recovery This patch adds a bit flag to indicate whether or not i_name in the inode is encrypted. If this name is encrypted, we can't do recover_dentry during roll-forward. So, f2fs_sync_file() needs to do checkpoint, if this will be needed in future. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:55 -07:00
Jaegeuk Kim	6e22c691ba	f2fs crypto: add filename encryption for f2fs_lookup This patch implements filename encryption support for f2fs_lookup. Note that, f2fs_find_entry should be outside of f2fs_(un)lock_op(). Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:54 -07:00
Jaegeuk Kim	d8c6822a05	f2fs crypto: add filename encryption for f2fs_readdir This patch implements filename encryption support for f2fs_readdir. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:53 -07:00
Jaegeuk Kim	4375a33664	f2fs crypto: add encryption support in read/write paths This patch adds encryption support in read and write paths. Note that, in f2fs, we need to consider cleaning operation. In cleaning procedure, we must avoid encrypting and decrypting written blocks. So, this patch implements move_encrypted_block(). Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:52 -07:00
Jaegeuk Kim	fcc85a4d86	f2fs crypto: activate encryption support for fs APIs This patch activates the following APIs for encryption support. The rules quoted by ext4 are: - An unencrypted directory may contain encrypted or unencrypted files or directories. - All files or directories in a directory must be protected using the same key as their containing directory. - Encrypted inode for regular file should not have inline_data. - Encrypted symlink and directory may have inline_data and inline_dentry. This patch activates the following APIs. 1. f2fs_link : validate context 2. f2fs_lookup : '' 3. f2fs_rename : '' 4. f2fs_create/f2fs_mkdir : inherit its dir's context 5. f2fs_direct_IO : do buffered io for regular files 6. f2fs_open : check encryption info 7. f2fs_file_mmap : '' 8. f2fs_setattr : '' 9. f2fs_file_write_iter : '' (Called by sys_io_submit) 10. f2fs_fallocate : do not support fcollapse 11. f2fs_evict_inode : free_encryption_info Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:51 -07:00
Jaegeuk Kim	6b3bd08f93	f2fs crypto: filename encryption facilities This patch adds filename encryption infra. Most of codes are copied from ext4 part, but changed to adjust f2fs directory structure. Signed-off-by: Uday Savagaonkar <savagaon@google.com> Signed-off-by: Ildar Muslukhov <ildarm@google.com> Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:50 -07:00
Jaegeuk Kim	0adda907f2	f2fs crypto: add encryption key management facilities This patch copies from encrypt_key.c in ext4, and modifies for f2fs. Use GFP_NOFS, since _f2fs_get_encryption_info is called under f2fs_lock_op. Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:49 -07:00
Jaegeuk Kim	57e5055b0a	f2fs crypto: add f2fs encryption facilities Most of parts were copied from ext4, except: - add f2fs_restore_and_release_control_page which returns control page and restore control page - remove ext4_encrypted_zeroout() - remove sbi->s_file_encryption_mode & sbi->s_dir_encryption_mode - add f2fs_end_io_crypto_work for mpage_end_io Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Ildar Muslukhov <ildarm@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:49 -07:00
Jaegeuk Kim	f424f664f0	f2fs crypto: add encryption policy and password salt support This patch adds encryption policy and password salt support through ioctl implementation. It adds three ioctls: F2FS_IOC_SET_ENCRYPTION_POLICY, F2FS_IOC_GET_ENCRYPTION_POLICY, F2FS_IOC_GET_ENCRYPTION_PWSALT, which use xattr operations. Note that, these definition and codes are taken from ext4 crypto support. For f2fs, xattr operations and on-disk flags for superblock and inode were changed. Signed-off-by: Michael Halcrow <mhalcrow@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Ildar Muslukhov <muslukhovi@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:48 -07:00
Jaegeuk Kim	cde4de1205	f2fs crypto: declare some definitions for f2fs encryption feature This definitions will be used by inode and superblock for encyption. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:45 -07:00
Jaegeuk Kim	7f63eb77af	f2fs: report unwritten area in f2fs_fiemap This patch slightly changes f2fs_fiemap function to report unwritten area. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:45 -07:00
Chao Yu	19f106bc03	f2fs: introduce f2fs_replace_block() for reuse Introduce a generic function replace_block base on recover_data_page, and export it. So with it we can operate file's meta data which is in CP/SSA area when we invoke fallocate with FALLOC_FL_COLLAPSE_RANGE flag. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:42 -07:00
Chao Yu	402f721d2f	f2fs: remove unneeded f2fs_make_empty declaration Remove f2fs_make_empty() declaration, since the main body of this function is move into do_make_empty_dir() and the function is obsolete now. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:40 -07:00
Jaegeuk Kim	836b5a6356	f2fs: issue discard with finally produced len and minlen This patch determines to issue discard commands by comparing given minlen and the length of produced final candidates. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:39 -07:00
Jaegeuk Kim	a66cdd9855	f2fs: introduce discard_map for f2fs_trim_fs This patch adds a bitmap for discard issues from f2fs_trim_fs. There-in rule is to issue discard commands only for invalidated blocks after mount. Once mount is done, f2fs_trim_fs trims out whole invalid area. After ehn, it will not issue and discrads redundantly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:39 -07:00
Jaegeuk Kim	43f3eae1d3	f2fs: split find_data_page according to specific purposes This patch splits find_data_page as follows. 1. f2fs_gc - use get_read_data_page() with read only 2. find_in_level - use find_data_page without locked page 3. truncate_partial_page - In the case cache_only mode, just drop cached page. - Ohterwise, use get_lock_data_page() and guarantee to truncate Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:37 -07:00
Jaegeuk Kim	2dcf51ab2f	f2fs: add need_dentry_mark This patch introduces need_dentry_mark() to clean up and avoid redundant node locks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:36 -07:00
Jaegeuk Kim	eaa693f4dc	f2fs: introduce dot and dotdot name check This patch adds an inline function to check dot and dotdot names. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:34 -07:00
Jaegeuk Kim	05ca3632e5	f2fs: add sbi and page pointer in f2fs_io_info This patch adds f2fs_sb_info and page pointers in f2fs_io_info structure. With this change, we can reduce a lot of parameters for IO functions. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:32 -07:00
Jaegeuk Kim	01b960e94a	f2fs: add f2fs_may_inline_{data, dentry} This patch adds f2fs_may_inline_data and f2fs_may_inline_dentry. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:32 -07:00
Jaegeuk Kim	26d815ad75	f2fs: introduce f2fs_commit_super This patch introduces f2fs_commit_super to write updated superblock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:29 -07:00
Jaegeuk Kim	003a3e1d60	f2fs: add f2fs_map_blocks This patch introduces f2fs_map_blocks structure likewise ext4_map_blocks. Now, f2fs uses f2fs_map_blocks when handling get_block. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:29 -07:00
Jaegeuk Kim	76f105a2db	f2fs: add feature facility in superblock This patch introduces a feature in superblock, which will indicate any new features for f2fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:28 -07:00
Jaegeuk Kim	b5492af78c	f2fs: move existing definitions into f2fs.h This patch moves some inode-related definitions from node.h to f2fs.h to add new features. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-28 15:41:27 -07:00
Chao Yu	2aa7c51a45	f2fs: make has_fsynced_inode static has_fsynced_inode() has no other caller out of node.c, make it static. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-07 11:38:32 -07:00
Chao Yu	f0c9cadae6	f2fs: use is_valid_blkaddr to verify blkaddr for readability Export is_valid_blkaddr() and use it to replace some codes for readability. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-05-07 11:38:32 -07:00
Jaegeuk Kim	5463e7c18e	Revert "f2fs: enhance multi-threads performance" This reports performance regression by Yuanhan Liu. The basic idea was to reduce one-point mutex, but it turns out this causes another contention like context swithes. https://lkml.org/lkml/2015/4/21/11 Until finishing the analysis on this issue, I'd like to revert this for a while. This reverts commit `78373b7319`.	2015-05-04 14:15:15 -07:00
Linus Torvalds	9ec3a646fe	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fourth vfs update from Al Viro: "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() VFS: assorted d_backing_inode() annotations VFS: fs/inode.c helpers: d_inode() annotations VFS: fs/cachefiles: d_backing_inode() annotations VFS: fs library helpers: d_inode() annotations VFS: assorted weird filesystems: d_inode() annotations VFS: normal filesystems (and lustre): d_inode() annotations VFS: security/: d_inode() annotations VFS: security/: d_backing_inode() annotations VFS: net/: d_inode() annotations VFS: net/unix: d_backing_inode() annotations VFS: kernel/: d_inode() annotations VFS: audit: d_backing_inode() annotations VFS: Fix up some ->d_inode accesses in the chelsio driver VFS: Cachefiles should perform fs modifications on the top layer only VFS: AF_UNIX sockets should call mknod on the top layer only	2015-04-26 17:22:07 -07:00
Jaegeuk Kim	10027551cc	f2fs: pass checkpoint reason on roll-forward recovery This patch adds CP_RECOVERY to remain recovery information for checkpoint. And, it makes sure writing checkpoint in this case. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-16 09:45:40 -07:00
David Howells	2b0143b5c9	VFS: normal filesystems (and lustre): d_inode() annotations that's the bulk of filesystem drivers dealing with inodes of their own Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-04-15 15:06:57 -04:00
Jaegeuk Kim	510022a858	f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries If f2fs was corrupted with missing dot dentries, it needs to recover them after fsck.f2fs detection. The underlying precedure is: 1. The fsck.f2fs remains F2FS_INLINE_DOTS flag in directory inode, if it detects missing dot dentries. 2. When f2fs looks up the corrupted directory, it triggers f2fs_add_link with proper inode numbers and their dot and dotdot names. 3. Once f2fs recovers the directory without errors, it removes F2FS_INLINE_DOTS finally. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:57 -07:00
Chao Yu	0bdee48250	f2fs: preserve extent info for extent cache This patch tries to preserve last extent info in extent tree cache into on-disk inode, so this can help us to reuse the last extent info next time for performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:51 -07:00
Chao Yu	028a41e893	f2fs: initialize extent tree with on-disk extent info of inode With normal extent info cache, we records largest extent mapping between logical block and physical block into extent info, and we persist extent info in on-disk inode. When we enable extent tree cache, if extent info of on-disk inode is exist, and the extent is not a small fragmented mapping extent. We'd better to load the extent info into extent tree cache when inode is loaded. By this way we can have more chance to hit extent tree cache rather than taking more time to read dnode page for block address. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:50 -07:00
Chao Yu	216a620a7c	f2fs: split set_data_blkaddr from f2fs_update_extent_cache Split __set_data_blkaddr from f2fs_update_extent_cache for readability. Additionally rename __set_data_blkaddr to set_data_blkaddr for exporting. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:49 -07:00
Jaegeuk Kim	8ce67cb07d	f2fs: add some tracepoints to debug volatile and atomic writes Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:47 -07:00
Jaegeuk Kim	3c6c2bebef	f2fs: avoid punch_hole overhead when releasing volatile data This patch is to avoid some punch_hole overhead when releasing volatile data. If volatile data was not written yet, we just can make the first page as zero. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:46 -07:00
Jaegeuk Kim	78373b7319	f2fs: enhance multi-threads performance Previously, f2fs_write_data_pages has a mutex, sbi->writepages, to serialize data writes to maximize write bandwidth, while sacrificing multi-threads performance. Practically, however, multi-threads environment is much more important for users. So this patch tries to remove the mutex. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:45 -07:00
Chao Yu	0bfcfcca3d	f2fs: fix to truncate inline data past EOF Previously if inode is with inline data, we will try to invalid partial inline data in page #0 when we truncate size of inode in truncate_partial_data_page(). And then we set page #0 to dirty, after this we can synchronize inode page with page #0 at ->writepage(). But sometimes we will fail to operate page #0 in truncate_partial_data_page() due to below reason: a) if offset is zero, we will skip setting page #0 to dirty. b) if page #0 is not uptodate, we will fail to update it as it has no mapping data. So with following operations, we will meet recent data which should be truncated. 1.write inline data to file 2.sync first data page to inode page 3.truncate file size to 0 4.truncate file size to max_inline_size 5.echo 1 > /proc/sys/vm/drop_caches 6.read file --> meet original inline data which is remained in inode page. This patch renames truncate_inline_data() to truncate_inline_inode() for code readability, then use truncate_inline_inode() to truncate inline data in inode page in truncate_blocks() and truncate page #0 in truncate_partial_data_page() for fixing. v2: o truncate partially #0 page in truncate_partial_data_page to avoid keeping old data in #0 page. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:41 -07:00
Changman Lee	e1235983e3	f2fs: add stat info for moved blocks by background gc This patch is for looking into gc performance of f2fs in detail. Signed-off-by: Changman Lee <cm224.lee@samsung.com> [Jaegeuk Kim: fix build errors] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:32 -07:00
Wanpeng Li	551414861f	f2fs: introduce macro __cp_payload This patch introduce macro __cp_payload. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:08:25 -07:00
Jaegeuk Kim	1abff93d01	f2fs: support fs shutdown This patch introduces a generic ioctl for fs shutdown, which was used by xfs. If this shutdown is triggered, filesystem stops any further IOs according to the following options. 1. FS_GOING_DOWN_FULLSYNC : this will flush all the data and dentry blocks, and do checkpoint before shutdown. 2. FS_GOING_DOWN_METASYNC : this will do checkpoint before shutdown. 3. FS_GOING_DOWN_NOSYNC : this will trigger shutdown as is. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-04-10 15:07:57 -07:00
Chao Yu	62c8af651b	f2fs: support fast lookup in extent cache This patch adds a fast lookup path for rb-tree extent cache. In this patch we add a recently accessed extent node pointer 'cached_en' in extent tree. In lookup path of extent cache, we will firstly lookup the last accessed extent node which cached_en points, if we do not hit in this node, we will try to lookup extent node in rb-tree. By this way we can avoid unnecessary slow lookup in rb-tree sometimes. Note that, side-effect of this patch is that we will increase memory cost, because we will store a pointer variable in each struct extent tree additionally. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:47 -08:00
Chao Yu	4bf6fd9fed	f2fs: show extent tree, node stat info in debugfs This patch add and show stat info of total memory footprint for extent tree,node in debugfs. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:47 -08:00
Chao Yu	8967215954	f2fs: add a mount option for rb-tree extent cache This patch adds a mount option 'extent_cache' in f2fs. It is try to use a rb-tree based extent cache to cache more mapping information with less memory if this option is set, otherwise we will use the original one extent info cache. Suggested-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:46 -08:00
Chao Yu	429511cdf8	f2fs: add core functions for rb-tree extent cache This patch adds core functions including slab cache init function and init/lookup/update/shrink/destroy function for rb-tree based extent cache. Thank Jaegeuk Kim and Changman Lee as they gave much suggestion about detail design and implementation of extent cache. Todo: * register rb-based extent cache shrink with mm shrink interface. v2: o move set_extent_info and __is_{extent,back,front}_mergeable into f2fs.h. o introduce __{attach,detach}_extent_node for code readability. o add cond_resched() when fail to invoke kmem_cache_alloc/radix_tree_insert. o fix some coding style and typo issues. v3: o fix oops due to using an unassigned pointer. o use list_del to remove extent node in shrink list. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Changman Lee <cm224.lee@samsung.com> [Jaegeuk Kim: add static for some funcitons and declare in f2fs.h] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:46 -08:00
Chao Yu	13054c548a	f2fs: introduce infra macro and data structure of rb-tree extent cache Introduce infra macro and data structure for rb-tree based extent cache: Macros: * EXT_TREE_VEC_SIZE: indicate vector size for gang lookup in extent tree. * F2FS_MIN_EXTENT_LEN: indicate minimum length of extent managed in cache. * EXTENT_CACHE_SHRINK_NUMBER: indicate number of extent in cache will be shrunk. Basic data structures for extent cache: * struct extent_tree: extent tree entry per inode. * struct extent_node: extent info node linked in extent tree. Besides, adding new extent cache related fields in f2fs_sb_info. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:46 -08:00
Chao Yu	7e4dde79df	f2fs: introduce universal lookup/update interface for extent cache In this patch, we do these jobs: 1. rename {check,update}_extent_cache to {lookup,update}_extent_info; 2. introduce universal lookup/update interface of extent cache: f2fs_{lookup,update}_extent_cache including above two real functions, then export them to function callers. So after above cleanup, we can add new rb-tree based extent cache into exported interfaces. v2: o remove "f2fs_" for inner function {lookup,update}_extent_info suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:46 -08:00
Chao Yu	4d0b0bd438	f2fs: simplfy a field name in struct f2fs_extent,extent_info Rename a filed name from 'blk_addr' to 'blk' in struct {f2fs_extent,extent_info} as annotation of this field descripts its meaning well to us. By this way, we can avoid long statement in code of following patches. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:45 -08:00
Chao Yu	0c872e2ded	f2fs: move ext_lock out of struct extent_info Move ext_lock out of struct extent_info, then in the following patches we can use variables with struct extent_info type as a parameter to pass pure data. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:45 -08:00
Chao Yu	3b4d732a56	f2fs: introduce f2fs_update_dentry to clean up duplicated codes This patch introduces f2fs_update_dentry to remove redundant code in f2fs_add_inline_entry and __f2fs_add_link. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:44 -08:00
Chao Yu	1753396a0a	f2fs: remove unused inline_dentry_addr inline_dentry_addr is introduced with inline dentry feature without being used, now we do not need to keep it for any reason, so remove it. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-03-03 09:58:44 -08:00
Jaegeuk Kim	29e7043f40	f2fs: fix sparse warnings This patch resolves the following warnings. include/trace/events/f2fs.h:150:1: warning: expression using sizeof bool include/trace/events/f2fs.h:180:1: warning: expression using sizeof bool include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool include/trace/events/f2fs.h:990:1: warning: expression using sizeof bool include/trace/events/f2fs.h:150:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) include/trace/events/f2fs.h:180:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) include/trace/events/f2fs.h:990:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1) fs/f2fs/checkpoint.c:27:19: warning: symbol 'inode_entry_slab' was not declared. Should it be static? fs/f2fs/checkpoint.c:577:15: warning: cast to restricted __le32 fs/f2fs/checkpoint.c:592:15: warning: cast to restricted __le32 fs/f2fs/trace.c:19:1: warning: symbol 'pids' was not declared. Should it be static? fs/f2fs/trace.c:21:21: warning: symbol 'last_io' was not declared. Should it be static? Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:49 -08:00
Jaegeuk Kim	bba681cbb2	f2fs: introduce a batched trim This patch introduces a batched trimming feature, which submits split discard commands. This is to avoid long latency due to huge trim commands. If fstrim was triggered ranging from 0 to the end of device, we should lock all the checkpoint-related mutexes, resulting in very long latency. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:44 -08:00
Chao Yu	487261f39b	f2fs: merge {invalidate,release}page for meta/node/data pages This patch merges ->{invalidate,release}page function for meta/node/data pages. After this, duplication of codes could be removed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:44 -08:00
Jaegeuk Kim	d24bdcbfc6	f2fs: show the number of writeback pages in stat This patch adds the # of writeback pages in stat info. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:43 -08:00
Jaegeuk Kim	119ee91445	f2fs: split UMOUNT and FASTBOOT flags This patch adds FASTBOOT flag into checkpoint as follows. - CP_UMOUNT_FLAG is set when system is umounted. - CP_FASTBOOT_FLAG is set when intermediate checkpoint having node summaries was done. So, if you get CP_UMOUNT_FLAG from checkpoint, the system was umounted cleanly. Instead, if there was sudden-power-off, you can get CP_FASTBOOT_FLAG or nothing. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:41 -08:00
Chao Yu	caf0047e7e	f2fs: merge flags in struct f2fs_sb_info Currently, there are several variables with Boolean type as below: struct f2fs_sb_info { ... int s_dirty; bool need_fsck; bool s_closing; ... bool por_doing; ... } For this there are some issues: 1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean type variables by compiler. 2. if we continuously add new flag into f2fs_sb_info, structure will be messed up. So in this patch, we try to: 1. switch s_dirty to Boolean type variable since it has two status 0/1. 2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag. 3. introduce an enum type which can indicate different states of sbi. 4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag to operate flags for sbi. After that, above issues will be fixed. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:38 -08:00
Chao Yu	feeb0debfb	f2fs: make truncate_inline_date static 1. make truncate_inline_date static; 2. remove parameter @from of truncate_inline_date as callers only pass zero. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:37 -08:00
Chao Yu	d49f3e8902	f2fs: add F2FS_IOC_GETVERSION support In this patch we add the FS_IOC_GETVERSION ioctl for getting i_generation from inode, after that, users can list file's generation number by using "lsattr -v". Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:35 -08:00
Jaegeuk Kim	30a5537f9a	f2fs: trigger correct checkpoint during umount This patch fixes to trigger checkpoint with umount flag when kill_sb was called. In kill_sb, f2fs_sync_fs was finally called, but at this time, f2fs can't do checkpoint with CP_UMOUNT. After then, f2fs_put_super is not doing checkpoint, since it is not dirty. So, this patch adds a flag to indicate f2fs_sync_fs is called during umount. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:32 -08:00
Jaegeuk Kim	6f0aacbc3c	f2fs: update memory footprint information This patch adds missing memory usages, and splits them in detail. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-02-11 17:04:31 -08:00
Jaegeuk Kim	dd4e4b59b1	f2fs: add nat/sit entries into status This patch adds NAT/SIT entry informations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:28 -08:00
Jaegeuk Kim	38aa0889b2	f2fs: align direct_io'ed data to section This patch aligns the start block address of a file for direct io to the f2fs's section size. Some flash devices manage an over 4KB-sized page as a write unit, and if the direct_io'ed data are written but not aligned to that unit, the performance can be degraded due to the partial page copies. Thus, since f2fs has a section that is well aligned to FTL units, we can align the block address to the section size so that f2fs avoids this misalignment. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:27 -08:00
Jaegeuk Kim	e1509cf294	f2fs: clean up to remove parameter This patch uses dn->data_blkaddr as a parameter for the destination block address. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:26 -08:00
Chao Yu	062920734c	f2fs: reuse inode_entry_slab in gc procedure for using slab more effectively There are two slab cache inode_entry_slab and winode_slab using the same structure as below: struct dir_inode_entry { struct list_head list; /* list head / struct inode inode; /* vfs inode pointer / }; struct inode_entry { struct list_head list; struct inode inode; }; It's a little waste that the two cache can not share their memory space for each other. So in this patch we remove one redundant winode_slab slab cache, then use more universal name struct inode_entry as remaining data structure name of slab, finally we reuse the inode_entry_slab to store dirty dir item and gc item for more effective. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:26 -08:00
Changman Lee	b9a2c25207	f2fs: add block count by in-place-update in stat info This patch adds block count by in-place-update in stat. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:25 -08:00
Jaegeuk Kim	cf04e8eb55	f2fs: use f2fs_io_info to clean up messy parameters during IO path This patch cleans up parameters on IO paths. The key idea is to use f2fs_io_info adding a parameter, block address, and then use this structure as parameters. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:23 -08:00
Chao Yu	3fa06d7bc9	f2fs: readahead contiguous current summary blocks in checkpoint Let's add readahead code for reading contiguous compact/normal summary blocks in checkpoint, then we will gain better performance in mount procedure. Changes from v1 o remove inappropriate 'unlikely' in npages_for_summary_flush. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:23 -08:00
Jaegeuk Kim	042b7816aa	f2fs: remove unnecessary call to invalidate inmemory pages Now we use inmemory pages for atomic write only and provide abort procedure, we don't need to truncate them explicitly. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:22 -08:00
Jaegeuk Kim	1e84371ffe	f2fs: change atomic and volatile write policies This patch adds two new ioctls to release inmemory pages grabbed by atomic writes. o f2fs_ioc_abort_volatile_write - If transaction was failed, all the grabbed pages and data should be written. o f2fs_ioc_release_volatile_write - This is to enhance the performance of PERSIST mode in sqlite. In order to avoid huge memory consumption which causes OOM, this patch changes volatile writes to use normal dirty pages, instead blocked flushing to the disk as long as system does not suffer from memory pressure. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2015-01-09 17:02:22 -08:00
Chao Yu	635aee1fef	f2fs: avoid to ra unneeded blocks in recover flow To improve recovery speed, f2fs try to readahead many contiguous blocks in warm node segment, but for most time, abnormal power-off do not occur frequently, so when mount a normal power-off f2fs image, by contrary ra so many blocks and then invalid them will hurt the performance of mount. It's better to just ra the first next-block for normal condition. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-08 14:19:09 -08:00
Chao Yu	03e14d522e	f2fs: use atomic for counting inode with inline_{dir,inode} flag As inline_{dir,inode} stat is increased/decreased concurrently by multi threads, so the value is not so accurate, let's use atomic type for counting accurately. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-08 10:54:59 -08:00
Jaegeuk Kim	8dcf2ff721	f2fs: count the number of inmemory pages This patch adds counting # of inmemory pages in the page cache. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-08 10:35:15 -08:00
Jaegeuk Kim	9be32d72be	f2fs: do retry operations with cond_resched This patch revists retrial paths in f2fs. The basic idea is to use cond_resched instead of retrying from the very early stage. Suggested-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-08 10:35:05 -08:00
Jaegeuk Kim	8b26ef98da	f2fs: use rw_semaphore for nat entry lock Previoulsy, we used rwlock for nat_entry lock. But, now we have a lot of complex operations in set_node_addr. (e.g., allocating kernel memories, handling radix_trees, and so on) So, this patches tries to change spinlock to rw_semaphore to give CPUs to other threads. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-12-03 21:23:29 -08:00
Jaegeuk Kim	9486ba442b	f2fs: introduce f2fs_dentry_kunmap to clean up This patch introduces f2fs_dentry_kunmap to clean up dirty codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-23 21:51:53 -08:00
Chao Yu	67298804f3	f2fs: introduce struct inode_management to wrap inner fields Now in f2fs, we have three inode cache: ORPHAN_INO, APPEND_INO, UPDATE_INO, and we manage fields related to inode cache separately in struct f2fs_sb_info for each inode cache type. This makes codes a bit messy, so that this patch intorduce a new struct inode_management to wrap inner fields as following which make codes more neat. /* for inner inode cache management / struct inode_management { struct radix_tree_root ino_root; / ino entry array / spinlock_t ino_lock; / for ino entry lock / struct list_head ino_list; / inode list head / unsigned long ino_num; / number of entries / }; struct f2fs_sb_info { ... struct inode_management im[MAX_INO_ENTRY]; / manage inode cache */ ... } Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-19 22:49:32 -08:00
Jaegeuk Kim	8c402946f0	f2fs: introduce the number of inode entries This patch adds to monitor the number of ino entries. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-06 15:17:43 -08:00
Jaegeuk Kim	d5053a34a9	f2fs: introduce -o fastboot for reducing booting time only If a system wants to reduce the booting time as a top priority, now we can use a mount option, -o fastboot. With this option, f2fs conducts a little bit slow write_checkpoint, but it can avoid the node page reads during the next mount time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-04 17:34:15 -08:00
Jaegeuk Kim	6a8f8ca582	f2fs: avoid race condition in handling wait_io __submit_merged_bio f2fs_write_end_io f2fs_write_end_io wait_io = X wait_io = x complete(X) complete(X) wait_io = NULL wait_for_completion() free(X) spin_lock(X) kernel panic In order to avoid this, this patch removes the wait_io facility. Instead, we can use wait_on_all_pages_writeback(sbi) to wait for end_ios. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-04 17:34:14 -08:00
Jaegeuk Kim	b3d208f96d	f2fs: revisit inline_data to avoid data races and potential bugs This patch simplifies the inline_data usage with the following rule. 1. inline_data is set during the file creation. 2. If new data is requested to be written ranges out of inline_data, f2fs converts that inode permanently. 3. There is no cases which converts non-inline_data inode to inline_data. 4. The inline_data flag should be changed under inode page lock. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-04 17:34:11 -08:00
Gu Zheng	52aca07425	f2fs: rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit Rename f2fs_set/clear_bit to f2fs_test_and_set/clear_bit, which mean set/clear bit and return the old value, for better readability. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:36 -08:00
Gu Zheng	c6ac4c0ec4	f2fs: introduce f2fs_change_bit to simplify the change bit logic Introduce f2fs_change_bit to simplify the change bit logic in function set_to_next_nat{sit}. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:36 -08:00
Gu Zheng	fa528722d0	f2fs: remove the redundant function cond_clear_inode_flag Use clear_inode_flag to replace the redundant cond_clear_inode_flag. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:36 -08:00
Jaegeuk Kim	062a3e7ba7	f2fs: reuse make_empty_dir code for inline_dentry This patch introduces do_make_empty_dir to mitigate code redundancy for inline_dentry. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:34 -08:00
Jaegeuk Kim	7b3cd7d6f0	f2fs: introduce f2fs_dentry_ptr structure for code clean-up This patch introduces f2fs_dentry_ptr structure for the use of a function parameter in inline_dentry operations. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:34 -08:00
Jaegeuk Kim	38594de767	f2fs: reuse core function in f2fs_readdir for inline_dentry This patch introduces a core function, f2fs_fill_dentries, to remove redundant code in f2fs_readdir and f2fs_read_inline_dir. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:34 -08:00
Jaegeuk Kim	3289c061c5	f2fs: add stat info for inline_dentry inodes This patch adds status information for inline_dentry inodes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:33 -08:00
Jaegeuk Kim	bce8d11207	f2fs: avoid deadlock on init_inode_metadata Previously, init_inode_metadata does not hold any parent directory's inode page. So, f2fs_init_acl can grab its parent inode page without any problem. But, when we use inline_dentry, that page is grabbed during f2fs_add_link, so that we can fall into deadlock condition like below. INFO: task mknod:11006 blocked for more than 120 seconds. Tainted: G OE 3.17.0-rc1+ #13 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mknod D ffff88003fc94580 0 11006 11004 0x00000000 ffff880007717b10 0000000000000002 ffff88003c323220 ffff880007717fd8 0000000000014580 0000000000014580 ffff88003daecb30 ffff88003c323220 ffff88003fc94e80 ffff88003ffbb4e8 ffff880007717ba0 0000000000000002 Call Trace: [<ffffffff8173dc40>] ? bit_wait+0x50/0x50 [<ffffffff8173d4cd>] io_schedule+0x9d/0x130 [<ffffffff8173dc6c>] bit_wait_io+0x2c/0x50 [<ffffffff8173da3b>] __wait_on_bit_lock+0x4b/0xb0 [<ffffffff811640a7>] __lock_page+0x67/0x70 [<ffffffff810acf50>] ? autoremove_wake_function+0x40/0x40 [<ffffffff811652cc>] pagecache_get_page+0x14c/0x1e0 [<ffffffffa029afa9>] get_node_page+0x59/0x130 [f2fs] [<ffffffffa02a63ad>] read_all_xattrs+0x24d/0x430 [f2fs] [<ffffffffa02a6ca2>] f2fs_getxattr+0x52/0xe0 [f2fs] [<ffffffffa02a7481>] f2fs_get_acl+0x41/0x2d0 [f2fs] [<ffffffff8122d847>] get_acl+0x47/0x70 [<ffffffff8122db5a>] posix_acl_create+0x5a/0x150 [<ffffffffa02a7759>] f2fs_init_acl+0x29/0xcb [f2fs] [<ffffffffa0286a8d>] init_inode_metadata+0x5d/0x340 [f2fs] [<ffffffffa029253a>] f2fs_add_inline_entry+0x12a/0x2e0 [f2fs] [<ffffffffa0286ea5>] __f2fs_add_link+0x45/0x4a0 [f2fs] [<ffffffffa028b5b6>] ? f2fs_new_inode+0x146/0x220 [f2fs] [<ffffffffa028b816>] f2fs_mknod+0x86/0xf0 [f2fs] [<ffffffff811e3ec1>] vfs_mknod+0xe1/0x160 [<ffffffff811e4b26>] SyS_mknod+0x1f6/0x200 [<ffffffff81741d7f>] tracesys+0xe1/0xe6 Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:33 -08:00
Jaegeuk Kim	4e6ebf6d49	f2fs: reuse find_in_block code for find_in_inline_dir This patch removes redundant copied code in find_in_inline_dir. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:32 -08:00
Jaegeuk Kim	a82afa2019	f2fs: reuse room_for_filename for inline dentry operation This patch introduces to reuse the existing room_for_filename for inline dentry operation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:32 -08:00
Chao Yu	201a05be96	f2fs: add key function to handle inline dir Adds Functions to implement inline dir init/lookup/insert/delete/convert ops. Signed-off-by: Chao Yu <chao2.yu@samsung.com> [Jaegeuk Kim: remove needless reserved area copy, pointed by Dan Carpenter] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:31 -08:00
Chao Yu	dbeacf02eb	f2fs: export dir operations for inline dir This patch exports some dir operations for inline dir, additionally introduces f2fs_drop_nlink from f2fs_delete_entry for reusing by inline dir function. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:31 -08:00
Chao Yu	34d67debe0	f2fs: add infra struct and helper for inline dir This patch defines macro/inline dentry structure, and adds some helpers for inline dir infrastructure. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:31 -08:00
Jaegeuk Kim	cbcb2872e3	f2fs: invalidate inmemory page If user truncates file's data, we should truncate inmemory pages too. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:29 -08:00
Jaegeuk Kim	34ba94bac9	f2fs: do not make dirty any inmemory pages This patch let inmemory pages be clean all the time. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-11-03 16:07:29 -08:00
Jaegeuk Kim	02a1335f25	f2fs: support volatile operations for transient data This patch adds support for volatile writes which keep data pages in memory until f2fs_evict_inode is called by iput. For instance, we can use this feature for the sqlite database as follows. While supporting atomic writes for main database file, we can keep its journal data temporarily in the page cache by the following sequence. 1. open -> ioctl(F2FS_IOC_START_VOLATILE_WRITE); 2. writes : keep all the data in the page cache. 3. flush to the database file with atomic writes a. ioctl(F2FS_IOC_START_ATOMIC_WRITE); b. writes c. ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE); 4. close -> drop the cached data Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-10-07 11:54:41 -07:00
Jaegeuk Kim	88b88a6679	f2fs: support atomic writes This patch introduces a very limited functionality for atomic write support. In order to support atomic write, this patch adds two ioctls: o F2FS_IOC_START_ATOMIC_WRITE o F2FS_IOC_COMMIT_ATOMIC_WRITE The database engine should be aware of the following sequence. 1. open -> ioctl(F2FS_IOC_START_ATOMIC_WRITE); 2. writes : all the written data will be treated as atomic pages. 3. commit -> ioctl(F2FS_IOC_COMMIT_ATOMIC_WRITE); : this flushes all the data blocks to the disk, which will be shown all or nothing by f2fs recovery procedure. 4. repeat to #2. The IO pattens should be: ,- START_ATOMIC_WRITE ,- COMMIT_ATOMIC_WRITE CP \| D D D D D D \| FSYNC \| D D D D \| FSYNC ... `- COMMIT_ATOMIC_WRITE Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-10-06 17:39:50 -07:00
Jaegeuk Kim	44c1615651	f2fs: call f2fs_unlock_op after error was handled This patch relocates f2fs_unlock_op in every directory operations to be called after any error was processed. Otherwise, the checkpoint can be entered with valid node ids without its dentry when -ENOSPC is occurred. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:34:55 -07:00
Jaegeuk Kim	309cc2b6e7	f2fs: refactor flush_nat_entries to remove costly reorganizing ops Previously, f2fs tries to reorganize the dirty nat entries into multiple sets according to its nid ranges. This can improve the flushing nat pages, however, if there are a lot of cached nat entries, it becomes a bottleneck. This patch introduces a new set management flow by removing dirty nat list and adding a series of set operations when the nat entry becomes dirty. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:30:41 -07:00
Jaegeuk Kim	4b2fecc846	f2fs: introduce FITRIM in f2fs_ioctl This patch introduces FITRIM in f2fs_ioctl. In this case, f2fs will issue small discards and prefree discards as many as possible for the given area. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:06:09 -07:00
Jaegeuk Kim	75ab4cb830	f2fs: introduce cp_control structure This patch add a new data structure to control checkpoint parameters. Currently, it presents the reason of checkpoint such as is_umount and normal sync. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-30 15:01:28 -07:00
Jaegeuk Kim	c52e1b10b1	f2fs: remove redundant operation during roll-forward recovery If same data is updated multiple times, we don't need to redo whole the operations. Let's just update the lastest one. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:17 -07:00
Jaegeuk Kim	88bd02c947	f2fs: fix conditions to remain recovery information in f2fs_sync_file This patch revisited whole the recovery information during the f2fs_sync_file. In this patch, there are three information to make a decision. a) IS_CHECKPOINTED, /* is it checkpointed before? / b) HAS_FSYNCED_INODE, / is the inode fsynced before? / c) HAS_LAST_FSYNC, / has the latest node fsync mark? */ And, the scenarios for our rule are based on: [Term] F: fsync_mark, D: dentry_mark 1. inode(x) \| CP \| inode(x) \| dnode(F) 2. inode(x) \| CP \| inode(F) \| dnode(F) 3. inode(x) \| CP \| dnode(F) \| inode(x) \| inode(F) 4. inode(x) \| CP \| dnode(F) \| inode(F) 5. CP \| inode(x) \| dnode(F) \| inode(DF) 6. CP \| inode(DF) \| dnode(F) 7. CP \| dnode(F) \| inode(DF) 8. CP \| dnode(F) \| inode(x) \| inode(DF) For example, #3, the three conditions should be changed as follows. inode(x) \| CP \| dnode(F) \| inode(x) \| inode(F) a) x o o o o b) x x x x o c) x o o x o If f2fs_sync_file stops ------^, it should write inode(F) --------------^ So, the need_inode_block_update should return true, since c) get_nat_flag(e, HAS_LAST_FSYNC), is false. For example, #8, CP \| alloc \| dnode(F) \| inode(x) \| inode(DF) a) o x x x x b) x x x o c) o o x o If f2fs_sync_file stops -------^, it should write inode(DF) --------------^ Note that, the roll-forward policy should follow this rule, which means, if there are any missing blocks, we doesn't need to recover that inode. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:15 -07:00
Jaegeuk Kim	4c521f493b	f2fs: use meta_inode cache to improve roll-forward speed Previously, all the dnode pages should be read during the roll-forward recovery. Even worsely, whole the chain was traversed twice. This patch removes that redundant and costly read operations by using page cache of meta_inode and readahead function as well. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-23 11:10:12 -07:00
Jaegeuk Kim	c1ce1b02bb	f2fs: give an option to enable in-place-updates during fsync to users If user wrote F2FS_IPU_FSYNC:4 in /sys/fs/f2fs/ipu_policy, f2fs_sync_file only starts to try in-place-updates. And, if the number of dirty pages is over /sys/fs/f2fs/min_fsync_blocks, it keeps out-of-order manner. Otherwise, it triggers in-place-updates. This may be used by storage showing very high random write performance. For example, it can be used when, Seq. writes (Data) + wait + Seq. writes (Node) is pretty much slower than, Rand. writes (Data) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-16 04:10:44 -07:00
Jaegeuk Kim	a7ffdbe22c	f2fs: expand counting dirty pages in the inode page cache Previously f2fs only counts dirty dentry pages, but there is no reason not to expand the scope. This patch changes the names on the management of dirty pages and to count dirty pages in each inode info as well. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-16 04:10:39 -07:00
Gu Zheng	721bd4d5c3	f2fs: use lock-less list(llist) to simplify the flush cmd management We use flush cmd control to collect many flush cmds, and flush them together. In this case, we use two list to manage the flush cmds (collect and dispatch), and one spin lock is used to protect this. In fact, the lock-less list(llist) is very suitable to this case, and we use simplify this routine. - v2: -use llist_for_each_entry_safe to fix possible use-after-free issue. -remove the unused field from struct flush_cmd. Thanks for Yu's suggestion. - Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-09 13:15:06 -07:00
Chao Yu	184a5cd2ce	f2fs: refactor flush_sit_entries codes for reducing SIT writes In commit `aec71382c6` ("f2fs: refactor flush_nat_entries codes for reducing NAT writes"), we descripte the issue as below: "Although building NAT journal in cursum reduce the read/write work for NAT block, but previous design leave us lower performance when write checkpoint frequently for these cases: 1. if journal in cursum has already full, it's a bit of waste that we flush all nat entries to page for persistence, but not to cache any entries. 2. if journal in cursum is not full, we fill nat entries to journal util journal is full, then flush the left dirty entries to disk without merge journaled entries, so these journaled entries may be flushed to disk at next checkpoint but lost chance to flushed last time." Actually, we have the same problem in using SIT journal area. In this patch, firstly we will update sit journal with dirty entries as many as possible. Secondly if there is no space in sit journal, we will remove all entries in journal and walk through the whole dirty entry bitmap of sit, accounting dirty sit entries located in same SIT block to sit entry set. All entry sets are linked to list sit_entry_set in sm_info, sorted ascending order by count of entries in set. Later we flush entries in set which have fewest entries into journal as many as we can, and then flush dense set with merged entries to disk. In this way we can use sit journal area more effectively, also we will reduce SIT update, result in gaining in performance and saving lifetime of flash device. In my testing environment, it shows this patch can help to reduce SIT block update obviously. virtual machine + hard disk: fsstress -p 20 -n 400 -l 5 sit page num cp count sit pages/cp based 2006.50 1349.75 1.486 patched 1566.25 1463.25 1.070 Our latency of merging op is small when handling a great number of dirty SIT entries in flush_sit_entries: latency(ns) dirty sit count 36038 2151 49168 2123 37174 2232 Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-09 13:15:05 -07:00
Jaegeuk Kim	9850cf4a89	f2fs: need fsck.f2fs when f2fs_bug_on is triggered If any f2fs_bug_on is triggered, fsck.f2fs is needed. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-09 13:15:02 -07:00
Jaegeuk Kim	2ae4c673e3	f2fs: retain inconsistency information to initiate fsck.f2fs This patch adds sbi->need_fsck to conduct fsck.f2fs later. This flag can only be removed by fsck.f2fs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-09 13:14:25 -07:00
Jaegeuk Kim	4081363fbe	f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB This patch adds three inline functions to clean up dirty casting codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-09-03 17:37:13 -07:00
Jaegeuk Kim	202095a7a0	f2fs: remove rewrite_node_page I think we need to let the dirty node pages remain in the page cache instead of rewriting them in their places. So, after done with successful recovery, write_checkpoint will flush all of them through the normal write path. Through this, we can avoid potential error cases in terms of block allocation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 13:57:02 -07:00
Jaegeuk Kim	764aa3e978	f2fs: avoid double lock in truncate_blocks The init_inode_metadata calls truncate_blocks when error is occurred. The callers holds f2fs_lock_op, so we should not call it again in truncate_blocks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 13:57:01 -07:00
Jaegeuk Kim	b3fe0a0da2	f2fs: add WARN_ON in f2fs_bug_on This patch adds WARN_ON when f2fs_bug_on is disable to see kernel messages. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 13:56:59 -07:00
Jaegeuk Kim	1e968fdfe6	f2fs: introduce f2fs_cp_error for readability This patch adds f2fs_cp_error for readability. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 09:21:00 -07:00
Jaegeuk Kim	6f12ac25f0	f2fs: trigger release_dirty_inode in f2fs_put_super The generic_shutdown_super calls sync_filesystem, evict_inode, and then f2fs_put_super. In f2fs_evict_inode, we remain some dirty inode information so we should release them at f2fs_put_super. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-21 09:20:29 -07:00
Jaegeuk Kim	1c35a90e8a	f2fs: fix to recover inline_xattr/data and blocks This patch fixes not to skip xattr recovery and inline xattr/data recovery order. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:34 -07:00
Jaegeuk Kim	0342fd301a	f2fs: make clear on test condition and return types This patch adds a parentheses to make clear for condition check. And also it changes the return type for better meanings. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:33 -07:00
Jaegeuk Kim	b067ba1f1b	f2fs: should convert inline_data during the mkwrite If mkwrite is called to an inode having inline_data, it can overwrite the data index space as NEW_ADDR. (e.g., the first 4 bytes are coincidently zero) Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:33 -07:00
arter97	e1c4204520	f2fs: fix typo Fix typo and some grammatical errors. The words "filesystem" and "readahead" are being used without the space treewide. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-19 10:01:33 -07:00
Chao Yu	70cfed88ef	f2fs: avoid skipping recover_inline_xattr after recover_inline_data When we recover data of inode in roll-forward procedure, and the inode has both inline data and inline xattr. We may skip recovering inline xattr if we recover inline data form node page first. This patch will fix the problem that we lost inline xattr data in above scenario. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-08-02 07:43:51 -07:00
Chao Yu	b3582c6892	f2fs: reduce competition among node page writes We do not need to block on ->node_write among different node page writers e.g. fsync/flush, unless we have a node page writer from write_checkpoint. So it's better use rw_semaphore instead of mutex type for ->node_write to promote performance. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 23:28:37 -07:00
Jaegeuk Kim	65b85ccce0	f2fs: fix coding style This patch fixes wrong coding style. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 17:25:54 -07:00
Jaegeuk Kim	cf2271e781	f2fs: avoid retrying wrong recovery routine when error was occurred This patch eliminates the propagation of recovery errors to the next mount. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:35 -07:00
Jaegeuk Kim	61e0f2d0a5	f2fs: test before set/clear bits If the bit is already set, we don't need to reset it, and vice versa. Because we don't need to make the caches dirty for that. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:35 -07:00
Jaegeuk Kim	ea1aa12ca2	f2fs: enable in-place-update for fdatasync This patch enforces in-place-updates only when fdatasync is requested. If we adopt this in-place-updates for the fdatasync, we can skip to write the recovery information. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-30 14:13:23 -07:00
Jaegeuk Kim	fff04f90c1	f2fs: add info of appended or updated data writes This patch introduces a inode number list in which represents inodes having appended data writes or updated data writes after last checkpoint. This will be used at fsync to determine whether the recovery information should be written or not. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:46:11 -07:00
Jaegeuk Kim	39efac41fb	f2fs: use radix_tree for ino management For better ino management, this patch replaces the data structure from list to radix tree. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:46:11 -07:00
Jaegeuk Kim	6451e041c8	f2fs: add infra for ino management This patch changes the naming of orphan-related data structures to use as inode numbers managed globally. Later, we can use this facility for managing any inode number lists. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 07:45:54 -07:00
Jaegeuk Kim	0f7b2abd18	f2fs: add nobarrier mount option This patch adds a mount option, nobarrier, in f2fs. The assumption in here is that file system keeps the IO ordering, but doesn't care about cache flushes inside the storages. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-29 05:27:48 -07:00
Gu Zheng	4b2868aa4f	f2fs: remove the unused stat_lock Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-11 15:01:48 -07:00
Gu Zheng	eee6160f2e	f2fs: arguments cleanup of finding file flow functions Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 14:04:26 -07:00
Chao Yu	aec71382c6	f2fs: refactor flush_nat_entries codes for reducing NAT writes Although building NAT journal in cursum reduce the read/write work for NAT block, but previous design leave us lower performance when write checkpoint frequently for these cases: 1. if journal in cursum has already full, it's a bit of waste that we flush all nat entries to page for persistence, but not to cache any entries. 2. if journal in cursum is not full, we fill nat entries to journal util journal is full, then flush the left dirty entries to disk without merge journaled entries, so these journaled entries may be flushed to disk at next checkpoint but lost chance to flushed last time. In this patch we merge dirty entries located in same NAT block to nat entry set, and linked all set to list, sorted ascending order by entries' count of set. Later we flush entries in sparse set into journal as many as we can, and then flush merged entries to disk. In this way we can not only gain in performance, but also save lifetime of flash device. In my testing environment, it shows this patch can help to reduce NAT block writes obviously. In hard disk test case: cost time of fsstress is stablely reduced by about 5%. 1. virtual machine + hard disk: fsstress -p 20 -n 200 -l 5 node num cp count nodes/cp based 4599.6 1803.0 2.551 patched 2714.6 1829.6 1.483 2. virtual machine + 32g micro SD card: fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0 -f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5 -f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S node num cp count nodes/cp based 84.5 43.7 1.933 patched 49.2 40.0 1.23 Our latency of merging op shows not bad when handling extreme case like: merging a great number of dirty nats: latency(ns) dirty nat count 3089219 24922 5129423 27422 4000250 24523 change log from v1: o fix wrong logic in add_nat_entry when grab a new nat entry set. o swith to create slab cache in create_node_manager_caches. o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency. change log from v2: o make comment position more appropriate suggested by Jaegeuk Kim. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 14:04:25 -07:00
Jaegeuk Kim	a014e037be	f2fs: clean up an unused parameter and assignment This patch cleans up simple unnecessary codes. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 14:04:25 -07:00
Jaegeuk Kim	b97a9b5da8	f2fs: introduce f2fs_do_tmpfile for code consistency This patch adds f2fs_do_tmpfile to eliminate the redundant init_inode_metadata flow. Throught this, we can provide the consistent lock usage, e.g., fi->i_sem, and this will enable better debugging stuffs. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 14:04:24 -07:00
Chao Yu	d6b7d4b31d	f2fs: check lower bound nid value in check_nid_range This patch add lower bound verification for nid in check_nid_range, so nids reserved like 0, node, meta passed by caller could be checked there. And then check_nid_range could be used in f2fs_nfs_get_inode for simplifying code. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 05:58:08 -07:00
Chao Yu	8bc6f60e3f	f2fs: remove unused variables in f2fs_sm_info Remove unused variables in struct f2fs_sm_info. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-07-09 05:57:57 -07:00
Jaegeuk Kim	9ab7013492	f2fs: support f2fs_fiemap This patch links f2fs_fiemap with generic function with get_block. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-08 08:56:49 +09:00
Jaegeuk Kim	b6fe5873cb	f2fs: fix to recover data written by dio If data are overwritten through dio, previous f2fs doesn't remain the fsync mark due to no additional node writes. Note that this patch should resolve the xfstests:311. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2014-06-04 18:41:38 +09:00

... 3 4 5 6 7 ...

597 Commits