linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-11 18:26:41 +07:00

Author	SHA1	Message	Date
Jaegeuk Kim	3cb5ad152b	f2fs: call f2fs_wait_on_page_writeback instead of native function If a page is on writeback, f2fs can face with deadlock due to under writepages. This is caused by merging IOs inside f2fs, so if it comes to detect, let's throw merged IOs, which is implemented by f2fs_wait_on_page_writeback. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-20 22:10:04 +09:00
Jaegeuk Kim	c81bf1c84f	f2fs: fix to write node pages with WRITE_SYNC This patch fixes performance regression of dbench reported by Alex <hbx7d@yandex.com>. This issue was revealed by Phoronix tests results: http://www.phoronix.com/scan.php?page=article&item=linux_314_ssdfs&num=2 It turns out that we need to assign WRITE_SYNC to the node writes, if fsync is triggered. The performance numbers are like below, which is measured by Alex. 1. 355MB/s ext4 2. 225MB/s f2fs : WRITE for node writes 3. 525MB/s f2fs : WRITE_SYNC for node writes Reported-And-Tested-by: Alex <hbx7d@yandex.com>. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-03-03 11:28:40 +09:00
Chao Yu	3375f696bd	f2fs: use inode mutex to keep atomicity of f2fs_falloc Previously without protection of inode mutex, f2fs_falloc and other data correlated operations will interfere with each other. So let's use inode mutex to keep atomicity of f2fs_falloc. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-02-17 14:58:53 +09:00
Linus Torvalds	bf3d846b78	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "Assorted stuff; the biggest pile here is Christoph's ACL series. Plus assorted cleanups and fixes all over the place... There will be another pile later this week" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (43 commits) __dentry_path() fixes vfs: Remove second variable named error in __dentry_path vfs: Is mounted should be testing mnt_ns for NULL or error. Fix race when checking i_size on direct i/o read hfsplus: remove can_set_xattr nfsd: use get_acl and ->set_acl fs: remove generic_acl nfs: use generic posix ACL infrastructure for v3 Posix ACLs gfs2: use generic posix ACL infrastructure jfs: use generic posix ACL infrastructure xfs: use generic posix ACL infrastructure reiserfs: use generic posix ACL infrastructure ocfs2: use generic posix ACL infrastructure jffs2: use generic posix ACL infrastructure hfsplus: use generic posix ACL infrastructure f2fs: use generic posix ACL infrastructure ext2/3/4: use generic posix ACL infrastructure btrfs: use generic posix ACL infrastructure fs: make posix_acl_create more useful fs: make posix_acl_chmod more useful ...	2014-01-28 08:38:04 -08:00
Christoph Hellwig	a6dda0e63e	f2fs: use generic posix ACL infrastructure f2fs has some weird mode bit handling, so still using the old chmod code for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-01-25 23:58:19 -05:00
Chris Fries	6c311ec6c2	f2fs: clean checkpatch warnings Fixed a variety of trivial checkpatch warnings. The only delta should be some minor formatting on log strings that were split / too long. Signed-off-by: Chris Fries <cfries@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-20 10:27:12 +09:00
Jaegeuk Kim	fb5566da91	f2fs: improve write performance under frequent fsync calls When considering a bunch of data writes with very frequent fsync calls, we are able to think the following performance regression. N: Node IO, D: Data IO, IO scheduler: cfq Issue pending IOs D1 D2 D3 D4 D1 D2 D3 D4 N1 D2 D3 D4 N1 N2 N1 D3 D4 N2 D1 --> N1 can be selected by cfq becase of the same priority of N and D. Then D3 and D4 would be delayed, resuling in performance degradation. So, when processing the fsync call, it'd better give higher priority to data IOs than node IOs by assigning WRITE and WRITE_SYNC respectively. This patch improves the random wirte performance with frequent fsync calls by up to 10%. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-08 11:16:20 +09:00
Jaegeuk Kim	1e1bb4baf1	f2fs: add inline_data recovery routine This patch adds a inline_data recovery routine with the following policy. [prev.] [next] of inline_data flag o o -> recover inline_data o x -> remove inline_data, and then recover data blocks x o -> remove inline_data, and then recover inline_data x x -> recover data blocks Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:20 +09:00
Jaegeuk Kim	9e09fc855d	f2fs: refactor f2fs_convert_inline_data Change log from v1: o handle NULL pointer of grab_cache_page_write_begin() pointed by Chao Yu. This patch refactors f2fs_convert_inline_data to check a couple of conditions internally for deciding whether it needs to convert inline_data or not. So, the new f2fs_convert_inline_data initially checks: 1) f2fs_has_inline_data(), and 2) the data size to be changed. If the inode has inline_data but the size to fill is less than MAX_INLINE_DATA, then we don't need to convert the inline_data with data allocation. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:19 +09:00
Jaegeuk Kim	8230a0a49f	f2fs: convert inline_data for punch_hole In the punch_hole(), let's convert inline_data all the time for simplicity and to avoid potential deadlock conditions. It is pretty much not a big deal to do this. Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2014-01-06 16:42:12 +09:00
Jaegeuk Kim	f185ff979f	f2fs: don't need to get f2fs_lock_op for the inline_data test This patch locates checking the inline_data prior to calling f2fs_lock_op() in truncate_blocks(), since getting the lock is unnecessary. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-27 12:40:41 +09:00
Huajun Li	9ffe0fb5f3	f2fs: handle inline data operations Hook inline data read/write, truncate, fallocate, setattr, etc. Files need meet following 2 requirement to inline: 1) file size is not greater than MAX_INLINE_DATA; 2) file doesn't pre-allocate data blocks by fallocate(). FI_INLINE_DATA will not be set while creating a new regular inode because most of the files are bigger than ~3.4K. Set FI_INLINE_DATA only when data is submitted to block layer, ranther than set it while creating a new inode, this also avoids converting data from inline to normal data block and vice versa. While writting inline data to inode block, the first data block should be released if the file has a block indexed by i_addr[0]. On the other hand, when a file operation is appied to a file with inline data, we need to test if this file can remain inline by doing this operation, otherwise it should be convert into normal file by reserving a new data block, copying inline data to this new block and clear FI_INLINE_DATA flag. Because reserve a new data block here will make use of i_addr[0], if we save inline data in i_addr[0..872], then the first 4 bytes would be overwriten. This problem can be avoided simply by not using i_addr[0] for inline data. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-26 20:40:41 +09:00
Jaegeuk Kim	6bacf52fb5	f2fs: add unlikely() macro for compiler more aggressively This patch adds unlikely() macro into the most of codes. The basic rule is to add that when: - checking unusual errors, - checking page mappings, - and the other unlikely conditions. Change log from v1: - Don't add unlikely for the NULL test and error test: advised by Andi Kleen. Cc: Chao Yu <chao2.yu@samsung.com> Cc: Andi Kleen <andi@firstfloor.org> Reviewed-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:06 +09:00
Chao Yu	a66c7b2fcf	f2fs: remove unneeded code in punch_hole Because FALLOC_FL_PUNCH_HOLE flag must be ORed with FALLOC_FL_KEEP_SIZE in fallocate, so we could remove the useless 'keep size' branch code which will never be excuted in punch_hole. Signed-off-by: Chao Yu <chao2.yu@samsung.com> Signed-off-by: Fan Li <fanofcode.li@samsung.com> [Jaegeuk Kim: remove an unnecessary parameter togather] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:04 +09:00
Huajun Li	b600965c43	f2fs: add a new function: f2fs_reserve_block() Add the function f2fs_reserve_block() to easily reserve new blocks, and use it to clean up more codes. Signed-off-by: Huajun Li <huajun.li@intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Weihong Xu <weihong.xu@intel.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-12-23 10:18:03 +09:00
Jaegeuk Kim	cfe58f9dcd	f2fs: avoid to wait all the node blocks during fsync Previously, f2fs_sync_file() waits for all the node blocks to be written. But, we don't need to do that, but wait only the inode-related node blocks. This patch adds wait_on_node_pages_writeback() in which waits inode-related node blocks that are on writeback. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-31 16:01:03 +09:00
Jaegeuk Kim	5d56b6718a	f2fs: add an option to avoid unnecessary BUG_ONs If you want to remove unnecessary BUG_ONs, you can just turn off F2FS_CHECK_FS in your kernel config. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-29 15:44:38 +09:00
Jaegeuk Kim	e943a10d94	f2fs: add tracepoint for vm_page_mkwrite This patch adds a tracepoint for f2fs_vm_page_mkwrite. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-25 16:54:40 +09:00
Gu Zheng	e479556bfd	f2fs: use rw_sem instead of fs_lock(locks mutex) The fs_locks is used to block other ops(ex, recovery) when doing checkpoint. And each other operate routine(besides checkpoint) needs to acquire a fs_lock, there is a terrible problem here, if these are too many concurrency threads acquiring fs_lock, so that they will block each other and may lead to some performance problem, but this is not the phenomenon we want to see. Though there are some optimization patches introduced to enhance the usage of fs_lock, but the thorough solution is using a rw_sem to replace the fs_lock. Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other, this can avoid the problem described above completely. Because of the weakness of rw_sem, the above change may introduce a potential problem that the checkpoint thread might get starved if other threads are intensively locking the read semaphore for I/O.(Pointed out by Xu Jin) In order to avoid this, a wait_list is introduced, the appending read semaphore ops will be dropped into the wait_list if checkpoint thread is waiting for write semaphore, and will be waked up when checkpoint thread gives up write semaphore. Thanks to Kim's previous review and test, and will be very glad to see other guys' performance tests about this patch. V2: -fix the potential starvation problem. -use more suitable func name suggested by Xu Jin. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> [Jaegeuk Kim: adjust minor coding standard] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-10-07 11:33:05 +09:00
Jaegeuk Kim	de93653fe3	f2fs: reserve the xattr space dynamically This patch enables the number of direct pointers inside on-disk inode block to be changed dynamically according to the size of inline xattr space. The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file has inline xattr flag. The number of direct pointers that will be used by inline xattrs is defined as F2FS_INLINE_XATTR_ADDRS. Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-26 20:15:01 +09:00
Jaegeuk Kim	d71b5564c0	f2fs: introduce cur_cp_version function to reduce code size This patch introduces a new inline function, cur_cp_version, to reduce redundant codes. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-09 15:25:37 +09:00
Jaegeuk Kim	e518ff81c3	f2fs: fix inconsistency between xattr node blocks and its inode Previously xattr node blocks are stored to the COLD_NODE log, which means that our roll-forward mechanism doesn't recover the xattr node blocks at all. Only the direct node blocks in the WARM_NODE log can be recovered. So, let's resolve the issue simply by conducting checkpoint during fsync when a file has a modified xattr node block. This approach is able to degrade the performance, but normally the checkpoint overhead is shown at the initial fsync call after the xattr entry changes. Once the checkpoint is done, no additional overhead would be occurred. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-08-09 15:25:24 +09:00
Jaegeuk Kim	f0947e5cce	f2fs: fix i_name during f2fs_sync_file As similar as the i_pino fix, i_name also should be fixed when i_nlink is 1. The errorneous scenario is like this. 1. touch test1 2. link test1 test2 3. unlink test2 4. fsync test1 After this, i_name should be test1. CC: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:03 +09:00
Gu Zheng	4559071063	f2fs: introduce help function F2FS_NODE() Introduce help function F2FS_NODE() to simplify the conversion of node_page to f2fs_node. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:02 +09:00
Jaegeuk Kim	e5d2385ed6	f2fs: recover date requested by fdatasync In order to support SQLite that uses fdatasync instead of fsync, we should guarantee the data requested by fdatasync can be recovered after sudden-power- off. So, let's remove the fdatasync condition in f2fs_sync_file. Otherwise, we can restore the data after sudden-power-off due to nonexistence of any fsync mark'ed node blocks. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-07-30 15:17:02 +09:00
Jaegeuk Kim	354a3399dc	f2fs: recover wrong pino after checkpoint during fsync If a file is linked, f2fs loose its parent inode number so that fsync calls for the linked file should do checkpoint all the time. But, if we can recover its parent inode number after the checkpoint, we can adjust roll-forward mechanism for the further fsync calls, which is able to improve the fsync performance significatly. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:45 +09:00
Jaegeuk Kim	b3783873cc	f2fs: avoid freqeunt write_inode calls If update_inode is called, we don't need to do write_inode. So, let's use a dirty flag for each inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:43 +09:00
Namjae Jeon	d7cc950b4c	f2fs: optimise the truncate_data_blocks_range() range The function truncate_data_blocks_range() decrements the valid block count of inode via dec_valid_block_count(). Since this function updates the i_blocks field of inode, we can update this field once we have calculated total the number of blocks to be freed. Therefore we can decrement valid blocks outside of the for loop. if (nr_free) { + dec_valid_block_count(sbi, dn->inode, nr_free); set_page_dirty(dn->node_page); sync_inode_page(dn); } 'nr_free' tells the total number of blocks freed. So, we can just directly pass this value to dec_valid_block_count() and update the i_blocks. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:42 +09:00
Namjae Jeon	6a3e8ef0de	f2fs: use the F2FS specific flags in f2fs_ioctl() In f2fs_ioctl() function, it is using generic flags. Since F2FS specific flags are defined. So lets use those flags. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-14 09:04:26 +09:00
Jaegeuk Kim	2d4d9fb591	f2fs: fix i_blocks translation on various types of files Basically an inode manages the number of allocated blocks with inode->i_blocks which is represented in a unit of sectors, not file system blocks. But, f2fs has used i_blocks in a unit of file system blocks, and f2fs_getattr translates it to the number of sectors when fstat is called. However, previously f2fs_file_inode_operations only has this, so this patch adds it to all the types of inode_operations. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-06-11 16:01:09 +09:00
Jaegeuk Kim	b292dcab06	f2fs: reuse the locked dnode page and its inode This patch fixes the following deadlock bug during the recovery. INFO: task mount:1322 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount D ffffffff81125870 0 1322 1266 0x00000000 ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046 ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8 ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520 Call Trace: [<ffffffff81125870>] ? __lock_page+0x70/0x70 [<ffffffff816a92d9>] schedule+0x29/0x70 [<ffffffff816a93af>] io_schedule+0x8f/0xd0 [<ffffffff8112587e>] sleep_on_page+0xe/0x20 [<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0 [<ffffffff81125867>] __lock_page+0x67/0x70 [<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40 [<ffffffff81126857>] find_lock_page+0x67/0x80 [<ffffffff8112698f>] find_or_create_page+0x3f/0xb0 [<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs] [<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs] [<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs] [<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40 [<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs] [<ffffffff81184cf9>] mount_bdev+0x1c9/0x210 [<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs] [<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs] [<ffffffff81185a13>] mount_fs+0x43/0x1b0 [<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20 [<ffffffff811a0796>] vfs_kern_mount+0x76/0x120 [<ffffffff811a2cb7>] do_mount+0x237/0xa10 [<ffffffff81140b9b>] ? strndup_user+0x5b/0x80 [<ffffffff811a3520>] SyS_mount+0x90/0xe0 [<ffffffff816b3502>] system_call_fastpath+0x16/0x1b The bug is triggered when check_index_in_prev_nodes tries to get the direct node page by calling get_node_page. At this point, if the direct node page is already locked by get_dnode_of_data, its caller, we got a deadlock condition. This patch adds additional condition check for the reuse of locked direct node pages prior to the get_node_page call. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:04 +09:00
Jaegeuk Kim	77888c1e42	f2fs: add f2fs_readonly() Introduce a simple macro function for readability. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:03 +09:00
Namjae Jeon	9851e6e189	f2fs: reorganize f2fs_vm_page_mkwrite Few things can be changed in the default mkwrite function 1) Make file_update_time at the start before acquiring any lock 2) the condition page_offset(page) >= i_size_read(inode) should be changed to page_offset(page) > i_size_read 3) Move wait_on_page_writeback. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:02 +09:00
Jaegeuk Kim	64aa7ed98d	f2fs: change get_new_data_page to pass a locked node page This patch is for passing a locked node page to get_dnode_of_data. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-05-28 15:03:01 +09:00
Linus Torvalds	942d33da99	f2fs updates for v3.10 This patch-set includes the following major enhancement patches. o introduce a new gloabl lock scheme o add tracepoints on several major functions o fix the overall cleaning process focused on victim selection o apply the block plugging to merge IOs as much as possible o enhance management of free nids and its list o enhance the readahead mode for node pages o address several cretical deadlock conditions o reduce lock_page calls The other minor bug fixes and enhancements are as follows. o calculation mistakes: overflow o bio types: READ, READA, and READ_SYNC o fix the recovery flow, data races, and null pointer errors -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJRijCLAAoJEEAUqH6CSFDSg9kQAIqxmQzCUvCN3HcyVe8bGhKz 8xhKrAY6ySRCKMuBbFRQsNrXUhckE3A44DgzYm5/gQikr/c8zhbqPVrtZ968eCKb wm3J+Re/uwZr5eOXlJEaHIiSkMDtERN7Cu2oYJWZi2B9wCSZcgvoWQ3c3LUVk6yF GFdi1Y00ll5tFKbEGbXSsfdul9P8jp0MmuMnWBBQZF3TrjETXMdThA5FXN0yTf9s XkcGE9vTCCPk8p7P3YmGGw6CwlaL8oallm0//iL4nMNpJzveq2C09IlY2BNrxU3L iTNXeIBdbhwXpnh2zq26Cy+cIEDIp0oXYui5BYdr/LWyWU3T/INa+hjUUszsESxF 51LIUA1rA9nX/BSmj2QomswZ3lt4u5jl6rSBFKv3NG1KsFrAdb8S4tHukRSTSxAJ gzpY6kLT1+bgciA16F5W4yhzMYPN5hPa8s6hx4LHlpoqQICQsurjtS9KW7vncLFt ttmCMn8ehHcTzKRNNqYaBerCtSB3Z3G/uAy1y+DB7Zx2h2mqhCBXRalyRvs7RKvK d5OyYCpHntxuzDwVuivnr9Ddp30LUP1WqexxK+ykn99Ji3leMmffHP8Oari8w96b RxSbjoo8hOgoS5xZ4v3AaqtLDlBpxC6oWJzDaq/fJeKxOx22Z5BDFUM9mBGxrouJ AATl8b+cW/aTZ4l7WOPU =Hqii -----END PGP SIGNATURE----- Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This patch-set includes the following major enhancement patches. - introduce a new gloabl lock scheme - add tracepoints on several major functions - fix the overall cleaning process focused on victim selection - apply the block plugging to merge IOs as much as possible - enhance management of free nids and its list - enhance the readahead mode for node pages - address several cretical deadlock conditions - reduce lock_page calls The other minor bug fixes and enhancements are as follows. - calculation mistakes: overflow - bio types: READ, READA, and READ_SYNC - fix the recovery flow, data races, and null pointer errors" * tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits) f2fs: cover free_nid management with spin_lock f2fs: optimize scan_nat_page() f2fs: code cleanup for scan_nat_page() and build_free_nids() f2fs: bugfix for alloc_nid_failed() f2fs: recover when journal contains deleted files f2fs: continue to mount after failing recovery f2fs: avoid deadlock during evict after f2fs_gc f2fs: modify the number of issued pages to merge IOs f2fs: remove useless #include <linux/proc_fs.h> as we're now using sysfs as debug entry. f2fs: fix inconsistent using of NM_WOUT_THRESHOLD f2fs: check truncation of mapping after lock_page f2fs: enhance alloc_nid and build_free_nids flows f2fs: add a tracepoint on f2fs_new_inode f2fs: check nid == 0 in add_free_nid f2fs: add REQ_META about metadata requests for submit f2fs: give a chance to merge IOs by IO scheduler f2fs: avoid frequent background GC f2fs: add tracepoints to debug checkpoint request f2fs: add tracepoints for write page operations f2fs: add tracepoints to debug the block allocation ...	2013-05-08 15:11:48 -07:00
Jaegeuk Kim	afcb7ca01f	f2fs: check truncation of mapping after lock_page We call lock_page when we need to update a page after readpage. Between grab and lock page, the page can be truncated by other thread. So, we should check the page after lock_page whether it was truncated or not. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-29 11:19:32 +09:00
Jaegeuk Kim	c718379b6b	f2fs: give a chance to merge IOs by IO scheduler Previously, background GC submits many 4KB read requests to load victim blocks and/or its (i)node blocks. ... f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb61, blkaddr = 0x3b964ed f2fs_gc : block_rq_complete: 8,16 R () 499854968 + 8 [0] f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb6f, blkaddr = 0x3b964ee f2fs_gc : block_rq_complete: 8,16 R () 499854976 + 8 [0] f2fs_gc : f2fs_readpage: ino = 1, page_index = 0xb79, blkaddr = 0x3b964ef f2fs_gc : block_rq_complete: 8,16 R () 499854984 + 8 [0] ... However, by the fact that many IOs are sequential, we can give a chance to merge the IOs by IO scheduler. In order to do that, let's use blk_plug. ... f2fs_gc : f2fs_iget: ino = 143 f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c6, blkaddr = 0x2e6ee f2fs_gc : f2fs_iget: ino = 143 f2fs_gc : f2fs_readpage: ino = 143, page_index = 0x1c7, blkaddr = 0x2e6ef <idle> : block_rq_complete: 8,16 R () 1519616 + 8 [0] <idle> : block_rq_complete: 8,16 R () 1519848 + 8 [0] <idle> : block_rq_complete: 8,16 R () `1520432` + 96 [0] <idle> : block_rq_complete: 8,16 R () 1520536 + 104 [0] <idle> : block_rq_complete: 8,16 R () 1521008 + 112 [0] <idle> : block_rq_complete: 8,16 R () 1521440 + 152 [0] <idle> : block_rq_complete: 8,16 R () 1521688 + 144 [0] <idle> : block_rq_complete: 8,16 R () 1522128 + 192 [0] <idle> : block_rq_complete: 8,16 R () 1523256 + 328 [0] ... Note that this issue should be addressed in checkpoint, and some readahead flows too. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-26 10:35:10 +09:00
Namjae Jeon	c01e285324	f2fs: add tracepoints to debug the block allocation Add tracepoints to debug the block allocation & fallocate. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> [Jaegeuk: enhance information] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-23 18:15:16 +09:00
Namjae Jeon	51dd624934	f2fs: add tracepoints for truncate operation add tracepoints for tracing the truncate operations like truncate node/data blocks, f2fs_truncate etc. Tracepoints are added at entry and exit of operation to trace the success & failure of operation. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> [Jaegeuk: combine and modify the tracepoint structures] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-23 16:40:38 +09:00
Namjae Jeon	a2a4a7e4ab	f2fs: add tracepoints for sync & inode operations Add tracepoints in f2fs for tracing the syncing operations like filesystem sync, file sync enter/exit. It will helf to trace the code under debugging scenarios. Also add tracepoints for tracing the various inode operations like building inode, eviction of inode, link/unlike of inodes. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> [Jaegeuk: combine and modify the tracepoint structures] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-23 15:30:27 +09:00
Al Viro	bdaec334bb	f2fs: use mnt_want_write_file() in ioctl Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-04-09 14:12:56 -04:00
Jaegeuk Kim	399368372e	f2fs: introduce a new global lock scheme In the previous version, f2fs uses global locks according to the usage types, such as directory operations, block allocation, block write, and so on. Reference the following lock types in f2fs.h. enum lock_type { RENAME, /* for renaming operations / DENTRY_OPS, / for directory operations / DATA_WRITE, / for data write / DATA_NEW, / for data allocation / DATA_TRUNC, / for data truncate / NODE_NEW, / for node allocation / NODE_TRUNC, / for node truncate / NODE_WRITE, / for node write */ NR_LOCK_TYPE, }; In that case, we lose the performance under the multi-threading environment, since every types of operations must be conducted one at a time. In order to address the problem, let's share the locks globally with a mutex array regardless of any types. So, let users grab a mutex and perform their jobs in parallel as much as possbile. For this, I propose a new global lock scheme as follows. 0. Data structure - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS] - f2fs_sb_info -> node_write 1. mutex_lock_op(sbi) - try to get an avaiable lock from the array. - returns the index of the gottern lock variable. 2. mutex_unlock_op(sbi, index of the lock) - unlock the given index of the lock. 3. mutex_lock_all(sbi) - grab all the locks in the array before the checkpoint. 4. mutex_unlock_all(sbi) - release all the locks in the array after checkpoint. 5. block_operations() - call mutex_lock_all() - sync_dirty_dir_inodes() - grab node_write - sync_node_pages() Note that, the pairs of mutex_lock_op()/mutex_unlock_op() and mutex_lock_all()/mutex_unlock_all() should be used together. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-09 18:21:18 +09:00
Jason Hrycay	1127a3d448	f2fs: move f2fs_balance_fs from truncate to punch_hole Move the f2fs_balance_fs out of the truncate_hole function and only perform that in punch_hole use case. The commit: ed60b1644e7f7e5dd67d21caf7e4425dff05dad0 intended to do this but moved it into truncate_hole to cover more cases. However, a deadlock scenario is possible when deleting an inode entry under specific conditions: f2fs_delete_entry() mutex_lock_op(sbi, DENTRY_OPS); truncate_hole() f2fs_balance_fs() mutex_lock(&sbi->gc_mutex); f2fs_gc() write_checkpoint() block_operations() mutex_lock_op(sbi, DENTRY_OPS); Lets move it into the punch_hole case to cover the original intent of avoiding it during fallocate's expand_inode_data case. Change-Id: I29f8ea1056b0b88b70ba8652d901b6e8431bb27e Signed-off-by: Jason Hrycay <jason.hrycay@motorola.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-04-09 17:22:45 +09:00
Jaegeuk Kim	953a3e27e1	f2fs: fix to give correct parent inode number for roll forward When we recover fsync'ed data after power-off-recovery, we should guarantee that any parent inode number should be correct for each direct inode blocks. So, let's make the following rules. - The fsync should do checkpoint to all the inodes that were experienced hard links. - So, the only normal files can be recovered by roll-forward. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-27 09:16:25 +09:00
Jaegeuk Kim	0ff153a2f1	f2fs: do not skip writing file meta during fsync This patch removes data_version check flow during the fsync call. The original purpose for the use of data_version was to avoid writng inode pages redundantly by the fsync calls repeatedly. However, when user can modify file meta and then call fsync, we should not skip fsync procedure. So, let's remove this condition check and hope that user triggers in right manner. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-27 09:16:16 +09:00
Jaegeuk Kim	ae51fb31b8	f2fs: fix to call WRITE_FLUSH at the end of fsync The fsync call should be ended after flushing the in-device caches. Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-03-20 18:30:14 +09:00
Jaegeuk Kim	266e97a81c	f2fs: introduce readahead mode of node pages Previously, f2fs reads several node pages ahead when get_dnode_of_data is called with RDONLY_NODE flag. And, this flag is set by the following functions. - get_data_block_ro - get_lock_data_page - do_write_data_page - truncate_blocks - truncate_hole However, this readahead mechanism is initially introduced for the use of get_data_block_ro to enhance the sequential read performance. So, let's clarify all the cases with the additional modes as follows. enum { ALLOC_NODE, /* allocate a new node page if needed / LOOKUP_NODE, / look up a node without readahead / LOOKUP_NODE_RA, / * look up a node with readahead called * by get_datablock_ro. */ } Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com> Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>	2013-03-18 21:00:33 +09:00
Al Viro	6131ffaa1f	more file_inode() open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-27 16:59:05 -05:00
Namjae Jeon	e975082411	f2fs: add compat_ioctl to provide backward compatability adding compat_ioctl to provide support for backward comptability - 32bit binary execution on 64bit kernel. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>	2013-02-12 07:15:02 +09:00
Changman Lee	facb020540	f2fs: stop repeated checking if cp is needed If it is decided that f2fs should do checkpoint, skip next comparison. Signed-off-by: Changman Lee <cm224.lee@samsung.com>	2013-02-12 07:15:01 +09:00

1 2

61 Commits