f2fs_ioc_gc_range skips blocks_per_seg each time, however, f2fs_gc moves
blocks of section each time, so fix it from segment to section.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Introduce a wrapper __is_large_section() to clean up codes.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to
access .raw_super field, to avoid unneeded pointer conversion, this
patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly.
Just do cleanup, no logic change.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch does below changes to keep consistence of project quota data
in sudden power-cut case:
- update inode.i_projid and project quota atomically under lock_op() in
f2fs_ioc_setproject()
- recover inode.i_projid and project quota in recover_inode()
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.
The implementation is as below:
1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.
2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.
3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The REQ_TIME should be updated only in case of success cases
as followed at all other places in the file system.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Note that, it requires "f2fs: return correct errno in f2fs_gc".
This adds a lightweight non-persistent snapshotting scheme to f2fs.
To use, mount with the option checkpoint=disable, and to return to
normal operation, remount with checkpoint=enable. If the filesystem
is shut down before remounting with checkpoint=enable, it will revert
back to its apparent state when it was first mounted with
checkpoint=disable. This is useful for situations where you wish to be
able to roll back the state of the disk in case of some critical
failure.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
[Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Normally, DIO uses in-pllace-update, but in LFS mode, f2fs doesn't
allow triggering any in-place-update writes, so we fallback direct
write to buffered write, result in bad performance of large size
write.
This patch adds to support triggering out-place-update for direct IO
to enhance its performance.
Note that it needs to exclude direct read IO during direct write,
since new data writing to new block address will no be valid until
write finished.
storage: zram
time xfs_io -f -d /mnt/f2fs/file -c "pwrite 0 1073741824" -c "fsync"
Before:
real 0m13.061s
user 0m0.327s
sys 0m12.486s
After:
real 0m6.448s
user 0m0.228s
sys 0m6.212s
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Thread A Thread B
- f2fs_vm_page_mkwrite
- f2fs_setattr
- down_write(i_mmap_sem)
- truncate_setsize
- f2fs_truncate
- up_write(i_mmap_sem)
- f2fs_reserve_block
reserve NEW_ADDR
- skip dirty page due to truncation
1. we don't need to rserve new block address for a truncated page.
2. dn.data_blkaddr is used out of node page lock coverage.
Refactor ->page_mkwrite() flow to fix above issues:
- use __do_map_lock() to avoid racing checkpoint()
- lock data page in prior to dnode page
- cover f2fs_reserve_block with i_mmap_sem lock
- wait page writeback before zeroing page
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Remove the verbose license text from f2fs files and replace them with
SPDX tags. This does not change the license of any of the code.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently, project quota could be changed by fssetxattr
ioctl, and existed permission check inode_owner_or_capable()
is obviously not enough, just think that common users could
change project id of file, that could make users to
break project quota easily.
This patch try to follow same regular of xfs project
quota:
"Project Quota ID state is only allowed to change from
within the init namespace. Enforce that restriction only
if we are trying to change the quota ID state.
Everything else is allowed in user namespaces."
Besides that, check and set project id'state should
be an atomic operation, protect whole operation with
inode lock.
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
1. Create a file in an encrypted directory
2. Do GC & drop caches
3. Read stale data before its bio for metapage was not issued yet
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
https://bugzilla.kernel.org/show_bug.cgi?id=200951
These is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is it is possible to enable discard flag on block queue via
UDEV, but during mount, f2fs will initialize se->discard_map only if
this flag is set, once the flag is set after mount, f2fs may dereference
NULL pointer on se->discard_map.
So this patch does below changes to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The f2fs_gc() called by f2fs_balance_fs() requires to be called outside of
fi->i_gc_rwsem[WRITE], since f2fs_gc() can try to grab it in a loop.
If it hits the miximum retrials in GC, let's give a chance to release
gc_mutex for a short time in order not to go into live lock in the worst
case.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
unused label:
fs/f2fs/segment.c: In function '__submit_discard_cmd':
fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]
This could be fixed by adding another #ifdef around it, but the more
reliable way of doing this seems to be to remove the other #ifdefs
where that is easily possible.
By defining time_to_inject() as a trivial stub, most of the checks for
CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
formatting of the code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Thread A Background GC
- f2fs_setattr isize to 0
- truncate_setsize
- gc_data_segment
- f2fs_get_read_data_page page #0
- set_page_dirty
- set_cold_data
- f2fs_truncate
- f2fs_setattr isize to 4k
- read 4k <--- hit data in cached page #0
Above race condition can cause read out invalid data in a truncated
page, fix it by i_gc_rwsem[WRITE] lock.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Thread A Background GC
- f2fs_zero_range
- truncate_pagecache_range
- gc_data_segment
- get_read_data_page
- move_data_page
- set_page_dirty
- set_cold_data
- f2fs_do_zero_range
- dn->data_blkaddr = NEW_ADDR;
- f2fs_set_data_blkaddr
Actually, we don't need to set dirty & checked flag on the page, since
all valid data in the page should be zeroed by zero_range().
Use i_gc_rwsem[WRITE] to avoid such race condition.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Let's reset i_gc_failures to zero when we unset pinned state for file.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs recovery flow is relying on dnode block link list, it means fsynced
file recovery depends on previous dnode's persistence in the list, so
during fsync() we should wait on all regular inode's dnode writebacked
before issuing flush.
By this way, we can avoid dnode block list being broken by out-of-order
IO submission due to IO scheduler or driver.
Sheng Yong helps to do the test with this patch:
Target:/data (f2fs, -)
64MB / 32768KB / 4KB / 8
1 / PERSIST / Index
Base:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 867.82 204.15 41440.03 41370.54 680.8 1025.94 1031.08
2 871.87 205.87 41370.3 40275.2 791.14 1065.84 1101.7
3 866.52 205.69 41795.67 40596.16 694.69 1037.16 1031.48
Avg 868.7366667 205.2366667 41535.33333 40747.3 722.21 1042.98 1054.753333
After:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 798.81 202.5 41143 40613.87 602.71 838.08 913.83
2 805.79 206.47 40297.2 41291.46 604.44 840.75 924.27
3 814.83 206.17 41209.57 40453.62 602.85 834.66 927.91
Avg 806.4766667 205.0466667 40883.25667 40786.31667 603.3333333 837.83 922.0033333
Patched/Original:
0.928332713 0.999074239 0.984300676 1.000957528 0.835398753 0.803303994 0.874141189
It looks like atomic write will suffer performance regression.
I suspect that the criminal is that we forcing to wait all dnode being in
storage cache before we issue PREFLUSH+FUA.
BTW, will commit ("f2fs: don't need to wait for node writes for atomic write")
cause the problem: we will lose data of last transaction after SPO, even if
atomic write return no error:
- atomic_open();
- write() P1, P2, P3;
- atomic_commit();
- writeback data: P1, P2, P3;
- writeback node: N1, N2, N3; <--- If N1, N2 is not writebacked, N3 with fsync_mark is
writebacked, In SPOR, we won't find N3 since node chain is broken, turns out that losing
last transaction.
- preflush + fua;
- power-cut
If we don't wait dnode writeback for atomic_write:
SEQ-RD(MB/s) SEQ-WR(MB/s) RND-RD(IOPS) RND-WR(IOPS) Insert(TPS) Update(TPS) Delete(TPS)
1 779.91 206.03 41621.5 40333.16 716.9 1038.21 1034.85
2 848.51 204.35 40082.44 39486.17 791.83 1119.96 1083.77
3 772.12 206.27 41335.25 41599.65 723.29 1055.07 971.92
Avg 800.18 205.55 41013.06333 40472.99333 744.0066667 1071.08 1030.18
Patched/Original:
0.92108464 1.001526693 0.987425886 0.993268102 1.030180511 1.026942031 0.976702294
SQLite's performance recovers.
Jaegeuk:
"Practically, I don't see db corruption becase of this. We can excuse to lose
the last transaction."
Finally, we decide to keep original implementation of atomic write interface
sematics that we don't wait all dnode writeback before preflush+fua submission.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In order to prevent abusing atomic writes by abnormal users, we've added a
threshold, 20% over memory footprint, which disallows further atomic writes.
Previously, however, SQLite doesn't know the files became normal, so that
it could write stale data and commit on revoked normal database file.
Once f2fs detects such the abnormal behavior, this patch tries to avoid further
writes in write_begin().
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Romve redundant prefix 'f2fs_' in the middle of f2fs_ioc_f2fs_write_checkpoint().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If caller of __get_meta_page() can handle error, let's propagate error
from __get_meta_page().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces verify_blkaddr to check meta/data block address
with valid range to detect bug earlier.
In addition, once we encounter an invalid blkaddr, notice user to run
fsck to fix, and let the kernel panic.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
"ret" can be uninitialized on the success path when "in ==
F2FS_GOING_DOWN_FULLSYNC".
Fixes: 60b2b4ee2b ("f2fs: Fix deadlock in shutdown ioctl")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Once we shutdown f2fs, we have to flush stale pages in order to unmount
the system. In order to make stable, we need to stop fault injection as well.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This is a late set of changes from Deepa Dinamani doing an automated
treewide conversion of the inode and iattr structures from 'timespec'
to 'timespec64', to push the conversion from the VFS layer into the
individual file systems.
There were no conflicts between this and the contents of linux-next
until just before the merge window, when we saw multiple problems:
- A minor conflict with my own y2038 fixes, which I could address
by adding another patch on top here.
- One semantic conflict with late changes to the NFS tree. I addressed
this by merging Deepa's original branch on top of the changes that
now got merged into mainline and making sure the merge commit includes
the necessary changes as produced by coccinelle.
- A trivial conflict against the removal of staging/lustre.
- Multiple conflicts against the VFS changes in the overlayfs tree.
These are still part of linux-next, but apparently this is no longer
intended for 4.18 [1], so I am ignoring that part.
As Deepa writes:
The series aims to switch vfs timestamps to use struct timespec64.
Currently vfs uses struct timespec, which is not y2038 safe.
The series involves the following:
1. Add vfs helper functions for supporting struct timepec64 timestamps.
2. Cast prints of vfs timestamps to avoid warnings after the switch.
3. Simplify code using vfs timestamps so that the actual
replacement becomes easy.
4. Convert vfs timestamps to use struct timespec64 using a script.
This is a flag day patch.
Next steps:
1. Convert APIs that can handle timespec64, instead of converting
timestamps at the boundaries.
2. Update internal data structures to avoid timestamp conversions.
Thomas Gleixner adds:
I think there is no point to drag that out for the next merge window.
The whole thing needs to be done in one go for the core changes which
means that you're going to play that catchup game forever. Let's get
over with it towards the end of the merge window.
[1] https://www.spinics.net/lists/linux-fsdevel/msg128294.html
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbInZAAAoJEGCrR//JCVInReoQAIlVIIMt5ZX6wmaKbrjy9Itf
MfgbFihQ/djLnuSPVQ3nztcxF0d66BKHZ9puVjz6+mIHqfDvJTRwZs9nU+sOF/T1
g78fRkM1cxq6ZCkGYAbzyjyo5aC4PnSMP/NQLmwqvi0MXqqrbDoq5ZdP9DHJw39h
L9lD8FM/P7T29Fgp9tq/pT5l9X8VU8+s5KQG1uhB5hii4VL6pD6JyLElDita7rg+
Z7/V7jkxIGEUWF7vGaiR1QTFzEtpUA/exDf9cnsf51OGtK/LJfQ0oiZPPuq3oA/E
LSbt8YQQObc+dvfnGxwgxEg1k5WP5ekj/Wdibv/+rQKgGyLOTz6Q4xK6r8F2ahxs
nyZQBdXqHhJYyKr1H1reUH3mrSgQbE5U5R1i3My0xV2dSn+vtK5vgF21v2Ku3A1G
wJratdtF/kVBzSEQUhsYTw14Un+xhBLRWzcq0cELonqxaKvRQK9r92KHLIWNE7/v
c0TmhFbkZA+zR8HdsaL3iYf1+0W/eYy8PcvepyldKNeW2pVk3CyvdTfY2Z87G2XK
tIkK+BUWbG3drEGG3hxZ3757Ln3a9qWyC5ruD3mBVkuug/wekbI8PykYJS7Mx4s/
WNXl0dAL0Eeu1M8uEJejRAe1Q3eXoMWZbvCYZc+wAm92pATfHVcKwPOh8P7NHlfy
A3HkjIBrKW5AgQDxfgvm
=CZX2
-----END PGP SIGNATURE-----
Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
"This is a late set of changes from Deepa Dinamani doing an automated
treewide conversion of the inode and iattr structures from 'timespec'
to 'timespec64', to push the conversion from the VFS layer into the
individual file systems.
As Deepa writes:
'The series aims to switch vfs timestamps to use struct timespec64.
Currently vfs uses struct timespec, which is not y2038 safe.
The series involves the following:
1. Add vfs helper functions for supporting struct timepec64
timestamps.
2. Cast prints of vfs timestamps to avoid warnings after the switch.
3. Simplify code using vfs timestamps so that the actual replacement
becomes easy.
4. Convert vfs timestamps to use struct timespec64 using a script.
This is a flag day patch.
Next steps:
1. Convert APIs that can handle timespec64, instead of converting
timestamps at the boundaries.
2. Update internal data structures to avoid timestamp conversions'
Thomas Gleixner adds:
'I think there is no point to drag that out for the next merge
window. The whole thing needs to be done in one go for the core
changes which means that you're going to play that catchup game
forever. Let's get over with it towards the end of the merge window'"
* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
pstore: Remove bogus format string definition
vfs: change inode times to use struct timespec64
pstore: Convert internal records to timespec64
udf: Simplify calls to udf_disk_stamp_to_time
fs: nfs: get rid of memcpys for inode times
ceph: make inode time prints to be long long
lustre: Use long long type to print inode time
fs: add timespec64_truncate()
Thread A Thread B
- f2fs_release_file
- clear_inode_flag(FI_VOLATILE_FILE)
- wb_writeback
- writeback_sb_inodes
- __writeback_single_inode
- do_writepages
- f2fs_write_data_pages
- __write_data_page
all volatile file's pages
are writebacked to storage
- set_inode_flag(FI_DROP_CACHE)
- filemap_fdatawrite
There is a hole that mm can flush all dirty pages of volatile file as
inode is not tagged with both FI_VOLATILE_FILE and FI_DROP_CACHE flags,
we should never writeback the page #0 and also it's unneeded to writeback
other pages.
This patch adjusts to relocate clear_inode_flag(FI_VOLATILE_FILE), so that
FI_VOLATILE_FILE flag can be remained before all dirty pages were dropped
to avoid issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Although mixed sync/async IOs can have continuous LBA, as they have
different IO priority, block IO scheduler will add them into different
queues and commit them separately, result in splited IOs which causes
wrose performance.
This patch gives high priority to synchronous IO of nodes, means that
once synchronous flow starts, it can interrupt asynchronous writeback
flow of system flusher, so more big IOs can be expected.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
clean up checkpatch error:
ERROR: space required after that ':'
ERROR: space required after that ','
Signed-off-by: youngjun yoo <youngjun.willow@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
clean up checkpatch warning:
WARNING: Missing a blank line after declarations
Signed-off-by: youngjun yoo <youngjun.willow@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
clean up checkpatch warning:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
Signed-off-by: youngjun yoo <youngjun.willow@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Ted reported:
"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).
As one example, in fs/f2fs/dir.c there is:
unsigned char get_de_type(struct f2fs_dir_entry *de)
This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.
You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.
acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."
This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;
Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs doesn't allow abuse on atomic write class interface, so except
limiting in-mem pages' total memory usage capacity, we need to limit
atomic-write usage as well when filesystem is seriously fragmented,
otherwise we may run into infinite loop during foreground GC because
target blocks in victim segment are belong to atomic opened file for
long time.
Now, we will detect failure due to atomic write in foreground GC, if
the count exceeds threshold, we will drop all atomic written data in
cache, by this, I expect it can keep our system running safely to
prevent Dos attack.
In addition, his patch adds to show GC skip information in debugfs,
now it just shows count of skipped caused by atomic write.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
- rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability.
- introduce is_valid_blkaddr() for cleanup.
No logic change in this patch.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Volatile file's data will be updated oftenly, so it'd better to place
its data into hot data segment.
In addition, for atomic file, we change to check FI_ATOMIC_FILE instead
of FI_HOT_DATA to make code readability better.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
RW semphore dio_rwsem in struct f2fs_inode_info is introduced to avoid
race between dio and data gc, but now, it is more wildly used to avoid
foreground operation vs data gc. So rename it to i_gc_rwsem to improve
its readability.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch move mnt_want_write_file after range check,
it's needless to check arguments with it.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fix missing clear FI_NO_PREALLOC in some error case
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
dquot_initialize() can fail due to any exception inside quota subsystem,
f2fs needs to be aware of it, and return correct return value to caller.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_ioc_commit_atomic_write, if file is volatile, return -EINVAL to
indicate that commit failure.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If a file not set type as hot, has dirty pages more than
threshold 64 before starting atomic write, may be lose hot
flag.
v1->v2: move set FI_ATOMIC_FILE flag behind flush dirty pages too,
in case of dirty pages before starting atomic use atomic mode to
write back.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Thread GC thread
- f2fs_ioc_start_atomic_write
- get_dirty_pages
- filemap_write_and_wait_range
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- f2fs_is_atomic_file
- set_page_dirty
- set_inode_flag(, FI_ATOMIC_FILE)
Dirty data page can still be generated by GC in race condition as
above call stack.
This patch adds fi->dio_rwsem[WRITE] in f2fs_ioc_start_atomic_write
to avoid such race.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Use new return type vm_fault_t for page_mkwrite
and fault handler.
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fixes to show missing encrypt/inline_data flag in
FS_IOC_GETFLAGS like ext4 does.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now F2FS_FL_USER_VISIBLE and F2FS_FL_USER_MODIFIABLE has included
F2FS_PROJINHERIT_FL, so remove unneeded F2FS_PROJINHERIT_FL when
using visible/modifiable flag macro.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For buffered IO, we don't need to use block plug to cache bio,
for direct IO, generic f2fs_direct_IO has already added block
plug, so let's remove redundant one in .write_iter.
As Yunlei described in his patch:
-f2fs_file_write_iter
-blk_start_plug
-__generic_file_write_iter
...
-do_blockdev_direct_IO
-blk_start_plug
...
-blk_finish_plug
...
-blk_finish_plug
which may conduct performance decrease in our platform
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, we use generic FS_*_FL defined by vfs to indicate inode status
for each bit of i_flags, so f2fs's flag status definition is tied to vfs'
one, it will be hard for f2fs to reuse bits f2fs never used to indicate
new status..
In order to solve this issue, we introduce private inode status mapping,
Note, for these bits have already been persisted into disk, we should
never change their definition, for other ones, we can remap them for
later new coming status.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For non-atomic files, this patch adds an option to give nobarrier which
doesn't issue flush commands to the device.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently f2fs's ->readpage() and ->readpages() assume that either the
data undergoes no postprocessing, or decryption only. But with
fs-verity, there will be an additional authenticity verification step,
and it may be needed either by itself, or combined with decryption.
To support this, store a 'struct bio_post_read_ctx' in ->bi_private
which contains a work struct, a bitmask of postprocessing steps that are
enabled, and an indicator of the current step. The bio completion
routine, if there was no I/O error, enqueues the first postprocessing
step. When that completes, it continues to the next step. Pages that
fail any postprocessing step have PageError set. Once all steps have
completed, pages without PageError set are set Uptodate, and all pages
are unlocked.
Also replace f2fs_encrypted_file() with a new function
f2fs_post_read_required() in places like direct I/O and garbage
collection that really should be testing whether the file needs special
I/O processing, not whether it is encrypted specifically.
This may also be useful for other future f2fs features such as
compression.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If write is failed, we must deallocate the blocks that we couldn't write.
Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently, we will leave the kernel with locks still held when the gc_range
is invalid. This patch fixes the bug.
Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds nowait aio support[1].
Return EAGAIN if any of the following checks fail for direct I/O:
- i_rwsem is not lockable
- Blocks are not allocated at the write location
And xfstests generic/471 is passed.
[1]: 6be96d "Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT"
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch merges miscellaneous mount options into struct f2fs_mount_info,
After this patch, once we add new mount option, we don't need to worry
about recovery of it in remount_fs(), since we will recover the
f2fs_sb_info.mount_opt including all options.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit "0a007b97aad6"(f2fs: recover directory operations by fsync)
fixed xfstest generic/342 case, but it also increased the written
data and caused the performance degradation. In most cases, there's
no need to do so heavy fsync actually.
So we introduce new mount option "fsync_mode={posix,strict}" to
control the policy of fsync. "fsync_mode=posix" is set by default,
and means that f2fs uses a light fsync, which follows POSIX semantics.
And "fsync_mode=strict" means that it's a heavy fsync, which behaves
in line with xfs, ext4 and btrfs, where generic/342 will pass, but
the performance will regress.
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sqlite user Background GC
- move_data_block
: move page #1
- f2fs_is_atomic_file
- f2fs_ioc_start_atomic_write
- f2fs_ioc_commit_atomic_write
- commit_inmem_pages
: commit page #1 & set node #2 dirty
- f2fs_submit_page_write
- f2fs_update_data_blkaddr
- set_page_dirty
: set node #2 dirty
- f2fs_do_sync_file
- fsync_node_pages
: commit node #1 & node #2, then sudden power-cut
In a race case, we may check FI_ATOMIC_FILE flag before starting atomic
write flow, then we will commit meta data before data with reversed
order, after a sudden pow-cut, database transaction will be inconsistent.
So we'd better to exclude gc/atomic_write to each other by using lock
instead of flag checking.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jayashree Mohan reported:
A simple workload to reproduce this would be :
1. create foo
2. Write (8K - 16K) // foo size = 16K now
3. fsync()
4. falloc zero_range , keep_size (4202496 - 4210688) // foo size must be 16K
5. fdatasync()
Crash now
On recovery, we see that the file size is 4210688 and not 16K, which
violates the semantics of keep_size flag. We have a test case to
reproduce this using CrashMonkey on 4.15 kernel. Try this out by
simply running :
./c_harness -f /dev/sda -d /dev/cow_ram0 -t f2fs -e 102400 -P -v
tests/generic_468_zero.so
The root cause is that we miss to set KEEP_SIZE bit correctly in zero_range
when zeroing block cross EOF with FALLOC_FL_KEEP_SIZE, let's fix this
missing case.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs_super_block.encrypt_pw_salt can be udpated and persisted
concurrently, result in getting different pwsalt in separated
threads, so let's introduce sb_lock to exclude concurrent
accessers.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces F2FS_FEATURE_FUNCS to clean up the definitions of
different f2fs_sb_has_xxx functions.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds creation time field in inode layout to support showing
kstat.btime in ->statx.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Once filesystem shuts down, daemons like gc/discard thread should be
aware of it, and do exit, in addtion, drop all cached pending discard
commands and turn off real-time discard mode.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch makes f2fs_ioc_shutdown handling error case correctly.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch splits need_inplace_update to two functions:
a. should_update_inplace() includes all conditions that we must use IPU.
b. should_update_outplace() includes all conditions that we must use OPU.
So that, in f2fs_ioc_set_pin_file() and f2fs_defragment_range(), we can
use corresponding function to check whether we can trigger OPU/IPU or not.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We have supported to get next page offset with valid mapping crossing
hole in f2fs_map_blocks, utilizing it to speed up defragment on sparse
file.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch introduces a new ioctl F2FS_IOC_PRECACHE_EXTENTS to precache
extent info like ext4, in order to gain better performance during
triggering AIO by eliminating synchronous waiting of mapping info.
Referred commit: 7869a4a6c5 ("ext4: add support for extent pre-caching")
In addition, with newly added extent precache abilitiy, this patch add
to support FIEMAP_FLAG_CACHE in ->fiemap.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch gives a flag to disable GC on given file, which would be useful, when
user wants to keep its block map. It also conducts in-place-update for dontmove
file.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When calculating required free section during file defragmenting, we
should skip holes in file, otherwise we will probably fail to defrag
sparse file with large size.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This fixes generic/342 which doesn't recover renamed file which was fsynced
before. It will be done via another fsync on newly created file.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There is no caller cares about return value of truncate_data_blocks_range,
remove it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch supports to inject fault into kvmalloc/kvzalloc.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When blocks are allocated for direct write, select the type of
segment using the kiocb hint. But if an inode has FI_NO_ALLOC,
use the inode hint.
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
test/generic/208 reports a potential deadlock as below:
Chain exists of:
&mm->mmap_sem --> &fi->i_mmap_sem --> &fi->dio_rwsem[WRITE]
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&fi->dio_rwsem[WRITE]);
lock(&fi->i_mmap_sem);
lock(&fi->dio_rwsem[WRITE]);
lock(&mm->mmap_sem);
This patch changes the lock dependency as below in fallocate() to
fix this issue:
- dio_rwsem
- i_mmap_sem
Fixes: bb06664a53 ("f2fs: avoid race in between GC and block exchange")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In this round, we introduce sysfile-based quota support which is required
for Android by default. In addition, we allow that users are able to reserve
some blocks in runtime to mitigate performance drops in low free space.
Enhancement
- assign proper data segments according to write_hints given by user
- issue cache_flush on dirty devices only among multiple devices
- exploit cp_error flag and add more faults to enhance fault injection test
- conduct more readaheads during f2fs_readdir
- add a range for discard commands
Bug fix
- fix zero stat->st_blocks when inline_data is set
- drop crypto key and free stale memory pointer while evict_inode is failing
- fix some corner cases in free space and segment management
- fix wrong last_disk_size
This series includes lots of clean-ups and code enhancement in terms of xattr
operations, discard/flush command control. In addition, it adds versatile
debugfs entries to monitor f2fs status.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAloNCPAACgkQQBSofoJI
UNLYmg/8DbDp/mTXqJ0AURo84Z4OQUOTRxYkWazx4ct2WPZp2+5HCWDDoM8AAtUn
1J6/t7cU3osjos+zWvpUREZq1SPbp5m0h818HBFFJ/YMBPXucdQcd6wpepniOR5J
5uKauVd7jd2pbAAL7hKyr+iBSLrJl816wsq34Ml8y8zkDSJe4wO5YsGDqzqyKf4N
8nxMavUgerb14I/qXPb3ljlYlfaNNRlCT649QGCG78gx5hPeiUtUJ2l5DKV2xPe7
v+5lZO93FFwW1siGy+Atq+nqQJyUkeiOYGPR1NPx9tfmaPO58iOIXLirfblKASZY
HXJigVf50fQQBtwdBFL8ICSop6zV6gCKkNGZCHLzcYFWWL2TQwCIP3/iJdj9Wy+j
+YUYyN0dyl2mmNEDZjRNX1V+QBW1k+msmvBCb0fT1GJTQAyRfA4XfBDyg94cpWQ1
9YivNywuzG8YtghY7gYU3lCfT2OG19nXCSdz4qYUb5SSwoeGtLahLxMV4mlil4Tg
dOa8CPLFhJnCqB9ivI4L6SennBr+gNgL26SeZ3PF+B5KimYOTZxbenrll1kTi1xp
uCU6UR1xJS0W7Cjk8sCIu5hXkJMJwPJ0hcVeTgsxMkujLGvSSRCGb2hmOeILfwRZ
N4aGn+kVmwwgKaKjD/F4CY4b3yJLdTKMjjl74u5YaMQWe4Bq4qU=
=c49T
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we introduce sysfile-based quota support which is
required for Android by default. In addition, we allow that users are
able to reserve some blocks in runtime to mitigate performance drops
in low free space.
Enhancements:
- assign proper data segments according to write_hints given by user
- issue cache_flush on dirty devices only among multiple devices
- exploit cp_error flag and add more faults to enhance fault
injection test
- conduct more readaheads during f2fs_readdir
- add a range for discard commands
Bug fixes:
- fix zero stat->st_blocks when inline_data is set
- drop crypto key and free stale memory pointer while evict_inode is
failing
- fix some corner cases in free space and segment management
- fix wrong last_disk_size
This series includes lots of clean-ups and code enhancement in terms
of xattr operations, discard/flush command control. In addition, it
adds versatile debugfs entries to monitor f2fs status"
* tag 'f2fs-for-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (75 commits)
f2fs: deny accessing encryption policy if encryption is off
f2fs: inject fault in inc_valid_node_count
f2fs: fix to clear FI_NO_PREALLOC
f2fs: expose quota information in debugfs
f2fs: separate nat entry mem alloc from nat_tree_lock
f2fs: validate before set/clear free nat bitmap
f2fs: avoid opened loop codes in __add_ino_entry
f2fs: apply write hints to select the type of segments for buffered write
f2fs: introduce scan_curseg_cache for cleanup
f2fs: optimize the way of traversing free_nid_bitmap
f2fs: keep scanning until enough free nids are acquired
f2fs: trace checkpoint reason in fsync()
f2fs: keep isize once block is reserved cross EOF
f2fs: avoid race in between GC and block exchange
f2fs: save a multiplication for last_nid calculation
f2fs: fix summary info corruption
f2fs: remove dead code in update_meta_page
f2fs: remove unneeded semicolon
f2fs: don't bother with inode->i_version
f2fs: check curseg space before foreground GC
...
__get_first_dirty_index() wants to lookup only the first dirty page
after given index. There's no point in using pagevec_lookup_tag() for
that. Just use find_get_pages_tag() directly.
Link: http://lkml.kernel.org/r/20171009151359.31984-8-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to clear FI_NO_PREALLOC flag in error path of f2fs_file_write_iter,
otherwise we will lose the chance to preallocate blocks in latter write()
at one time.
Fixes: dc91de78e5 ("f2fs: do not preallocate blocks which has wrong buffer")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch slightly changes need_do_checkpoint to return the detail
info that indicates why we need do checkpoint, then caller could print
it with trace message.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Without FADVISE_KEEP_SIZE_BIT, we will try to recover file size
according to last non-hole block, so in fallocate(), we must set
FADVISE_KEEP_SIZE_BIT flag once we have preallocated block cross
EOF, instead of when all preallocation is success. Otherwise, file
size will be incorrect due to lack of this flag.
Simple testcase to reproduce this:
1. echo 2 > /sys/fs/f2fs/<device>/inject_type
2. echo 10 > /sys/fs/f2fs/<device>/inject_rate
3. run tests/generic/392
4. disable fault injection
5. do remount
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
During block exchange in {insert,collapse,move}_range, page-block mapping
is unstable due to mapping moving or recovery, so there should be no
concurrent cache read operation rely on such mapping, nor cache write
operation to mess up block exchange.
So this patch let background GC be aware of that.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch replaces to use cp_error flag instead of RDONLY for quota off.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Let's skip entire non-exist area to speed up truncate_hole by
using get_next_page_offset.
Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If there's some data written through inline data or dentry, we need to shouw
st_blocks. This fixes reporting zero blocks even though there is small written
data.
Cc: stable@vger.kernel.org
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid link file for quotacheck]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
last_disk_size could be wrong due to concurrently updating, so using
i_sem semaphore to make last_disk_size updating exclusive to fix this
issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When multiple device feature is enabled, during ->fsync we will issue
flush in all devices to make sure node/data of the file being persisted
into storage. But some flushes of device could be unneeded as file's
data may be not writebacked into those devices. So this patch adds and
manage bitmap per inode in global cache to indicate which device is
dirty and it needs to issue flush during ->fsync, hence, we could improve
performance of fsync in scenario of multiple device.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If we failed to issue flush in ->fsync, we need to keep FI_UPDATE_WRITE
flag to make sure triggering flush in next ->fsync.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In this round, we've mostly tuned f2fs to provide better user experience
for Android. Especially, we've worked on atomic write feature again with
SQLite community in order to support it officially. And we added or modified
several facilities to analyze and enhance IO behaviors.
Major changes include:
- add app/fs io stat
- add inode checksum feature
- support project/journalled quota
- enhance atomic write with new ioctl() which exposes feature set
- enhance background gc/discard/fstrim flows with new gc_urgent mode
- add F2FS_IOC_FS{GET,SET}XATTR
- fix some quota flows
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlm4brsACgkQQBSofoJI
UNK6dw/+Jd0j2whU5oxRcxZ6aL1Pj9fp2IdnGk1NbAI2mKAAlGknE/8CDS9OOMdO
y8O0x3H271DXTMfHJAq2pAfJzcMhiT/Wmw2UsHvmHU0mPmfDcSBKEqPQj6Nbl483
4s1dyt20InfHsVaKhUWAhov14bxLSiQTfeFH0SL2qv/NTp1Xlp6mwQvKCrmNNxud
coUL45Zk5uVAVckR0hsyfqudvdXM1LTDG0Y6/j0IaFtO9HqyAEgkILiSqL65TpBV
2OrXsTf0p2HN9g8vSUUouyD4Oj+q1OHt+VN7gw03xXm3TqAaqnkpIq/dtGLEPyM5
HD6Q2nDHDTLeKO2Ibi9C0f+bph4UqrCq/eoAjG1sM+6Sm+Hyf193FLR/E2R9aj8w
++lCoHUSf/krrMs9d+vnNWaTsKszAbAQRLiZaSHi21+0lcDZtYejNsm52LpDMAfO
jzz+TTOvXTSlHWSlt8DRKVolNhMRFy9OYIJ0schYYD6FJldARmBMfcZosrhL1Xoh
oU/bBaXwMv1XOWAOGCQbGrqREiciqXbKDGPQJq65Zvn60U6YzZf04wDbm0zXku5E
x7S8kPxz8c/010JHIxvULZRamlvXSjFevbAa+QtNsEhlj6DkDSdisMj+w7/jU4Yx
uInHojIq7ARJO0SBIoYFkz3+/2w++McCK0b/gpx1WHsN8I013zs=
=w4KH
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've mostly tuned f2fs to provide better user
experience for Android. Especially, we've worked on atomic write
feature again with SQLite community in order to support it officially.
And we added or modified several facilities to analyze and enhance IO
behaviors.
Major changes include:
- add app/fs io stat
- add inode checksum feature
- support project/journalled quota
- enhance atomic write with new ioctl() which exposes feature set
- enhance background gc/discard/fstrim flows with new gc_urgent mode
- add F2FS_IOC_FS{GET,SET}XATTR
- fix some quota flows"
* tag 'f2fs-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits)
f2fs: hurry up to issue discard after io interruption
f2fs: fix to show correct discard_granularity in sysfs
f2fs: detect dirty inode in evict_inode
f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared
f2fs: speed up gc_urgent mode with SSR
f2fs: better to wait for fstrim completion
f2fs: avoid race in between read xattr & write xattr
f2fs: make get_lock_data_page to handle encrypted inode
f2fs: use generic terms used for encrypted block management
f2fs: introduce f2fs_encrypted_file for clean-up
Revert "f2fs: add a new function get_ssr_cost"
f2fs: constify super_operations
f2fs: fix to wake up all sleeping flusher
f2fs: avoid race in between atomic_read & atomic_inc
f2fs: remove unneeded parameter of change_curseg
f2fs: update i_flags correctly
f2fs: don't check inode's checksum if it was dirtied or writebacked
f2fs: don't need to update inode checksum for recovery
f2fs: trigger fdatasync for non-atomic_write file
f2fs: fix to avoid race in between aio and gc
...
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZrTy3AAoJEAAOaEEZVoIVaucP/ApBAj2S5wzvlV1u6l8E6ae7
ZeEEZfcWwzRYlKjZAkTWqj9XvGpDGO5gLq4wsZK2edFAq++/MJF8ZVtN4tdZ1kUZ
DUvRodtVOrT08Kp9wZXGT7JOFrf6U/6gMcR6p0MuWnHndeKYvlpcFi9NPT4EC9/z
Zm9V7gtlPdSOha7eaSjUS0+vLERkxqXLBW3Av9QUOBP/lbI3lqIroGKeHDYnVdya
2P/k5EcRRJMyJP6TqyYxmmJl+UWjJFMLvnlUDBslHnD/u3mIUhw3JLHYBjn5dZRE
Xjq56IDPoXDUvzlBhtn/Uqyx+/wtwsNsylpmKv6K5G1JfdeuSsPVsCey+A1cqV64
LpE5896wf9TmnmI9LNyh6vDn925xPSGBiF45UEp5f9aO7jXeY0MaEZ8g+ENqFIDK
v4gtZdS9FhYHV+/l4qEwYMKrqSbwKEs1r1FT+f4wnABby1ojfdA57ZPlp5PV2Vjp
szTp88Zkb7cMvZwEnWwxWofcJNmgS7uNahvnQF3IJ4ITsioEkuyYR3K4ZQMaaaV9
wCp6G0FhXZaK3OI7o9WiDwaO2elp9Hxc8bnqKpiBbHZkY0NLh7/++5VxpeNbTHFy
AGijQiiKGNNyYqNj93wq9jpVdMNjB0pXrHRxfav8v7MtQ+WfbEoAENF4T7hN7iXn
UuF6eSWEC5O1UCRUk1A+
=LLY3
-----END PGP SIGNATURE-----
Merge tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull writeback error handling updates from Jeff Layton:
"This pile continues the work from last cycle on better tracking
writeback errors. In v4.13 we added some basic errseq_t infrastructure
and converted a few filesystems to use it.
This set continues refining that infrastructure, adds documentation,
and converts most of the other filesystems to use it. The main
exception at this point is the NFS client"
* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
ecryptfs: convert to file_write_and_wait in ->fsync
mm: remove optimizations based on i_size in mapping writeback waits
fs: convert a pile of fsync routines to errseq_t based reporting
gfs2: convert to errseq_t based writeback error reporting for fsync
fs: convert sync_file_range to use errseq_t based error-tracking
mm: add file_fdatawait_range and file_write_and_wait
fuse: convert to errseq_t based error tracking for fsync
mm: consolidate dax / non-dax checks for writeback
Documentation: add some docs for errseq_t
errseq: rename __errseq_set to errseq_set
This patch renames functions regarding to buffer management via META_MAPPING
used for encrypted blocks especially. We can actually use them in generic way.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch replaces (f2fs_encrypted_inode() && S_ISREG()) with
f2fs_encrypted_file(), which gives no functional change.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sqlite only cares about synchronization of file data instead of other data
unrelated attribute of inode, so in commit flow, call fdatasync is enough.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If file was not opened with atomic write mode, but user uses atomic write
ioctl to fsync datas, in the flow, we should not fsync that file with
atomic write mode.
Fixes: 608514deba ("f2fs: set fsync mark only for the last dnode")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>