Mutex lock won't serialize callers, in order to avoid starving of unlucky
caller, let's use rwsem lock instead.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch tries to support compression in f2fs.
- New term named cluster is defined as basic unit of compression, file can
be divided into multiple clusters logically. One cluster includes 4 << n
(n >= 0) logical pages, compression size is also cluster size, each of
cluster can be compressed or not.
- In cluster metadata layout, one special flag is used to indicate cluster
is compressed one or normal one, for compressed cluster, following metadata
maps cluster to [1, 4 << n - 1] physical blocks, in where f2fs stores
data including compress header and compressed data.
- In order to eliminate write amplification during overwrite, F2FS only
support compression on write-once file, data can be compressed only when
all logical blocks in file are valid and cluster compress ratio is lower
than specified threshold.
- To enable compression on regular inode, there are three ways:
* chattr +c file
* chattr +c dir; touch dir/file
* mount w/ -o compress_extension=ext; touch file.ext
Compress metadata layout:
[Dnode Structure]
+-----------------------------------------------+
| cluster 1 | cluster 2 | ......... | cluster N |
+-----------------------------------------------+
. . . .
. . . .
. Compressed Cluster . . Normal Cluster .
+----------+---------+---------+---------+ +---------+---------+---------+---------+
|compr flag| block 1 | block 2 | block 3 | | block 1 | block 2 | block 3 | block 4 |
+----------+---------+---------+---------+ +---------+---------+---------+---------+
. .
. .
. .
+-------------+-------------+----------+----------------------------+
| data length | data chksum | reserved | compressed data |
+-------------+-------------+----------+----------------------------+
Changelog:
20190326:
- fix error handling of read_end_io().
- remove unneeded comments in f2fs_encrypt_one_page().
20190327:
- fix wrong use of f2fs_cluster_is_full() in f2fs_mpage_readpages().
- don't jump into loop directly to avoid uninitialized variables.
- add TODO tag in error path of f2fs_write_cache_pages().
20190328:
- fix wrong merge condition in f2fs_read_multi_pages().
- check compressed file in f2fs_post_read_required().
20190401
- allow overwrite on non-compressed cluster.
- check cluster meta before writing compressed data.
20190402
- don't preallocate blocks for compressed file.
- add lz4 compress algorithm
- process multiple post read works in one workqueue
Now f2fs supports processing post read work in multiple workqueue,
it shows low performance due to schedule overhead of multiple
workqueue executing orderly.
20190921
- compress: support buffered overwrite
C: compress cluster flag
V: valid block address
N: NEW_ADDR
One cluster contain 4 blocks
before overwrite after overwrite
- VVVV -> CVNN
- CVNN -> VVVV
- CVNN -> CVNN
- CVNN -> CVVV
- CVVV -> CVNN
- CVVV -> CVVV
20191029
- add kconfig F2FS_FS_COMPRESSION to isolate compression related
codes, add kconfig F2FS_FS_{LZO,LZ4} to cover backend algorithm.
note that: will remove lzo backend if Jaegeuk agreed that too.
- update codes according to Eric's comments.
20191101
- apply fixes from Jaegeuk
20191113
- apply fixes from Jaegeuk
- split workqueue for fsverity
20191216
- apply fixes from Jaegeuk
20200117
- fix to avoid NULL pointer dereference
[Jaegeuk Kim]
- add tracepoint for f2fs_{,de}compress_pages()
- fix many bugs and add some compression stats
- fix overwrite/mmap bugs
- address 32bit build error, reported by Geert.
- bug fixes when handling errors and i_compressed_blocks
Reported-by: <noreply@ellerman.id.au>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In low memory scenario, we can allocate multiple bios without
submitting any of them.
- f2fs_write_checkpoint()
- block_operations()
- f2fs_sync_node_pages()
step 1) flush cold nodes, allocate new bio from mempool
- bio_alloc()
- mempool_alloc()
step 2) flush hot nodes, allocate a bio from mempool
- bio_alloc()
- mempool_alloc()
step 3) flush warm nodes, be stuck in below call path
- bio_alloc()
- mempool_alloc()
- loop to wait mempool element release, as we only
reserved memory for two bio allocation, however above
allocated two bios may never be submitted.
So we need avoid using default bioset, in this patch we introduce a
private bioset, in where we enlarg mempool element count to total
number of log header, so that we can make sure we have enough
backuped memory pool in scenario of allocating/holding multiple
bios.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
To catch f2fs bugs in write pointer handling code for zoned block
devices, check write pointers of non-open zones that current segments do
not point to. Do this check at mount time, after the fsync data recovery
and current segments' write pointer consistency fix. Or when fsync data
recovery is disabled by mount option, do the check when there is no fsync
data.
Check two items comparing write pointers with valid block maps in SIT.
The first item is check for zones with no valid blocks. When there is no
valid blocks in a zone, the write pointer should be at the start of the
zone. If not, next write operation to the zone will cause unaligned write
error. If write pointer is not at the zone start, reset the write pointer
to place at the zone start.
The second item is check between the write pointer position and the last
valid block in the zone. It is unexpected that the last valid block
position is beyond the write pointer. In such a case, report as a bug.
Fix is not required for such zone, because the zone is not selected for
next write operation until the zone get discarded.
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl3hAFIACgkQnJ2qBz9k
QNkV/gf+Kwn7xHg76YXd15lZYBzhgj/ABAYEEAAVY49OOCK5+XVmmAufHesMZ2lU
Solt8PvbQ8d5786bWpaYXgrTU3JW37c6x1MDUPDLQ8goXWzx7pZWvD+Yup558rDa
H1aoqvFKLgpeVVqkUdvvv2CDbgZyOgGlkDqWeS+c5pZd1NPFZzUAoU26slvQ5h4f
t41mbavOIm5DChQ5UjwRNw+pb09GXaHrPBRJwa1XuJYJWAansBcQIsxiiqt/43Gn
AzwUGrsz4vrPBk+Kcd0SGb8vinFVQr19gBFKFeN3rPFUEUn6T0FPBqaYeiNTNE37
AqASYKlIuhcSf0Wdvx6vxwSHsFl5VA==
=NGxV
-----END PGP SIGNATURE-----
Merge tag 'for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, quota, reiserfs cleanups and fixes from Jan Kara:
- Refactor the quota on/off kernel internal interfaces (mostly for
ubifs quota support as ubifs does not want to have inodes holding
quota information)
- A few other small quota fixes and cleanups
- Various small ext2 fixes and cleanups
- Reiserfs xattr fix and one cleanup
* tag 'for_v5.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
ext2: code cleanup for descriptor_loc()
fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
ext2: fix improper function comment
ext2: code cleanup for ext2_try_to_allocate()
ext2: skip unnecessary operations in ext2_try_to_allocate()
ext2: Simplify initialization in ext2_try_to_allocate()
ext2: code cleanup by calling ext2_group_last_block_no()
ext2: introduce new helper ext2_group_last_block_no()
reiserfs: replace open-coded atomic_dec_and_mutex_lock()
ext2: check err when partial != NULL
quota: Handle quotas without quota inodes in dquot_get_state()
quota: Make dquot_disable() work without quota inodes
quota: Drop dquot_enable()
fs: Use dquot_load_quota_inode() from filesystems
quota: Rename vfs_load_quota_inode() to dquot_load_quota_inode()
quota: Simplify dquot_resume()
quota: Factor out setup of quota inode
quota: Check that quota is not dirty before release
quota: fix livelock in dquot_writeback_dquots
ext2: don't set *count in the case of failure in ext2_try_to_allocate()
...
In this round, we've introduced fairly small number of patches as below.
Enhancement:
- improve the in-place-update IO flow
- allocate segment to guarantee no GC for pinned files
Bug fix:
- fix updatetime in lazytime mode
- potential memory leak in f2fs_listxattr
- record parent inode number in rename2 correctly
- fix deadlock in f2fs_gc along with atomic writes
- avoid needless data migration in GC
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl3e1XkACgkQQBSofoJI
UNJ0GhAAhVIX4J91CLnVSh0ik1XCaI6h/dFeS6kbDd8oxzQm/qt64b59aZqgy7Rk
iblGWfj8uPP5yO60pqb5uN4a0hybptVZSEldbhF0Xv0zUeVoT7C1ksTMrdUd1p7d
YkO8G+V4QBBrtpKG1KKKEncrvcdx4n9QHxGsRh4z5vXZH7sEmH7+N8OE88MaPjdZ
UWqYk0S0GoZBhPe7c8pQuD/PM+WJJH4Lewgw5kK21eAjOKI+yZKb+bY2tGjo5dA1
nzYO72CRMV4VEKsnxTZ/LCB2kCXeexaGuiVPyHjCmgAh990cLjsCWIbJ8EJu7uAa
vAo6/EMfgfPkPt5Y7uWGR4EeNT7AFhUoMuoQ9zdXzecY48D4Gz58o87Q+OFY3ipZ
W2OSf92pEJyfumE5o8wN435gaRYUjjCo1SMoIQABNav411XrBVoRwjvkV3DyA6af
Bs1bafz2hR/E1q0uoZvLWC5waiHy9605OkKMs/y8IRsn6yhRep/tv3KLk2Dz3fOO
LxenhuVO9bQDCheEcH15qIljxTuyfTyUOa9UrFXOwn4mK61J8A/Gs+SiqW0y28oA
feSw7cLPxK0OlYQgql24JfJN/Xt523WmCSfXfe7TCUDTDkBpmsdhFwHYZyCLzqt+
FyBhf2DF/BGzKMT28oc7StO43mIvOc1Wk+jfJFW+hld5ncAJxCE=
=qyrd
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've introduced fairly small number of patches as below.
Enhancements:
- improve the in-place-update IO flow
- allocate segment to guarantee no GC for pinned files
Bug fixes:
- fix updatetime in lazytime mode
- potential memory leak in f2fs_listxattr
- record parent inode number in rename2 correctly
- fix deadlock in f2fs_gc along with atomic writes
- avoid needless data migration in GC"
* tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: stop GC when the victim becomes fully valid
f2fs: expose main_blkaddr in sysfs
f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project()
f2fs: Fix deadlock in f2fs_gc() context during atomic files handling
f2fs: show f2fs instance in printk_ratelimited
f2fs: fix potential overflow
f2fs: fix to update dir's i_pino during cross_rename
f2fs: support aligned pinned file
f2fs: avoid kernel panic on corruption test
f2fs: fix wrong description in document
f2fs: cache global IPU bio
f2fs: fix to avoid memory leakage in f2fs_listxattr
f2fs: check total_segments from devices in raw_super
f2fs: update multi-dev metadata in resize_fs
f2fs: mark recovery flag correctly in read_raw_super_block()
f2fs: fix to update time in lazytime mode
- Add the IV_INO_LBLK_64 encryption policy flag which modifies the
encryption to be optimized for UFS inline encryption hardware.
- For AES-128-CBC, use the crypto API's implementation of ESSIV (which
was added in 5.4) rather than doing ESSIV manually.
- A few other cleanups.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXdtVMxQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK8MVAP44iRzj8ZXu62BhqNOYYcF60s/58QfZ
Jo1VdmvO/8MNrAD+P/jW5sqzcB5BLdNzS7pLKGIzsC55uMyp/79xyKK8wQc=
=XKWV
-----END PGP SIGNATURE-----
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fscrypt updates from Eric Biggers:
- Add the IV_INO_LBLK_64 encryption policy flag which modifies the
encryption to be optimized for UFS inline encryption hardware.
- For AES-128-CBC, use the crypto API's implementation of ESSIV (which
was added in 5.4) rather than doing ESSIV manually.
- A few other cleanups.
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
f2fs: add support for IV_INO_LBLK_64 encryption policies
ext4: add support for IV_INO_LBLK_64 encryption policies
fscrypt: add support for IV_INO_LBLK_64 policies
fscrypt: avoid data race on fscrypt_mode::logged_impl_name
docs: ioctl-number: document fscrypt ioctl numbers
fscrypt: zeroize fscrypt_info before freeing
fscrypt: remove struct fscrypt_ctx
fscrypt: invoke crypto API for ESSIV handling
Setting softlimit larger than hardlimit seems meaningless
for disk quota but currently it is allowed. In this case,
there may be a bit of comfusion for users when they run
df comamnd to directory which has project quota.
For example, we set 20M softlimit and 10M hardlimit of
block usage limit for project quota of test_dir(project id 123).
[root@hades f2fs]# repquota -P -a
*** Report for project quotas on device /dev/nvme0n1p8
Block grace time: 7days; Inode grace time: 7days
Block limits File limits
Project used soft hard grace used soft hard grace
----------------------------------------------------------------------
0 -- 4 0 0 1 0 0
123 +- 10248 20480 10240 2 0 0
The result of df command as below:
[root@hades f2fs]# df -h /mnt/f2fs/test
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p8 20M 11M 10M 51% /mnt/f2fs
Even though it looks like there is another 10M free space to use,
if we write new data to diretory test(inherit project id),
the write will fail with errno(-EDQUOT).
After this patch, the df result looks like below.
[root@hades f2fs]# df -h /mnt/f2fs/test
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p8 10M 10M 0 100% /mnt/f2fs
Signed-off-by: Chengguang Xu <cgxu519@mykernel.net>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Avoid the need to allocate a potentially large array of struct blk_zone
in the block layer by switching the ->report_zones method interface to
a callback model. Now the caller simply supplies a callback that is
executed on each reported zone, and private data for it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch supports 2MB-aligned pinned file, which can guarantee no GC at all
by allocating fully valid 2MB segment.
Check free segments by has_not_enough_free_secs() with large budget.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs inode numbers are stable across filesystem resizing, and f2fs inode
and file logical block numbers are always 32-bit. So f2fs can always
support IV_INO_LBLK_64 encryption policies. Wire up the needed
fscrypt_operations to declare support.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Use dquot_load_quota_inode from filesystems instead of dquot_enable().
In all three cases we want to load quota inode and never use the
function to update quota flags.
Signed-off-by: Jan Kara <jack@suse.cz>
In commit 8648de2c58 ("f2fs: add bio cache for IPU"), we added
f2fs_submit_ipu_bio() in __write_data_page() as below:
__write_data_page()
if (!S_ISDIR(inode->i_mode) && !IS_NOQUOTA(inode)) {
f2fs_submit_ipu_bio(sbi, bio, page);
....
}
in order to avoid below deadlock:
Thread A Thread B
- __write_data_page (inode x, page y)
- f2fs_do_write_data_page
- set_page_writeback ---- set writeback flag in page y
- f2fs_inplace_write_data
- f2fs_balance_fs
- lock gc_mutex
- lock gc_mutex
- f2fs_gc
- do_garbage_collect
- gc_data_segment
- move_data_page
- f2fs_wait_on_page_writeback
- wait_on_page_writeback --- wait writeback of page y
However, the bio submission breaks the merge of IPU IOs.
So in this patch let's add a global bio cache for merged IPU pages,
then f2fs_wait_on_page_writeback() is able to submit bio if a
writebacked page is cached in global bio cache.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For multi-device F2FS, we should check if the sum of total_segments from
all devices matches segment_count.
Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
On the combination of first fail and second success,
we will miss to mark recovery flag because currently
we reuse err variable in the loop.
Signed-off-by: Chengguang Xu <cgxu519@zoho.com.cn>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In this round, we introduced casefolding support in f2fs, and fixed various bugs
in individual features such as IO alignment, checkpoint=disable, quota, and
swapfile.
Enhancement:
- support casefolding w/ enhancement in ext4
- support fiemap for directory
- support FS_IO_GET|SET_FSLABEL
Bug fix:
- fix IO stuck during checkpoint=disable
- avoid infinite GC loop
- fix panic/overflow related to IO alignment feature
- fix livelock in swap file
- fix discard command leak
- disallow dio for atomic_write
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl2FL1wACgkQQBSofoJI
UNIRdg/+M6QNiAazIbqzPXyUUkyZQYR5YKhu2abd49o/g38xjTq1afH0PZpQyDrA
9RncR4xsW8F241vPLVoCSanfaa+MxN6xHi3TrD8zYtZWxOcPF6v1ETHeUXGTHuJ2
gqlk+mm+CnY02M6rxW7XwixuXwttT3bF9+cf1YBWRpNoVrR+SjNqgeJS7FmJwXKd
nGKb+94OxuygL1NUop+LDUo3qRQjc0Sxv/7qj/K4lhqgTjhAxMYT2KvUP/1MZ7U0
Kh9WIayDXnpoioxMPnt4VEb+JgXfLLFELvQzNjwulk15GIweuJzwVYCBXcRoX0cK
eRBRmRy/kRp/e0R1gvl3kYrXQC2AC5QTlBVH/0ESwnaukFiUBKB509vH4aqE/vpB
Krldjfg+uMHkc7XiNBf1boDp713vJ76iRKUDWoVb6H/sPbdJ+jtrnUNeBP8CVpWh
u31SY1MppnmKhhsoCHQRbhbXO/Z29imBQgF9Tm3IFWImyLY3IU40vFj2fR15gJkL
X3x/HWxQynSqyqEOwAZrvhCRTvBAIGIVy5292Di1RkqIoh8saxcqiaywgLz1+eVE
0DCOoh8R6sSbfN/EEh+yZqTxmjo0VGVTw30XVI6QEo4cY5Vfc9u6dN6SRWVRvbjb
kPb3dKcMrttgbn3fcXU8Jbw1AOor9N6afHaqs0swQJyci2RwJyc=
=oonf
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we introduced casefolding support in f2fs, and fixed
various bugs in individual features such as IO alignment,
checkpoint=disable, quota, and swapfile.
Enhancement:
- support casefolding w/ enhancement in ext4
- support fiemap for directory
- support FS_IO_GET|SET_FSLABEL
Bug fix:
- fix IO stuck during checkpoint=disable
- avoid infinite GC loop
- fix panic/overflow related to IO alignment feature
- fix livelock in swap file
- fix discard command leak
- disallow dio for atomic_write"
* tag 'f2fs-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits)
f2fs: add a condition to detect overflow in f2fs_ioc_gc_range()
f2fs: fix to add missing F2FS_IO_ALIGNED() condition
f2fs: fix to fallback to buffered IO in IO aligned mode
f2fs: fix to handle error path correctly in f2fs_map_blocks
f2fs: fix extent corrupotion during directIO in LFS mode
f2fs: check all the data segments against all node ones
f2fs: Add a small clarification to CONFIG_FS_F2FS_FS_SECURITY
f2fs: fix inode rwsem regression
f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
f2fs: avoid infinite GC loop due to stale atomic files
f2fs: Fix indefinite loop in f2fs_gc()
f2fs: convert inline_data in prior to i_size_write
f2fs: fix error path of f2fs_convert_inline_page()
f2fs: add missing documents of reserve_root/resuid/resgid
f2fs: fix flushing node pages when checkpoint is disabled
f2fs: enhance f2fs_is_checkpoint_ready()'s readability
f2fs: clean up __bio_alloc()'s parameter
f2fs: fix wrong error injection path in inc_valid_block_count()
f2fs: fix to writeout dirty inode during node flush
f2fs: optimize case-insensitive lookups
...
Please consider pulling fs-verity for 5.4.
fs-verity is a filesystem feature that provides Merkle tree based
hashing (similar to dm-verity) for individual readonly files, mainly for
the purpose of efficient authenticity verification.
This pull request includes:
(a) The fs/verity/ support layer and documentation.
(b) fs-verity support for ext4 and f2fs.
Compared to the original fs-verity patchset from last year, the UAPI to
enable fs-verity on a file has been greatly simplified. Lots of other
things were cleaned up too.
fs-verity is planned to be used by two different projects on Android;
most of the userspace code is in place already. Another userspace tool
("fsverity-utils"), and xfstests, are also available. e2fsprogs and
f2fs-tools already have fs-verity support. Other people have shown
interest in using fs-verity too.
I've tested this on ext4 and f2fs with xfstests, both the existing tests
and the new fs-verity tests. This has also been in linux-next since
July 30 with no reported issues except a couple minor ones I found
myself and folded in fixes for.
Ted and I will be co-maintaining fs-verity.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXX8ZUBQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK2YOAQCbnBAKWDxXS3alLARRwjQLjmEtQIGl
gsek+WurFIg/zAEAlpSzHwu13LvYzTqv3rhO2yhSlvhnDu4GQEJPXPm0wgM=
=ID0n
-----END PGP SIGNATURE-----
Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fs-verity support from Eric Biggers:
"fs-verity is a filesystem feature that provides Merkle tree based
hashing (similar to dm-verity) for individual readonly files, mainly
for the purpose of efficient authenticity verification.
This pull request includes:
(a) The fs/verity/ support layer and documentation.
(b) fs-verity support for ext4 and f2fs.
Compared to the original fs-verity patchset from last year, the UAPI
to enable fs-verity on a file has been greatly simplified. Lots of
other things were cleaned up too.
fs-verity is planned to be used by two different projects on Android;
most of the userspace code is in place already. Another userspace tool
("fsverity-utils"), and xfstests, are also available. e2fsprogs and
f2fs-tools already have fs-verity support. Other people have shown
interest in using fs-verity too.
I've tested this on ext4 and f2fs with xfstests, both the existing
tests and the new fs-verity tests. This has also been in linux-next
since July 30 with no reported issues except a couple minor ones I
found myself and folded in fixes for.
Ted and I will be co-maintaining fs-verity"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
f2fs: add fs-verity support
ext4: update on-disk format documentation for fs-verity
ext4: add fs-verity read support
ext4: add basic fs-verity support
fs-verity: support builtin file signatures
fs-verity: add SHA-512 support
fs-verity: implement FS_IOC_MEASURE_VERITY ioctl
fs-verity: implement FS_IOC_ENABLE_VERITY ioctl
fs-verity: add data verification hooks for ->readpages()
fs-verity: add the hook for file ->setattr()
fs-verity: add the hook for file ->open()
fs-verity: add inode and superblock fields
fs-verity: add Kconfig and the helper functions for hashing
fs: uapi: define verity bit for FS_IOC_GETFLAGS
fs-verity: add UAPI header
fs-verity: add MAINTAINERS file entry
fs-verity: add a documentation file
As a part of the sanity checking while mounting, distinct segment number
assignment to data and node segments is verified. Fixing a small bug in
this verification between node and data segments. We need to check all
the data segments with all the node segments.
Fixes: 042be0f849 ("f2fs: fix to do sanity check with current segment number")
Signed-off-by: Surbhi Palande <csurbhi@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In mkfs, we have counted quota file's node number in cp.valid_node_count,
so we have to avoid wrong substraction of quota node number in
.available_nid/.avail_node_count calculation.
f2fs_write_check_point_pack()
{
..
set_cp(valid_node_count, 1 + c.quota_inum + c.lpf_inum);
Fixes: 292c196a36 ("f2fs: reserve nid resource for quota sysfile")
Fixes: 7b63f72f73 ("f2fs: fix to do sanity check on valid node/block count")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Modeled after commit b886ee3e77 ("ext4: Support case-insensitive file
name lookups")
"""
This patch implements the actual support for case-insensitive file name
lookups in f2fs, based on the feature bit and the encoding stored in the
superblock.
A filesystem that has the casefold feature set is able to configure
directories with the +F (F2FS_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string. This operation is called a
case-insensitive file name lookup.
The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.
* dcache handling:
For a +F directory, F2Fs only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache despite which equivalent string was used in
a previous lookup, without having to resort to ->lookup().
d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.
For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance but requires some leveraging of
the vfs layer to fix. We can live without that for now, and so does
everyone else.
* on-disk data:
Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.
DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware. The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.
* Dealing with invalid sequences:
By default, when a invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file. This means that case-insensitive
file name lookup will not work only for that file. An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding. When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.
* Normalization algorithm:
The UTF-8 algorithms used to compare strings in f2fs is implemented
in fs/unicode, and is based on a previous version developed by
SGI. It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.
NFD seems to be the best normalization method for F2FS because:
- It has a lower cost than NFC/NFKC (which requires
decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like
compatibility decompositions.
Although:
- This implementation is not completely linguistic accurate, because
different languages have conflicting rules, which would require the
specialization of the filesystem to a given locale, which brings all
sorts of problems for removable media and for users who use more than
one language.
"""
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add charset encoding to f2fs to support casefolding. It is modeled after
the same feature introduced in commit c83ad55eaa ("ext4: include charset
encoding information in the superblock")
Currently this is not compatible with encryption, similar to the current
ext4 imlpementation. This will change in the future.
>From the ext4 patch:
"""
The s_encoding field stores a magic number indicating the encoding
format and version used globally by file and directory names in the
filesystem. The s_encoding_flags defines policies for using the charset
encoding, like how to handle invalid sequences. The magic number is
mapped to the exact charset table, but the mapping is specific to ext4.
Since we don't have any commitment to support old encodings, the only
encoding I am supporting right now is utf8-12.1.0.
The current implementation prevents the user from enabling encoding and
per-directory encryption on the same filesystem at the same time. The
incompatibility between these features lies in how we do efficient
directory searches when we cannot be sure the encryption of the user
provided fname will match the actual hash stored in the disk without
decrypting every directory entry, because of normalization cases. My
quickest solution is to simply block the concurrent use of these
features for now, and enable it later, once we have a better solution.
"""
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With quota_ino feature on, generic/232 reports an inconsistence issue
on the image.
The root cause is that the testcase tries to:
- use quotactl to shutdown journalled quota based on sysfile;
- and then use quotactl to enable/turn on quota based on specific file
(aquota.user or aquota.group).
Eventually, quota sysfile will be out-of-update due to following specific
file creation.
Change as below to fix this issue:
- deny enabling quota based on specific file if quota sysfile exists.
- set SBI_QUOTA_NEED_REPAIR once sysfile based quota shutdowns via
ioctl.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=204193
A null pointer dereference bug is triggered in f2fs under kernel-5.1.3.
kasan_report.cold+0x5/0x32
f2fs_write_end_io+0x215/0x650
bio_endio+0x26e/0x320
blk_update_request+0x209/0x5d0
blk_mq_end_request+0x2e/0x230
lo_complete_rq+0x12c/0x190
blk_done_softirq+0x14a/0x1a0
__do_softirq+0x119/0x3e5
irq_exit+0x94/0xe0
call_function_single_interrupt+0xf/0x20
During umount, we will access NULL sbi->node_inode pointer in
f2fs_write_end_io():
f2fs_bug_on(sbi, page->mapping == NODE_MAPPING(sbi) &&
page->index != nid_of_node(page));
The reason is if disable_checkpoint mount option is on, meta dirty
pages can remain during umount, and then be flushed by iput() of
meta_inode, however node_inode has been iput()ed before
meta_inode's iput().
Since checkpoint is disabled, all meta/node datas are useless and
should be dropped in next mount, so in umount, let's adjust
drop_inode() to give a hint to iput_final() to drop all those dirty
datas correctly.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If IO alignment feature is turned on after remount, we didn't
initialize mempool of it, it turns out we will encounter panic
during IO submission due to access NULL mempool pointer.
This feature should be set only at mount time, so simply deny
configuring during remount.
This fixes bug reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=204135
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since 07173c3ec2 ("block: enable multipage bvecs"), one bio vector
can store multi pages, so that we can not calculate max IO size of
bio as PAGE_SIZE * bio->bi_max_vecs. However IO alignment feature of
f2fs always has that assumption, so finally, it may cause panic during
IO submission as below stack.
kernel BUG at fs/f2fs/data.c:317!
RIP: 0010:__submit_merged_bio+0x8b0/0x8c0
Call Trace:
f2fs_submit_page_write+0x3cd/0xdd0
do_write_page+0x15d/0x360
f2fs_outplace_write_data+0xd7/0x210
f2fs_do_write_data_page+0x43b/0xf30
__write_data_page+0xcf6/0x1140
f2fs_write_cache_pages+0x3ba/0xb40
f2fs_write_data_pages+0x3dd/0x8b0
do_writepages+0xbb/0x1e0
__writeback_single_inode+0xb6/0x800
writeback_sb_inodes+0x441/0x910
wb_writeback+0x261/0x650
wb_workfn+0x1f9/0x7a0
process_one_work+0x503/0x970
worker_thread+0x7d/0x820
kthread+0x1ad/0x210
ret_from_fork+0x35/0x40
This patch adds one extra condition to check left space in bio while
trying merging page to bio, to avoid panic.
This bug was reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=204043
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add fs-verity support to f2fs. fs-verity is a filesystem feature that
enables transparent integrity protection and authentication of read-only
files. It uses a dm-verity like mechanism at the file level: a Merkle
tree is used to verify any block in the file in log(filesize) time. It
is implemented mainly by helper functions in fs/verity/. See
Documentation/filesystems/fsverity.rst for the full documentation.
The f2fs support for fs-verity consists of:
- Adding a filesystem feature flag and an inode flag for fs-verity.
- Implementing the fsverity_operations to support enabling verity on an
inode and reading/writing the verity metadata.
- Updating ->readpages() to verify data as it's read from verity files
and to support reading verity metadata pages.
- Updating ->write_begin(), ->write_end(), and ->writepages() to support
writing verity metadata pages.
- Calling the fs-verity hooks for ->open(), ->setattr(), and ->ioctl().
Like ext4, f2fs stores the verity metadata (Merkle tree and
fsverity_descriptor) past the end of the file, starting at the first 64K
boundary beyond i_size. This approach works because (a) verity files
are readonly, and (b) pages fully beyond i_size aren't visible to
userspace but can be read/written internally by f2fs with only some
relatively small changes to f2fs. Extended attributes cannot be used
because (a) f2fs limits the total size of an inode's xattr entries to
4096 bytes, which wouldn't be enough for even a single Merkle tree
block, and (b) f2fs encryption doesn't encrypt xattrs, yet the verity
metadata *must* be encrypted when the file is because it contains hashes
of the plaintext data.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Wire up the new ioctls for adding and removing fscrypt keys to/from the
filesystem, and the new ioctl for retrieving v2 encryption policies.
The key removal ioctls also required making f2fs_drop_inode() call
fscrypt_drop_inode().
For more details see Documentation/filesystems/fscrypt.rst and the
fscrypt patches that added the implementation of these ioctls.
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
This set of patches adjust to follow recent setflags changes and fix two
regression introduced since 5.4-rc1.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAl1AgaYACgkQQBSofoJI
UNJzPA/+NXLuNcjdP2HdiTj2fZc5saFDZkMKmjd6559rsQI9799qGIqZbhoynV15
sWNToo04LL1nNSTj/+/Kemwx3jnKUst4HCeUodb88Fey/phzbX4Di59aOv9cb5nT
g1Fozf7pt4TI67/zOxrP32s5hAUe1c7kkdp4XC8lK4Sv7KqXxvw6dz7QOWCaWd/z
DfEww1miNL0zRKSHUxj+74nLU6Q3Ot9/lEg4Af+8hJfURuCtkqck3yUXuyYnPmA0
4E15IeayqUEwqLHsGb3ysWsTQz1gRPRJncMz9FHsaW2pPIGiQtvJMzlhmo8vdrnP
8vRXGL9U/kuzod/MR0nOl16Wmq4u75IbK4FGEHF032An0bzYspwej5GDWbD7fiIW
gjGF6MoSmtYdngSxED2Cf81MM5oCKGMMxkMc9C+NFIMR9UNtagTMjzLtEICP/Q0h
7u6A5vqbND7Oh8O2b+Xk4kvVtJ2Czu0Jq7p6kdfCPTXV3oJR9Wq7fdXyGBtm6AWx
sjbt6d0RvuR8G+4jPI8Wa8gZ9cPJIQW+Iw917Q/mDqCfK7mXarrpy8tsNzNduLzl
80XjIA1BknNPFOHKPndSAhkXrpb269nPVfdtPTv0GAris3DANG71tKMp9pijFOo0
rn6SKzVVmYo41FWgYpQc2b5V8uQZXK/LMY4TF6bgRHAfmPPgOxk=
=TSfD
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
"This set of patches adjust to follow recent setflags changes and fix
two regressions"
* tag 'f2fs-for-5.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: use EINVAL for superblock with invalid magic
f2fs: fix to read source block before invalidating it
f2fs: remove redundant check from f2fs_setflags_common()
f2fs: use generic checking function for FS_IOC_FSSETXATTR
f2fs: use generic checking and prep function for FS_IOC_SETFLAGS
The kernel mount_block_root() function expects -EACESS or -EINVAL for a
unmountable filesystem when trying to mount the root with different
filesystem types.
However, in 5.3-rc1 the behavior when F2FS code cannot find valid block
changed to return -EFSCORRUPTED(-EUCLEAN), and this error code makes
mount_block_root() fail when trying to probe F2FS.
When the magic number of the superblock mismatches, it has a high
probability that it's just not a F2FS. In this case return -EINVAL seems
to be a better result, and this return value can make mount_block_root()
probing work again.
Return -EINVAL when the superblock has magic mismatch, -EFSCORRUPTED in
other cases (the magic matches but the superblock cannot be recognized).
Fixes: 10f966bbf5 ("f2fs: use generic EFSBADCRC/EFSCORRUPTED")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl0s1ZEQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpiCEEACE9H/pXoegTTWIVPVajMlsa19UHIeilk4N
GI7oKSiirQEMZnAOmrEzgB4/0zyYQsVypys0gZlYUD3GJVsXDT3zzjNXL5NpVg/O
nqwSGWMHBSjWkLbaM40Pb2QLXsYgveptNL+9PtxrgtoYPoT5/+TyrJMFrRfi72EK
WFeNDKOu6aJxpJ26JSsckJ0gluKeeEpRoEqsgHGIwaMIGHQf+b+ikk7tel5FAIgA
uDwwD+Oxsdgh/ChsXL0d90GkcbcSp6GQ7GybxVmw/tPijx6mpeIY72xY3Zx+t8zF
b71UNk6NmCKjOPO/6fiuYKKTYw+KhzlyEKO0j675HKfx2AhchEwKw0irp4yUlydA
zxWYmz4U7iRgktJtymv3J4FEQQ3S6d1EnuQkQNX1LwiOsEsfzhkWi+7jy7KFhZoJ
AqtYzqnOXvLx92q0vloj06HtK6zo+I/MINldy0+qn9lq0N0VF+dctyztAHLsF7P6
pUtS6i7l1JSFKAmMhC31sIj5TImaehM2e/TWMUPEDZaO96oKCmQwOF1oiloc6vlW
h4xWsxP/9zOFcWNyPzy6Vo3JUXWRvFA7K+jV3Hsukw6rVHiNCGVYGSlTv8Roi5b7
I4ggu9R2JOGyku7UIlL50IRxEyjAp11LaO8yHhcCnRB65rmyBuNMQNcfOsfxpZ5Y
1mtSNhm5TQ==
=g8xI
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20190715' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
"A later pull request with some followup items. I had some vacation
coming up to the merge window, so certain things items were delayed a
bit. This pull request also contains fixes that came in within the
last few days of the merge window, which I didn't want to push right
before sending you a pull request.
This contains:
- NVMe pull request, mostly fixes, but also a few minor items on the
feature side that were timing constrained (Christoph et al)
- Report zones fixes (Damien)
- Removal of dead code (Damien)
- Turn on cgroup psi memstall (Josef)
- block cgroup MAINTAINERS entry (Konstantin)
- Flush init fix (Josef)
- blk-throttle low iops timing fix (Konstantin)
- nbd resize fixes (Mike)
- nbd 0 blocksize crash fix (Xiubo)
- block integrity error leak fix (Wenwen)
- blk-cgroup writeback and priority inheritance fixes (Tejun)"
* tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits)
MAINTAINERS: add entry for block io cgroup
null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED
block: Limit zone array allocation size
sd_zbc: Fix report zones buffer allocation
block: Kill gfp_t argument of blkdev_report_zones()
block: Allow mapping of vmalloc-ed buffers
block/bio-integrity: fix a memory leak bug
nvme: fix NULL deref for fabrics options
nbd: add netlink reconfigure resize support
nbd: fix crash when the blksize is zero
block: Disable write plugging for zoned block devices
block: Fix elevator name declaration
block: Remove unused definitions
nvme: fix regression upon hot device removal and insertion
blk-throttle: fix zero wait time for iops throttled group
block: Fix potential overflow in blk_report_zones()
blkcg: implement REQ_CGROUP_PUNT
blkcg, writeback: Implement wbc_blkcg_css()
blkcg, writeback: Add wbc->no_cgroup_owner
blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner()
...
Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In
preparation of using vmalloc() for large report buffer and zone array
allocations used by this function, remove its "gfp_t gfp_mask" argument
and rely on the caller context to use memalloc_noio_save/restore() where
necessary (block layer zone revalidation and dm-zoned I/O error path).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As Park Ju Hyung suggested:
"I'd like to suggest to write down an actual version of f2fs-tools
here as we've seen older versions of fsck doing even more damage
and the users might not have the latest f2fs-tools installed."
This patch give a more detailed info of how we fix such corruption
to user to avoid damageable repair with low version fsck.
Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Two paths to update quota and f2fs_lock_op:
1.
- lock_op
| - quota_update
`- unlock_op
2.
- quota_update
- lock_op
`- unlock_op
But, we need to make a transaction on quota_update + lock_op in #2 case.
So, this patch introduces:
1. lock_op
2. down_write
3. check __need_flush
4. up_write
5. if there is dirty quota entries, flush them
6. otherwise, good to go
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs uses EFAULT as error number to indicate filesystem is corrupted
all the time, but generic filesystems use EUCLEAN for such condition,
we need to change to follow others.
This patch adds two new macros as below to wrap more generic error
code macros, and spread them in code.
EFSBADCRC EBADMSG /* Bad CRC detected */
EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
- Add and use f2fs_<level> macros
- Convert f2fs_msg to f2fs_printk
- Remove level from f2fs_printk and embed the level in the format
- Coalesce formats and align multi-line arguments
- Remove unnecessary duplicate extern f2fs_msg f2fs.h
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This ioctl shrinks a given length (aligned to sections) from end of the
main area. Any cursegs and valid blocks will be moved out before
invalidating the range.
This feature can be used for adjusting partition sizes online.
History of the patch:
Sahitya Tummala:
- Add this ioctl for f2fs_compat_ioctl() as well.
- Fix debugfs status to reflect the online resize changes.
- Fix potential race between online resize path and allocate new data
block path or gc path.
Others:
- Rename some identifiers.
- Add some error handling branches.
- Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
- Implement this interface as ext4's, and change the parameter from shrunk
bytes to new block count of F2FS.
- During resizing, force to empty sit_journal and forbid adding new
entries to it, in order to avoid invalid segno in journal after resize.
- Reduce sbi->user_block_count before resize starts.
- Commit the updated superblock first, and then update in-memory metadata
only when the former succeeds.
- Target block count must align to sections.
- Write checkpoint before and after committing the new superblock, w/o
CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
resize fails after the new superblock is committed.
- In free_segment_range(), reduce granularity of gc_mutex.
- Add protection on curseg migration.
- Add freeze_bdev() and thaw_bdev() for resize fs.
- Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
- Recover super_block and FS metadata when resize fails.
- No need to clear CP_FSCK_FLAG in update_ckpt_flags().
- Clean up the sb and fs metadata update functions for resize_fs.
Geert Uytterhoeven:
- Use div_u64*() for 64-bit divisions
Arnd Bergmann:
- Not all architectures support get_user() with a 64-bit argument:
ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
Use copy_from_user() here, this will always work.
Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This extends the checkpoint option to allow checkpoint=disable:%u[%]
This allows you to specify what how much of the disk you are willing
to lose access to while mounting with checkpoint=disable. If the amount
lost would be higher, the mount will return -EAGAIN. This can be given
as a percent of total space, or in blocks.
Currently, we need to run garbage collection until the amount of holes
is smaller than the OVP space. With the new option, f2fs can mark
space as unusable up front instead of requiring garbage collection until
the number of holes is small enough.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
On a remount, you can currently set root reserved if it was not
previously set. This can cause an underflow if reserved has been set to
a very high value, since then root reserved + current reserved could be
greater than user_block_count. inc_valid_block_count later subtracts out
these values from user_block_count, causing an underflow.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fix f2fs_show_options to show nodiscard mount option.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add error prints to get more details on the mount failure.
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Ju Hyung reported:
"
I was semi-forced today to use the new kernel and test f2fs.
My Ubuntu initramfs got a bit wonky and I had to boot into live CD and
fix some stuffs. The live CD was using 4.15 kernel, and just mounting
the f2fs partition there corrupted f2fs and my 4.19(with 5.1-rc1-4.19
f2fs-stable merged) refused to mount with "SIT is corrupted node"
message.
I used the latest f2fs-tools sent by Chao including "fsck.f2fs: fix to
repair cp_loads blocks at correct position"
It spit out 140M worth of output, but at least I didn't have to run it
twice. Everything returned "Ok" in the 2nd run.
The new log is at
http://arter97.com/f2fs/final
After fixing the image, I used my 4.19 kernel with 5.2-rc1-4.19
f2fs-stable merged and it mounted.
But, I got this:
[ 1.047791] F2FS-fs (nvme0n1p3): layout of large_nat_bitmap is
deprecated, run fsck to repair, chksum_offset: 4092
[ 1.081307] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
[ 1.161520] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
[ 1.162418] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7e00
But after doing a reboot, the message is gone:
[ 1.098423] F2FS-fs (nvme0n1p3): Found nat_bits in checkpoint
[ 1.177771] F2FS-fs (nvme0n1p3): recover fsync data on readonly fs
[ 1.178365] F2FS-fs (nvme0n1p3): Mounted with checkpoint version = 761c7eda
I'm not exactly sure why the kernel detected that I'm still using the
old layout on the first boot. Maybe fsck didn't fix it properly, or
the check from the kernel is improper.
"
Although we have rebuild the old deprecated checkpoint with new layout
during repair, we only repair last checkpoint park, the other old one is
remained.
Once the image was mounted, we will 1) sanity check layout and 2) decide
which checkpoint park to use according to cp_ver. So that we will print
reported message unnecessarily at step 1), to avoid it, we simply move
layout check into f2fs_sanity_check_ckpt() after step 2).
Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch reverts:
commit fb40d618b0 ("f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG").
We were missing error handlers used in f2fs quota ops.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Another round of various bug fixes came in. Damien improved SMR drive support a
bit, and Chao replaced BUG_ON() with reporting errors to user since we've not
hit from users but did hit from crafted images. We've found a disk layout bug
in large_nat_bits feature which supports very large NAT entries enabled at mkfs.
If the feature is enabled, it will give a notice to run fsck to correct the
on-disk layout.
Enhancement:
- reduce memory consumption for SMR drive
- better discard handling for multiple partitions
- tracepoints for f2fs_file_write_iter/f2fs_filemap_fault
- allow to change CP_CHKSUM_OFFSET
- detect wrong layout of large_nat_bitmap feature
- enhance checking valid data indices
Bug fix:
- Multiple partition support for SMR drive
- deadlock problem in f2fs_balance_fs_bg
- add boundary checks to fix abnormal behaviors on fuzzed images
- inline_xattr space calculations
- replace f2fs_bug_on with errors
In addition, this series contains various memory boundary check and sanity check
of on-disk consistency.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlzZ/ukACgkQQBSofoJI
UNKnkQ//U6QsEgNsC5GeNAXPLxodxC406CSo+aRk//3xUdORzkVNia1ThLeYVNfd
3LsLaSy1eLZ0GylrR/MpvHt+eSWH5L5Ferj2v4l1zAYLr29FEhl6bmYPyvc3e4pr
IbHwC6W1g1sV2LxeNp7z0chkT7cOAm633d3OVJzUXD5B1k52lx72QnqoXal0Ae1L
tt3h/TVBIRJX26nSTiEZTBnIDmpiZYSsJVGgjqLnTCXHWUKPPfuJqALiELluhBNJ
M7gDE3/JYJyb+1PrdtvsEesPJxSLovGJJJvcJKcuMhWwij0Jsq2BwiP3shcfj8iA
76PiPlhjS6D5sMo1hJzbKettusfZrxX284UHNacrkgA/TBbHeGGBy3Fbh6B3+/o1
qvCl0atqp3km6Z8vg5r7nDTOMrg0JSjsHz3WA9ZXZ+cSM6mGxk7vYMd7Sn7h2GqY
deIfWqlTdB8hOFQJWDswXI2ILyJlvquc4jTKIUmHp4ZEQXdYSWlBUWm7+1XZvoYn
rlrAcr/loSQ/gT4U/Z9RQdpMYb2n2n3+YF8QAeTDmVafMeqbrwspCZf6l8Pfoto1
ZVYO9QUIsvRBDEiDHdLmRwb1ckTTiHiHrO9YwtA5zlYRRyH93MamQTsH7BTGt0OC
tjCPbpQGlWK6M9p3LNQ4H5lsdLzD7EEP0JXLOQ57rY3aDaZsOsE=
=HOz+
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"Another round of various bug fixes came in. Damien improved SMR drive
support a bit, and Chao replaced BUG_ON() with reporting errors to
user since we've not hit from users but did hit from crafted images.
We've found a disk layout bug in large_nat_bits feature which supports
very large NAT entries enabled at mkfs. If the feature is enabled, it
will give a notice to run fsck to correct the on-disk layout.
Enhancements:
- reduce memory consumption for SMR drive
- better discard handling for multiple partitions
- tracepoints for f2fs_file_write_iter/f2fs_filemap_fault
- allow to change CP_CHKSUM_OFFSET
- detect wrong layout of large_nat_bitmap feature
- enhance checking valid data indices
Bug fixes:
- Multiple partition support for SMR drive
- deadlock problem in f2fs_balance_fs_bg
- add boundary checks to fix abnormal behaviors on fuzzed images
- inline_xattr space calculations
- replace f2fs_bug_on with errors
In addition, this series contains various memory boundary check and
sanity check of on-disk consistency"
* tag 'f2fs-for-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
f2fs: fix to avoid accessing xattr across the boundary
f2fs: fix to avoid potential race on sbi->unusable_block_count access/update
f2fs: add tracepoint for f2fs_filemap_fault()
f2fs: introduce DATA_GENERIC_ENHANCE
f2fs: fix to handle error in f2fs_disable_checkpoint()
f2fs: remove redundant check in f2fs_file_write_iter()
f2fs: fix to be aware of readonly device in write_checkpoint()
f2fs: fix to skip recovery on readonly device
f2fs: fix to consider multiple device for readonly check
f2fs: relocate chksum_offset for large_nat_bitmap feature
f2fs: allow unfixed f2fs_checkpoint.checksum_offset
f2fs: Replace spaces with tab
f2fs: insert space before the open parenthesis '('
f2fs: allow address pointer number of dnode aligning to specified size
f2fs: introduce f2fs_read_single_page() for cleanup
f2fs: mark is_extension_exist() inline
f2fs: fix to set FI_UPDATE_WRITE correctly
f2fs: fix to avoid panic in f2fs_inplace_write_data()
f2fs: fix to do sanity check on valid block count of segment
f2fs: fix to do sanity check on valid node/block count
...
Use sbi.stat_lock to protect sbi->unusable_block_count accesss/udpate, in
order to avoid potential race on it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_disable_checkpoint(), it needs to detect and propagate error
number returned from f2fs_write_checkpoint().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Park Ju Hyung reported in mailing list:
https://sourceforge.net/p/linux-f2fs/mailman/message/36639787/
generic_make_request: Trying to write to read-only block-device loop0 (partno 0)
WARNING: CPU: 0 PID: 23437 at block/blk-core.c:2174 generic_make_request_checks+0x594/0x630
generic_make_request+0x46/0x3d0
submit_bio+0x30/0x110
__submit_merged_bio+0x68/0x390
f2fs_submit_page_write+0x1bb/0x7f0
f2fs_do_write_meta_page+0x7f/0x160
__f2fs_write_meta_page+0x70/0x140
f2fs_sync_meta_pages+0x140/0x250
f2fs_write_checkpoint+0x5c5/0x17b0
f2fs_sync_fs+0x9c/0x110
sync_filesystem+0x66/0x80
f2fs_recover_fsync_data+0x790/0xa30
f2fs_fill_super+0xe4e/0x1980
mount_bdev+0x518/0x610
mount_fs+0x34/0x13f
vfs_kern_mount.part.11+0x4f/0x120
do_mount+0x2d1/0xe40
__x64_sys_mount+0xbf/0xe0
do_syscall_64+0x4a/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
print_req_error: I/O error, dev loop0, sector 4096
If block device is readonly, we should never trigger write IO from
filesystem layer, but previously, orphan and journal recovery didn't
consider such condition, result in triggering above warning, fix it.
Reported-by: Park Ju Hyung <qkrwngud825@gmail.com>
Tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>