This patch fixes a couple of bugs regarding orphan inodes when handling
errors.
This tries to
- call alloc_nid_done with add_orphan_inode in handle_failed_inode
- let truncate blocks in f2fs_evict_inode
- not make a bad inode due to i_mode change
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit 28bc106b23 ("f2fs: support revoking atomic written pages")
forgot to clear page private flag correctly, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Private data in a page should be removed during ->releasepage or
->invalidatepage, otherwise garbage data would remain in that page.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit 57b62d29ad ("f2fs: fix to report error in f2fs_readdir") causes
f2fs_readdir to return -ENOENT when get_lock_data_page returns -ENOENT.
However, the original logic was to continue when get_lock_data_page
returns -ENOENT; the commit forgot to reset err to 0 before continuing.
This causes getdents64 to incorrectly return -ENOENT when lastdirent is
NULL, so the syscall caller sees a wrong return value.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For foreground GC, we cache the node blocks of the victim section and set
them dirty, then call sync_node_pages to flush these node pages. But
node pages which are not located in the victim section are flushed
together with them, so more bandwidth and contiguous free space is
consumed.
So in this case it's better to leave the unrelated node pages in the
cache for further write hits, and let CP or the VM flush them afterward.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The filename length in a dirent may become zero after random junk data
injection. Once such a dirent is encountered, find_target_dentry or
f2fs_add_inline_entries will run into an infinite loop. So make f2fs
aware of that case to avoid the dead loop.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
I've changed employers; update my email address to the new one.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In the direct IO path with O_(D)SYNC, the proper APPEND or UPDATE flags need
to be set, so that f2fs_sync_file can make the data safe.
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In order to guarantee atomic writes, we should consider power failure during
sync_node_pages in fsync.
So, this patch marks the fsync flag only in the last dnode block.
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
fsync_node_pages should return success or failure so that the user can know
whether fsync completed or not.
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch splits the existing sync_node_pages into (f)sync_node_pages.
The fsync_node_pages is used for f2fs_sync_file only.
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The first page of volatile writes usually contains a sort of header information
which will be used for recovery.
(e.g., journal header of sqlite)
If this is written without other journal data, user needs to handle the stale
journal information.
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When fsync is called, sync_node_pages finds the proper direct node pages to flush.
But it also locks unrelated direct node pages unnecessarily.
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If somebody wrote some data before atomic writes, we should flush them in order
to handle atomic data in a right period.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch returns -E2BIG if there is no space to add an xattr entry.
This should fix generic/026 in xfstests as well.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch resolves the redundant condition check reported by David.
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Atomic/volatile operations should be done with a matching pair of start and
commit ioctls.
For example, if a killed process leaves an open-ended atomic operation, we should
drop its flag as well as its atomic data. Otherwise, if sqlite initiates another
operation which doesn't require atomic writes, it will lose all of its data, since
f2fs still treats the writes as atomic and nobody will trigger the commit.
Reported-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When one reader closes its file while the other writer is doing atomic writes,
f2fs_release_file drops atomic data resulting in an empty commit.
This patch fixes this wrong commit problem by checking the openness of the file.
Process0                        Process1
                                open file
                                start atomic write
                                write data
read data
close file
f2fs_release_file()
clear atomic data
                                commit atomic write
Reported-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds BUG_ON instead of retrying loop.
In the case of node pages, we already got this inode page, but unlocked it.
By the fact that we don't truncate any node pages in operations, the page's
mapping should be unchangeable.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Previously, after trylock_page succeeded, the page's mapping was not checked.
In order to fix that, we can just pass FGP_LOCK to pagecache_get_page.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With the steps below, we will lose some of the dirents:
1) mount f2fs with inline_dentry option
2) echo 1 > /sys/fs/f2fs/sdX/dir_level
3) mkdir dir
4) touch 180 files named [1-180] in dir
5) touch 181 in dir
6) echo 3 > /proc/sys/vm/drop_caches
7) ll dir
ls: cannot access 2: No such file or directory
ls: cannot access 4: No such file or directory
ls: cannot access 5: No such file or directory
ls: cannot access 6: No such file or directory
ls: cannot access 8: No such file or directory
ls: cannot access 9: No such file or directory
...
total 360
drwxr-xr-x 2 root root 4096 Feb 19 15:12 ./
drwxr-xr-x 3 root root 4096 Feb 19 15:11 ../
-rw-r--r-- 1 root root 0 Feb 19 15:12 1
-rw-r--r-- 1 root root 0 Feb 19 15:12 10
-rw-r--r-- 1 root root 0 Feb 19 15:12 100
-????????? ? ? ? ? ? 101
-????????? ? ? ? ? ? 102
-????????? ? ? ? ? ? 103
...
The reason is: when doing the inline dir conversion, we didn't consider
that a directory has a hierarchical hash structure which can be configured
through the sysfs interface 'dir_level'.
By default, dir_level of a directory inode is 0, which means we have one
bucket in the hash table located in the first level and all dirents are
hashed into this bucket, so it is no problem for us to simply copy between
the inline dentry page and the converted normal dentry page.
However, if we configure dir_level with a value N (greater than 0), it
expands the bucket number of the first level hash table by 2^N - 1 and
hashes dirents into different buckets according to their hash value. If we
still move all dirents to the first bucket, the inline dirents are placed
incorrectly. The result is that, although we can iterate all dirents
through ->readdir, we can't stat some of them in ->lookup, which is based
on hash table searching.
This patch fixes this issue by rehashing dirents into correct position
when converting inline directory.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a sbi flag, SBI_NEED_SB_WRITE, which indicates it needs to
recover superblock when (re)mounting as RW. This is set only when f2fs is
mounted as RO.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When one of superblocks is missing, f2fs recovers it with the valid one.
But, even if f2fs is mounted as RO, we'd better notify that too.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a binutils fix, an lguest fix, an mcelog fix and a missing
documentation fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Avoid using object after free in genpool
lguest, x86/entry/32: Fix handling of guest syscalls using interrupt gates
x86/build: Build compressed x86 kernels as PIE
x86/mm/pkeys: Add missing Documentation
Pull mm gup cleanup from Ingo Molnar:
"This removes the ugly get-user-pages API hack, now that all upstream
code has been migrated to it"
("ugly" is putting it mildly. But it worked.. - Linus)
* 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm/gup: Remove the macro overload API migration helpers from the get_user*() APIs
Merge tag 'dm-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- fix a 4.6-rc1 bio-based DM 'struct dm_target_io' leak in an error
path
- stable@ fix for DM cache metadata's READ_LOCK macros that were
incorrectly returning error if the block manager was in read-only
mode; also cleanup multi-statement macros to use do {} while(0)
* tag 'dm-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros
dm: fix dm_target_io leak if clone_bio() returns an error
Merge tag 'pwm/for-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm fix from Thierry Reding:
"A single one-line fix to turn the regmap cache from an RB-tree to a
flat cache to avoid lockdep and abort issues"
* tag 'pwm/for-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: fsl-ftm: Use flat regmap cache
Merge tag 'sound-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"We've had a very calm development cycle, so far. Here are the few
fixes for HD-audio and USB-audio, all of which are small and easy"
* tag 'sound-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Fix inconsistent monitor_present state until repoll
ALSA: hda - Fix regression of monitor_present flag in eld proc file
ALSA: usb-audio: Skip volume controls triggers hangup on Dell USB Dock
ALSA: hda/realtek - Enable the ALC292 dock fixup on the Thinkpad T460s
ALSA: sscape: Use correct format identifier for size_t
ALSA: usb-audio: Add a quirk for Plantronics BT300
ALSA: usb-audio: Add a sample rate quirk for Phoenix Audio TMX320
ALSA: hda - Bind with i915 only when Intel graphics is present
Pull mailbox fixes from Jassi Brar:
"Misc fixes:
mailbox-test driver:
- prevent memory leak and another cosmetic change
mailbox:
- change the returned error code
Xgene driver:
- return -ENOMEM instead of PTR_ERR for failed devm_kzalloc"
* 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: Stop using ENOSYS for anything other than unimplemented syscalls
mailbox: mailbox-test: Prevent memory leak
mailbox: mailbox-test: Use more consistent format for calling copy_from_user()
mailbox: xgene-slimpro: Fix wrong test for devm_kzalloc
Merge tag 'for-linus-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs/fscrypto fixes from Jaegeuk Kim:
"In addition to f2fs/fscrypto fixes, I've added one patch which
prevents RCU mode lookup in d_revalidate, as Al mentioned.
These patches fix f2fs and fscrypto based on -rc3 bug fixes in ext4
crypto, which have not yet been fully propagated as follows.
- use of dget_parent and file_dentry to avoid crashes
- disallow RCU-mode lookup in d_revalidate
- disallow -ENOMEM in the core data encryption path"
* tag 'for-linus-4.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
ext4/fscrypto: avoid RCU lookup in d_revalidate
fscrypto: don't let data integrity writebacks fail with ENOMEM
f2fs: use dget_parent and file_dentry in f2fs_file_open
fscrypto: use dget_parent() in fscrypt_d_revalidate()
Pull crypto fixes from Herbert Xu:
"This fixes an NFS regression caused by the skcipher/hash conversion in
sunrpc. It also fixes a build problem in certain configurations with
bcm63xx"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
hwrng: bcm63xx - fix device tree compilation
sunrpc: Fix skcipher/shash conversion
Pull keys bugfixes from James Morris:
"Two bugfixes for Keys related code"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
ASN.1: fix open failure check on headername
assoc_array: don't call compare_object() on a node
The READ_LOCK macro was incorrectly returning -EINVAL if
dm_bm_is_read_only() was true -- it will always be true once the cache
metadata transitions to read-only by dm_cache_metadata_set_read_only().
Wrap READ_LOCK and WRITE_LOCK multi-statement macros in do {} while(0).
Also, all accesses of the 'cmd' argument passed to these related macros
are now encapsulated in parentheses.
A follow-up patch can be developed to eliminate the use of macros in
favor of pure C code. Avoiding that now given that this needs to apply
to stable@.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: d14fcf3dd7 ("dm cache: make sure every metadata function checks fail_io")
Cc: stable@vger.kernel.org
In commit c4004b02f8 ("x86: remove the kernel code/data/bss resources
from /proc/iomem") I was hoping to remove the physical kernel address
data from /proc/iomem entirely, but that had to be reverted because some
system programs actually use it.
This limits all the detailed resource information to properly
credentialed users instead.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The PCI config access checked the file capabilities correctly, but used
the internal security capability check rather than the helper function
that is actually meant for that.
The security_capable() has unusual return values and is not meant to be
used elsewhere (the only other use is in the capability checking
functions that we actually intend people to use), and this odd PCI usage
really stood out when looking around the capability code.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A lot of seqfile users seem to be using things like %pK that uses the
credentials of the current process, but that is actually completely
wrong for filesystem interfaces.
The unix semantics for permission checking files is to check permissions
at _open_ time, not at read or write time, and that is not just a small
detail: passing off stdin/stdout/stderr to a suid application and making
the actual IO happen in privileged context is a classic exploit
technique.
So if we want to be able to look at permissions at read time, we need to
use the file open credentials, not the current ones. Normal file
accesses can just use "f_cred" (or any of the helper functions that do
that, like file_ns_capable()), but the seqfile interfaces do not have
any such options.
It turns out that seq_file _does_ save away the user_ns information of
the file, though. Since user_ns is just part of the full credential
information, replace that special case with saving off the cred pointer
instead, and suddenly seq_file has all the permission information it
needs.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit c4004b02f8.
Sadly, my hope that nobody would actually use the special kernel entries
in /proc/iomem was dashed by kexec, which reads /proc/iomem explicitly
to find the kernel base address. Nasty.
Anyway, that means we can't do the sane and simple thing and just remove
the entries, and we'll instead have to mask them out based on permissions.
Reported-by: Zhengyu Zhang <zhezhang@redhat.com>
Reported-by: Dave Young <dyoung@redhat.com>
Reported-by: Freeman Zhang <freeman.zhang1992@gmail.com>
Reported-by: Emrah Demir <ed@abdsec.com>
Reported-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>