Fix problems noted in compilion with -Wformat=2 -Wformat-signedness.
In particular, a mismatch between the signedness of a value and the
signedness of its format specifier can result in unsigned values being
printed as negative numbers, e.g.:
Partition (0 type 1511) starts at physical 460, block length -1779968542
...which occurs when mounting a large (> 1 TiB) UDF partition.
Changes since V1:
* Fixed additional issues noted in udf_bitmap_free_blocks(),
udf_get_fileident(), udf_show_options()
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Large (> 1 TiB) UDF filesystems appear subject to several problems when
mounted on 64-bit systems:
* readdir() can fail on a directory containing File Identifiers residing
above 0x7FFFFFFF. This manifests as a 'ls' command failing with EIO.
* FIBMAP on a file block located above 0x7FFFFFFF can return a negative
value. The low 32 bits are correct, but applications that don't mask the
high 32 bits of the result can perform incorrectly.
Per suggestion by Jan Kara, introduce a udf_pblk_t type for representation
of UDF block addresses. Ultimately, all driver functions that manipulate
UDF block addresses should use this type; for now, deployment is limited
to functions with actual or potential sign extension issues.
Changes to udf_readdir() and udf_block_map() address the issues noted
above; other changes address potential similar issues uncovered during
audit of the driver code.
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When session starts beyond offset 2^31 the arithmetics in
udf_check_vsd() would overflow. Make sure the computation is done in
large enough type.
Reported-by: Cezary Sliwa <sliwa@ifpan.edu.pl>
Signed-off-by: Jan Kara <jack@suse.cz>
Pull mount flag updates from Al Viro:
"Another chunk of fmount preparations from dhowells; only trivial
conflicts for that part. It separates MS_... bits (very grotty
mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
only a small subset of MS_... stuff).
This does *not* convert the filesystems to new constants; only the
infrastructure is done here. The next step in that series is where the
conflicts would be; that's the conversion of filesystems. It's purely
mechanical and it's better done after the merge, so if you could run
something like
list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')
sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
-e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
-e 's/\<MS_NODEV\>/SB_NODEV/g' \
-e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
-e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
-e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
-e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
-e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
-e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
-e 's/\<MS_SILENT\>/SB_SILENT/g' \
-e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
-e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
-e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
-e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
$list
and commit it with something along the lines of 'convert filesystems
away from use of MS_... constants' as commit message, it would save a
quite a bit of headache next cycle"
* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Differentiate mount flags (MS_*) from internal superblock flags
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags
Omit an extra message for a memory allocation failure in these functions.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Replace the specification of data structures by variable references
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jan Kara <jack@suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The script “checkpatch.pl” pointed information out like the following.
Comparison to NULL could be written !…
Thus fix affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Firstly by applying the following with coccinelle's spatch:
@@ expression SB; @@
-SB->s_flags & MS_RDONLY
+sb_rdonly(SB)
to effect the conversion to sb_rdonly(sb), then by applying:
@@ expression A, SB; @@
(
-(!sb_rdonly(SB)) && A
+!sb_rdonly(SB) && A
|
-A != (sb_rdonly(SB))
+A != sb_rdonly(SB)
|
-A == (sb_rdonly(SB))
+A == sb_rdonly(SB)
|
-!(sb_rdonly(SB))
+!sb_rdonly(SB)
|
-A && (sb_rdonly(SB))
+A && sb_rdonly(SB)
|
-A || (sb_rdonly(SB))
+A || sb_rdonly(SB)
|
-(sb_rdonly(SB)) != A
+sb_rdonly(SB) != A
|
-(sb_rdonly(SB)) == A
+sb_rdonly(SB) == A
|
-(sb_rdonly(SB)) && A
+sb_rdonly(SB) && A
|
-(sb_rdonly(SB)) || A
+sb_rdonly(SB) || A
)
@@ expression A, B, SB; @@
(
-(sb_rdonly(SB)) ? 1 : 0
+sb_rdonly(SB)
|
-(sb_rdonly(SB)) ? A : B
+sb_rdonly(SB) ? A : B
)
to remove left over excess bracketage and finally by applying:
@@ expression A, SB; @@
(
-(A & MS_RDONLY) != sb_rdonly(SB)
+(bool)(A & MS_RDONLY) != sb_rdonly(SB)
|
-(A & MS_RDONLY) == sb_rdonly(SB)
+(bool)(A & MS_RDONLY) == sb_rdonly(SB)
)
to make comparisons against the result of sb_rdonly() (which is a bool)
work correctly.
Signed-off-by: David Howells <dhowells@redhat.com>
udf_fill_super() used udf_parse_options() to flag UDF_FLAG_BLOCKSIZE_SET
when blocksize was specified otherwise used 512 bytes
(bdev_logical_block_size) and 2048 bytes (UDF_DEFAULT_BLOCKSIZE)
IOW both 1024 and 4096 specifications were required or resulted in
"mount: wrong fs type, bad option, bad superblock on /dev/loop1"
This patch loops through different block values but also updates
udf_load_vrs() to return -EINVAL instead of 0 when udf_check_vsd()
fails (and uopt->novrs = 0).
The later being the reason for the RFC; we have that case when mounting
a 4kb blocksize against other values but maybe VRS is not mandatory
there ?
Tested with 512, 1024, 2048 and 4096 blocksize
Reported-by: Jan Kara <jack@suse.com>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
Move all module attributes at the end of one file like other FS.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
CURRENT_TIME is not y2038 safe.
CURRENT_TIME macro is also not appropriate for filesystems
as it doesn't use the right granularity for filesystem
timestamps.
Logical Volume Integrity format is described to have the
same timestamp format for "Recording Date and time" as
the other [a,c,m]timestamps.
The function udf_time_to_disk_format() does this conversion.
Hence the timestamp is passed directly to the function and
not truncated. This is as per Arnd's suggestion on the
thread.
This is also in preparation for the patch that transitions
vfs timestamps to use 64 bit time and hence make them
y2038 safe. As part of the effort current_time() will be
extended to do range checks.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jan Kara <jack@suse.cz>
UDF/OSTA terminology is confusing. Partition Numbers (PNs) are arbitrary
16-bit values, one for each physical partition in the volume. Partition
Reference Numbers (PRNs) are indices into the the Partition Map Table
and do not necessarily equal the PN of the mapped partition.
The current metadata code mistakenly uses the PN instead of the PRN when
mapping metadata blocks to physical/sparable blocks. Windows-created
UDF 2.5 discs for some reason use large, arbitrary PNs, resulting in
mount failure and KASAN read warnings in udf_read_inode().
For example, a NetBSD UDF 2.5 partition might look like this:
PRN PN Type
--- -- ----
0 0 Sparable
1 0 Metadata
Since PRN == PN, we are fine.
But Windows could gives us:
PRN PN Type
--- ---- ----
0 8192 Sparable
1 8192 Metadata
So udf_read_inode() will start out by checking the partition length in
sbi->s_partmaps[8192], which is obviously out of bounds.
Fix this by creating a new field (s_phys_partition_ref) in struct
udf_meta_data, referencing whatever physical or sparable map has the
same partition number as the metadata partition.
[JK: Add comment about s_phys_partition_ref, change its name]
Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Presently, a corrupted or malicious UDF filesystem containing a very large
number (or cycle) of Logical Volume Integrity Descriptor extent
indirections may trigger a stack overflow and kernel panic in
udf_load_logicalvolint() on mount.
Replace the unnecessary recursion in udf_load_logicalvolint() with
simple iteration. Set an arbitrary limit of 1000 indirections (which would
have almost certainly overflowed the stack without this fix), and treat
such cases as if there were no LVID.
Signed-off-by: Alden Tondettar <alden.tondettar@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Commit 9293fcfbc1
("udf: Remove struct ustr as non-needed intermediate storage"),
while getting rid of 'struct ustr', does not take any special care
of 'dstring' fields and effectively use fixed field length instead
of actual string length, encoded in the last byte of the field.
Also, commit 484a10f493
("udf: Merge linux specific translation into CS0 conversion function")
introduced checking of the length of the string being converted,
requiring proper alignment to number of bytes constituing each
character.
The UDF volume identifier is represented as a 32-bytes 'dstring',
and needs to be converted from CS0 to UTF8, while mounting UDF
filesystem. The changes in mentioned commits can in some cases
lead to incorrect handling of volume identifier:
- if the actual string in 'dstring' is of maximal length and
does not have zero bytes separating it from dstring encoded
length in last byte, that last byte may be included in conversion,
thus making incorrect resulting string;
- if the identifier is encoded with 2-bytes characters (compression
code is 16), the length of 31 bytes (32 bytes of field length minus
1 byte of compression code), taken as the string length, is reported
as an incorrect (unaligned) length, and the conversion fails, which
in its turn leads to volume mounting failure.
This patch introduces handling of 'dstring' encoded length field
in udf_CS0toUTF8 function, that is used in all and only cases
when 'dstring' fields are converted. Currently these cases are
processing of Volume Identifier and Volume Set Identifier fields.
The function is also renamed to udf_dstrCS0toUTF8 to distinctly
indicate that it handles 'dstring' input.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Although 'struct ustr' tries to structurize the data by combining
the string and its length, it doesn't actually make much benefit,
since it saves only one parameter, but introduces an extra copying
of the whole buffer, serving as an intermediate storage. It looks
quite inefficient and not actually needed.
This commit gets rid of the struct ustr by changing the parameters
of some functions appropriately.
Also, it removes using 'dstring' type, since it doesn't make much
sense too.
Just using the occasion, add a 'const' qualifier to udf_get_filename
to make consistent parameters sets.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Actual name length restriction is 254 bytes, this is used in 'ustr'
structure, and this is what fits into UDF File Ident structures.
And in most cases the constant is used as UDF_NAME_LEN-2.
So, it's better to just modify the constant to make it closer
to reality.
Also, in some cases it's useful to have a separate constant for
the maximum length of file name field in CS0 encoding in UDF File
Ident structures.
Also, remove the unused UDF_PATH_LEN constant.
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: Jan Kara <jack@suse.cz>
There are many locations that do
if (memory_was_allocated_by_vmalloc)
vfree(ptr);
else
kfree(ptr);
but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory
using is_vmalloc_addr(). Unless callers have special reasons, we can
replace this branch with kvfree(). Please check and reply if you found
problems.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Boris Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull UDF fixes and quota cleanups from Jan Kara:
"Several UDF fixes and some minor quota cleanups"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Check output buffer length when converting name to CS0
udf: Prevent buffer overrun with multi-byte characters
quota: constify qtree_fmt_operations structures
udf: avoid uninitialized variable use
udf: Fix lost indirect extent block
udf: Factor out code for creating indirect extent
udf: limit the maximum number of indirect extents in a row
udf: limit the maximum number of TD redirections
fs: make quota/dquot.c explicitly non-modular
fs: make quota/netlink.c explicitly non-modular
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:
- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.
The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Filesystem fuzzing revealed that we could get stuck in the
udf_process_sequence() loop.
The maximum limit was chosen arbitrarily but fixes the problem I saw.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
When read-write mount of a filesystem is requested but we find out we
can mount the filesystem only in read-only mode, we still modify
LVID in udf_close_lvid(). That is both unnecessary and contrary to
expectation that when we fall back to read-only mount we don't modify
the filesystem.
Make sure we call udf_close_lvid() only if we called udf_open_lvid() so
that filesystem gets modified only if we verified we are allowed to
write to it.
Reported-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Jan Kara <jack@suse.com>
There are some missing braces here which means this function never
succeeds.
Fixes: e9d4cf411f ('udf: improve error management in udf_CS0toUTF8()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
udf_CS0toUTF8() now returns -EINVAL on error.
udf_load_pvoldesc() and udf_get_filename() do the same.
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
Call mutex_destroy() on superblock mutex in udf_put_super()
otherwise mutex debugging code isn't able to detect that
mutex is used after being freed.
(thanks to Jan Kara for complete definition).
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
The iput() function was called in up to three cases by the udf_fill_super()
function during error handling even if the passed data structure element
contained still a null pointer. This implementation detail could be improved
by the introduction of another jump label.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jan Kara <jack@suse.cz>
The iput() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Jan Kara <jack@suse.cz>
Some UDF media have special inodes (like VAT or metadata partition
inodes) whose link_count is 0. Thus commit 4071b91362 (udf: Properly
detect stale inodes) broke loading these inodes because udf_iget()
started returning -ESTALE for them. Since we still need to properly
detect stale inodes queried by NFS, create two variants of udf_iget() -
one which is used for looking up special inodes (which ignores
link_count == 0) and one which is used for other cases which return
ESTALE when link_count == 0.
Fixes: 4071b91362
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Currently __udf_read_inode() wasn't returning anything and we found out
whether we succeeded reading inode by checking whether inode is bad or
not. udf_iget() returned NULL on failure and inode pointer otherwise.
Make these two functions properly propagate errors up the call stack and
use the return value in callers.
Signed-off-by: Jan Kara <jack@suse.cz>
Fix checkpatch warning
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
Pull ext3 improvements, cleanups, reiserfs fix from Jan Kara:
"various cleanups for ext2, ext3, udf, isofs, a documentation update
for quota, and a fix of a race in reiserfs readdir implementation"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: fix race in readdir
ext2: acl: remove unneeded include of linux/capability.h
ext3: explicitly remove inode from orphan list after failed direct io
fs/isofs/inode.c add __init to init_inodecache()
ext3: Speedup WB_SYNC_ALL pass
fs/quota/Kconfig: Update filesystems
ext3: Update outdated comment before ext3_ordered_writepage()
ext3: Update PF_MEMALLOC handling in ext3_write_inode()
ext2/3: use prandom_u32() instead of get_random_bytes()
ext3: remove an unneeded check in ext3_new_blocks()
ext3: remove unneeded check in ext3_ordered_writepage()
fs: Mark function as static in ext3/xattr_security.c
fs: Mark function as static in ext3/dir.c
fs: Mark function as static in ext2/xattr_security.c
ext3: Add __init macro to init_inodecache
ext2: Add __init macro to init_inodecache
udf: Add __init macro to init_inodecache
fs: udf: parse_options: blocksize check
Both affs and isofs check for blocksize integrity during
parse_options.Do the same thing for udf.
Valid values : 512, 1024, 2048 or 4096 bytes.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
The UDF driver was not strict enough about checking the IDs in the
VSDs when mounting, which resulted in reading through all the sectors
of the block device in some unfortunate cases. Eg, trying to mount my
uninitialized 200G SSD partition (all 0xFF bytes) took ~350 minutes to
fail, because the code expected some of the valid IDs or a zero byte.
During this, the mount couldn't be killed, sync from the cmdline
blocked, and the machine froze into the shutdown. Valid filesystems
(extX, btrfs, ntfs) were rejected by the mere accident of having a
zero byte at just the right place in some of their sectors, close
enough to the beginning not to generate excess I/O. The fix adds a
hard limit on the VSD sector offset, adds the two missing VSD IDs, and
stops scanning when encountering an invalid ID. Also replaced the
magic number 32768 with a more meaningful #define, and supressed the
bogus message about failing to read the first sector if no UDF fs was
detected.
Signed-off-by: Peter A. Felvegi <petschy@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
A user has reported an oops in udf_statfs() that was caused by
numOfPartitions entry in LVID structure being corrupted. Fix the problem
by verifying whether numOfPartitions makes sense at least to the extent
that LVID fits into a single block as it should.
Reported-by: Juergen Weigert <jw@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Refuse RW mount of udf filesystem. So far we just silently changed it
to RO mount but when the media is writeable, block layer won't notice
this change and thus will think device is used RW and will block eject
button of the drive. That is unexpected by users because for
non-writeable media eject button works just fine.
Userspace mount(8) command handles this just fine and retries mounting
with MS_RDONLY set so userspace shouldn't see any regression. Plus any
tool mounting udf is likely confronted with the case of read-only
media where block layer already refuses to mount the filesystem without
MS_RDONLY set so our behavior shouldn't be anything new for it.
Reported-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Change all function used in filesystem discovery during mount to user
standard kernel return values - -errno on error, 0 on success instead
of 1 on failure and 0 on success. This allows us to pass error number
(not just failure / success) so we can abort device scanning earlier
in case of errors like EIO or ENOMEM . Also we will be able to return
EROFS in case writeable mount is requested but writing isn't supported.
Signed-off-by: Jan Kara <jack@suse.cz>
Somehow I failed to add the MODULE_ALIAS_FS for cifs, hostfs, hpfs,
squashfs, and udf despite what I thought were my careful checks :(
Add them now.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
struct udf_bitmap has array of buffer pointers attached to it. The code
unnecessarily used s_block_bitmap as a pointer to the array instead of
the standard trick of using 0 length array in the declaration. Change
that to make code more readable and actually shrink the structure by one
pointer.
Signed-off-by: Jan Kara <jack@suse.cz>
This patch implements extent caching in case of file reading.
While reading a file, currently, UDF reads metadata serially
which takes a lot of time depending on the number of extents present
in the file. Caching last accessd extent improves metadata read time.
Instead of reading file metadata from start, now we read from
the cached extent.
This patch considerably improves the time spent by CPU in kernel mode.
For example, while reading a 10.9 GB file using dd:
Time before applying patch:
11677022208 bytes (10.9GB) copied, 1529.748921 seconds, 7.3MB/s
real 25m 29.85s
user 0m 12.41s
sys 15m 34.75s
Time after applying patch:
11677022208 bytes (10.9GB) copied, 1469.338231 seconds, 7.6MB/s
real 24m 29.44s
user 0m 15.73s
sys 3m 27.61s
[JK: Fix bh refcounting issues, simplify initialization]
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Bonggil Bak <bgbak@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
So far we just marked the buffer as dirty and left writing on flusher thread
but especially on opening that opens possible race window where we could write
other modified fs structures to disk before we mark filesystem as open. So sync
LVID buffer to disk after opening and closing fs.
Reported-by: Steve Nickel <snickel58@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
This patch fixes a regression caused by commit bff943af6f "udf: Fix memory
leak when mounting" due to which it was triggering a kernel null point
dereference in case of interrupted mount OR when allocating memory to
sbi->s_partmaps failed in function udf_sb_alloc_partition_maps.
Reported-and-tested-by: James Hogan <james@albanarts.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Pull vfs update from Al Viro:
- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c
(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).
A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.
- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).
- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.
- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.
- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."
Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...
There's no reason to call rcu_barrier() on every
deactivate_locked_super(). We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.
Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.
The strategy is to push kuid_t and kgid_t values are far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting compile type incompatible compile errors (present when user
namespaces are enabled) guide me to find the issues.
The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.
Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.
Ultimately there will be work to relax privlige checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user names to do those things that today we only forbid to
non-root users because it will confuse suid root applications.
While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.
Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.
After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."
Fix up some fairly trivial conflicts in netfilter uid/git logging code.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid wherwe appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in it's hmac computation
...