Commit Graph

59389 Commits

Author SHA1 Message Date
Christoph Hellwig
35af80aef9 gfs2: don't use buffer_heads in gfs2_allocate_page_backing
Rewrite gfs2_allocate_page_backing to call gfs2_iomap_get_alloc and operate on
struct iomap directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03 14:45:18 +02:00
Christoph Hellwig
7770c93a46 gfs2: use iomap_bmap instead of generic_block_bmap
No need to indirect through get_blocks and buffer_heads when we can just use
the iomap version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03 14:45:18 +02:00
Christoph Hellwig
378b6cbfb8 gfs2: mark stuffed_readpage static
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03 14:45:18 +02:00
Christoph Hellwig
59c01c5046 gfs2: merge gfs2_writepage_common into gfs2_writepage
There is no need to keep these two functions separate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03 14:45:18 +02:00
Christoph Hellwig
eadd753580 gfs2: merge gfs2_writeback_aops and gfs2_ordered_aops
The only difference between the two is that gfs2_ordered_aops sets the
set_page_dirty method to __set_page_dirty_buffers, but given that
__set_page_dirty_buffers is the default, if no method is set, there is no need
to to do that.  Merge the two sets of operations into one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03 14:45:09 +02:00
Linus Torvalds
6e692c3b72 SMB3 fix (for stable as well) for crash mishandling one of the Windows reparse point symlink tags
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl0b67IACgkQiiy9cAdy
 T1H/Ngv/XNc9l/OEHwyWZ1QnSCBKZyLyD5ZcKQRFFkfiktmQ8FtPUzf4qKlHxX1h
 ssefwBIbkW1+DG2sgvrL7OfqnPDnSezVoifvRmbh0nFX8anWhtChZMc0s+xiLtz2
 SbDBugNSkc8l9fvQz5A6VPJ3TcNA+VsSE2rr1HuimS9S4RAy1RsPhhWNyUh3GV5A
 SWuD7bsnxZ7/H2l+hx+s2O5RLDFoeniEIGFTsH9/f7Q19YGJtf6arnUlyUaZjkXK
 bPV2jZyalRUznK7RSFDLu49fS2zH8/m6MfBYyat31SZVtLFcQC/ijhKYTWr8wrKu
 +iQPlX+IDk4rfH/++7PXJJv1sKFLZNEs22dOi1YG0FgkRtMNA8HzmJqVFLcgoB2d
 QD7Ahj4dE0ghXv1dLMjfKdchNbkrWiygfpje54AkhU9SWUIS/EljDbQSq3e/wpAW
 i9HxCGCmmTPFzVKDVhyaBXHi6h5pzd7FfNNS4iJ2Lsy5PRLOHBMxaX1wknu/8vP0
 IIWuB9Hh
 =1zkr
 -----END PGP SIGNATURE-----

Merge tag '5.2-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fix from Steve French:
 "SMB3 fix (for stable as well) for crash mishandling one of the Windows
  reparse point symlink tags"

* tag '5.2-rc6-smb3-fix' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix crash querying symlinks stored as reparse-points
2019-07-03 16:06:36 +08:00
Christoph Hellwig
e0ec0a6ba6 gfs2: remove the unused gfs2_stuffed_write_end function
This function was overlooked when the write_begin and write_end address space
operations were removed as part of gfs2's iomap conversion.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03 08:56:01 +02:00
Christoph Hellwig
f3915f83e8 gfs2: use page_offset in gfs2_page_mkwrite
Without casting page->index to a guaranteed 64-bit type, the value might be
treated as 32-bit on 32-bit platforms and thus get truncated.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-07-03 08:53:01 +02:00
Jaegeuk Kim
cad3836f9e f2fs: allocate blocks for pinned file
This patch allows fallocate to allocate physical blocks for pinned file.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02 15:40:42 -07:00
Sahitya Tummala
56659ce838 f2fs: fix is_idle() check for discard type
The discard thread should issue upto dpolicy->max_requests at once
and wait for all those discard requests at once it reaches
dpolicy->max_requests. It should then sleep for dpolicy->min_interval
timeout before issuing the next batch of discard requests. But in the
current code of is_idle(), it checks for dcc_info->queued_discard and
aborts issuing the discard batch of max_requests. This
dcc_info->queued_discard will be true always once one discard command
is issued.

It is thus resulting into this type of discard request pattern -

- Issue discard request#1
- is_idle() returns false, discard thread waits for request#1 and then
  sleeps for min_interval 50ms.
- Issue discard request#2
- is_idle() returns false, discard thread waits for request#2 and then
  sleeps for min_interval 50ms.
- and so on for all other discard requests, assuming f2fs is idle w.r.t
  other conditions.

With this fix, the pattern will look like this -

- Issue discard request#1
- Issue discard request#2
  and so on upto max_requests of 8
- Issue discard request#8
- wait for min_interval 50ms.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02 15:40:42 -07:00
Jaegeuk Kim
db6ec53b7e f2fs: add a rw_sem to cover quota flag changes
Two paths to update quota and f2fs_lock_op:

1.
 - lock_op
 |  - quota_update
 `- unlock_op

2.
 - quota_update
 - lock_op
 `- unlock_op

But, we need to make a transaction on quota_update + lock_op in #2 case.
So, this patch introduces:
1. lock_op
2. down_write
3. check __need_flush
4. up_write
5. if there is dirty quota entries, flush them
6. otherwise, good to go

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02 15:40:41 -07:00
Chao Yu
c83414aedf f2fs: set SBI_NEED_FSCK for xattr corruption case
If xattr is corrupted, let's print kernel message and set SBI_NEED_FSCK
for further repair.

Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02 15:40:41 -07:00
Chao Yu
10f966bbf5 f2fs: use generic EFSBADCRC/EFSCORRUPTED
f2fs uses EFAULT as error number to indicate filesystem is corrupted
all the time, but generic filesystems use EUCLEAN for such condition,
we need to change to follow others.

This patch adds two new macros as below to wrap more generic error
code macros, and spread them in code.

EFSBADCRC	EBADMSG		/* Bad CRC detected */
EFSCORRUPTED	EUCLEAN		/* Filesystem is corrupted */

Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02 15:40:41 -07:00
Geert Uytterhoeven
f91108b801 f2fs: Use DIV_ROUND_UP() instead of open-coding
Replace the open-coded divisions with round-up by calls to the
DIV_ROUND_UP() helper macro.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02 15:40:41 -07:00
Chao Yu
2d821c1217 f2fs: print kernel message if filesystem is inconsistent
As Pavel reported, once we detect filesystem inconsistency in
f2fs_inplace_write_data(), it will be better to print kernel message as
we did in other places.

Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02 15:40:41 -07:00
Joe Perches
dcbb4c10e6 f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()
- Add and use f2fs_<level> macros
- Convert f2fs_msg to f2fs_printk
- Remove level from f2fs_printk and embed the level in the format
- Coalesce formats and align multi-line arguments
- Remove unnecessary duplicate extern f2fs_msg f2fs.h

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02 15:40:40 -07:00
Chao Yu
8740edc3e5 f2fs: avoid get_valid_blocks() for cleanup
No logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02 15:40:40 -07:00
Qiuyang Sun
04f0b2eaa3 f2fs: ioctl for removing a range from F2FS
This ioctl shrinks a given length (aligned to sections) from end of the
main area. Any cursegs and valid blocks will be moved out before
invalidating the range.

This feature can be used for adjusting partition sizes online.

History of the patch:

Sahitya Tummala:
 - Add this ioctl for f2fs_compat_ioctl() as well.
 - Fix debugfs status to reflect the online resize changes.
 - Fix potential race between online resize path and allocate new data
   block path or gc path.

Others:
 - Rename some identifiers.
 - Add some error handling branches.
 - Clear sbi->next_victim_seg[BG_GC/FG_GC] in shrinking range.
 - Implement this interface as ext4's, and change the parameter from shrunk
bytes to new block count of F2FS.
 - During resizing, force to empty sit_journal and forbid adding new
   entries to it, in order to avoid invalid segno in journal after resize.
 - Reduce sbi->user_block_count before resize starts.
 - Commit the updated superblock first, and then update in-memory metadata
   only when the former succeeds.
 - Target block count must align to sections.
 - Write checkpoint before and after committing the new superblock, w/o
CP_FSCK_FLAG respectively, so that the FS can be fixed by fsck even if
resize fails after the new superblock is committed.
 - In free_segment_range(), reduce granularity of gc_mutex.
 - Add protection on curseg migration.
 - Add freeze_bdev() and thaw_bdev() for resize fs.
 - Remove CUR_MAIN_SECS and use MAIN_SECS directly for allocation.
 - Recover super_block and FS metadata when resize fails.
 - No need to clear CP_FSCK_FLAG in update_ckpt_flags().
 - Clean up the sb and fs metadata update functions for resize_fs.

Geert Uytterhoeven:
 - Use div_u64*() for 64-bit divisions

Arnd Bergmann:
 - Not all architectures support get_user() with a 64-bit argument:
    ERROR: "__get_user_bad" [fs/f2fs/f2fs.ko] undefined!
    Use copy_from_user() here, this will always work.

Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-07-02 15:39:24 -07:00
Gabriel Krisman Bertazi
96fcaf86c3 ext4: fix coverity warning on error path of filename setup
Fix the following coverity warning reported by Dan Carpenter:

fs/ext4/namei.c:1311 ext4_fname_setup_ci_filename()
	  warn: 'cf_name->len' unsigned <= 0

Fixes: 3ae72562ad ("ext4: optimize case-insensitive lookups")
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
2019-07-02 17:56:12 -04:00
Kimberly Brown
78e9605d4f ext4: replace ktype default_attrs with default_groups
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace the default_attrs field in ext4_sb_ktype
and ext4_feat_ktype with default_groups. Use the ATTRIBUTE_GROUPS macro
to create ext4_groups and ext4_feat_groups.

Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-07-02 17:38:55 -04:00
Hariprasad Kelam
7451c54abc ecryptfs: Change return type of ecryptfs_process_flags
Change return type of ecryptfs_process_flags from int to void as it
never fails.

fixes below issue reported by coccicheck

s/ecryptfs/crypto.c:870:5-7: Unneeded variable: "rc". Return "0" on line
883

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
[tyhicks: Remove the return value line from the function documentation]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2019-07-02 19:28:02 +00:00
Christoph Hellwig
25b2995a35 mm: remove MEMORY_DEVICE_PUBLIC support
The code hasn't been used since it was added to the tree, and doesn't
appear to actually be usable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-07-02 14:32:43 -03:00
Darrick J. Wong
677717fbd4 xfs: refactor INUMBERS to use iwalk functions
Now that we have generic functions to walk inode records, refactor the
INUMBERS implementation to use it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:06 -07:00
Darrick J. Wong
04b8fba2e1 xfs: refactor iwalk code to handle walking inobt records
Refactor xfs_iwalk_ag_start and xfs_iwalk_ag so that the bits that are
particular to bulkstat (trimming the start irec, starting inode
readahead, and skipping empty groups) can be controlled via flags in the
iwag structure.

This enables us to add a new function to walk all inobt records which
will be used for the new INUMBERS implementation in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:06 -07:00
Darrick J. Wong
2b5eb82601 xfs: refactor xfs_iwalk_grab_ichunk
In preparation for reusing the iwalk code for the inogrp walking code
(aka INUMBERS), move the initial inobt lookup and retrieval code out of
xfs_iwalk_grab_ichunk so that we call the masking code only when we need
to trim out the inodes that came before the cursor in the inobt record
(aka BULKSTAT).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:06 -07:00
Darrick J. Wong
688f7c3678 xfs: clean up long conditionals in xfs_iwalk_ichunk_ra
Refactor xfs_iwalk_ichunk_ra to avoid long conditionals.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:06 -07:00
Darrick J. Wong
5e29f3b720 xfs: change xfs_iwalk_grab_ichunk to use startino, not lastino
Now that the inode chunk grabbing function is a static function in the
iwalk code, change its behavior so that @agino is the inode where we
want to /start/ the iteration.  This reduces cognitive friction with the
callers and simplifes the code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:05 -07:00
Darrick J. Wong
da1d9e5912 xfs: move bulkstat ichunk helpers to iwalk code
Now that we've reworked the bulkstat code to use iwalk, we can move the
old bulkstat ichunk helpers to xfs_iwalk.c.  No functional changes here.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:05 -07:00
Darrick J. Wong
938c710d99 xfs: calculate inode walk prefetch more carefully
The existing inode walk prefetch is based on the old bulkstat code,
which simply allocated 4 pages worth of memory and prefetched that many
inobt records, regardless of however many inodes the caller requested.
65536 inodes is a lot to prefetch (~32M on x64, ~512M on arm64) so let's
scale things down a little more intelligently based on the number of
inodes requested, etc.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:05 -07:00
Darrick J. Wong
2810bd6840 xfs: convert bulkstat to new iwalk infrastructure
Create a new ibulk structure incore to help us deal with bulk inode stat
state tracking and then convert the bulkstat code to use the new iwalk
iterator.  This disentangles inode walking from bulk stat control for
simpler code and enables us to isolate the formatter functions to the
ioctl handling code.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:05 -07:00
Darrick J. Wong
f16fe3ecde xfs: bulkstat should copy lastip whenever userspace supplies one
When userspace passes in a @lastip pointer we should copy the results
back, even if the @ocount pointer is NULL.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:05 -07:00
Darrick J. Wong
ebd126a651 xfs: convert quotacheck to use the new iwalk functions
Convert quotacheck to use the new iwalk iterator to dig through the
inodes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:05 -07:00
Darrick J. Wong
a211432c27 xfs: create simplified inode walk function
Create a new iterator function to simplify walking inodes in an XFS
filesystem.  This new iterator will replace the existing open-coded
walking that goes on in various places.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:05 -07:00
Darrick J. Wong
5bb46e3e18 xfs: create iterator error codes
Currently, xfs doesn't have generic error codes defined for "stop
iterating"; we just reuse the XFS_BTREE_QUERY_* return values.  This
looks a little weird if we're not actually iterating a btree index.
Before we start adding more iterators, we should create general
XFS_ITER_{CONTINUE,ABORT} return values and define the XFS_BTREE_QUERY_*
ones from that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-07-02 09:40:05 -07:00
Darrick J. Wong
dbc77f31e5 vfs: only allow FSSETXATTR to set DAX flag on files and dirs
The DAX flag only applies to files and directories, so don't let it get
set for other types of files.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
2019-07-01 08:25:36 -07:00
Darrick J. Wong
ca29be7534 vfs: teach vfs_ioc_fssetxattr_check to check extent size hints
Move the extent size hint checks that aren't xfs-specific to the vfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
2019-07-01 08:25:36 -07:00
Darrick J. Wong
f991492ed1 vfs: teach vfs_ioc_fssetxattr_check to check project id info
Standardize the project id checks for FSSETXATTR.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
2019-07-01 08:25:35 -07:00
Darrick J. Wong
7b0e492e6b vfs: create a generic checking function for FS_IOC_FSSETXATTR
Create a generic checking function for the incoming FS_IOC_FSSETXATTR
fsxattr values so that we can standardize some of the implementation
behaviors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
2019-07-01 08:25:35 -07:00
Darrick J. Wong
5aca284210 vfs: create a generic checking and prep function for FS_IOC_SETFLAGS
Create a generic function to check incoming FS_IOC_SETFLAGS flag values
and later prepare the inode for updates so that we can standardize the
implementations that follow ext4's flag values.

Note that the efivarfs implementation no longer fails a no-op SETFLAGS
without CAP_LINUX_IMMUTABLE since that's the behavior in ext*.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Bob Peterson <rpeterso@redhat.com>
2019-07-01 08:25:34 -07:00
Eric Biggers
570d7a98e7 vfs: move_mount: reject moving kernel internal mounts
sys_move_mount() crashes by dereferencing the pointer MNT_NS_INTERNAL,
a.k.a. ERR_PTR(-EINVAL), if the old mount is specified by fd for a
kernel object with an internal mount, such as a pipe or memfd.

Fix it by checking for this case and returning -EINVAL.

[AV: what we want is is_mounted(); use that instead of making the
condition even more convoluted]

Reproducer:

    #include <unistd.h>

    #define __NR_move_mount         429
    #define MOVE_MOUNT_F_EMPTY_PATH 0x00000004

    int main()
    {
    	int fds[2];

    	pipe(fds);
        syscall(__NR_move_mount, fds[0], "", -1, "/", MOVE_MOUNT_F_EMPTY_PATH);
    }

Reported-by: syzbot+6004acbaa1893ad013f0@syzkaller.appspotmail.com
Fixes: 2db154b3ea ("vfs: syscall: Add move_mount(2) to move mounts around")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-07-01 10:46:36 -04:00
Ming Lei
79d08f89bb block: fix .bi_size overflow
'bio->bi_iter.bi_size' is 'unsigned int', which at most hold 4G - 1
bytes.

Before 07173c3ec2 ("block: enable multipage bvecs"), one bio can
include very limited pages, and usually at most 256, so the fs bio
size won't be bigger than 1M bytes most of times.

Since we support multi-page bvec, in theory one fs bio really can
be added > 1M pages, especially in case of hugepage, or big writeback
with too many dirty pages. Then there is chance in which .bi_size
is overflowed.

Fixes this issue by using bio_full() to check if the added segment may
overflow .bi_size.

Cc: Liu Yiding <liuyd.fnst@cn.fujitsu.com>
Cc: kernel test robot <rong.a.chen@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: 07173c3ec2 ("block: enable multipage bvecs")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-01 08:18:54 -06:00
Jens Axboe
5be1f9d82f Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc6' into for-5.3/block

Merge 5.2-rc6 into for-5.3/block, so we get the same page merge leak
fix. Otherwise we end up having conflicts with future patches between
for-5.3/block and master that touch this area. In particular, it makes
the bio_full() fix hard to backport to stable.

* tag 'v5.2-rc6': (482 commits)
  Linux 5.2-rc6
  Revert "iommu/vt-d: Fix lock inversion between iommu->lock and device_domain_lock"
  Bluetooth: Fix regression with minimum encryption key size alignment
  tcp: refine memory limit test in tcp_fragment()
  x86/vdso: Prevent segfaults due to hoisted vclock reads
  SUNRPC: Fix a credential refcount leak
  Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
  net :sunrpc :clnt :Fix xps refcount imbalance on the error path
  NFS4: Only set creation opendata if O_CREAT
  ARM: 8867/1: vdso: pass --be8 to linker if necessary
  KVM: nVMX: reorganize initial steps of vmx_set_nested_state
  KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries
  habanalabs: use u64_to_user_ptr() for reading user pointers
  nfsd: replace Jeff by Chuck as nfsd co-maintainer
  inet: clear num_timeout reqsk_alloc()
  PCI/P2PDMA: Ignore root complex whitelist when an IOMMU is present
  net: mvpp2: debugfs: Add pmap to fs dump
  ipv6: Default fib6_type to RTN_UNICAST when not set
  net: hns3: Fix inconsistent indenting
  net/af_iucv: always register net_device notifier
  ...
2019-07-01 08:16:08 -06:00
Christoph Hellwig
73d30d4874 xfs: remove XFS_TRANS_NOFS
Instead of a magic flag for xfs_trans_alloc, just ensure all callers
that can't relclaim through the file system use memalloc_nofs_save to
set the per-task nofs flag.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30 09:05:17 -07:00
Christoph Hellwig
fe64e0d26b xfs: simplify xfs_ioend_can_merge
Compare the block layer status directly instead of converting it to
an errno first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30 09:05:17 -07:00
Christoph Hellwig
7dbae9fbde xfs: allow merging ioends over append boundaries
There is no real problem merging ioends that go beyond i_size into an
ioend that doesn't.  We just need to move the append transaction to the
base ioend.  Also use the opportunity to use a real error code instead
of the magic 1 to cancel the transactions, and write a comment
explaining the scheme.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30 09:05:17 -07:00
Christoph Hellwig
0290d9c1e5 xfs: fix a comment typo in xfs_submit_ioend
The fail argument is long gone, update the comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30 09:05:17 -07:00
Christoph Hellwig
1fdafce55c xfs: remove the unused xfs_count_page_state declaration
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-30 09:05:17 -07:00
Christoph Hellwig
b620743077 block: never take page references for ITER_BVEC
If we pass pages through an iov_iter we always already have a reference
in the caller.  Thus remove the ITER_BVEC_FLAG_NO_REF and don't take
reference to pages by default for bvec backed iov_iters.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:32 -06:00
Christoph Hellwig
d7c8aa85ed direct-io: use bio_release_pages in dio_bio_complete
Use bio_release_pages instead of duplicating it.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Christoph Hellwig
9fec4a2188 block_dev: use bio_release_pages in bio_unmap_user
Use bio_release_pages instead of duplicating it.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Christoph Hellwig
57dfe3ce10 block_dev: use bio_release_pages in blkdev_bio_end_io
Use bio_release_pages instead of duplicating it.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Christoph Hellwig
147a60538d iomap: use bio_release_pages in iomap_dio_bio_end_io
Use bio_release_pages instead of duplicating it.

Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-29 09:47:31 -06:00
Linus Torvalds
01305db842 XArray updates for 5.2-rc6
Account XArray nodes for the page cache to the appropriate cgroup
   (Johannes Weiner)
 Fix idr_get_next() when called under the RCU lock (Matthew Wilcox)
 Add a test for xa_insert() (Matthew Wilcox)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAl0WuKsACgkQDpNsjXcp
 gj73zgf9Eb477PuwYZpFBA9ZxI5v/6WyqbaWXKdqEhotARgIUuv1CfVnkt1IJE6P
 Z3QCRABZ3pIKHgIErJN53B7AdvdONUO4Xf9VFBqmxeWE7F9L3sROOpXc8IrR26kV
 hITQn8mwgacNQ8mLtQmcSFaCVC2E7yVNBhVd5zmcA6jNIAFsOJcP06KLJTe94OXe
 AB9TJvswxpzAEX8emHQ/a1SFBNZWJ7b53hBcu8CJn8CuGDxmo1/+qqoRyNY+WrDO
 OohFk2u1j6Esfc6j0k+Akt8mEFyfU2oxFfv5MjL0KYEyrHoU84eZljFGgf7rQqGj
 fqH9RO8J8eoj4D/3XaLL5QYRLIxRaw==
 =AXZy
 -----END PGP SIGNATURE-----

Merge tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-dax

Pull XArray fixes from Matthew Wilcox:

 - Account XArray nodes for the page cache to the appropriate cgroup
   (Johannes Weiner)

 - Fix idr_get_next() when called under the RCU lock (Matthew Wilcox)

 - Add a test for xa_insert() (Matthew Wilcox)

* tag 'xarray-5.2-rc6' of git://git.infradead.org/users/willy/linux-dax:
  XArray tests: Add check_insert
  idr: Fix idr_get_next race with idr_remove
  mm: fix page cache convergence regression
2019-06-29 17:14:57 +08:00
Linus Torvalds
0839c53762 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  linux/kernel.h: fix overflow for DIV_ROUND_UP_ULL
  mm, swap: fix THP swap out
  fork,memcg: alloc_thread_stack_node needs to set tsk->stack
  MAINTAINERS: add CLANG/LLVM BUILD SUPPORT info
  mm/vmalloc.c: avoid bogus -Wmaybe-uninitialized warning
  mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
  initramfs: fix populate_initrd_image() section mismatch
  mm/oom_kill.c: fix uninitialized oc->constraint
  mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge
  mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails
  signal: remove the wrong signal_pending() check in restore_user_sigmask()
  fs/binfmt_flat.c: make load_flat_shared_library() work
  mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask
  fs/proc/array.c: allow reporting eip/esp for all coredumping threads
  mm/dev_pfn: exclude MEMORY_DEVICE_PRIVATE while computing virtual address
2019-06-29 17:11:01 +08:00
Linus Torvalds
c949c30b26 Two more NFS client fixes for Linux 5.2
Stable bugfixes:
 - SUNRPC: Fix up calculation of client message length # 5.1+
 - NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O # 4.8+
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl0Wf3EACgkQ18tUv7Cl
 QOs2ORAA5/CXFa471jUldOsHejxfFoddFBkuqf8qZ1AF3TZdFuITAsq+xydxfO5U
 hYzUUlOTKedEi+ISYLFs1tjU/nYRQJv7fFZxVwq6uDZ53Z/doiMLAIR67Eq7EcTY
 KBWA9zdldnBzb0S87+hkbmaNPR5pjqxBzLEfMmOQEAAh5pSGf5YSeUNTXLGj4wBd
 iXf25o1VSjUmNpSHaA3KsrqTJ4mJ7+i/17Iny1c4xRgZbJtoTm44DpceHCheJpbl
 DymRSgjSr0vFjJbufcKkbF2OPp1ZsnkDiKyJmZzgPOa3+TMGzisU5yiASoac6D+j
 gs426yEz9rvR/TMZtFS05nfu2clKuS8foLGwZelJ7XjQSXJgObCb4xf97jLIOWNb
 J+BWwsTmUIQS+fMUQDA+rlbyepJ+skVZpbjmUy+/Uy52oqtnYK6uTD469NdmxBwr
 7z2pnCUjJFTqo6BHeCQgR5XlSt1MGDByamcVAWONS+9zJttRhfUOjq0PIOLSrsBK
 5zRzJxtBoYLwP5py3zKAeV9RcvDNSgh5U6P0hhFRtHfqMUmtGeA58nNND2S6Qm3/
 vAB7WZL0aVSvc3zpz7qdctitMESQNspCkMooAp/EoIime3YkqKCS+AgED9jKLhJR
 /5eqtr6tehh6A4dshzSlDF7cFrKyUd+ulS0IN8vt1V2TYgOQmbY=
 =FATk
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.2-4' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull two more NFS client fixes from Anna Schumaker:
 "These are both stable fixes.

  One to calculate the correct client message length in the case of
  partial transmissions. And the other to set the proper TCP timeout for
  flexfiles"

* tag 'nfs-for-5.2-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O
  SUNRPC: Fix up calculation of client message length
2019-06-29 17:02:22 +08:00
Linus Torvalds
43251dbd6a A small fix for a potential -rc1 regression from Jeff.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl0WLe4THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzizPrB/4tNUS8J9mW9Zd3xLAzZmwjq+WAfCV8
 wp3IjBHCgvn9SmTYOJtozjTLJVlmeGNVyrCaWbtzQ2YLKvyBTCUF4kg9EG7FMX9a
 ixzlHb2+Wu46LYWiA7jhUnoKNMMl1swm01BOvfmGprSwV70BAEF0i2/D7WHikolX
 rgcwGb58vUMmXQ1VGfIO9Pox2a8jaZNj82BZnDniMDxetZ5sRsZXGy43s14zC6Lt
 YnwDT70Y7+Pr9SwHMA5bnZ8kCtQpr0qAHmDVhEd965Io1XZ+2/EHF5IwqK0xGg+e
 KUQdRyhMWjIGG34SWMt5tbT+9Lzeju4CAka9NPSJ1tRtFnk1AvpILbnB
 =TpFR
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.2-rc7' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A small fix for a potential -rc1 regression from Jeff"

* tag 'ceph-for-5.2-rc7' of git://github.com/ceph/ceph-client:
  ceph: fix ceph_mdsc_build_path to not stop on first component
2019-06-29 17:01:02 +08:00
Linus Torvalds
9dda12b6fa for-linus-20190628
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl0WM9YQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmkyEADVPjXlIZETBpAl/oK/StNc1NMdfgBiWaX7
 kQHbFu3V4soDpvR8iQvMVyFc7dUpwo9lmgxIOcZSfdCf/ciJ/G4trhH4UljXfRsj
 2vdKV3rZXragrclN0zGtW90sBBYxSilaezzRQbnnXjEgGaHFkeJJR3xW00UMoGrm
 GDO2gSQdhDKqhJtKjiCASkyN9uWMkcLFdsGErPgA6e4S3NTbaLKaY/xFUCcMF7aX
 N1aYkIfdyl38QUU/N+5WLgiJYHkiZNqcrJ+a5aECioqqiNh9ST+UR1jCgo7tlt4h
 b3Gb5mxP0CPUuTh3VQD8GHCaPzDsxUIxThJkz5aih3M9NEQmm5Du0GDChaDuMoUR
 zyFT/Yl4JfeO93mlpxGUyC5WyFCQdj0QOBuyxInCchvJC5kbpRflMuKt+xRYlSqg
 331njdykyKkgutagLzzTME38RPUbttZVmbc6K422PXKkYW+FOlS352FZpl5qxDOu
 5+ihOXOLvO09VXu6kcC5UH4Yi6nuGYDS95oIZhJ0OODx10xnKSE4ZozlPXAEreAR
 NVJN7vbHVqLnphuplRK9Kh0VngdIhLkeTsUxaTnX6UQSioHPDJPqPP5nfSu9Xkyo
 e+2UAXkfVjnw45jAu8Mrsu0KhabCB5Pde8Jk+kmqPcuWXQEN5OHqeA09vtvKj81J
 lIagz1NZxw==
 =WzXj
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190628' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Just two small fixes.

  One from Paolo, fixing a silly mistake in BFQ. The other one is from
  me, ensuring that we have ->file cleared in the io_uring request a bit
  earlier. That avoids a use-before-free, if we encounter an error
  before ->file is assigned"

* tag 'for-linus-20190628' of git://git.kernel.dk/linux-block:
  block, bfq: fix operator in BFQQ_TOTALLY_SEEKY
  io_uring: ensure req->file is cleared on allocation
2019-06-29 16:58:35 +08:00
Oleg Nesterov
97abc889ee signal: remove the wrong signal_pending() check in restore_user_sigmask()
This is the minimal fix for stable, I'll send cleanups later.

Commit 854a6ed568 ("signal: Add restore_user_sigmask()") introduced
the visible change which breaks user-space: a signal temporary unblocked
by set_user_sigmask() can be delivered even if the caller returns
success or timeout.

Change restore_user_sigmask() to accept the additional "interrupted"
argument which should be used instead of signal_pending() check, and
update the callers.

Eric said:

: For clarity.  I don't think this is required by posix, or fundamentally to
: remove the races in select.  It is what linux has always done and we have
: applications who care so I agree this fix is needed.
:
: Further in any case where the semantic change that this patch rolls back
: (aka where allowing a signal to be delivered and the select like call to
: complete) would be advantage we can do as well if not better by using
: signalfd.
:
: Michael is there any chance we can get this guarantee of the linux
: implementation of pselect and friends clearly documented.  The guarantee
: that if the system call completes successfully we are guaranteed that no
: signal that is unblocked by using sigmask will be delivered?

Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
Fixes: 854a6ed568 ("signal: Add restore_user_sigmask()")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Eric Wong <e@80x24.org>
Tested-by: Eric Wong <e@80x24.org>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: <stable@vger.kernel.org>	[5.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29 16:43:45 +08:00
Jann Horn
867bfa4a5f fs/binfmt_flat.c: make load_flat_shared_library() work
load_flat_shared_library() is broken: It only calls load_flat_file() if
prepare_binprm() returns zero, but prepare_binprm() returns the number of
bytes read - so this only happens if the file is empty.

Instead, call into load_flat_file() if the number of bytes read is
non-negative. (Even if the number of bytes is zero - in that case,
load_flat_file() will see nullbytes and return a nice -ENOEXEC.)

In addition, remove the code related to bprm creds and stop using
prepare_binprm() - this code is loading a library, not a main executable,
and it only actually uses the members "buf", "file" and "filename" of the
linux_binprm struct. Instead, call kernel_read() directly.

Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com
Fixes: 287980e49f ("remove lots of IS_ERR_VALUE abuses")
Signed-off-by: Jann Horn <jannh@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29 16:43:45 +08:00
John Ogness
cb8f381f16 fs/proc/array.c: allow reporting eip/esp for all coredumping threads
0a1eb2d474 ("fs/proc: Stop reporting eip and esp in /proc/PID/stat")
stopped reporting eip/esp and fd7d56270b ("fs/proc: Report eip/esp in
/prod/PID/stat for coredumping") reintroduced the feature to fix a
regression with userspace core dump handlers (such as minicoredumper).

Because PF_DUMPCORE is only set for the primary thread, this didn't fix
the original problem for secondary threads.  Allow reporting the eip/esp
for all threads by checking for PF_EXITING as well.  This is set for all
the other threads when they are killed.  coredump_wait() waits for all the
tasks to become inactive before proceeding to invoke a core dumper.

Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de
Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de
Fixes: fd7d56270b ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reported-by: Jan Luebbe <jlu@pengutronix.de>
Tested-by: Jan Luebbe <jlu@pengutronix.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-06-29 16:43:44 +08:00
Christoph Hellwig
89b171acb2 xfs: fix iclog allocation size
Properly allocate the space for the bio_vecs instead of just one byte
per bio_vec.

Fixes: 79b54d9bfc ("xfs: use bios directly to write log buffers")
Reported-by: syzbot+b75afdbe271a0d7ac4f6@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 21:02:45 -07:00
Eric Sandeen
250d4b4c40 xfs: remove unused header files
There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this.  I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:30:43 -07:00
Christoph Hellwig
adfb5fb46a xfs: implement cgroup aware writeback
Link every newly allocated writeback bio to cgroup pointed to by the
writeback control structure, and charge every byte written back to it.

Tested-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:30:22 -07:00
Christoph Hellwig
a247373596 xfs: simplify xfs_chain_bio
Move setting up operation and write hint to xfs_alloc_ioend, and
then just copy over all needed information from the previous bio
in xfs_chain_bio and stop passing various parameters to it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:30:22 -07:00
Darrick J. Wong
f327a00745 xfs: account for log space when formatting new AGs
When we're writing out a fresh new AG, make sure that we don't list an
internal log as free and that we create the rmap for the region.  growfs
never does this, but we will need it when we hook up mkfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-06-28 19:30:21 -07:00
Darrick J. Wong
8d90857cff xfs: refactor free space btree record initialization
Refactor the code that populates the free space btrees of a new AG so
that we can avoid code duplication once things start getting
complicated.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-06-28 19:30:21 -07:00
Brian Foster
7e36a3a63d xfs: always update params on small allocation
xfs_alloc_ag_vextent_small() doesn't update the output parameters in
the event of an AGFL allocation. Instead, it updates the
xfs_alloc_arg structure directly to complete the allocation.

Update both args and the output params to provide consistent
behavior for future callers.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:30:20 -07:00
Brian Foster
6691cd9267 xfs: skip small alloc cntbt logic on NULL cursor
The small allocation helper is implemented in a way that is fairly
tightly integrated to the existing allocation algorithms. It expects
a cntbt cursor beyond the end of the tree, attempts to locate the
last record in the tree and only attempts an AGFL allocation if the
cntbt is empty.

The upcoming generic algorithm doesn't rely on the cntbt processing
of this function. It will only call this function when the cntbt
doesn't have a big enough extent or is empty and thus AGFL
allocation is the only remaining option. Tweak
xfs_alloc_ag_vextent_small() to handle a NULL cntbt cursor and skip
the cntbt logic. This facilitates use by the existing allocation
code and new code that only requires an AGFL allocation attempt.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:30:20 -07:00
Brian Foster
c63cdd4fc9 xfs: move small allocation helper
Move the small allocation helper further up in the file to avoid the
need for a function declaration. The remaining declarations will be
removed by followup patches. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:30:19 -07:00
Brian Foster
2a4f35f984 xfs: clean up small allocation helper
xfs_alloc_ag_vextent_small() is kind of a mess. Clean it up in
preparation for future changes. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:30:19 -07:00
Christoph Hellwig
caeaea9858 xfs: merge xfs_trans_bmap.c into xfs_bmap_item.c
Keep all bmap item related code together.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:29:42 -07:00
Christoph Hellwig
3cfce1e3ce xfs: merge xfs_trans_rmap.c into xfs_rmap_item.c
Keep all rmap item related code together in one file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:29:41 -07:00
Christoph Hellwig
effd5e96e7 xfs: merge xfs_trans_refcount.c into xfs_refcount_item.c
Keep all the refcount item related code together in one file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:29:41 -07:00
Christoph Hellwig
81f4004173 xfs: merge xfs_trans_extfree.c into xfs_extfree_item.c
Keep all the extree item related code together in one file.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:28:17 -07:00
Christoph Hellwig
73f0d23633 xfs: merge xfs_bud_init into xfs_trans_get_bud
There is no good reason to keep these two functions separate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:36 -07:00
Christoph Hellwig
60883447f4 xfs: merge xfs_rud_init into xfs_trans_get_rud
There is no good reason to keep these two functions separate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:36 -07:00
Christoph Hellwig
ebeb8e0629 xfs: merge xfs_cud_init into xfs_trans_get_cud
There is no good reason to keep these two functions separate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:35 -07:00
Christoph Hellwig
9c5e7c2ae3 xfs: merge xfs_efd_init into xfs_trans_get_efd
There is no good reason to keep these two functions separate.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:35 -07:00
Christoph Hellwig
95cf0e4a0d xfs: remove a pointless comment duplicated above all xfs_item_ops instances
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:34 -07:00
Christoph Hellwig
89ae379d56 xfs: use a list_head for iclog callbacks
Replace the hand grown linked list handling and cil context attachment
with the standard list_head structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:34 -07:00
Christoph Hellwig
efe2330fdc xfs: remove the xfs_log_item_t typedef
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:33 -07:00
Christoph Hellwig
b3b14aacc6 xfs: don't cast inode_log_items to get the log_item
The cast is not type safe, and we can just dereference the first
member instead to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:33 -07:00
Christoph Hellwig
9ce632a28a xfs: add a flag to release log items on commit
We have various items that are released from ->iop_comitting.  Add a
flag to just call ->iop_release from the commit path to avoid tons
of boilerplate code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:32 -07:00
Christoph Hellwig
ddf92053e4 xfs: split iop_unlock
The iop_unlock method is called when comitting or cancelling a
transaction.  In the latter case, the transaction may or may not be
aborted.  While there is no known problem with the current code in
practice, this implementation is limited in that any log item
implementation that might want to differentiate between a commit and a
cancellation must rely on the aborted state.  The aborted bit is only
set when the cancelled transaction is dirty, however.  This means that
there is no way to distinguish between a commit and a clean transaction
cancellation.

For example, intent log items currently rely on this distinction.  The
log item is either transferred to the CIL on commit or released on
transaction cancel. There is currently no possibility for a clean intent
log item in a transaction, but if that state is ever introduced a cancel
of such a transaction will immediately result in memory leaks of the
associated log item(s).  This is an interface deficiency and landmine.

To clean this up, replace the iop_unlock method with an iop_release
method that is specific to transaction cancel.  The existing
iop_committing method occurs at the same time as iop_unlock in the
commit path and there is no need for two separate callbacks here.
Overload the iop_committing method with the current commit time
iop_unlock implementations to eliminate the need for the latter and
further simplify the interface.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:32 -07:00
Christoph Hellwig
195cd83d1b xfs: don't use xfs_trans_free_items in the commit path
While commiting items looks very similar to freeing them on error it is
a different operation, and they will diverge a bit soon.

Split out the commit case from xfs_trans_free_items, inline it into
xfs_log_commit_cil and give it a separate trace point.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:31 -07:00
Christoph Hellwig
8e4b20ea83 xfs: remove the dummy iop_push implementation for inode creation items
This method should never be called, so don't waste code on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:31 -07:00
Christoph Hellwig
e8b78db77d xfs: don't require log items to implement optional methods
Just check if they are present first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:30 -07:00
Christoph Hellwig
d15cbf2f38 xfs: stop using XFS_LI_ABORTED as a parameter flag
Just pass a straight bool aborted instead of abusing XFS_LI_ABORTED as a
flag in function parameters.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:30 -07:00
Christoph Hellwig
086252c34b xfs: fix a trivial comment typo in xfs_trans_committed_bulk
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:29 -07:00
Christoph Hellwig
dbd329f1e4 xfs: add struct xfs_mount pointer to struct xfs_buf
We need to derive the mount pointer from a buffer in a lot of place.
Add a direct pointer to short cut the pointer chasing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:29 -07:00
Christoph Hellwig
8124b9b601 xfs: remove the b_io_length field in struct xfs_buf
This field is now always idential to b_length.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:28 -07:00
Christoph Hellwig
e99b4bd0cb xfs: properly type the b_log_item field in struct xfs_buf
Now that the log code doesn't abuse this field any more we can
declare it as a struct xfs_buf_log_item pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:28 -07:00
Christoph Hellwig
0564501ff5 xfs: remove unused buffer cache APIs
Now that the log code uses bios directly we can drop various special
cases in the buffer cache code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:27 -07:00
Christoph Hellwig
6e9b3dd80f xfs: stop using bp naming for log recovery buffers
Now that we don't use struct xfs_buf to hold log recovery buffer rename
the related functions and variables to just talk of a buffer instead of
using the bp name that we usually use for xfs_buf related functionality.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:27 -07:00
Christoph Hellwig
6ad5b3255b xfs: use bios directly to read and write the log recovery buffers
The xfs_buf structure is basically used as a glorified container for
a memory allocation in the log recovery code.  Replace it with a
call to kmem_alloc_large and a simple abstraction to read into or
write from it synchronously using chained bios.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:26 -07:00
Christoph Hellwig
18ffb8c3f0 xfs: return an offset instead of a pointer from xlog_align
This simplifies both the helper and the callers.  We lost a bit of
size sanity checking, but that is already covered by KASAN if needed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:26 -07:00
Christoph Hellwig
1058d0f5ee xfs: move the log ioend workqueue to struct xlog
Move the workqueue used for log I/O completions from struct xfs_mount
to struct xlog to keep it self contained in the log code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: destroy the log workqueue after ensuring log ios are done]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:25 -07:00
Christoph Hellwig
79b54d9bfc xfs: use bios directly to write log buffers
Currently the XFS logging code uses the xfs_buf structure and
associated APIs to write the log buffers to disk.  This requires
various special cases in the log code and is generally not very
optimal.

Instead of using a buffer just allocate a kmem_alloc_larger region for
each log buffer, and use a bio and bio_vec array embedded in the iclog
structure to write the buffer to disk.  This also allows for using
the bio split and chaining case to deal with the case of a log
buffer wrapping around the end of the log.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: don't split if/else with an #endif]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:25 -07:00
Christoph Hellwig
2d15d2c0e0 xfs: make use of the l_targ field in struct xlog
Use the slightly shorter way to get at the buftarg for the log device
wherever we can in the log and log recovery code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:24 -07:00
Christoph Hellwig
abca1f33f8 xfs: remove the syncing argument from xlog_verify_iclog
The only caller unconditionally passes true here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:24 -07:00
Christoph Hellwig
9b0489c1d1 xfs: update both stat counters together in xlog_sync
Just a small bit of code tidying up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:23 -07:00
Christoph Hellwig
db0a6faf93 xfs: factor out iclog size calculation from xlog_sync
Split out another self-contained bit of code from xlog_sync.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:23 -07:00
Christoph Hellwig
5693384805 xfs: factor out splitting of an iclog from xlog_sync
Split out a self-contained chunk of code from xlog_sync that calculates
the split offset for an iclog that wraps the log end and bumps the
cycles for the second half.

Use the chance to bring some sanity to the variables used to track the
split in xlog_sync by not changing the count variable, and instead use
split as the offset for the split and use those to calculate the
sizes and offsets for the two write buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:22 -07:00
Christoph Hellwig
94860a301b xfs: factor out log buffer writing from xlog_sync
Replace the not very useful xlog_bdstrat wrapper with a new version that
that takes care of all the common logic for writing log buffers.  Use
the opportunity to avoid overloading the buffer address with the log
relative address, and to shed the unused return value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:22 -07:00
Christoph Hellwig
1f9489be02 xfs: don't use REQ_PREFLUSH for split log writes
If we have to split a log write because it wraps the end of the log we
can't just use REQ_PREFLUSH to flush before the first log write,
as the writes might get reordered somewhere in the I/O stack.  Issue
a manual flush in that case so that the ordering of the two log I/Os
doesn't matter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:21 -07:00
Christoph Hellwig
366fc4b898 xfs: remove XLOG_STATE_IOABORT
This value is the only flag in ic_state, which we otherwise use as
a state.  Switch it to a new debug-only field and also report and
actual error in the buffer in the I/O completion path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:21 -07:00
Christoph Hellwig
9bff313253 xfs: reformat xlog_get_lowest_lsn
Reformat xlog_get_lowest_lsn to our usual style.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:20 -07:00
Christoph Hellwig
4f62282a36 xfs: cleanup xlog_get_iclog_buffer_size
We don't really need all the messy branches in the function, as it
really does three things, out of which 2 are common for all branches:

 1) set up mount point log buffer size and count values if not already
    done from mount options
 2) calculate the number of log headers
 3) set up all the values in struct xlog based on the above

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:20 -07:00
Christoph Hellwig
76ce9823ac xfs: remove the l_iclog_size_log field from struct xlog
This field is never used, so we can simply kill it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:19 -07:00
Christoph Hellwig
72945d86dd xfs: make mem_to_page available outside of xfs_buf.c
Rename the function to kmem_to_page and move it to kmem.h together
with our kmem_large allocator that may either return kmalloced or
vmalloc pages.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:19 -07:00
Christoph Hellwig
ce89755cdf xfs: renumber XBF_WRITE_FAIL
Assining a numerical value that is not close to the flags
defined near by is just asking for conflicts later on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:18 -07:00
Christoph Hellwig
153fd7b57c xfs: remove the never used _XBF_COMPOUND flag
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:18 -07:00
Christoph Hellwig
1e85a3670d xfs: remove the no-op spinlock_destroy stub
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-28 19:27:17 -07:00
Darrick J. Wong
5467b34bd1 xfs: move xfs_ino_geometry to xfs_shared.h
The inode geometry structure isn't related to ondisk format; it's
support for the mount structure.  Move it to xfs_shared.h.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-06-28 19:25:35 -07:00
David Howells
1eda8bab70 afs: Add support for the UAE error table
Add support for mapping AFS UAE abort codes to Linux errno values.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-28 18:37:53 +01:00
Trond Myklebust
68f461593f NFS/flexfiles: Use the correct TCP timeout for flexfiles I/O
Fix a typo where we're confusing the default TCP retrans value
(NFS_DEF_TCP_RETRANS) for the default TCP timeout value.

Fixes: 15d03055cf ("pNFS/flexfiles: Set reasonable default ...")
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-06-28 11:48:52 -04:00
Ronnie Sahlberg
5de254dca8 cifs: fix crash querying symlinks stored as reparse-points
We never parsed/returned any data from .get_link() when the object is a windows reparse-point
containing a symlink. This results in the VFS layer oopsing accessing an uninitialized buffer:

...
[  171.407172] Call Trace:
[  171.408039]  readlink_copy+0x29/0x70
[  171.408872]  vfs_readlink+0xc1/0x1f0
[  171.409709]  ? readlink_copy+0x70/0x70
[  171.410565]  ? simple_attr_release+0x30/0x30
[  171.411446]  ? getname_flags+0x105/0x2a0
[  171.412231]  do_readlinkat+0x1b7/0x1e0
[  171.412938]  ? __ia32_compat_sys_newfstat+0x30/0x30
...

Fix this by adding code to handle these buffers and make sure we do return a valid buffer
to .get_link()

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-06-28 00:34:17 -05:00
David S. Miller
d96ff269a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The new route handling in ip_mc_finish_output() from 'net' overlapped
with the new support for returning congestion notifications from BPF
programs.

In order to handle this I had to take the dev_loopback_xmit() calls
out of the switch statement.

The aquantia driver conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-27 21:06:39 -07:00
Linus Torvalds
7a702b4e82 for-linus-20190627
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE7btrcuORLb1XUhEwjrBW1T7ssS0FAl0UnRoACgkQjrBW1T7s
 sS1T0w/+PFooDZNaKJkhJCGm0XyRDYmmuivEX9ydUR1x9/doRbDZTqfjsQBJLoVK
 PulxDiuFbQWXzhBJFEMuU6YBR2fjFqUGsXz5qAXPB0zaahWcSY/0Y8VCU/PKq7A6
 3oJPl/lYwYkLTYUKsnN08hByosUA7WeRQRAxbSFWdCTlUfIw72mDhprMGjJIVAlu
 snLA5lUoy7hyoFdXR5qNhYAcX8sASmi01hXhdnsKMOv4z2Vb5NoQsgqL1W8tAnsf
 BdJKL82Qd7vWQahlbOtur46aeJAL2ukGSTskuA2jOQqsKxmpos+hWq36gToq7usa
 XgPii0Rz7/2s6ZvhmxV5kmzqHylT9giU1DxWybSVo9IZBsU2i1o9DV+yBY50tr45
 s0bmpSA/u4DP2uT8oRvh47LbDqiQFA8dyVWQKE25smSdjekuZHTO0tgXf8mwC2CW
 hDci4z+ONOyqIQyFrhP7UaKuSK6tAAUbYKtXUIN6rnuq1FjuTA2+wtIlOPZuDZQ2
 yrsSUefh4/sFMBSAgoGTg9f+PiCejBMKcxoqhU2/27mvkiInAyDPfoc4oGQcinOy
 OVX3B0A8B88l26sDkWdv15d92E1GKzZLj8h66TlYwDpN+seevftKtpblZ9fJWsSf
 0NejoMV/GcA/KsAp1sxqWwouRob8H6pbGXWb97DYRA2IVyjK3q4=
 =APjA
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190627' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux

Pull pidfd fixes from Christian Brauner:
 "Userspace tools and libraries such as strace or glibc need a cheap and
  reliable way to tell whether CLONE_PIDFD is supported. The easiest way
  is to pass an invalid fd value in the return argument, perform the
  syscall and verify the value in the return argument has been changed
  to a valid fd.

  However, if CLONE_PIDFD is specified we currently check if pidfd == 0
  and return EINVAL if not.

  The check for pidfd == 0 was originally added to enable us to abuse
  the return argument for passing additional flags along with
  CLONE_PIDFD in the future.

  However, extending legacy clone this way would be a terrible idea and
  with clone3 on the horizon and the ability to reuse CLONE_DETACHED
  with CLONE_PIDFD there's no real need for this clutch. So remove the
  pidfd == 0 check and help userspace out.

  Also, accordig to Al, anon_inode_getfd() should only be used past the
  point of no failure and ksys_close() should not be used at all since
  it is far too easy to get wrong. Al's motto being "basically, once
  it's in descriptor table, it's out of your control". So Al's patch
  switches back to what we already had in v1 of the original patchset
  and uses a anon_inode_getfile() + put_user() + fd_install() sequence
  in the success path and a fput() + put_unused_fd() in the failure
  path.

  The other two changes should be trivial"

* tag 'for-linus-20190627' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
  proc: remove useless d_is_dir() check
  copy_process(): don't use ksys_close() on cleanups
  samples: make pidfd-metadata fail gracefully on older kernels
  fork: don't check parent_tidptr with CLONE_PIDFD
2019-06-28 08:41:18 +08:00
Linus Torvalds
cd0f3aaebc AFS fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAXRMn5vu3V2unywtrAQICpA/+IIINk6MJVQDzGhOnvWrbGdPnOdJEUyLN
 B9U4bLZJRg/j+Sqodn+fXIfsEO4FQflkSJD+xoBi4pzBZcr0xkLUVOog/1S7dv4J
 bPVT9p2f3ITNiatmisOrUe1InuHa6Wb/cUnQaLLRhd7NqbawKGRQG4tv4CGwKn67
 dJIOOm/iTCs1ACES4C5QOpU7/DWK38Pn3BbnN21bFzDgfbtbdDTaFFkhFtXy78oB
 Gcj5g+ULpkKBcuJThFuJUPZ9E4qICNZR4kJXEULSvykDDRzluhJmQ+v8btm6NJsq
 hMqTrT9M2y114V1OqXj3me7tA6wOEAfTQ0WzpzF2SmyFQKnSly/EkWc4HZXFD/8O
 BczCcABUbuKNE/pJSELx6k1M0+00QfeLcjHPc6joZFCni3lMdYWOncn/syyHw5P+
 rc9JQsy3+dLcFsaVQ5eGmX6NDc70dCrAlS6MllIzSBcwAVCctTKwm0meaSW6B2y6
 VymPy+cqi1RxMKyiQ0hAeU7Xe6yqFcl6rtonfCQqRLxkfzrCXkDp6/ELOXBzDft1
 ey6+N3WsmWW7YSPuM/SIZKV66rshlflj0w+FRluZEEAF1NYeYqXUDvK/S8KC9kPG
 AXUDvhI+tBpxg1AVz94JN714VmkbY23xV0g44eQsdqSQm2YvsxiFCSWZZ6L/KEWe
 kWQc6BGDCB0=
 =YTdG
 -----END PGP SIGNATURE-----

Merge tag 'afs-fixes-20190620' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull AFS fixes from David Howells:
 "The in-kernel AFS client has been undergoing testing on opendev.org on
  one of their mirror machines. They are using AFS to hold data that is
  then served via apache, and Ian Wienand had reported seeing oopses,
  spontaneous machine reboots and updates to volumes going missing. This
  patch series appears to have fixed the problem, very probably due to
  patch (2), but it's not 100% certain.

  (1) Fix the printing of the "vnode modified" warning to exclude checks
      on files for which we don't have a callback promise from the
      server (and so don't expect the server to tell us when it
      changes).

      Without this, for every file or directory for which we still have
      an in-core inode that gets changed on the server, we may get a
      message logged when we next look at it. This can happen in bulk
      if, for instance, someone does "vos release" to update a R/O
      volume from a R/W volume and a whole set of files are all changed
      together.

      We only really want to log a message if the file changed and the
      server didn't tell us about it or we failed to track the state
      internally.

  (2) Fix accidental corruption of either afs_vlserver struct objects or
      the the following memory locations (which could hold anything).
      The issue is caused by a union that points to two different
      structs in struct afs_call (to save space in the struct). The call
      cleanup code assumes that it can simply call the cleanup for one
      of those structs if not NULL - when it might be actually pointing
      to the other struct.

      This means that every Volume Location RPC op is going to corrupt
      something.

  (3) Fix an uninitialised spinlock. This isn't too bad, it just causes
      a one-off warning if lockdep is enabled when "vos release" is
      called, but the spinlock still behaves correctly.

  (4) Fix the setting of i_block in the inode. This causes du, for
      example, to produce incorrect results, but otherwise should not be
      dangerous to the kernel"

* tag 'afs-fixes-20190620' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  afs: Fix setting of i_blocks
  afs: Fix uninitialised spinlock afs_volume::cb_break_lock
  afs: Fix vlserver record corruption
  afs: Fix over zealous "vnode modified" warnings
2019-06-28 08:34:12 +08:00
Andreas Gruenbacher
36a7347de0 iomap: fix page_done callback for short writes
When we truncate a short write to have it retried, pass the truncated
length to the page_done callback instead of the full length.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-27 17:28:41 -07:00
Christoph Hellwig
8af54f291e fs: fold __generic_write_end back into generic_write_end
This effectively reverts a6d639da63 ("fs: factor out a
__generic_write_end helper") as we now open code what is left of that
helper in iomap.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-27 17:28:40 -07:00
Andreas Gruenbacher
8d3e72a180 iomap: don't mark the inode dirty in iomap_write_end
Marking the inode dirty for each page copied into the page cache can be
very inefficient for file systems that use the VFS dirty inode tracking,
and is completely pointless for those that don't use the VFS dirty inode
tracking.  So instead, only set an iomap flag when changing the in-core
inode size, and open code the rest of __generic_write_end.

Partially based on code from Christoph Hellwig.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-27 17:28:40 -07:00
David Howells
2e12256b9a keys: Replace uid/gid/perm permissions checking with an ACL
Replace the uid/gid/perm permissions checking on a key with an ACL to allow
the SETATTR and SEARCH permissions to be split.  This will also allow a
greater range of subjects to represented.

============
WHY DO THIS?
============

The problem is that SETATTR and SEARCH cover a slew of actions, not all of
which should be grouped together.

For SETATTR, this includes actions that are about controlling access to a
key:

 (1) Changing a key's ownership.

 (2) Changing a key's security information.

 (3) Setting a keyring's restriction.

And actions that are about managing a key's lifetime:

 (4) Setting an expiry time.

 (5) Revoking a key.

and (proposed) managing a key as part of a cache:

 (6) Invalidating a key.

Managing a key's lifetime doesn't really have anything to do with
controlling access to that key.

Expiry time is awkward since it's more about the lifetime of the content
and so, in some ways goes better with WRITE permission.  It can, however,
be set unconditionally by a process with an appropriate authorisation token
for instantiating a key, and can also be set by the key type driver when a
key is instantiated, so lumping it with the access-controlling actions is
probably okay.

As for SEARCH permission, that currently covers:

 (1) Finding keys in a keyring tree during a search.

 (2) Permitting keyrings to be joined.

 (3) Invalidation.

But these don't really belong together either, since these actions really
need to be controlled separately.

Finally, there are number of special cases to do with granting the
administrator special rights to invalidate or clear keys that I would like
to handle with the ACL rather than key flags and special checks.


===============
WHAT IS CHANGED
===============

The SETATTR permission is split to create two new permissions:

 (1) SET_SECURITY - which allows the key's owner, group and ACL to be
     changed and a restriction to be placed on a keyring.

 (2) REVOKE - which allows a key to be revoked.

The SEARCH permission is split to create:

 (1) SEARCH - which allows a keyring to be search and a key to be found.

 (2) JOIN - which allows a keyring to be joined as a session keyring.

 (3) INVAL - which allows a key to be invalidated.

The WRITE permission is also split to create:

 (1) WRITE - which allows a key's content to be altered and links to be
     added, removed and replaced in a keyring.

 (2) CLEAR - which allows a keyring to be cleared completely.  This is
     split out to make it possible to give just this to an administrator.

 (3) REVOKE - see above.


Keys acquire ACLs which consist of a series of ACEs, and all that apply are
unioned together.  An ACE specifies a subject, such as:

 (*) Possessor - permitted to anyone who 'possesses' a key
 (*) Owner - permitted to the key owner
 (*) Group - permitted to the key group
 (*) Everyone - permitted to everyone

Note that 'Other' has been replaced with 'Everyone' on the assumption that
you wouldn't grant a permit to 'Other' that you wouldn't also grant to
everyone else.

Further subjects may be made available by later patches.

The ACE also specifies a permissions mask.  The set of permissions is now:

	VIEW		Can view the key metadata
	READ		Can read the key content
	WRITE		Can update/modify the key content
	SEARCH		Can find the key by searching/requesting
	LINK		Can make a link to the key
	SET_SECURITY	Can change owner, ACL, expiry
	INVAL		Can invalidate
	REVOKE		Can revoke
	JOIN		Can join this keyring
	CLEAR		Can clear this keyring


The KEYCTL_SETPERM function is then deprecated.

The KEYCTL_SET_TIMEOUT function then is permitted if SET_SECURITY is set,
or if the caller has a valid instantiation auth token.

The KEYCTL_INVALIDATE function then requires INVAL.

The KEYCTL_REVOKE function then requires REVOKE.

The KEYCTL_JOIN_SESSION_KEYRING function then requires JOIN to join an
existing keyring.

The JOIN permission is enabled by default for session keyrings and manually
created keyrings only.


======================
BACKWARD COMPATIBILITY
======================

To maintain backward compatibility, KEYCTL_SETPERM will translate the
permissions mask it is given into a new ACL for a key - unless
KEYCTL_SET_ACL has been called on that key, in which case an error will be
returned.

It will convert possessor, owner, group and other permissions into separate
ACEs, if each portion of the mask is non-zero.

SETATTR permission turns on all of INVAL, REVOKE and SET_SECURITY.  WRITE
permission turns on WRITE, REVOKE and, if a keyring, CLEAR.  JOIN is turned
on if a keyring is being altered.

The KEYCTL_DESCRIBE function translates the ACL back into a permissions
mask to return depending on possessor, owner, group and everyone ACEs.

It will make the following mappings:

 (1) INVAL, JOIN -> SEARCH

 (2) SET_SECURITY -> SETATTR

 (3) REVOKE -> WRITE if SETATTR isn't already set

 (4) CLEAR -> WRITE

Note that the value subsequently returned by KEYCTL_DESCRIBE may not match
the value set with KEYCTL_SETATTR.


=======
TESTING
=======

This passes the keyutils testsuite for all but a couple of tests:

 (1) tests/keyctl/dh_compute/badargs: The first wrong-key-type test now
     returns EOPNOTSUPP rather than ENOKEY as READ permission isn't removed
     if the type doesn't have ->read().  You still can't actually read the
     key.

 (2) tests/keyctl/permitting/valid: The view-other-permissions test doesn't
     work as Other has been replaced with Everyone in the ACL.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-27 23:03:07 +01:00
David Howells
a58946c158 keys: Pass the network namespace into request_key mechanism
Create a request_key_net() function and use it to pass the network
namespace domain tag into DNS revolver keys and rxrpc/AFS keys so that keys
for different domains can coexist in the same keyring.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: netdev@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: linux-afs@lists.infradead.org
2019-06-27 23:02:12 +01:00
Bob Peterson
f29e62eed2 gfs2: replace more printk with calls to fs_info and friends
This patch replaces a few leftover printk errors with calls to
fs_info and similar, so that the file system having the error is
properly logged.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:30:27 +02:00
Bob Peterson
3792ce973f gfs2: dump fsid when dumping glock problems
Before this patch, if a glock error was encountered, the glock with
the problem was dumped. But sometimes you may have lots of file systems
mounted, and that doesn't tell you which file system it was for.

This patch adds a new boolean parameter fsid to the dump_glock family
of functions. For non-error cases, such as dumping the glocks debugfs
file, the fsid is not dumped in order to keep lock dumps and glocktop
as clean as possible. For all error cases, such as GLOCK_BUG_ON, the
file system id is now printed. This will make it easier to debug.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:27:43 +02:00
Bob Peterson
55317f5b00 gfs2: simplify gfs2_freeze by removing case
Function gfs2_freeze had a case statement that simply checked the
error code, but the break statements just made the logic hard to
read. This patch simplifies the logic in favor of a simple if.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:26:58 +02:00
Bob Peterson
04aea0ca14 gfs2: Rename SDF_SHUTDOWN to SDF_WITHDRAWN
Before this patch, the superblock flag indicating when a file system
is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to
the more obvious SDF_WITHDRAWN.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:26:35 +02:00
Bob Peterson
d14e1ca305 gfs2: Warn when a journal replay overwrites a rgrp with buffers
This patch adds some instrumentation in gfs2's journal replay that
indicates when we're about to overwrite a rgrp for which we already
have a valid buffer_head.

When this problem occurs, it's a situation in which this node has
been granted a rgrp glock and subsequently read in buffer_heads for
it, and possibly even made changes to the rgrp bits and/or
allocation values. But now another node has failed and forced us to
replay its journal, but its journal contains a copy of the same
rgrp, without a revoke, which means we're about to overwrite a
rgrp that we now rightfully own, with an obsolete copy. That is
always a problem. It means the other node (which failed and left
its journal to be replayed) failed to flush out its rgrp buffers,
write out the revoke, and invalidate its copy before it released
the glock to our possession.

No node should ever release a glock until its metadata has been
written to the journal and revoked and invalidated..

We also kludge around the problem and refuse to replace our good
copy with the journals bad copy by not marking the buffer dirty,
but never do it silently. That's wallpapering over a larger problem
that still exists. IOW, if this situation can happen to this node,
it can also happen to a different node and we wouldn't even know it
or be able to circumvent it: Suppose we have a 3-node cluster:
Node 1 fails, leaving an obsolete rgrp block in its journal without
a revoke. Node 2 grabs the rgrp as soon as the rgrp glock is
released and starts making changes, allocating and freeing blocks
from the rgrp, etc. Node 3 replays the journal from node 1,
oblivious and unaware that it's about to overwrite node 2's changes.
So we still need to be vocal and log the error to make it apparent
that a corruption path still exists in gfs2.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:04:07 +02:00
Bob Peterson
49eb776ed9 gfs2: log which portion of the journal is replayed
When a journal is replayed, gfs2 logs a message similar to:

jid=X: Replaying journal...

This patch adds the tail and block number so that the range of the
replayed block is also printed. These values will match the values
shown if the journal is dumped with gfs2_edit -p journalX. The
resulting output looks something like this:

jid=1: Replaying journal...0x28b7 to 0x2beb

This will allow us to better debug file system corruption problems.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:03:58 +02:00
Bob Peterson
e955537e32 gfs2: eliminate tr_num_revoke_rm
For its journal processing, gfs2 kept track of the number of buffers
added and removed on a per-transaction basis. These values are used
to calculate space needed in the journal. But while these calculations
make sense for the number of buffers, they make no sense for revokes.
Revokes are managed in their own list, linked from the superblock.
So it's entirely unnecessary to keep separate per-transaction counts
for revokes added and removed. A single count will do the same job.
Therefore, this patch combines the transaction revokes into a single
count.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:03:53 +02:00
Bob Peterson
5b3a9f348b gfs2: kthread and remount improvements
Before this patch, gfs2 saved the pointers to the two daemon threads
(logd and quotad) in the superblock, but they were never cleared,
even if the threads were stopped (e.g. on remount -o ro). That meant
that certain error conditions (like a withdrawn file system) could
race. For example, xfstests generic/361 caused an IO error during
remount -o ro, which caused the kthreads to be stopped, then the
error flagged. Later, when the test unmounted the file system, it
would try to stop the threads a second time with kthread_stop.

This patch does two things: First, every time it stops the threads
it zeroes out the thread pointer, and also checks whether it's NULL
before trying to stop it. Second, in function gfs2_remount_fs, it
was returning if an error was logged by either of the two functions
for gfs2_make_fs_ro and _rw, which caused it to bypass the online
uevent at the bottom of the function. This removes that bypass in
favor of just running the whole function, then returning the error.
That way, unmounts and remounts won't hang forever.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 21:03:43 +02:00
Kefeng Wang
15a798f7de gfs2: Use IS_ERR_OR_NULL
Use IS_ERR_OR_NULL where appropriate.

(Several more places converted by Andreas.)

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 20:53:46 +02:00
Andreas Gruenbacher
2a27b755ed gfs2: Clean up freeing struct gfs2_sbd
Add a free_sbd function for freeing a struct gfs2_sbd.  Use that for
freeing a super-block descriptor, either directly or via kobject_put.
Free sd_lkstats inside the kobject release function: that way,
gfs2_put_super will no longer leak sd_lkstats.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-06-27 20:53:45 +02:00
Eric Biggers
adbd9b4dee fscrypt: remove selection of CONFIG_CRYPTO_SHA256
fscrypt only uses SHA-256 for AES-128-CBC-ESSIV, which isn't the default
and is only recommended on platforms that have hardware accelerated
AES-CBC but not AES-XTS.  There's no link-time dependency, since SHA-256
is requested via the crypto API on first use.

To reduce bloat, we should limit FS_ENCRYPTION to selecting the default
algorithms only.  SHA-256 by itself isn't that much bloat, but it's
being discussed to move ESSIV into a crypto API template, which would
incidentally bring in other things like "authenc" support, which would
all end up being built-in since FS_ENCRYPTION is now a bool.

For Adiantum encryption we already just document that users who want to
use it have to enable CONFIG_CRYPTO_ADIANTUM themselves.  So, let's do
the same for AES-128-CBC-ESSIV and CONFIG_CRYPTO_SHA256.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Eric Biggers <ebiggers@google.com>
2019-06-27 10:29:33 -07:00
Jeff Layton
d6b8bd679c ceph: fix ceph_mdsc_build_path to not stop on first component
When ceph_mdsc_build_path is handed a positive dentry, it will return a
zero-length path string with the base set to that dentry.  This is not
what we want.  Always include at least one path component in the string.

ceph_mdsc_build_path has behaved this way for a long time but it didn't
matter until recent d_name handling rework.

Fixes: 964fff7491 ("ceph: use ceph_mdsc_build_path instead of clone_dentry_name")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-06-27 18:27:36 +02:00
Christian Brauner
30d158b143
proc: remove useless d_is_dir() check
Remove the d_is_dir() check from tgid_pidfd_to_pid().

It is pointless since you should never get &proc_tgid_base_operations
for f_op on a non-directory.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christian Brauner <christian@brauner.io>
2019-06-27 12:25:09 +02:00
Eric Sandeen
555b2c3da1 quota: honor quota type in Q_XGETQSTAT[V] calls
The code in quota_getstate and quota_getstatev is strange; it
says the returned fs_quota_stat[v] structure has room for only
one type of time limits, so fills it in with the first enabled
quota, even though every quotactl command must have a type sent
in by the user.

Instead of just picking the first enabled quota, fill in the
reply with the timers for the quota type that was actually
requested.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-25 17:51:35 +02:00
Ingo Molnar
b9271f0c65 Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc6' into perf/core, to refresh branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24 19:25:52 +02:00
Ingo Molnar
d2abae71eb Linux 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm
 A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe
 v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35
 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z
 aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW
 s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH
 VTS5zps=
 =huBY
 -----END PGP SIGNATURE-----

Merge tag 'v5.2-rc6' into sched/core, to refresh the branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-06-24 19:19:53 +02:00
Jens Axboe
9e645e1105 io_uring: add support for sqe links
With SQE links, we can create chains of dependent SQEs. One example
would be queueing an SQE that's a read from one file descriptor, with
the linked SQE being a write to another with the same set of buffers.

An SQE link will not stall the pipeline, it'll just ensure that
dependent SQEs aren't issued before the previous link has completed.

Any error at submission or completion time will break the chain of SQEs.
For completions, this also includes short reads or writes, as the next
SQE could depend on the previous one being fully completed.

Any SQE in a chain that gets canceled due to any of the above errors,
will get an CQE fill with -ECANCELED as the error value.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-24 08:00:18 -06:00
Christoph Hellwig
a2357223c5 binfmt_flat: don't offset the data start
Ever since the initial commit of the binfmt_flat shared library
support back in the bitkeeper days we've offset the actual in-memory
.data start by one field per possible shared library, or 1 in case
shared library support isn't enabled.  I can't find anything in the
loader that actually makes use of it, nor was it present before
shared library support it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:47 +10:00
Christoph Hellwig
a445d988b4 binfmt_flat: move the MAX_SHARED_LIBS definition to binfmt_flat.c
MAX_SHARED_LIBS is an implementation detail of the kernel loader,
and should be kept away from the file format definition.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:47 +10:00
Christoph Hellwig
6843d8aa5b binfmt_flat: remove the persistent argument from flat_get_addr_from_rp
The argument is never used.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:47 +10:00
Christoph Hellwig
cf9a566c2c binfmt_flat: make support for old format binaries optional
No need to carry the extra code around, given that systems using flat
binaries are generally very resource constrained.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:47 +10:00
Christoph Hellwig
aef0f78e74 binfmt_flat: add a ARCH_HAS_BINFMT_FLAT option
Allow architectures to opt into ARCH_HAS_BINFMT_FLAT support instead of
assuming that all nommu ports support the format.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:47 +10:00
Christoph Hellwig
3b97771842 binfmt_flat: add endianess annotations
Most binfmt_flat on-disk fields are big endian.  Use the proper __be32
type where applicable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:47 +10:00
Christoph Hellwig
06d2bfedd1 binfmt_flat: remove the uapi <linux/flat.h> header
The split between the two flat.h files is completely arbitrary, and the
uapi version even contains CONFIG_ ifdefs that can't work in userspace.
The only userspace program known to use the header is elf2flt, and it
ships with its own version of the combined header.

Use the chance to move the <asm/flat.h> inclusion out of this file, as it
is in no way needed for the format defintion, but just for the binfmt
implementation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:46 +10:00
Christoph Hellwig
bdd15a2884 binfmt_flat: replace flat_argvp_envp_on_stack with a Kconfig variable
This will eventually allow us to kill the need for an <asm/flat.h> for
many cases.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:46 +10:00
Christoph Hellwig
1d52dca117 binfmt_flat: remove flat_old_ram_flag
Instead add a Kconfig variable that only h8300 selects.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:46 +10:00
Christoph Hellwig
02da283302 binfmt_flat: provide a default version of flat_get_relocate_addr
This way only the two architectures that do masking need to provide
the helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:46 +10:00
Christoph Hellwig
2f3196d49b binfmt_flat: remove flat_set_persistent
This helper is a no-op on all architectures, remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:46 +10:00
Christoph Hellwig
9ee24b2a38 binfmt_flat: remove flat_reloc_valid
This helper is the same for all architectures, open code it in the only
caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2019-06-24 09:16:46 +10:00
Greg Kroah-Hartman
8083f3d788 Merge 5.2-rc6 into char-misc-next
We need the char-misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-23 09:23:33 +02:00
David S. Miller
92ad6325cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor SPDX change conflict.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-22 08:59:24 -04:00
Theodore Ts'o
7633b08b27 ext4: rename htree_inline_dir_to_tree() to ext4_inlinedir_to_tree()
Clean up namespace pollution by the inline_data code.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-06-21 21:57:00 -04:00
Linus Torvalds
c036f7dabc More NFS client fixes for Linux 5.2
Bugfixes:
 - SUNRPC: Fix a credential refcount leak
 - Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
 - SUNRPC: Fix xps refcount imbalance on the error path
 - NFS4: Only set creation opendata if O_CREAT
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl0NMHQACgkQ18tUv7Cl
 QOujAw//bL4p0ADvCSglqO0ceZcHpHuQVhj10f+tyXfkq3R7WDQmP2qok/+6uZjk
 rYxWh6nCqQqTBSmstf4ouLMjdQTk+zDQf38zSSWGPW0U6rCX4kl/yOyskBriYqnf
 W4U5+hOCvQ8prKdkcjbhYqdvz6qT9bhc5X7kHgtD66CtvyzUDraYw04Mojzodl91
 CPV97rDZbiHBAgZPBFKF+qoTXEL7hQlHUREcR/DZBLV3qrMBsraog1T1OpWRGev0
 OAxVAwZyXFcWDFm7mFcItMA4WcUnZbL77gPNtvZSfgYPxHsytHfH8KOmEG7zP5Yy
 +blko41nvNR2UVOPQ/zTvWj9pkuHQlacDUrlYdgnOmHuGhufj1/xx4Z5C2dkTTnp
 ufjcVXJOqZE7lWyA4IOWapc7gLM3q6I8sUR9w5nLc1meYphNAqHZgK0fTgfbMoo7
 JweUBcgGmNvkMpkX561HBe16ENKvcgQQp666VsnTTI/BPZ/BBdlayKGBbxA3j22F
 znF4gwznvQ0jVxtlNsibzSwM8GQbc6UM5fGPM+atlPJEzban0waQaIbCW347DViZ
 fTXP2NQmvH1x+YNgx6RTRwVBWWT02u/ijQHf7+NvwWLWTKb9OXwQVjlnWHu/E/Qi
 MpNXxtBTT6y+DG09VNV9CYwmAsULnlRQWe9RNBfMz+4sjLCyIjg=
 =+z30
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.2-3' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull more NFS client fixes from Anna Schumaker:
 "These are mostly refcounting issues that people have found recently.
  The revert fixes a suspend recovery performance issue.

   - SUNRPC: Fix a credential refcount leak

   - Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"

   - SUNRPC: Fix xps refcount imbalance on the error path

   - NFS4: Only set creation opendata if O_CREAT"

* tag 'nfs-for-5.2-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  SUNRPC: Fix a credential refcount leak
  Revert "SUNRPC: Declare RPC timers as TIMER_DEFERRABLE"
  net :sunrpc :clnt :Fix xps refcount imbalance on the error path
  NFS4: Only set creation opendata if O_CREAT
2019-06-21 13:45:41 -07:00
Theodore Ts'o
ddce3b9471 ext4: refactor initialize_dirent_tail()
Move the calculation of the location of the dirent tail into
initialize_dirent_tail().  Also prefix the function with ext4_ to fix
kernel namepsace polution.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-06-21 16:31:47 -04:00
Jens Axboe
60c112b0ad io_uring: ensure req->file is cleared on allocation
Stephen reports:

I hit the following General Protection Fault when testing io_uring via
the io_uring engine in fio. This was on a VM running 5.2-rc5 and the
latest version of fio. The issue occurs for both null_blk and fake NVMe
drives. I have not tested bare metal or real NVMe SSDs. The fio script
used is given below.

[io_uring]
time_based=1
runtime=60
filename=/dev/nvme2n1 (note /dev/nullb0 also fails)
ioengine=io_uring
bs=4k
rw=readwrite
direct=1
fixedbufs=1
sqthread_poll=1
sqthread_poll_cpu=0

general protection fault: 0000 [#1] SMP PTI
CPU: 0 PID: 872 Comm: io_uring-sq Not tainted 5.2.0-rc5-cpacket-io-uring #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
RIP: 0010:fput_many+0x7/0x90
Code: 01 48 85 ff 74 17 55 48 89 e5 53 48 8b 1f e8 a0 f9 ff ff 48 85 db 48 89 df 75 f0 5b 5d f3 c3 0f 1f 40 00 0f 1f 44 00 00 89 f6 <f0> 48 29 77 38 74 01 c3 55 48 89 e5 53 48 89 fb 65 48 \

RSP: 0018:ffffadeb817ebc50 EFLAGS: 00010246
RAX: 0000000000000004 RBX: ffff8f46ad477480 RCX: 0000000000001805
RDX: 0000000000000000 RSI: 0000000000000001 RDI: f18b51b9a39552b5
RBP: ffffadeb817ebc58 R08: ffff8f46b7a318c0 R09: 000000000000015d
R10: ffffadeb817ebce8 R11: 0000000000000020 R12: ffff8f46ad4cd000
R13: 00000000fffffff7 R14: ffffadeb817ebe30 R15: 0000000000000004
FS:  0000000000000000(0000) GS:ffff8f46b7a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055828f0bbbf0 CR3: 0000000232176004 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ? fput+0x13/0x20
 io_free_req+0x20/0x40
 io_put_req+0x1b/0x20
 io_submit_sqe+0x40a/0x680
 ? __switch_to_asm+0x34/0x70
 ? __switch_to_asm+0x40/0x70
 io_submit_sqes+0xb9/0x160
 ? io_submit_sqes+0xb9/0x160
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __schedule+0x3f2/0x6a0
 ? __switch_to_asm+0x34/0x70
 io_sq_thread+0x1af/0x470
 ? __switch_to_asm+0x34/0x70
 ? wait_woken+0x80/0x80
 ? __switch_to+0x85/0x410
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 ? __schedule+0x3f2/0x6a0
 kthread+0x105/0x140
 ? io_submit_sqes+0x160/0x160
 ? kthread+0x105/0x140
 ? io_submit_sqes+0x160/0x160
 ? kthread_destroy_worker+0x50/0x50
 ret_from_fork+0x35/0x40

which occurs because using a kernel side submission thread isn't valid
without using fixed files (registered through io_uring_register()). This
causes io_uring to put the request after logging an error, but before
the file field is set in the request. If it happens to be non-zero, we
attempt to fput() garbage.

Fix this by ensuring that req->file is initialized when the request is
allocated.

Cc: stable@vger.kernel.org # 5.1+
Reported-by: Stephen Bates <sbates@raithlin.com>
Tested-by: Stephen Bates <sbates@raithlin.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-06-21 14:16:28 -06:00
Theodore Ts'o
f036adb399 ext4: rename "dirent_csum" functions to use "dirblock"
Functions such as ext4_dirent_csum_verify() and ext4_dirent_csum_set()
don't actually operate on a directory entry, but a directory block.
And while they take a struct ext4_dir_entry *dirent as an argument, it
had better be the first directory at the beginning of the direct
block, or things will go very wrong.

Rename the following functions so that things make more sense, and
remove a lot of confusing casts along the way:

   ext4_dirent_csum_verify	 -> ext4_dirblock_csum_verify
   ext4_dirent_csum_set		 -> ext4_dirblock_csum_set
   ext4_dirent_csum		 -> ext4_dirblock_csum
   ext4_handle_dirty_dirent_node -> ext4_handle_dirty_dirblock

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-06-21 15:49:26 -04:00
Benjamin Coddington
909105199a NFS4: Only set creation opendata if O_CREAT
We can end up in nfs4_opendata_alloc during task exit, in which case
current->fs has already been cleaned up.  This leads to a crash in
current_umask().

Fix this by only setting creation opendata if we are actually doing an open
with O_CREAT.  We can drop the check for NULL nfs4_open_createattrs, since
O_CREAT will never be set for the recovery path.

Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-06-21 14:43:25 -04:00
Wang Shilong
5043a9643f f2fs: only set project inherit bit for directory
It doesn't make any sense to have project inherit bits
for regular files, even though this won't cause any
problem, but it is better fix this.

Cc: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-06-21 10:41:57 -07:00
Eric Biggers
360985573b f2fs: separate f2fs i_flags from fs_flags and ext4 i_flags
f2fs copied all the on-disk i_flags from ext4, and along with it the
assumption that the on-disk i_flags are the same as the bits used by
FS_IOC_GETFLAGS and FS_IOC_SETFLAGS.  This is problematic because
reserving an on-disk inode flag in either filesystem's i_flags or in
these ioctls effectively reserves it in all the other places too.  In
fact, most of the "f2fs i_flags" are not used by f2fs at all.

Fix this by separating f2fs's i_flags from the ioctl bits and ext4's
i_flags.

In the process, un-reserve all "f2fs i_flags" that aren't actually
supported by f2fs.  This included various flags that were not settable
at all, as well as various flags that were settable by FS_IOC_SETFLAGS
but didn't actually do anything.

There's a slight chance we'll need to add some flag(s) back to
FS_IOC_SETFLAGS in order to avoid breaking users who expect f2fs to
accept some random flag(s).  But hopefully such users don't exist.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-06-21 10:41:57 -07:00
Kimberly Brown
176ef3c4de f2fs: replace ktype default_attrs with default_groups
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace the default_attrs fields in f2fs_sb_ktype
and f2fs_feat_ktype with default_groups. Use the ATTRIBUTE_GROUPS macro
to create f2fs_groups and f2fs_feat_groups.

Fixes: fef4129ec2 ("f2fs: fix to be aware discard/preflush/dio command in is_idle()")
Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-06-21 10:41:56 -07:00
Linus Torvalds
c884d8ac7f SPDX update for 5.2-rc6
Another round of SPDX updates for 5.2-rc6
 
 Here is what I am guessing is going to be the last "big" SPDX update for
 5.2.  It contains all of the remaining GPLv2 and GPLv2+ updates that
 were "easy" to determine by pattern matching.  The ones after this are
 going to be a bit more difficult and the people on the spdx list will be
 discussing them on a case-by-case basis now.
 
 Another 5000+ files are fixed up, so our overall totals are:
 	Files checked:            64545
 	Files with SPDX:          45529
 
 Compared to the 5.1 kernel which was:
 	Files checked:            63848
 	Files with SPDX:          22576
 This is a huge improvement.
 
 Also, we deleted another 20000 lines of boilerplate license crud, always
 nice to see in a diffstat.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXQyQYA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymnGQCghETUBotn1p3hTjY56VEs6dGzpHMAnRT0m+lv
 kbsjBGEJpLbMRB2krnaU
 =RMcT
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx

Pull still more SPDX updates from Greg KH:
 "Another round of SPDX updates for 5.2-rc6

  Here is what I am guessing is going to be the last "big" SPDX update
  for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates
  that were "easy" to determine by pattern matching. The ones after this
  are going to be a bit more difficult and the people on the spdx list
  will be discussing them on a case-by-case basis now.

  Another 5000+ files are fixed up, so our overall totals are:
	Files checked:            64545
	Files with SPDX:          45529

  Compared to the 5.1 kernel which was:
	Files checked:            63848
	Files with SPDX:          22576

  This is a huge improvement.

  Also, we deleted another 20000 lines of boilerplate license crud,
  always nice to see in a diffstat"

* tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits)
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486
  treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485
  ...
2019-06-21 09:58:42 -07:00
Linus Torvalds
05512b0f46 four small SMB3 fixes, all for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAl0MOgQACgkQiiy9cAdy
 T1EJYwv9HdThounjBOZJIxWEgrmZPDzrX4gC3qMtCW5kn8VIKu3em6QAh0N8F03y
 j4NZwH1bSUPaD+mtiWbSHA7An9On6xfGbLOPzJukVPYIjm58IxrUD5PUzgZk8lFg
 5WL4F5TO7uyecokK5RP/HijgT6bg9ZAcC++dV/ZAZ0+ihGIc5iRFT+eH0Y4k3aGr
 tf97JDf02N5olOKKfOg4yfbZA0tG/A2W6BShZg+f1HaNxhBmRtzmPqMAOnGQQiL6
 uig3la1KC7czC0gnaYqbQD7Nisy7KUbeF15B5/l8/ukUMMTQig4zoynBukMiasSQ
 XsHYGzNxLoNmnw3NKyZQ8FC3k1cYRz0gxQn++QQ2q1kbOxBvwlNaNWxJ/UXRQQTB
 Prc+PqYpUabzrNWjlXUP8uliRH6vDV/0Y5Ohotl8megKWNiJV5C3LV+Zaf04IDag
 Db0W6qTZbnlrQn+0rx7F643R+zBooS+cIfgP+U6zR5KyuO32bPN5+z/o1WXL71gQ
 2PH8qw6k
 =SJLT
 -----END PGP SIGNATURE-----

Merge tag '5.2-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Four small SMB3 fixes, all for stable"

* tag '5.2-rc5-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix GlobalMid_Lock bug in cifs_reconnect
  SMB3: retry on STATUS_INSUFFICIENT_RESOURCES instead of failing write
  cifs: add spinlock for the openFileList to cifsInodeInfo
  cifs: fix panic in smb2_reconnect
2019-06-21 09:51:44 -07:00
Kimberly Brown
7c7e301406 btrfs: sysfs: Replace default_attrs in ktypes with groups
The kobj_type default_attrs field is being replaced by the
default_groups field. Replace the default_attrs fields in
btrfs_raid_ktype and space_info_ktype with default_groups.

Change "raid_attributes" to "raid_attrs", and use the ATTRIBUTE_GROUPS
macro to create raid_groups and space_info_groups.

Signed-off-by: Kimberly Brown <kimbrownkd@gmail.com>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-21 15:38:59 +02:00
Theodore Ts'o
4e19d6b65f ext4: allow directory holes
The largedir feature was intended to allow ext4 directories to have
unmapped directory blocks (e.g., directory holes).  And so the
released e2fsprogs no longer enforces this for largedir file systems;
however, the corresponding change to the kernel-side code was not made.

This commit fixes this oversight.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2019-06-20 21:19:02 -04:00
Theodore Ts'o
9382cde8cd jbd2: drop declaration of journal_sync_buffer()
The journal_sync_buffer() function was never carried over from jbd to
jbd2.  So get rid of the vestigal declaration of this (non-existent)
function.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-06-20 17:32:21 -04:00
Ross Zwisler
73131fbb00 ext4: use jbd2_inode dirty range scoping
Use the newly introduced jbd2_inode dirty range scoping to prevent us
from waiting forever when trying to complete a journal transaction.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
2019-06-20 17:26:26 -04:00
Ross Zwisler
6ba0e7dc64 jbd2: introduce jbd2_inode dirty range scoping
Currently both journal_submit_inode_data_buffers() and
journal_finish_inode_data_buffers() operate on the entire address space
of each of the inodes associated with a given journal entry.  The
consequence of this is that if we have an inode where we are constantly
appending dirty pages we can end up waiting for an indefinite amount of
time in journal_finish_inode_data_buffers() while we wait for all the
pages under writeback to be written out.

The easiest way to cause this type of workload is do just dd from
/dev/zero to a file until it fills the entire filesystem.  This can
cause journal_finish_inode_data_buffers() to wait for the duration of
the entire dd operation.

We can improve this situation by scoping each of the inode dirty ranges
associated with a given transaction.  We do this via the jbd2_inode
structure so that the scoping is contained within jbd2 and so that it
follows the lifetime and locking rules for that structure.

This allows us to limit the writeback & wait in
journal_submit_inode_data_buffers() and
journal_finish_inode_data_buffers() respectively to the dirty range for
a given struct jdb2_inode, keeping us from waiting forever if the inode
in question is still being appended to.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
2019-06-20 17:24:56 -04:00
Linus Torvalds
4ae004a9bc overlayfs fixes for 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXQvlwwAKCRDh3BK/laaZ
 PJH4AP4vnumu1Q22ZWiGkTeU93JTgHt3MGPG1r1DtnUmsIKRfwEAjDY8bvuOP7Vw
 EYQicghvAPTHWqyGUoe0QZJwPlMiZw4=
 =TKOM
 -----END PGP SIGNATURE-----

Merge tag 'ovl-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
 "Fix two regressions in this cycle, and a couple of older bugs"

* tag 'ovl-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: make i_ino consistent with st_ino in more cases
  ovl: fix typo in MODULE_PARM_DESC
  ovl: fix bogus -Wmaybe-unitialized warning
  ovl: don't fail with disconnected lower NFS
  ovl: fix wrong flags check in FS_IOC_FS[SG]ETXATTR ioctls
2019-06-20 14:19:34 -07:00
Linus Torvalds
b910f6a7cc fuse fixes for 5.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXQvk/wAKCRDh3BK/laaZ
 POwjAP9hPq9pTlX3YZsz14DcoBdz8iyFXNWMj7eQCL4GioCMKgEA3XNajQyf9DLK
 bWRkAdYHVAcP0QueK5ReNYl3pV66mw4=
 =f2l/
 -----END PGP SIGNATURE-----

Merge tag 'fuse-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse fix from Miklos Szeredi:
 "Just a single revert, fixing a regression in -rc1"

* tag 'fuse-fixes-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  Revert "fuse: require /dev/fuse reads to have enough buffer capacity"
2019-06-20 14:16:16 -07:00
Linus Torvalds
d72558b2b3 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAl0LhAIACgkQnJ2qBz9k
 QNnUqwf/d7fNZv0+GJVBIrIVbSUgHqzJYxakMWAS6NGMmd2fkPcoPRHitXWbi5MJ
 fhJPFceNVqY30RPQUePlDmWSitEDI0kdaNZ3Z8SzE9YszaEgoLNAN/dpOuPGpQfh
 kXQd7yM1cBZJoAv5kQsECiYXfY7nk+3J+DVsu69rBcsooxT5rfXs00Dz9ETao9gK
 L1SR/s5C6b2t0m0EfQpv/+PjbzPQPLKngvihvFesAT6lSA6QpRMY7M8+4Es3rzuI
 7h0kuThkJaIp9B+D9C8vYIT+uVQVjsN9wXozJHXRNvnK/4mfDvYJdWSkRhqP5p1a
 DBRo/jK8meV1ZvIEsLjARxHg0z7yAA==
 =PlCd
 -----END PGP SIGNATURE-----

Merge tag 'for_v5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull two misc vfs fixes from Jan Kara:
 "One small quota fix fixing spurious EDQUOT errors and one fanotify fix
  fixing a bug in the new fanotify FID reporting code"

* tag 'for_v5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: update connector fsid cache on add mark
  quota: fix a problem about transfer quota
2019-06-20 10:12:53 -07:00
Zhengyuan Liu
ee102584ef fs/afs: use struct_size() in kzalloc()
As Gustavo said in other patches doing the same replace, we can now
use the new struct_size() helper to avoid leaving these open-coded and
prone to type mistake.

Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:17 +01:00
David Howells
4521819369 afs: Trace afs_server usage
Add a tracepoint (afs_server) to track the afs_server object usage count.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:17 +01:00
David Howells
051d25250b afs: Add some callback management tracepoints
Add a couple of tracepoints to track callback management:

 (1) afs_cb_miss - Logs when we were unable to apply a callback, either due
     to the inode being discarded or due to a competing thread applying a
     callback first.

 (2) afs_cb_break - Logs when we attempted to clear the noted callback
     promise, either due to the server explicitly breaking the callback,
     the callback promise lapsing or a local event obsoleting it.

Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:16 +01:00
David Howells
fa59f52f5b afs: afs_unlink() doesn't need to check dentry->d_inode
Don't check that dentry->d_inode is valid in afs_unlink().  We should be
able to take that as given.

This caused Smatch to issue the following warning:

	fs/afs/dir.c:1392 afs_unlink() error: we previously assumed 'vnode' could be null (see line 1375)

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:16 +01:00
David Howells
2cd42d19cf afs: Fix setting of i_blocks
The setting of i_blocks, which is calculated from i_size, has got
accidentally misordered relative to the setting of i_size when initially
setting up an inode.  Further, i_blocks isn't updated by afs_apply_status()
when the size is updated.

To fix this, break the i_size/i_blocks setting out into a helper function
and call it from both places.

Fixes: a58823ac45 ("afs: Fix application of status and callback to be under same lock")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 18:12:02 +01:00
Linus Torvalds
41a247d896 for-linus-20190620
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl0LUG0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplqNEADDmx5+r01qqeVKHbcKPFdd1BZMxVhun0cI
 u1kaeQijGCuYeWACuwXAuMRovEjr/lz9ClJVAKqT+e+wKtbEnRzT1fgG2elYU/ta
 gSAFzqbQOidY+r4oF+xsqJLduOlFtNbiPtyFWzBf/FHe53FS3OT017FJ+SaIE4eD
 ljzo4QD2Sv3/c3CGbbCZUGdIMd4c/7qwU+dHeoVDOG3o8FAYCwewA/XCJQ9VZXgW
 38bpRPvEZ9nvXP00C5Khzsqyxo3P+A2qk1+z3Bx4d8Dw64+jUVoYNdws8qr13MZu
 +EwHy91cvBCF1mzu0+X3irDh+Di+uuzvQ0Nfd7E1xkTNUKSc7ql7XpYoAyF3D7E3
 /4M864cFcaXq6RVY25uq92vUPk4bKsugR19zmKe8PYKrhG0NhRncJNSXNV1coyhD
 Nfu4EKybTwBcdJO8hvs8moAjLPLPtcWopLrHq9CoCqTC8RAIG1IT8OWfaqQuEBCn
 RlzaCuAHP2QBdkZ/69BK48/OSSqhnsQF200pRDA+3NJnoX5UIcqdFwXNu0GUaqzg
 nmqWiNorIvKKmWTWDi8LgnqYM1WU6K30ix1yG848e9Clkw/pwKPLc0FuDBiynIqy
 GD0FZK4v4z8gz0GeASJqqefI63DnT8CeQvuCuRDQoyLl46ND7kXvI04Q3QvTXpUD
 I6wfzGZJvQ==
 =rUI5
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190620' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Three fixes that should go into this series.

  One is a set of two patches from Christoph, fixing a page leak on same
  page merges. Boiled down version of a bigger fix, but this one is more
  appropriate for this late in the cycle (and easier to backport to
  stable).

  The last patch is for a divide error in MD, from Mariusz (via Song)"

* tag 'for-linus-20190620' of git://git.kernel.dk/linux-block:
  md: fix for divide error in status_resync
  block: fix page leak when merging to same page
  block: return from __bio_try_merge_page if merging occured in the same page
2019-06-20 09:58:35 -07:00
David Howells
90fa9b6452 afs: Fix uninitialised spinlock afs_volume::cb_break_lock
Fix the cb_break_lock spinlock in afs_volume struct by initialising it when
the volume record is allocated.

Also rename the lock to cb_v_break_lock to distinguish it from the lock of
the same name in the afs_server struct.

Without this, the following trace may be observed when a volume-break
callback is received:

  INFO: trying to register non-static key.
  the code is fine but needs lockdep annotation.
  turning off the locking correctness validator.
  CPU: 2 PID: 50 Comm: kworker/2:1 Not tainted 5.2.0-rc1-fscache+ #3045
  Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
  Workqueue: afs SRXAFSCB_CallBack
  Call Trace:
   dump_stack+0x67/0x8e
   register_lock_class+0x23b/0x421
   ? check_usage_forwards+0x13c/0x13c
   __lock_acquire+0x89/0xf73
   lock_acquire+0x13b/0x166
   ? afs_break_callbacks+0x1b2/0x3dd
   _raw_write_lock+0x2c/0x36
   ? afs_break_callbacks+0x1b2/0x3dd
   afs_break_callbacks+0x1b2/0x3dd
   ? trace_event_raw_event_afs_server+0x61/0xac
   SRXAFSCB_CallBack+0x11f/0x16c
   process_one_work+0x2c5/0x4ee
   ? worker_thread+0x234/0x2ac
   worker_thread+0x1d8/0x2ac
   ? cancel_delayed_work_sync+0xf/0xf
   kthread+0x11f/0x127
   ? kthread_park+0x76/0x76
   ret_from_fork+0x24/0x30

Fixes: 68251f0a68 ("afs: Fix whole-volume callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 16:49:35 +01:00
David Howells
a6853b9ce8 afs: Fix vlserver record corruption
Because I made the afs_call struct share pointers to an afs_server object
and an afs_vlserver object to save space, afs_put_call() calls
afs_put_server() on afs_vlserver object (which is only meant for the
afs_server object) because it sees that call->server isn't NULL.

This means that the afs_vlserver object gets unpredictably and randomly
modified, depending on what config options are set (such as lockdep).

Fix this by getting rid of the union and having two non-overlapping
pointers in the afs_call struct.

Fixes: ffba718e93 ("afs: Get rid of afs_call::reply[]")
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 16:49:35 +01:00
David Howells
3647e42b55 afs: Fix over zealous "vnode modified" warnings
Occasionally, warnings like this:

	vnode modified 2af7 on {10000b:1} [exp 2af2] YFS.FetchStatus(vnode)

are emitted into the kernel log.  This indicates that when we were applying
the updated vnode (file) status retrieved from the server to an inode we
saw that the data version number wasn't what we were expecting (in this
case it's 0x2af7 rather than 0x2af2).

We've usually received a callback from the server prior to this point - or
the callback promise has lapsed - so the warning is merely informative and
the state is to be expected.

Fix this by only emitting the warning if the we still think that we have a
valid callback promise and haven't received a callback.

Also change the format slightly so so that the new data version doesn't
look like part of the text, the like is prefixed with "kAFS: " and the
message is ranked as a warning.

Fixes: 31143d5d51 ("AFS: implement basic file write support")
Reported-by: Ian Wienand <iwienand@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
2019-06-20 16:49:34 +01:00
Amir Goldstein
7377f5bec1 fsnotify: get rid of fsnotify_nameremove()
For all callers of fsnotify_{unlink,rmdir}(), we made sure that d_parent
and d_name are stable.  Therefore, fsnotify_{unlink,rmdir}() do not need
the safety measures in fsnotify_nameremove() to stabilize parent and name.
We can now simplify those hooks and get rid of fsnotify_nameremove().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-20 14:47:54 +02:00
Amir Goldstein
49246466a9 fsnotify: move fsnotify_nameremove() hook out of d_delete()
d_delete() was piggy backed for the fsnotify_nameremove() hook when
in fact not all callers of d_delete() care about fsnotify events.

For all callers of d_delete() that may be interested in fsnotify events,
we made sure to call one of fsnotify_{unlink,rmdir}() hooks before
calling d_delete().

Now we can move the fsnotify_nameremove() call from d_delete() to the
fsnotify_{unlink,rmdir}() hooks.

Two explicit calls to fsnotify_nameremove() from nfs/afs sillyrename
are also removed. This will cause a change of behavior - nfs/afs will
NOT generate an fsnotify delete event when renaming over a positive
dentry.  This change is desirable, because it is consistent with the
behavior of all other filesystems.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-20 14:47:44 +02:00
Amir Goldstein
6146e78c03 configfs: call fsnotify_rmdir() hook
This will allow generating fsnotify delete events on unregister
of group/subsystem after the fsnotify_nameremove() hook is removed
from d_delete().

The rest of the d_delete() calls from this filesystem are either
called recursively from within debugfs_unregister_{group,subsystem},
called from a vfs function that already has delete hooks or are
called from shutdown/cleanup code.

Cc: Joel Becker <jlbec@evilplan.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-20 14:47:21 +02:00
Amir Goldstein
6679ea6dea debugfs: call fsnotify_{unlink,rmdir}() hooks
This will allow generating fsnotify delete events after the
fsnotify_nameremove() hook is removed from d_delete().

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-20 14:47:09 +02:00
Amir Goldstein
823e545c02 debugfs: simplify __debugfs_remove_file()
Move simple_unlink()+d_delete() from __debugfs_remove_file() into
caller __debugfs_remove() and rename helper for post remove file to
__debugfs_file_removed().

This will simplify adding fsnotify_unlink() hook.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-20 14:46:42 +02:00
Amir Goldstein
fd0d506f2b devpts: call fsnotify_unlink() hook
This will allow generating fsnotify delete events after the
fsnotify_nameremove() hook is removed from d_delete().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-20 14:46:34 +02:00
Amir Goldstein
4bf2377472 tracefs: call fsnotify_{unlink,rmdir}() hooks
This will allow generating fsnotify delete events after the
fsnotify_nameremove() hook is removed from d_delete().

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-20 14:46:31 +02:00
Amir Goldstein
46008d9d3f btrfs: call fsnotify_rmdir() hook
This will allow generating fsnotify delete events after the
fsnotify_nameremove() hook is removed from d_delete().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-20 14:45:04 +02:00
Amir Goldstein
116b9731ad fsnotify: add empty fsnotify_{unlink,rmdir}() hooks
We would like to move fsnotify_nameremove() calls from d_delete()
into a higher layer where the hook makes more sense and so we can
consider every d_delete() call site individually.

Start by creating empty hook fsnotify_{unlink,rmdir}() and place
them in the proper VFS call sites.  After all d_delete() call sites
will be converted to use the new hook, the new hook will generate the
delete events and fsnotify_nameremove() hook will be removed.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-20 14:44:55 +02:00
Lianbo Jiang
4eb5fec31e fs/proc/vmcore: Enable dumping of encrypted memory when SEV was active
In the kdump kernel, the memory of the first kernel gets to be dumped
into a vmcore file.

Similarly to SME kdump, if SEV was enabled in the first kernel, the old
memory has to be remapped encrypted in order to access it properly.

Commit

  992b649a3f ("kdump, proc/vmcore: Enable kdumping encrypted memory with SME enabled")

took care of the SME case but it uses sme_active() which checks for SME
only. Use mem_encrypt_active() instead, which returns true when either
SME or SEV is active.

Unlike SME, the second kernel images (kernel and initrd) are loaded into
encrypted memory when SEV is active, hence the kernel elf header must be
remapped as encrypted in order to access it properly.

 [ bp: Massage commit message. ]

Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Lianbo Jiang <lijiang@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: bhe@redhat.com
Cc: dyoung@redhat.com
Cc: Ganesh Goudar <ganeshgr@chelsio.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: kexec@lists.infradead.org
Cc: linux-fsdevel@vger.kernel.org
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: mingo@redhat.com
Cc: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190430074421.7852-4-lijiang@redhat.com
2019-06-20 10:07:49 +02:00
Ard Biesheuvel
97a5fee2bd fs: cifs: switch to RC4 library interface
The CIFS code uses the sync skcipher API to invoke the ecb(arc4) skcipher,
of which only a single generic C code implementation exists. This means
that going through all the trouble of using scatterlists etc buys us
very little, and we're better off just invoking the arc4 library directly.

This also reverts commit 5f4b55699a ("CIFS: Fix BUG() in calc_seckey()"),
since it is no longer necessary to allocate sec_key on the heap.

Cc: linux-cifs@vger.kernel.org
Cc: Steve French <sfrench@samba.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-06-20 14:19:55 +08:00
Colin Ian King
c708b1c6de ext4: remove redundant assignment to node
Pointer 'node' is assigned a value that is never read, node is
later overwritten when it re-assigned a different value inside
the while-loop.  The assignment is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2019-06-20 00:10:10 -04:00
Gabriel Krisman Bertazi
3ae72562ad ext4: optimize case-insensitive lookups
Temporarily cache a casefolded version of the file name under lookup in
ext4_filename, to avoid repeatedly casefolding it.  I got up to 30%
speedup on lookups of large directories (>100k entries), depending on
the length of the string under lookup.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-06-19 23:45:09 -04:00
zhangjs
b03755ad6f ext4: make __ext4_get_inode_loc plug
Add a blk_plug to prevent the inode table readahead from being
submitted as small I/O requests.

Signed-off-by: zhangjs <zachary@baishancloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2019-06-19 23:41:29 -04:00
Theodore Ts'o
c60990b361 ext4: clean up kerneldoc warnigns when building with W=1
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2019-06-19 16:30:03 -04:00
Jan Kara
936bbf3aea ext2: Always brelse bh on failure in ext2_iget()
All but one bail out paths in ext2_iget() is releasing bh. Move the
releasing of bh into a common error handling code.

Signed-off-by: Jan Kara <jack@suse.cz>
2019-06-19 18:29:45 +02:00