Commit Graph

1271 Commits

Author SHA1 Message Date
Christoph Hellwig
1a1c57b282 xfs: remove the special COW fork handling in xfs_bmapi_read
We don't call xfs_bmapi_read for the COW fork anymore, so remove the
special casing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig
0f45a1b20c xfs: improve local fork verification
Call the data/attr local fork verifiers as soon as we are ready for them.
This keeps them close to the code setting up the forks, and avoids a
few branches later on.  Also open code xfs_inode_verify_forks in the
only remaining caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:58 -07:00
Christoph Hellwig
7c7ba21863 xfs: refactor xfs_inode_verify_forks
The split between xfs_inode_verify_forks and the two helpers
implementing the actual functionality is a little strange.  Reshuffle
it so that xfs_inode_verify_forks verifies if the data and attr forks
are actually in local format and only call the low-level helpers if
that is the case.  Handle the actual error reporting in the low-level
handlers to streamline the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig
1934c8bd81 xfs: remove xfs_ifork_ops
xfs_ifork_ops add up to two indirect calls per inode read and flush,
despite just having a single instance in the kernel.  In xfsprogs
phase6 in xfs_repair overrides the verify_dir method to deal with inodes
that do not have a valid parent, but that can be fixed pretty easily
by ensuring they always have a valid looking parent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig
bb8a66af4f xfs: remove xfs_iread
There is not much point in the xfs_iread function, as it has a single
caller and not a whole lot of code.  Move it into the only caller,
and trim down the overdocumentation to just documenting the important
"why" instead of a lot of redundant "what".

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig
7f02901235 xfs: don't reset i_delayed_blks in xfs_iread
i_delayed_blks is set to 0 in xfs_inode_alloc and can't have anything
assigned to it until the inode is visible to the VFS.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig
2d6051d496 xfs: call xfs_dinode_verify from xfs_inode_from_disk
Keep the code dealing with the dinode together, and also ensure we verify
the dinode in the owner change log recovery case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig
0bce8173fd xfs: handle unallocated inodes in xfs_inode_from_disk
Handle inodes with a 0 di_mode in xfs_inode_from_disk, instead of partially
duplicating inode reading in xfs_iread.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig
9229d18e80 xfs: split xfs_iformat_fork
xfs_iformat_fork is a weird catchall.  Split it into one helper for
the data fork and one for the attr fork, and then call both helper
as well as the COW fork initialization from xfs_inode_from_disk.  Order
the COW fork initialization after the attr fork initialization given
that it can't fail to simplify the error handling.

Note that the newly split helpers are moved down the file in
xfs_inode_fork.c to avoid the need for forward declarations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig
cb7d585944 xfs: call xfs_iformat_fork from xfs_inode_from_disk
We always need to fill out the fork structures when reading the inode,
so call xfs_iformat_fork from the tail of xfs_inode_from_disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Christoph Hellwig
b90c2a9c8b xfs: xfs_bmapi_read doesn't take a fork id as the last argument
The last argument to xfs_bmapi_raad contains XFS_BMAPI_* flags, not the
fork.  Given that XFS_DATA_FORK evaluates to 0 no real harm is done,
but let's fix this anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:57 -07:00
Kaixu Xia
14506f7a91 xfs: fix the warning message in xfs_validate_sb_common()
Fix this error message to complain about project and group quota flag
bits instead of "PUOTA" and "QUOTA".

Signed-off-by: Kaixu Xia <kaixuxia@tencent.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-19 09:40:56 -07:00
Darrick J. Wong
78bba5c812 xfs: use ordered buffers to initialize dquot buffers during quotacheck
While QAing the new xfs_repair quotacheck code, I uncovered a quota
corruption bug resulting from a bad interaction between dquot buffer
initialization and quotacheck.  The bug can be reproduced with the
following sequence:

# mkfs.xfs -f /dev/sdf
# mount /dev/sdf /opt -o usrquota
# su nobody -s /bin/bash -c 'touch /opt/barf'
# sync
# xfs_quota -x -c 'report -ahi' /opt
User quota on /opt (/dev/sdf)
                        Inodes
User ID      Used   Soft   Hard Warn/Grace
---------- ---------------------------------
root            3      0      0  00 [------]
nobody          1      0      0  00 [------]

# xfs_io -x -c 'shutdown' /opt
# umount /opt
# mount /dev/sdf /opt -o usrquota
# touch /opt/man2
# xfs_quota -x -c 'report -ahi' /opt
User quota on /opt (/dev/sdf)
                        Inodes
User ID      Used   Soft   Hard Warn/Grace
---------- ---------------------------------
root            1      0      0  00 [------]
nobody          1      0      0  00 [------]

# umount /opt

Notice how the initial quotacheck set the root dquot icount to 3
(rootino, rbmino, rsumino), but after shutdown -> remount -> recovery,
xfs_quota reports that the root dquot has only 1 icount.  We haven't
deleted anything from the filesystem, which means that quota is now
under-counting.  This behavior is not limited to icount or the root
dquot, but this is the shortest reproducer.

I traced the cause of this discrepancy to the way that we handle ondisk
dquot updates during quotacheck vs. regular fs activity.  Normally, when
we allocate a disk block for a dquot, we log the buffer as a regular
(dquot) buffer.  Subsequent updates to the dquots backed by that block
are done via separate dquot log item updates, which means that they
depend on the logged buffer update being written to disk before the
dquot items.  Because individual dquots have their own LSN fields, that
initial dquot buffer must always be recovered.

However, the story changes for quotacheck, which can cause dquot block
allocations but persists the final dquot counter values via a delwri
list.  Because recovery doesn't gate dquot buffer replay on an LSN, this
means that the initial dquot buffer can be replayed over the (newer)
contents that were delwritten at the end of quotacheck.  In effect, this
re-initializes the dquot counters after they've been updated.  If the
log does not contain any other dquot items to recover, the obsolete
dquot contents will not be corrected by log recovery.

Because quotacheck uses a transaction to log the setting of the CHKD
flags in the superblock, we skip quotacheck during the second mount
call, which allows the incorrect icount to remain.

Fix this by changing the ondisk dquot initialization function to use
ordered buffers to write out fresh dquot blocks if it detects that we're
running quotacheck.  If the system goes down before quotacheck can
complete, the CHKD flags will not be set in the superblock and the next
mount will run quotacheck again, which can fix uninitialized dquot
buffers.  This requires amending the defer code to maintaine ordered
buffer state across defer rolls for the sake of the dquot allocation
code.

For regular operations we preserve the current behavior since the dquot
items require properly initialized ondisk dquot records.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-19 09:40:56 -07:00
Brian Foster
f28cef9e4d xfs: don't fail verifier on empty attr3 leaf block
The attr fork can transition from shortform to leaf format while
empty if the first xattr doesn't fit in shortform. While this empty
leaf block state is intended to be transient, it is technically not
due to the transactional implementation of the xattr set operation.

We historically have a couple of bandaids to work around this
problem. The first is to hold the buffer after the format conversion
to prevent premature writeback of the empty leaf buffer and the
second is to bypass the xattr count check in the verifier during
recovery. The latter assumes that the xattr set is also in the log
and will be recovered into the buffer soon after the empty leaf
buffer is reconstructed. This is not guaranteed, however.

If the filesystem crashes after the format conversion but before the
xattr set that induced it, only the format conversion may exist in
the log. When recovered, this creates a latent corrupted state on
the inode as any subsequent attempts to read the buffer fail due to
verifier failure. This includes further attempts to set xattrs on
the inode or attempts to destroy the attr fork, which prevents the
inode from ever being removed from the unlinked list.

To avoid this condition, accept that an empty attr leaf block is a
valid state and remove the count check from the verifier. This means
that on rare occasions an attr fork might exist in an unexpected
state, but is otherwise consistent and functional. Note that we
retain the logic to avoid racing with metadata writeback to reduce
the window where this can occur.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-19 09:05:24 -07:00
Nishad Kamdar
508578f2f5 xfs: Use the correct style for SPDX License Identifier
This patch corrects the SPDX License Identifier style in header files
related to XFS File System support. For C header files
Documentation/process/license-rules.rst mandates C-like comments.
(opposed to C source files where C++ style should be used).

Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-13 15:32:45 -07:00
Gustavo A. R. Silva
ee4064e56c xfs: Replace zero-length array with flexible-array
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-13 15:32:45 -07:00
Darrick J. Wong
17d29bf271 xfs: move log recovery buffer cancellation code to xfs_buf_item_recover.c
Move the helpers that handle incore buffer cancellation records to
xfs_buf_item_recover.c since they're not directly related to the main
log recovery machinery.  No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08 08:50:01 -07:00
Darrick J. Wong
154c733a33 xfs: refactor releasing finished intents during log recovery
Replace the open-coded AIL item walking with a proper helper when we're
trying to release an intent item that has been finished.  We add a new
->iop_match method to decide if an intent item matches a supplied ID.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08 08:50:00 -07:00
Darrick J. Wong
1094d3f123 xfs: refactor log recovery buffer item dispatch for pass2 commit functions
Move the log buffer item pass2 commit code into the per-item source code
files and use the dispatch function to call it.  We do these one at a
time because there's a lot of code to move.  No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08 08:49:58 -07:00
Darrick J. Wong
3304a4fabd xfs: refactor log recovery item dispatch for pass1 commit functions
Move the pass1 commit code into the per-item source code files and use
the dispatch function to call them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08 08:49:58 -07:00
Darrick J. Wong
8ea5682d07 xfs: refactor log recovery item dispatch for pass2 readhead functions
Move the pass2 readhead code into the per-item source code files and use
the dispatch function to call them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08 08:49:58 -07:00
Darrick J. Wong
86ffa471d9 xfs: refactor log recovery item sorting into a generic dispatch structure
Create a generic dispatch structure to delegate recovery of different
log item types into various code modules.  This will enable us to move
code specific to a particular log item type out of xfs_log_recover.c and
into the log item source.

The first operation we virtualize is the log item sorting.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08 08:49:58 -07:00
Darrick J. Wong
35f4521fd3 xfs: convert xfs_log_recover_item_t to struct xfs_log_recover_item
Remove the old typedefs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-05-08 08:49:58 -07:00
Brian Foster
c199507993 xfs: remove unused iget_flags param from xfs_imap_to_bp()
iget_flags is unused in xfs_imap_to_bp(). Remove the parameter and
fix up the callers.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-07 08:27:49 -07:00
Brian Foster
7376d74547 xfs: random buffer write failure errortag
Introduce an error tag to randomly fail async buffer writes. This is
primarily to facilitate testing of the XFS error configuration
mechanism.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-07 08:27:48 -07:00
Brian Foster
15fab3b9be xfs: remove unnecessary shutdown check from xfs_iflush()
The shutdown check in xfs_iflush() duplicates checks down in the
buffer code. If the fs is shut down, xfs_trans_read_buf_map() always
returns an error and falls into the same error path. Remove the
unnecessary check along with the warning in xfs_imap_to_bp()
that generates excessive noise in the log if the fs is shut down.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-07 08:27:46 -07:00
Christoph Hellwig
2f88f1efd0 xfs: spell out the parameter name for ->cancel_item
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:17 -07:00
Christoph Hellwig
3ec1b26c04 xfs: use a xfs_btree_cur for the ->finish_cleanup state
Given how XFS is all based around btrees it doesn't make much sense
to offer a totally generic state when we can just use the btree cursor.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:17 -07:00
Christoph Hellwig
f09d167c20 xfs: turn dfp_done into a xfs_log_item
All defer op instance place their own extension of the log item into
the dfp_done field.  Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.

Also use the opportunity to improve the ->finish_item calling conventions
to place the done log item as the higher level structure before the
list_entry used for the individual items.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:17 -07:00
Christoph Hellwig
bb47d79750 xfs: refactor xfs_defer_finish_noroll
Split out a helper that operates on a single xfs_defer_pending structure
to untangle the code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:17 -07:00
Christoph Hellwig
13a8333339 xfs: turn dfp_intent into a xfs_log_item
All defer op instance place their own extension of the log item into
the dfp_intent field.  Replace that with a xfs_log_item to improve type
safety and make the code easier to follow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
d367a868e4 xfs: merge the ->diff_items defer op into ->create_intent
This avoids a per-item indirect call, and also simplifies the interface
a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
c1f09188e8 xfs: merge the ->log_item defer op into ->create_intent
These are aways called together, and my merging them we reduce the amount
of indirect calls, improve type safety and in general clean up the code
a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
e046e94948 xfs: factor out a xfs_defer_create_intent helper
Create a helper that encapsulates the whole logic to create a defer
intent.  This reorders some of the work that was done, but none of
that has an affect on the operation as only fields that don't directly
interact are affected.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Christoph Hellwig
fd9cbe5121 xfs: remove the xfs_inode_log_item_t typedef
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-05-04 09:03:16 -07:00
Darrick J. Wong
f8e566c0f5 xfs: validate the realtime geometry in xfs_validate_sb_common
Validate the geometry of the realtime geometry when we mount the
filesystem, so that we don't abruptly shut down the filesystem later on.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-27 08:32:53 -07:00
Christoph Hellwig
6471e9c5e7 xfs: remove the di_version field from struct icdinode
We know the version is 3 if on a v5 file system.   For earlier file
systems formats we always upgrade the remaining v1 inodes to v2 and
thus only use v2 inodes.  Use the xfs_sb_version_has_large_dinode
helper to check if we deal with small or large dinodes, and thus
remove the need for the di_version field in struct icdinode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19 08:48:47 -07:00
Christoph Hellwig
e9e2eae89d xfs: only check the superblock version for dinode size calculation
The size of the dinode structure is only dependent on the file system
version, so instead of checking the individual inode version just use
the newly added xfs_sb_version_has_large_dinode helper, and simplify
various calling conventions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19 08:48:47 -07:00
Christoph Hellwig
b81b79f4ed xfs: add a new xfs_sb_version_has_v3inode helper
Add a new wrapper to check if a file system supports the v3 inode format
with a larger dinode core.  Previously we used xfs_sb_version_hascrc for
that, which is technically correct but a little confusing to read.

Also move xfs_dinode_good_version next to xfs_sb_version_has_v3inode
so that we have one place that documents the superblock version to
inode version relationship.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-19 08:47:34 -07:00
Darrick J. Wong
59d677127c xfs: add support for rmap btree staging cursors
Add support for btree staging cursors for the rmap btrees.  This is
needed both for online repair and also to convert xfs_repair to use
btree bulk loading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18 08:12:23 -07:00
Darrick J. Wong
56e98164ff xfs: add support for refcount btree staging cursors
Add support for btree staging cursors for the refcount btrees.  This
is needed both for online repair and also to convert xfs_repair to use
btree bulk loading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18 08:12:23 -07:00
Darrick J. Wong
c29ce8f48e xfs: add support for inode btree staging cursors
Add support for btree staging cursors for the inode btrees.  This
is needed both for online repair and also to convert xfs_repair to use
btree bulk loading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18 08:12:23 -07:00
Darrick J. Wong
e6eb33d905 xfs: add support for free space btree staging cursors
Add support for btree staging cursors for the free space btrees.  This
is needed both for online repair and also to convert xfs_repair to use
btree bulk loading.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18 08:12:23 -07:00
Darrick J. Wong
60e3d70707 xfs: support bulk loading of staged btrees
Add a new btree function that enables us to bulk load a btree cursor.
This will be used by the upcoming online repair patches to generate new
btrees.  This avoids the programmatic inefficiency of calling
xfs_btree_insert in a loop (which generates a lot of log traffic) in
favor of stamping out new btree blocks with ordered buffers, and then
committing both the new root and scheduling the removal of the old btree
blocks in a single transaction commit.

The design of this new generic code is based off the btree rebuilding
code in xfs_repair's phase 5 code, with the explicit goal of enabling us
to share that code between scrub and repair.  It has the additional
feature of being able to control btree block loading factors.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18 08:12:23 -07:00
Darrick J. Wong
349e1c0380 xfs: introduce fake roots for inode-rooted btrees
Create an in-core fake root for inode-rooted btree types so that callers
can generate a whole new btree using the upcoming btree bulk load
function without making the new tree accessible from the rest of the
filesystem.  It is up to the individual btree type to provide a function
to create a staged cursor (presumably with the appropriate callouts to
update the fakeroot) and then commit the staged root back into the
filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18 08:12:23 -07:00
Darrick J. Wong
e06536a692 xfs: introduce fake roots for ag-rooted btrees
Create an in-core fake root for AG-rooted btree types so that callers
can generate a whole new btree using the upcoming btree bulk load
function without making the new tree accessible from the rest of the
filesystem.  It is up to the individual btree type to provide a function
to create a staged cursor (presumably with the appropriate callouts to
update the fakeroot) and then commit the staged root back into the
filesystem.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-18 08:12:23 -07:00
Darrick J. Wong
77ca1eed5a xfs: fix incorrect test in xfs_alloc_ag_vextent_lastblock
When I lifted the code in xfs_alloc_ag_vextent_lastblock out of a loop,
I forgot to convert all the accesses to len to be pointer dereferences.

Coverity-id: 1457918
Fixes: 5113f8ec37 ("xfs: clean up weird while loop in xfs_alloc_ag_vextent_near")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-03-17 07:32:46 -07:00
Darrick J. Wong
faf8ee8476 xfs: xfs_dabuf_map should return ENOMEM when map allocation fails
If the xfs_buf_map array allocation in xfs_dabuf_map fails for whatever
reason, we bail out with error code zero.  This will confuse callers, so
make sure that we return ENOMEM.  Allocation failure should never happen
with the small size of the array, but code defensively anyway.

Fixes: 45feef8f50 ("xfs: refactor xfs_dabuf_map")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
2020-03-15 09:22:35 -07:00
Dave Chinner
c4aa10d041 xfs: make the btree ag cursor private union anonymous
This is much less widely used than the bc_private union was, so this
is done as a single patch. The named union xfs_btree_cur_private
goes away and is embedded into the struct xfs_btree_cur_ag as an
anonymous union, and the code is modified via this script:

$ sed -i 's/priv\.\([abt|refc]\)/\1/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:15 -07:00
Dave Chinner
68422d90da xfs: make the btree cursor union members named structure
we need to name the btree cursor private structures to be able
to pull them out of the deeply nested structure definition they are
in now.

Based on code extracted from a patchset by Darrick Wong.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00
Dave Chinner
352890735e xfs: make btree cursor private union anonymous
Rename the union and it's internal structures to the new name and
remove the temporary defines that facilitated the change.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00
Dave Chinner
8ef547976a xfs: rename btree cursor private btree member flags
BPRV is not longer appropriate because bc_private is going away.
Script:

$ sed -i 's/BTCUR_BPRV/BTCUR_BMBT/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

With manual cleanup to the definitions in fs/xfs/libxfs/xfs_btree.h

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change "BC_BT" to "BTCUR_BMBT", fix subject line typo]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00
Dave Chinner
92219c292a xfs: convert btree cursor inode-private member names
bc_private.b -> bc_ino conversion via script:

$ sed -i 's/bc_private\.b/bc_ino/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

And then revert the change to the bc_ino #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: tweak the subject line slightly]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00
Dave Chinner
576af73228 xfs: convert btree cursor ag-private member name
bc_private.a -> bc_ag conversion via script:

`sed -i 's/bc_private\.a/bc_ag/g' fs/xfs/*[ch] fs/xfs/*/*[ch]`

And then revert the change to the bc_ag #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:14 -07:00
Dave Chinner
7cace18ab5 xfs: introduce new private btree cursor names
Just the defines of the new names - the conversion will be in
scripted commits after this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change "bc_bt" to "bc_ino"]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2020-03-13 10:37:13 -07:00
Darrick J. Wong
1b2c1a63b6 xfs: check owner of dir3 blocks
Check the owner field of dir3 block headers.  If it's corrupt, release
the buffer and return EFSCORRUPTED.  All callers handle this properly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-12 07:58:13 -07:00
Darrick J. Wong
a10c21ed5d xfs: check owner of dir3 data blocks
Check the owner field of dir3 data block headers.  If it's corrupt,
release the buffer and return EFSCORRUPTED.  All callers handle this
properly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-12 07:58:12 -07:00
Darrick J. Wong
6fb5aac733 xfs: check owner of dir3 free blocks
Check the owner field of dir3 free block headers and reject the metadata
if there's something wrong with it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-12 07:58:12 -07:00
Darrick J. Wong
1cb5deb5bc xfs: don't ever return a stale pointer from __xfs_dir3_free_read
If we decide that a directory free block is corrupt, we must take care
not to leak a buffer pointer to the caller.  After xfs_trans_brelse
returns, the buffer can be freed or reused, which means that we have to
set *bpp back to NULL.

Callers are supposed to notice the nonzero return value and not use the
buffer pointer, but we should code more defensively, even if all current
callers handle this situation correctly.

Fixes: de14c5f541 ("xfs: verify free block header fields")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-12 07:58:12 -07:00
Darrick J. Wong
ce99494c96 xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
xfs_verifier_error is supposed to be called on a corrupt metadata buffer
from within a buffer verifier function, whereas xfs_buf_mark_corrupt
is the function to be called when a piece of code has read a buffer and
catches something that a read verifier cannot.  The first function sets
b_error anticipating that the low level buffer handling code will see
the nonzero b_error and clear XBF_DONE on the buffer, whereas the second
function does not.

Since xfs_dir3_free_header_check examines fields in the dir free block
header that require more context than can be provided to read verifiers,
we must call xfs_buf_mark_corrupt when it finds a problem.

Switching the calls has a secondary effect that we no longer corrupt the
buffer state by setting b_error and leaving XBF_DONE set.  When /that/
happens, we'll trip over various state assertions (most commonly the
b_error check in xfs_buf_reverify) on a subsequent attempt to read the
buffer.

Fixes: bc1a09b8e3 ("xfs: refactor verifier callers to print address of failing check")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-12 07:58:12 -07:00
Darrick J. Wong
8d57c21600 xfs: add a function to deal with corrupt buffers post-verifiers
Add a helper function to get rid of buffers that we have decided are
corrupt after the verifiers have run.  This function is intended to
handle metadata checks that can't happen in the verifiers, such as
inter-block relationship checking.  Note that we now mark the buffer
stale so that it will not end up on any LRU and will be purged on
release.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-03-12 07:58:12 -07:00
Darrick J. Wong
a71e4228e6 xfs: fix xfs_rmap_has_other_keys usage of ECANCELED
In e7ee96dfb8, we converted all ITER_ABORT users to use ECANCELED
instead, but we forgot to teach xfs_rmap_has_other_keys not to return
that magic value to callers.  Fix it now by using ECANCELED both to
abort the iteration and to signal that we found another reverse mapping.
This enables us to drop the separate boolean flag.

Fixes: e7ee96dfb8 ("xfs: remove all *_ITER_ABORT values")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-03-12 07:58:12 -07:00
Christoph Hellwig
3e6e8afd3a xfs: remove XFS_BUF_TO_SBP
Just dereference bp->b_addr directly and make the code a little
simpler and more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:39 -07:00
Christoph Hellwig
9798f615ad xfs: remove XFS_BUF_TO_AGF
Just dereference bp->b_addr directly and make the code a little
simpler and more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:39 -07:00
Christoph Hellwig
370c782b98 xfs: remove XFS_BUF_TO_AGI
Just dereference bp->b_addr directly and make the code a little
simpler and more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:38 -07:00
Christoph Hellwig
4b97510859 xfs: remove the xfs_agfl_t typedef
There is just a single user left, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:38 -07:00
Christoph Hellwig
183606d824 xfs: remove the agfl_bno member from struct xfs_agfl
struct xfs_agfl is a header in front of the AGFL entries that exists
for CRC enabled file systems.  For not CRC enabled file systems the AGFL
is simply a list of agbno.  Make the CRC case similar to that by just
using the list behind the new header.  This indirectly solves a problem
with modern gcc versions that warn about taking addresses of packed
structures (and we have to pack the AGFL given that gcc rounds up
structure sizes).  Also replace the helper macro to get from a buffer
with an inline function in xfs_alloc.h to make the code easier to
read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-11 09:11:38 -07:00
Christoph Hellwig
e3a19cdea8 xfs: embedded the attrlist cursor into struct xfs_attr_list_context
The attrlist cursor only exists as part of an attr list context, so
embedd the structure instead of pointing to it.  Also give it a proper
xfs_ prefix and remove the obsolete typedef.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:55 -08:00
Christoph Hellwig
254f800f81 xfs: remove XFS_DA_OP_INCOMPLETE
Now that we use the on-disk flags field also for the interface to the
lower level attr routines we can use the XFS_ATTR_INCOMPLETE definition
from the on-disk format directly instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:55 -08:00
Christoph Hellwig
d5f0f49a9b xfs: clean up the attr flag confusion
The ATTR_* flags have a long IRIX history, where they a userspace
interface, the on-disk format and an internal interface.  We've split
out the on-disk interface to the XFS_ATTR_* values, but despite (or
because?) of that the flag have still been a mess.  Switch the
internal interface to pass the on-disk XFS_ATTR_* flags for the
namespace and the Linux XATTR_* flags for the actual flags instead.
The ATTR_* values that are actually used are move to xfs_fs.h with a
new XFS_IOC_* prefix to not conflict with the userspace version that
has the same name and must have the same value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:55 -08:00
Christoph Hellwig
f3e93d95fe xfs: clean up the ATTR_REPLACE checks
Remove superflous braces, elses after return statements and use a goto
label to merge common error handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:55 -08:00
Christoph Hellwig
17e1dd83ea xfs: rename xfs_attr_list_int to xfs_attr_list
The version taking the context structure is the main interface to list
attributes, so drop the _int postfix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:54 -08:00
Christoph Hellwig
3e7a779937 xfs: move the legacy xfs_attr_list to xfs_ioctl.c
The old xfs_attr_list code is only used by the attrlist by handle
ioctl.  Move it to xfs_ioctl.c with its user.  Also move the
attrlist and attrlist_ent structure to xfs_fs.h, as they are exposed
user ABIs.  They are used through libattr headers with the same name
by at least xfsdump.  Also document this relation so that it doesn't
require a research project to figure out.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:54 -08:00
Christoph Hellwig
fe96008712 xfs: remove the unused ATTR_ENTRY macro
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:54 -08:00
Christoph Hellwig
a9c8c69b49 xfs: cleanup struct xfs_attr_list_context
Replace the alist char pointer with a void buffer given that different
callers use it in different ways.  Use the chance to remove the typedef
and reduce the indentation of the struct definition so that it doesn't
overflow 80 char lines all over.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:54 -08:00
Christoph Hellwig
377f16ac67 xfs: factor out a xfs_attr_match helper
Factor out a helper that compares an on-disk attr vs the name, length and
flags specified in struct xfs_da_args.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:53 -08:00
Christoph Hellwig
1d73301994 xfs: replace ATTR_KERNOTIME with XFS_DA_OP_NOTIME
op_flags with the XFS_DA_OP_* flags is the usual place for in-kernel
only flags, so move the notime flag there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:53 -08:00
Christoph Hellwig
d49db18b24 xfs: remove ATTR_ALLOC and XFS_DA_OP_ALLOCVAL
Use a NULL args->value as the indicator to lazily allocate a buffer
instead, and let the caller always free args->value instead of
duplicating the cleanup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:53 -08:00
Christoph Hellwig
e513e25c38 xfs: remove ATTR_KERNOVAL
We can just pass down the Linux convention of a zero valuelen to just
query for the existance of an attribute to the low-level code instead.
The use in the legacy xfs_attr_list code only used by the ioctl
interface was already dead code, as the callers check that the flag
is not present.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:53 -08:00
Christoph Hellwig
c36f533f14 xfs: remove the xfs_inode argument to xfs_attr_get_ilocked
The inode can easily be derived from the args structure.  Also
don't bother with else statements after early returns.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:53 -08:00
Christoph Hellwig
e5171d7e98 xfs: pass an initialized xfs_da_args to xfs_attr_get
Instead of converting from one style of arguments to another in
xfs_attr_set, pass the structure from higher up in the call chain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:53 -08:00
Christoph Hellwig
a254462243 xfs: pass an initialized xfs_da_args structure to xfs_attr_set
Instead of converting from one style of arguments to another in
xfs_attr_set, pass the structure from higher up in the call chain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:53 -08:00
Christoph Hellwig
ead189adb8 xfs: turn xfs_da_args.value into a void pointer
The xattr values are blobs and should not be typed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:53 -08:00
Christoph Hellwig
4df28c64e4 xfs: remove the MAXNAMELEN check from xfs_attr_args_init
All the callers already check the length when allocating the
in-kernel xattrs buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:52 -08:00
Christoph Hellwig
79f2280b9b xfs: remove the name == NULL check from xfs_attr_args_init
All callers provide a valid name pointer, remove the redundant check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:52 -08:00
Christoph Hellwig
0eb81a5f5c xfs: merge xfs_attr_remove into xfs_attr_set
The Linux xattr and acl APIs use a single call for set and remove.
Modify the high-level XFS API to match that and let xfs_attr_set handle
removing attributes as well.  With a little bit of reordering this
removes a lot of code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:52 -08:00
Christoph Hellwig
5e81357435 xfs: remove the ATTR_INCOMPLETE flag
Replace the ATTR_INCOMPLETE flag with a new boolean field in struct
xfs_attr_list_context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:52 -08:00
Brian Foster
b73df17e4c xfs: open code insert range extent split helper
The insert range operation currently splits the extent at the target
offset in a separate transaction and lock cycle from the one that
shifts extents. In preparation for reworking insert range into an
atomic operation, lift the code into the caller so it can be easily
condensed to a single rolling transaction and lock cycle and
eliminate the helper. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:51 -08:00
Qian Cai
4982bff1ac xfs: fix an undefined behaviour in _da3_path_shift
In xfs_da3_path_shift() "blk" can be assigned to state->path.blk[-1] if
state->path.active is 1 (which is a valid state) when it tries to add an
entry to a single dir leaf block and then to shift forward to see if
there's a sibling block that would be a better place to put the new
entry. This causes a UBSAN warning given negative array indices are
undefined behavior in C. In practice the warning is entirely harmless
given that "blk" is never dereferenced in this case, but it is still
better to fix up the warning and slightly improve the code.

 UBSAN: Undefined behaviour in fs/xfs/libxfs/xfs_da_btree.c:1989:14
 index -1 is out of range for type 'xfs_da_state_blk_t [5]'
 Call trace:
  dump_backtrace+0x0/0x2c8
  show_stack+0x20/0x2c
  dump_stack+0xe8/0x150
  __ubsan_handle_out_of_bounds+0xe4/0xfc
  xfs_da3_path_shift+0x860/0x86c [xfs]
  xfs_da3_node_lookup_int+0x7c8/0x934 [xfs]
  xfs_dir2_node_addname+0x2c8/0xcd0 [xfs]
  xfs_dir_createname+0x348/0x38c [xfs]
  xfs_create+0x6b0/0x8b4 [xfs]
  xfs_generic_create+0x12c/0x1f8 [xfs]
  xfs_vn_mknod+0x3c/0x4c [xfs]
  xfs_vn_create+0x34/0x44 [xfs]
  do_last+0xd4c/0x10c8
  path_openat+0xbc/0x2f4
  do_filp_open+0x74/0xf4
  do_sys_openat2+0x98/0x180
  __arm64_sys_openat+0xf8/0x170
  do_el0_svc+0x170/0x240
  el0_sync_handler+0x150/0x250
  el0_sync+0x164/0x180

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:51 -08:00
Christoph Hellwig
ba8adad5d0 xfs: remove the kuid/kgid conversion wrappers
Remove the XFS wrappers for converting from and to the kuid/kgid types.
Mostly this means switching to VFS i_{u,g}id_{read,write} helpers, but
in a few spots the calls to the conversion functions is open coded.
To match the use of sb->s_user_ns in the helpers and other file systems,
sb->s_user_ns is also used in the quota code.  The ACL code already does
the conversion in a grotty layering violation in the VFS xattr code,
so it keeps using init_user_ns for the identity mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:50 -08:00
Christoph Hellwig
542951592c xfs: remove the icdinode di_uid/di_gid members
Use the Linux inode i_uid/i_gid members everywhere and just convert
from/to the scalar value when reading or writing the on-disk inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:50 -08:00
Christoph Hellwig
3d8f282150 xfs: ensure that the inode uid/gid match values match the icdinode ones
Instead of only synchronizing the uid/gid values in xfs_setup_inode,
ensure that they always match to prepare for removing the icdinode
fields.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:50 -08:00
Zheng Bin
d0c7feaf87 xfs: add agf freeblocks verify in xfs_agf_verify
We recently used fuzz(hydra) to test XFS and automatically generate
tmp.img(XFS v5 format, but some metadata is wrong)

xfs_repair information(just one AG):
agf_freeblks 0, counted 3224 in ag 0
agf_longest 536874136, counted 3224 in ag 0
sb_fdblocks 613, counted 3228

Test as follows:
mount tmp.img tmpdir
cp file1M tmpdir
sync

In 4.19-stable, sync will stuck, the reason is:
xfs_mountfs
  xfs_check_summary_counts
    if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
       XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
       !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
	return 0;  -->just return, incore sb_fdblocks still be 613
    xfs_initialize_perag_data

cp file1M tmpdir -->ok(write file to pagecache)
sync -->stuck(write pagecache to disk)
xfs_map_blocks
  xfs_iomap_write_allocate
    while (count_fsb != 0) {
      nimaps = 0;
      while (nimaps == 0) { --> endless loop
         nimaps = 1;
         xfs_bmapi_write(..., &nimaps) --> nimaps becomes 0 again
xfs_bmapi_write
  xfs_bmap_alloc
    xfs_bmap_btalloc
      xfs_alloc_vextent
        xfs_alloc_fix_freelist
          xfs_alloc_space_available -->fail(agf_freeblks is 0)

In linux-next, sync not stuck, cause commit c2b3164320 ("xfs:
use the latest extent at writeback delalloc conversion time") remove
the above while, dmesg is as follows:
[   55.250114] XFS (loop0): page discard on page ffffea0008bc7380, inode 0x1b0c, offset 0.

Users do not know why this page is discard, the better soultion is:
1. Like xfs_repair, make sure sb_fdblocks is equal to counted
(xfs_initialize_perag_data did this, who is not called at this mount)
2. Add agf verify, if fail, will tell users to repair

This patch use the second soultion.

Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Ren Xudong <renxudong1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-03-02 20:55:50 -08:00
Darrick J. Wong
706b8c5bc7 xfs: remove unnecessary null pointer checks from _read_agf callers
Drop the null buffer pointer checks in all code that calls
xfs_alloc_read_agf and doesn't pass XFS_ALLOC_FLAG_TRYLOCK because
they're no longer necessary.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:27 -08:00
Darrick J. Wong
f48e2df8a8 xfs: make xfs_*read_agf return EAGAIN to ALLOC_FLAG_TRYLOCK callers
Refactor xfs_read_agf and xfs_alloc_read_agf to return EAGAIN if the
caller passed TRYLOCK and we weren't able to get the lock; and change
the callers to recognize this.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
ee647f85cb xfs: remove the xfs_btree_get_buf[ls] functions
Remove the xfs_btree_get_bufs and xfs_btree_get_bufl functions, since
they're pretty trivial oneliners.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
ce92464c18 xfs: make xfs_trans_get_buf return an error code
Convert xfs_trans_get_buf() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
9676b54e6e xfs: make xfs_trans_get_buf_map return an error code
Convert xfs_trans_get_buf_map() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
0e3eccce5e xfs: make xfs_buf_read return an error code
Convert xfs_buf_read() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
2842b6db3d xfs: make xfs_buf_get_uncached return an error code
Convert xfs_buf_get_uncached() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
841263e933 xfs: make xfs_buf_get return an error code
Convert xfs_buf_get() to return numeric error codes like most
everywhere else in xfs.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
4ed8e27b4f xfs: make xfs_buf_read_map return an error code
Convert xfs_buf_read_map() to return numeric error codes like most
everywhere else in xfs.  This involves moving the open-coded logic that
reports metadata IO read / corruption errors and stales the buffer into
xfs_buf_read_map so that the logic is all in one place.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2020-01-26 14:32:26 -08:00
Darrick J. Wong
b7df5e9205 xfs: make struct xfs_buf_log_format have a consistent size
Increase XFS_BLF_DATAMAP_SIZE by 1 to fill in the implied padding at the
end of struct xfs_buf_log_format.  This makes the size consistent so
that we can check it in xfs_ondisk.h, and will be needed once we start
logging attribute values.

On amd64 we get the following pahole:

struct xfs_buf_log_format {
        short unsigned int         blf_type;       /*     0     2 */
        short unsigned int         blf_size;       /*     2     2 */
        short unsigned int         blf_flags;      /*     4     2 */
        short unsigned int         blf_len;        /*     6     2 */
        long long int              blf_blkno;      /*     8     8 */
        unsigned int               blf_map_size;   /*    16     4 */
        unsigned int               blf_data_map[16]; /*    20    64 */
        /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */

        /* size: 88, cachelines: 2, members: 7 */
        /* padding: 4 */
        /* last cacheline: 24 bytes */
};

But on i386 we get the following:

struct xfs_buf_log_format {
        short unsigned int         blf_type;       /*     0     2 */
        short unsigned int         blf_size;       /*     2     2 */
        short unsigned int         blf_flags;      /*     4     2 */
        short unsigned int         blf_len;        /*     6     2 */
        long long int              blf_blkno;      /*     8     8 */
        unsigned int               blf_map_size;   /*    16     4 */
        unsigned int               blf_data_map[16]; /*    20    64 */
        /* --- cacheline 1 boundary (64 bytes) was 20 bytes ago --- */

        /* size: 84, cachelines: 2, members: 7 */
        /* last cacheline: 20 bytes */
};

Notice how the amd64 compiler inserts 4 bytes of padding to the end of
the structure to ensure 8-byte alignment.  Prior to "xfs: fix memory
corruption during remote attr value buffer invalidation" we would try to
write to blf_data_map[17], which is harmless on amd64 but really bad on
i386.

This shouldn't cause any changes in the ondisk logging formats because
the log code writes out the log vectors with the appropriate size for
the log item's map_size, and log recovery treats the data_map array as a
VLA.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-01-16 08:07:23 -08:00
Darrick J. Wong
0bb9d159bd xfs: streamline xfs_attr3_leaf_inactive
Now that we know we don't have to take a transaction to stale the incore
buffers for a remote value, get rid of the unnecessary memory allocation
in the leaf walker and call the rmt_stale function directly.  Flatten
the loop while we're at it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-01-16 08:07:23 -08:00
Darrick J. Wong
e8db2aafce xfs: fix memory corruption during remote attr value buffer invalidation
While running generic/103, I observed what looks like memory corruption
and (with slub debugging turned on) a slub redzone warning on i386 when
inactivating an inode with a 64k remote attr value.

On a v5 filesystem, maximally sized remote attr values require one block
more than 64k worth of space to hold both the remote attribute value
header (64 bytes).  On a 4k block filesystem this results in a 68k
buffer; on a 64k block filesystem, this would be a 128k buffer.  Note
that even though we'll never use more than 65,600 bytes of this buffer,
XFS_MAX_BLOCKSIZE is 64k.

This is a problem because the definition of struct xfs_buf_log_format
allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k).  On i386 when we
invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of
the dirty map, writing right off the end of the log item and corrupting
memory.  We've gotten away with this on x86_64 for years because the
compiler inserts a u32 padding on the end of struct xfs_buf_log_format.

Fortunately for us, remote attribute values are written to disk with
xfs_bwrite(), which is to say that they are not logged.  Fix the problem
by removing all places where we could end up creating a buffer log item
for a remote attribute value and leave a note explaining why.  Next,
replace the open-coded buffer invalidation with a call to the helper we
created in the previous patch that does better checking for bad metadata
before marking the buffer stale.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-01-16 08:07:23 -08:00
Darrick J. Wong
8edbb26b06 xfs: refactor remote attr value buffer invalidation
Hoist the code that invalidates remote extended attribute value buffers
into a separate helper function.  This prepares us for a memory
corruption fix in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-01-16 08:07:23 -08:00
Vincenzo Frascino
ca78eee7b4 xfs: Add __packed to xfs_dir2_sf_entry_t definition
xfs_check_ondisk_structs() verifies that the sizes of the data types
used by xfs are correct via the XFS_CHECK_STRUCT_SIZE() macro.

Since the structures padding can vary depending on the ABI (e.g. on
ARM OABI structures are padded to multiple of 32 bits), it may happen
that xfs_dir2_sf_entry_t size check breaks the compilation with the
assertion below:

In file included from linux/include/linux/string.h:6,
                 from linux/include/linux/uuid.h:12,
                 from linux/fs/xfs/xfs_linux.h:10,
                 from linux/fs/xfs/xfs.h:22,
                 from linux/fs/xfs/xfs_super.c:7:
In function ‘xfs_check_ondisk_structs’,
    inlined from ‘init_xfs_fs’ at linux/fs/xfs/xfs_super.c:2025:2:
linux/include/linux/compiler.h:350:38:
    error: call to ‘__compiletime_assert_107’ declared with attribute
    error: XFS: sizeof(xfs_dir2_sf_entry_t) is wrong, expected 3
    _compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)

Restore the correct behavior adding __packed to the structure definition.

Cc: Darrick J. Wong <darrick.wong@oracle.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-15 08:45:51 -08:00
Darrick J. Wong
a508486552 xfs: introduce XFS_MAX_FILEOFF
Introduce a new #define for the maximum supported file block offset.
We'll use this in the next patch to make it more obvious that we're
doing some operation for all possible inode fork mappings after a given
offset.  We can't use ULLONG_MAX here because bunmapi uses that to
detect when it's done.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2020-01-14 08:02:51 -08:00
Allison Henderson
d29f781c32 xfs: Remove all strlen in all xfs_attr_* functions for attr names.
This helps to pre-simplify the extra handling of the null terminator in
delayed operations which use memcpy rather than strlen.  Later
when we introduce parent pointers, attribute names will become binary,
so strlen will not work at all.  Removing uses of strlen now will
help reduce complexities later

Signed-off-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-09 10:55:19 -08:00
Christoph Hellwig
780d290577 xfs: fix misuse of the XFS_ATTR_INCOMPLETE flag
XFS_ATTR_INCOMPLETE is a flag in the on-disk attribute format, and thus
in a different namespace as the ATTR_* flags in xfs_da_args.flags.
Switch to using a XFS_DA_OP_INCOMPLETE flag in op_flags instead.  Without
this users might be able to inject this flag into operations using the
attr by handle ioctl.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-09 10:55:18 -08:00
Christoph Hellwig
953aa9d136 xfs: clear kernel only flags in XFS_IOC_ATTRMULTI_BY_HANDLE
Don't allow passing arbitrary flags as they change behavior including
memory allocation that the call stack is not prepared for.

Fixes: ddbca70cc4 ("xfs: allocate xattr buffer on demand")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-09 10:55:18 -08:00
Eric Sandeen
5a57c05b56 xfs: remove shadow variable in xfs_btree_lshift
Sparse warns about a shadow variable in this function after the
Fixed: commit added another int i; with larger scope.  It's safe
to remove the one with the smaller scope to fix this shadow,
although the shadow itself is harmless.

Fixes: 2c813ad66a ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2020-01-07 10:21:08 -08:00
Chen Wandun
5084bf6b20 xfs: Make the symbol 'xfs_rtalloc_log_count' static
Fix the following sparse warning:

fs/xfs/libxfs/xfs_trans_resv.c:206:1: warning: symbol 'xfs_rtalloc_log_count' was not declared. Should it be static?

Fixes: b1de6fc752 ("xfs: fix log reservation overflows when allocating large rt extents")
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-20 08:07:31 -08:00
Darrick J. Wong
13eaec4b2a xfs: don't commit sunit/swidth updates to disk if that would cause repair failures
Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
and swidth values could cause xfs_repair to fail loudly.  The problem
here is that repair calculates the where mkfs should have allocated the
root inode, based on the superblock geometry.  The allocation decisions
depend on sunit, which means that we really can't go updating sunit if
it would lead to a subsequent repair failure on an otherwise correct
filesystem.

Port from xfs_repair some code that computes the location of the root
inode and teach mount to skip the ondisk update if it would cause
problems for repair.  Along the way we'll update the documentation,
provide a function for computing the minimum AGFL size instead of
open-coding it, and cut down some indenting in the mount code.

Note that we allow the mount to proceed (and new allocations will
reflect this new geometry) because we've never screened this kind of
thing before.  We'll have to wait for a new future incompat feature to
enforce correct behavior, alas.

Note that the geometry reporting always uses the superblock values, not
the incore ones, so that is what xfs_info and xfs_growfs will report.

[1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d

Reported-by: Alex Lyakas <alex@zadara.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-12-19 07:53:48 -08:00
Darrick J. Wong
1cac233cfe xfs: refactor agfl length computation function
Refactor xfs_alloc_min_freelist to accept a NULL @pag argument, in which
case it returns the largest possible minimum length.  This will be used
in an upcoming patch to compute the length of the AGFL at mkfs time.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-12-19 07:53:48 -08:00
Darrick J. Wong
af952aeb4a libxfs: resync with the userspace libxfs
Prepare to resync the userspace libxfs with the kernel libxfs.  There
were a few things I missed -- a couple of static inline directory
functions that have to be exported for xfs_repair; a couple of directory
naming functions that make porting much easier if they're /not/ static
inline; and a u16 usage that should have been uint16_t.

None of these things are bugs in their own right; this just makes
porting xfsprogs easier.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2019-12-19 07:53:47 -08:00
Darrick J. Wong
b1de6fc752 xfs: fix log reservation overflows when allocating large rt extents
Omar Sandoval reported that a 4G fallocate on the realtime device causes
filesystem shutdowns due to a log reservation overflow that happens when
we log the rtbitmap updates.  Factor rtbitmap/rtsummary updates into the
the tr_write and tr_itruncate log reservation calculation.

"The following reproducer results in a transaction log overrun warning
for me:

    mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb
    mount -o rtdev=/dev/vdc /dev/vdb /mnt
    fallocate -l 4G /mnt/foo

Reported-by: Omar Sandoval <osandov@osandov.com>
Tested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2019-12-17 11:19:28 -08:00
Brian Foster
d0c2204135 xfs: stabilize insert range start boundary to avoid COW writeback race
generic/522 (fsx) occasionally fails with a file corruption due to
an insert range operation. The primary characteristic of the
corruption is a misplaced insert range operation that differs from
the requested target offset. The reason for this behavior is a race
between the extent shift sequence of an insert range and a COW
writeback completion that causes a front merge with the first extent
in the shift.

The shift preparation function flushes and unmaps from the target
offset of the operation to the end of the file to ensure no
modifications can be made and page cache is invalidated before file
data is shifted. An insert range operation then splits the extent at
the target offset, if necessary, and begins to shift the start
offset of each extent starting from the end of the file to the start
offset. The shift sequence operates at extent level and so depends
on the preparation sequence to guarantee no changes can be made to
the target range during the shift. If the block immediately prior to
the target offset was dirty and shared, however, it can undergo
writeback and move from the COW fork to the data fork at any point
during the shift. If the block is contiguous with the block at the
start offset of the insert range, it can front merge and alter the
start offset of the extent. Once the shift sequence reaches the
target offset, it shifts based on the latest start offset and
silently changes the target offset of the operation and corrupts the
file.

To address this problem, update the shift preparation code to
stabilize the start boundary along with the full range of the
insert. Also update the existing corruption check to fail if any
extent is shifted with a start offset behind the target offset of
the insert range. This prevents insert from racing with COW
writeback completion and fails loudly in the event of an unexpected
extent shift.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-11 13:18:42 -08:00
Linus Torvalds
50caca9d7f Fixes for 5.5-rc1:
- Fix a crash in the log setup code when log mounting fails
 - Fix a hang when allocating space on the realtime device
 - Fix a block leak when freeing space on the realtime device
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl3n5OkACgkQ+H93GTRK
 tOtUvw/+Lxnwx1AaYT7hSMS4p1buJnAV7aM6Ofm+F+GkRISWlxSEYLIjrj3NFxeI
 Fv4jnPWMGHtBzW2c4OXv9zhW7vQeuU0Z72sgxA+Wqccf53sfR5Qum+Tya+Bo8X0X
 LPeDEuA+k4UUUHvSLqscPFPZYbXgwp3dJ2i7JeuH0vx3BhFeTjV1nlOeo1z3QBzN
 LLVjuBg7Zms+TPorrOgS67LAWfzqCAiQDauvyODB5EW+UElK6pvpOklFzc7TUPr2
 PIvmjEE8UNN2y9QuEEKJ26t43ejAzG016yGMxyW74i5wU33R7PHkdgG1pWwWRZzr
 yvNClg2CMtLu+Bcx8Lc9X23mW0DqPkUYDohWCe/Tytvz5kaKCEq3WlqwaG5EdMOg
 gSJlivpoOTduQ26V4fPvD1/fjGpJLWyfHPs9p43wie+K/NuUcTIvr4BGGQszjS/n
 5Zr630g6Tq5VrBMl0f1P2NuEbeQEvmbWNTW2TIvvHZTgMd8mZdvX28IXj0dAhBb/
 2U5o1NF8F6VeRGYoZFJI70RIfiLYzOQEmsA5hAyJfUQQ18u8zDJuPV4isO4/XjQf
 d32E36cvP3CKVQYn7hAiMD5O8jOFckYL287qd4uDYutmleHEcfzc0H4pTW+66IHp
 IuzkPCgvOkd4h4qnhtHSDoSlKd7kc1Ai3hBhCdA9zd6IUAVsZ6Y=
 =Ps4B
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
 "Fix a couple of resource management errors and a hang:

   - fix a crash in the log setup code when log mounting fails

   - fix a hang when allocating space on the realtime device

   - fix a block leak when freeing space on the realtime device"

* tag 'xfs-5.5-merge-17' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix mount failure crash on invalid iclog memory access
  xfs: don't check for AG deadlock for realtime files in bunmapi
  xfs: fix realtime file data space leak
2019-12-07 17:05:33 -08:00
Omar Sandoval
69ffe5960d xfs: don't check for AG deadlock for realtime files in bunmapi
Commit 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi") added
a check in __xfs_bunmapi() to stop early if we would touch multiple AGs
in the wrong order. However, this check isn't applicable for realtime
files. In most cases, it just makes us do unnecessary commits. However,
without the fix from the previous commit ("xfs: fix realtime file data
space leak"), if the last and second-to-last extents also happen to have
different "AG numbers", then the break actually causes __xfs_bunmapi()
to return without making any progress, which sends
xfs_itruncate_extents_flags() into an infinite loop.

Fixes: 5b094d6dac ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-02 17:58:51 -08:00
Omar Sandoval
0c4da70c83 xfs: fix realtime file data space leak
Realtime files in XFS allocate extents in rextsize units. However, the
written/unwritten state of those extents is still tracked in blocksize
units. Therefore, a realtime file can be split up into written and
unwritten extents that are not necessarily aligned to the realtime
extent size. __xfs_bunmapi() has some logic to handle these various
corner cases. Consider how it handles the following case:

1. The last extent is unwritten.
2. The last extent is smaller than the realtime extent size.
3. startblock of the last extent is not aligned to the realtime extent
   size, but startblock + blockcount is.

In this case, __xfs_bunmapi() calls xfs_bmap_add_extent_unwritten_real()
to set the second-to-last extent to unwritten. This should merge the
last and second-to-last extents, so __xfs_bunmapi() moves on to the
second-to-last extent.

However, if the size of the last and second-to-last extents combined is
greater than MAXEXTLEN, xfs_bmap_add_extent_unwritten_real() does not
merge the two extents. When that happens, __xfs_bunmapi() skips past the
last extent without unmapping it, thus leaking the space.

Fix it by only unwriting the minimum amount needed to align the last
extent to the realtime extent size, which is guaranteed to merge with
the last extent.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-12-02 17:58:50 -08:00
Linus Torvalds
97eeb4d9d7 New code for 5.5:
- Fill out the build string
 - Prevent inode fork extent count overflows
 - Refactor the allocator to reduce long tail latency
 - Rework incore log locking a little to reduce spinning
 - Break up the xfs_iomap_begin functions into smaller more cohesive
 parts
 - Fix allocation alignment being dropped too early when the allocation
 request is for more blocks than an AG is large
 - Other small cleanups
 - Clean up file buftarg retrieval helpers
 - Hoist the resvsp and unresvsp ioctls to the vfs
 - Remove the undocumented biosize mount option, since it has never been
   mentioned as existing or supported on linux
 - Clean up some of the mount option printing and parsing
 - Enhance attr leaf verifier to check block structure
 - Check dirent and attr names for invalid characters before passing them
 to the vfs
 - Refactor open-coded bmbt walking
 - Fix a few places where we return EIO instead of EFSCORRUPTED after
 failing metadata sanity checks
 - Fix a synchronization problem between fallocate and aio dio corrupting
 the file length
 - Clean up various loose ends in the iomap and bmap code
 - Convert to the new mount api
 - Make sure we always log something when returning EFSCORRUPTED
 - Fix some problems where long running scrub loops could trigger soft
 lockup warnings and/or fail to exit due to fatal signals pending
 - Fix various Coverity complaints
 - Remove most of the function pointers from the directory code to reduce
 indirection penalties
 - Ensure that dquots are attached to the inode when performing unwritten
 extent conversion after io
 - Deuglify incore projid and crtime types
 - Fix another AGI/AGF locking order deadlock when renaming
 - Clean up some quota typedefs
 - Remove the FSSETDM ioctls which haven't done anything in 20 years
 - Fix some memory leaks when mounting the log fails
 - Fix an underflow when updating an xattr leaf freemap
 - Remove some trivial wrappers
 - Report metadata corruption as an error, not a (potentially) fatal
 assertion
 - Clean up the dir/attr buffer mapping code
 - Allow fatal signals to kill scrub during parent pointer checks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl3fNjcACgkQ+H93GTRK
 tOv/8w//Y0Oa9Paiy8+iPTChs3/PqeKp307Fj5KONG52haMCakEJFT5+/wpkIAJw
 uUmKiPolwN1ivviIUmIS14ThTJ7NV1jq0G0h/0tC25i/3hoJrGWdzqYJMlvhlqgE
 taHrjCwPTDkhRJ0D5QCrkkHPU7lSdquO5TWxltaqYLhyLIt8SkklD6dN1dHWEPnk
 k0j3TL+VqVJDYyEj1bLwJ0QUb2C3J8ygWnlviF/WxsSeJtJpGoeLEaYXhhsUK0Dt
 aHg70OM6zzFzrJJAtJeBXpgaFsG/Pqbcw4wUWSxEMWjVSJwCSKLuZ5F+p6NcqoEj
 HeLQkaGePoO61YCInk2JKLHIyx7ohqMOt7+Dm0mdbe1pvcKwV9ZcdkqKa8L/Fm6v
 bUP6a2hEpsGy7vLnkYxwYACTLPbGX3uLw8MUr6ZpJ+SpfVLktU4ycpr8dCkJkp6a
 0qOpEeHsBDy74NkMOUa7Qrju7lJ2GiL70qqBwaPe+ubcUa3U/3WAsSekSzXgUwn8
 Fap4r8wn7cUbxymAvO06RlU8YymuulAlyjwdo9gOL/Su/5POldss6dy1YuUtyq19
 CD6NtkHqEUMsTc2cI+H65H44aEeckB1j0D2Grm2uMchAh0GcTSFVNF6jony++B8k
 s2sL2dEw9/9vr0uc1TSVF5ezxaONuyaCXdYXUkkdyq3iNvfpRCg=
 =aACq
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.5-merge-16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull XFS updates from Darrick Wong:
 "For this release, we changed quite a few things.

  Highlights:

   - Fixed some long tail latency problems in the block allocator

   - Removed some long deprecated (and for the past several years no-op)
     mount options and ioctls

   - Strengthened the extended attribute and directory verifiers

   - Audited and fixed all the places where we could return EFSCORRUPTED
     without logging anything

   - Refactored the old SGI space allocation ioctls to make the
     equivalent fallocate calls

   - Fixed a race between fallocate and directio

   - Fixed an integer overflow when files have more than a few
     billion(!) extents

   - Fixed a longstanding bug where quota accounting could be incorrect
     when performing unwritten extent conversion on a freshly mounted fs

   - Fixed various complaints in scrub about soft lockups and
     unresponsiveness to signals

   - De-vtable'd the directory handling code, which should make it
     faster

   - Converted to the new mount api, for better or for worse

   - Cleaned up some memory leaks

  and quite a lot of other smaller fixes and cleanups.

  A more detailed summary:

   - Fill out the build string

   - Prevent inode fork extent count overflows

   - Refactor the allocator to reduce long tail latency

   - Rework incore log locking a little to reduce spinning

   - Break up the xfs_iomap_begin functions into smaller more cohesive
     parts

   - Fix allocation alignment being dropped too early when the
     allocation request is for more blocks than an AG is large

   - Other small cleanups

   - Clean up file buftarg retrieval helpers

   - Hoist the resvsp and unresvsp ioctls to the vfs

   - Remove the undocumented biosize mount option, since it has never
     been mentioned as existing or supported on linux

   - Clean up some of the mount option printing and parsing

   - Enhance attr leaf verifier to check block structure

   - Check dirent and attr names for invalid characters before passing
     them to the vfs

   - Refactor open-coded bmbt walking

   - Fix a few places where we return EIO instead of EFSCORRUPTED after
     failing metadata sanity checks

   - Fix a synchronization problem between fallocate and aio dio
     corrupting the file length

   - Clean up various loose ends in the iomap and bmap code

   - Convert to the new mount api

   - Make sure we always log something when returning EFSCORRUPTED

   - Fix some problems where long running scrub loops could trigger soft
     lockup warnings and/or fail to exit due to fatal signals pending

   - Fix various Coverity complaints

   - Remove most of the function pointers from the directory code to
     reduce indirection penalties

   - Ensure that dquots are attached to the inode when performing
     unwritten extent conversion after io

   - Deuglify incore projid and crtime types

   - Fix another AGI/AGF locking order deadlock when renaming

   - Clean up some quota typedefs

   - Remove the FSSETDM ioctls which haven't done anything in 20 years

   - Fix some memory leaks when mounting the log fails

   - Fix an underflow when updating an xattr leaf freemap

   - Remove some trivial wrappers

   - Report metadata corruption as an error, not a (potentially) fatal
     assertion

   - Clean up the dir/attr buffer mapping code

   - Allow fatal signals to kill scrub during parent pointer checks"

* tag 'xfs-5.5-merge-16' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (198 commits)
  xfs: allow parent directory scans to be interrupted with fatal signals
  xfs: remove the mappedbno argument to xfs_da_get_buf
  xfs: remove the mappedbno argument to xfs_da_read_buf
  xfs: split xfs_da3_node_read
  xfs: remove the mappedbno argument to xfs_dir3_leafn_read
  xfs: remove the mappedbno argument to xfs_dir3_leaf_read
  xfs: remove the mappedbno argument to xfs_attr3_leaf_read
  xfs: remove the mappedbno argument to xfs_da_reada_buf
  xfs: improve the xfs_dabuf_map calling conventions
  xfs: refactor xfs_dabuf_map
  xfs: simplify mappedbno handling in xfs_da_{get,read}_buf
  xfs: report corruption only as a regular error
  xfs: Remove kmem_zone_free() wrapper
  xfs: Remove kmem_zone_destroy() wrapper
  xfs: Remove slab init wrappers
  xfs: fix attr leaf header freemap.size underflow
  xfs: fix some memory leaks in log recovery
  xfs: fix another missing include
  xfs: remove XFS_IOC_FSSETDM and XFS_IOC_FSSETDM_BY_HANDLE
  xfs: remove duplicated include from xfs_dir2_data.c
  ...
2019-12-02 14:46:22 -08:00
Linus Torvalds
3b266a52d8 New code for 5.5:
- Make iomap_dio_rw callers explicitly tell us if they want us to wait
 - Port the xfs writeback code to iomap to complete the buffered io
   library functions
 - Refactor the unshare code to share common pieces
 - Add support for performing copy on write with buffered writes
 - Other minor fixes
 - Fix unchecked return in iomap_bmap
 - Fix a type casting bug in a ternary statement in iomap_dio_bio_actor
 - Improve tracepoints for easier diagnostic ability
 - Fix pipe page leakage in directio reads
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl3YDqQACgkQ+H93GTRK
 tOsbPg/+KrPkhe60RnUO4PQbSLTRLsz5R7OK5ubfxsKlHdyy/8vu12yPdY03LcHn
 QoyQz+5gPjM58UvysIonlyRO0O30apl8UZ9PAINDMi7os6NP87illw4HtHL1cZjB
 JfIQKVJrLNtocZnAgVL74d6pUs7MH32SPw0r+/qbfz/JFcHdf/Sz8fSpb0sdS/oK
 QwT73TcaCcNa2C4twvhtO6+kMkzlTkJknqSZMqrthScqRRVeOnyGTjLdKqUamSqp
 uj0iAKm+bBpCcuMdcHd7EOgQyVGwCQKUndaLKojK/V1iqiK+3KsLnIoJj6HwlU27
 Q+pDoThv2V8m/Y940Gq0wzTNtkwdCirNaeKXwXX2ytlyPX5W45ZxgzGUQy4YYXGM
 ObHRmeJXdka6kH7yzWlPiZGdZixpagFLzFUHWjMXAD8Fb4YxNKM4FsJDY3K3uQi6
 y7EKV5O4q3qBW3ieL6l+wl9NkdcppSywRjRxhBZIhH4T2n7RMDLdBkNCiKZrWqIW
 1QcrIC1NvwbXPzSNaZ40dgjB6mJGTv+P9AJLSQpmlR1Y6txLsJ1AeVzucZOnXcCo
 TEiBfDFOZ9jvXUsaqGSdM99HxsePKn+VgEeTA6TFxTWJ9QcEgio1Pym+RkN6KrIU
 zsbJbgk9bp3YT940xKt90fAHWm/iKUWrrRyidVCK3kJd72e41UE=
 =FRl4
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.5-merge-11' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap updates from Darrick Wong:
 "In this release, we hoisted as much of XFS' writeback code into iomap
  as was practicable, refactored the unshare file data function, added
  the ability to perform buffered io copy on write, and tweaked various
  parts of the directio implementation as needed to port ext4's directio
  code (that will be a separate pull).

  Summary:

   - Make iomap_dio_rw callers explicitly tell us if they want us to
     wait

   - Port the xfs writeback code to iomap to complete the buffered io
     library functions

   - Refactor the unshare code to share common pieces

   - Add support for performing copy on write with buffered writes

   - Other minor fixes

   - Fix unchecked return in iomap_bmap

   - Fix a type casting bug in a ternary statement in
     iomap_dio_bio_actor

   - Improve tracepoints for easier diagnostic ability

   - Fix pipe page leakage in directio reads"

* tag 'iomap-5.5-merge-11' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (31 commits)
  iomap: Fix pipe page leakage during splicing
  iomap: trace iomap_appply results
  iomap: fix return value of iomap_dio_bio_actor on 32bit systems
  iomap: iomap_bmap should check iomap_apply return value
  iomap: Fix overflow in iomap_page_mkwrite
  fs/iomap: remove redundant check in iomap_dio_rw()
  iomap: use a srcmap for a read-modify-write I/O
  iomap: renumber IOMAP_HOLE to 0
  iomap: use write_begin to read pages to unshare
  iomap: move the zeroing case out of iomap_read_page_sync
  iomap: ignore non-shared or non-data blocks in xfs_file_dirty
  iomap: always use AOP_FLAG_NOFS in iomap_write_begin
  iomap: remove the unused iomap argument to __iomap_write_end
  iomap: better document the IOMAP_F_* flags
  iomap: enhance writeback error message
  iomap: pass a struct page to iomap_finish_page_writeback
  iomap: cleanup iomap_ioend_compare
  iomap: move struct iomap_page out of iomap.h
  iomap: warn on inline maps in iomap_writepage_map
  iomap: lift the xfs writeback code to iomap
  ...
2019-11-30 10:44:49 -08:00
Christoph Hellwig
2911edb653 xfs: remove the mappedbno argument to xfs_da_get_buf
Use the xfs_da_get_buf_daddr function directly for the two callers
that pass a mapped disk address, and then remove the mappedbno argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22 08:17:10 -08:00
Christoph Hellwig
cd2c9f1b54 xfs: remove the mappedbno argument to xfs_da_read_buf
Move the code for reading an already mapped block into
xfs_da3_node_read_mapped, which is the only caller ever passing a block
number in the mappedbno argument and replace the mappedbno argument with
the simple xfs_dabuf_get flags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22 08:17:10 -08:00
Christoph Hellwig
02c57f0a8b xfs: split xfs_da3_node_read
Split xfs_da3_node_read into one variant that always looks up the daddr
and doesn't accept holes, and one that already has a daddr at hand.
This is in preparation of splitting up xfs_da_read_buf in a similar way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22 08:17:10 -08:00
Christoph Hellwig
f3fcb314d1 xfs: remove the mappedbno argument to xfs_dir3_leafn_read
This argument is always hard coded to -1, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22 08:17:10 -08:00
Christoph Hellwig
c943c0b2e5 xfs: remove the mappedbno argument to xfs_dir3_leaf_read
This argument is always hard coded to -1, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22 08:17:10 -08:00
Christoph Hellwig
dfb8759408 xfs: remove the mappedbno argument to xfs_attr3_leaf_read
This argument is always hard coded to -1, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22 08:17:09 -08:00
Christoph Hellwig
06566fda42 xfs: remove the mappedbno argument to xfs_da_reada_buf
Replace the mappedbno argument with the simple flags for xfs_da_reada_buf
and xfs_dir3_data_readahead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22 08:17:09 -08:00
Christoph Hellwig
199e9ba4e4 xfs: improve the xfs_dabuf_map calling conventions
Use a flags argument with the XFS_DABUF_MAP_HOLE_OK flag to signal that
a hole is okay and not corruption, and return 0 with *nmap set to 0 to
signal that case in the return value instead of a nameless -1 return
code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22 08:17:09 -08:00
Christoph Hellwig
45feef8f50 xfs: refactor xfs_dabuf_map
Merge xfs_buf_map_from_irec and xfs_da_map_covers_blocks into a single
loop in the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22 08:17:09 -08:00
Christoph Hellwig
fa0d44ec7f xfs: simplify mappedbno handling in xfs_da_{get,read}_buf
Shortcut the creation of xfs_bmbt_irec and xfs_buf_map for the case
where the callers passed an already mapped xfs_daddr_t.  This is in
preparation for splitting these cases out entirely later.  Also reject
the mappedbno case for xfs_da_reada_buf as no callers currently uses
it and it will be removed soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-22 08:17:09 -08:00
Carlos Maiolino
377bcd5f3b xfs: Remove kmem_zone_free() wrapper
We can remove it now, without needing to rework the KM_ flags.

Use kmem_cache_free() directly.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-18 08:40:44 -08:00
Brian Foster
2a2b5932db xfs: fix attr leaf header freemap.size underflow
The leaf format xattr addition helper xfs_attr3_leaf_add_work()
adjusts the block freemap in a couple places. The first update drops
the size of the freemap that the caller had already selected to
place the xattr name/value data. Before the function returns, it
also checks whether the entries array has encroached on a freemap
range by virtue of the new entry addition. This is necessary because
the entries array grows from the start of the block (but end of the
block header) towards the end of the block while the name/value data
grows from the end of the block in the opposite direction. If the
associated freemap is already empty, however, size is zero and the
subtraction underflows the field and causes corruption.

This is reproduced rarely by generic/070. The observed behavior is
that a smaller sized freemap is aligned to the end of the entries
list, several subsequent xattr additions land in larger freemaps and
the entries list expands into the smaller freemap until it is fully
consumed and then underflows. Note that it is not otherwise a
corruption for the entries array to consume an empty freemap because
the nameval list (i.e. the firstused pointer in the xattr header)
starts beyond the end of the corrupted freemap.

Update the freemap size modification to account for the fact that
the freemap entry can be empty and thus stale.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-15 21:15:29 -08:00
YueHaibing
eb0d21637f xfs: remove duplicated include from xfs_dir2_data.c
Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 18:22:41 -08:00
Eric Sandeen
a55cefccaa xfs: remove unused structure members & simple typedefs
Remove some unused typedef'd simple types, and some unused
structure members.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 18:22:41 -08:00
Eric Sandeen
35dab307c8 xfs: remove unused typedef definitions
Remove some typdefs for type_t's that are no longer referred to
by their typedef'd types.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 18:22:40 -08:00
Pavel Reichl
d0bdfb1069 xfs: remove the xfs_qoff_logitem_t typedef
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix a comment]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 18:22:28 -08:00
Pavel Reichl
aefe69a45d xfs: remove the xfs_disk_dquot_t and xfs_dquot_t
Signed-off-by: Pavel Reichl <preichl@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix some of the comments]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 11:13:45 -08:00
Arnd Bergmann
e8777b27ca xfs: avoid time_t in user api
The ioctl definitions for XFS_IOC_SWAPEXT, XFS_IOC_FSBULKSTAT and
XFS_IOC_FSBULKSTAT_SINGLE are part of libxfs and based on time_t.

The definition for time_t differs between current kernels and coming
32-bit libc variants that define it as 64-bit. For most ioctls, that
means the kernel has to be able to handle two different command codes
based on the different structure sizes.

The same solution could be applied for XFS_IOC_SWAPEXT, but it would
not work for XFS_IOC_FSBULKSTAT and XFS_IOC_FSBULKSTAT_SINGLE because
the structure with the time_t is passed through an indirect pointer,
and the command number itself is based on struct xfs_fsop_bulkreq,
which does not differ based on time_t.

This means any solution that can be applied requires a change of the
ABI definition in the xfs_fs.h header file, as well as doing the same
change in any user application that contains a copy of this header.

The usual solution would be to define a replacement structure and
use conditional compilation for the ioctl command codes to use
one or the other, such as

 #define XFS_IOC_FSBULKSTAT_OLD _IOWR('X', 101, struct xfs_fsop_bulkreq)
 #define XFS_IOC_FSBULKSTAT_NEW _IOWR('X', 129, struct xfs_fsop_bulkreq)
 #define XFS_IOC_FSBULKSTAT ((sizeof(time_t) == sizeof(__kernel_long_t)) ? \
			     XFS_IOC_FSBULKSTAT_OLD : XFS_IOC_FSBULKSTAT_NEW)

After this, the kernel would be able to implement both
XFS_IOC_FSBULKSTAT_OLD and XFS_IOC_FSBULKSTAT_NEW handlers on
32-bit architectures with the correct ABI for either definition
of time_t.

However, as long as two observations are true, a much simpler solution
can be used:

1. xfsprogs is the only user space project that has a copy of this header
2. xfsprogs already has a replacement for all three affected ioctl commands,
   based on the xfs_bulkstat structure to pass 64-bit timestamps
   regardless of the architecture

Based on those assumptions, changing xfs_bstime to use __kernel_long_t
instead of time_t in both the kernel and in xfsprogs preserves the current
ABI for any libc definition of time_t and solves the problem of passing
64-bit timestamps to 32-bit user space.

If either of the two assumptions is invalid, more discussion is needed
for coming up with a way to fix as much of the affected user space
code as possible.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 11:13:45 -08:00
kaixuxia
93597ae8da xfs: Fix deadlock between AGI and AGF when target_ip exists in xfs_rename()
When target_ip exists in xfs_rename(), the xfs_dir_replace() call may
need to hold the AGF lock to allocate more blocks, and then invoking
the xfs_droplink() call to hold AGI lock to drop target_ip onto the
unlinked list, so we get the lock order AGF->AGI. This would break the
ordering constraint on AGI and AGF locking - inode allocation locks
the AGI, then can allocate a new extent for new inodes, locking the
AGF after the AGI.

In this patch we check whether the replace operation need more
blocks firstly. If so, acquire the agi lock firstly to preserve
locking order(AGI/AGF). Actually, the locking order problem only
occurs when we are locking the AGI/AGF of the same AG. For multiple
AGs the AGI lock will be released after the transaction committed.

Signed-off-by: kaixuxia <kaixuxia@tencent.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: reword the comment]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 11:13:45 -08:00
Christoph Hellwig
048a35d2f0 xfs: don't reset the "inode core" in xfs_iread
We have the exact same memset in xfs_inode_alloc, which is always called
just before xfs_iread.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 11:13:45 -08:00
Christoph Hellwig
de7a866fd4 xfs: merge the projid fields in struct xfs_icdinode
There is no point in splitting the fields like this in an purely
in-memory structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 11:13:45 -08:00
Christoph Hellwig
8d2d878db8 xfs: use a struct timespec64 for the in-core crtime
struct xfs_icdinode is purely an in-memory data structure, so don't use
a log on-disk structure for it.  This simplifies the code a bit, and
also reduces our include hell slightly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix a minor indenting problem in xfs_trans_ichgtime]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 11:13:45 -08:00
Christoph Hellwig
d8d11fc703 xfs: devirtualize ->m_dirnameops
Instead of causing a relatively expensive indirect call for each
hashing and comparism of a file name in a directory just use an
inline function and a simple branch on the ASCII CI bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix unused variable warning]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-13 11:13:45 -08:00
Darrick J. Wong
a71895c5da xfs: convert open coded corruption check to use XFS_IS_CORRUPT
Convert the last of the open coded corruption check and report idioms to
use the XFS_IS_CORRUPT macro.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-13 11:08:01 -08:00
Darrick J. Wong
f9e0370648 xfs: kill the XFS_WANT_CORRUPT_* macros
The XFS_WANT_CORRUPT_* macros conceal subtle side effects such as the
creation of local variables and redirections of the code flow.  This is
pretty ugly, so replace them with explicit XFS_IS_CORRUPT tests that
remove both of those ugly points.  The change was performed with the
following coccinelle script:

@@
expression mp, test;
identifier label;
@@

- XFS_WANT_CORRUPTED_GOTO(mp, test, label);
+ if (XFS_IS_CORRUPT(mp, !test)) { error = -EFSCORRUPTED; goto label; }

@@
expression mp, test;
@@

- XFS_WANT_CORRUPTED_RETURN(mp, test);
+ if (XFS_IS_CORRUPT(mp, !test)) return -EFSCORRUPTED;

@@
expression mp, lval, rval;
@@

- XFS_IS_CORRUPT(mp, !(lval == rval))
+ XFS_IS_CORRUPT(mp, lval != rval)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 && e2))
+ XFS_IS_CORRUPT(mp, !e1 || !e2)

@@
expression e1, e2;
@@

- !(e1 == e2)
+ e1 != e2

@@
expression e1, e2, e3, e4, e5, e6;
@@

- !(e1 == e2 && e3 == e4) || e5 != e6
+ e1 != e2 || e3 != e4 || e5 != e6

@@
expression e1, e2, e3, e4, e5, e6;
@@

- !(e1 == e2 || (e3 <= e4 && e5 <= e6))
+ e1 != e2 && (e3 > e4 || e5 > e6)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 <= e2))
+ XFS_IS_CORRUPT(mp, e1 > e2)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 < e2))
+ XFS_IS_CORRUPT(mp, e1 >= e2)

@@
expression mp, e1;
@@

- XFS_IS_CORRUPT(mp, !!e1)
+ XFS_IS_CORRUPT(mp, e1)

@@
expression mp, e1, e2;
@@

- XFS_IS_CORRUPT(mp, !(e1 || e2))
+ XFS_IS_CORRUPT(mp, !e1 && !e2)

@@
expression mp, e1, e2, e3, e4;
@@

- XFS_IS_CORRUPT(mp, !(e1 == e2) && !(e3 == e4))
+ XFS_IS_CORRUPT(mp, e1 != e2 && e3 != e4)

@@
expression mp, e1, e2, e3, e4;
@@

- XFS_IS_CORRUPT(mp, !(e1 <= e2) || !(e3 >= e4))
+ XFS_IS_CORRUPT(mp, e1 > e2 || e3 < e4)

@@
expression mp, e1, e2, e3, e4;
@@

- XFS_IS_CORRUPT(mp, !(e1 == e2) && !(e3 <= e4))
+ XFS_IS_CORRUPT(mp, e1 != e2 && e3 > e4)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-12 17:19:02 -08:00
Darrick J. Wong
27d9ee577d xfs: actually check xfs_btree_check_block return in xfs_btree_islastblock
Coverity points out that xfs_btree_islastblock doesn't check the return
value of xfs_btree_check_block.  Since the question "Does the cursor
point to the last block in this level?" only makes sense if the caller
previously performed a lookup or seek operation, the block should
already have been checked.

Therefore, check the return value in an ASSERT and turn the whole thing
into a static inline predicate.

Coverity-id: 114069
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2019-11-11 12:42:52 -08:00
Christoph Hellwig
23220fe260 xfs: always pass a valid hdr to xfs_dir3_leaf_check_int
Move the code for extracting the incore header to the only caller that
didn't already do that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-11-10 16:54:24 -08:00