Long saga. There have been days spent following this through dead end
after dead end in multi-GB event traces. This morning, after writing
a trace-cmd wrapper that enabled me to be more selective about XFS
trace points, I discovered that I could get just enough essential
tracepoints enabled that there was a 50:50 chance the fsx config
would fail at ~115k ops. If it didn't fail at op 115547, I stopped
fsx at op 115548 anyway.
That gave me two traces - one where the problem manifested, and one
where it didn't. After refining the traces to have the necessary
information, I found that in the failing case there was a real
extent in the COW fork compared to an unwritten extent in the
working case.
Walking back through the two traces to the point where the CWO fork
extents actually diverged, I found that the bad case had an extra
unwritten extent in it. This is likely because the bug it led me to
had triggered multiple times in those 115k ops, leaving stray
COW extents around. What I saw was a COW delalloc conversion to an
unwritten extent (as they should always be through
xfs_iomap_write_allocate()) resulted in a /written extent/:
xfs_writepage: dev 259:0 ino 0x83 pgoff 0x17000 size 0x79a00 offset 0 length 0
xfs_iext_remove: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/2 offset 32 block 152 count 20 flag 1 caller xfs_bmap_add_extent_delay_real
xfs_bmap_pre_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 4503599627239429 count 31 flag 0 caller xfs_bmap_add_extent_delay_real
xfs_bmap_post_update: dev 259:0 ino 0x83 state RC|LF|RF|COW cur 0xffff888247b899c0/1 offset 1 block 121 count 51 flag 0 caller xfs_bmap_add_ex
Basically, Cow fork before:
0 1 32 52
+H+DDDDDDDDDDDD+UUUUUUUUUUU+
PREV RIGHT
COW delalloc conversion allocates:
1 32
+uuuuuuuuuuuu+
NEW
And the result according to the xfs_bmap_post_update trace was:
0 1 32 52
+H+wwwwwwwwwwwwwwwwwwwwwwww+
PREV
Which is clearly wrong - it should be a merged unwritten extent,
not an unwritten extent.
That lead me to look at the LEFT_FILLING|RIGHT_FILLING|RIGHT_CONTIG
case in xfs_bmap_add_extent_delay_real(), and sure enough, there's
the bug.
It takes the old delalloc extent (PREV) and adds the length of the
RIGHT extent to it, takes the start block from NEW, removes the
RIGHT extent and then updates PREV with the new extent.
What it fails to do is update PREV.br_state. For delalloc, this is
always XFS_EXT_NORM, while in this case we are converting the
delayed allocation to unwritten, so it needs to be updated to
XFS_EXT_UNWRITTEN. This LF|RF|RC case does not do this, and so
the resultant extent is always written.
And that's the bug I've been chasing for a week - a bmap btree bug,
not a reflink/dedupe/copy_file_range bug, but a BMBT bug introduced
with the recent in core extent tree scalability enhancements.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The last AG may be very small comapred to all other AGs, and hence
AG reservations based on the superblock AG size may actually consume
more space than the AG actually has. This results on assert failures
like:
XFS: Assertion failed: xfs_perag_resv(pag, XFS_AG_RESV_METADATA)->ar_reserved + xfs_perag_resv(pag, XFS_AG_RESV_RMAPBT)->ar_reserved <= pag->pagf_freeblks + pag->pagf_flcount, file: fs/xfs/libxfs/xfs_ag_resv.c, line: 319
[ 48.932891] xfs_ag_resv_init+0x1bd/0x1d0
[ 48.933853] xfs_fs_reserve_ag_blocks+0x37/0xb0
[ 48.934939] xfs_mountfs+0x5b3/0x920
[ 48.935804] xfs_fs_fill_super+0x462/0x640
[ 48.936784] ? xfs_test_remount_options+0x60/0x60
[ 48.937908] mount_bdev+0x178/0x1b0
[ 48.938751] mount_fs+0x36/0x170
[ 48.939533] vfs_kern_mount.part.43+0x54/0x130
[ 48.940596] do_mount+0x20e/0xcb0
[ 48.941396] ? memdup_user+0x3e/0x70
[ 48.942249] ksys_mount+0xba/0xd0
[ 48.943046] __x64_sys_mount+0x21/0x30
[ 48.943953] do_syscall_64+0x54/0x170
[ 48.944835] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Hence we need to ensure the finobt per-ag space reservations take
into account the size of the last AG rather than treat it like all
the other full size AGs.
Note that both refcountbt and rmapbt already take the size of the AG
into account via reading the AGF length directly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
generic/070 on 64k block size filesystems is failing with a verifier
corruption on writeback or an attribute leaf block:
[ 94.973083] XFS (pmem0): Metadata corruption detected at xfs_attr3_leaf_verify+0x246/0x260, xfs_attr3_leaf block 0x811480
[ 94.975623] XFS (pmem0): Unmount and run xfs_repair
[ 94.976720] XFS (pmem0): First 128 bytes of corrupted metadata buffer:
[ 94.978270] 000000004b2e7b45: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00 ........;.......
[ 94.980268] 000000006b1db90b: 00 00 00 00 00 81 14 80 00 00 00 00 00 00 00 00 ................
[ 94.982251] 00000000433f2407: 22 7b 5c 82 2d 5c 47 4c bb 31 1c 37 fa a9 ce d6 "{\.-\GL.1.7....
[ 94.984157] 0000000010dc7dfb: 00 00 00 00 00 81 04 8a 00 0a 18 e8 dd 94 01 00 ................
[ 94.986215] 00000000d5a19229: 00 a0 dc f4 fe 98 01 68 f0 d8 07 e0 00 00 00 00 .......h........
[ 94.988171] 00000000521df36c: 0c 2d 32 e2 fe 20 01 00 0c 2d 58 65 fe 0c 01 00 .-2.. ...-Xe....
[ 94.990162] 000000008477ae06: 0c 2d 5b 66 fe 8c 01 00 0c 2d 71 35 fe 7c 01 00 .-[f.....-q5.|..
[ 94.992139] 00000000a4a6bca6: 0c 2d 72 37 fc d4 01 00 0c 2d d8 b8 f0 90 01 00 .-r7.....-......
[ 94.994789] XFS (pmem0): xfs_do_force_shutdown(0x8) called from line 1453 of file fs/xfs/xfs_buf.c. Return address = ffffffff815365f3
This is failing this check:
end = ichdr.freemap[i].base + ichdr.freemap[i].size;
if (end < ichdr.freemap[i].base)
>>>>> return __this_address;
if (end > mp->m_attr_geo->blksize)
return __this_address;
And from the buffer output above, the freemap array is:
freemap[0].base = 0x00a0
freemap[0].size = 0xdcf4 end = 0xdd94
freemap[1].base = 0xfe98
freemap[1].size = 0x0168 end = 0x10000
freemap[2].base = 0xf0d8
freemap[2].size = 0x07e0 end = 0xf8b8
These all look valid - the block size is 0x10000 and so from the
last check in the above verifier fragment we know that the end
of freemap[1] is valid. The problem is that end is declared as:
uint16_t end;
And (uint16_t)0x10000 = 0. So we have a verifier bug here, not a
corruption. Fix the verifier to use uint32_t types for the check and
hence avoid the overflow.
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=201577
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This patch adds xfs_attr_remove_args. These sub-routines remove
the attributes specified in @args. We will use this later for setting
parent pointers as a deferred attribute operation.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
This patch adds xfs_attr_set_args and xfs_bmap_set_attrforkoff.
These sub-routines set the attributes specified in @args.
We will use this later for setting parent pointers as a deferred
attribute operation.
[dgc: remove attr fork init code from xfs_attr_set_args().]
[dgc: xfs_attr_try_sf_addname() NULLs args.trans after commit.]
[dgc: correct sf add error handling.]
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
This patch adds a subroutine xfs_attr_try_sf_addname
used by xfs_attr_set. This subrotine will attempt to
add the attribute name specified in args in shortform,
as well and perform error handling previously done in
xfs_attr_set.
This patch helps to pre-simplify xfs_attr_set for reviewing
purposes and reduce indentation. New function will be added
in the next patch.
[dgc: moved commit to helper function, too.]
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
This patch moves fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
since xfs_attr.c is in libxfs. We will need these later in
xfsprogs.
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The option to enable unwritten extents was made default in 2003,
removed from mkfs in 2007, and cannot be disabled in v5. We also
rely on it for a lot of common functionality, so filesystems without
it will run a completely untested and buggy code path. Enabling the
support also is a simple bit flip using xfs_db, so legacy file
systems can still be brought forward.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Commit 01239d77b9 ("xfs: fix a null pointer dereference in
xfs_bmap_extents_to_btree") attempted to fix a null pointer
dreference when a fuzzing corruption of some kind was found.
This fix was flawed, resulting in assert failures like:
XFS: Assertion failed: ifp->if_broot == NULL, file: fs/xfs/libxfs/xfs_bmap.c, line: 715
.....
Call Trace:
xfs_bmap_extents_to_btree+0x6b9/0x7b0
__xfs_bunmapi+0xae7/0xf00
? xfs_log_reserve+0x1c8/0x290
xfs_reflink_remap_extent+0x20b/0x620
xfs_reflink_remap_blocks+0x7e/0x290
xfs_reflink_remap_range+0x311/0x530
vfs_dedupe_file_range_one+0xd7/0xe0
vfs_dedupe_file_range+0x15b/0x1a0
do_vfs_ioctl+0x267/0x6c0
The problem is that the error handling code now asserts that the
inode fork is not in btree format before the error handling code
undoes the modifications that put the fork back in extent format.
Fix this by moving the assert back to after the xfs_iroot_realloc()
call that returns the fork to extent format, and clean up the jump
labels to be meaningful.
Also, returning ENOSPC when xfs_btree_get_bufl() fails to
instantiate the buffer that was allocated (the actual fix in the
commit mentioned above) is incorrect. This is a fatal error - only
an invalid block address or a filesystem shutdown can result in
failing to get a buffer here.
Hence change this to EFSCORRUPTED so that the higher layer knows
this was a corruption related failure and should not treat it as an
ENOSPC error. This should result in a shutdown (via cancelling a
dirty transaction) which is necessary as we do not attempt to clean
up the (invalid) block that we have already allocated.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Verify the inode di_forkoff, lifted from xfs_repair's
process_check_inode_forkoff().
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
xchk_inode_flags2() currently treats any di_flags2 values that the
running kernel doesn't recognize as corruption, and calls
xchk_ino_set_corrupt() if they are set. However, it's entirely possible
that these flags were set in some newer kernel and are quite valid,
but ignored in this kernel.
(Validators don't care one bit about unknown di_flags2.)
Call xchk_ino_set_warning instead, because this may or may not actually
indicate a problem.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Now that deferred operations are completely managed via
transactions, it's no longer necessary to cancel the dfops in error
paths that already cancel the associated transaction. There are a
few such calls lingering throughout the codebase.
Remove all remaining unnecessary calls to xfs_defer_cancel(). This
leaves xfs_defer_cancel() calls in two places. The first is the call
in the transaction cancel path itself, which facilitates this patch.
The second is made via the xfs_defer_finish() error path to provide
consistent error semantics with transaction commit. For example,
xfs_trans_commit() expects an xfs_defer_finish() failure to clean up
the dfops structure before it returns.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
- Use extent maps to track pagecache page status instead of bufferhead
state.
- Refactor pagecache read and write paths to use the new iomap library
functions, which enable us to drop the old bufferhead code for
pagesize == blocksize filesystems.
- Set up parallel per-block-per-page metadata to track subpage
information that was tracked by buffer heads, which enables us to drop
the old bufferhead code for pagesize > blocksize filesystems.
- Tie a deferred ops control structure to a transaction so that we can
take advantage of an upper-level dfops without having to plumb pointer
passing through the code.
- Refactor the deferred ops code to track deferred ops as part of the
transaction structure (instead of as a separate data structure) so
that we can simplify the scoping rules around defer_ops.
- Refactor twisty delwri buffer submission code to avoid deadlocks.
- Shorten and fix indenting problems in the scrub code.
- Detect obviously bad summary counts at mount and fix them.
- Directly associate deferred ops control structure with a transaction
so that callers no longer have to manage it themselves.
- Remove a couple of IRIX-era inode macros.
- Remove the long-deprecated 'barrier' and 'nobarrier' mount options.
- Clean up the inode fork structure a bit.
- Check for bad fs summary counter values in the superblock.
- Reduce COW fork lookups during writeback.
- Refactor the deferred ops control structures into the transaction
structure, thereby eliminating the need for transaction users to
handle the deferred ops as a separate data structure.
- Add the ability to repair AG headers online.
- Fix a crash due to insufficient return value checking.
- Various fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAltwVGoACgkQ+H93GTRK
tOt3bw//WaG1mR44Oo/mhf27MaEJK74LXeViqCH4Sdk10gujClTVnl6h33ChAyEi
7BT4x1JtwM6xOh7nPsXQy/besVxWadjQcTtAz/3U2wJoFyOX2+I27SAawrmX6jfR
Hi1DxXFFK7z/8YvuZqYl3vTgxMNb7bLAUybe2sYX8q+vrQaUvl9eLQlHSaT3sxrc
/lBkog1dYmbw3yjLnWYpQtC0I6Pa3ZuG/S2vpeJ2H5MADtzrRNjuC9MHZJW7tIGm
+rCLm0agk8yFkEA84VvS5Afee3TppY/JBaYlsvG1rp3bs0fELAJFnzS4g/QDbbsX
HAKPcMICJksF4C9y0Xb7wXPz/4PKur5/OSuGXN4QtOivOEoAdWfh2PLInqAjo/Le
mO92PdkBucfVqJzfEC2q2QAnGIaJlG8txhAz87wZ1YfZDQQlJDy385Z9GQXfUpy5
/1xH7V0cze1ZBSxWSddSFg0gCtaWSerfp0CmAG3A+HWKIN6c/ZNSCrqdq0DBC99D
qOn6ThjckZWGvz/KV5xBr/KvUYOpSeEyREtgcAN008TiUaNy4nOhWV2xgLGuPY/J
ed4V2B9qVbq+l+sZyzukB8cmOXmcCey6omwJ7LqZzoTWTAtTQtM2MwhaQFUWtQG8
mCqPXJp1XyL24sn0bI1t2NuKgQcs6QEQWX3zN4DA6I+N9+sTDqo=
=2G+i
-----END PGP SIGNATURE-----
Merge tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"This is the second part of the XFS changes for 4.19.
The biggest changes are the removal of buffer heads frm XFS, a massive
reworking of the deferred transaction operations handling code, the
removal of the long defunct barrier/nobarrier mount options, and the
addition of a few more online repair functions.
Summary:
- Use extent maps to track pagecache page status instead of
bufferhead state.
- Refactor pagecache read and write paths to use the new iomap
library functions, which enable us to drop the old bufferhead code
for pagesize == blocksize filesystems.
- Set up parallel per-block-per-page metadata to track subpage
information that was tracked by buffer heads, which enables us to
drop the old bufferhead code for pagesize > blocksize filesystems.
- Tie a deferred ops control structure to a transaction so that we
can take advantage of an upper-level dfops without having to plumb
pointer passing through the code.
- Refactor the deferred ops code to track deferred ops as part of the
transaction structure (instead of as a separate data structure) so
that we can simplify the scoping rules around defer_ops.
- Refactor twisty delwri buffer submission code to avoid deadlocks.
- Shorten and fix indenting problems in the scrub code.
- Detect obviously bad summary counts at mount and fix them.
- Directly associate deferred ops control structure with a
transaction so that callers no longer have to manage it themselves.
- Remove a couple of IRIX-era inode macros.
- Remove the long-deprecated 'barrier' and 'nobarrier' mount options.
- Clean up the inode fork structure a bit.
- Check for bad fs summary counter values in the superblock.
- Reduce COW fork lookups during writeback.
- Refactor the deferred ops control structures into the transaction
structure, thereby eliminating the need for transaction users to
handle the deferred ops as a separate data structure.
- Add the ability to repair AG headers online.
- Fix a crash due to insufficient return value checking.
- Various fixes and cleanups"
* tag 'xfs-4.19-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (155 commits)
xfs: fix a null pointer dereference in xfs_bmap_extents_to_btree
xfs: remove b_last_holder & associated macros
iomap: Switch to offset_in_page for clarity
xfs: Close race between direct IO and xfs_break_layouts()
xfs: repair the AGI
xfs: repair the AGFL
xfs: repair the AGF
xfs: remove dead error handling code in xfs_dquot_disk_alloc()
xfs: use WRITE_ONCE to update if_seq
xfs: fix a comment in xfs_log_reserve
xfs: only validate summary counts on primary superblock
xfs: substitute spaces with tabs
xfs: fold dfops into the transaction
xfs: always defer agfl block frees
xfs: pass transaction to xfs_defer_add()
xfs: replace xfs_defer_ops ->dop_pending with on-stack list
xfs: cancel dfops on xfs_defer_finish() error
xfs: clean out superfluous dfops dop params/vars
xfs: drop dop param from xfs_defer_op_type ->finish_item() callback
xfs: automatic dfops inode relogging
...
Fuzzing tool reports a write to null pointer error in the
xfs_bmap_extents_to_btree, fix it by bailing out on encountering
a null pointer.
Signed-off-by: Shan Hai <shan.hai@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
This adds ordering of the updates and makes sure we always see the if_seq
update before the extent tree is modified.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Skip the summary counter checks for secondary superblocks and inprogress
primary superblocks because mkfs has always written those out with
zeroed summary counters.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
struct xfs_defer_ops has now been reduced to a single list_head. The
external dfops mechanism is unused and thus everywhere a (permanent)
transaction is accessible the associated dfops structure is as well.
Remove the xfs_defer_ops structure and fold the list_head into the
transaction. Also remove the last remnant of external dfops in
xfs_trans_dup().
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The AGFL fixup code conditionally defers block frees from the free
list based on whether the current transaction has an associated
xfs_defer_ops structure. Now that dfops is embedded in the
transaction and the internal dfops is used unconditionally, this
invariant is always true.
Remove the now dead logic to check for ->t_dfops in
xfs_alloc_fix_freelist() and unconditionally defer AGFL block frees.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The majority of remaining references to struct xfs_defer_ops in XFS
are associated with xfs_defer_add(). At this point, there are no
more external xfs_defer_ops users left. All instances of
xfs_defer_ops are embedded in the transaction, which means we can
safely pass the transaction down to the dfops add interface.
Update xfs_defer_add() to receive the transaction as a parameter.
Various subsystems implement wrappers to allocate and construct the
context specific data structures for the associated deferred
operation type. Update these to also carry the transaction down as
needed and clean up unused dfops parameters along the way.
This removes most of the remaining references to struct
xfs_defer_ops throughout the code and facilitates removal of the
structure.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[darrick: fix unused variable warnings with ftrace disabled]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The xfs_defer_ops ->dop_pending list is used to track active
deferred operations once intents are logged. These items must be
aborted in the event of an error. The list is populated as intents
are logged and items are removed as they complete (or are aborted).
Now that xfs_defer_finish() cancels on error, there is no need to
ever access ->dop_pending outside of xfs_defer_finish(). The list is
only ever populated after xfs_defer_finish() begins and is either
completed or cancelled before it returns.
Remove ->dop_pending from xfs_defer_ops and replace it with a local
list in the xfs_defer_finish() path. Pass the local list to the
various helpers now that it is not accessible via dfops. Note that
we have to check for NULL in the abort case as the final tx roll
occurs outside of the scope of the new local list (once the dfops
has completed and thus drained the list).
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The current semantics of xfs_defer_finish() require the caller to
call xfs_defer_cancel() on error. This is slightly inconsistent with
transaction commit error handling where a failed commit cleans up
the transaction before returning.
More significantly, the only requirement for exposure of
->dop_pending outside of xfs_defer_finish() is so that
xfs_defer_cancel() can drain it on error. Since the only recourse of
xfs_defer_finish() errors is cancellation, mirror the transaction
logic and cancel remaining dfops before returning from
xfs_defer_finish() with an error.
Beside simplifying xfs_defer_finish() semantics, this ensures that
xfs_defer_finish() always returns with an empty ->dop_pending and
thus facilitates removal of the list from xfs_defer_ops.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The dfops code still passes around the xfs_defer_ops pointer
superfluously in a few places. Clean this up wherever the
transaction will suffice.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The dfops infrastructure ->finish_item() callback passes the
transaction and dfops as separate parameters. Since dfops is always
part of a transaction, the latter parameter is no longer necessary.
Remove it from the various callbacks.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Inodes that are held across deferred operations are explicitly
joined to the dfops structure to ensure appropriate relogging.
While inodes are currently joined explicitly, we can detect the
conditions that require relogging at dfops finish time by inspecting
the transaction item list for inodes with ili_lock_flags == 0.
Replace the xfs_defer_ijoin() infrastructure with such detection and
automatic relogging of held inodes. This eliminates the need for the
per-dfops inode list, replaced by an on-stack variant in
xfs_defer_trans_roll().
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Buffers that are held across deferred operations are explicitly
joined to the dfops structure to ensure appropriate relogging.
While buffers are currently joined explicitly, we can detect the
conditions that require relogging at dfops finish time by inspecting
the transaction item list for held buffers.
Replace the xfs_defer_bjoin() infrastructure with such detection and
automatic relogging of held buffers. This eliminates the need for
the per-dfops buffer list, replaced by an on-stack variant in
xfs_defer_trans_roll().
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Log items that require relogging during deferred operations
processing are explicitly joined to the associated dfops via the
xfs_defer_*join() helpers. These calls imply that the associated
object is "held" by the transaction such that when rolled, the item
can be immediately joined to a follow up transaction. For buffers,
this means the buffer remains locked and held after each roll. For
inodes, this means that the inode remains locked.
Failure to join a held item to the dfops structure means the
associated object pins the tail of the log while dfops processing
completes, because the item never relogs and is not unlocked or
released until deferred processing completes.
Currently, all buffers that are held in transactions (XFS_BLI_HOLD)
with deferred operations are explicitly joined to the dfops. This is
not the case for inodes, however, as various contexts defer
operations to transactions with held inodes without explicit joins
to the associated dfops (and thus not relogging).
While this is not a catastrophic problem, it is not ideal. Given
that we want to eventually relog such items automatically during
dfops processing, start by explicitly adding these missing
xfs_defer_ijoin() calls. A call is added everywhere an inode is
joined to a transaction without transferring lock ownership and
said transaction runs deferred operations.
All xfs_defer_ijoin() calls will eventually be replaced by automatic
dfops inode relogging. This patch essentially implements the
behavior change that would otherwise occur due to automatic inode
dfops relogging.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The dop_low field enables the low free space allocation mode when a
previous allocation has detected difficulty allocating blocks. It
has historically been part of the xfs_defer_ops structure, which
means if enabled, it remains enabled across a set of transactions
until the deferred operations have completed and the dfops is reset.
Now that the dfops is embedded in the transaction, we can save a bit
more space by using a transaction flag rather than a standalone
boolean. Drop the ->dop_low field and replace it with a transaction
flag that is set at the same points, carried across rolling
transactions and cleared on completion of deferred operations. This
essentially emulates the behavior of ->dop_low and so should not
change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All callers pass ->t_dfops of the associated transactions. Refactor
the helpers to receive the transactions and facilitate further
cleanups between xfs_defer_ops and xfs_trans.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
With no more external dfops users, there is no need for an
xfs_defer_ops cancel wrapper.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Before we start processing what we /think/ is a da3 node block, actually
check the magic to make sure that we're looking at a node block. This
way we won't blow the asserts in _node_hdr_from_disk on corrupted
metadata.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Use a local variable for the block magic number checks instead of
abusing blk->magic.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Add a predicate to decide if the log is actively in recovery and use
that instead of open-coding a pagf_init check in the attr leaf verifier.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Move the per-AG busy extent tree initialization to the per-ag structure
initialization since we don't want online repair to leak the old tree.
We only deconstruct the tree at unmount time, so this should be safe.
This also enables us to eliminate the commented out initialization in
the xfsprogs libxfs.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a simple 32-bit unsigned integer as the sequence count for
modifications to the extent list in the inode fork. This will be
used to optimize away extent list lookups in the writeback code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Make sure we never try to write the superblock with unknown feature bits
set. We checked those at mount time, so if they're set now then memory
is corrupt.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Add a helper predicate to check the inode count for sanity, then use it
in the superblock write verifier to inspect sb_icount.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Current sb verifier doesn't check bounds on sb_fdblocks and sb_ifree.
Add sanity checks for these parameters.
Signed-off-by: Bill O'Donnell <billodo@redhat.com>
[darrick: port to refactored sb validation predicates]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Split the superblock verifier into the common checks, the read-time
checks, and the write-time check functions. No functional changes, but
we're setting up to add more write-only checks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
We only have a few more callers left, so seize the opportunity and kill
it off.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Streamline the code and take advantage of the fact that kmem_realloc
through krealloc will be have like a normal allocation if passing in a
NULL old pointer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The field is only used for asserts, and to track if we really need to do
realloc when growing the inode fork data. But the krealloc function
already performs this check internally, so there is no need to keep track
of the real allocation size.
This will free space in the inode fork for keeping a sequence counter of
changes to the extent list.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Pass a tranaction pointer through to all helpers that calculate the
per-AG block reservation. Online repair will use this to reinitialize
per-ag reservations while it still holds all the AG headers locked to
the repair transaction.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Once xfs_defer_finish() has completed all deferred operations, it
checks the dirty state of the transaction and rolls it once more to
return a clean transaction for the caller. This primarily to cover
the case where repeated xfs_defer_finish() calls are made in a loop
and we need to make sure that the caller starts the next iteration
with a clean transaction. Otherwise we risk transaction reservation
overrun.
This final transaction roll is not required in the transaction
commit path, however, because the transaction is immediately
committed and freed after dfops completion. Refactor the final roll
into a separate helper such that we can avoid it in the transaction
commit path. Lift the dfops reset as well so dfops remains valid
until after the last call to xfs_defer_trans_roll(). The reset is
also unnecessary in the transaction commit path because the
transaction is about to complete.
This eliminates unnecessary regrants of transactions where the
associated transaction roll can be replaced by a transaction commit.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Every caller of xfs_defer_finish() now passes the transaction and
its associated ->t_dfops. The xfs_defer_ops parameter is therefore
no longer necessary and can be removed.
Since most xfs_defer_finish() callers also have to consider
xfs_defer_cancel() on error, update the latter to also receive the
transaction for consistency. The log recovery code contains an
outlier case that cancels a dfops directly without an available
transaction. Retain an internal wrapper to support this outlier case
for the time being.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Each xfs_defer_init() call in the xattr code uses the internal dfops
reference. In addition, a successful xfs_defer_finish() always
returns with a reset xfs_defer_ops structure.
Given that along with the fact that every xfs_defer_init() call in
the xattr code is followed up by an xfs_defer_finish(), the former
calls are no longer necessary and can be removed.
Note that the xfs_defer_init() call in the remote value copy loop of
xfs_attr_rmtval_set() is not followed by a finish, but the dfops is
unused in this instance.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
At this point, the transaction subsystem completely manages deferred
items internally such that the common and boilerplate
xfs_trans_alloc() -> xfs_defer_init() -> xfs_defer_finish() ->
xfs_trans_commit() sequence can be replaced with a simple
transaction allocation and commit.
Remove all such boilerplate deferred ops code. In doing so, we
change each case over to use the dfops in the transaction and
specifically eliminate:
- The on-stack dfops and associated xfs_defer_init() call, as the
internal dfops is initialized on transaction allocation.
- xfs_bmap_finish() calls that precede a final xfs_trans_commit() of
a transaction.
- xfs_defer_cancel() calls in error handlers that precede a
transaction cancel.
The only deferred ops calls that remain are those that are
non-deterministic with respect to the final commit of the associated
transaction or are open-coded due to special handling.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Remove the unnecessary on-stack dfops structure and use the internal
transaction dfops instead. The lower level xattr code already
appropriately accesses ->t_dfops throughout.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The dfops structure used by multi-transaction operations is
typically stored on the stack and carried around by the associated
transaction. The lifecycle of dfops does not quite match that of the
transaction, but they are tightly related in that the former depends
on the latter.
The relationship of these objects is tight enough that we can avoid
the cumbersome boilerplate code required in most cases to manage
them separately by just embedding an xfs_defer_ops in the
transaction itself. This means that a transaction allocation returns
with an initialized dfops, a transaction commit finishes pending
deferred items before the tx commit, a transaction cancel cancels
the dfops before the transaction and a transaction dup operation
transfers the current dfops state to the new transaction.
The dup operation is slightly complicated by the fact that we can no
longer just copy a dfops pointer from the old transaction to the new
transaction. This is solved through a dfops move helper that
transfers the pending items and other dfops state across the
transactions. This also requires that transaction rolling code
always refer to the transaction for the current dfops reference.
Finally, to facilitate incremental conversion to the internal dfops
and continue to support the current external dfops mode of
operation, create the new ->t_dfops_internal field with a layer of
indirection. On allocation, ->t_dfops points to the internal dfops.
This state is overridden by callers who re-init a local dfops on the
transaction. Once ->t_dfops is overridden, the external dfops
reference is maintained as the transaction rolls.
This patch adds the fundamental ability to support an internal
dfops. All codepaths that perform deferred processing continue to
override the internal dfops until they are converted over in
subsequent patches.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Both structures have holes due to member alignment. Move dop_low to
the end of xfs_defer ops to sanitize the cache line alignment and
move t_flags to save 8 bytes in xfs_trans.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_defer_init() is currently used in two particular situations. The
first and most obvious case is raw initialization of an
xfs_defer_ops struct. The other case is partial reinit of
xfs_defer_ops on reuse due to iteration.
Most instances of the first case will be replaced by a single init
of a dfops embedded in the transaction. Init calls are still
technically required for the second case because the dfops may have
low space mode enabled or have joined items that need to be reset
before the dfops should be reused.
Since the current dfops usage expects either a final transaction
commit after xfs_defer_finish() or xfs_defer_init() if dfops is to
be reused, we can shift some of the init logic into
xfs_defer_finish() such that the latter returns with a reinitialized
dfops. This eliminates the second dependency noted above such that a
dfops is immediately ready for reuse after an xfs_defer_finish()
without the need to change any calling code.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
dop_committed is set when deferred item processing rolls the
transaction at least once, but is only ever accessed in tracepoints.
The transaction roll/commit events are already available via
independent tracepoints, so remove the otherwise unused field.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_defer_finish() has a couple quirks that are not safe with
respect to the upcoming internal dfops functionality. First,
xfs_defer_finish() attaches the passed in dfops structure to
->t_dfops and caches and restores the original value. Second, it
continues to use the initial dfops reference before and after the
transaction roll.
These behaviors assume that dop is an independent memory allocation
from the transaction itself, which may not always be true once
transactions begin to use an embedded dfops structure. In the latter
model, dfops processing creates a new xfs_defer_ops structure with
each transaction and the associated state is migrated across to the
new transaction.
Fix up xfs_defer_finish() to handle the possibility of the current
dfops changing after a transaction roll. Since ->t_dfops is used
unconditionally in this path, it is no longer necessary to
attach/restore ->t_dfops and pass it explicitly down to
xfs_defer_trans_roll(). Update dop in the latter function and the
caller to ensure that it always refers to the current dfops
structure.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The xattr remote value set/remove handlers both clear args.trans in
the error path without having cancelled the transaction. This leaks
the transaction, causes warnings around returning to userspace with
locks held and leads to system lockups or other general problems.
The higher level xfs_attr_[set|remove]() functions already detect
and cancel args.trans when set in the error path. Drop the NULL
assignments from the rmtval handlers and allow the callers to clean
up the transaction correctly.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
When inodes are freed in xfs_ifree(), di_flags is cleared (so extent size
hints are removed) but the actual extent size fields are left intact.
This causes the extent hint validators to fail on freed inodes which once
had extent size hints.
This can be observed (for example) by running xfs/229 twice on a
non-crc xfs filesystem, or presumably on V5 with ikeep.
Fixes: 7d71a67 ("xfs: verify extent size hint is valid in inode verifier")
Fixes: 02a0fda ("xfs: verify COW extent size hint is valid in inode verifier")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Use the "bad summary count" mount flag from the previous patch to skip
writing the unmount record to force log recovery at the next mount,
which will recalculate the summary counters for us.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Refactor the writing of the unmount record into a separate helper. No
functionality changes.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Filippo Giunchedi complained that xfs doesn't even perform basic sanity
checks of the fs summary counters at mount time. Therefore, recalculate
the summary counters from the AGFs after log recovery if the counts were
bad (or we had to recover the fs). Enhance the recalculation routine to
fail the mount entirely if the new values are also obviously incorrect.
We use a mount state flag to record the "bad summary count" state so
that the (subsequent) online fsck patches can detect subtlely incorrect
counts and set the flag; clear it userspace asks for a repair; or force
a recalculation at the next mount if nobody fixes it by unmount time.
Reported-by: Filippo Giunchedi <fgiunchedi@wikimedia.org>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Less trivial cleanups of the error argument to xfs_btree_del_cursor;
these require some minor code refactoring.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The error argument to xfs_btree_del_cursor already understands the
"nonzero for error" semantics, so remove pointless error testing in the
callers and pass it directly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The following assertion was seen on generic/051:
XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
invalid opcode: 0000 [#1] SMP PTI
CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
RIP: 0010:assfail+0x23/0x30
Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa
RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
FS: 00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
Call Trace:
xfs_defer_init+0xff/0x160
xfs_reflink_remap_extent+0x31b/0xa00
xfs_reflink_remap_blocks+0xec/0x4a0
xfs_reflink_remap_range+0x3a1/0x650
xfs_file_dedupe_range+0x39/0x50
vfs_dedupe_file_range+0x218/0x260
do_vfs_ioctl+0x262/0x6a0
? __se_sys_newfstat+0x3c/0x60
ksys_ioctl+0x35/0x60
__x64_sys_ioctl+0x11/0x20
do_syscall_64+0x4b/0x190
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The root cause of the assertion failure is that xfs_defer_finish doesn't
roll the transaction after processing all the deferred items. Therefore
it returns a dirty transaction to the caller, which leaves the caller at
risk of exceeding the transaction reservation if it logs more items.
Brian Foster's patchset to move the defer_ops firstblock into the
transaction requires t_firstblock == NULLFSBLOCK upon defer_ops
initialization, which is how this was noticed at all.
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Check the leaf attribute freemap when we're verifying the block.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
No significant changes, just silence a couple of sparse errors.
Using cpu_to_be32(NULLAGINO), the NULLAGINO constant will be encoded in
BE as a constant, avoiding a BE -> CPU conversion every iteraction of
the loop, if be32_to_cpu(agi->agi_unlinked[i]) was used instead.
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Make use of the swap macro and remove unnecessary variable *tmp*. This
makes the code easier to read and maintain. Also, slightly refactor some
code.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Make use of the swap macro and remove some unnecessary variables.
This makes the code easier to read and maintain. Also, reduces the
stack usage.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The original rmap code assumed that there would always be at least one
rmap in the rmapbt (the AG sb/agf/agi) and so errored out if it didn't
find one. This assumption isn't true for the rmapbt repair function
(and it won't be true for realtime rmap either), so remove the check and
just deal with the situation.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Make sure we initialize *bno and *len, before jumping to out_bad_rec
label, and risk calling xfs_warn() with uninitialized variables.
Coverity: 100898
Coverity: 1437081
Coverity: 1437129
Coverity: 1437191
Coverity: 1437201
Coverity: 1437212
Coverity: 1437341
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All but one caller of xfs_defer_init() passes in the ->t_firstblock
of the associated transaction. The one outlier is
xlog_recover_process_intents(), which simply passes a dummy value
because a valid pointer is required. This firstblock variable can
simply be removed.
At this point we could remove the xfs_defer_init() firstblock
parameter and initialize ->t_firstblock directly. Even that is not
necessary, however, because ->t_firstblock is automatically
reinitialized in the new transaction on a transaction roll. Since
xfs_defer_init() should never occur more than once on a particular
transaction (since the corresponding finish will roll it), replace
the reinit from xfs_defer_init() with an assert that verifies the
transaction has a NULLFSBLOCK firstblock.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_refcount_recover_cow_leftovers() has no need for a firstblock
variable and so passes an unrelated xfs_fsblock_t to
xfs_defer_init() to avoid declaring one. Replace this no-op
initialization with ->t_firstblock. This will be optimized away by
the removal of the xfs_defer_init() firstblock param.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The xfs_alloc_arg.firstblock field is used to control the starting
agno for an allocation. The structure already carries a pointer to
the transaction, which carries the current firstblock value.
Remove the field and access ->t_firstblock directly in the
allocation code.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The bmbt cursor private structure has a firstblock field that is
used to maintain locking order on bmbt allocations. The field holds
an actual firstblock value (as opposed to a pointer), so it is
initialized on cursor creation, updated on allocation and then the
value is transferred back to the source before the cursor is
destroyed.
This value is always transferred from and back to the ->t_firstblock
field. Since xfs_btree_cur already carries a reference to the
transaction, we can remove this field from xfs_btree_cur and the
associated copying. The bmbt allocations will update the value in
the transaction directly.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The bmap format helpers receive firstblock via ->t_firstblock. Drop
the param and access it directly.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The add extent helpers all receive firstblock via ->t_firstblock.
Drop the parameter and access it directly.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The xfs_bmalloca.firstblock field carries the firstblock value from
the transaction into the bmap infrastructure. It's initialized in
one place from ->t_firstblock, so drop the field and access
->t_firstblock directly throughout the bmap code.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Also remove the unnecessary xfs_bmap_split_extent_at() parameter.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The only callers pass ->t_firstblock.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All callers pass ->t_firstblock from the current transaction.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All callers pass ->t_firstblock from the current transaction.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Convert all xfs_bunmapi() callers to ->t_firstblock.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Similar to the dirops code, the xattr code uses an on-stack
firstblock variable for the various operations. This code rolls the
underlying transaction in various places, however, which means we
cannot simply replace the local firstblock vars with ->t_firstblock.
Doing so (without further changes) would invalidate the memory
pointed to by xfs_da_args.firstblock as soon as the first
transaction rolls.
To avoid this problem, remove xfs_da_args.firstblock and replace all
such accesses with ->t_firstblock at the same time. This ensures
that accesses to the current firstblock always occur through the
current transaction rather than a potentially invalid xfs_da_args
pointer.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Note that this codepath is a user of struct xfs_da_args. Switch it
over to ->t_firstblock in preparation to remove
xfs_da_args.firstblock.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All callers of the xfs_dir_*() functions pass ->t_firstblock as the
firstblock parameter. Drop the parameter and access ->t_firstblock
directly.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
xfs_bmapi_write() always expects a valid firstblock pointer. It
immediately dereferences the pointer to help determine how to
initialize the bma.minleft field. The remaining accesses are
related to modifying btree format forks, which is only relevant for
!COW fork callers.
The reflink code passes a NULL transaction to xfs_bmapi_write() in a
couple places that do COW fork unwritten conversion. The purpose of
the firstblock field is to track the first block allocation in the
current transaction, so technically firstblock should not be
required for these callers either.
Tweak xfs_bmapi_write() to initialize the bma correctly without
accessing the firstblock pointer if no transaction is provided in
the first place. Update the reflink callers to pass NULL instead of
otherwise unused firstblock references.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Most callers of xfs_defer_init() immediately attach the dfops
structure to a transaction. Add a transaction parameter to eliminate
much of this boilerplate code. This also helps self-document the
fact that many codepaths now expect a dfops pointer implicitly via
xfs_trans->t_dfops.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Use ->t_dfops of the leftover COW reservation cleanup transaction.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The xfs_btree_cur.bc_private.a.dfops field is only ever initialized
by the refcountbt cursor init function. The only caller of that
function with a non-NULL dfops is from deferred completion context,
which already has attached to ->t_dfops.
In addition to that, the only actual reference of a.dfops is the
cursor duplication function, which means the field is effectively
unused.
Remove the dfops field from the bc_private.a union. Any future users
can acquire the dfops from the transaction. This patch does not
change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All assignments of xfs_btree_cur.bc_private.b.dfops originate from
->t_dfops. Replace accesses of the former with the latter and remove
the unnecessary field. This patch does not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All callers of the various bmap extent helpers now use ->t_dfops.
Remove the unnecessary dfops params and access ->t_dfops directly.
This patch does not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Use ->t_dfops for the collapse and insert range transactions. These
are the only callers of the respective bmap helpers, so replace the
unnecessary dfops parameters with direct accesses to ->t_dfops.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that bma.dfops is only assigned from ->t_dfops, replace all
accesses to the former with the latter and remove the unnecessary
field. This patch does not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All xfs_bmapi_remap() callers already use ->t_dfops. Note that
deferred completion context unconditionally sets ->t_dfops if it
hasn't already been set by the caller. Remove the unnecessary
parameter and access ->t_dfops directly.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that all xfs_bunmapi() callers use ->t_dfops, remove the
unnecessary parameter and access ->t_dfops directly. This patch does
not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that all callers use ->t_dfops, the xfs_bmapi_write() dfops
parameter is no longer necessary. Remove it and access ->t_dfops
directly. This patch does not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that xfs_da_args->dfops is always assigned from a ->t_dfops
pointer (or one that is immediately attached), replace all
downstream accesses of the former with the latter and remove the
field from struct xfs_da_args.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Attach the local dfops to ->t_dfops of the extent split transaction.
Since this is the only caller of xfs_bmap_split_extent_at(), remove
the dfops parameter as well.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that the attribute fork add tx carries dfops along with the
transaction, it is unnecessary to pass it down the stack. Remove the
dfops parameter and access ->t_dfops directly where necessary. This
patch does not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Attach the local dfops to the transaction allocated for xattr add
and remove operations. Add an earlier initialization in
xfs_attr_remove() to ensure the structure is valid if it remains
unused at transaction commit time.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
All callers of the directory create, rename and remove interfaces
already associate the dfops with the transaction. Drop the dfops
parameters in these calls in preparation for further cleanups in the
layers below. This patch does not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The inode free callchain starting in xfs_inactive_ifree() already
associates its dfops with the transaction. It still passes the dfops
on the stack down through xfs_difree_inobt(), however.
Clean up the call stack and reference dfops directly from the
transaction. This patch does not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The ->t_agfl_dfops field is currently used to defer agfl block frees
from associated transaction contexts. While all known problematic
contexts have already been updated to use ->t_agfl_dfops, the
broader goal is defer agfl frees from all callers that already use a
deferred operations structure. Further, the transaction field
facilitates a good amount of code clean up where the transaction and
dfops have historically been passed down through the stack
separately.
Rename the field to something more generic to prepare to use it as
such throughout XFS. This patch does not change behavior.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>