The volume mutex does not protect against anything in this case, the
comment about scrub is right but not related to locking and looks
confusing. The comment in btrfs_find_device_missing_or_by_path is wrong
and confusing too.
The device_list_mutex is not held here to protect device lookup, but in
this case device replace cannot run in parallel with device removal (due
to exclusive op protection), so we don't need further locking here.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function __cancel_balance name is confusing with the cancel
operation of balance and it really resets the state of balance back to
zero. The unset_balance_control helper is called only from one place and
simple enough to be inlined.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Replace a WARN_ON with a proper check and message in case something goes
really wrong and resumed balance cannot set up its exclusive status.
The check is a user friendly assertion, I don't expect to ever happen
under normal circumstances.
Also document that the paused balance starts here and owns the exclusive
op status.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The device replace is paused by unmount or read only remount, and
resumed on next mount or write remount.
The exclusive status should be checked properly as it's a global
invariant and we must not allow 2 operations run. In this case, the
balance can be also paused and resumed under same conditions. It's
always checked first so dev-replace could see the EXCL_OP already taken,
BUT, the ioctl would never let start both at the same time.
Replace the WARN_ON with message and return 0, indicating no error as
this is purely theoretical and the user will be informed. Resolving that
manually should be possible by waiting for the other operation to finish
or cancel the paused state.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Move locking and unlocking next to the BTRFS_FS_EXCL_OP bit manipulation
so it's obvious that the two happen at the same time.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function logically belongs there and there's only a single caller,
no need to export it. No code changes.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function will be used outside of volumes.c, the allocation
btrfs_alloc_device is also exported.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a preparatory cleanup that will make clear that the only
successful way out of btrfs_init_dev_replace_tgtdev will also set the
device_out to a valid pointer. With this guarantee, the callers can be
simplified.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function is called once and is fairly small, we can merge it with
the caller.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function uses fs_info::fs_devices number of time, however we
declare and use it only at the end, instead do it in the beginning of
the function and use it.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
find_device() declares struct list_head *head pointer and used only once,
instead just use it directly.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
__btrfs_open_devices() is un-exported drop __ prefix and rename it to
open_fs_devices().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
__btrfs_close_devices() is un-exported, drop the __ prefix and rename it
to close_fs_devices().
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
__btrfs_open_devices() declares struct list_head *head, however head is
used only once, instead use btrfs_fs_devices::devices directly.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
btrfs_fs_devices::list is the list of BTRFS fsid in the kernel, a generic
name 'list' makes it's search very difficult, rename it to fs_list.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function is called from only 1 place and is effectively a wrapper
over wait_completion/kfree. It doesn't really bring any value having
those two calls in a separate function. Just open code it and remove it.
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It can be directly referenced from the passed address_space so do that.
No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
list_empty_careful usually is a signal of something tricky going on. Its
usage in btrfs is actually not needed since both lists it's used on are
local to a function and cannot be modified concurrently. So switch to
plain list_empty. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function is called only from btrfs_readpage and is already passed
the mapping. Simplify its signature by moving the code obtaining
reference to the extent tree in the function. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's not used in the function so just remove it. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This function already gets the page from which the two extent trees
are referenced. Simplify its signature by moving the code getting the
trees inside the function. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Change the behavior of rmdir(2) and allow it to delete an empty
subvolume by using btrfs_delete_subvolume() which is used by
btrfs_ioctl_snap_destroy().
This is a change in behaviour and has been requested by users. Deleting
the subvolume by ioctl requires root permissions while the rmdir way
does works with standard tools and syscalls for all users that can
access the subvolume.
The main usecase is to allow 'rm -rf /path/with/subvols' to simply work.
We were not able to find any nasty usability surprises, the intention is
to do the destructive rm. Without allowing rmdir, this would have to be
followed by the ioctl subvolume deletion, which is more of an annoyance.
Implementation details:
The required lock for @dir and inode of @dentry is already acquired in
vfs layer.
We need some check before deleting a subvolume. Permission check is done
in vfs layer, emptiness check is in btrfs_rmdir() and additional check
(i.e. neither the subvolume is a default subvolume nor send is in progress)
is in btrfs_delete_subvolume().
Note that in btrfs_ioctl_snap_destroy(), d_delete() is called after
btrfs_delete_subvolume(). For rmdir(2), d_delete() is called in vfs
layer later.
Tested-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ enhance changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Factor out the second half of btrfs_ioctl_snap_destroy() as
btrfs_delete_subvolume(), which performs some subvolume specific checks
before deletion:
1. send is not in progress
2. the subvolume is not the default subvolume
3. the subvolume does not contain other subvolumes
and actual deletion process. btrfs_delete_subvolume() requires
inode_lock for both @dir and inode of @dentry. The remaining part of
btrfs_ioctl_snap_destroy() is mainly permission checks.
Note that call of d_delete() is not included in btrfs_delete_subvolume()
as this function will also be used by btrfs_rmdir() to delete an empty
subvolume and in that case d_delete() is called in VFS layer.
As a result, btrfs_unlink_subvol() and may_destroy_subvol()
become static functions. No functional changes.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor comment updates ]
Signed-off-by: David Sterba <dsterba@suse.com>
This is a preparation work to refactor btrfs_ioctl_snap_destroy()
and to allow rmdir(2) to delete an empty subvolume.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor update of the function comment ]
Signed-off-by: David Sterba <dsterba@suse.com>
With commit b18253ec57c0 ("btrfs: optimize free space tree bitmap
conversion"), there are no more callers to le_test_bit(). This patch
removes le_test_bit().
Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Presently, convert_free_space_to_extents() does a linear scan of the
bitmap. We can speed this up with find_next_{bit,zero_bit}_le().
This patch replaces the linear scan with find_next_{bit,zero_bit}_le().
Testing shows a 20-33% decrease in execution time for
convert_free_space_to_extents().
Since we change bitmap to be unsigned long, we have to do some casting
for the bitmap cursor. In le_bitmap_set() it makes sense to use u8, as
we are doing bit operations. Everywhere else, we're just using it for
pointer arithmetic and not directly accessing it, so char seems more
appropriate.
Suggested-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
le_bitmap_set() is only used by free-space-tree, so move it there and
make it static. le_bitmap_clear() is not used, so remove it.
Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We really want to know to which filesystem the extent map events belong,
but as it cannot be reached from the extent_map pointers, we need to
pass it down the callchain.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Preparatory work to pass fs_info to btrfs_add_extent_mapping so we can
get a better tracepoint message. Extent maps do not need fs_info for
anything so we only add a dummy one without any other initialization.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Most of the strings are prefixed by the UUID of the filesystem that
generates the message, however there are a few events that still
opencode the macro magic and can be converted to the common macros.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The preferred style is to avoid spaces between key and value and no
commas between key=values.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The (unsigned long long) casts are not necessary since long ago.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The size of ino_t depends on 32/64bit architecture type. Btrfs stores
the full 64bit inode anyway so we should use it.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The second if is really a subcase of ret being less than 0. So
introduce a generic if (ret < 0) check, and inside have another if
which explicitly handles the -ENOSPC and any other errors. No
functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Locks should generally be released in the oppposite order they are
acquired. Generally lock acquisiton ordering is used to ensure
deadlocks don't happen. However, as becomes more complicated it's
best to also maintain proper unlock order so as to avoid possible dead
locks. This was found by code inspection and doesn't necessarily lead
to a deadlock scenario.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently __endio_write_update_ordered uses labels to implement
what is essentially a simple while loop. This makes the code more
cumbersome to follow than it actually has to be. No functional
changes. No xfstest regressions were found during testing.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Adds comments about BTRFS_FS_EXCL_OP to existing comments
about the device locks.
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ minor updates ]
Signed-off-by: David Sterba <dsterba@suse.com>
It's used to print its pointer in a debug statement but doesn't really
bring any useful information to the error message.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The function btrfs_get_block_group_info() was introduced by the
commit 5af3e8cce8 ("Btrfs: make filesystem read-only when submitting
barrier fails") which used it in disk-io.c.
However, the function is only called in ioctl.c now.
Its parameter type btrfs_ioctl_space_info* is only for ioctl.
So, make it static and rename it to be original name
get_block_group_info.
No functional change.
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
add_pinned_bytes really cares whether the bytes being pinned are either
data or metadata. To that effect it checks whether the 'owner' argument
is less than BTRFS_FIRST_FREE_OBJECTID (256). This works because
owner can really have 2 types of values:
a) For metadata extents it holds the level at which the parent is in
the btree. This amounts to owner having the values 0-7
b) In case of modifying data extents, owner is the inode number
to which those extents belongs.
Let's make this more explicit byt converting the owner parameter to a
boolean value and either pass it directly when we know the type of
extents we are working with (i.e. in btrfs_free_tree_block). In cases
when the parent function can be called on both metadata/data extents
perform the check in the caller. This hopefully makes the interface
of add_pinned_bytes more intuitive.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
- enable -fno-tree-loop-im only when supported
- add -fno-PIE option before the asm-goto test
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJbCkFPAAoJED2LAQed4NsGZVwP/juiS21QZAiQbUdX3FRcyYGs
+FDNLOwNdSU18QkdVrcJ4tG8hxBZqhIU0kq1MVE72Yo10xX8u7ssUJ0ttrUo5qIb
vlXtnqMOaZgLWnoNMGlDVPnNBxZh2UscbvjVGa5m9eqXrCU9AtQiCCoSceRtka12
tOBbfeTeJ8Ab2BKfzHcuqS+DSURkQGTyG4q1ZMxmdtIsltbZIez/zauRtAU/ULKx
Ed6HAdNiiMXRwsXnAwcGnJe9FyW7UPjZOdLn0vSizZQe8BJ+H+EotZy7FO8L407w
lgLVccCSZEFAilJRR+Xa1pMlg1KwSINcMK9BVOjIeeZL0kAIaC1zzVaPEbZ1MyDA
HKtX/MeDGX52ZW9SBCFQYKVsZQecYtyr27Z+c+8Af37sB3/ffBSeQc7YilsIGjSZ
MWARYbkOAcUif8IG6ymnEv2a4IOcD4rYNMkUfs8vXeJjejiP5rhA8zxWYng1DRmw
0g4x2iQeY7erUu/elflNa94e+PSgnwnmzWdloBqcmOtGxV+K+9BVaNsVmchyMAzt
PbQq1T8zodfr2+Jsf+yj1rWv3fLnahYh/WVAKj1rB/+Q31sYfvPlEmzayk2k9enK
Sgu5amtl64tgZD3zcSs1Ik39Ioe7s1Kf0W1Li8f2v1JR5t38UX5zkOa+O5w+sq77
NSBoCCRtn0eY3j/wo5kS
=r3P/
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild fixes from Masahiro Yamada:
- enable '-fno-tree-loop-im' only when supported
- add '-fno-PIE' option before the asm-goto test
* tag 'kbuild-fixes-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Makefile: disable PIE before testing asm goto
kbuild: gcov: enable -fno-tree-loop-im if supported
A few more fixes for v4.17:
- A fix for a crash in scm_call_atomic on qcom platforms
- Display fix for Allwinner A10
- A fix that re-enables ethernet on Allwinner H3 (C.H.I.P et al)
- A fix for eMMC corruption on hikey
- i2c-gpio descriptor tables for ixp4xx
+ a small typo fix
-----BEGIN PGP SIGNATURE-----
iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAlsJynEPHG9sb2ZAbGl4
b20ubmV0AAoJEIwa5zzehBx3QEkP/A5dGXeQkArCWPvWoFr+20KjIS07f7F8olNy
9JKG3R2uEZsqjD3c6HFkd1abTtUQmgg/hmpxakAI8vbypA4gsq9jyFC6TxqsBSyz
uw7hQ5XcGA99pQXp8jYUrazi/XnG9Wm8LLBslsx75wJwNikzlAl6PStKDFcz0Pr6
A9JXWnqFY50YRzUr4y9GrSo3o4dvVniF3PUFEwnYliUI5qszph2/rwaE2zLQt/PT
X0DMA4v+c+4ngS5TGipY4vFjRyvsOv/NeDQzGTvGcU6QMdP4ZEsQBrye6BqowmaD
DqaoSHvsi7Lel4u29p5KyBKrM0bAhtFX+iCGiqTfkKwRWHkh7CHombUk2qX/9OJW
oB9orkKgiP35xAL5xFmB5tf03s0tQ8/qicE72tGW/TVIEBX/l+ymD76DH4rmYvRw
wNZ+HwHrMVkYgVG0TQIxxEgkXbPsyDbk3DbNbQkHf/pV5+PsMrp0iSo7oaglsS9Y
NYTRA/DQCldzhv68YRoMBh5gD4oE5iK3e3c4nLm80vd7zj8YsuXnc4+55a8PrHfs
oVg0PE5fVlP3AVRJW09ikdf03U7m0AFX/fFKHrAwWylT1+Z1KSJhM4ZaXGgdvuOV
asFUenzF3WF6Nsx+smL/vLzr/AvvYeq80Q9OdLWQl4056HurkrpL/E2HVj4MYaoW
WKKRdfzX
=mga+
-----END PGP SIGNATURE-----
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"A few more fixes for v4.17:
- a fix for a crash in scm_call_atomic on qcom platforms
- display fix for Allwinner A10
- a fix that re-enables ethernet on Allwinner H3 (C.H.I.P et al)
- a fix for eMMC corruption on hikey
- i2c-gpio descriptor tables for ixp4xx
... plus a small typo fix"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: Fix i2c-gpio GPIO descriptor tables
arm64: dts: hikey: Fix eMMC corruption regression
firmware: qcom: scm: Fix crash in qcom_scm_call_atomic1()
ARM: sun8i: v3s: fix spelling mistake: "disbaled" -> "disabled"
ARM: dts: sun4i: Fix incorrect clocks for displays
ARM: dts: sun8i: h3: Re-enable EMAC on Orange Pi One
Pull x86 store buffer fixes from Thomas Gleixner:
"Two fixes for the SSBD mitigation code:
- expose SSBD properly to guests. This got broken when the CPU
feature flags got reshuffled.
- simplify the CPU detection logic to avoid duplicate entries in the
tables"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/speculation: Simplify the CPU bug detection logic
KVM/VMX: Expose SSBD properly to guests
Pull scheduler fixes from Thomas Gleixner:
"Three fixes for scheduler and kthread code:
- allow calling kthread_park() on an already parked thread
- restore the sched_pi_setprio() tracepoint behaviour
- clarify the unclear string for the scheduling domain debug output"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched, tracing: Fix trace_sched_pi_setprio() for deboosting
kthread: Allow kthread_park() on a parked kthread
sched/topology: Clarify root domain(s) debug string