Go to file
Omar Sandoval c495dcd6fb btrfs: don't prematurely free work in run_ordered_work()
We hit the following very strange deadlock on a system with Btrfs on a
loop device backed by another Btrfs filesystem:

1. The top (loop device) filesystem queues an async_cow work item from
   cow_file_range_async(). We'll call this work X.
2. Worker thread A starts work X (normal_work_helper()).
3. Worker thread A executes the ordered work for the top filesystem
   (run_ordered_work()).
4. Worker thread A finishes the ordered work for work X and frees X
   (work->ordered_free()).
5. Worker thread A executes another ordered work and gets blocked on I/O
   to the bottom filesystem (still in run_ordered_work()).
6. Meanwhile, the bottom filesystem allocates and queues an async_cow
   work item which happens to be the recently-freed X.
7. The workqueue code sees that X is already being executed by worker
   thread A, so it schedules X to be executed _after_ worker thread A
   finishes (see the find_worker_executing_work() call in
   process_one_work()).

Now, the top filesystem is waiting for I/O on the bottom filesystem, but
the bottom filesystem is waiting for the top filesystem to finish, so we
deadlock.

This happens because we are breaking the workqueue assumption that a
work item cannot be recycled while it still depends on other work. Fix
it by waiting to free the work item until we are done with all of the
related ordered work.

P.S.:

One might ask why the workqueue code doesn't try to detect a recycled
work item. It actually does try by checking whether the work item has
the same work function (find_worker_executing_work()), but in our case
the function is the same. This is the only key that the workqueue code
has available to compare, short of adding an additional, layer-violating
"custom key". Considering that we're the only ones that have ever hit
this, we should just play by the rules.

Unfortunately, we haven't been able to create a minimal reproducer other
than our full container setup using a compress-force=zstd filesystem on
top of another compress-force=zstd filesystem.

Suggested-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-11-18 12:46:48 +01:00
arch Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-16 16:10:59 -08:00
block iocost: check active_list of all the ancestors in iocg_activate() 2019-11-14 13:56:54 -07:00
certs PKCS#7: Refactor verify_pkcs7_signature() 2019-08-05 18:40:18 -04:00
crypto Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
Documentation Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-12 10:53:24 -08:00
drivers Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2019-11-17 08:15:41 -08:00
fs btrfs: don't prematurely free work in run_ordered_work() 2019-11-18 12:46:48 +01:00
include IOMMU Fixes for Linux v5.4-rc7 2019-11-17 11:27:44 -08:00
init Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
ipc ipc/sem.c: convert to use built-in RCU list checking 2019-09-25 17:51:41 -07:00
kernel Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-17 08:30:38 -08:00
lib lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations 2019-11-15 18:34:00 -08:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
mm mm/debug.c: PageAnon() is true for PageKsm() pages 2019-11-15 18:34:00 -08:00
net ipmr: Fix skb headroom in ipmr_get_route(). 2019-11-16 13:06:54 -08:00
samples samples/bpf: fix build by setting HAVE_ATTR_TEST to zero 2019-10-31 21:39:15 +01:00
scripts arm64 fix for -rc8 / final 2019-11-15 09:14:23 -08:00
security efi/efi_test: Lock down /dev/efi_test and require CAP_SYS_ADMIN 2019-10-31 09:40:21 +01:00
sound ALSA: usb-audio: Fix incorrect size check for processing/extension units 2019-11-14 18:01:22 +01:00
tools Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2019-11-16 15:52:00 -08:00
usr kbuild: update compile-test header list for v5.4-rc2 2019-10-05 15:29:49 +09:00
virt KVM: Add a comment describing the /dev/kvm no_compat handling 2019-11-15 10:14:04 +01:00
.clang-format clang-format: Update with the latest for_each macro list 2019-08-31 10:00:51 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Modules updates for v5.4 2019-09-22 10:34:46 -07:00
.mailmap ARM: SoC fixes 2019-11-10 13:41:59 -08:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS: Remove Simon as Renesas SoC Co-Maintainer 2019-10-10 08:12:51 -07:00
Kbuild kbuild: do not descend to ./Kbuild when cleaning 2019-08-21 21:03:58 +09:00
Kconfig docs: kbuild: convert docs to ReST and rename to *.rst 2019-06-14 14:21:21 -06:00
MAINTAINERS IOMMU Fixes for Linux v5.4-rc7 2019-11-17 11:27:44 -08:00
Makefile Linux 5.4-rc8 2019-11-17 14:47:30 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.