Go to file
Gabriel Krisman Bertazi 36e1f3d107 blk-mq: Avoid memory reclaim when remapping queues
While stressing memory and IO at the same time we changed SMT settings,
we were able to consistently trigger deadlocks in the mm system, which
froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
waiting on the block layer remmaping completion, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event completion.

The simplest fix for this is to make allocations in this path
non-reclaimable, with GFP_NOIO.  With this patch, We couldn't hit the
issue anymore.

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-12-14 08:15:25 -07:00
arch arm64 updates for 4.10: 2016-12-13 16:39:21 -08:00
block blk-mq: Avoid memory reclaim when remapping queues 2016-12-14 08:15:25 -07:00
certs certs: Add a secondary system keyring that can be added to dynamically 2016-04-11 22:48:09 +01:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-12-10 16:21:55 -05:00
Documentation arm64 updates for 4.10: 2016-12-13 16:39:21 -08:00
drivers nvme/pci: Log PCI_STATUS when the controller dies 2016-12-13 21:07:08 -07:00
firmware WHENCE: use https://linuxtv.org for LinuxTV URLs 2015-12-04 10:35:11 -02:00
fs block_dev: don't update file access position for sync direct IO 2016-12-13 21:07:08 -07:00
include arm64 updates for 4.10: 2016-12-13 16:39:21 -08:00
init Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-12-13 12:59:57 -08:00
ipc sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q 2016-11-21 10:29:01 +01:00
kernel Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-12-13 12:59:57 -08:00
lib Merge branch 'stable/for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb 2016-12-13 15:52:23 -08:00
mm Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-12-13 12:59:57 -08:00
net Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 19:25:04 -08:00
samples VFIO updates for v4.10-rc1 2016-12-13 09:23:56 -08:00
scripts Char/Misc driver patches for 4.10-rc1 2016-12-13 12:11:01 -08:00
security Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 19:56:15 -08:00
sound Main pull request for drm for 4.10 kernel 2016-12-13 09:35:09 -08:00
tools arm64 updates for 4.10: 2016-12-13 16:39:21 -08:00
usr usr/Kconfig: make initrd compression algorithm selection not expert 2014-12-13 12:42:52 -08:00
virt Small release, the most interesting stuff is x86 nested virt improvements. 2016-12-13 15:47:02 -08:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild 2016-08-02 16:48:52 -04:00
.mailmap Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2016-10-15 09:26:12 -07:00
COPYING [PATCH] update FSF address in COPYING 2005-09-10 10:06:29 -07:00
CREDITS Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-12 15:23:02 -08:00
Kbuild scripts/gdb: provide linux constants 2016-05-23 17:04:14 -07:00
Kconfig kbuild: migrate all arch to the kconfig mainmenu upgrade 2010-09-19 22:54:11 -04:00
MAINTAINERS Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2016-12-13 16:33:33 -08:00
Makefile Linux 4.9 2016-12-11 11:17:54 -08:00
README README: add a new README file, pointing to the Documentation/ 2016-10-24 08:12:35 -02:00

Linux kernel
============

This file was moved to Documentation/admin-guide/README.rst

Please notice that there are several guides for kernel developers and users.
These guides can be rendered in a number of formats, like HTML and PDF.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.