linux_dsm_epyc7002/drivers
Michal Hocko c32b3cbe0d oom, PM: make OOM detection in the freezer path raceless
Commit 5695be142e ("OOM, PM: OOM killed task shouldn't escape PM
suspend") has left a race window when OOM killer manages to
note_oom_kill after freeze_processes checks the counter.  The race
window is quite small and really unlikely and partial solution deemed
sufficient at the time of submission.

Tejun wasn't happy about this partial solution though and insisted on a
full solution.  That requires the full OOM and freezer's task freezing
exclusion, though.  This is done by this patch which introduces oom_sem
RW lock and turns oom_killer_disable() into a full OOM barrier.

oom_killer_disabled check is moved from the allocation path to the OOM
level and we take oom_sem for reading for both the check and the whole
OOM invocation.

oom_killer_disable() takes oom_sem for writing so it waits for all
currently running OOM killer invocations.  Then it disable all the further
OOMs by setting oom_killer_disabled and checks for any oom victims.
Victims are counted via mark_tsk_oom_victim resp.  unmark_oom_victim.  The
last victim wakes up all waiters enqueued by oom_killer_disable().
Therefore this function acts as the full OOM barrier.

The page fault path is covered now as well although it was assumed to be
safe before.  As per Tejun, "We used to have freezing points deep in file
system code which may be reacheable from page fault." so it would be
better and more robust to not rely on freezing points here.  Same applies
to the memcg OOM killer.

out_of_memory tells the caller whether the OOM was allowed to trigger and
the callers are supposed to handle the situation.  The page allocation
path simply fails the allocation same as before.  The page fault path will
retry the fault (more on that later) and Sysrq OOM trigger will simply
complain to the log.

Normally there wouldn't be any unfrozen user tasks after
try_to_freeze_tasks so the function will not block. But if there was an
OOM killer racing with try_to_freeze_tasks and the OOM victim didn't
finish yet then we have to wait for it. This should complete in a finite
time, though, because

	- the victim cannot loop in the page fault handler (it would die
	  on the way out from the exception)
	- it cannot loop in the page allocator because all the further
	  allocation would fail and __GFP_NOFAIL allocations are not
	  acceptable at this stage
	- it shouldn't be blocked on any locks held by frozen tasks
	  (try_to_freeze expects lockless context) and kernel threads and
	  work queues are not frozen yet

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:03 -08:00
..
accessibility
acpi Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-02-10 20:01:30 -08:00
amba
android
ata SCSI misc on 20150209 2015-02-11 10:28:45 -08:00
atm atm: remove deprecated use of pci api 2015-01-18 00:28:41 -05:00
auxdisplay
base ACPI and power management updates for v3.20-rc1 2015-02-10 15:09:41 -08:00
bcma bcma: implement host code support for PCIe Gen 2 devices 2015-01-29 10:54:43 +02:00
block xen: features and fixes for 3.20-rc0 2015-02-10 13:56:56 -08:00
bluetooth Bluetooth: btusb: Add support for Lite-On (04ca) Broadcom based, BCM43142 2015-02-03 08:57:14 +01:00
bus mvebu fixes for 3.19. (Part 4) 2015-01-23 14:08:13 -08:00
cdrom
char ACPI and power management updates for v3.20-rc1 2015-02-10 15:09:41 -08:00
clk Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-02-10 20:01:30 -08:00
clocksource Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-01-27 16:59:56 -08:00
connector
coresight
cpufreq ACPI and power management updates for v3.20-rc1 2015-02-10 15:09:41 -08:00
cpuidle drivers: cpuidle: Don't initialize big.LITTLE driver if MCPM is unavailable 2015-01-23 15:05:48 +01:00
crypto
dca
devfreq ACPI and power management updates for v3.20-rc1 2015-02-10 15:09:41 -08:00
dio
dma resources: Move struct resource_list_entry from ACPI into resource core 2015-02-05 15:09:25 +01:00
dma-buf
edac EDAC, mv64x60_edac: Fix an error code in probe() 2015-01-30 17:00:43 +01:00
eisa
extcon
firewire
firmware * Move efivarfs from the misc filesystem section to pseudo filesystem, 2015-01-29 19:16:40 +01:00
fmc
gpio gpio: sysfs: fix memory leak in gpiod_sysfs_set_active_low 2015-01-30 10:29:33 +01:00
gpu sound updates for 3.20-rc1 2015-02-11 08:51:59 -08:00
hid Merge branches 'for-3.19/upstream-fixes', 'for-3.20/apple', 'for-3.20/betop', 'for-3.20/lenovo', 'for-3.20/logitech', 'for-3.20/rmi', 'for-3.20/upstream' and 'for-3.20/wacom' into for-linus 2015-02-09 11:17:45 +01:00
hsi hsi: nokia-modem: fix uninitialized device pointer 2015-01-04 20:19:30 +01:00
hv ACPICA: Resources: Provide common part for struct acpi_resource_address structures. 2015-01-26 16:09:56 +01:00
hwmon hwmon: (tmp102) add hibernation callbacks 2015-02-03 12:17:12 -08:00
hwspinlock
i2c i2c: sh_mobile: terminate DMA reads properly 2015-01-30 17:58:43 +01:00
ide
idle
iio Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2015-02-11 09:32:08 -08:00
infiniband Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-02-10 20:01:30 -08:00
input Merge branch 'next' into for-linus 2015-02-10 11:35:36 -08:00
iommu Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-09 16:57:56 -08:00
ipack
irqchip Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2015-02-06 08:28:54 -08:00
isdn drivers: isdn: isdnloop: isdnloop.c: Remove parenthesis around return values, as specified in CodingStyle. 2015-02-05 15:40:23 -08:00
leds leds: netxbig: fix oops at probe time 2015-01-13 13:49:01 -08:00
lguest
macintosh macintosh: therm_pm72: delete deprecated driver 2014-12-19 19:32:47 +01:00
mailbox ACPI / PCC: Use pr_debug() for debug messages in pcc_init() 2015-02-05 00:40:08 +01:00
mcb mcb: mcb-pci: Only remap the 1st 0x200 bytes of BAR 0 2015-01-09 15:46:37 -08:00
md Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-09 14:28:42 -08:00
media [media] dvb_net: Convert local hex dump to print_hex_dump_debug 2015-02-03 18:24:44 -02:00
memory MTD updates for 3.19: 2014-12-17 09:59:26 -08:00
memstick
message
mfd - Avoid platform ID collision in da9052 2015-01-21 18:29:44 +12:00
misc SCSI misc on 20150209 2015-02-11 10:28:45 -08:00
mmc mmc: sdhci-s3c: solve problem with sleeping in atomic context 2015-02-04 13:39:14 +01:00
mtd MTD updates for 3.19: 2014-12-17 09:59:26 -08:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-02-10 20:01:30 -08:00
nfc NFC: nci: Move NFCEE discovery logic 2015-02-04 09:15:18 +01:00
ntb
nubus
of ACPI and power management updates for v3.20-rc1 2015-02-10 15:09:41 -08:00
oprofile
parisc parisc/PCI: Clip bridge windows to fit in upstream windows 2015-01-16 10:04:43 -06:00
parport parport: parport_atari: Remove obsolete IRQ_TYPE_SLOW 2015-01-15 13:44:50 +01:00
pci ACPI and power management updates for v3.20-rc1 2015-02-10 15:09:41 -08:00
pcmcia
phy SCSI misc on 20150209 2015-02-11 10:28:45 -08:00
pinctrl pinctrl: at91: allow to have disabled gpio bank 2015-01-26 09:13:36 +01:00
platform Revert "platform: x86: dell-laptop: Add support for keyboard backlight" 2015-01-23 11:10:32 -08:00
pnp ACPI: Return translation offset when parsing ACPI address space resources 2015-02-03 22:27:21 +01:00
power power_supply: 88pm860x: Fix leaked power supply on probe fail 2015-01-28 15:08:10 +01:00
powercap powercap / RAPL: add IDs for future Xeon CPUs 2014-12-17 02:35:42 +01:00
pps
ps3
ptp
pwm pwm: Changes for v3.19-rc1 2014-12-17 10:10:51 -08:00
rapidio rapidio/tsi721: use PCI define for Max_Read_Request_Size 2015-01-27 08:14:26 -06:00
ras
regulator Merge remote-tracking branches 'regulator/topic/rk808', 'regulator/topic/rpm', 'regulator/topic/rt5033' and 'regulator/topic/tps65023' into regulator-next 2015-02-08 11:16:30 +08:00
remoteproc
reset reset: sunxi: fix spinlock initialization 2015-01-16 19:11:31 -08:00
rpmsg
rtc Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-02-09 17:53:53 -08:00
s390 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-01-27 16:59:56 -08:00
sbus
scsi SCSI misc on 20150209 2015-02-11 10:28:45 -08:00
sfi
sh
sn
soc
spi Merge remote-tracking branch 'spi/topic/xilinx' into spi-next 2015-02-08 11:17:01 +08:00
spmi
ssb ssb: Fix Sparse error in main 2015-01-29 10:17:56 +02:00
staging oom: add helpers for setting and clearing TIF_MEMDIE 2015-02-11 17:06:03 -08:00
target netlink: make nlmsg_end() and genlmsg_end() void 2015-01-18 01:03:45 -05:00
tc
thermal Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-01-27 16:59:56 -08:00
thunderbolt
tty oom, PM: make OOM detection in the freezer path raceless 2015-02-11 17:06:03 -08:00
uio
usb xilinx usb2 gadget: get rid of incredibly annoying compile warning 2015-02-11 10:52:56 -08:00
uwb
vfio vfio-pci: Fix the check on pci device type in vfio_pci_probe() 2015-01-07 10:29:11 -07:00
vhost Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-02-05 14:33:28 -08:00
video fbdev changes for v3.20 2015-02-11 09:24:30 -08:00
virt
virtio virtio_pci: document why we defer kfree 2015-01-06 16:35:36 +02:00
vlynq
vme
w1
watchdog watchdog: drop owner assignment from platform_drivers 2015-01-21 14:52:34 +01:00
xen SCSI misc on 20150209 2015-02-11 10:28:45 -08:00
zorro
Kconfig drivers/Kconfig: remove duplicate entry for soc 2015-01-25 20:26:42 +08:00
Makefile drivers: Move iommu/ before gpu/ in Makefile 2014-12-22 11:47:37 +02:00