linux_dsm_epyc7002/include
Michal Hocko 6b31d5955c mm, oom: fix potential data corruption when oom_reaper races with writer
Wenwei Tao has noticed that our current assumption that the oom victim
is dying and never doing any visible changes after it dies, and so the
oom_reaper can tear it down, is not entirely true.

__task_will_free_mem consider a task dying when SIGNAL_GROUP_EXIT is set
but do_group_exit sends SIGKILL to all threads _after_ the flag is set.
So there is a race window when some threads won't have
fatal_signal_pending while the oom_reaper could start unmapping the
address space.  Moreover some paths might not check for fatal signals
before each PF/g-u-p/copy_from_user.

We already have a protection for oom_reaper vs.  PF races by checking
MMF_UNSTABLE.  This has been, however, checked only for kernel threads
(use_mm users) which can outlive the oom victim.  A simple fix would be
to extend the current check in handle_mm_fault for all tasks but that
wouldn't be sufficient because the current check assumes that a kernel
thread would bail out after EFAULT from get_user*/copy_from_user and
never re-read the same address which would succeed because the PF path
has established page tables already.  This seems to be the case for the
only existing use_mm user currently (virtio driver) but it is rather
fragile in general.

This is even more fragile in general for more complex paths such as
generic_perform_write which can re-read the same address more times
(e.g.  iov_iter_copy_from_user_atomic to fail and then
iov_iter_fault_in_readable on retry).

Therefore we have to implement MMF_UNSTABLE protection in a robust way
and never make a potentially corrupted content visible.  That requires
to hook deeper into the PF path and check for the flag _every time_
before a pte for anonymous memory is established (that means all
!VM_SHARED mappings).

The corruption can be triggered artificially
(http://lkml.kernel.org/r/201708040646.v746kkhC024636@www262.sakura.ne.jp)
but there doesn't seem to be any real life bug report.  The race window
should be quite tight to trigger most of the time.

Link: http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org
Fixes: aac4536355 ("mm, oom: introduce oom reaper")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Wenwei Tao <wenwei.tww@alibaba-inc.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-18 15:32:01 -07:00
..
acpi ACPI: NUMA: add missing include in acpi_numa.h 2017-07-24 22:27:43 +02:00
asm-generic mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem 2017-08-10 15:54:07 -07:00
clocksource
crypto
drm i915, amd and some core fixes + mediatek color support 2017-07-13 11:26:18 -07:00
dt-bindings Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-07-15 10:59:54 -07:00
keys
kvm KVM: arm/arm64: PMU: Fix overflow interrupt injection 2017-07-25 14:18:01 +01:00
linux mm, oom: fix potential data corruption when oom_reaper races with writer 2017-08-18 15:32:01 -07:00
math-emu
media media: platform: davinci: drop VPFE_CMD_S_CCDC_RAW_PARAMS 2017-07-26 06:14:33 -04:00
memory
misc cxl: Export library to support IBM XSL 2017-07-03 23:07:03 +10:00
net ipv6: fix NULL dereference in ip6_route_dev_notify() 2017-08-15 17:06:34 -07:00
pcmcia
ras
rdma IB/cma: Fix reference count leak when no ipv4 addresses are set 2017-07-20 11:24:13 -04:00
rxrpc
scsi Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2017-07-13 14:27:32 -07:00
soc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2017-07-05 12:31:59 -07:00
sound ASoC: fix pcm-creation regression 2017-07-17 15:50:32 +01:00
target iscsi-target: Fix iscsi_np reset hung task during parallel delete 2017-08-06 14:41:41 -07:00
trace ext4: remove unused metadata accounting variables 2017-07-30 22:30:11 -04:00
uapi drm/msm: Remove __user from __u64 data types 2017-08-01 19:11:48 -04:00
video
xen xen/balloon: don't online new memory initially 2017-07-23 08:13:18 +02:00