linux_dsm_epyc7002/arch
Sebastian Andrzej Siewior 5cf0791da5 x86/mm: Disable preemption during CR3 read+write
There's a subtle preemption race on UP kernels:

Usually current->mm (and therefore mm->pgd) stays the same during the
lifetime of a task so it does not matter if a task gets preempted during
the read and write of the CR3.

But then, there is this scenario on x86-UP:

TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by:

 -> mmput()
 -> exit_mmap()
 -> tlb_finish_mmu()
 -> tlb_flush_mmu()
 -> tlb_flush_mmu_tlbonly()
 -> tlb_flush()
 -> flush_tlb_mm_range()
 -> __flush_tlb_up()
 -> __flush_tlb()
 ->  __native_flush_tlb()

At this point current->mm is NULL but current->active_mm still points to
the "old" mm.

Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its
own mm so CR3 has changed.

Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's
mm and so CR3 remains unchanged. Once taskA gets active it continues
where it was interrupted and that means it writes its old CR3 value
back. Everything is fine because userland won't need its memory
anymore.

Now the fun part:

Let's preempt taskA one more time and get back to taskB. This
time switch_mm() won't do a thing because oldmm (->active_mm)
is the same as mm (as per context_switch()). So we remain
with a bad CR3 / PGD and return to userland.

The next thing that happens is handle_mm_fault() with an address for
the execution of its code in userland. handle_mm_fault() realizes that
it has a PTE with proper rights so it returns doing nothing. But the
CPU looks at the wrong PGD and insists that something is wrong and
faults again. And again. And one more time…

This pagefault circle continues until the scheduler gets tired of it and
puts another task on the CPU. It gets little difficult if the task is a
RT task with a high priority. The system will either freeze or it gets
fixed by the software watchdog thread which usually runs at RT-max prio.
But waiting for the watchdog will increase the latency of the RT task
which is no good.

Fix this by disabling preemption across the critical code section.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de
[ Prettified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-10 15:37:16 +02:00
..
alpha RTC for 4.8 2016-08-05 09:48:22 -04:00
arc dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
arm Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user 2016-08-08 14:48:14 -07:00
arm64 Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user 2016-08-08 14:48:14 -07:00
avr32 dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
blackfin dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
c6x dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
cris dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
frv RTC for 4.8 2016-08-05 09:48:22 -04:00
h8300 RTC for 4.8 2016-08-05 09:48:22 -04:00
hexagon dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
ia64 Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user 2016-08-08 14:48:14 -07:00
m32r mm: do not pass mm_struct into handle_mm_fault 2016-07-26 16:19:19 -07:00
m68k RTC for 4.8 2016-08-05 09:48:22 -04:00
metag Metag architecture changes for v4.8 2016-08-05 08:58:00 -04:00
microblaze dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
mips Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2016-08-06 09:13:11 -04:00
mn10300 RTC for 4.8 2016-08-05 09:48:22 -04:00
nios2 dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
openrisc dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
parisc RTC for 4.8 2016-08-05 09:48:22 -04:00
powerpc Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user 2016-08-08 14:48:14 -07:00
s390 Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user 2016-08-08 14:48:14 -07:00
score treewide: replace obsolete _refok by __ref 2016-08-02 17:31:41 -04:00
sh These changes improve device tree support (including builtin DTB), add 2016-08-06 09:00:05 -04:00
sparc Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user 2016-08-08 14:48:14 -07:00
tile tile: support static_key usage in non-module __exit sections 2016-08-04 08:50:07 -04:00
um Merge branch 'for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml 2016-08-04 19:37:59 -04:00
unicore32 dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
x86 x86/mm: Disable preemption during CR3 read+write 2016-08-10 15:37:16 +02:00
xtensa dma-mapping: use unsigned long for dma_attrs 2016-08-04 08:50:07 -04:00
.gitignore
Kconfig Implements HARDENED_USERCOPY verification of copy_to_user/copy_from_user 2016-08-08 14:48:14 -07:00