linux_dsm_epyc7002/arch/x86/mm
Toshi Kani f4eafd8bcd x86/mm: Fix vmalloc_fault() to handle large pages properly
A kernel page fault oops with the callstack below was observed
when a read syscall was made to a pmem device after a huge amount
(>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64
system:

     BUG: unable to handle kernel paging request at ffff880840000ff8
     IP: vmalloc_fault+0x1be/0x300
     PGD c7f03a067 PUD 0
     Oops: 0000 [#1] SM
     Call Trace:
        __do_page_fault+0x285/0x3e0
        do_page_fault+0x2f/0x80
        ? put_prev_entity+0x35/0x7a0
        page_fault+0x28/0x30
        ? memcpy_erms+0x6/0x10
        ? schedule+0x35/0x80
        ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
        ? schedule_timeout+0x183/0x240
        btt_log_read+0x63/0x140 [nd_btt]
         :
        ? __symbol_put+0x60/0x60
        ? kernel_read+0x50/0x80
        SyS_finit_module+0xb9/0xf0
        entry_SYSCALL_64_fastpath+0x1a/0xa4

Since v4.1, ioremap() supports large page (pud/pmd) mappings in
x86_64 and PAE.  vmalloc_fault() however assumes that the vmalloc
range is limited to pte mappings.

vmalloc faults do not normally happen in ioremap'd ranges since
ioremap() sets up the kernel page tables, which are shared by
user processes.  pgd_ctor() sets the kernel's PGD entries to
user's during fork().  When allocation of the vmalloc ranges
crosses a 512GB boundary, ioremap() allocates a new pud table
and updates the kernel PGD entry to point it.  If user process's
PGD entry does not have this update yet, a read/write syscall
to the range will cause a vmalloc fault, which hits the Oops
above as it does not handle a large page properly.

Following changes are made to vmalloc_fault().

64-bit:

 - No change for the PGD sync operation as it handles large
   pages already.
 - Add pud_huge() and pmd_huge() to the validation code to
   handle large pages.
 - Change pud_page_vaddr() to pud_pfn() since an ioremap range
   is not directly mapped (while the if-statement still works
   with a bogus addr).
 - Change pmd_page() to pmd_pfn() since an ioremap range is not
   backed by struct page (while the if-statement still works
   with a bogus addr).

32-bit:
 - No change for the sync operation since the index3 PGD entry
   covers the entire vmalloc range, which is always valid.
   (A separate change to sync PGD entry is necessary if this
    memory layout is changed regardless of the page size.)
 - Add pmd_huge() to the validation code to handle large pages.
   This is for completeness since vmalloc_fault() won't happen
   in ioremap'd ranges as its PGD entry is always valid.

Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org> # 4.1+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-02-18 09:02:59 +01:00
..
kmemcheck x86: Replace __get_cpu_var uses 2014-08-26 13:45:49 -04:00
amdtopology.c x86/mm/numa: Simplify some bit mangling 2013-04-10 19:06:26 +02:00
debug_pagetables.c x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only 2015-12-04 12:55:01 +01:00
dump_pagetables.c Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-01-11 17:16:01 -08:00
extable.c x86, extable: Switch to relative exception table entries 2012-04-20 17:22:34 -07:00
fault.c x86/mm: Fix vmalloc_fault() to handle large pages properly 2016-02-18 09:02:59 +01:00
gup.c mm, x86: get_user_pages() for dax mappings 2016-01-15 17:56:32 -08:00
highmem_32.c kmap_atomic_to_page() has no users, remove it 2015-11-09 15:11:24 -08:00
hugetlbpage.c mm, hugetlb: don't require CMA for runtime gigantic pages 2016-02-05 18:10:40 -08:00
init_32.c x86/mm: Warn on W^X mappings 2015-10-06 11:11:48 +02:00
init_64.c x86, mm: introduce vmem_altmap to augment vmemmap_populate() 2016-01-15 17:56:32 -08:00
init.c libnvdimm for 4.4: 2015-11-10 12:07:22 -08:00
iomap_32.c Merge branch 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-06-22 17:59:09 -07:00
ioremap.c x86/mm: Drop WARN from multi-BAR check 2015-12-29 12:34:38 +01:00
kasan_init_64.c kasan: update log messages 2015-11-05 19:34:48 -08:00
kmmio.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
Makefile x86/mm: Turn CONFIG_X86_PTDUMP into a module 2015-11-23 10:50:13 +01:00
mm_internal.h x86: Enable PAT to use cache mode translation tables 2014-11-16 11:04:26 +01:00
mmap.c x86: mm: support ARCH_MMAP_RND_BITS 2016-01-14 16:00:49 -08:00
mmio-mod.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
mpx.c x86/mpx: Fix instruction decoder condition 2015-12-05 18:52:14 +01:00
numa_32.c x86: Fix the initialization of physnode_map 2014-02-01 22:15:51 -08:00
numa_64.c x86, mm: kill numa_free_all_bootmem() 2012-11-17 11:59:47 -08:00
numa_emulation.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
numa_internal.h x86-32, mm: Rip out x86_32 NUMA remapping code 2013-01-31 14:12:30 -08:00
numa.c x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernels 2016-02-08 12:10:03 +01:00
pageattr-test.c x86/mm/pat: Make mm/pageattr[-test].c explicitly non-modular 2015-08-25 09:48:38 +02:00
pageattr.c x86/mm/pat: Avoid truncation when converting cpa->numpages to address 2016-01-29 15:03:09 +01:00
pat_internal.h x86/mm/pat: Convert to pr_*() usage 2015-05-27 14:40:59 +02:00
pat_rbtree.c x86/mm/pat: Change free_memtype() to support shrinking case 2016-01-05 11:10:23 +01:00
pat.c mm, dax: convert vmf_insert_pfn_pmd() to pfn_t 2016-01-15 17:56:32 -08:00
pf_in.c x86: Eliminate various 'set but not used' warnings 2011-05-21 19:10:33 +02:00
pf_in.h x86 mmiotrace: move files into arch/x86/mm/. 2008-05-24 11:25:37 +02:00
pgtable_32.c x86: Remove set_pmd_pfn 2014-09-01 10:15:31 +02:00
pgtable.c x86, thp: remove infrastructure for handling splitting PMDs 2016-01-15 17:56:32 -08:00
physaddr.c x86, mm: Make DEBUG_VIRTUAL work earlier in boot 2013-01-25 16:33:22 -08:00
physaddr.h x86: split __phys_addr out into separate file 2009-09-10 11:48:55 -07:00
setup_nx.c x86/cpufeature: Remove unused and seldomly used cpu_has_xx macros 2015-12-19 11:49:55 +01:00
srat.c x86/mm: Introduce max_possible_pfn 2015-12-06 12:46:31 +01:00
testmmiotrace.c x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages 2010-06-18 11:30:09 +02:00
tlb.c x86/mm: Add barriers and document switch_mm()-vs-flush synchronization 2016-01-11 12:03:15 +01:00