linux_dsm_epyc7002/arch
Michael Ellerman 09567e7fd4 powerpc/mm: Check paca psize is up to date for huge mappings
We have a bug in our hugepage handling which exhibits as an infinite
loop of hash faults. If the fault is being taken in the kernel it will
typically trigger the softlockup detector, or the RCU stall detector.

The bug is as follows:

 1. mmap(0xa0000000, ..., MAP_FIXED | MAP_HUGE_TLB | MAP_ANONYMOUS ..)
 2. Slice code converts the slice psize to 16M.
 3. The code on lines 539-540 of slice.c in slice_get_unmapped_area()
    synchronises the mm->context with the paca->context. So the paca slice
    mask is updated to include the 16M slice.
 3. Either:
    * mmap() fails because there are no huge pages available.
    * mmap() succeeds and the mapping is then munmapped.
    In both cases the slice psize remains at 16M in both the paca & mm.
 4. mmap(0xa0000000, ..., MAP_FIXED | MAP_ANONYMOUS ..)
 5. The slice psize is converted back to 64K. Because of the check on line 539
    of slice.c we DO NOT update the paca->context. The paca slice mask is now
    out of sync with the mm slice mask.
 6. User/kernel accesses 0xa0000000.
 7. The SLB miss handler slb_allocate_realmode() **uses the paca slice mask**
    to create an SLB entry and inserts it in the SLB.
18. With the 16M SLB entry in place the hardware does a hash lookup, no entry
    is found so a data access exception is generated.
19. The data access handler calls do_page_fault() -> handle_mm_fault().
10. __handle_mm_fault() creates a THP mapping with do_huge_pmd_anonymous_page().
11. The hardware retries the access, there is still nothing in the hash table
    so once again a data access exception is generated.
12. hash_page() calls into __hash_page_thp() and inserts a mapping in the
    hash. Although the THP mapping maps 16M the hashing is done using 64K
    as the segment page size.
13. hash_page() returns immediately after calling __hash_page_thp(), skipping
    over the code at line 1125. Resulting in the mismatch between the
    paca->context and mm->context not being detected.
14. The hardware retries the access, the hash it generates using the 16M
    SLB entry does NOT match the hash we inserted.
15. We take another data access and go into __hash_page_thp().
16. We see a valid entry in the hpte_slot_array and so we call updatepp()
    which succeeds.
17. Goto 14.

We could fix this in two ways. The first would be to remove or modify
the check on line 539 of slice.c.

The second option is to cause the check of paca psize in hash_page() on
line 1125 to also be done for THP pages.

We prefer the latter, because the check & update of the paca psize is
not done until we know it's necessary. It's also done only on the
current cpu, so we don't need to IPI all other cpus.

Without further rearranging the code, the simplest fix is to pull out
the code that checks paca psize and call it in two places. Firstly for
THP/hugetlb, and secondly for other mappings as before.

Thanks to Dave Jones for trinity, which originally found this bug.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: stable@vger.kernel.org [v3.11+]
2014-06-06 13:54:26 +10:00
..
alpha Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
arc ARC: Delete stale barrier.h 2014-04-18 13:49:15 -07:00
arm Shiraz has moved 2014-04-18 16:40:08 -07:00
arm64 - Documentation clarification on CPU topology and booting requirements 2014-04-08 12:06:03 -07:00
avr32 Char/Misc driver patches for 3.15-rc1 2014-04-01 16:13:21 -07:00
blackfin blackfin updates for Linux 3.15 2014-04-12 17:26:45 -07:00
c6x Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 10:59:39 -07:00
cris Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
frv PCI changes for the v3.15 merge window: 2014-04-01 15:14:04 -07:00
hexagon Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
ia64 Small workaround for a rare, but annoying, erratum 2014-04-16 11:22:45 -07:00
m32r Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
m68k Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
metag Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
microblaze Microblaze patches for 3.15-rc1 2014-04-11 11:53:45 -07:00
mips mips: export flush_icache_range 2014-04-18 16:40:09 -07:00
mn10300 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
openrisc Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
parisc Merge branch 'parisc-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2014-04-17 13:21:35 -07:00
powerpc powerpc/mm: Check paca psize is up to date for huge mappings 2014-06-06 13:54:26 +10:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2014-04-16 11:28:25 -07:00
score score: remove unused CPU_SCORE7 Kconfig parameter 2014-04-03 16:20:52 -07:00
sh Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
sparc Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
tile Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
um Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
unicore32 Kconfig: rename HAS_IOPORT to HAS_IOPORT_MAP 2014-04-07 16:36:11 -07:00
x86 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-19 10:41:43 -07:00
xtensa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
.gitignore
Kconfig uprobes: Kconfig dependency fix 2014-03-18 16:39:33 -04:00