mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2025-01-23 20:19:57 +07:00
5a28fc94c9
This is a bit of a mess, to put it mildly. But, it's a bug that only seems to have showed up in 4.20 but wasn't noticed until now, because nobody uses MPX. MPX has the arch_unmap() hook inside of munmap() because MPX uses bounds tables that protect other areas of memory. When memory is unmapped, there is also a need to unmap the MPX bounds tables. Barring this, unused bounds tables can eat 80% of the address space. But, the recursive do_munmap() that gets called vi arch_unmap() wreaks havoc with __do_munmap()'s state. It can result in freeing populated page tables, accessing bogus VMA state, double-freed VMAs and more. See the "long story" further below for the gory details. To fix this, call arch_unmap() before __do_unmap() has a chance to do anything meaningful. Also, remove the 'vma' argument and force the MPX code to do its own, independent VMA lookup. == UML / unicore32 impact == Remove unused 'vma' argument to arch_unmap(). No functional change. I compile tested this on UML but not unicore32. == powerpc impact == powerpc uses arch_unmap() well to watch for munmap() on the VDSO and zeroes out 'current->mm->context.vdso_base'. Moving arch_unmap() makes this happen earlier in __do_munmap(). But, 'vdso_base' seems to only be used in perf and in the signal delivery that happens near the return to userspace. I can not find any likely impact to powerpc, other than the zeroing happening a little earlier. powerpc does not use the 'vma' argument and is unaffected by its removal. I compile-tested a 64-bit powerpc defconfig. == x86 impact == For the common success case this is functionally identical to what was there before. For the munmap() failure case, it's possible that some MPX tables will be zapped for memory that continues to be in use. But, this is an extraordinarily unlikely scenario and the harm would be that MPX provides no protection since the bounds table got reset (zeroed). I can't imagine anyone doing this: ptr = mmap(); // use ptr ret = munmap(ptr); if (ret) // oh, there was an error, I'll // keep using ptr. Because if you're doing munmap(), you are *done* with the memory. There's probably no good data in there _anyway_. This passes the original reproducer from Richard Biener as well as the existing mpx selftests/. The long story: munmap() has a couple of pieces: 1. Find the affected VMA(s) 2. Split the start/end one(s) if neceesary 3. Pull the VMAs out of the rbtree 4. Actually zap the memory via unmap_region(), including freeing page tables (or queueing them to be freed). 5. Fix up some of the accounting (like fput()) and actually free the VMA itself. This specific ordering was actually introduced by:dd2283f260
("mm: mmap: zap pages with read mmap_sem in munmap") during the 4.20 merge window. The previous __do_munmap() code was actually safe because the only thing after arch_unmap() was remove_vma_list(). arch_unmap() could not see 'vma' in the rbtree because it was detached, so it is not even capable of doing operations unsafe for remove_vma_list()'s use of 'vma'. Richard Biener reported a test that shows this in dmesg: [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551 [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576 What triggered this was the recursive do_munmap() called via arch_unmap(). It was freeing page tables that has not been properly zapped. But, the problem was bigger than this. For one, arch_unmap() can free VMAs. But, the calling __do_munmap() has variables that *point* to VMAs and obviously can't handle them just getting freed while the pointer is still in use. I tried a couple of things here. First, I tried to fix the page table freeing problem in isolation, but I then found the VMA issue. I also tried having the MPX code return a flag if it modified the rbtree which would force __do_munmap() to re-walk to restart. That spiralled out of control in complexity pretty fast. Just moving arch_unmap() and accepting that the bonkers failure case might eat some bounds tables seems like the simplest viable fix. This was also reported in the following kernel bugzilla entry: https://bugzilla.kernel.org/show_bug.cgi?id=203123 There are some reports that this commit triggered this bug:dd2283f260
("mm: mmap: zap pages with read mmap_sem in munmap") While that commit certainly made the issues easier to hit, I believe the fundamental issue has been with us as long as MPX itself, thus the Fixes: tag below is for one of the original MPX commits. [ mingo: Minor edits to the changelog and the patch. ] Reported-by: Richard Biener <rguenther@suse.de> Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rik van Riel <riel@surriel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-arch@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-um@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: stable@vger.kernel.org Fixes:dd2283f260
("mm: mmap: zap pages with read mmap_sem in munmap") Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
117 lines
3.1 KiB
C
117 lines
3.1 KiB
C
/* SPDX-License-Identifier: GPL-2.0 */
|
|
#ifndef _ASM_X86_MPX_H
|
|
#define _ASM_X86_MPX_H
|
|
|
|
#include <linux/types.h>
|
|
#include <linux/mm_types.h>
|
|
|
|
#include <asm/ptrace.h>
|
|
#include <asm/insn.h>
|
|
|
|
/*
|
|
* NULL is theoretically a valid place to put the bounds
|
|
* directory, so point this at an invalid address.
|
|
*/
|
|
#define MPX_INVALID_BOUNDS_DIR ((void __user *)-1)
|
|
#define MPX_BNDCFG_ENABLE_FLAG 0x1
|
|
#define MPX_BD_ENTRY_VALID_FLAG 0x1
|
|
|
|
/*
|
|
* The upper 28 bits [47:20] of the virtual address in 64-bit
|
|
* are used to index into bounds directory (BD).
|
|
*
|
|
* The directory is 2G (2^31) in size, and with 8-byte entries
|
|
* it has 2^28 entries.
|
|
*/
|
|
#define MPX_BD_SIZE_BYTES_64 (1UL<<31)
|
|
#define MPX_BD_ENTRY_BYTES_64 8
|
|
#define MPX_BD_NR_ENTRIES_64 (MPX_BD_SIZE_BYTES_64/MPX_BD_ENTRY_BYTES_64)
|
|
|
|
/*
|
|
* The 32-bit directory is 4MB (2^22) in size, and with 4-byte
|
|
* entries it has 2^20 entries.
|
|
*/
|
|
#define MPX_BD_SIZE_BYTES_32 (1UL<<22)
|
|
#define MPX_BD_ENTRY_BYTES_32 4
|
|
#define MPX_BD_NR_ENTRIES_32 (MPX_BD_SIZE_BYTES_32/MPX_BD_ENTRY_BYTES_32)
|
|
|
|
/*
|
|
* A 64-bit table is 4MB total in size, and an entry is
|
|
* 4 64-bit pointers in size.
|
|
*/
|
|
#define MPX_BT_SIZE_BYTES_64 (1UL<<22)
|
|
#define MPX_BT_ENTRY_BYTES_64 32
|
|
#define MPX_BT_NR_ENTRIES_64 (MPX_BT_SIZE_BYTES_64/MPX_BT_ENTRY_BYTES_64)
|
|
|
|
/*
|
|
* A 32-bit table is 16kB total in size, and an entry is
|
|
* 4 32-bit pointers in size.
|
|
*/
|
|
#define MPX_BT_SIZE_BYTES_32 (1UL<<14)
|
|
#define MPX_BT_ENTRY_BYTES_32 16
|
|
#define MPX_BT_NR_ENTRIES_32 (MPX_BT_SIZE_BYTES_32/MPX_BT_ENTRY_BYTES_32)
|
|
|
|
#define MPX_BNDSTA_TAIL 2
|
|
#define MPX_BNDCFG_TAIL 12
|
|
#define MPX_BNDSTA_ADDR_MASK (~((1UL<<MPX_BNDSTA_TAIL)-1))
|
|
#define MPX_BNDCFG_ADDR_MASK (~((1UL<<MPX_BNDCFG_TAIL)-1))
|
|
#define MPX_BNDSTA_ERROR_CODE 0x3
|
|
|
|
struct mpx_fault_info {
|
|
void __user *addr;
|
|
void __user *lower;
|
|
void __user *upper;
|
|
};
|
|
|
|
#ifdef CONFIG_X86_INTEL_MPX
|
|
|
|
extern int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs);
|
|
extern int mpx_handle_bd_fault(void);
|
|
|
|
static inline int kernel_managing_mpx_tables(struct mm_struct *mm)
|
|
{
|
|
return (mm->context.bd_addr != MPX_INVALID_BOUNDS_DIR);
|
|
}
|
|
|
|
static inline void mpx_mm_init(struct mm_struct *mm)
|
|
{
|
|
/*
|
|
* NULL is theoretically a valid place to put the bounds
|
|
* directory, so point this at an invalid address.
|
|
*/
|
|
mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
|
|
}
|
|
|
|
extern void mpx_notify_unmap(struct mm_struct *mm, unsigned long start, unsigned long end);
|
|
extern unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len, unsigned long flags);
|
|
|
|
#else
|
|
static inline int mpx_fault_info(struct mpx_fault_info *info, struct pt_regs *regs)
|
|
{
|
|
return -EINVAL;
|
|
}
|
|
static inline int mpx_handle_bd_fault(void)
|
|
{
|
|
return -EINVAL;
|
|
}
|
|
static inline int kernel_managing_mpx_tables(struct mm_struct *mm)
|
|
{
|
|
return 0;
|
|
}
|
|
static inline void mpx_mm_init(struct mm_struct *mm)
|
|
{
|
|
}
|
|
static inline void mpx_notify_unmap(struct mm_struct *mm,
|
|
unsigned long start, unsigned long end)
|
|
{
|
|
}
|
|
|
|
static inline unsigned long mpx_unmapped_area_check(unsigned long addr,
|
|
unsigned long len, unsigned long flags)
|
|
{
|
|
return addr;
|
|
}
|
|
#endif /* CONFIG_X86_INTEL_MPX */
|
|
|
|
#endif /* _ASM_X86_MPX_H */
|