linux_dsm_epyc7002/arch/riscv/mm/fault.c
Linus Torvalds 413879a10b RISC-V changes for 4.16
This tag contains the fixes we'd like to target for the 4.16 merge
 window.  It's not as much as I was originally hoping to do but between
 glibc, the chip, and FOSDEM there just wasn't enough time to get
 everything put together.  As such, this merge window is essentially just
 going to be small changes.  This includes mostly cleanups:
 
 * A build fix failure to the audit test cases.  RISC-V doesn't have
   renameat because the generic syscall ABI moved to renameat2 by the
   time of our port.  The syscall audit test cases don't understand this,
   so I added a trivial fix.  This went through mailing list review
   during the 4.15 merge window, but nobody has picked it up so I think
   it's best to just do this here.
 * The removal of our command-line argument processing code.  The
   "mem_end" stuff was broken and the rest duplicated generic device tree
   code.  The generic code was already being called.
 * Some unused/redundant code has been removed, including
   __ARCH_HAVE_MMU, current_pgdir, and the initialization of init_mm.pgd.
 * SUM is disabled upon taking a trap, which means that user memory is
   protected during traps taking inside copy_{to,from}_user().
 * The sptbr CSR has been renamed to satp in C code.  We haven't changed
   the assembly code in order to maintain compatibility with binutils
   2.29, which doesn't understand the new name.
 
 Additionally, we're adding some new features:
 
 * Basic ftrace support, thanks to Alan Kao!
 * Support for ZONE_DMA32.  This is necessary for all the normal reasons,
   but also to deal with a deficiency in the Xilinx PCIe controller we're
   using on our FPGA-based systems.  While the ZONE_DMA32 addition should
   be sufficient for most uses, it doesn't complete the fix for the
   Xilinx controller.
 * TLB shootdowns now only target the harts where they're necessary,
   instead of applying to all harts in the system.
 
 These patches have all been sitting on our linux-next branch for a while
 now.  Due to time constraints this is all I feel comfortable submitting
 during the 4.16 merge window, hopefully we'll do better next time!
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEAM520YNJYN/OiG3470yhUCzLq0EFAlp7N2gTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRDvTKFQLMurQX8kD/4xxw6TuuESmDXxAQPQ+S8J98uKRfAF
 9kMMzJJARcW5sT1vo3pKpE8+Ss0Hy2fIcaYsw5Je/Yl7vdAy/Dk7X3/mx7mxf5BP
 8m2cSd7DFLLLhntZTbr1Y5fJ6awFLtzI46zn/SzTdTatLWKXNLS5wmPKE33ddq/C
 iTi4k/as8E/vuNtuPy1GsOF0gICpZ2xB4YoMwTgWfpxTekBkUktO3EOHmZTwQEEM
 U1muB+4WoqusbBt6cP3Q7cUF3b6aMVSevWnywZGkD+yWOGRXTVzMgT7R4YlKEOre
 OQypZocYUbRmZQMZACKpgHIcOZpePaSTIQ2zzhXEPVGB0XAHtMRnAaVtwPxwG6c4
 EThDCN9ldShutKqT4XilHrh5gf0sy7qG0PIidPhMmXH9LCeTSAU4VdISJP1jkq19
 chiMHlf6+/DhikyiH0+lK/MX8vQMt6UJL1SlRKO/c2FxxKAZKnENJ+tuAlkAlwoC
 gnvZsE5BUYw1ptRHXR0d5C4m8M2M9LPZfpWYcg+1mRO9EA+kt0XCupL7RsrdFuoa
 FCVEhP/JMaiX0JtmAHfVIU0yNGjH3b5xi3FoGk2Aoj/c8O3F5YcwT5C5nO+jpv32
 n9vyMR20/721+yA2dFIlq4DnelwdZczOTqrcDYJrLxXzk8OXUFFffbe4kbDCxp34
 WniBxwnY9BF25g==
 =cNRH
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-4.16-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux

Pull RISC-V updates from Palmer Dabbelt:
 "This contains the fixes we'd like to target for the 4.16 merge window.
  It's not as much as I was originally hoping to do but between glibc,
  the chip, and FOSDEM there just wasn't enough time to get everything
  put together. As such, this merge window is essentially just going to
  be small changes. This includes mostly cleanups:

   - A build fix failure to the audit test cases.

     RISC-V doesn't have renameat because the generic syscall ABI moved
     to renameat2 by the time of our port. The syscall audit test cases
     don't understand this, so I added a trivial fix. This went through
     mailing list review during the 4.15 merge window, but nobody has
     picked it up so I think it's best to just do this here.

   - The removal of our command-line argument processing code. The
     "mem_end" stuff was broken and the rest duplicated generic device
     tree code. The generic code was already being called.

   - Some unused/redundant code has been removed, including
     __ARCH_HAVE_MMU, current_pgdir, and the initialization of
     init_mm.pgd.

   - SUM is disabled upon taking a trap, which means that user memory is
     protected during traps taking inside copy_{to,from}_user().

   - The sptbr CSR has been renamed to satp in C code. We haven't
     changed the assembly code in order to maintain compatibility with
     binutils 2.29, which doesn't understand the new name.

  Additionally, we're adding some new features:

   - Basic ftrace support, thanks to Alan Kao!

   - Support for ZONE_DMA32.

     This is necessary for all the normal reasons, but also to deal with
     a deficiency in the Xilinx PCIe controller we're using on our
     FPGA-based systems. While the ZONE_DMA32 addition should be
     sufficient for most uses, it doesn't complete the fix for the
     Xilinx controller.

   - TLB shootdowns now only target the harts where they're necessary,
     instead of applying to all harts in the system.

  These patches have all been sitting on our linux-next branch for a
  while now. Due to time constraints this is all I feel comfortable
  submitting during the 4.16 merge window, hopefully we'll do better
  next time!"

[ Note to self: "harts" is RISC-V speak for "hardware threads".  I had
  to look that up.    - Linus ]

* tag 'riscv-for-linus-4.16-merge_window' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
  riscv: inline set_pgdir into its only caller
  riscv: rename sptbr to satp
  riscv: don't read back satp in paging_init
  riscv: remove the unused current_pgdir function
  riscv: add ZONE_DMA32
  RISC-V: Limit the scope of TLB shootdowns
  riscv: disable SUM in the exception handler
  riscv: remove redundant unlikely()
  riscv: remove unused __ARCH_HAVE_MMU define
  riscv/ftrace: Add basic support
  RISC-V: Remove mem_end command line processing
  RISC-V: Remove duplicate command-line parsing logic
  audit: Avoid build failures on systems without renameat
2018-02-07 11:33:08 -08:00

286 lines
7.2 KiB
C

/*
* Copyright (C) 2009 Sunplus Core Technology Co., Ltd.
* Lennox Wu <lennox.wu@sunplusct.com>
* Chen Liqin <liqin.chen@sunplusct.com>
* Copyright (C) 2012 Regents of the University of California
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, see the file COPYING, or write
* to the Free Software Foundation, Inc.,
*/
#include <linux/mm.h>
#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/perf_event.h>
#include <linux/signal.h>
#include <linux/uaccess.h>
#include <asm/pgalloc.h>
#include <asm/ptrace.h>
/*
* This routine handles page faults. It determines the address and the
* problem, and then passes it off to one of the appropriate routines.
*/
asmlinkage void do_page_fault(struct pt_regs *regs)
{
struct task_struct *tsk;
struct vm_area_struct *vma;
struct mm_struct *mm;
unsigned long addr, cause;
unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
int fault, code = SEGV_MAPERR;
cause = regs->scause;
addr = regs->sbadaddr;
tsk = current;
mm = tsk->mm;
/*
* Fault-in kernel-space virtual memory on-demand.
* The 'reference' page table is init_mm.pgd.
*
* NOTE! We MUST NOT take any locks for this case. We may
* be in an interrupt or a critical region, and should
* only copy the information from the master page table,
* nothing more.
*/
if (unlikely((addr >= VMALLOC_START) && (addr <= VMALLOC_END)))
goto vmalloc_fault;
/* Enable interrupts if they were enabled in the parent context. */
if (likely(regs->sstatus & SR_SPIE))
local_irq_enable();
/*
* If we're in an interrupt, have no user context, or are running
* in an atomic region, then we must not take the fault.
*/
if (unlikely(faulthandler_disabled() || !mm))
goto no_context;
if (user_mode(regs))
flags |= FAULT_FLAG_USER;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, addr);
retry:
down_read(&mm->mmap_sem);
vma = find_vma(mm, addr);
if (unlikely(!vma))
goto bad_area;
if (likely(vma->vm_start <= addr))
goto good_area;
if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
goto bad_area;
if (unlikely(expand_stack(vma, addr)))
goto bad_area;
/*
* Ok, we have a good vm_area for this memory access, so
* we can handle it.
*/
good_area:
code = SEGV_ACCERR;
switch (cause) {
case EXC_INST_PAGE_FAULT:
if (!(vma->vm_flags & VM_EXEC))
goto bad_area;
break;
case EXC_LOAD_PAGE_FAULT:
if (!(vma->vm_flags & VM_READ))
goto bad_area;
break;
case EXC_STORE_PAGE_FAULT:
if (!(vma->vm_flags & VM_WRITE))
goto bad_area;
flags |= FAULT_FLAG_WRITE;
break;
default:
panic("%s: unhandled cause %lu", __func__, cause);
}
/*
* If for any reason at all we could not handle the fault,
* make sure we exit gracefully rather than endlessly redo
* the fault.
*/
fault = handle_mm_fault(vma, addr, flags);
/*
* If we need to retry but a fatal signal is pending, handle the
* signal first. We do not need to release the mmap_sem because it
* would already be released in __lock_page_or_retry in mm/filemap.c.
*/
if ((fault & VM_FAULT_RETRY) && fatal_signal_pending(tsk))
return;
if (unlikely(fault & VM_FAULT_ERROR)) {
if (fault & VM_FAULT_OOM)
goto out_of_memory;
else if (fault & VM_FAULT_SIGBUS)
goto do_sigbus;
BUG();
}
/*
* Major/minor page fault accounting is only done on the
* initial attempt. If we go through a retry, it is extremely
* likely that the page will be found in page cache at that point.
*/
if (flags & FAULT_FLAG_ALLOW_RETRY) {
if (fault & VM_FAULT_MAJOR) {
tsk->maj_flt++;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ,
1, regs, addr);
} else {
tsk->min_flt++;
perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN,
1, regs, addr);
}
if (fault & VM_FAULT_RETRY) {
/*
* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
* of starvation.
*/
flags &= ~(FAULT_FLAG_ALLOW_RETRY);
flags |= FAULT_FLAG_TRIED;
/*
* No need to up_read(&mm->mmap_sem) as we would
* have already released it in __lock_page_or_retry
* in mm/filemap.c.
*/
goto retry;
}
}
up_read(&mm->mmap_sem);
return;
/*
* Something tried to access memory that isn't in our memory map.
* Fix it, but check if it's kernel or user first.
*/
bad_area:
up_read(&mm->mmap_sem);
/* User mode accesses just cause a SIGSEGV */
if (user_mode(regs)) {
do_trap(regs, SIGSEGV, code, addr, tsk);
return;
}
no_context:
/* Are we prepared to handle this kernel fault? */
if (fixup_exception(regs))
return;
/*
* Oops. The kernel tried to access some bad page. We'll have to
* terminate things with extreme prejudice.
*/
bust_spinlocks(1);
pr_alert("Unable to handle kernel %s at virtual address " REG_FMT "\n",
(addr < PAGE_SIZE) ? "NULL pointer dereference" :
"paging request", addr);
die(regs, "Oops");
do_exit(SIGKILL);
/*
* We ran out of memory, call the OOM killer, and return the userspace
* (which will retry the fault, or kill us if we got oom-killed).
*/
out_of_memory:
up_read(&mm->mmap_sem);
if (!user_mode(regs))
goto no_context;
pagefault_out_of_memory();
return;
do_sigbus:
up_read(&mm->mmap_sem);
/* Kernel mode? Handle exceptions or die */
if (!user_mode(regs))
goto no_context;
do_trap(regs, SIGBUS, BUS_ADRERR, addr, tsk);
return;
vmalloc_fault:
{
pgd_t *pgd, *pgd_k;
pud_t *pud, *pud_k;
p4d_t *p4d, *p4d_k;
pmd_t *pmd, *pmd_k;
pte_t *pte_k;
int index;
if (user_mode(regs))
goto bad_area;
/*
* Synchronize this task's top level page-table
* with the 'reference' page table.
*
* Do _not_ use "tsk->active_mm->pgd" here.
* We might be inside an interrupt in the middle
* of a task switch.
*
* Note: Use the old spbtr name instead of using the current
* satp name to support binutils 2.29 which doesn't know about
* the privileged ISA 1.10 yet.
*/
index = pgd_index(addr);
pgd = (pgd_t *)pfn_to_virt(csr_read(sptbr)) + index;
pgd_k = init_mm.pgd + index;
if (!pgd_present(*pgd_k))
goto no_context;
set_pgd(pgd, *pgd_k);
p4d = p4d_offset(pgd, addr);
p4d_k = p4d_offset(pgd_k, addr);
if (!p4d_present(*p4d_k))
goto no_context;
pud = pud_offset(p4d, addr);
pud_k = pud_offset(p4d_k, addr);
if (!pud_present(*pud_k))
goto no_context;
/*
* Since the vmalloc area is global, it is unnecessary
* to copy individual PTEs
*/
pmd = pmd_offset(pud, addr);
pmd_k = pmd_offset(pud_k, addr);
if (!pmd_present(*pmd_k))
goto no_context;
set_pmd(pmd, *pmd_k);
/*
* Make sure the actual PTE exists as well to
* catch kernel vmalloc-area accesses to non-mapped
* addresses. If we don't do this, this will just
* silently loop forever.
*/
pte_k = pte_offset_kernel(pmd_k, addr);
if (!pte_present(*pte_k))
goto no_context;
return;
}
}