License cleanup: add SPDX GPL-2.0 license identifier to files with no license

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boilerplate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information in it,
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information.

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few thousand files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file.  She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging were:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                              # files
   -----------------------------------------------------|------
   GPL-2.0                                                11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that were:

   SPDX license identifier                              # files
   -----------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                          930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                              # files
   -----------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                          270
   GPL-2.0+ WITH Linux-syscall-note                         169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)       21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)       17
   LGPL-2.1+ WITH Linux-syscall-note                         15
   GPL-1.0+ WITH Linux-syscall-note                          14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)       5
   LGPL-2.0+ WITH Linux-syscall-note                          4
   LGPL-2.1 WITH Linux-syscall-note                           3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)                 3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)                1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there were new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with the SPDX license identifier in the
files he inspected.  For the non-uapi files Thomas did random spot checks
in about 15000 files.

In the initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 21:07:57 +07:00
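
The script mentioned in the commit message is not reproduced there. As a rough illustration of the one step it does describe (picking a comment style per file type), here is a minimal sketch. The helper names `has_suffix` and `spdx_tag_line` are hypothetical. The comment styles are an assumption taken from the kernel's documented SPDX convention (`//` for `.c` sources, `/* */` for headers), which matches the tag at the top of this header.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `name` ends with `suffix`, 0 otherwise. */
static int has_suffix(const char *name, const char *suffix)
{
	size_t n = strlen(name);
	size_t s = strlen(suffix);

	return n >= s && strcmp(name + n - s, suffix) == 0;
}

/*
 * Hypothetical helper (not taken from the commit): write the SPDX line
 * for `path` into `buf`.  Returns 0 on success, -1 if this sketch does
 * not handle the file type or if `buf` is too small.
 */
static int spdx_tag_line(const char *path, const char *license,
			 char *buf, size_t len)
{
	int n;

	if (has_suffix(path, ".c"))
		n = snprintf(buf, len, "// SPDX-License-Identifier: %s", license);
	else if (has_suffix(path, ".h"))
		n = snprintf(buf, len, "/* SPDX-License-Identifier: %s */", license);
	else
		return -1;

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling `spdx_tag_line("pgtable.h", "GPL-2.0", buf, sizeof(buf))` yields the same line that opens the header below. A real tool would also need to handle assembly, Makefiles and scripts, which this sketch deliberately rejects.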
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * pgtable.h: SpitFire page table operations.
 *
 * Copyright 1996,1997 David S. Miller (davem@caip.rutgers.edu)
 * Copyright 1997,1998 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
 */

#ifndef _SPARC64_PGTABLE_H
#define _SPARC64_PGTABLE_H

/* This file contains the functions and defines necessary to modify and use
 * the SpitFire page tables.
 */

#include <asm-generic/5level-fixup.h>
#include <linux/compiler.h>
#include <linux/const.h>
#include <asm/types.h>
#include <asm/spitfire.h>
#include <asm/asi.h>
#include <asm/adi.h>
#include <asm/page.h>
#include <asm/processor.h>

/* The kernel image occupies 0x4000000 to 0x6000000 (4MB --> 96MB).
 * The page copy blockops can use 0x6000000 to 0x8000000.
 * The 8K TSB is mapped in the 0x8000000 to 0x8400000 range.
 * The 4M TSB is mapped in the 0x8400000 to 0x8800000 range.
 * The PROM resides in an area spanning 0xf0000000 to 0x100000000.
 * The vmalloc area spans 0x100000000 to 0x200000000.
 * Since modules need to be in the lowest 32-bits of the address space,
 * we place them right before the OBP area from 0x10000000 to 0xf0000000.
 * There is a single static kernel PMD which maps from 0x0 to address
 * 0x400000000.
 */
#define TLBTEMP_BASE		_AC(0x0000000006000000,UL)
#define TSBMAP_8K_BASE		_AC(0x0000000008000000,UL)
#define TSBMAP_4M_BASE		_AC(0x0000000008400000,UL)
#define MODULES_VADDR		_AC(0x0000000010000000,UL)
#define MODULES_LEN		_AC(0x00000000e0000000,UL)
#define MODULES_END		_AC(0x00000000f0000000,UL)
#define LOW_OBP_ADDRESS		_AC(0x00000000f0000000,UL)
#define HI_OBP_ADDRESS		_AC(0x0000000100000000,UL)
#define VMALLOC_START		_AC(0x0000000100000000,UL)
#define VMEMMAP_BASE		VMALLOC_END

/* PMD_SHIFT determines the size of the area a second-level page
 * table can map
 */
sparc64: Move from 4MB to 8MB huge pages.

The impetus for this is that we would like to move to 64-bit PMDs and
PGDs, but that would result in only supporting a 42-bit address space
with the current page table layout.  It'd be nice to support at least
43-bits.

The reason we'd end up with only 42-bits after making PMDs and PGDs
64-bit is that we only use half-page sized PTE tables in order to make
PMDs line up to 4MB, the hardware huge page size we use.

So what we do here is we make huge pages 8MB, and fabricate them using
4MB hw TLB entries.

Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in
places that really need to operate on hardware 4MB pages.

Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT,
PGD_SHIFT, and the build time CPP test as needed.  Use a CPP test to
make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up.

This makes the pgtable cache completely unused, so remove the code
managing it and the state used in mm_context_t.  Now we have fewer
spinlocks taken in the page table allocation path.

The technique we use to fabricate the 8MB pages is to transfer bit 22
from the missing virtual address into the PTE's physical address field.
That takes care of the transparent huge pages case.

For hugetlb, we fill things in at the PTE level and that code already
puts the sub huge page physical bits into the PTEs, based upon the
offset, so there is nothing special we need to do.  It all just works
out.

So, a small amount of complexity in the THP case, but this code is
about to get much simpler when we move to 64-bit PMDs as we can move
away from the fancy 32-bit huge PMD encoding and just put a real PTE
value in there.

With bug fixes and help from Bob Picco.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-09-26 03:48:49 +07:00
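
The "transfer bit 22" technique from the commit message can be illustrated in isolation. The following is only a sketch under stated assumptions, not the kernel's TLB miss code: `real_hpage_paddr` is a hypothetical name, and the physical base of the huge page is assumed to be 8MB aligned so that its own bit 22 is clear.

```c
#include <assert.h>
#include <stdint.h>

#define REAL_HPAGE_SHIFT 22	/* hardware 4MB TLB entry, per the commit text */

/*
 * Hypothetical helper: physical address for the 4MB TLB entry that
 * covers `vaddr`, given the physical base of the 8MB huge page.
 * Assumes `huge_paddr` is 8MB aligned, so its bit 22 is zero.
 */
static uint64_t real_hpage_paddr(uint64_t huge_paddr, uint64_t vaddr)
{
	/* Bit 22 of the faulting address selects the lower or upper 4MB half. */
	uint64_t bit22 = vaddr & (UINT64_C(1) << REAL_HPAGE_SHIFT);

	return huge_paddr | bit22;
}
```

With a huge page at physical 0x40000000, an access in the lower half maps to 0x40000000 and an access in the upper half maps to 0x40400000, so two 4MB hardware entries together cover the one 8MB software page.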
#define PMD_SHIFT	(PAGE_SHIFT + (PAGE_SHIFT-3))
#define PMD_SIZE	(_AC(1,UL) << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE-1))
#define PMD_BITS	(PAGE_SHIFT - 3)

/* PUD_SHIFT determines the size of the area a third-level page
 * table can map
 */
#define PUD_SHIFT	(PMD_SHIFT + PMD_BITS)
#define PUD_SIZE	(_AC(1,UL) << PUD_SHIFT)
#define PUD_MASK	(~(PUD_SIZE-1))
#define PUD_BITS	(PAGE_SHIFT - 3)

/* PGDIR_SHIFT determines what a fourth-level page table entry can map */
#define PGDIR_SHIFT	(PUD_SHIFT + PUD_BITS)
#define PGDIR_SIZE	(_AC(1,UL) << PGDIR_SHIFT)
#define PGDIR_MASK	(~(PGDIR_SIZE-1))
#define PGDIR_BITS	(PAGE_SHIFT - 3)

#if (MAX_PHYS_ADDRESS_BITS > PGDIR_SHIFT + PGDIR_BITS)
#error MAX_PHYS_ADDRESS_BITS exceeds what kernel page tables can support
#endif

#if (PGDIR_SHIFT + PGDIR_BITS) != 53
#error Page table parameters do not cover virtual address space properly.
#endif

#if (PMD_SHIFT != HPAGE_SHIFT)
#error PMD_SHIFT must equal HPAGE_SHIFT for transparent huge pages.
#endif
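
The shift definitions and the `!= 53` build-time check above can be followed with concrete numbers. The sketch below assumes `PAGE_SHIFT` is 13 (8K base pages, which is consistent with the 8K page size bits and the 8MB huge pages described in this file); `LEVEL_BITS`, `span_bytes` and `va_bits` are names introduced only for this illustration.

```c
#include <assert.h>

#define PAGE_SHIFT	13			/* assumption: 8K base pages */
#define LEVEL_BITS	(PAGE_SHIFT - 3)	/* 8-byte entries filling one page */

#define PMD_SHIFT	(PAGE_SHIFT + LEVEL_BITS)	/* 23 */
#define PUD_SHIFT	(PMD_SHIFT + LEVEL_BITS)	/* 33 */
#define PGDIR_SHIFT	(PUD_SHIFT + LEVEL_BITS)	/* 43 */

/* Bytes mapped by a single entry at a level with the given shift. */
static unsigned long long span_bytes(unsigned int shift)
{
	return 1ULL << shift;
}

/* Virtual address bits covered by the full four-level walk. */
static unsigned int va_bits(void)
{
	return PGDIR_SHIFT + LEVEL_BITS;
}
```

Under that assumption each level contributes 10 bits, one PMD entry spans 8MB (matching the huge page size), and the four levels together cover 13 + 4 * 10 = 53 bits, which is the value the preprocessor check insists on.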

#ifndef __ASSEMBLY__

extern unsigned long VMALLOC_END;

#define vmemmap			((struct page *)VMEMMAP_BASE)

#include <linux/sched.h>
sparc64: Fix physical memory management regressions with large max_phys_bits.

If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like
DEBUG_PAGEALLOC stop working because the 3-level page tables only
can cover up to 43 bits.

Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
47, several statically allocated tables became enormous.

Compounding this is that we will need to support up to 49 bits of
physical addressing for M7 chips.

The two tables in question are sparc64_valid_addr_bitmap and
kpte_linear_bitmap.

The first holds a bitmap, with 1 bit for each 4MB chunk of physical
memory, indicating whether that chunk actually exists in the machine
and is valid.

The second table is a set of 2-bit values which tell how large of a
mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
chunk of ram in the system.

These tables are huge and take up an enormous amount of the BSS
section of the sparc64 kernel image.  Specifically, the
sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.

So let's solve the space wastage and the DEBUG_PAGEALLOC problem
at the same time, by using the kernel page tables (as designed) to
manage this information.

We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
and we do this by encoding huge PMDs and PUDs.

On a T4-2 with 256GB of ram the kernel page table takes up 16K with
DEBUG_PAGEALLOC disabled and 256MB with it enabled.  Furthermore, this
memory is dynamically allocated at run time rather than coded
statically into the kernel image.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
2014-09-25 10:56:11 +07:00
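
The table sizes quoted in the commit message follow from simple arithmetic, sketched below. The helper names are hypothetical. The assumptions are the 47-bit `MAX_PHYS_ADDRESS_BITS` mentioned in the message, 8K base pages, and 8-byte PTEs; the PTE estimate counts only the last-level entries and ignores the much smaller upper levels.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes needed for a static table that stores `bits_per_chunk` bits for
 * every 2^chunk_shift-byte chunk of a 2^phys_bits-byte physical space.
 */
static uint64_t static_table_bytes(unsigned int phys_bits,
				   unsigned int chunk_shift,
				   unsigned int bits_per_chunk)
{
	uint64_t chunks = UINT64_C(1) << (phys_bits - chunk_shift);

	return chunks * bits_per_chunk / 8;
}

/* Bytes of 8-byte PTEs needed to map `ram_bytes` with small pages only. */
static uint64_t pte_bytes(uint64_t ram_bytes, unsigned int page_shift)
{
	return (ram_bytes >> page_shift) * 8;
}
```

With these inputs, `static_table_bytes(47, 22, 1)` gives the 4MB quoted for sparc64_valid_addr_bitmap, `static_table_bytes(47, 28, 2)` gives the 128K quoted for kpte_linear_bitmap, and `pte_bytes` for 256GB of RAM at 8K pages gives the 256MB quoted for the DEBUG_PAGEALLOC case.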

bool kern_addr_valid(unsigned long addr);

/* Entries per page directory level. */
#define PTRS_PER_PTE	(1UL << (PAGE_SHIFT-3))
#define PTRS_PER_PMD	(1UL << PMD_BITS)
#define PTRS_PER_PUD	(1UL << PUD_BITS)
#define PTRS_PER_PGD	(1UL << PGDIR_BITS)

/* Kernel has a separate 44bit address space. */
#define FIRST_USER_ADDRESS	0UL

#define pmd_ERROR(e)							\
	pr_err("%s:%d: bad pmd %p(%016lx) seen at (%pS)\n",		\
	       __FILE__, __LINE__, &(e), pmd_val(e), __builtin_return_address(0))
#define pud_ERROR(e)							\
	pr_err("%s:%d: bad pud %p(%016lx) seen at (%pS)\n",		\
	       __FILE__, __LINE__, &(e), pud_val(e), __builtin_return_address(0))
#define pgd_ERROR(e)							\
	pr_err("%s:%d: bad pgd %p(%016lx) seen at (%pS)\n",		\
	       __FILE__, __LINE__, &(e), pgd_val(e), __builtin_return_address(0))

#endif /* !(__ASSEMBLY__) */

/* PTE bits which are the same in SUN4U and SUN4V format. */
#define _PAGE_VALID       _AC(0x8000000000000000,UL) /* Valid TTE */
#define _PAGE_R           _AC(0x8000000000000000,UL) /* Keep ref bit uptodate */
#define _PAGE_SPECIAL     _AC(0x0200000000000000,UL) /* Special page */
#define _PAGE_PMD_HUGE    _AC(0x0100000000000000,UL) /* Huge page */
#define _PAGE_PUD_HUGE    _PAGE_PMD_HUGE

/* SUN4U pte bits... */
#define _PAGE_SZ4MB_4U    _AC(0x6000000000000000,UL) /* 4MB Page */
#define _PAGE_SZ512K_4U   _AC(0x4000000000000000,UL) /* 512K Page */
#define _PAGE_SZ64K_4U    _AC(0x2000000000000000,UL) /* 64K Page */
#define _PAGE_SZ8K_4U     _AC(0x0000000000000000,UL) /* 8K Page */
#define _PAGE_NFO_4U      _AC(0x1000000000000000,UL) /* No Fault Only */
#define _PAGE_IE_4U       _AC(0x0800000000000000,UL) /* Invert Endianness */
#define _PAGE_SOFT2_4U    _AC(0x07FC000000000000,UL) /* Software bits, set 2 */
#define _PAGE_SPECIAL_4U  _AC(0x0200000000000000,UL) /* Special page */
#define _PAGE_PMD_HUGE_4U _AC(0x0100000000000000,UL) /* Huge page */
#define _PAGE_RES1_4U     _AC(0x0002000000000000,UL) /* Reserved */
#define _PAGE_SZ32MB_4U   _AC(0x0001000000000000,UL) /* (Panther) 32MB page */
#define _PAGE_SZ256MB_4U  _AC(0x2001000000000000,UL) /* (Panther) 256MB page */
#define _PAGE_SZALL_4U    _AC(0x6001000000000000,UL) /* All pgsz bits */
#define _PAGE_SN_4U       _AC(0x0000800000000000,UL) /* (Cheetah) Snoop */
#define _PAGE_RES2_4U     _AC(0x0000780000000000,UL) /* Reserved */
#define _PAGE_PADDR_4U    _AC(0x000007FFFFFFE000,UL) /* (Cheetah) pa[42:13] */
#define _PAGE_SOFT_4U     _AC(0x0000000000001F80,UL) /* Software bits: */
#define _PAGE_EXEC_4U     _AC(0x0000000000001000,UL) /* Executable SW bit */
#define _PAGE_MODIFIED_4U _AC(0x0000000000000800,UL) /* Modified (dirty) */
#define _PAGE_ACCESSED_4U _AC(0x0000000000000400,UL) /* Accessed (ref'd) */
#define _PAGE_READ_4U     _AC(0x0000000000000200,UL) /* Readable SW Bit */
#define _PAGE_WRITE_4U    _AC(0x0000000000000100,UL) /* Writable SW Bit */
#define _PAGE_PRESENT_4U  _AC(0x0000000000000080,UL) /* Present */
#define _PAGE_L_4U        _AC(0x0000000000000040,UL) /* Locked TTE */
#define _PAGE_CP_4U       _AC(0x0000000000000020,UL) /* Cacheable in P-Cache */
#define _PAGE_CV_4U       _AC(0x0000000000000010,UL) /* Cacheable in V-Cache */
#define _PAGE_E_4U        _AC(0x0000000000000008,UL) /* side-Effect */
#define _PAGE_P_4U        _AC(0x0000000000000004,UL) /* Privileged Page */
#define _PAGE_W_4U        _AC(0x0000000000000002,UL) /* Writable */

/* SUN4V pte bits... */
#define _PAGE_NFO_4V      _AC(0x4000000000000000,UL) /* No Fault Only */
#define _PAGE_SOFT2_4V    _AC(0x3F00000000000000,UL) /* Software bits, set 2 */
#define _PAGE_MODIFIED_4V _AC(0x2000000000000000,UL) /* Modified (dirty) */
#define _PAGE_ACCESSED_4V _AC(0x1000000000000000,UL) /* Accessed (ref'd) */
#define _PAGE_READ_4V     _AC(0x0800000000000000,UL) /* Readable SW Bit */
#define _PAGE_WRITE_4V    _AC(0x0400000000000000,UL) /* Writable SW Bit */
#define _PAGE_SPECIAL_4V  _AC(0x0200000000000000,UL) /* Special page */
#define _PAGE_PMD_HUGE_4V _AC(0x0100000000000000,UL) /* Huge page */
#define _PAGE_PADDR_4V    _AC(0x00FFFFFFFFFFE000,UL) /* paddr[55:13] */
#define _PAGE_IE_4V       _AC(0x0000000000001000,UL) /* Invert Endianness */
#define _PAGE_E_4V        _AC(0x0000000000000800,UL) /* side-Effect */
#define _PAGE_CP_4V       _AC(0x0000000000000400,UL) /* Cacheable in P-Cache */
#define _PAGE_CV_4V       _AC(0x0000000000000200,UL) /* Cacheable in V-Cache */
/* Bit 9 is used to enable MCD corruption detection instead on M7 */
#define _PAGE_MCD_4V      _AC(0x0000000000000200,UL) /* Memory Corruption */
#define _PAGE_P_4V        _AC(0x0000000000000100,UL) /* Privileged Page */
#define _PAGE_EXEC_4V     _AC(0x0000000000000080,UL) /* Executable Page */
#define _PAGE_W_4V        _AC(0x0000000000000040,UL) /* Writable */
#define _PAGE_SOFT_4V     _AC(0x0000000000000030,UL) /* Software bits */
#define _PAGE_PRESENT_4V  _AC(0x0000000000000010,UL) /* Present */
#define _PAGE_RESV_4V     _AC(0x0000000000000008,UL) /* Reserved */
#define _PAGE_SZ16GB_4V   _AC(0x0000000000000007,UL) /* 16GB Page */
#define _PAGE_SZ2GB_4V    _AC(0x0000000000000006,UL) /* 2GB Page */
#define _PAGE_SZ256MB_4V  _AC(0x0000000000000005,UL) /* 256MB Page */
#define _PAGE_SZ32MB_4V   _AC(0x0000000000000004,UL) /* 32MB Page */
#define _PAGE_SZ4MB_4V    _AC(0x0000000000000003,UL) /* 4MB Page */
#define _PAGE_SZ512K_4V   _AC(0x0000000000000002,UL) /* 512K Page */
#define _PAGE_SZ64K_4V    _AC(0x0000000000000001,UL) /* 64K Page */
#define _PAGE_SZ8K_4V     _AC(0x0000000000000000,UL) /* 8K Page */
#define _PAGE_SZALL_4V    _AC(0x0000000000000007,UL) /* All pgsz bits */

#define _PAGE_SZBITS_4U   _PAGE_SZ8K_4U
#define _PAGE_SZBITS_4V   _PAGE_SZ8K_4V

#if REAL_HPAGE_SHIFT != 22
#error REAL_HPAGE_SHIFT and _PAGE_SZHUGE_foo must match up
#endif

#define _PAGE_SZHUGE_4U	_PAGE_SZ4MB_4U
#define _PAGE_SZHUGE_4V	_PAGE_SZ4MB_4V

/* These are actually filled in at boot time by sun4{u,v}_pgprot_init() */
#define __P000	__pgprot(0)
#define __P001	__pgprot(0)
#define __P010	__pgprot(0)
#define __P011	__pgprot(0)
#define __P100	__pgprot(0)
#define __P101	__pgprot(0)
#define __P110	__pgprot(0)
#define __P111	__pgprot(0)

#define __S000	__pgprot(0)
#define __S001	__pgprot(0)
#define __S010	__pgprot(0)
#define __S011	__pgprot(0)
#define __S100	__pgprot(0)
#define __S101	__pgprot(0)
#define __S110	__pgprot(0)
#define __S111	__pgprot(0)

#ifndef __ASSEMBLY__

pte_t mk_pte_io(unsigned long, pgprot_t, int, unsigned long);

unsigned long pte_sz_bits(unsigned long size);

extern pgprot_t PAGE_KERNEL;
extern pgprot_t PAGE_KERNEL_LOCKED;
extern pgprot_t PAGE_COPY;
extern pgprot_t PAGE_SHARED;

/* XXX This ugliness is for the atyfb driver's sparc mmap() support. XXX */
extern unsigned long _PAGE_IE;
extern unsigned long _PAGE_E;
extern unsigned long _PAGE_CACHE;

extern unsigned long pg_iobits;
extern unsigned long _PAGE_ALL_SZ_BITS;

extern struct page *mem_map_zero;
#define ZERO_PAGE(vaddr)	(mem_map_zero)

/* PFNs are real physical page numbers. However, mem_map only begins to record
 * per-page information starting at pfn_base. This is to handle systems where
 * the first physical page in the machine is at some huge physical address,
 * such as 4GB. This is common on a partitioned E10000, for example.
 */
static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
{
	unsigned long paddr = pfn << PAGE_SHIFT;

	BUILD_BUG_ON(_PAGE_SZBITS_4U != 0UL || _PAGE_SZBITS_4V != 0UL);
	return __pte(paddr | pgprot_val(prot));
}
#define mk_pte(page, pgprot)	pfn_pte(page_to_pfn(page), (pgprot))

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline pmd_t pfn_pmd(unsigned long page_nr, pgprot_t pgprot)
{
	pte_t pte = pfn_pte(page_nr, pgprot);

	return __pmd(pte_val(pte));
}
#define mk_pmd(page, pgprot)	pfn_pmd(page_to_pfn(page), (pgprot))
#endif

/* This one can be done with two shifts. */
static inline unsigned long pte_pfn(pte_t pte)
{
	unsigned long ret;

	__asm__ __volatile__(
	"\n661: sllx %1, %2, %0\n"
	" srlx %0, %3, %0\n"
	" .section .sun4v_2insn_patch, \"ax\"\n"
	" .word 661b\n"
	" sllx %1, %4, %0\n"
	" srlx %0, %5, %0\n"
	" .previous\n"
	: "=r" (ret)
	: "r" (pte_val(pte)),
	  "i" (21), "i" (21 + PAGE_SHIFT),
	  "i" (8), "i" (8 + PAGE_SHIFT));

	return ret;
}
#define pte_page(x) pfn_to_page(pte_pfn(x))
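
The "two shifts" in `pte_pfn()` can be written out in portable C to show why they work: the left shift discards the bits above the physical address field, and the right shift discards the bits below it. This is a sketch, not the kernel code; it assumes `PAGE_SHIFT` is 13, and the function names `pte_pfn_4u` and `pte_pfn_4v` are introduced here for illustration. The shift counts 21 and 8 are the ones passed to the inline assembly above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 13	/* assumption: 8K base pages */

/* SUN4U: the address field is pa[42:13], so drop the top 21 bits first. */
static uint64_t pte_pfn_4u(uint64_t pte)
{
	return (pte << 21) >> (21 + PAGE_SHIFT);
}

/* SUN4V: the address field is paddr[55:13], so drop the top 8 bits first. */
static uint64_t pte_pfn_4v(uint64_t pte)
{
	return (pte << 8) >> (8 + PAGE_SHIFT);
}
```

Each result equals masking with the corresponding `_PAGE_PADDR_*` value and shifting right by `PAGE_SHIFT`, but avoids loading a 64-bit mask constant, which is presumably why the header prefers the shift pair.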
|
|
|
|
|
|
|
|
static inline pte_t pte_modify(pte_t pte, pgprot_t prot)
|
|
|
|
{
|
|
|
|
unsigned long mask, tmp;
|
|
|
|
|
2014-04-29 09:11:27 +07:00
|
|
|
/* SUN4U: 0x630107ffffffec38 (negated == 0x9cfef800000013c7)
|
|
|
|
* SUN4V: 0x33ffffffffffee07 (negated == 0xcc000000000011f8)
|
2008-07-18 11:55:51 +07:00
|
|
|
*
|
|
|
|
* Even if we use negation tricks the result is still a 6
|
|
|
|
* instruction sequence, so don't try to play fancy and just
|
|
|
|
* do the most straightforward implementation.
|
|
|
|
*
|
|
|
|
* Note: We encode this into 3 sun4v 2-insn patch sequences.
|
|
|
|
*/
|
|
|
|
|
2012-10-09 06:34:19 +07:00
|
|
|
BUILD_BUG_ON(_PAGE_SZBITS_4U != 0UL || _PAGE_SZBITS_4V != 0UL);
|
2008-07-18 11:55:51 +07:00
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: sethi %%uhi(%2), %1\n"
|
|
|
|
" sethi %%hi(%2), %0\n"
|
|
|
|
"\n662: or %1, %%ulo(%2), %1\n"
|
|
|
|
" or %0, %%lo(%2), %0\n"
|
|
|
|
"\n663: sllx %1, 32, %1\n"
|
|
|
|
" or %0, %1, %0\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%3), %1\n"
|
|
|
|
" sethi %%hi(%3), %0\n"
|
|
|
|
" .word 662b\n"
|
|
|
|
" or %1, %%ulo(%3), %1\n"
|
|
|
|
" or %0, %%lo(%3), %0\n"
|
|
|
|
" .word 663b\n"
|
|
|
|
" sllx %1, 32, %1\n"
|
|
|
|
" or %0, %1, %0\n"
|
|
|
|
" .previous\n"
|
2015-05-27 23:00:46 +07:00
|
|
|
" .section .sun_m7_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%4), %1\n"
|
|
|
|
" sethi %%hi(%4), %0\n"
|
|
|
|
" .word 662b\n"
|
|
|
|
" or %1, %%ulo(%4), %1\n"
|
|
|
|
" or %0, %%lo(%4), %0\n"
|
|
|
|
" .word 663b\n"
|
|
|
|
" sllx %1, 32, %1\n"
|
|
|
|
" or %0, %1, %0\n"
|
|
|
|
" .previous\n"
|
2008-07-18 11:55:51 +07:00
|
|
|
: "=r" (mask), "=r" (tmp)
|
|
|
|
: "i" (_PAGE_PADDR_4U | _PAGE_MODIFIED_4U | _PAGE_ACCESSED_4U |
|
2014-04-29 09:11:27 +07:00
|
|
|
_PAGE_CP_4U | _PAGE_CV_4U | _PAGE_E_4U |
|
2013-09-27 03:45:15 +07:00
|
|
|
_PAGE_SPECIAL | _PAGE_PMD_HUGE | _PAGE_SZALL_4U),
|
2008-07-18 11:55:51 +07:00
|
|
|
"i" (_PAGE_PADDR_4V | _PAGE_MODIFIED_4V | _PAGE_ACCESSED_4V |
|
2014-04-29 09:11:27 +07:00
|
|
|
_PAGE_CP_4V | _PAGE_CV_4V | _PAGE_E_4V |
|
2015-05-27 23:00:46 +07:00
|
|
|
_PAGE_SPECIAL | _PAGE_PMD_HUGE | _PAGE_SZALL_4V),
|
|
|
|
"i" (_PAGE_PADDR_4V | _PAGE_MODIFIED_4V | _PAGE_ACCESSED_4V |
|
|
|
|
_PAGE_CP_4V | _PAGE_E_4V |
|
2013-09-27 03:45:15 +07:00
|
|
|
_PAGE_SPECIAL | _PAGE_PMD_HUGE | _PAGE_SZALL_4V));
|
2008-07-18 11:55:51 +07:00
|
|
|
|
|
|
|
return __pte((pte_val(pte) & mask) | (pgprot_val(prot) & ~mask));
|
|
|
|
}
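The return statement of pte_modify() is a masked merge: bits selected by the mask survive from the old PTE, and every other bit comes from the new protection value. The sketch below shows only that merge, with a hypothetical helper name; it does not model how the mask is materialized by the patched instruction sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the bits selected by `preserve` from the old PTE and take every
 * other bit from the new protection value.
 */
static inline uint64_t sk_pte_modify(uint64_t old_pte, uint64_t prot,
				     uint64_t preserve)
{
	return (old_pte & preserve) | (prot & ~preserve);
}
```

The "negated" constants quoted in the comment inside pte_modify() are simply the bitwise complement of the two masks, which is the value the `~mask` term would use.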
|
|
|
|
|
2013-09-27 03:45:15 +07:00
|
|
|
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
|
|
|
static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
|
|
|
|
{
|
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
|
|
|
pte = pte_modify(pte, newprot);
|
|
|
|
|
|
|
|
return __pmd(pte_val(pte));
|
|
|
|
}
|
|
|
|
#endif
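pmd_modify() follows a pattern that recurs for the huge-page PMD helpers in this header: reinterpret the PMD value as a PTE, reuse the PTE helper, and convert back. A self-contained sketch of the pattern is given here; the sk_ types and the write-bit position are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

typedef struct { unsigned long pte; } sk_pte_t;
typedef struct { unsigned long pmd; } sk_pmd_t;

#define SK_WRITE (1UL << 1) /* hypothetical write-permission bit */

static inline sk_pte_t sk_pte_wrprotect(sk_pte_t pte)
{
	pte.pte &= ~SK_WRITE;
	return pte;
}

/* Reinterpret the PMD as a PTE, reuse the PTE helper, convert back. */
static inline sk_pmd_t sk_pmd_wrprotect(sk_pmd_t pmd)
{
	sk_pte_t pte = { pmd.pmd };

	pte = sk_pte_wrprotect(pte);
	return (sk_pmd_t){ pte.pte };
}
```

This works only because a huge PMD and a PTE share the same bit layout, which is what lets the header avoid duplicating each patched assembly sequence.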
|
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
static inline pgprot_t pgprot_noncached(pgprot_t prot)
|
|
|
|
{
|
|
|
|
unsigned long val = pgprot_val(prot);
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: andn %0, %2, %0\n"
|
|
|
|
" or %0, %3, %0\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" andn %0, %4, %0\n"
|
|
|
|
" or %0, %5, %0\n"
|
|
|
|
" .previous\n"
|
2015-05-27 23:00:46 +07:00
|
|
|
" .section .sun_m7_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" andn %0, %6, %0\n"
|
|
|
|
" or %0, %5, %0\n"
|
|
|
|
" .previous\n"
|
2008-07-18 11:55:51 +07:00
|
|
|
: "=r" (val)
|
|
|
|
: "0" (val), "i" (_PAGE_CP_4U | _PAGE_CV_4U), "i" (_PAGE_E_4U),
|
2015-05-27 23:00:46 +07:00
|
|
|
"i" (_PAGE_CP_4V | _PAGE_CV_4V), "i" (_PAGE_E_4V),
|
|
|
|
"i" (_PAGE_CP_4V));
|
2008-07-18 11:55:51 +07:00
|
|
|
|
|
|
|
return __pgprot(val);
|
|
|
|
}
|
|
|
|
/* Various pieces of code check for platform support by ifdef testing
|
|
|
|
* on "pgprot_noncached". That's broken and should be fixed, but for
|
|
|
|
* now...
|
|
|
|
*/
|
|
|
|
#define pgprot_noncached pgprot_noncached
|
|
|
|
|
2013-09-27 03:45:15 +07:00
|
|
|
#if defined(CONFIG_HUGETLB_PAGE) || defined(CONFIG_TRANSPARENT_HUGEPAGE)
|
2017-02-02 07:16:36 +07:00
|
|
|
extern pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
|
|
|
|
struct page *page, int writable);
|
|
|
|
#define arch_make_huge_pte arch_make_huge_pte
|
|
|
|
static inline unsigned long __pte_default_huge_mask(void)
|
2008-07-18 11:55:51 +07:00
|
|
|
{
|
|
|
|
unsigned long mask;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: sethi %%uhi(%1), %0\n"
|
|
|
|
" sllx %0, 32, %0\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" mov %2, %0\n"
|
|
|
|
" nop\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (mask)
|
|
|
|
: "i" (_PAGE_SZHUGE_4U), "i" (_PAGE_SZHUGE_4V));
|
|
|
|
|
2016-03-31 01:17:13 +07:00
|
|
|
return mask;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline pte_t pte_mkhuge(pte_t pte)
|
|
|
|
{
|
2017-02-02 07:16:36 +07:00
|
|
|
return __pte(pte_val(pte) | __pte_default_huge_mask());
|
2016-03-31 01:17:13 +07:00
|
|
|
}
|
|
|
|
|
2017-02-02 07:16:36 +07:00
|
|
|
static inline bool is_default_hugetlb_pte(pte_t pte)
|
2016-03-31 01:17:13 +07:00
|
|
|
{
|
2017-02-02 07:16:36 +07:00
|
|
|
unsigned long mask = __pte_default_huge_mask();
|
|
|
|
|
|
|
|
return (pte_val(pte) & mask) == mask;
|
2008-07-18 11:55:51 +07:00
|
|
|
}
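is_default_hugetlb_pte() compares against the mask with `==` instead of testing for a non-zero result. That matters when the mask may contain more than one bit, since a plain `&` test would also accept partial matches. A small sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True only if every bit of `mask` is set in `val`. */
static inline bool sk_has_all_bits(uint64_t val, uint64_t mask)
{
	return (val & mask) == mask;
}
```

For example, with a mask of 0x5 the value 0x4 shares one bit with the mask but does not satisfy the all-bits test.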
|
2016-03-31 01:17:13 +07:00
|
|
|
|
2016-07-29 14:54:21 +07:00
|
|
|
static inline bool is_hugetlb_pmd(pmd_t pmd)
|
|
|
|
{
|
|
|
|
return !!(pmd_val(pmd) & _PAGE_PMD_HUGE);
|
|
|
|
}
|
|
|
|
|
2017-08-12 06:46:50 +07:00
|
|
|
static inline bool is_hugetlb_pud(pud_t pud)
|
|
|
|
{
|
|
|
|
return !!(pud_val(pud) & _PAGE_PUD_HUGE);
|
|
|
|
}
|
|
|
|
|
2013-09-27 03:45:15 +07:00
|
|
|
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
|
|
|
static inline pmd_t pmd_mkhuge(pmd_t pmd)
|
|
|
|
{
|
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
|
|
|
pte = pte_mkhuge(pte);
|
|
|
|
pte_val(pte) |= _PAGE_PMD_HUGE;
|
|
|
|
|
|
|
|
return __pmd(pte_val(pte));
|
|
|
|
}
|
|
|
|
#endif
|
2016-03-31 01:17:13 +07:00
|
|
|
#else
|
|
|
|
static inline bool is_hugetlb_pte(pte_t pte)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
2008-07-18 11:55:51 +07:00
|
|
|
#endif
|
|
|
|
|
|
|
|
static inline pte_t pte_mkdirty(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long val = pte_val(pte), tmp;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: or %0, %3, %0\n"
|
|
|
|
" nop\n"
|
|
|
|
"\n662: nop\n"
|
|
|
|
" nop\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%4), %1\n"
|
|
|
|
" sllx %1, 32, %1\n"
|
|
|
|
" .word 662b\n"
|
|
|
|
" or %1, %%lo(%4), %1\n"
|
|
|
|
" or %0, %1, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (val), "=r" (tmp)
|
|
|
|
: "0" (val), "i" (_PAGE_MODIFIED_4U | _PAGE_W_4U),
|
|
|
|
"i" (_PAGE_MODIFIED_4V | _PAGE_W_4V));
|
|
|
|
|
|
|
|
return __pte(val);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline pte_t pte_mkclean(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long val = pte_val(pte), tmp;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: andn %0, %3, %0\n"
|
|
|
|
" nop\n"
|
|
|
|
"\n662: nop\n"
|
|
|
|
" nop\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%4), %1\n"
|
|
|
|
" sllx %1, 32, %1\n"
|
|
|
|
" .word 662b\n"
|
|
|
|
" or %1, %%lo(%4), %1\n"
|
|
|
|
" andn %0, %1, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (val), "=r" (tmp)
|
|
|
|
: "0" (val), "i" (_PAGE_MODIFIED_4U | _PAGE_W_4U),
|
|
|
|
"i" (_PAGE_MODIFIED_4V | _PAGE_W_4V));
|
|
|
|
|
|
|
|
return __pte(val);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline pte_t pte_mkwrite(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long val = pte_val(pte), mask;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: mov %1, %0\n"
|
|
|
|
" nop\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%2), %0\n"
|
|
|
|
" sllx %0, 32, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (mask)
|
|
|
|
: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
|
|
|
|
|
|
|
|
return __pte(val | mask);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline pte_t pte_wrprotect(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long val = pte_val(pte), tmp;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: andn %0, %3, %0\n"
|
|
|
|
" nop\n"
|
|
|
|
"\n662: nop\n"
|
|
|
|
" nop\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%4), %1\n"
|
|
|
|
" sllx %1, 32, %1\n"
|
|
|
|
" .word 662b\n"
|
|
|
|
" or %1, %%lo(%4), %1\n"
|
|
|
|
" andn %0, %1, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (val), "=r" (tmp)
|
|
|
|
: "0" (val), "i" (_PAGE_WRITE_4U | _PAGE_W_4U),
|
|
|
|
"i" (_PAGE_WRITE_4V | _PAGE_W_4V));
|
|
|
|
|
|
|
|
return __pte(val);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline pte_t pte_mkold(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long mask;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: mov %1, %0\n"
|
|
|
|
" nop\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%2), %0\n"
|
|
|
|
" sllx %0, 32, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (mask)
|
|
|
|
: "i" (_PAGE_ACCESSED_4U), "i" (_PAGE_ACCESSED_4V));
|
|
|
|
|
|
|
|
mask |= _PAGE_R;
|
|
|
|
|
|
|
|
return __pte(pte_val(pte) & ~mask);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline pte_t pte_mkyoung(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long mask;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: mov %1, %0\n"
|
|
|
|
" nop\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%2), %0\n"
|
|
|
|
" sllx %0, 32, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (mask)
|
|
|
|
: "i" (_PAGE_ACCESSED_4U), "i" (_PAGE_ACCESSED_4V));
|
|
|
|
|
|
|
|
mask |= _PAGE_R;
|
|
|
|
|
|
|
|
return __pte(pte_val(pte) | mask);
|
|
|
|
}
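pte_mkold() and pte_mkyoung() share one mask and differ only in whether they clear or set it. The sketch below models that with ordinary C. The bit positions are invented for illustration, and a plain variable stands in for the boot-time instruction patching that picks the SUN4U or SUN4V constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for illustration. */
#define SK_ACCESSED_4U (1ULL << 5)
#define SK_ACCESSED_4V (1ULL << 60)
#define SK_READ        (1ULL << 2)

static bool sk_is_sun4v; /* stands in for the boot-time patching */

static inline uint64_t sk_accessed_mask(void)
{
	return (sk_is_sun4v ? SK_ACCESSED_4V : SK_ACCESSED_4U) | SK_READ;
}

static inline uint64_t sk_pte_mkold(uint64_t pte)
{
	return pte & ~sk_accessed_mask();
}

static inline uint64_t sk_pte_mkyoung(uint64_t pte)
{
	return pte | sk_accessed_mask();
}
```

The real helpers avoid the runtime branch entirely: the instruction that loads the mask is rewritten once at boot, so the hot path stays at two instructions.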
|
|
|
|
|
|
|
|
static inline pte_t pte_mkspecial(pte_t pte)
|
|
|
|
{
|
2011-07-26 07:12:21 +07:00
|
|
|
pte_val(pte) |= _PAGE_SPECIAL;
|
2008-07-18 11:55:51 +07:00
|
|
|
return pte;
|
|
|
|
}
|
|
|
|
|
2018-02-24 05:46:41 +07:00
|
|
|
static inline pte_t pte_mkmcd(pte_t pte)
|
|
|
|
{
|
|
|
|
pte_val(pte) |= _PAGE_MCD_4V;
|
|
|
|
return pte;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline pte_t pte_mknotmcd(pte_t pte)
|
|
|
|
{
|
|
|
|
pte_val(pte) &= ~_PAGE_MCD_4V;
|
|
|
|
return pte;
|
|
|
|
}
|
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
static inline unsigned long pte_young(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long mask;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: mov %1, %0\n"
|
|
|
|
" nop\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%2), %0\n"
|
|
|
|
" sllx %0, 32, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (mask)
|
|
|
|
: "i" (_PAGE_ACCESSED_4U), "i" (_PAGE_ACCESSED_4V));
|
|
|
|
|
|
|
|
return (pte_val(pte) & mask);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline unsigned long pte_dirty(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long mask;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: mov %1, %0\n"
|
|
|
|
" nop\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%2), %0\n"
|
|
|
|
" sllx %0, 32, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (mask)
|
|
|
|
: "i" (_PAGE_MODIFIED_4U), "i" (_PAGE_MODIFIED_4V));
|
|
|
|
|
|
|
|
return (pte_val(pte) & mask);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline unsigned long pte_write(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long mask;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: mov %1, %0\n"
|
|
|
|
" nop\n"
|
|
|
|
" .section .sun4v_2insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" sethi %%uhi(%2), %0\n"
|
|
|
|
" sllx %0, 32, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (mask)
|
|
|
|
: "i" (_PAGE_WRITE_4U), "i" (_PAGE_WRITE_4V));
|
|
|
|
|
|
|
|
return (pte_val(pte) & mask);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline unsigned long pte_exec(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long mask;
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: sethi %%hi(%1), %0\n"
|
|
|
|
" .section .sun4v_1insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" mov %2, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (mask)
|
|
|
|
: "i" (_PAGE_EXEC_4U), "i" (_PAGE_EXEC_4V));
|
|
|
|
|
|
|
|
return (pte_val(pte) & mask);
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline unsigned long pte_present(pte_t pte)
|
|
|
|
{
|
|
|
|
unsigned long val = pte_val(pte);
|
|
|
|
|
|
|
|
__asm__ __volatile__(
|
|
|
|
"\n661: and %0, %2, %0\n"
|
|
|
|
" .section .sun4v_1insn_patch, \"ax\"\n"
|
|
|
|
" .word 661b\n"
|
|
|
|
" and %0, %3, %0\n"
|
|
|
|
" .previous\n"
|
|
|
|
: "=r" (val)
|
|
|
|
: "0" (val), "i" (_PAGE_PRESENT_4U), "i" (_PAGE_PRESENT_4V));
|
|
|
|
|
|
|
|
return val;
|
|
|
|
}
|
|
|
|
|
2012-12-19 07:06:16 +07:00
|
|
|
#define pte_accessible pte_accessible
|
mm: fix TLB flush race between migration, and change_protection_range
There are a few subtle races, between change_protection_range (used by
mprotect and change_prot_numa) on one side, and NUMA page migration and
compaction on the other side.
The basic race is that there is a time window between when the PTE gets
made non-present (PROT_NONE or NUMA), and the TLB is flushed.
During that time, a CPU may continue writing to the page.
This is fine most of the time, however compaction or the NUMA migration
code may come in, and migrate the page away.
When that happens, the CPU may continue writing, through the cached
translation, to what is no longer the current memory location of the
process.
This only affects x86, which has a somewhat optimistic pte_accessible.
All other architectures appear to be safe, and will either always flush,
or flush whenever there is a valid mapping, even with no permissions
(SPARC).
The basic race looks like this:
CPU A CPU B CPU C
load TLB entry
make entry PTE/PMD_NUMA
fault on entry
read/write old page
start migrating page
change PTE/PMD to new page
read/write old page [*]
flush TLB
reload TLB from new entry
read/write new page
lose data
[*] the old page may belong to a new user at this point!
The obvious fix is to flush remote TLB entries, by making sure that
pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
still be accessible if there is a TLB flush pending for the mm.
This should fix both NUMA migration and compaction.
[mgorman@suse.de: fix build]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
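The fix described in this commit message can be sketched as a predicate that treats a non-present PROT_NONE/NUMA entry as still accessible while a TLB flush is pending for the mm. The following is only an illustration of that rule: the flag bits, the struct, and the function name are hypothetical, and the sparc64 version below needs none of it because it tests _PAGE_VALID alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PRESENT  (1ULL << 0) /* hypothetical flag bits */
#define SK_PROTNONE (1ULL << 1)

struct sk_mm {
	bool tlb_flush_pending;
};

static inline bool sk_pte_accessible(const struct sk_mm *mm, uint64_t pte)
{
	if (pte & SK_PRESENT)
		return true;

	/* Not present, but a stale TLB entry may still reach the page. */
	return (pte & SK_PROTNONE) && mm->tlb_flush_pending;
}
```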
2013-12-19 08:08:44 +07:00
|
|
|
static inline unsigned long pte_accessible(struct mm_struct *mm, pte_t a)
|
2012-12-19 07:06:16 +07:00
|
|
|
{
|
|
|
|
return pte_val(a) & _PAGE_VALID;
|
|
|
|
}
|
|
|
|
|
2011-07-26 07:12:21 +07:00
|
|
|
static inline unsigned long pte_special(pte_t pte)
|
2008-07-18 11:55:51 +07:00
|
|
|
{
|
2011-07-26 07:12:21 +07:00
|
|
|
return pte_val(pte) & _PAGE_SPECIAL;
|
2008-07-18 11:55:51 +07:00
|
|
|
}
|
|
|
|
|
2013-09-27 03:45:15 +07:00
|
|
|
static inline unsigned long pmd_large(pmd_t pmd)
|
2013-02-14 03:21:06 +07:00
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
2014-04-26 00:21:12 +07:00
|
|
|
return pte_val(pte) & _PAGE_PMD_HUGE;
|
2013-02-14 03:21:06 +07:00
|
|
|
}
|
|
|
|
|
sparc64: Fix physical memory management regressions with large max_phys_bits.
If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like
DEBUG_PAGEALLOC stop working because the 3-level page tables only
can cover up to 43 bits.
Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
47, several statically allocated tables became enormous.
Compounding this is that we will need to support up to 49 bits of
physical addressing for M7 chips.
The two tables in question are sparc64_valid_addr_bitmap and
kpte_linear_bitmap.
The first holds a bitmap, with 1 bit for each 4MB chunk of physical
memory, indicating whether that chunk actually exists in the machine
and is valid.
The second table is a set of 2-bit values which tell how large of a
mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
chunk of ram in the system.
These tables are huge and take up an enormous amount of the BSS
section of the sparc64 kernel image. Specifically, the
sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.
So let's solve the space wastage and the DEBUG_PAGEALLOC problem
at the same time, by using the kernel page tables (as designed) to
manage this information.
We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
and we do this by encoding huge PMDs and PUDs.
On a T4-2 with 256GB of ram the kernel page table takes up 16K with
DEBUG_PAGEALLOC disabled and 256MB with it enabled. Furthermore, this
memory is dynamically allocated at run time rather than coded
statically into the kernel image.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Bob Picco <bob.picco@oracle.com>
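The two table sizes quoted in this commit message appear to follow directly from a 47-bit physical address space: one bit per 4MB chunk gives 4MB, and two bits per 256MB chunk give 128K. A short sketch of the arithmetic, using a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for a table holding `bits_per_entry` bits for every
 * chunk of 2^chunk_shift bytes in a 2^phys_bits byte address space.
 */
static inline uint64_t sk_table_bytes(unsigned int phys_bits,
				      unsigned int chunk_shift,
				      unsigned int bits_per_entry)
{
	uint64_t entries = 1ULL << (phys_bits - chunk_shift);

	return entries * bits_per_entry / 8;
}
```

sk_table_bytes(47, 22, 1) corresponds to sparc64_valid_addr_bitmap and sk_table_bytes(47, 28, 2) to kpte_linear_bitmap; each extra address bit doubles both.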
2014-09-25 10:56:11 +07:00
|
|
|
static inline unsigned long pmd_pfn(pmd_t pmd)
|
2012-10-09 06:34:29 +07:00
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
2014-09-25 10:56:11 +07:00
|
|
|
return pte_pfn(pte);
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
|
|
|
|
2017-11-30 07:10:10 +07:00
|
|
|
#define pmd_write pmd_write
|
2017-04-01 05:31:42 +07:00
|
|
|
static inline unsigned long pmd_write(pmd_t pmd)
|
2014-12-11 06:44:36 +07:00
|
|
|
{
|
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
2017-04-01 05:31:42 +07:00
|
|
|
return pte_write(pte);
|
2014-12-11 06:44:36 +07:00
|
|
|
}
|
|
|
|
|
2017-08-12 06:46:49 +07:00
|
|
|
#define pud_write(pud) pte_write(__pte(pud_val(pud)))
|
|
|
|
|
2017-04-01 05:31:42 +07:00
|
|
|
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
|
|
|
static inline unsigned long pmd_dirty(pmd_t pmd)
|
2012-10-09 06:34:29 +07:00
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
2017-04-01 05:31:42 +07:00
|
|
|
return pte_dirty(pte);
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
|
|
|
|
2017-04-01 05:31:42 +07:00
|
|
|
static inline unsigned long pmd_young(pmd_t pmd)
|
2012-10-09 06:34:29 +07:00
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
2012-10-09 06:34:29 +07:00
|
|
|
|
2017-04-01 05:31:42 +07:00
|
|
|
return pte_young(pte);
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
|
|
|
|
2013-09-27 03:45:15 +07:00
|
|
|
static inline unsigned long pmd_trans_huge(pmd_t pmd)
|
2012-10-09 06:34:29 +07:00
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
|
|
|
return pte_val(pte) & _PAGE_PMD_HUGE;
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline pmd_t pmd_mkold(pmd_t pmd)
|
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
|
|
|
pte = pte_mkold(pte);
|
|
|
|
|
|
|
|
return __pmd(pte_val(pte));
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline pmd_t pmd_wrprotect(pmd_t pmd)
|
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
|
|
|
pte = pte_wrprotect(pte);
|
|
|
|
|
|
|
|
return __pmd(pte_val(pte));
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline pmd_t pmd_mkdirty(pmd_t pmd)
|
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
|
|
|
pte = pte_mkdirty(pte);
|
|
|
|
|
|
|
|
return __pmd(pte_val(pte));
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
|
|
|
|
2016-01-16 07:55:24 +07:00
|
|
|
static inline pmd_t pmd_mkclean(pmd_t pmd)
|
|
|
|
{
|
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
|
|
|
pte = pte_mkclean(pte);
|
|
|
|
|
|
|
|
return __pmd(pte_val(pte));
|
|
|
|
}
|
|
|
|
|
2012-10-09 06:34:29 +07:00
|
|
|
static inline pmd_t pmd_mkyoung(pmd_t pmd)
|
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
|
|
|
pte = pte_mkyoung(pte);
|
|
|
|
|
|
|
|
return __pmd(pte_val(pte));
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
static inline pmd_t pmd_mkwrite(pmd_t pmd)
|
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
|
|
|
|
pte = pte_mkwrite(pte);
|
|
|
|
|
|
|
|
return __pmd(pte_val(pte));
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
|
|
|
|
2013-09-27 03:45:15 +07:00
|
|
|
static inline pgprot_t pmd_pgprot(pmd_t entry)
|
|
|
|
{
|
|
|
|
unsigned long val = pmd_val(entry);
|
|
|
|
|
|
|
|
return __pgprot(val);
|
|
|
|
}
|
2012-10-09 06:34:29 +07:00
|
|
|
#endif
|
|
|
|
|
|
|
|
static inline int pmd_present(pmd_t pmd)
|
|
|
|
{
|
2013-09-26 04:33:16 +07:00
|
|
|
return pmd_val(pmd) != 0UL;
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
|
|
|
|
|
|
|
#define pmd_none(pmd) (!pmd_val(pmd))
|
|
|
|
|
2014-04-30 03:03:27 +07:00
|
|
|
/* pmd_bad() is only called on non-trans-huge PMDs. Our encoding is
|
|
|
|
* very simple, it's just the physical address. PTE tables are of
|
|
|
|
* size PAGE_SIZE so make sure the sub-PAGE_SIZE bits are clear and
|
|
|
|
* the top bits outside of the range of any physical address size we
|
|
|
|
* support are clear as well. We also validate the physical address itself.
|
|
|
|
*/
|
2014-09-25 10:56:11 +07:00
|
|
|
#define pmd_bad(pmd) (pmd_val(pmd) & ~PAGE_MASK)
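As written, the pmd_bad() macro tests only the sub-PAGE_SIZE bits of the entry. A minimal sketch of that alignment check follows, assuming 8 KB pages; the SK_ names are hypothetical and the page size is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 13 /* assumption: 8 KB pages */
#define SK_PAGE_SIZE  (1ULL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

/* Non-zero when the entry has any bit set below the page boundary,
 * i.e. it cannot be the physical address of a page-sized table.
 */
static inline int sk_pmd_bad(uint64_t pmd)
{
	return (pmd & ~SK_PAGE_MASK) != 0;
}
```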
|
2014-04-30 03:03:27 +07:00
|
|
|
|
|
|
|
#define pud_none(pud) (!pud_val(pud))
|
|
|
|
|
2014-09-25 10:56:11 +07:00
|
|
|
#define pud_bad(pud) (pud_val(pud) & ~PAGE_MASK)
|
2014-04-30 03:03:27 +07:00
|
|
|
|
2014-09-27 11:19:46 +07:00
|
|
|
#define pgd_none(pgd) (!pgd_val(pgd))
|
|
|
|
|
2014-09-25 10:56:11 +07:00
|
|
|
#define pgd_bad(pgd) (pgd_val(pgd) & ~PAGE_MASK)
|
2014-09-27 11:19:46 +07:00
|
|
|
|
2012-10-09 06:34:29 +07:00
|
|
|
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
2014-05-17 04:25:50 +07:00
|
|
|
void set_pmd_at(struct mm_struct *mm, unsigned long addr,
|
|
|
|
pmd_t *pmdp, pmd_t pmd);
|
2012-10-09 06:34:29 +07:00
|
|
|
#else
|
|
|
|
static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
|
|
|
|
pmd_t *pmdp, pmd_t pmd)
|
|
|
|
{
|
|
|
|
*pmdp = pmd;
|
|
|
|
}
|
|
|
|
#endif
|
|
|
|
|
|
|
|
static inline void pmd_set(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
|
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
unsigned long val = __pa((unsigned long) (ptep));
|
2012-10-09 06:34:29 +07:00
|
|
|
|
|
|
|
pmd_val(*pmdp) = val;
|
|
|
|
}
|
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
#define pud_set(pudp, pmdp) \
|
2013-09-27 03:45:15 +07:00
|
|
|
(pud_val(*(pudp)) = (__pa((unsigned long) (pmdp))))
|
2012-10-09 06:34:29 +07:00
|
|
|
static inline unsigned long __pmd_page(pmd_t pmd)
|
|
|
|
{
|
2013-09-27 03:45:15 +07:00
|
|
|
pte_t pte = __pte(pmd_val(pmd));
|
|
|
|
unsigned long pfn;
|
|
|
|
|
|
|
|
pfn = pte_pfn(pte);
|
|
|
|
|
|
|
|
return ((unsigned long) __va(pfn << PAGE_SHIFT));
|
2012-10-09 06:34:29 +07:00
|
|
|
}
|
2017-08-12 06:46:49 +07:00
|
|
|
|
|
|
|
static inline unsigned long pud_page_vaddr(pud_t pud)
|
|
|
|
{
|
|
|
|
pte_t pte = __pte(pud_val(pud));
|
|
|
|
unsigned long pfn;
|
|
|
|
|
|
|
|
pfn = pte_pfn(pte);
|
|
|
|
|
|
|
|
return ((unsigned long) __va(pfn << PAGE_SHIFT));
|
|
|
|
}
|
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
#define pmd_page(pmd) virt_to_page((void *)__pmd_page(pmd))
|
|
|
|
#define pud_page(pud) virt_to_page((void *)pud_page_vaddr(pud))
|
2013-09-26 04:33:16 +07:00
|
|
|
#define pmd_clear(pmdp) (pmd_val(*(pmdp)) = 0UL)
|
2008-07-18 11:55:51 +07:00
|
|
|
#define pud_present(pud) (pud_val(pud) != 0U)
|
2013-09-26 04:33:16 +07:00
|
|
|
#define pud_clear(pudp) (pud_val(*(pudp)) = 0UL)
|
2014-09-27 11:19:46 +07:00
|
|
|
#define pgd_page_vaddr(pgd) \
|
|
|
|
((unsigned long) __va(pgd_val(pgd)))
|
|
|
|
#define pgd_present(pgd) (pgd_val(pgd) != 0U)
|
2016-12-09 18:24:00 +07:00
|
|
|
#define pgd_clear(pgdp) (pgd_val(*(pgdp)) = 0UL)
|
2008-07-18 11:55:51 +07:00
|
|
|
|
2019-07-12 10:57:03 +07:00
|
|
|
/* only used by the stubbed out hugetlb gup code, should never be called */
|
|
|
|
#define pgd_page(pgd) NULL
|
|
|
|
|
2014-09-25 10:56:11 +07:00
|
|
|
static inline unsigned long pud_large(pud_t pud)
|
|
|
|
{
|
|
|
|
pte_t pte = __pte(pud_val(pud));
|
|
|
|
|
|
|
|
return pte_val(pte) & _PAGE_PMD_HUGE;
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline unsigned long pud_pfn(pud_t pud)
|
|
|
|
{
|
|
|
|
pte_t pte = __pte(pud_val(pud));
|
|
|
|
|
|
|
|
return pte_pfn(pte);
|
|
|
|
}
|
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
/* Same in both SUN4V and SUN4U. */
|
|
|
|
#define pte_none(pte) (!pte_val(pte))
|
|
|
|
|
2014-09-27 11:19:46 +07:00
|
|
|
#define pgd_set(pgdp, pudp) \
|
|
|
|
(pgd_val(*(pgdp)) = (__pa((unsigned long) (pudp))))
|
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
/* to find an entry in a page-table-directory. */
|
|
|
|
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
|
|
|
|
#define pgd_offset(mm, address) ((mm)->pgd + pgd_index(address))
|
|
|
|
|
|
|
|
/* to find an entry in a kernel page-table-directory */
|
|
|
|
#define pgd_offset_k(address) pgd_offset(&init_mm, address)
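pgd_index() and the pud/pmd/pte lookups that follow all use the same shift-and-mask step to turn a virtual address into a table index. A generic sketch is shown below; the helper name and the example shift and table size are hypothetical, not the values of PGDIR_SHIFT or PTRS_PER_PGD.

```c
#include <assert.h>

/* Index of `addr` within a table of `entries` slots, where each slot
 * covers 2^shift bytes. `entries` must be a power of two.
 */
static inline unsigned long sk_table_index(unsigned long addr,
					   unsigned int shift,
					   unsigned long entries)
{
	return (addr >> shift) & (entries - 1);
}
```

The offset macros then add this index to the base pointer of the table found one level up.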
|
|
|
|
|
2014-09-27 11:19:46 +07:00
|
|
|
/* Find an entry in the page upper directory (PUD). */
|
|
|
|
#define pud_index(address) (((address) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
|
|
|
|
#define pud_offset(pgdp, address) \
|
|
|
|
((pud_t *) pgd_page_vaddr(*(pgdp)) + pud_index(address))
|
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
/* Find an entry in the page middle directory (PMD). */
|
|
|
|
#define pmd_offset(pudp, address) \
|
|
|
|
((pmd_t *) pud_page_vaddr(*(pudp)) + \
|
|
|
|
(((address) >> PMD_SHIFT) & (PTRS_PER_PMD-1)))
|
|
|
|
|
|
|
|
/* Find an entry in the third-level page table.. */
|
|
|
|
#define pte_index(dir, address) \
|
|
|
|
((pte_t *) __pmd_page(*(dir)) + \
|
|
|
|
(((address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)))
|
|
|
|
#define pte_offset_kernel pte_index
|
|
|
|
#define pte_offset_map pte_index
|
|
|
|
#define pte_unmap(pte) do { } while (0)
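The index macros above all follow the same shift-and-mask pattern. A user-space sketch of that pattern follows; the constants (8KB pages, 10 index bits per level) and the `ex_` names are assumptions for illustration and are not taken from this header:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants only; the real values come from the kernel headers. */
#define EX_PAGE_SHIFT		13
#define EX_BITS_PER_LVL		10
#define EX_PTRS_PER_LVL		(1UL << EX_BITS_PER_LVL)

/* Index into the table at a given level (0 = PTE, 1 = PMD, 2 = PUD, 3 = PGD). */
static unsigned long ex_table_index(uint64_t address, unsigned int level)
{
	unsigned int shift;

	assert(level <= 3);
	shift = EX_PAGE_SHIFT + level * EX_BITS_PER_LVL;

	/* Shift the level's index field down, then mask off higher fields. */
	return (unsigned long)((address >> shift) & (EX_PTRS_PER_LVL - 1));
}
```

Each level consumes the next group of address bits above the page offset, which is why a single helper parameterised by level covers all four macros in this sketch.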
|
|
|
|
|
2017-02-04 06:16:44 +07:00
|
|
|
/* We cannot include <linux/mm_types.h> at this point yet: */
|
|
|
|
extern struct mm_struct init_mm;
|
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
/* Actual page table PTE updates. */
|
2014-05-17 04:25:50 +07:00
|
|
|
void tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
|
2017-02-02 07:16:36 +07:00
|
|
|
pte_t *ptep, pte_t orig, int fullmm,
|
|
|
|
unsigned int hugepage_shift);
|
2008-07-18 11:55:51 +07:00
|
|
|
|
2016-03-31 01:17:13 +07:00
|
|
|
static void maybe_tlb_batch_add(struct mm_struct *mm, unsigned long vaddr,
|
2017-02-02 07:16:36 +07:00
|
|
|
pte_t *ptep, pte_t orig, int fullmm,
|
|
|
|
unsigned int hugepage_shift)
|
2016-03-31 01:17:13 +07:00
|
|
|
{
|
|
|
|
/* It is more efficient to let flush_tlb_kernel_range()
|
|
|
|
* handle init_mm tlb flushes.
|
|
|
|
*
|
|
|
|
* SUN4V NOTE: _PAGE_VALID is the same value in both the SUN4U
|
|
|
|
* and SUN4V pte layout, so this inline test is fine.
|
|
|
|
*/
|
|
|
|
if (likely(mm != &init_mm) && pte_accessible(mm, orig))
|
2017-02-02 07:16:36 +07:00
|
|
|
tlb_batch_add(mm, vaddr, ptep, orig, fullmm, hugepage_shift);
|
2016-03-31 01:17:13 +07:00
|
|
|
}
|
|
|
|
|
2015-06-25 06:57:44 +07:00
|
|
|
#define __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
|
|
|
|
static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
|
|
|
|
unsigned long addr,
|
|
|
|
pmd_t *pmdp)
|
2012-10-09 06:34:29 +07:00
|
|
|
{
|
|
|
|
pmd_t pmd = *pmdp;
|
2013-09-26 04:33:16 +07:00
|
|
|
set_pmd_at(mm, addr, pmdp, __pmd(0UL));
|
2012-10-09 06:34:29 +07:00
|
|
|
return pmd;
|
|
|
|
}
|
|
|
|
|
2011-05-25 07:11:50 +07:00
|
|
|
static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
|
|
|
|
pte_t *ptep, pte_t pte, int fullmm)
|
2008-07-18 11:55:51 +07:00
|
|
|
{
|
|
|
|
pte_t orig = *ptep;
|
|
|
|
|
|
|
|
*ptep = pte;
|
2017-02-02 07:16:36 +07:00
|
|
|
maybe_tlb_batch_add(mm, addr, ptep, orig, fullmm, PAGE_SHIFT);
|
2008-07-18 11:55:51 +07:00
|
|
|
}
|
|
|
|
|
2011-05-25 07:11:50 +07:00
|
|
|
#define set_pte_at(mm,addr,ptep,pte) \
|
|
|
|
__set_pte_at((mm), (addr), (ptep), (pte), 0)
|
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
#define pte_clear(mm,addr,ptep) \
|
|
|
|
set_pte_at((mm), (addr), (ptep), __pte(0UL))
|
|
|
|
|
2011-05-25 07:11:50 +07:00
|
|
|
#define __HAVE_ARCH_PTE_CLEAR_NOT_PRESENT_FULL
|
|
|
|
#define pte_clear_not_present_full(mm,addr,ptep,fullmm) \
|
|
|
|
__set_pte_at((mm), (addr), (ptep), __pte(0UL), (fullmm))
|
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
#ifdef DCACHE_ALIASING_POSSIBLE
|
|
|
|
#define __HAVE_ARCH_MOVE_PTE
|
|
|
|
#define move_pte(pte, prot, old_addr, new_addr) \
|
|
|
|
({ \
|
|
|
|
pte_t newpte = (pte); \
|
|
|
|
if (tlb_type != hypervisor && pte_present(pte)) { \
|
|
|
|
unsigned long this_pfn = pte_pfn(pte); \
|
|
|
|
\
|
|
|
|
if (pfn_valid(this_pfn) && \
|
|
|
|
(((old_addr) ^ (new_addr)) & (1 << 13))) \
|
|
|
|
flush_dcache_page_all(current->mm, \
|
|
|
|
pfn_to_page(this_pfn)); \
|
|
|
|
} \
|
|
|
|
newpte; \
|
|
|
|
})
|
|
|
|
#endif
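The flush condition in move_pte() reduces to a single-bit comparison of the old and new virtual addresses. A stand-alone sketch of that test (the helper name is hypothetical, and reading bit 13 as the D-cache alias "color" is an interpretation, not something this header states):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when moving a mapping from old_addr to new_addr changes bit 13,
 * the only case in which move_pte() above flushes the D-cache.
 */
static bool ex_dcache_color_changes(uint64_t old_addr, uint64_t new_addr)
{
	return ((old_addr ^ new_addr) & (UINT64_C(1) << 13)) != 0;
}
```

Moves that keep bit 13 unchanged skip the flush entirely, however far apart the two addresses are.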
|
|
|
|
|
2013-09-26 04:33:16 +07:00
|
|
|
extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
|
2008-07-18 11:55:51 +07:00
|
|
|
|
2014-05-17 04:25:50 +07:00
|
|
|
void paging_init(void);
|
|
|
|
unsigned long find_ecache_flush_span(unsigned long size);
|
2008-07-18 11:55:51 +07:00
|
|
|
|
2011-04-22 05:45:45 +07:00
|
|
|
struct seq_file;
|
2014-05-17 04:25:50 +07:00
|
|
|
void mmu_info(struct seq_file *);
|
2011-04-22 05:45:45 +07:00
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
struct vm_area_struct;
|
2014-05-17 04:25:50 +07:00
|
|
|
void update_mmu_cache(struct vm_area_struct *, unsigned long, pte_t *);
|
2012-10-09 06:34:29 +07:00
|
|
|
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
2014-05-17 04:25:50 +07:00
|
|
|
void update_mmu_cache_pmd(struct vm_area_struct *vma, unsigned long addr,
|
|
|
|
pmd_t *pmd);
|
2012-10-09 06:34:29 +07:00
|
|
|
|
2014-04-25 03:58:02 +07:00
|
|
|
#define __HAVE_ARCH_PMDP_INVALIDATE
|
2018-02-01 07:18:09 +07:00
|
|
|
extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
|
2014-04-25 03:58:02 +07:00
|
|
|
pmd_t *pmdp);
|
|
|
|
|
2012-10-09 06:34:29 +07:00
|
|
|
#define __HAVE_ARCH_PGTABLE_DEPOSIT
|
2014-05-17 04:25:50 +07:00
|
|
|
void pgtable_trans_huge_deposit(struct mm_struct *mm, pmd_t *pmdp,
|
|
|
|
pgtable_t pgtable);
|
2012-10-09 06:34:29 +07:00
|
|
|
|
|
|
|
#define __HAVE_ARCH_PGTABLE_WITHDRAW
|
2014-05-17 04:25:50 +07:00
|
|
|
pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp);
|
2012-10-09 06:34:29 +07:00
|
|
|
#endif
|
2008-07-18 11:55:51 +07:00
|
|
|
|
|
|
|
/* Encode and de-code a swap entry */
|
|
|
|
#define __swp_type(entry) (((entry).val >> PAGE_SHIFT) & 0xffUL)
|
|
|
|
#define __swp_offset(entry) ((entry).val >> (PAGE_SHIFT + 8UL))
|
|
|
|
#define __swp_entry(type, offset) \
|
|
|
|
( (swp_entry_t) \
|
|
|
|
{ \
|
|
|
|
(((long)(type) << PAGE_SHIFT) | \
|
|
|
|
((long)(offset) << (PAGE_SHIFT + 8UL))) \
|
|
|
|
} )
|
|
|
|
#define __pte_to_swp_entry(pte) ((swp_entry_t) { pte_val(pte) })
|
|
|
|
#define __swp_entry_to_pte(x) ((pte_t) { (x).val })
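The swap-entry layout above places an 8-bit type at PAGE_SHIFT and the offset in the bits above it. A user-space sketch of the round trip, assuming PAGE_SHIFT is 13 and using unsigned arithmetic instead of the header's (long) casts; all `ex_` names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define EX_PAGE_SHIFT	13	/* assumption: 8KB pages */

typedef struct { uint64_t val; } ex_swp_entry_t;

/* Pack type (8 bits) and offset above the page-offset bits. */
static ex_swp_entry_t ex_swp_entry(uint64_t type, uint64_t offset)
{
	ex_swp_entry_t e;

	assert(type <= 0xff);	/* encode does not mask, so type must fit */
	e.val = (type << EX_PAGE_SHIFT) | (offset << (EX_PAGE_SHIFT + 8));
	return e;
}

static uint64_t ex_swp_type(ex_swp_entry_t e)
{
	return (e.val >> EX_PAGE_SHIFT) & 0xff;
}

static uint64_t ex_swp_offset(ex_swp_entry_t e)
{
	return e.val >> (EX_PAGE_SHIFT + 8);
}
```

The low EX_PAGE_SHIFT bits of the encoded value stay zero in this layout.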
|
|
|
|
|
2014-05-17 04:25:50 +07:00
|
|
|
int page_in_phys_avail(unsigned long paddr);
|
2008-07-18 11:55:51 +07:00
|
|
|
|
|
|
|
/*
|
|
|
|
* For sparc32&64, the pfn in io_remap_pfn_range() carries <iospace> in
|
|
|
|
* its high 4 bits. These macros/functions put it there or get it from there.
|
|
|
|
*/
|
|
|
|
#define MK_IOSPACE_PFN(space, pfn) ((pfn) | ((space) << (BITS_PER_LONG - 4)))
|
|
|
|
#define GET_IOSPACE(pfn) ((pfn) >> (BITS_PER_LONG - 4))
|
|
|
|
#define GET_PFN(pfn) ((pfn) & 0x0fffffffffffffffUL)
|
|
|
|
|
2014-05-17 04:25:50 +07:00
|
|
|
int remap_pfn_range(struct vm_area_struct *, unsigned long, unsigned long,
|
|
|
|
unsigned long, pgprot_t);
|
2011-11-18 09:17:59 +07:00
|
|
|
|
2018-02-24 05:46:41 +07:00
|
|
|
void adi_restore_tags(struct mm_struct *mm, struct vm_area_struct *vma,
|
|
|
|
unsigned long addr, pte_t pte);
|
|
|
|
|
|
|
|
int adi_save_tags(struct mm_struct *mm, struct vm_area_struct *vma,
|
|
|
|
unsigned long addr, pte_t oldpte);
|
|
|
|
|
|
|
|
#define __HAVE_ARCH_DO_SWAP_PAGE
|
|
|
|
static inline void arch_do_swap_page(struct mm_struct *mm,
|
|
|
|
struct vm_area_struct *vma,
|
|
|
|
unsigned long addr,
|
|
|
|
pte_t pte, pte_t oldpte)
|
|
|
|
{
|
|
|
|
/* If this is a new page being mapped in, there can be no
|
|
|
|
* ADI tags stored away for this page. Skip looking for
|
|
|
|
* stored tags
|
|
|
|
*/
|
|
|
|
if (pte_none(oldpte))
|
|
|
|
return;
|
|
|
|
|
|
|
|
if (adi_state.enabled && (pte_val(pte) & _PAGE_MCD_4V))
|
|
|
|
adi_restore_tags(mm, vma, addr, pte);
|
|
|
|
}
|
|
|
|
|
|
|
|
#define __HAVE_ARCH_UNMAP_ONE
|
|
|
|
static inline int arch_unmap_one(struct mm_struct *mm,
|
|
|
|
struct vm_area_struct *vma,
|
|
|
|
unsigned long addr, pte_t oldpte)
|
|
|
|
{
|
|
|
|
if (adi_state.enabled && (pte_val(oldpte) & _PAGE_MCD_4V))
|
|
|
|
return adi_save_tags(mm, vma, addr, oldpte);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2011-11-18 09:17:59 +07:00
|
|
|
static inline int io_remap_pfn_range(struct vm_area_struct *vma,
|
|
|
|
unsigned long from, unsigned long pfn,
|
|
|
|
unsigned long size, pgprot_t prot)
|
|
|
|
{
|
|
|
|
unsigned long offset = GET_PFN(pfn) << PAGE_SHIFT;
|
|
|
|
int space = GET_IOSPACE(pfn);
|
|
|
|
unsigned long phys_base;
|
|
|
|
|
|
|
|
phys_base = offset | (((unsigned long) space) << 32UL);
|
|
|
|
|
|
|
|
return remap_pfn_range(vma, from, phys_base >> PAGE_SHIFT, size, prot);
|
|
|
|
}
|
2013-05-11 23:13:10 +07:00
|
|
|
#define io_remap_pfn_range io_remap_pfn_range
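The I/O-space encoding consumed by io_remap_pfn_range() can be exercised outside the kernel. The sketch below assumes a 64-bit long and a PAGE_SHIFT of 13; the `ex_` helpers are hypothetical stand-ins for the macros and for the phys_base computation above:

```c
#include <assert.h>
#include <stdint.h>

#define EX_BITS_PER_LONG	64	/* assumption: 64-bit long */
#define EX_PAGE_SHIFT		13	/* assumption: 8KB pages */

/* Place the 4-bit I/O space number in the top nibble of the pfn. */
static uint64_t ex_mk_iospace_pfn(uint64_t space, uint64_t pfn)
{
	assert(space < 16);
	return pfn | (space << (EX_BITS_PER_LONG - 4));
}

static uint64_t ex_get_iospace(uint64_t pfn)
{
	return pfn >> (EX_BITS_PER_LONG - 4);
}

static uint64_t ex_get_pfn(uint64_t pfn)
{
	return pfn & UINT64_C(0x0fffffffffffffff);
}

/* Mirrors the phys_base computation in io_remap_pfn_range() above. */
static uint64_t ex_io_phys_base(uint64_t tagged_pfn)
{
	uint64_t offset = ex_get_pfn(tagged_pfn) << EX_PAGE_SHIFT;

	return offset | (ex_get_iospace(tagged_pfn) << 32);
}
```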
|
2011-11-18 09:17:59 +07:00
|
|
|
|
lib: untag user pointers in strn*_user
Patch series "arm64: untag user pointers passed to the kernel", v19.
=== Overview
arm64 has a feature called Top Byte Ignore, which allows embedding pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be
passed to syscalls when they point to memory ranges obtained by anonymous
mmap() or sbrk() (see the patchset [3] for more details).
For non-memory syscalls this is done by untagging user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok). The untagging is done only
when the pointer is being checked; the tag is preserved as the pointer
makes its way through the kernel and stays tagged when the kernel
dereferences the pointer when performing user memory accesses.
The mmap and mremap (only new_addr) syscalls do not currently accept
tagged addresses. Architectures may interpret the tag as a background
colour for the corresponding vma.
Other memory syscalls (mprotect, etc.) don't do user memory accesses but
rather deal with memory ranges, and untagged pointers are better suited to
describe memory ranges internally. Thus for memory syscalls we untag
pointers completely when they enter the kernel.
=== Other approaches
One of the alternative approaches to untagging that was considered is to
completely strip the pointer tag as the pointer enters the kernel with
some kind of a syscall wrapper, but that won't work with the countless
number of different ioctl calls. With this approach we would need a
custom wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscall prologues
is to instead allow tagged pointers to be passed to find_vma() (and other
vma related functions) and untag them there. Unfortunately, a lot of
find_vma() callers then compare or subtract the returned vma start and end
fields against the pointer that was being searched. Thus this approach
would still require changing all find_vma() callers.
=== Testing
The following testing approaches have been taken to find potential issues
with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call
find_vma() (and other similar functions) or directly compare against
vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare
user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing the required patches have been added
to the patchset.
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature
support [4].
This patchset has been merged into the Pixel 2 & 3 kernel trees and is
now being used to enable testing of Pixel phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e0145f292
[3] https://lkml.org/lkml/2019/6/12/745
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a
This patch (of 11)
This patch is part of a series that extends the kernel ABI to allow passing
tagged user pointers (with the top byte set to something other than
0x00) as syscall arguments.
strncpy_from_user and strnlen_user accept user addresses as arguments, and
do not go through the same path as copy_from_user and others, so here we
need to handle the case of tagged user addresses separately.
Untag user pointers passed to these functions.
Note that this patch only temporarily untags the pointers to perform
validity checks, but then uses them as is to perform user memory accesses.
[andreyknvl@google.com: fix sparc4 build]
Link: http://lkml.kernel.org/r/CAAeHK+yx4a-P0sDrXTUxMvO2H0CJZUFPffBrg_cU7oJOZyC7ew@mail.gmail.com
Link: http://lkml.kernel.org/r/c5a78bcad3e94d6cda71fcaa60a423231ae71e4c.1563904656.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26 06:48:27 +07:00
|
|
|
static inline unsigned long __untagged_addr(unsigned long start)
|
2019-07-12 10:57:07 +07:00
|
|
|
{
|
|
|
|
if (adi_capable()) {
|
|
|
|
long addr = start;
|
|
|
|
|
|
|
|
/* If userspace has passed a versioned address, kernel
|
|
|
|
* will not find it in the VMAs since it does not store
|
|
|
|
* the version tags in the list of VMAs. Storing version
|
|
|
|
* tags in list of VMAs is impractical since they can be
|
|
|
|
* changed any time from userspace without dropping into
|
|
|
|
* kernel. Any address search in VMAs will be done with
|
|
|
|
* non-versioned addresses. Ensure the ADI version bits
|
|
|
|
* are dropped here by sign extending the last bit before
|
|
|
|
* ADI bits. IOMMU does not implement version tags.
|
|
|
|
*/
|
|
|
|
return (addr << (long)adi_nbits()) >> (long)adi_nbits();
|
|
|
|
}
|
|
|
|
|
|
|
|
return start;
|
|
|
|
}
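The shift pair in __untagged_addr() drops the tag by sign-extending from the highest non-tag bit. A user-space sketch of the same idea, with the tag width passed in explicitly (the helper name is hypothetical, and a 4-bit width is only an example value):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Drop the top nbits tag bits by sign-extending from the highest non-tag
 * bit. The left shift is done on the unsigned value to avoid signed
 * overflow. Converting to int64_t and right-shifting a negative value are
 * implementation-defined in C; this sketch assumes two's complement and an
 * arithmetic right shift, as GCC and Clang provide.
 */
static uint64_t ex_untag(uint64_t addr, unsigned int nbits)
{
	assert(nbits < 64);
	return (uint64_t)((int64_t)(addr << nbits) >> nbits);
}
```

With a 4-bit tag, 0xA000000000001234 becomes 0x1234, while an address whose highest non-tag bit is set has its top bits filled with ones.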
|
2019-09-26 06:48:27 +07:00
|
|
|
#define untagged_addr(addr) \
|
2019-09-26 21:28:17 +07:00
|
|
|
((__typeof__(addr))(__untagged_addr((unsigned long)(addr))))
|
2019-07-12 10:57:07 +07:00
|
|
|
|
2019-07-12 10:57:11 +07:00
|
|
|
static inline bool pte_access_permitted(pte_t pte, bool write)
|
|
|
|
{
|
|
|
|
u64 prot;
|
|
|
|
|
|
|
|
if (tlb_type == hypervisor) {
|
|
|
|
prot = _PAGE_PRESENT_4V | _PAGE_P_4V;
|
|
|
|
if (write)
|
|
|
|
prot |= _PAGE_WRITE_4V;
|
|
|
|
} else {
|
|
|
|
prot = _PAGE_PRESENT_4U | _PAGE_P_4U;
|
|
|
|
if (write)
|
|
|
|
prot |= _PAGE_WRITE_4U;
|
|
|
|
}
|
|
|
|
|
|
|
|
return (pte_val(pte) & (prot | _PAGE_SPECIAL)) == prot;
|
|
|
|
}
|
|
|
|
#define pte_access_permitted pte_access_permitted
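The final comparison in pte_access_permitted() requires every needed bit to be set and the special bit to be clear in one masked compare. A sketch of that idiom with made-up bit positions (the `EX_` values are hypothetical and are not the sparc64 PTE layout):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions for illustration; not the sparc64 values. */
#define EX_PAGE_PRESENT	(UINT64_C(1) << 0)
#define EX_PAGE_P	(UINT64_C(1) << 1)
#define EX_PAGE_WRITE	(UINT64_C(1) << 2)
#define EX_PAGE_SPECIAL	(UINT64_C(1) << 3)

static bool ex_access_permitted(uint64_t pte, bool write)
{
	uint64_t prot = EX_PAGE_PRESENT | EX_PAGE_P;

	if (write)
		prot |= EX_PAGE_WRITE;

	/* All required bits set AND the special bit clear. */
	return (pte & (prot | EX_PAGE_SPECIAL)) == prot;
}
```

Including the special bit in the mask but not in the expected value is what rejects special mappings without a second test.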
|
|
|
|
|
sparc64: Fix race in TLB batch processing.
As reported by Dave Kleikamp, when we emit cross calls to do batched
TLB flush processing we have a race because we do not synchronize on
the sibling cpus completing the cross call.
So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.)
and either flushes are missed or flushes will flush the wrong
addresses.
Fix this by using generic infrastructure to synchronize on the
completion of the cross call.
This first required getting the flush_tlb_pending() call out from
switch_to() which operates with locks held and interrupts disabled.
The problem is that smp_call_function_many() cannot be invoked with
IRQs disabled and this is explicitly checked for with WARN_ON_ONCE().
We get the batch processing outside of locked IRQ disabled sections by
using some ideas from the powerpc port. Namely, we only batch inside
of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a
region, we flush TLBs synchronously.
1) Get rid of xcall_flush_tlb_pending and per-cpu type
implementations.
2) Do TLB batch cross calls instead via:
smp_call_function_many()
tlb_pending_func()
__flush_tlb_pending()
3) Batch only in lazy mmu sequences:
a) Add 'active' member to struct tlb_batch
b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
c) Set 'active' in arch_enter_lazy_mmu_mode()
d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode()
e) Check 'active' in tlb_batch_add_one() and do a synchronous
flush if it's clear.
4) Add infrastructure for synchronous TLB page flushes.
a) Implement __flush_tlb_page and per-cpu variants, patch
as needed.
b) Likewise for xcall_flush_tlb_page.
c) Implement smp_flush_tlb_page() to invoke the cross-call.
d) Wire up global_flush_tlb_page() to the right routine based
upon CONFIG_SMP
5) It turns out that singleton batches are very common, 2 out of every
3 batch flushes have only a single entry in them.
The batch flush waiting is very expensive, both because of the poll
on sibling cpu completion, as well as because passing the tlb batch
pointer to the sibling cpus invokes a shared memory dereference.
Therefore, in flush_tlb_pending(), if there is only one entry in
the batch perform a completely asynchronous global_flush_tlb_page()
instead.
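The control flow described in points 3 and 5 can be modelled in user space. In the sketch below, counters stand in for the real flush routines; the batch capacity, the struct layout and every `ex_` name are assumptions for illustration, not the kernel's definitions:

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_BATCH_MAX	8	/* hypothetical batch capacity */

struct ex_tlb_batch {
	bool active;			/* set only inside lazy MMU mode */
	size_t nr;			/* pending entries */
	unsigned long vaddrs[EX_BATCH_MAX];
	size_t page_flushes;		/* stand-in for single-page flushes */
	size_t batch_flushes;		/* stand-in for batched cross calls */
};

/* Flush whatever is pending; a singleton batch takes the single-page path. */
static void ex_flush_pending(struct ex_tlb_batch *tb)
{
	if (tb->nr == 0)
		return;
	if (tb->nr == 1)
		tb->page_flushes++;
	else
		tb->batch_flushes++;
	tb->nr = 0;
}

static void ex_enter_lazy_mmu_mode(struct ex_tlb_batch *tb)
{
	tb->active = true;
}

static void ex_leave_lazy_mmu_mode(struct ex_tlb_batch *tb)
{
	ex_flush_pending(tb);
	tb->active = false;
}

/* Outside lazy MMU mode, flush immediately instead of batching. */
static void ex_batch_add_one(struct ex_tlb_batch *tb, unsigned long vaddr)
{
	if (!tb->active) {
		tb->page_flushes++;
		return;
	}
	tb->vaddrs[tb->nr++] = vaddr;
	if (tb->nr >= EX_BATCH_MAX)
		ex_flush_pending(tb);
}
```

The model leaves out the cross-CPU synchronization that is the actual subject of the fix; it only shows when entries are batched and when they are flushed one page at a time.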
Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2013-04-20 04:26:26 +07:00
|
|
|
#include <asm/tlbflush.h>
|
2008-07-18 11:55:51 +07:00
|
|
|
#include <asm-generic/pgtable.h>
|
|
|
|
|
|
|
|
/* We provide our own get_unmapped_area to cope with VA holes and
|
|
|
|
* SHM area cache aliasing for userland.
|
|
|
|
*/
|
|
|
|
#define HAVE_ARCH_UNMAPPED_AREA
|
|
|
|
#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
|
|
|
|
|
|
|
|
/* We provide a special get_unmapped_area for framebuffer mmaps to try and use
|
|
|
|
* the largest alignment possible such that larger PTEs can be used.
|
|
|
|
*/
|
2014-05-17 04:25:50 +07:00
|
|
|
unsigned long get_fb_unmapped_area(struct file *filp, unsigned long,
|
|
|
|
unsigned long, unsigned long,
|
|
|
|
unsigned long);
|
2008-07-18 11:55:51 +07:00
|
|
|
#define HAVE_ARCH_FB_UNMAPPED_AREA
|
|
|
|
|
2014-05-17 04:25:50 +07:00
|
|
|
void sun4v_register_fault_status(void);
|
|
|
|
void sun4v_ktsb_register(void);
|
|
|
|
void __init cheetah_ecache_flush_init(void);
|
|
|
|
void sun4v_patch_tlb_handlers(void);
|
2008-07-18 11:55:51 +07:00
|
|
|
|
|
|
|
extern unsigned long cmdline_memory_size;
|
|
|
|
|
2014-05-17 04:25:50 +07:00
|
|
|
asmlinkage void do_sparc64_fault(struct pt_regs *regs);
|
2008-09-12 14:10:32 +07:00
|
|
|
|
2008-07-18 11:55:51 +07:00
|
|
|
#endif /* !(__ASSEMBLY__) */
|
|
|
|
|
|
|
|
#endif /* !(_SPARC64_PGTABLE_H) */
|