Commit Graph

103 Commits

Author SHA1 Message Date
Linus Torvalds
588ab3f9af arm64 updates for 4.6:
- Initial page table creation reworked to avoid breaking large block
   mappings (huge pages) into smaller ones. The ARM architecture requires
   break-before-make in such cases to avoid TLB conflicts but that's not
   always possible on live page tables
 
 - Kernel virtual memory layout: the kernel image is no longer linked to
   the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of
   the vmalloc space, allowing the kernel to be loaded (nearly) anywhere
   in physical RAM
 
 - Kernel ASLR: position independent kernel Image and modules being
   randomly mapped in the vmalloc space with the randomness is provided
   by UEFI (efi_get_random_bytes() patches merged via the arm64 tree,
   acked by Matt Fleming)
 
 - Implement relative exception tables for arm64, required by KASLR
   (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but
   actual x86 conversion to deferred to 4.7 because of the merge
   dependencies)
 
 - Support for the User Access Override feature of ARMv8.2: this allows
   uaccess functions (get_user etc.) to be implemented using LDTR/STTR
   instructions. Such instructions, when run by the kernel, perform
   unprivileged accesses adding an extra level of protection. The
   set_fs() macro is used to "upgrade" such instruction to privileged
   accesses via the UAO bit
 
 - Half-precision floating point support (part of ARMv8.2)
 
 - Optimisations for CPUs with or without a hardware prefetcher (using
   run-time code patching)
 
 - copy_page performance improvement to deal with 128 bytes at a time
 
 - Sanity checks on the CPU capabilities (via CPUID) to prevent
   incompatible secondary CPUs from being brought up (e.g. weird
   big.LITTLE configurations)
 
 - valid_user_regs() reworked for better sanity check of the sigcontext
   information (restored pstate information)
 
 - ACPI parking protocol implementation
 
 - CONFIG_DEBUG_RODATA enabled by default
 
 - VDSO code marked as read-only
 
 - DEBUG_PAGEALLOC support
 
 - ARCH_HAS_UBSAN_SANITIZE_ALL enabled
 
 - Erratum workaround Cavium ThunderX SoC
 
 - set_pte_at() fix for PROT_NONE mappings
 
 - Code clean-ups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW6u95AAoJEGvWsS0AyF7xMyoP/3x2O6bgreSQ84BdO4JChN4+
 RQ9OVdX8u2ItO9sgaCY2AA6KoiBuEjGmPl/XRuK0I7DpODTtRjEXQHuNNhz8AelC
 hn4AEVqamY6Z5BzHFIjs8G9ydEbq+OXcKWEdwSsBhP/cMvI7ss3dps1f5iNPT5Vv
 50E/kUz+aWYy7pKlB18VDV7TUOA3SuYuGknWV8+bOY5uPb8hNT3Y3fHOg/EuNNN3
 DIuYH1V7XQkXtF+oNVIGxzzJCXULBE7egMcWAm1ydSOHK0JwkZAiL7OhI7ceVD0x
 YlDxBnqmi4cgzfBzTxITAhn3OParwN6udQprdF1WGtFF6fuY2eRDSH/L/iZoE4DY
 OulL951OsBtF8YC3+RKLk908/0bA2Uw8ftjCOFJTYbSnZBj1gWK41VkCYMEXiHQk
 EaN8+2Iw206iYIoyvdjGCLw7Y0oakDoVD9vmv12SOaHeQljTkjoN8oIlfjjKTeP7
 3AXj5v9BDMDVh40nkVayysRNvqe48Kwt9Wn0rhVTLxwdJEiFG/OIU6HLuTkretdN
 dcCNFSQrRieSFHpBK9G0vKIpIss1ZwLm8gjocVXH7VK4Mo/TNQe4p2/wAF29mq4r
 xu1UiXmtU3uWxiqZnt72LOYFCarQ0sFA5+pMEvF5W+NrVB0wGpXhcwm+pGsIi4IM
 LepccTgykiUBqW5TRzPz
 =/oS+
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "Here are the main arm64 updates for 4.6.  There are some relatively
  intrusive changes to support KASLR, the reworking of the kernel
  virtual memory layout and initial page table creation.

  Summary:

   - Initial page table creation reworked to avoid breaking large block
     mappings (huge pages) into smaller ones.  The ARM architecture
     requires break-before-make in such cases to avoid TLB conflicts but
     that's not always possible on live page tables

   - Kernel virtual memory layout: the kernel image is no longer linked
     to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
     of the vmalloc space, allowing the kernel to be loaded (nearly)
     anywhere in physical RAM

   - Kernel ASLR: position independent kernel Image and modules being
     randomly mapped in the vmalloc space with the randomness is
     provided by UEFI (efi_get_random_bytes() patches merged via the
     arm64 tree, acked by Matt Fleming)

   - Implement relative exception tables for arm64, required by KASLR
     (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
     but actual x86 conversion to deferred to 4.7 because of the merge
     dependencies)

   - Support for the User Access Override feature of ARMv8.2: this
     allows uaccess functions (get_user etc.) to be implemented using
     LDTR/STTR instructions.  Such instructions, when run by the kernel,
     perform unprivileged accesses adding an extra level of protection.
     The set_fs() macro is used to "upgrade" such instruction to
     privileged accesses via the UAO bit

   - Half-precision floating point support (part of ARMv8.2)

   - Optimisations for CPUs with or without a hardware prefetcher (using
     run-time code patching)

   - copy_page performance improvement to deal with 128 bytes at a time

   - Sanity checks on the CPU capabilities (via CPUID) to prevent
     incompatible secondary CPUs from being brought up (e.g.  weird
     big.LITTLE configurations)

   - valid_user_regs() reworked for better sanity check of the
     sigcontext information (restored pstate information)

   - ACPI parking protocol implementation

   - CONFIG_DEBUG_RODATA enabled by default

   - VDSO code marked as read-only

   - DEBUG_PAGEALLOC support

   - ARCH_HAS_UBSAN_SANITIZE_ALL enabled

   - Erratum workaround Cavium ThunderX SoC

   - set_pte_at() fix for PROT_NONE mappings

   - Code clean-ups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
  arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
  arm64: kasan: Use actual memory node when populating the kernel image shadow
  arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
  arm64: Fix misspellings in comments.
  arm64: efi: add missing frame pointer assignment
  arm64: make mrs_s prefixing implicit in read_cpuid
  arm64: enable CONFIG_DEBUG_RODATA by default
  arm64: Rework valid_user_regs
  arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
  arm64: KVM: Move kvm_call_hyp back to its original localtion
  arm64: mm: treat memstart_addr as a signed quantity
  arm64: mm: list kernel sections in order
  arm64: lse: deal with clobbered IP registers after branch via PLT
  arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
  arm64: kconfig: add submenu for 8.2 architectural features
  arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
  arm64: Add support for Half precision floating point
  arm64: Remove fixmap include fragility
  arm64: Add workaround for Cavium erratum 27456
  arm64: mm: Mark .rodata as RO
  ...
2016-03-17 20:03:47 -07:00
Catalin Marinas
fdc69e7df3 arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
The set_pte_at() function must update the hardware PTE_RDONLY bit
depending on the state of the PTE_WRITE and PTE_DIRTY bits of the given
entry value. However, it currently only performs this for pte_valid()
entries, ignoring PTE_PROT_NONE. The side-effect is that PROT_NONE
mappings would not have the PTE_RDONLY bit set. Without
CONFIG_ARM64_HW_AFDBM, this is not an issue since such PROT_NONE pages
are not accessible anyway.

With commit 2f4b829c62 ("arm64: Add support for hardware updates of
the access and dirty pte bits"), the ptep_set_wrprotect() function was
re-written to cope with automatic hardware updates of the dirty state.
As an optimisation, only PTE_RDONLY is checked to assess the "dirty"
status. Since set_pte_at() does not set this bit for PROT_NONE mappings,
such pages may be considered "dirty" as a result of
ptep_set_wrprotect().

This patch updates the pte_valid() check to pte_present() in
set_pte_at(). It also adds PTE_PROT_NONE to the swap entry bits comment.

Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Tested-by: Ganapatrao Kulkarni <gkulkarni@cavium.com>
Cc: <stable@vger.kernel.org>
2016-03-11 11:03:34 +00:00
Ard Biesheuvel
36e5cd6b89 arm64: account for sparsemem section alignment when choosing vmemmap offset
Commit dfd55ad85e ("arm64: vmemmap: use virtual projection of linear
region") fixed an issue where the struct page array would overflow into the
adjacent virtual memory region if system RAM was placed so high up in
physical memory that its addresses were not representable in the build time
configured virtual address size.

However, the fix failed to take into account that the vmemmap region needs
to be relatively aligned with respect to the sparsemem section size, so that
a sequence of page structs corresponding with a sparsemem section in the
linear region appears naturally aligned in the vmemmap region.

So round up vmemmap to sparsemem section size. Since this essentially moves
the projection of the linear region up in memory, also revert the reduction
of the size of the vmemmap region.

Cc: <stable@vger.kernel.org>
Fixes: dfd55ad85e ("arm64: vmemmap: use virtual projection of linear region")
Tested-by: Mark Langsdorf <mlangsdo@redhat.com>
Tested-by: David Daney <david.daney@cavium.com>
Tested-by: Robert Richter <rrichter@cavium.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-03-09 14:57:08 +00:00
Ard Biesheuvel
dfd55ad85e arm64: vmemmap: use virtual projection of linear region
Commit dd006da216 ("arm64: mm: increase VA range of identity map") made
some changes to the memory mapping code to allow physical memory to reside
at an offset that exceeds the size of the virtual mapping.

However, since the size of the vmemmap area is proportional to the size of
the VA area, but it is populated relative to the physical space, we may
end up with the struct page array being mapped outside of the vmemmap
region. For instance, on my Seattle A0 box, I can see the following output
in the dmesg log.

   vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000   (     8 GB maximum)
             0xffffffbfc0000000 - 0xffffffbfd0000000   (   256 MB actual)

We can fix this by deciding that the vmemmap region is not a projection of
the physical space, but of the virtual space above PAGE_OFFSET, i.e., the
linear region. This way, we are guaranteed that the vmemmap region is of
sufficient size, and we can even reduce the size by half.

Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-02-26 17:59:04 +00:00
Mark Rutland
3eca86e75e arm64: Remove fixmap include fragility
The asm-generic fixmap.h depends on each architecture's fixmap.h to pull
in the definition of PAGE_KERNEL_RO, if this exists. In the absence of
this, FIXMAP_PAGE_RO will not be defined. In mm/early_ioremap.c the
definition of early_memremap_ro is predicated on FIXMAP_PAGE_RO being
defined.

Currently, the arm64 fixmap.h doesn't include pgtable.h for the
definition of PAGE_KERNEL_RO, and as a knock-on effect early_memremap_ro
is not always defined, leading to link-time failures when it is used.
This has been observed with defconfig on next-20160226.

Unfortunately, as pgtable.h includes fixmap.h, adding the include
introduces a circular dependency, which is just as fragile.

Instead, this patch factors out PAGE_KERNEL_RO and other prot
definitions into a new pgtable-prot header which can be included by poth
pgtable.h and fixmap.h, avoiding the  circular dependency, and ensuring
that early_memremap_ro is alwyas defined where it is used.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-26 15:22:53 +00:00
Catalin Marinas
cac4b8cdf5 arm64: Fix building error with 16KB pages and 36-bit VA
In such configuration, Linux uses only two pages of page tables and
__pud_populate() should not be used. However, the BUILD_BUG() triggers
since pud_sect() is still defined and the compiler cannot eliminate such
code, even though at run-time it should not be triggered. This patch
extends the #ifdef ARM64_64K_PAGES condition for pud_sect to include
PGTABLE_LEVELS < 3.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-25 16:01:26 +00:00
Ard Biesheuvel
f9040773b7 arm64: move kernel image to base of vmalloc area
This moves the module area to right before the vmalloc area, and moves
the kernel image to the base of the vmalloc area. This is an intermediate
step towards implementing KASLR, which allows the kernel image to be
located anywhere in the vmalloc area.

Since other subsystems such as hibernate may still need to refer to the
kernel text or data segments via their linears addresses, both are mapped
in the linear region as well. The linear alias of the text region is
mapped read-only/non-executable to prevent inadvertent modification or
execution.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-18 18:16:44 +00:00
Ard Biesheuvel
6533945a32 arm64: pgtable: implement static [pte|pmd|pud]_offset variants
The page table accessors pte_offset(), pud_offset() and pmd_offset()
rely on __va translations, so they can only be used after the linear
mapping has been installed. For the early fixmap and kasan init routines,
whose page tables are allocated statically in the kernel image, these
functions will return bogus values. So implement pte_offset_kimg(),
pmd_offset_kimg() and pud_offset_kimg(), which can be used instead
before any page tables have been allocated dynamically.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-18 18:16:31 +00:00
Mark Rutland
961faac114 arm64: mm: add functions to walk tables in fixmap
As a preparatory step to allow us to allocate early page tables from
unmapped memory using memblock_alloc, add new p??_{set,clear}_fixmap*
functions which can be used to walk page tables outside of the linear
mapping by using fixmap slots.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-16 15:10:46 +00:00
Mark Rutland
dca56dca71 arm64: mm: add functions to walk page tables by PA
To allow us to walk tables allocated into the fixmap, we need to acquire
the physical address of a page, rather than the virtual address in the
linear map.

This patch adds new p??_page_paddr and p??_offset_phys functions to
acquire the physical address of a next-level table, and changes
p??_offset* into macros which simply convert this to a linear map VA.
This renders p??_page_vaddr unused, and hence they are removed.

At the pgd level, a new pgd_offset_raw function is added to find the
relevant PGD entry given the base of a PGD and a virtual address.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-16 15:10:45 +00:00
Mark Rutland
053520f7d3 arm64: mm: move pte_* macros
For pmd, pud, and pgd levels of table, functions including p?d_index and
p?d_offset are defined after the p?d_page_vaddr function for the
immediately higher level of table.

The pte functions however are defined much earlier, even though several
rely on the later definition of pmd_page_vaddr. While this isn't
currently a problem as these are macros, it prevents the logical
grouping of later C functions (which cannot rely on prototypes for
functions not yet defined).

Move these definitions after pmd_page_vaddr, for consistency with the
placement of these functions for other levels of table.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-16 15:10:45 +00:00
Mark Rutland
5227cfa71f arm64: mm: place empty_zero_page in bss
Currently the zero page is set up in paging_init, and thus we cannot use
the zero page earlier. We use the zero page as a reserved TTBR value
from which no TLB entries may be allocated (e.g. when uninstalling the
idmap). To enable such usage earlier (as may be required for invasive
changes to the kernel page tables), and to minimise the time that the
idmap is active, we need to be able to use the zero page before
paging_init.

This patch follows the example set by x86, by allocating the zero page
at compile time, in .bss. This means that the zero page itself is
available immediately upon entry to start_kernel (as we zero .bss before
this), and also means that the zero page takes up no space in the raw
Image binary. The associated struct page is allocated in bootmem_init,
and remains unavailable until this time.

Outside of arch code, the only users of empty_zero_page assume that the
empty_zero_page symbol refers to the zeroed memory itself, and that
ZERO_PAGE(x) must be used to acquire the associated struct page,
following the example of x86. This patch also brings arm64 inline with
these assumptions.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-16 15:10:44 +00:00
Catalin Marinas
ac15bd63bb arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings
Currently, set_pte_at() only checks the software PTE_WRITE bit for user
mappings when it sets or clears the hardware PTE_RDONLY accordingly. The
kernel ptes are written directly without any modification, relying
solely on the protection bits in macros like PAGE_KERNEL. However,
modifying kernel pte attributes via pte_wrprotect() would be ignored by
set_pte_at(). Since pte_wrprotect() does not set PTE_RDONLY (it only
clears PTE_WRITE), the new permission is not taken into account.

This patch changes set_pte_at() to adjust the read-only permission for
kernel ptes as well. As a side effect, existing PROT_* definitions used
for kernel ioremap*() need to include PTE_DIRTY | PTE_WRITE.

(additionally, white space fix for PTE_KERNEL_ROX)

Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-01-25 11:09:06 +00:00
Minchan Kim
05ee26d9e7 arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP
MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
of the contents since MADV_FREE syscall is called for THP page.

This patch adds pmd_mkclean for THP page MADV_FREE support.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@kernel.org>
Cc: <yalin.wang2010@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jason Evans <je@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttil <mika.penttila@nextfour.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Shaohua Li <shli@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Kirill A. Shutemov
b7ed934a7c arm64, thp: remove infrastructure for handling splitting PMDs
With new refcounting we don't need to mark PMDs splitting.  Let's drop
code to handle this.

pmdp_splitting_flush() is not needed too: on splitting PMD we will do
pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
needed for fast_gup.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-01-15 17:56:32 -08:00
Linus Torvalds
fa5fd7c628 arm64 updates for 4.5:
- Support for a separate IRQ stack, although we haven't reduced the size
   of our thread stack just yet since we don't have enough data to
   determine a safe value
 
 - Refactoring of our EFI initialisation and runtime code into
   drivers/firmware/efi/ so that it can be reused by arch/arm/.
 
 - Ftrace improvements when unwinding in the function graph tracer
 
 - Document our silicon errata handling process
 
 - Cache flushing optimisation when mapping executable pages
 
 - Support for hugetlb mappings using the contiguous hint in the pte
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJWj+pFAAoJELescNyEwWM0/V8IALu8i2d6LijVICyZ/MH6pK+F
 krbkIjdKFmIoFqo8HolCDMDqWfdzCLW671iYmks1DYVqM0Q5SXRa1rIzMw1Nbd3s
 PzHS8qvnJFGtjXgwX5yxcyA5nU5hG5/mHJ8tbEg4zlQXvGONU6rZOlt4xY3ocZR7
 iWmqoNX8LbPv5UgpifQ06QXEiC+4pm/BgADl2995oZfOaZ37L6c0oh6VcxQWyEf8
 7OFRYtwruNyX2S5zJkL41Rh8gFAL9/j7lrHt2D+cxHR58X+qiRYKTjxkwJUt6i3E
 ROZROsdQpyHojIIIYZEfNCZWjV0NwSghQfCnbsDwxVkkVeY414UXIno8JV4MyCk=
 =JHvb
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "Here is the core arm64 queue for 4.5.  As you might expect, the
  Christmas break resulted in a number of patches not making the final
  cut, so 4.6 is likely to be larger than usual.  There's still some
  useful stuff here, however, and it's detailed below.

  The EFI changes have been Reviewed-by Matt and the memblock change got
  an "OK" from akpm.

  Summary:

   - Support for a separate IRQ stack, although we haven't reduced the
     size of our thread stack just yet since we don't have enough data
     to determine a safe value

   - Refactoring of our EFI initialisation and runtime code into
     drivers/firmware/efi/ so that it can be reused by arch/arm/.

   - Ftrace improvements when unwinding in the function graph tracer

   - Document our silicon errata handling process

   - Cache flushing optimisation when mapping executable pages

   - Support for hugetlb mappings using the contiguous hint in the pte"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (45 commits)
  arm64: head.S: use memset to clear BSS
  efi: stub: define DISABLE_BRANCH_PROFILING for all architectures
  arm64: entry: remove pointless SPSR mode check
  arm64: mm: move pgd_cache initialisation to pgtable_cache_init
  arm64: module: avoid undefined shift behavior in reloc_data()
  arm64: module: fix relocation of movz instruction with negative immediate
  arm64: traps: address fallout from printk -> pr_* conversion
  arm64: ftrace: fix a stack tracer's output under function graph tracer
  arm64: pass a task parameter to unwind_frame()
  arm64: ftrace: modify a stack frame in a safe way
  arm64: remove irq_count and do_softirq_own_stack()
  arm64: hugetlb: add support for PTE contiguous bit
  arm64: Use PoU cache instr for I/D coherency
  arm64: Defer dcache flush in __cpu_copy_user_page
  arm64: reduce stack use in irq_handler
  arm64: mm: ensure that the zero page is visible to the page table walker
  arm64: Documentation: add list of software workarounds for errata
  arm64: mm: place __cpu_setup in .text
  arm64: cmpxchg: Don't incldue linux/mmdebug.h
  arm64: mm: fold alternatives into .init
  ...
2016-01-12 12:23:33 -08:00
Will Deacon
39b5be9b42 arm64: mm: move pgd_cache initialisation to pgtable_cache_init
Initialising the suppport for EFI runtime services requires us to
allocate a pgd off the back of an early_initcall. On systems where the
PGD_SIZE is smaller than PAGE_SIZE (e.g. 64k pages and 48-bit VA), the
pgd_cache isn't initialised at this stage, and we panic with a NULL
dereference during boot:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000

  __create_mapping.isra.5+0x84/0x350
  create_pgd_mapping+0x20/0x28
  efi_create_mapping+0x5c/0x6c
  arm_enable_runtime_services+0x154/0x1e4
  do_one_initcall+0x8c/0x190
  kernel_init_freeable+0x84/0x1ec
  kernel_init+0x10/0xe0
  ret_from_fork+0x10/0x50

This patch fixes the problem by initialising the pgd_cache earlier, in
the pgtable_cache_init callback, which sounds suspiciously like what it
was intended for.

Reported-by: Dennis Chen <dennis.chen@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-01-05 15:43:10 +00:00
David Woods
66b3923a1a arm64: hugetlb: add support for PTE contiguous bit
The arm64 MMU supports a Contiguous bit which is a hint that the TTE
is one of a set of contiguous entries which can be cached in a single
TLB entry.  Supporting this bit adds new intermediate huge page sizes.

The set of huge page sizes available depends on the base page size.
Without using contiguous pages the huge page sizes are as follows.

 4KB:   2MB  1GB
64KB: 512MB

With a 4KB granule, the contiguous bit groups together sets of 16 pages
and with a 64KB granule it groups sets of 32 pages.  This enables two new
huge page sizes in each case, so that the full set of available sizes
is as follows.

 4KB:  64KB   2MB  32MB  1GB
64KB:   2MB 512MB  16GB

If a 16KB granule is used then the contiguous bit groups 128 pages
at the PTE level and 32 pages at the PMD level.

If the base page size is set to 64KB then 2MB pages are enabled by
default.  It is possible in the future to make 2MB the default huge
page size for both 4KB and 64KB granules.

Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com>
Reviewed-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: David Woods <dwoods@ezchip.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-21 17:26:00 +00:00
Catalin Marinas
82d340081b arm64: Improve error reporting on set_pte_at() checks
Currently the BUG_ON() checks do not give enough information about the
PTEs being set. This patch changes BUG_ON to WARN_ONCE and dumps the
values of the old and new PTEs. In addition, the checks are only made if
the new PTE entry is valid.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Ming Lei <tom.leiming@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
2015-12-11 15:44:24 +00:00
Will Deacon
76c714be0e arm64: pgtable: implement pte_accessible()
This patch implements the pte_accessible() macro, which can be used to
test whether or not a given pte is a candidate for allocation in the
TLB.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-12-01 09:47:03 +00:00
Laura Abbott
0b2aa5b80b arm64: Fix R/O permissions in mark_rodata_ro
The permissions in mark_rodata_ro trigger a build error
with STRICT_MM_TYPECHECKS. Fix this by introducing
PAGE_KERNEL_ROX for the same reasons as PAGE_KERNEL_RO.
From Ard:

"PAGE_KERNEL_EXEC has PTE_WRITE set as well, making the range
writeable under the ARMv8.1 DBM feature, that manages the
dirty bit in hardware (writing to a page with the PTE_RDONLY
and PTE_WRITE bits both set will clear the PTE_RDONLY bit in that case)"

Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-11-18 12:11:36 +00:00
Linus Torvalds
a18e2fa5e6 arm64 fixes and clean-ups:
- __cmpxchg_double*() return type fix to avoid truncation of a long to
   int and subsequent logical "not" in cmpxchg_double() misinterpreting
   the operation success/failure
 - BPF fixes for mod and div by zero
 - Fix compilation with STRICT_MM_TYPECHECKS enabled
 - VDSO build fix without libgcov
 - Some static and __maybe_unused annotations
 - Kconfig clean-up (FRAME_POINTER)
 - defconfig update for CRYPTO_CRC32_ARM64
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWRNNbAAoJEGvWsS0AyF7xp+wQAIc0A+uSReEJ0Be3kSWZIy0O
 9wGCtfp2e3X78ibgVoP/+KvA1JUrMJNwNH54CgGgG6H4rwjRthCvIV/HbKfYufM8
 vfuTL2MV1ywkNO0uTzspsICqgKPcpG27SwAlgOcxNXpO0Kui2OlKSxS4kTA8+6Z5
 Lm64qDmFG7Z6wcBHhr8JSngC+xvXOvlcUW8odnjXjyCimwnpCFXXnRWDU3RnXJZa
 3Khgp8OiRtnCSLfj7YBQA9wfNNgPgKdJ5wevz2g7hiIbYx0IOHmDpzbb3sUNMMKV
 XLKeeJgqZL4EXZBCzapHRHCE/q0kiiBhzYSHw6aOBwjD9v683aytT/ax2/AgjzvW
 nB3ZPdrbRMjcmNRBT2bheoU8diilhtfxSxf+4T+pVUnVMXDNl/xY9hekGA0hFO1z
 nH5P5vkFKsX3U02Ox/G50Od2rM6p7uGRGFYuomSIoJYBItuxGOAuYWlY2+ujcxY5
 YvAQ+3FYCkjLipVutlqLxKoZSY8Ex+0LOjPYYsI/+rsE70IVjGuLj0bTm8B/aTcy
 dOctNqvOGwo8O5n2jsKM3XkjfUCPRdzu1C7rQz2BqfE9cPAZxg2fQpPv4SGtPuFe
 lEvokuYRJ3qYnMt5MG/9Mkqmczfbch88A41wgS9/ySQ57eo3wISLkOiKqzKdJjOa
 0qldWaEvST2iVUQmiMl7
 =ApkD
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes and clean-ups from Catalin Marinas:
 "Here's a second pull request for this merging window with some
  fixes/clean-ups:

   - __cmpxchg_double*() return type fix to avoid truncation of a long
     to int and subsequent logical "not" in cmpxchg_double()
     misinterpreting the operation success/failure

   - BPF fixes for mod and div by zero

   - Fix compilation with STRICT_MM_TYPECHECKS enabled

   - VDSO build fix without libgcov

   - Some static and __maybe_unused annotations

   - Kconfig clean-up (FRAME_POINTER)

   - defconfig update for CRYPTO_CRC32_ARM64"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: suspend: make hw_breakpoint_restore static
  arm64: mmu: make split_pud and fixup_executable static
  arm64: smp: make of_parse_and_init_cpus static
  arm64: use linux/types.h in kvm.h
  arm64: build vdso without libgcov
  arm64: mark cpus_have_hwcap as __maybe_unused
  arm64: remove redundant FRAME_POINTER kconfig option and force to select it
  arm64: fix R/O permissions of FDT mapping
  arm64: fix STRICT_MM_TYPECHECKS issue in PTE_CONT manipulation
  arm64: bpf: fix mod-by-zero case
  arm64: bpf: fix div-by-zero case
  arm64: Enable CRYPTO_CRC32_ARM64 in defconfig
  arm64: cmpxchg_dbl: fix return value type
2015-11-12 15:33:11 -08:00
Ard Biesheuvel
fb226c3d7c arm64: fix R/O permissions of FDT mapping
The mapping permissions of the FDT are set to 'PAGE_KERNEL | PTE_RDONLY'
in an attempt to map the FDT as read-only. However, not only does this
break at build time under STRICT_MM_TYPECHECKS (since the two terms are
of different types in that case), it also results in both the PTE_WRITE
and PTE_RDONLY attributes to be set, which means the region is still
writable under ARMv8.1 DBM (and an attempted write will simply clear the
PT_RDONLY bit).

So instead, define PAGE_KERNEL_RO (which already has an established
meaning across architectures) and use that instead.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-11-09 14:26:36 +00:00
Linus Torvalds
2dc10ad81f arm64 updates for 4.4:
- "genirq: Introduce generic irq migration for cpu hotunplugged" patch
   merged from tip/irq/for-arm to allow the arm64-specific part to be
   upstreamed via the arm64 tree
 
 - CPU feature detection reworked to cope with heterogeneous systems
   where CPUs may not have exactly the same features. The features
   reported by the kernel via internal data structures or ELF_HWCAP are
   delayed until all the CPUs are up (and before user space starts)
 
 - Support for 16KB pages, with the additional bonus of a 36-bit VA
   space, though the latter only depending on EXPERT
 
 - Implement native {relaxed, acquire, release} atomics for arm64
 
 - New ASID allocation algorithm which avoids IPI on roll-over, together
   with TLB invalidation optimisations (using local vs global where
   feasible)
 
 - KASan support for arm64
 
 - EFI_STUB clean-up and isolation for the kernel proper (required by
   KASan)
 
 - copy_{to,from,in}_user optimisations (sharing the memcpy template)
 
 - perf: moving arm64 to the arm32/64 shared PMU framework
 
 - L1_CACHE_BYTES increased to 128 to accommodate Cavium hardware
 
 - Support for the contiguous PTE hint on kernel mapping (16 consecutive
   entries may be able to use a single TLB entry)
 
 - Generic CONFIG_HZ now used on arm64
 
 - defconfig updates
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWOkmIAAoJEGvWsS0AyF7x4GgQAINU3NePjFFvWZNCkqobeH9+
 jFKwtXamIudhTSdnXNXyYWmtRL9Krg3qI4zDQf68dvDFAZAze2kVuOi1yPpCbpFZ
 /j/afNyQc7+PoyqRAzmT+EMPZlcuOA84Prrl1r3QWZ58QaFeVk/6ZxrHunTHxN0x
 mR9PIXfWx73MTo+UnG8FChkmEY6LmV4XpemgTaMR9FqFhdT51OZSxDDAYXOTm4JW
 a5HdN9OWjjJ2rhLlFEaC7tszG9B5doHdy2tr5ge/YERVJzIPDogHkMe8ZhfAJc+x
 SQU5tKN6Pg4MOi+dLhxlk0/mKCvHLiEQ5KVREJnt8GxupAR54Bat+DQ+rP9cSnpq
 dRQTcARIOyy9LGgy+ROAsSo+NiyM5WuJ0/WJUYKmgWTJOfczRYoZv6TMKlwNOUYb
 tGLCZHhKPM3yBHJlWbQykl3xmSuudxCMmjlZzg7B+MVfTP6uo0CRSPmYl+v67q+J
 bBw/Z2RYXWYGnvlc6OfbMeImI6prXeE36+5ytyJFga0m+IqcTzRGzjcLxKEvdbiU
 pr8n9i+hV9iSsT/UwukXZ8ay6zH7PrTLzILWQlieutfXlvha7MYeGxnkbLmdYcfe
 GCj374io5cdImHcVKmfhnOMlFOLuOHphl9cmsd/O2LmCIqBj9BIeNH2Om8mHVK2F
 YHczMdpESlJApE7kUc1e
 =3six
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - "genirq: Introduce generic irq migration for cpu hotunplugged" patch
   merged from tip/irq/for-arm to allow the arm64-specific part to be
   upstreamed via the arm64 tree

 - CPU feature detection reworked to cope with heterogeneous systems
   where CPUs may not have exactly the same features.  The features
   reported by the kernel via internal data structures or ELF_HWCAP are
   delayed until all the CPUs are up (and before user space starts)

 - Support for 16KB pages, with the additional bonus of a 36-bit VA
   space, though the latter only depending on EXPERT

 - Implement native {relaxed, acquire, release} atomics for arm64

 - New ASID allocation algorithm which avoids IPI on roll-over, together
   with TLB invalidation optimisations (using local vs global where
   feasible)

 - KASan support for arm64

 - EFI_STUB clean-up and isolation for the kernel proper (required by
   KASan)

 - copy_{to,from,in}_user optimisations (sharing the memcpy template)

 - perf: moving arm64 to the arm32/64 shared PMU framework

 - L1_CACHE_BYTES increased to 128 to accommodate Cavium hardware

 - Support for the contiguous PTE hint on kernel mapping (16 consecutive
   entries may be able to use a single TLB entry)

 - Generic CONFIG_HZ now used on arm64

 - defconfig updates

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (91 commits)
  arm64/efi: fix libstub build under CONFIG_MODVERSIONS
  ARM64: Enable multi-core scheduler support by default
  arm64/efi: move arm64 specific stub C code to libstub
  arm64: page-align sections for DEBUG_RODATA
  arm64: Fix build with CONFIG_ZONE_DMA=n
  arm64: Fix compat register mappings
  arm64: Increase the max granular size
  arm64: remove bogus TASK_SIZE_64 check
  arm64: make Timer Interrupt Frequency selectable
  arm64/mm: use PAGE_ALIGNED instead of IS_ALIGNED
  arm64: cachetype: fix definitions of ICACHEF_* flags
  arm64: cpufeature: declare enable_cpu_capabilities as static
  genirq: Make the cpuhotplug migration code less noisy
  arm64: Constify hwcap name string arrays
  arm64/kvm: Make use of the system wide safe values
  arm64/debug: Make use of the system wide safe value
  arm64: Move FP/ASIMD hwcap handling to common code
  arm64/HWCAP: Use system wide safe values
  arm64/capabilities: Make use of system wide safe value
  arm64: Delay cpu feature capability checks
  ...
2015-11-04 14:47:13 -08:00
Catalin Marinas
7db743c6d8 arm64: Minor coding style fixes for kc_offset_to_vaddr and kc_vaddr_to_offset
These were introduced by commit 03875ad52f (arm64: add
kc_offset_to_vaddr and kc_vaddr_to_offset macro).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-16 14:34:53 +01:00
Ingo Molnar
c7d77a7980 Merge branch 'x86/urgent' into core/efi, to pick up a pending EFI fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-14 16:05:18 +02:00
yalin wang
03875ad52f arm64: add kc_offset_to_vaddr and kc_vaddr_to_offset macro
This patch add kc_offset_to_vaddr() and kc_vaddr_to_offset(),
the default version doesn't work on arm64, because arm64 kernel address
is below the PAGE_OFFSET, like module address and vmemmap address are
all below PAGE_OFFSET address.

Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-13 15:15:24 +01:00
Andrey Ryabinin
39d114ddc6 arm64: add KASAN support
This patch adds arch specific code for kernel address sanitizer
(see Documentation/kasan.txt).

1/8 of kernel addresses reserved for shadow memory. There was no
big enough hole for this, so virtual addresses for shadow were
stolen from vmalloc area.

At early boot stage the whole shadow region populated with just
one physical page (kasan_zero_page). Later, this page reused
as readonly zero shadow for some memory that KASan currently
don't track (vmalloc).
After mapping the physical memory, pages for shadow memory are
allocated and mapped.

Functions like memset/memmove/memcpy do a lot of memory accesses.
If bad pointer passed to one of these function it is important
to catch this. Compiler's instrumentation cannot do this since
these functions are written in assembly.
KASan replaces memory functions with manually instrumented variants.
Original functions declared as weak symbols so strong definitions
in mm/kasan/kasan.c could replace them. Original functions have aliases
with '__' prefix in name, so we could call non-instrumented variant
if needed.
Some files built without kasan instrumentation (e.g. mm/slub.c).
Original mem* function replaced (via #define) with prefixed variants
to disable memory access checks for such files.

Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-12 17:46:36 +01:00
Jeremy Linton
06f90d2527 arm64: Default kernel pages should be contiguous
The default page attributes for a PMD being broken should have the CONT bit
set. Create a new definition for an early boot range of PTE's that are
contiguous.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-08 18:39:46 +01:00
Jeremy Linton
93ef666a09 arm64: Macros to check/set/unset the contiguous bit
Add the supporting macros to check if the contiguous bit
is set, set the bit, or clear it in a PTE entry.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-08 18:39:19 +01:00
Will Deacon
120798d2e7 arm64: mm: remove dsb from update_mmu_cache
update_mmu_cache() consists of a dsb(ishst) instruction so that new user
mappings are guaranteed to be visible to the page table walker on
exception return.

In reality this can be a very expensive operation which is rarely needed.
Removing this barrier shows a modest improvement in hackbench scores and
, in the worst case, we re-take the user fault and establish that there
was nothing to do.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07 11:56:37 +01:00
Andrey Ryabinin
127db024a7 arm64: introduce VA_START macro - the first kernel virtual address.
In order to not use lengthy (UL(0xffffffffffffffff) << VA_BITS) everywhere,
replace it with VA_START.

Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07 11:41:32 +01:00
Steve Capper
1a541b4e3c arm64: Fix THP protection change logic
6910fa1 ("arm64: enable PTE type bit in the mask for pte_modify") fixes
a problem whereby a large block of PROT_NONE mapped memory is
incorrectly mapped as block descriptors when mprotect is called.

Unfortunately, a subtle bug was introduced by this fix to the THP logic.

If one mmaps a large block of memory, then faults it such that it is
collapsed into THPs; resulting calls to mprotect on this area of memory
will lead to incorrect table descriptors being written instead of block
descriptors. This is because pmd_modify calls pte_modify which is now
allowed to modify the type of the page table entry.

This patch reverts commit 6910fa16db, and
fixes the problem it was trying to address by adjusting PAGE_NONE to
represent a table entry. Thus no change in pte type is required when
moving from PROT_NONE to a different protection.

Fixes: 6910fa16db ("arm64: enable PTE type bit in the mask for pte_modify")
Cc: <stable@vger.kernel.org> # 4.0+
Cc: Feng Kan <fkan@apm.com>
Reported-by: Ganapatrao Kulkarni <Ganapatrao.Kulkarni@caviumnetworks.com>
Tested-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-01 18:02:21 +01:00
Will Deacon
bf950040a5 arm64: pgtable: use a single bit for PTE_WRITE regardless of DBM
Depending on CONFIG_ARM64_HW_AFDBM, we use either bit 57 or 51 of the
pte to represent PTE_WRITE. Given that bit 51 is reserved prior to
ARMv8.1, we can just use that bit regardless of the config option. That
also matches what happens if a kernel configured with ARM64_HW_AFDBM=y
is run on a CPU without the DBM functionality.

Cc: Julien Grall <julien.grall@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-09-14 12:28:45 +01:00
Catalin Marinas
62d96c71d2 arm64: Fix pte_modify() to preserve the hardware dirty information
The pte_modify() function with hardware AF/DBM enabled must transfer the
hardware dirty information to the software PTE_DIRTY bit. However, it
was setting this bit in newprot and the mask does not cover such bit.
This patch sets PTE_DIRTY on the original pte which will be preserved in
the returned value.

Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Cc: Julien Grall <julien.grall@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-09-14 12:28:41 +01:00
Catalin Marinas
b847415ce9 arm64: Fix the pte_hw_dirty() check when AF/DBM is enabled
Commit 2f4b829c62 ("arm64: Add support for hardware updates of the
access and dirty pte bits") introduced support for handling hardware
updates of the access flag and dirty status. The PTE is automatically
dirtied in hardware (if supported) by clearing the PTE_RDONLY bit when
the PTE_DBM/PTE_WRITE bit is set. The pte_hw_dirty() macro was added to
detect a hardware dirtied pte. The pte_dirty() macro checks for both
software PTE_DIRTY and pte_hw_dirty().

Functions like pte_modify() clear the PTE_RDONLY bit since it is meant
to be set in set_pte_at() when written to memory. In such cases,
pte_hw_dirty() would return true even though such pte is clean. This
patch changes pte_hw_dirty() to test the PTE_DBM/PTE_WRITE bit together
with PTE_RDONLY.

Fixes: 2f4b829c62 ("arm64: Add support for hardware updates of the access and dirty pte bits")
Reported-by: Julien Grall <julien.grall@citrix.com>
Tested-by: Julien Grall <julien.grall@citrix.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-09-14 12:28:31 +01:00
Jonathan (Zhixiong) Zhang
8d446c8647 arm64/mm: Add PROT_DEVICE_nGnRnE and PROT_NORMAL_WT
UEFI spec 2.5 section 2.3.6.1 defines that
EFI_MEMORY_[UC|WC|WT|WB] are possible EFI memory types for
AArch64.

Each of those EFI memory types is mapped to a corresponding
AArch64 memory type. So we need to define PROT_DEVICE_nGnRnE
and PROT_NORMWL_WT additionaly.

MT_NORMAL_WT is defined, and its encoding is added to MAIR_EL1
when initializing the CPU.

Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1438936621-5215-6-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-08-08 10:37:40 +02:00
Will Deacon
766ffb6980 arm64: pgtable: fix definition of pte_valid
pte_valid should check if the PTE_VALID bit (1 << 0) is set in the pte,
so fix the macro definition to use bitwise & instead of logical &&.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-28 16:14:03 +01:00
Will Deacon
4b3dc9679c arm64: force CONFIG_SMP=y and remove redundant #ifdefs
Nobody seems to be producing !SMP systems anymore, so this is just
becoming a source of kernel bugs, particularly if people want to use
coherent DMA with non-shared pages.

This patch forces CONFIG_SMP=y for arm64, removing a modest amount of
code in the process.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27 11:08:40 +01:00
Catalin Marinas
2f4b829c62 arm64: Add support for hardware updates of the access and dirty pte bits
The ARMv8.1 architecture extensions introduce support for hardware
updates of the access and dirty information in page table entries. With
TCR_EL1.HA enabled, when the CPU accesses an address with the PTE_AF bit
cleared in the page table, instead of raising an access flag fault the
CPU sets the actual page table entry bit. To ensure that kernel
modifications to the page tables do not inadvertently revert a change
introduced by hardware updates, the exclusive monitor (ldxr/stxr) is
adopted in the pte accessors.

When TCR_EL1.HD is enabled, a write access to a memory location with the
DBM (Dirty Bit Management) bit set in the corresponding pte
automatically clears the read-only bit (AP[2]). Such DBM bit maps onto
the Linux PTE_WRITE bit and to check whether a writable (DBM set) page
is dirty, the kernel tests the PTE_RDONLY bit. In order to allow
read-only and dirty pages, the kernel needs to preserve the software
dirty bit. The hardware dirty status is transferred to the software
dirty bit in ptep_set_wrprotect() (using load/store exclusive loop) and
pte_modify().

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27 11:08:39 +01:00
Will Deacon
cba3574fd5 arm64: move update_mmu_cache() into asm/pgtable.h
Mark Brown reported an allnoconfig build failure in -next:

  Today's linux-next fails to build an arm64 allnoconfig due to "mm:
  make GUP handle pfn mapping unless FOLL_GET is requested" which
  causes:

  >       arm64-allnoconfig
  > ../mm/gup.c:51:4: error: implicit declaration of function
    'update_mmu_cache' [-Werror=implicit-function-declaration]

Fix the error by moving the function to asm/pgtable.h, as is the case
for most other architectures.

Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27 11:08:39 +01:00
Kirill A. Shutemov
9f25e6ad58 arm64: expose number of page table levels on Kconfig level
We would want to use number of page table level to define mm_struct.
Let's expose it as CONFIG_PGTABLE_LEVELS.

ARM64_PGTABLE_LEVELS is renamed to PGTABLE_LEVELS and defined before
sourcing init/Kconfig: arch/Kconfig will define default value and it's
sourced from init/Kconfig.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-14 16:49:01 -07:00
Feng Kan
6910fa16db arm64: enable PTE type bit in the mask for pte_modify
Caught during Trinity testing. The pte_modify does not allow
modification for PTE type bit. This cause the test to hang
the system. It is found that the PTE can't transit from an
inaccessible page (b00) to a valid page (b11) because the mask
does not allow it. This happens when a big block of mmaped
memory is set the PROT_NONE, then the a small piece is broken
off and set to PROT_WRITE | PROT_READ cause a huge page split.

Signed-off-by: Feng Kan <fkan@apm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-02-26 18:30:12 +00:00
Linus Torvalds
59d53737a8 Merge branch 'akpm' (patches from Andrew)
Merge second set of updates from Andrew Morton:
 "More of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits)
  mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
  mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
  vmstat: Reduce time interval to stat update on idle cpu
  mm/page_owner.c: remove unnecessary stack_trace field
  Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files
  mm: incorporate read-only pages into transparent huge pages
  vmstat: do not use deferrable delayed work for vmstat_update
  mm: more aggressive page stealing for UNMOVABLE allocations
  mm: always steal split buddies in fallback allocations
  mm: when stealing freepages, also take pages created by splitting buddy page
  mincore: apply page table walker on do_mincore()
  mm: /proc/pid/clear_refs: avoid split_huge_page()
  mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
  mempolicy: apply page table walker on queue_pages_range()
  arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
  memcg: cleanup preparation for page table walk
  numa_maps: remove numa_maps->vma
  numa_maps: fix typo in gather_hugetbl_stats
  pagemap: use walk->vma instead of calling find_vma()
  clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
  ...
2015-02-11 18:23:28 -08:00
Linus Torvalds
6b00f7efb5 arm64 updates for 3.20:
- reimplementation of the virtual remapping of UEFI Runtime Services in
   a way that is stable across kexec
 - emulation of the "setend" instruction for 32-bit tasks (user
   endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set
   accordingly)
 - compat_sys_call_table implemented in C (from asm) and made it a
   constant array together with sys_call_table
 - export CPU cache information via /sys (like other architectures)
 - DMA API implementation clean-up in preparation for IOMMU support
 - macros clean-up for KVM
 - dropped some unnecessary cache+tlb maintenance
 - CONFIG_ARM64_CPU_SUSPEND clean-up
 - defconfig update (CPU_IDLE)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJU25v3AAoJEGvWsS0AyF7xYjcP/j8ESvs+z0BPgeJ6XREfOnCh
 cp+w/1rJ5BafJ5RRkibrciwTNOIJS4FGMivWyURtoh430lS0Rh7fxZ3Ouna3xjrT
 Nf7AxenWoA8Lo6wHh+FlNUeGk3iWfX6WwA2tYrbKudK+LBJ1wHjwpE7cWQO0FgwJ
 aFDahu+QD5/u45p/VcVctMtiEDvOxBdO8gfat6r+YkLm7pbRxQkZnpA/JE4Gps1p
 Td5jvMNH9pXI5pffSbeR9Q+vs/r0yqKLXQg01Eb2bZgGDgwf9yzADrHuaKamZt35
 X5flmLiTGC6swJCJvUkZC1Nuue33bXcvW5+vgvar+MNGyXsxv+B/wARLqGhiWhQZ
 nLGwFpuNu6wdY9tGHb/XR8khcewkw1/lRH1hHKhchrmRyUqHvXcPgC5tamjLrY8C
 BV3BAeQvRho8OKwWUmbXIlyON1vPux6CJdj4D/A5NL+qph2WHeVWJCXg6nVFx0Wc
 Eb3bXbI4QRwTFL7pGRF8RyZJBAQtgYhQMKWMW2GHgUgn+r1EixG73BZoSwvpHrrw
 FOR9AVNfVBqmNON8xiIb3DN4EViq76EF0jrsZh5I9EoWS2w5qtk60kJQgXE+M4EE
 vOlmh3dhEVfCN2SxOn0bgoQmTulyjqGauTSSJKQbIBuinPFveukrJfGNFIWt0SZs
 f38FBMo6sgU4VG85B+Fr
 =X5x/
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "arm64 updates for 3.20:

   - reimplementation of the virtual remapping of UEFI Runtime Services
     in a way that is stable across kexec
   - emulation of the "setend" instruction for 32-bit tasks (user
     endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set
     accordingly)
   - compat_sys_call_table implemented in C (from asm) and made it a
     constant array together with sys_call_table
   - export CPU cache information via /sys (like other architectures)
   - DMA API implementation clean-up in preparation for IOMMU support
   - macros clean-up for KVM
   - dropped some unnecessary cache+tlb maintenance
   - CONFIG_ARM64_CPU_SUSPEND clean-up
   - defconfig update (CPU_IDLE)

  The EFI changes going via the arm64 tree have been acked by Matt
  Fleming.  There is also a patch adding sys_*stat64 prototypes to
  include/linux/syscalls.h, acked by Andrew Morton"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (47 commits)
  arm64: compat: Remove incorrect comment in compat_siginfo
  arm64: Fix section mismatch on alloc_init_p[mu]d()
  arm64: Avoid breakage caused by .altmacro in fpsimd save/restore macros
  arm64: mm: use *_sect to check for section maps
  arm64: drop unnecessary cache+tlb maintenance
  arm64:mm: free the useless initial page table
  arm64: Enable CPU_IDLE in defconfig
  arm64: kernel: remove ARM64_CPU_SUSPEND config option
  arm64: make sys_call_table const
  arm64: Remove asm/syscalls.h
  arm64: Implement the compat_sys_call_table in C
  syscalls: Declare sys_*stat64 prototypes if __ARCH_WANT_(COMPAT_)STAT64
  compat: Declare compat_sys_sigpending and compat_sys_sigprocmask prototypes
  arm64: uapi: expose our struct ucontext to the uapi headers
  smp, ARM64: Kill SMP single function call interrupt
  arm64: Emulate SETEND for AArch32 tasks
  arm64: Consolidate hotplug notifier for instruction emulation
  arm64: Track system support for mixed endian EL0
  arm64: implement generic IOMMU configuration
  arm64: Combine coherent and non-coherent swiotlb dma_ops
  ...
2015-02-11 18:03:54 -08:00
Kirill A. Shutemov
d016bf7ece mm: make FIRST_USER_ADDRESS unsigned long on all archs
LKP has triggered a compiler warning after my recent patch "mm: account
pmd page tables to the process":

    mm/mmap.c: In function 'exit_mmap':
 >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]

The code:

 > 2857                WARN_ON(mm_nr_pmds(mm) >
   2858                                round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);

In this, on tile, we have FIRST_USER_ADDRESS defined as 0.  round_up() has
the same type -- int.  PUD_SHIFT.

I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
long.  On every arch for consistency.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-11 17:06:03 -08:00
Kirill A. Shutemov
9b3e661e58 arm64: drop PTE_FILE and pte_file()-related helpers
We've replaced remap_file_pages(2) implementation with emulation.  Nobody
creates non-linear mapping anymore.

This patch also adjust __SWP_TYPE_SHIFT and increase number of bits
availble for swap offset.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-10 14:30:31 -08:00
zhichang.yuan
523d6e9fae arm64:mm: free the useless initial page table
For 64K page system, after mapping a PMD section, the corresponding initial
page table is not needed any more. That page can be freed.

Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org>
[catalin.marinas@arm.com: added BUG_ON() to catch late memblock freeing]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-01-28 12:07:28 +00:00
Ard Biesheuvel
8ce837cee8 arm64/mm: add create_pgd_mapping() to create private page tables
For UEFI, we need to install the memory mappings used for Runtime Services
in a dedicated set of page tables. Add create_pgd_mapping(), which allows
us to allocate and install those page table entries early.

Reviewed-by: Will Deacon <will.deacon@arm.com>
Tested-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2015-01-12 08:16:52 +00:00
Jungseok Lee
5d96e0cba2 arm64: mm: Add pgd_page to support RCU fast_gup
This patch adds pgd_page definition in order to keep supporting
HAVE_GENERIC_RCU_GUP configuration. In addition, it changes pud_page
expression to align with pmd_page for readability.

An introduction of pgd_page resolves the following build breakage
under 4KB + 4Level memory management combo.

mm/gup.c: In function 'gup_huge_pgd':
mm/gup.c:889:2: error: implicit declaration of function 'pgd_page' [-Werror=implicit-function-declaration]
  head = pgd_page(orig);
  ^
mm/gup.c:889:7: warning: assignment makes pointer from integer without a cast
  head = pgd_page(orig);

Cc: Will Deacon <will.deacon@arm.com>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Jungseok Lee <jungseoklee85@gmail.com>
[catalin.marinas@arm.com: remove duplicate pmd_page definition]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-12-23 16:39:17 +00:00