Commit Graph

797184 Commits

Author SHA1 Message Date
Mark Rutland
ac0e8c72b0 arm64: string: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.

As a step towards removing arm64ksyms.c, let's move the string routine
exports to the assembly files the functions are defined in. Routines
which should only be exported for !KASAN builds are exported using the
EXPORT_SYMBOL_NOKASAN() helper.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:12 +00:00
Mark Rutland
56c08ec516 arm64: uaccess: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.

As a step towards removing arm64ksyms.c, let's move the uaccess exports
to the assembly files the functions are defined in.  As we have to
include <asm/assembler.h>, the existing includes are fixed to follow the
usual ordering conventions.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:11 +00:00
Mark Rutland
50fdecb292 arm64: page: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.

As a step towards removing arm64ksyms.c, let's move the copy_page and
clear_page exports to the assembly files the functions are defined in.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:11 +00:00
Mark Rutland
23fe04c0c5 arm64: smccc: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.

As a step towards removing arm64ksyms.c, let's move the SMCCC exports to
the assembly file the functions are defined in.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:11 +00:00
Mark Rutland
abb77f3d96 arm64: tishift: use asm EXPORT_SYMBOL()
For a while now it's been possible to use EXPORT_SYMBOL() in assembly
files, which allows us to place exports immediately after assembly
functions, as we do for C functions.

As a step towards removing arm64ksyms.c, let's move the tishift exports
to the assembly file the functions are defined in.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:11 +00:00
Mark Rutland
386b3c7bda arm64: add EXPORT_SYMBOL_NOKASAN()
So that we can export symbols directly from assembly files, let's make
use of the generic <asm/export.h>. We have a few symbols that we'll want
to conditionally export for !KASAN kernel builds, so we add a helper for
that in <asm/assembler.h>.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:11 +00:00
Mark Rutland
03ef055fd3 arm64: move memstart_addr export inline
Since we define memstart_addr in a C file, we can have the export
immediately after the definition of the symbol, as we do elsewhere.

As a step towards removing arm64ksyms.c, move the export of
memstart_addr to init.c, where the symbol is defined.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:11 +00:00
Mark Rutland
2d7c89b02c arm64: remove bitop exports
Now that the arm64 bitops are inlines built atop of the regular atomics,
we don't need to export anything.

Remove the redundant exports.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-10 11:50:11 +00:00
Will Deacon
4230509978 arm64: cmpxchg: Use "K" instead of "L" for ll/sc immediate constraint
The "L" AArch64 machine constraint, which we use for the "old" value in
an LL/SC cmpxchg(), generates an immediate that is suitable for a 64-bit
logical instruction. However, for cmpxchg() operations on types smaller
than 64 bits, this constraint can result in an invalid instruction which
is correctly rejected by GAS, such as EOR W1, W1, #0xffffffff.

Whilst we could special-case the constraint based on the cmpxchg size,
it's far easier to change the constraint to "K" and put up with using
a register for large 64-bit immediates. For out-of-line LL/SC atomics,
this is all moot anyway.

Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-07 17:28:13 +00:00
Will Deacon
959bf2fd03 arm64: percpu: Rewrite per-cpu ops to allow use of LSE atomics
Our percpu code is a bit of an inconsistent mess:

  * It rolls its own xchg(), but reuses cmpxchg_local()
  * It uses various different flavours of preempt_{enable,disable}()
  * It returns values even for the non-returning RmW operations
  * It makes no use of LSE atomics outside of the cmpxchg() ops
  * There are individual macros for different sizes of access, but these
    are all funneled through a switch statement rather than dispatched
    directly to the relevant case

This patch rewrites the per-cpu operations to address these shortcomings.
Whilst the new code is a lot cleaner, the big advantage is that we can
use the non-returning ST- atomic instructions when we have LSE.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-07 17:28:06 +00:00
Will Deacon
b4f9209bfc arm64: Avoid masking "old" for LSE cmpxchg() implementation
The CAS instructions implicitly access only the relevant bits of the "old"
argument, so there is no need for explicit masking via type-casting as
there is in the LL/SC implementation.

Move the casting into the LL/SC code and remove it altogether for the LSE
implementation.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-07 17:28:01 +00:00
Will Deacon
5ef3fe4cec arm64: Avoid redundant type conversions in xchg() and cmpxchg()
Our atomic instructions (either LSE atomics of LDXR/STXR sequences)
natively support byte, half-word, word and double-word memory accesses
so there is no need to mask the data register prior to being stored.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-07 17:27:55 +00:00
Will Deacon
3962446922 arm64: preempt: Provide our own implementation of asm/preempt.h
The asm-generic/preempt.h implementation doesn't make use of the
PREEMPT_NEED_RESCHED flag, since this can interact badly with load/store
architectures which rely on the preempt_count word being unchanged across
an interrupt.

However, since we're a 64-bit architecture and the preempt count is
only 32 bits wide, we can simply pack it next to the resched flag and
load the whole thing in one go, so that a dec-and-test operation doesn't
need to load twice.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-07 12:35:53 +00:00
Will Deacon
08861d33d6 preempt: Move PREEMPT_NEED_RESCHED definition into arch code
PREEMPT_NEED_RESCHED is never used directly, so move it into the arch
code where it can potentially be implemented using either a different
bit in the preempt count or as an entirely separate entity.

Cc: Robert Love <rml@tech9.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-07 12:35:46 +00:00
Allen Pais
a21b0b78ea arm64: hugetlb: Register hugepages during arch init
Add hstate for each supported hugepage size using arch initcall.

* no hugepage parameters

  Without hugepage parameters, only a default hugepage size is
  available for dynamic allocation.  It's different, for example, from
  x86_64 and sparc64 where all supported hugepage sizes are available.

* only default_hugepagesz= is specified and set not to HPAGE_SIZE

  In spite of the fact that default_hugepagesz= is set to a valid
  hugepage size, it's treated as unsupported and reverted to
  HPAGE_SIZE.  Such behaviour is also different from x86_64 and
  sparc64.

Acked-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Dmitry Klochkov <dmitry.klochkov@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 17:01:13 +00:00
Jackie Liu
cc9f8349cb arm64: crypto: add NEON accelerated XOR implementation
This is a NEON acceleration method that can improve
performance by approximately 20%. I got the following
data from the centos 7.5 on Huawei's HISI1616 chip:

[ 93.837726] xor: measuring software checksum speed
[ 93.874039]   8regs  : 7123.200 MB/sec
[ 93.914038]   32regs : 7180.300 MB/sec
[ 93.954043]   arm64_neon: 9856.000 MB/sec
[ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec)

I believe this code can bring some optimization for
all arm64 platform. thanks for Ard Biesheuvel's suggestions.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 16:47:06 +00:00
Jackie Liu
21e28547f6 arm64/neon: add workaround for ambiguous C99 stdint.h types
In a way similar to ARM commit 09096f6a0e ("ARM: 7822/1: add workaround
for ambiguous C99 stdint.h types"), this patch redefines the macros that
are used in stdint.h so its definitions of uint64_t and int64_t are
compatible with those of the kernel.

This patch comes from: https://patchwork.kernel.org/patch/3540001/
Wrote by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

We mark this file as a private file and don't have to override asm/types.h

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 16:47:05 +00:00
Will Deacon
8cb3451b1f arm64: entry: Remove confusing comment
The comment about SYS_MEMBARRIER_SYNC_CORE relying on ERET being
context-synchronizing is confusing and misplaced with kpti. Given that
this is already documented under Documentation/ (see arch-support.txt
for membarrier), remove the comment altogether.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 16:47:05 +00:00
Will Deacon
679db70801 arm64: entry: Place an SB sequence following an ERET instruction
Some CPUs can speculate past an ERET instruction and potentially perform
speculative accesses to memory before processing the exception return.
Since the register state is often controlled by a lower privilege level
at the point of an ERET, this could potentially be used as part of a
side-channel attack.

This patch emits an SB sequence after each ERET so that speculation is
held up on exception return.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 16:47:05 +00:00
Will Deacon
bd4fb6d270 arm64: Add support for SB barrier and patch in over DSB; ISB sequences
We currently use a DSB; ISB sequence to inhibit speculation in set_fs().
Whilst this works for current CPUs, future CPUs may implement a new SB
barrier instruction which acts as an architected speculation barrier.

On CPUs that support it, patch in an SB; NOP sequence over the DSB; ISB
sequence and advertise the presence of the new instruction to userspace.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 16:47:04 +00:00
Suzuki K Poulose
0b587c84e4 arm64: capabilities: Batch cpu_enable callbacks
We use a stop_machine call for each available capability to
enable it on all the CPUs available at boot time. Instead
we could batch the cpu_enable callbacks to a single stop_machine()
call to save us some time.

Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 15:12:26 +00:00
Suzuki K Poulose
606f8e7b27 arm64: capabilities: Use linear array for detection and verification
Use the sorted list of capability entries for the detection and
verification.

Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 15:12:26 +00:00
Suzuki K Poulose
f7bfc14a08 arm64: capabilities: Optimize this_cpu_has_cap
Make use of the sorted capability list to access the capability
entry in this_cpu_has_cap() to avoid iterating over the two
tables.

Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 15:12:25 +00:00
Suzuki K Poulose
82a3a21b23 arm64: capabilities: Speed up capability lookup
We maintain two separate tables of capabilities, errata and features,
which decide the system capabilities. We iterate over each of these
tables for various operations (e.g, detection, verification etc.).
We do not have a way to map a system "capability" to its entry,
(i.e, cap -> struct arm64_cpu_capabilities) which is needed for
this_cpu_has_cap(). So we iterate over the table one by one to
find the entry and then do the operation. Also, this prevents
us from optimizing the way we "enable" the capabilities on the
CPUs, where we now issue a stop_machine() for each available
capability.

One solution is to merge the two tables into a single table,
sorted by the capability. But this is has the following
disadvantages:
  - We loose the "classification" of an errata vs. feature
  - It is quite easy to make a mistake when adding an entry,
    unless we sort the table at runtime.

So we maintain a list of pointers to the capability entry, sorted
by the "cap number" in a separate array, initialized at boot time.
The only restriction is that we can have one "entry" per capability.
While at it, remove the duplicate declaration of arm64_errata table.

Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 15:12:25 +00:00
Suzuki K Poulose
a3dcea2c85 arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003
Remove duplicate entries for Qualcomm erratum 1003. Since the entries
are not purely based on generic MIDR checks, use the multi_cap_entry
type to merge the entries.

Cc: Christopher Covington <cov@codeaurora.org>
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 11:47:44 +00:00
Suzuki K Poulose
f58cdf7e3c arm64: capabilities: Merge duplicate Cavium erratum entries
Merge duplicate entries for a single capability using the midr
range list for Cavium errata 30115 and 27456.

Cc: Andrew Pinski <apinski@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 11:47:44 +00:00
Suzuki K Poulose
c9460dcb06 arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE
We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability :

1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012]
2) ARM Errata 819472 on A53 r0p[01]

Both have the same work around. Merge these entries to avoid
duplicate entries for a single capability. Add a new Kconfig
entry to control the "capability" entry to make it easier
to handle combinations of the CONFIGs.

Cc: Will Deacon <will.deacon@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-06 11:47:44 +00:00
Ard Biesheuvel
3bbd3db864 arm64: relocatable: fix inconsistencies in linker script and options
readelf complains about the section layout of vmlinux when building
with CONFIG_RELOCATABLE=y (for KASLR):

  readelf: Warning: [21]: Link field (0) should index a symtab section.
  readelf: Warning: [21]: Info field (0) should index a relocatable section.

Also, it seems that our use of '-pie -shared' is contradictory, and
thus ambiguous. In general, the way KASLR is wired up at the moment
is highly tailored to how ld.bfd happens to implement (and conflate)
PIE executables and shared libraries, so given the current effort to
support other toolchains, let's fix some of these issues as well.

- Drop the -pie linker argument and just leave -shared. In ld.bfd,
  the differences between them are unclear (except for the ELF type
  of the produced image [0]) but lld chokes on seeing both at the
  same time.

- Rename the .rela output section to .rela.dyn, as is customary for
  shared libraries and PIE executables, so that it is not misidentified
  by readelf as a static relocation section (producing the warnings
  above).

- Pass the -z notext and -z norelro options to explicitly instruct the
  linker to permit text relocations, and to omit the RELRO program
  header (which requires a certain section layout that we don't adhere
  to in the kernel). These are the defaults for current versions of
  ld.bfd.

- Discard .eh_frame and .gnu.hash sections to avoid them from being
  emitted between .head.text and .text, screwing up the section layout.

These changes only affect the ELF image, and produce the same binary
image.

[0] b9dce7f1ba ("arm64: kernel: force ET_DYN ELF type for ...")

Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Smith <peter.smith@linaro.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-12-04 12:48:25 +00:00
Ard Biesheuvel
efdb25efc7 arm64/lib: improve CRC32 performance for deep pipelines
Improve the performance of the crc32() asm routines by getting rid of
most of the branches and small sized loads on the common path.

Instead, use a branchless code path involving overlapping 16 byte
loads to process the first (length % 32) bytes, and process the
remainder using a loop that processes 32 bytes at a time.

Tested using the following test program:

  #include <stdlib.h>

  extern void crc32_le(unsigned short, char const*, int);

  int main(void)
  {
    static const char buf[4096];

    srand(20181126);

    for (int i = 0; i < 100 * 1000 * 1000; i++)
      crc32_le(0, buf, rand() % 1024);

    return 0;
  }

On Cortex-A53 and Cortex-A57, the performance regresses but only very
slightly. On Cortex-A72 however, the performance improves from

  $ time ./crc32

  real  0m10.149s
  user  0m10.149s
  sys   0m0.000s

to

  $ time ./crc32

  real  0m7.915s
  user  0m7.915s
  sys   0m0.000s

Cc: Rui Sun <sunrui26@huawei.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 13:58:04 +00:00
Mark Rutland
7dc48bf96a arm64: ftrace: always pass instrumented pc in x0
The core ftrace hooks take the instrumented PC in x0, but for some
reason arm64's prepare_ftrace_return() takes this in x1.

For consistency, let's flip the argument order and always pass the
instrumented PC in x0.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 13:29:05 +00:00
Mark Rutland
49e258e05e arm64: ftrace: remove return_regs macros
The save_return_regs and restore_return_regs macros are only used by
return_to_handler, and having them defined out-of-line only serves to
obscure the logic.

Before we complicate, let's clean this up and fold the logic directly
into return_to_handler, saving a few lines of macro boilerplate in the
process. At the same time, a missing trailing space is added to the
comments, fixing a code style violation.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 13:29:05 +00:00
Mark Rutland
6e803e2e6e arm64: ftrace: don't adjust the LR value
The core ftrace code requires that when it is handed the PC of an
instrumented function, this PC is the address of the instrumented
instruction. This is necessary so that the core ftrace code can identify
the specific instrumentation site. Since the instrumented function will
be a BL, the address of the instrumented function is LR - 4 at entry to
the ftrace code.

This fixup is applied in the mcount_get_pc and mcount_get_pc0 helpers,
which acquire the PC of the instrumented function.

The mcount_get_lr helper is used to acquire the LR of the instrumented
function, whose value does not require this adjustment, and cannot be
adjusted to anything meaningful. No adjustment of this value is made on
other architectures, including arm. However, arm64 adjusts this value by
4.

This patch brings arm64 in line with other architectures and removes the
adjustment of the LR value.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 13:29:05 +00:00
Mark Rutland
5c176aff5b arm64: ftrace: enable graph FP test
The core frace code has an optional sanity check on the frame pointer
passed by ftrace_graph_caller and return_to_handler. This is cheap,
useful, and enabled unconditionally on x86, sparc, and riscv.

Let's do the same on arm64, so that we can catch any problems early.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 13:29:04 +00:00
Mark Rutland
e4fe196642 arm64: ftrace: use GLOBAL()
The global exports of ftrace_call and ftrace_graph_call are somewhat
painful to read. Let's use the generic GLOBAL() macro to ameliorate
matters.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 13:29:04 +00:00
Mark Rutland
ad697a1aec linkage: add generic GLOBAL() macro
Declaring a global symbol in assembly is tedious, error-prone, and
painful to read. While ENTRY() exists, this is supposed to be used for
function entry points, and this affects alignment in a potentially
undesireable manner.

Instead, let's add a generic GLOBAL() macro for this, as x86 added
locally in commit:

  95695547a7 ("x86: asm linkage - introduce GLOBAL macro")

... thus allowing us to use this more freely in the kernel.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Torsten Duwe <duwe@suse.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 13:29:04 +00:00
Ard Biesheuvel
dd6846d774 arm64: drop linker script hack to hide __efistub_ symbols
Commit 1212f7a16a ("scripts/kallsyms: filter arm64's __efistub_
symbols") updated the kallsyms code to filter out symbols with
the __efistub_ prefix explicitly, so we no longer require the
hack in our linker script to emit them as absolute symbols.

Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-30 12:49:51 +00:00
Will Deacon
1b57ec8c75 arm64: io: Ensure value passed to __iormb() is held in a 64-bit register
As of commit 6460d32014 ("arm64: io: Ensure calls to delay routines
are ordered against prior readX()"), MMIO reads smaller than 64 bits
fail to compile under clang because we end up mixing 32-bit and 64-bit
register operands for the same data processing instruction:

./include/asm-generic/io.h:695:9: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths]
        return readb(addr);
               ^
./arch/arm64/include/asm/io.h:147:58: note: expanded from macro 'readb'
                                                                       ^
./include/asm-generic/io.h:695:9: note: use constraint modifier "w"
./arch/arm64/include/asm/io.h:147:50: note: expanded from macro 'readb'
                                                               ^
./arch/arm64/include/asm/io.h:118:24: note: expanded from macro '__iormb'
        asm volatile("eor       %0, %1, %1\n"                           \
                                    ^

Fix the build by casting the macro argument to 'unsigned long' when used
as an input to the inline asm.

Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-29 16:36:18 +00:00
Will Deacon
3d65b6bbc0 arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE
In order to reduce the possibility of soft lock-ups, we bound the
maximum number of TLBI operations performed by a single call to
flush_tlb_range() to an arbitrary constant of 1024.

Whilst this does the job of avoiding lock-ups, we can actually be a bit
smarter by defining this as PTRS_PER_PTE. Due to the structure of our
page tables, using PTRS_PER_PTE means that an outer loop calling
flush_tlb_range() for entire table entries will end up performing just a
single TLBI operation for each entry. As an example, mremap()ing a 1GB
range mapped using 4k pages now requires only 512 TLBI operations when
moving the page tables as opposed to 262144 operations (512*512) when
using the current threshold of 1024.

Cc: Joel Fernandes <joel@joelfernandes.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27 19:01:21 +00:00
Ard Biesheuvel
bdb85cd1d2 arm64/module: switch to ADRP/ADD sequences for PLT entries
Now that we have switched to the small code model entirely, and
reduced the extended KASLR range to 4 GB, we can be sure that the
targets of relative branches that are out of range are in range
for a ADRP/ADD pair, which is one instruction shorter than our
current MOVN/MOVK/MOVK sequence, and is more idiomatic and so it
is more likely to be implemented efficiently by micro-architectures.

So switch over the ordinary PLT code and the special handling of
the Cortex-A53 ADRP errata, as well as the ftrace trampline
handling.

Reviewed-by: Torsten Duwe <duwe@lst.de>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: Added a couple of comments in the plt equality check]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27 19:00:45 +00:00
Ard Biesheuvel
7aaf7b2fd2 arm64/insn: add support for emitting ADR/ADRP instructions
Add support for emitting ADR and ADRP instructions so we can switch
over our PLT generation code in a subsequent patch.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27 18:47:33 +00:00
James Morse
d8797b1257 arm64: Use a raw spinlock in __install_bp_hardening_cb()
__install_bp_hardening_cb() is called via stop_machine() as part
of the cpu_enable callback. To force each CPU to take its turn
when allocating slots, they take a spinlock.

With the RT patches applied, the spinlock becomes a mutex,
and we get warnings about sleeping while in stop_machine():
| [    0.319176] CPU features: detected: RAS Extension Support
| [    0.319950] BUG: scheduling while atomic: migration/3/36/0x00000002
| [    0.319955] Modules linked in:
| [    0.319958] Preemption disabled at:
| [    0.319969] [<ffff000008181ae4>] cpu_stopper_thread+0x7c/0x108
| [    0.319973] CPU: 3 PID: 36 Comm: migration/3 Not tainted 4.19.1-rt3-00250-g330fc2c2a880 #2
| [    0.319975] Hardware name: linux,dummy-virt (DT)
| [    0.319976] Call trace:
| [    0.319981]  dump_backtrace+0x0/0x148
| [    0.319983]  show_stack+0x14/0x20
| [    0.319987]  dump_stack+0x80/0xa4
| [    0.319989]  __schedule_bug+0x94/0xb0
| [    0.319991]  __schedule+0x510/0x560
| [    0.319992]  schedule+0x38/0xe8
| [    0.319994]  rt_spin_lock_slowlock_locked+0xf0/0x278
| [    0.319996]  rt_spin_lock_slowlock+0x5c/0x90
| [    0.319998]  rt_spin_lock+0x54/0x58
| [    0.320000]  enable_smccc_arch_workaround_1+0xdc/0x260
| [    0.320001]  __enable_cpu_capability+0x10/0x20
| [    0.320003]  multi_cpu_stop+0x84/0x108
| [    0.320004]  cpu_stopper_thread+0x84/0x108
| [    0.320008]  smpboot_thread_fn+0x1e8/0x2b0
| [    0.320009]  kthread+0x124/0x128
| [    0.320010]  ret_from_fork+0x10/0x18

Switch this to a raw spinlock, as we know this is only called with
IRQs masked.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27 18:01:34 +00:00
Jeremy Linton
9eb1c92b47 arm64: acpi: Prepare for longer MADTs
The BAD_MADT_GICC_ENTRY check is a little too strict because
it rejects MADT entries that don't match the currently known
lengths. We should remove this restriction to avoid problems
if the table length changes. Future code which might depend on
additional fields should be written to validate those fields
before using them, rather than trying to globally check
known MADT version lengths.

Link: https://lkml.kernel.org/r/20181012192937.3819951-1-jeremy.linton@arm.com
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
[lorenzo.pieralisi@arm.com: added MADT macro comments]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Al Stone <ahs3@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27 18:00:14 +00:00
Will Deacon
6460d32014 arm64: io: Ensure calls to delay routines are ordered against prior readX()
A relatively standard idiom for ensuring that a pair of MMIO writes to a
device arrive at that device with a specified minimum delay between them
is as follows:

	writel_relaxed(42, dev_base + CTL1);
	readl(dev_base + CTL1);
	udelay(10);
	writel_relaxed(42, dev_base + CTL2);

the intention being that the read-back from the device will push the
prior write to CTL1, and the udelay will hold up the write to CTL1 until
at least 10us have elapsed.

Unfortunately, on arm64 where the underlying delay loop is implemented
as a read of the architected counter, the CPU does not guarantee
ordering from the readl() to the delay loop and therefore the delay loop
could in theory be speculated and not provide the desired interval
between the two writes.

Fix this in a similar manner to PowerPC by introducing a dummy control
dependency on the output of readX() which, combined with the ISB in the
read of the architected counter, guarantees that a subsequent delay loop
can not be executed until the readX() has returned its result.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-27 12:18:07 +00:00
Alex Van Brunt
3403e56b41 arm64: mm: Don't wait for completion of TLB invalidation when page aging
When transitioning a PTE from young to old as part of page aging, we
can avoid waiting for the TLB invalidation to complete and therefore
drop the subsequent DSB instruction. Whilst this opens up a race with
page reclaim, where a PTE in active use via a stale, young TLB entry
does not update the underlying descriptor, the worst thing that happens
is that the page is reclaimed and then immediately faulted back in.

Given that we have a DSB in our context-switch path, the window for a
spurious reclaim is fairly limited and eliding the barrier claims to
boost NVMe/SSD accesses by over 10% on some platforms.

A similar optimisation was made for x86 in commit b13b1d2d86 ("x86/mm:
In the PTE swapout page reclaim case clear the accessed bit instead of
flushing the TLB").

Signed-off-by: Alex Van Brunt <avanbrunt@nvidia.com>
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
[will: rewrote patch]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-26 16:59:46 +00:00
Jessica Yu
c8ebf64eab arm64/module: use plt section indices for relocations
Instead of saving a pointer to the .plt and .init.plt sections to apply
plt-based relocations, save and use their section indices instead.

The mod->arch.{core,init}.plt pointers were problematic for livepatch
because they pointed within temporary section headers (provided by the
module loader via info->sechdrs) that would be freed after module load.
Since livepatch modules may need to apply relocations post-module-load
(for example, to patch a module that is loaded later), using section
indices to offset into the section headers (instead of accessing them
through a saved pointer) allows livepatch modules on arm64 to pass in
their own copy of the section headers to apply_relocate_add() to apply
delayed relocations.

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-20 11:38:26 +00:00
Ard Biesheuvel
c55191e96c arm64: mm: apply r/o permissions of VM areas to its linear alias as well
On arm64, we use block mappings and contiguous hints to map the linear
region, to minimize the TLB footprint. However, this means that the
entire region is mapped using read/write permissions, which we cannot
modify at page granularity without having to take intrusive measures to
prevent TLB conflicts.

This means the linear aliases of pages belonging to read-only mappings
(executable or otherwise) in the vmalloc region are also mapped read/write,
and could potentially be abused to modify things like module code, bpf JIT
code or other read-only data.

So let's fix this, by extending the set_memory_ro/rw routines to take
the linear alias into account. The consequence of enabling this is
that we can no longer use block mappings or contiguous hints, so in
cases where the TLB footprint of the linear region is a bottleneck,
performance may be affected.

Therefore, allow this feature to be runtime en/disabled, by setting
rodata=full (or 'on' to disable just this enhancement, or 'off' to
disable read-only mappings for code and r/o data entirely) on the
kernel command line. Also, allow the default value to be set via a
Kconfig option.

Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-20 11:38:26 +00:00
Ard Biesheuvel
b34d2ef0c6 arm64: mm: purge lazily unmapped vm regions before changing permissions
Call vm_unmap_aliases() every time we apply any changes to permission
attributes of mappings in the vmalloc region. This avoids any potential
issues resulting from lingering writable or executable aliases of
mappings that should be read-only or non-executable, respectively.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-11-20 11:38:26 +00:00
Linus Torvalds
9ff01193a2 Linux 4.20-rc3 2018-11-18 13:33:44 -08:00
Linus Torvalds
25e19c1fe4 libnvdimm 4.20-rc3
- Address Range Scrub overflow continuation handling has been broken
   since it was initially merged. It was only recently that error injection
   and platform-BIOS support enabled this corner case to be exercised.
 
 - The recent attempt to provide more isolation for the kernel Address
   Range Scrub state machine from userapace initiated sessions triggers a
   lockdep report. Revert and try again at the next merge window.
 
 - Fix a kasan reported buffer overflow in libnvdimm unit test
   infrastrucutre (nfit_test)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJb8MdUAAoJEB7SkWpmfYgCifYP/A+OQ19HybqcY2nfvqXUdQum
 Q5x3qcKmGmEbKbnCUMOHZJpEjW4c/Cpm6OKhuFDQJ4tijn1XG3/ATSi7PXrZxs/o
 CRK8MIg5Wz/mMvYRvypIkCHHHr9+Y1NjqmQynM4LLzNG24GMXaeHHuZUTnrZCDmu
 0+jBTylNgVYdykoIxgHDYDB+cd6w4NtAP5OD9D46pdsmzX9ac+OQyZMyNB3glUhd
 /ZFAoywVNfvfJVWEci9RoHiKttWxgVoCuNbSlCs2Y6ymepA44ApR9AgLHtaC9pFO
 DrPkfCzPSmf4PVSxLJd79+/sw9YOcBD7LZ5IxzozxRMuRn5pIofdZIsBg9PlwT5B
 NL9jQK87XPiG0vNxhJu3wzP+FlyCXxGxkWfApp7w4rlWBV7RgugOZHyH051rdKzQ
 44JAPzLLCfA5Mj4o2tIbSx42f2JNX93XDEX8fkUB+qs3GzyOcMtlcmz9UjmnrT0R
 o9KHKhDn81Vivxh33Ts2G0iHktO83XSUBDWApSd6erjEUXMsCLY0D8y+nDGTOMUh
 kVcY8q93sgZGLVbcxt0eGc8Q7osZYawQGRGucflTETFcxNwMyLL4F9lWgPirGeYF
 i1JDWeTrhcImYufNj8o78LsbT5xh6YjbZZ8Q1obIgPXpDtxHNIXO6COId49Zp2cK
 obftWyVp+7kYe79NWzmD
 =sfNx
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fixes from Dan Williams:
 "A small batch of fixes for v4.20-rc3.

  The overflow continuation fix addresses something that has been broken
  for several releases. Arguably it could wait even longer, but it's a
  one line fix and this finishes the last of the known address range
  scrub bug reports. The revert addresses a lockdep regression. The unit
  tests are not critical to fix, but no reason to hold this fix back.

  Summary:

   - Address Range Scrub overflow continuation handling has been broken
     since it was initially merged. It was only recently that error
     injection and platform-BIOS support enabled this corner case to be
     exercised.

   - The recent attempt to provide more isolation for the kernel Address
     Range Scrub state machine from userapace initiated sessions
     triggers a lockdep report. Revert and try again at the next merge
     window.

   - Fix a kasan reported buffer overflow in libnvdimm unit test
     infrastrucutre (nfit_test)"

* tag 'libnvdimm-fixes-4.20-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  Revert "acpi, nfit: Further restrict userspace ARS start requests"
  acpi, nfit: Fix ARS overflow continuation
  tools/testing/nvdimm: Fix the array size for dimm devices.
2018-11-18 12:21:09 -08:00
Linus Torvalds
c67a98c00e Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "16 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/memblock.c: fix a typo in __next_mem_pfn_range() comments
  mm, page_alloc: check for max order in hot path
  scripts/spdxcheck.py: make python3 compliant
  tmpfs: make lseek(SEEK_DATA/SEK_HOLE) return ENXIO with a negative offset
  lib/ubsan.c: don't mark __ubsan_handle_builtin_unreachable as noreturn
  mm/vmstat.c: fix NUMA statistics updates
  mm/gup.c: fix follow_page_mask() kerneldoc comment
  ocfs2: free up write context when direct IO failed
  scripts/faddr2line: fix location of start_kernel in comment
  mm: don't reclaim inodes with many attached pages
  mm, memory_hotplug: check zone_movable in has_unmovable_pages
  mm/swapfile.c: use kvzalloc for swap_info_struct allocation
  MAINTAINERS: update OMAP MMC entry
  hugetlbfs: fix kernel BUG at fs/hugetlbfs/inode.c:444!
  kernel/sched/psi.c: simplify cgroup_move_task()
  z3fold: fix possible reclaim races
2018-11-18 11:31:26 -08:00