This commits selects ARCH_HAS_ELF_RANDOMIZE when an arch uses the generic
topdown mmap layout functions so that this security feature is on by
default.
Note that this commit also removes the possibility for arm64 to have elf
randomization and no MMU: without MMU, the security added by randomization
is worth nothing.
Link: http://lkml.kernel.org/r/20190730055113.23635-6-alex@ghiti.fr
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 52-bit virtual addressing in the kernel
- New ABI to allow tagged user pointers to be dereferenced by syscalls
- Early RNG seeding by the bootloader
- Improve robustness of SMP boot
- Fix TLB invalidation in light of recent architectural clarifications
- Support for i.MX8 DDR PMU
- Remove direct LSE instruction patching in favour of static keys
- Function error injection using kprobes
- Support for the PPTT "thread" flag introduced by ACPI 6.3
- Move PSCI idle code into proper cpuidle driver
- Relaxation of implicit I/O memory barriers
- Build with RELR relocations when toolchain supports them
- Numerous cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl1yYREQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNAM3CAChqDFQkryXoHwdeEcaukMRVNxtxOi4pM4g
5xqkb7PoqRJssIblsuhaXjrSD97yWCgaqCmFe6rKoes++lP4bFcTe22KXPPyPBED
A+tK4nTuKKcZfVbEanUjI+ihXaHJmKZ/kwAxWsEBYZ4WCOe3voCiJVNO2fHxqg1M
8TskZ2BoayTbWMXih0eJg2MCy/xApBq4b3nZG4bKI7Z9UpXiKN1NYtDh98ZEBK4V
d/oNoHsJ2ZvIQsztoBJMsvr09DTCazCijWZiECadm6l41WEPFizngrACiSJLLtYo
0qu4qxgg9zgFlvBCRQmIYSggTuv35RgXSfcOwChmW5DUjHG+f9GK
=Ru4B
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Although there isn't tonnes of code in terms of line count, there are
a fair few headline features which I've noted both in the tag and also
in the merge commits when I pulled everything together.
The part I'm most pleased with is that we had 35 contributors this
time around, which feels like a big jump from the usual small group of
core arm64 arch developers. Hopefully they all enjoyed it so much that
they'll continue to contribute, but we'll see.
It's probably worth highlighting that we've pulled in a branch from
the risc-v folks which moves our CPU topology code out to where it can
be shared with others.
Summary:
- 52-bit virtual addressing in the kernel
- New ABI to allow tagged user pointers to be dereferenced by
syscalls
- Early RNG seeding by the bootloader
- Improve robustness of SMP boot
- Fix TLB invalidation in light of recent architectural
clarifications
- Support for i.MX8 DDR PMU
- Remove direct LSE instruction patching in favour of static keys
- Function error injection using kprobes
- Support for the PPTT "thread" flag introduced by ACPI 6.3
- Move PSCI idle code into proper cpuidle driver
- Relaxation of implicit I/O memory barriers
- Build with RELR relocations when toolchain supports them
- Numerous cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (114 commits)
arm64: remove __iounmap
arm64: atomics: Use K constraint when toolchain appears to support it
arm64: atomics: Undefine internal macros after use
arm64: lse: Make ARM64_LSE_ATOMICS depend on JUMP_LABEL
arm64: asm: Kill 'asm/atomic_arch.h'
arm64: lse: Remove unused 'alt_lse' assembly macro
arm64: atomics: Remove atomic_ll_sc compilation unit
arm64: avoid using hard-coded registers for LSE atomics
arm64: atomics: avoid out-of-line ll/sc atomics
arm64: Use correct ll/sc atomic constraints
jump_label: Don't warn on __exit jump entries
docs/perf: Add documentation for the i.MX8 DDR PMU
perf/imx_ddr: Add support for AXI ID filtering
arm64: kpti: ensure patched kernel text is fetched from PoU
arm64: fix fixmap copy for 16K pages and 48-bit VA
perf/smmuv3: Validate groups for global filtering
perf/smmuv3: Validate group size
arm64: Relax Documentation/arm64/tagged-pointers.rst
arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F
arm64: mm: Ignore spurious translation faults taken from the kernel
...
* for-next/52-bit-kva: (25 commits)
Support for 52-bit virtual addressing in kernel space
* for-next/cpu-topology: (9 commits)
Move CPU topology parsing into core code and add support for ACPI 6.3
* for-next/error-injection: (2 commits)
Support for function error injection via kprobes
* for-next/perf: (8 commits)
Support for i.MX8 DDR PMU and proper SMMUv3 group validation
* for-next/psci-cpuidle: (7 commits)
Move PSCI idle code into a new CPUidle driver
* for-next/rng: (4 commits)
Support for 'rng-seed' property being passed in the devicetree
* for-next/smpboot: (3 commits)
Reduce fragility of secondary CPU bringup in debug configurations
* for-next/tbi: (10 commits)
Introduce new syscall ABI with relaxed requirements for pointer tags
* for-next/tlbi: (6 commits)
Handle spurious page faults arising from kernel space
When we fail to bring a secondary CPU online and it fails in an unknown
state, we should assume the worst and increment 'cpus_stuck_in_kernel'
so that things like kexec() are disabled.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Although SMP bringup is inherently racy, we can significantly reduce
the window during which secondary CPUs can unexpectedly enter the
kernel by sanity checking the 'stack' and 'task' fields of the
'secondary_data' structure. If the booting CPU gave up waiting for us,
then they will have been cleared to NULL and we should spin in a WFE; WFI
loop instead.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
When many debug options are enabled simultaneously (e.g. PROVE_LOCKING,
KMEMLEAK, DEBUG_PAGE_ALLOC, KASAN etc), it is possible for us to timeout
when attempting to boot a secondary CPU and give up. Unfortunately, the
CPU will /eventually/ appear, and sit in the background happily stuck
in a recursive exception due to a NULL stack pointer.
Increase the timeout to 5s, which will of course be enough for anybody.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Adding "rng-seed" to dtb. It's fine to add this property if original
fdt doesn't contain it. Since original seed will be wiped after
read, so use a default size 128 bytes here.
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Will Deacon <will@kernel.org>
Currently in arm64, FDT is mapped to RO before it's passed to
early_init_dt_scan(). However, there might be some codes
(eg. commit "fdt: add support for rng-seed") that need to modify FDT
during init. Map FDT to RO after early fixups are done.
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Will Deacon <will@kernel.org>
When taking an SError or Debug exception from EL0, we run the C
handler for these exceptions before updating the context tracking
code and unmasking lower priority interrupts.
When booting with nohz_full lockdep tells us we got this wrong:
| =============================
| WARNING: suspicious RCU usage
| 5.3.0-rc2-00010-gb4b5e9dcb11b-dirty #11271 Not tainted
| -----------------------------
| include/linux/rcupdate.h:643 rcu_read_unlock() used illegally wh!
|
| other info that might help us debug this:
|
|
| RCU used illegally from idle CPU!
| rcu_scheduler_active = 2, debug_locks = 1
| RCU used illegally from extended quiescent state!
| 1 lock held by a.out/432:
| #0: 00000000c7a79515 (rcu_read_lock){....}, at: brk_handler+0x00
|
| stack backtrace:
| CPU: 1 PID: 432 Comm: a.out Not tainted 5.3.0-rc2-00010-gb4b5e9d1
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno De8
| Call trace:
| dump_backtrace+0x0/0x140
| show_stack+0x14/0x20
| dump_stack+0xbc/0x104
| lockdep_rcu_suspicious+0xf8/0x108
| brk_handler+0x164/0x1b0
| do_debug_exception+0x11c/0x278
| el0_dbg+0x14/0x20
Moving the ct_user_exit calls to be before do_debug_exception() means
they are also before trace_hardirqs_off() has been updated. Add a new
ct_user_exit_irqoff macro to avoid the context-tracking code using
irqsave/restore before we've updated trace_hardirqs_off(). To be
consistent, do this everywhere.
The C helper is called enter_from_user_mode() to match x86 in the hope
we can merge them into kernel/context_tracking.c later.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 6c81fe7925 ("arm64: enable context tracking")
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
First rename the sysctl control to abi.tagged_addr_disabled and make it
default off (zero). When abi.tagged_addr_disabled == 1, only block the
enabling of the TBI ABI via prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE).
Getting the status of the ABI or disabling it is still allowed.
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
In perf_event.c we use smp_processor_id(), but we haven't included
<linux/smp.h> where it is defined, and rely on this being pulled in
via a transitive include. Let's make this more robust by including
<linux.smp.h> explicitly.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Raphael Gault <raphael.gault@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
- Don't taint the kernel if CPUs have different sets of page sizes
supported (other than the one in use).
- Issue I-cache maintenance for module ftrace trampoline.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl1W5VMACgkQa9axLQDI
XvFAWQ//RrxMHeN7/Ynv1MDqucZlMpQJMUV2V0K5wYO6ZwGMKYXw62SBzb9AMv0y
iFhYNZbtBoE2JhLNEfwhdNkk9NfUsKbUcfyt0cono2DU1tVihmmVqbNbairNhKVo
j7vxnCx8SMQ5ZT+QaTpKUddzlzb4jdycXGYaje62bbA19BCOVVIuy61wGNtHP4IU
817RHqIj6aqIbmplt79+Q3vOopB8BbxrnpC5cp8rw1CKbfu7kQN4zePIA4Z4bhag
G5hm+aTV1qzrmEud2WkuA7044vsw2Wkd/8gksRyQKkypfqfQ3NrASDn3xGqOi/5t
2DsyJPaVUC1tsJdrVqpfWZfLJJ8FGyox7aeXL7OdPrfWP6HI9jJnodRVH/av97g6
psaSoJNxXbVqrg/wva6i2f25KBYdp/vQW0Nmjljvt4dmEDrE/jpifAYAH/LUmi5H
fMNXyOdeDcgUoVfcEk/S1leiDf0gUy+B8ylknk12knbMuk/9ATq/2H3RKqrRJYWL
qUQBvB04d7NHZl8wl+IVLlK8g4x5Jeetjm7GHvLbOp9agb63kwVFq50c4zD052LA
eKy6LbUO2xHyA3PrtvBem/hKZ/GCTh0o0xcwUkyNcsLUHfKnMcHLEH/WWd5rmYIQ
xvbQVhVORR8ru7eCQCRMBmjGaHFz2ZQa4Q3V3qVBnX+q/F6dku8=
=VneQ
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Don't taint the kernel if CPUs have different sets of page sizes
supported (other than the one in use).
- Issue I-cache maintenance for module ftrace trampoline.
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side
arm64: cpufeature: Don't treat granule sizes as strict
The initial support for dynamic ftrace trampolines in modules made use
of an indirect branch which loaded its target from the beginning of
a special section (e71a4e1beb ("arm64: ftrace: add support for far
branches to dynamic ftrace")). Since no instructions were being patched,
no cache maintenance was needed. However, later in be0f272bfc ("arm64:
ftrace: emit ftrace-mod.o contents through code") this code was reworked
to output the trampoline instructions directly into the PLT entry but,
unfortunately, the necessary cache maintenance was overlooked.
Add a call to __flush_icache_range() after writing the new trampoline
instructions but before patching in the branch to the trampoline.
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: James Morse <james.morse@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: be0f272bfc ("arm64: ftrace: emit ftrace-mod.o contents through code")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The trusted OS may reject CPU_OFF calls to its resident CPU, so we must
avoid issuing those. We never migrate a Trusted OS and we already take
care to prevent CPU_OFF PSCI call. However, this is not reflected
explicitly to the userspace. Any user can attempt to hotplug trusted OS
resident CPU. The entire motion of going through the various state
transitions in the CPU hotplug state machine gets executed and the
PSCI layer finally refuses to make CPU_OFF call.
This results is unnecessary unwinding of CPU hotplug state machine in
the kernel. Instead we can mark the trusted OS resident CPU as not
available for hotplug, so that the user attempt or request to do the
same will get immediately rejected.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
It seems that LLVM's linker does not correctly handle variable assignments
involving section positions that are updated during the SECTIONS
parsing. Commit aa69fb62be ("arm64/efi: Mark __efistub_stext_offset as
an absolute symbol explicitly") ran into this too, but found a different
workaround.
However, this was not enough, as other variables were also miscalculated
which manifested as boot failures under UEFI where __efistub__end was
not taking the correct _end value (they should be the same):
$ ld.lld -EL -maarch64elf --no-undefined -X -shared \
-Bsymbolic -z notext -z norelro --no-apply-dynamic-relocs \
-o vmlinux.lld -T poc.lds --whole-archive vmlinux.o && \
readelf -Ws vmlinux.lld | egrep '\b(__efistub_|)_end\b'
368272: ffff000002218000 0 NOTYPE LOCAL HIDDEN 38 __efistub__end
368322: ffff000012318000 0 NOTYPE GLOBAL DEFAULT 38 _end
$ aarch64-linux-gnu-ld.bfd -EL -maarch64elf --no-undefined -X -shared \
-Bsymbolic -z notext -z norelro --no-apply-dynamic-relocs \
-o vmlinux.bfd -T poc.lds --whole-archive vmlinux.o && \
readelf -Ws vmlinux.bfd | egrep '\b(__efistub_|)_end\b'
338124: ffff000012318000 0 NOTYPE LOCAL DEFAULT ABS __efistub__end
383812: ffff000012318000 0 NOTYPE GLOBAL DEFAULT 15325 _end
To work around this, all of the __efistub_-prefixed variable assignments
need to be moved after the linker script's SECTIONS entry. As it turns
out, this also solves the problem fixed in commit aa69fb62be, so those
changes are reverted here.
Link: https://github.com/ClangBuiltLinux/linux/issues/634
Link: https://bugs.llvm.org/show_bug.cgi?id=42990
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Will Deacon <will@kernel.org>
Prior to commit:
14c127c957 ("arm64: mm: Flip kernel VA space")
... VA_START described the start of the TTBR1 address space for a given
VA size described by VA_BITS, where all kernel mappings began.
Since that commit, VA_START described a portion midway through the
address space, where the linear map ends and other kernel mappings
begin.
To avoid confusion, let's rename VA_START to PAGE_END, making it clear
that it's not the start of the TTBR1 address space and implying that
it's related to PAGE_OFFSET. Comments and other mnemonics are updated
accordingly, along with a typo fix in the decription of VMEMMAP_SIZE.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Pull in generic CPU topology changes from Paul Walmsley (RISC-V).
* tag 'common/for-v5.4-rc1/cpu-topology' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
MAINTAINERS: Add an entry for generic architecture topology
base: arch_topology: update Kconfig help description
RISC-V: Parse cpu topology during boot.
arm: Use common cpu_topology structure and functions.
cpu-topology: Move cpu topology code to common code.
dt-binding: cpu-topology: Move cpu-map to a common binding.
Documentation: DT: arm: add support for sockets defining package boundaries
All instances of struct sys64_hook contain compile-time constant data,
and are never inentionally modified, so let's make them all const.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The aarch64_insn_encoding_class[] array contains compile-time constant
data, and is never intentionally modified, so let's mark it as const.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The icache_policy_str[] array contains compile-time constant data, and
is never intentionally modified, so let's mark it as const.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
GCC unescapes escaped string section names while Clang does not. Because
__section uses the `#` stringification operator for the section name, it
doesn't need to be escaped.
This antipattern was found with:
$ grep -e __section\(\" -e __section__\(\" -r
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
If a CPU doesn't support the page size for which the kernel is
configured, then we will complain and refuse to bring it online. For
secondary CPUs (and the boot CPU on a system booting with EFI), we will
also print an error identifying the mismatch.
Consequently, the only time that the cpufeature code can detect a
granule size mismatch is for a granule other than the one that is
currently being used. Although we would rather such systems didn't
exist, we've unfortunately lost that battle and Kevin reports that
on his amlogic S922X (odroid-n2 board) we end up warning and taining
with defconfig because 16k pages are not supported by all of the CPUs.
In such a situation, we don't actually care about the feature mismatch,
particularly now that KVM only exposes the sanitised view of the CPU
registers (commit 93390c0a1b - "arm64: KVM: Hide unsupported AArch64
CPU features from guests"). Treat the granule fields as non-strict and
let Kevin run without a tainted kernel.
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
[catalin.marinas@arm.com: changelog updated with KVM sanitised regs commit]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
ACPI 6.3 adds a thread flag to represent if a CPU/PE is
actually a thread. Given that the MPIDR_MT bit may not
represent this information consistently on homogeneous machines
we should prefer the PPTT flag if its available.
Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Robert Richter <rrichter@marvell.com>
[will: made acpi_cpu_is_threaded() return 'bool']
Signed-off-by: Will Deacon <will@kernel.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJdTfRfAAoJEL/70l94x66DcN0IAIwyaU2+kwP0jd2miQuKxgwl
WU4u7dZCoQC6meWEVmrSJIVMBONRubmZ9iCqT7807YP8YZSQpOth51FMbULUWuy1
VW1eaRwqidX0EAihDhg2ZbBZ8H6RQ9Fn0aiEEh44dAZZAwGSVnO3PRKvQEJ15xjk
q+OQ4hrxtoorwLj+myejmq3YenTFTCMMJfYwwvlCl+J1FfrLZi5k3X5Gjk+j8Ixd
8CL8/6u5Lu6MCgfYVvxvo8/bUPiATBdF1sWJMMALwXTrDiSy4tQRD0NvZP1HM8G1
hy0XnhgtsS9rWNLtAFOj+r/XhP9V5lOOGX8yBcj0XQQr+DC9MG6MCL+pXXOaMcA=
=ZZh8
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"Bugfixes (arm and x86) and cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
selftests: kvm: Adding config fragments
KVM: selftests: Update gitignore file for latest changes
kvm: remove unnecessary PageReserved check
KVM: arm/arm64: vgic: Reevaluate level sensitive interrupts on enable
KVM: arm: Don't write junk to CP15 registers on reset
KVM: arm64: Don't write junk to sysregs on reset
KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block
x86: kvm: remove useless calls to kvm_para_available
KVM: no need to check return value of debugfs_create functions
KVM: remove kvm_arch_has_vcpu_debugfs()
KVM: Fix leak vCPU's VMCS value into other pCPU
KVM: Check preempted_in_kernel for involuntary preemption
KVM: LAPIC: Don't need to wakeup vCPU twice afer timer fire
arm64: KVM: hyp: debug-sr: Mark expected switch fall-through
KVM: arm64: Update kvm_arm_exception_class and esr_class_str for new EC
KVM: arm: vgic-v3: Mark expected switch fall-through
arm64: KVM: regmap: Fix unexpected switch fall-through
KVM: arm/arm64: Introduce kvm_pmu_vcpu_init() to setup PMU counter index
Current PSCI code handles idle state entry through the
psci_cpu_suspend_enter() API, that takes an idle state index as a
parameter and convert the index into a previously initialized
power_state parameter before calling the PSCI.CPU_SUSPEND() with it.
This is unwieldly, since it forces the PSCI firmware layer to keep track
of power_state parameter for every idle state so that the
index->power_state conversion can be made in the PSCI firmware layer
instead of the CPUidle driver implementations.
Move the power_state handling out of drivers/firmware/psci
into the respective ACPI/DT PSCI CPUidle backends and convert
the psci_cpu_suspend_enter() API to get the power_state
parameter as input, which makes it closer to its firmware
interface PSCI.CPU_SUSPEND() API.
A notable side effect is that the PSCI ACPI/DT CPUidle backends
now can directly handle (and if needed update) power_state
parameters before handing them over to the PSCI firmware
interface to trigger PSCI.CPU_SUSPEND() calls.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will@kernel.org>
Allow selection of the PSCI CPUidle in the kernel by updating
the respective Kconfig entry.
Remove PSCI callbacks from ARM/ARM64 generic CPU ops
to prevent the PSCI idle driver from clashing with the generic
ARM CPUidle driver initialization, that relies on CPU ops
to initialize and enter idle states.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will@kernel.org>
Previous patches have enabled 52-bit kernel + user VAs and there is no
longer any scenario where user VA != kernel VA size.
This patch removes the, now redundant, vabits_user variable and replaces
usage with vabits_actual where appropriate.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Most of the machinery is now in place to enable 52-bit kernel VAs that
are detectable at boot time.
This patch adds a Kconfig option for 52-bit user and kernel addresses
and plumbs in the requisite CONFIG_ macros as well as sets TCR.T1SZ,
physvirt_offset and vmemmap at early boot.
To simplify things this patch also removes the 52-bit user/48-bit kernel
kconfig option.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
When running with a 52-bit userspace VA and a 48-bit kernel VA we offset
ttbr1_el1 to allow the kernel pagetables with a 52-bit PTRS_PER_PGD to
be used for both userspace and kernel.
Moving on to a 52-bit kernel VA we no longer require this offset to
ttbr1_el1 should we be running on a system with HW support for 52-bit
VAs.
This patch introduces conditional logic to offset_ttbr1 to query
SYS_ID_AA64MMFR2_EL1 whenever 52-bit VAs are selected. If there is HW
support for 52-bit VAs then the ttbr1 offset is skipped.
We choose to read a system register rather than vabits_actual because
offset_ttbr1 can be called in places where the kernel data is not
actually mapped.
Calls to offset_ttbr1 appear to be made from rarely called code paths so
this extra logic is not expected to adversely affect performance.
Signed-off-by: Steve Capper <steve.capper@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
In order to support 52-bit kernel addresses detectable at boot time, one
needs to know the actual VA_BITS detected. A new variable vabits_actual
is introduced in this commit and employed for the KVM hypervisor layout,
KASAN, fault handling and phys-to/from-virt translation where there
would normally be compile time constants.
In order to maintain performance in phys_to_virt, another variable
physvirt_offset is introduced.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
In order to support 52-bit kernel addresses detectable at boot time, the
kernel needs to know the most conservative VA_BITS possible should it
need to fall back to this quantity due to lack of hardware support.
A new compile time constant VA_BITS_MIN is introduced in this patch and
it is employed in the KASAN end address, KASLR, and EFI stub.
For Arm, if 52-bit VA support is unavailable the fallback is to 48-bits.
In other words: VA_BITS_MIN = min (48, VA_BITS)
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
In order to allow for a KASAN shadow that changes size at boot time, one
must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
start address. Also, it is highly desirable to maintain the same
function addresses in the kernel .text between VA sizes. Both of these
requirements necessitate us to flip the kernel address space halves s.t.
the direct linear map occupies the lower addresses.
This patch puts the direct linear map in the lower addresses of the
kernel VA range and everything else in the higher ranges.
We need to adjust:
*) KASAN shadow region placement logic,
*) KASAN_SHADOW_OFFSET computation logic,
*) virt_to_phys, phys_to_virt checks,
*) page table dumper.
These are all small changes, that need to take place atomically, so they
are bundled into this commit.
As part of the re-arrangement, a guard region of 2MB (to preserve
alignment for fixed map) is added after the vmemmap. Otherwise the
vmemmap could intersect with IS_ERR pointers.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The ptrace trace SVE flags are prefixed with SVE_PT_*. Update the
comment accordingly.
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
The commit d5370f7548 ("arm64: prefetch: add alternative pattern for
CPUs without a prefetcher") introduced MIDR_IS_CPU_MODEL_RANGE() to be
used in has_no_hw_prefetch() with rv_min=0 which generates a compilation
warning from GCC,
In file included from ./arch/arm64/include/asm/cache.h:8,
from ./include/linux/cache.h:6,
from ./include/linux/printk.h:9,
from ./include/linux/kernel.h:15,
from ./include/linux/cpumask.h:10,
from arch/arm64/kernel/cpufeature.c:11:
arch/arm64/kernel/cpufeature.c: In function 'has_no_hw_prefetch':
./arch/arm64/include/asm/cputype.h:59:26: warning: comparison of
unsigned expression >= 0 is always true [-Wtype-limits]
_model == (model) && rv >= (rv_min) && rv <= (rv_max); \
^~
arch/arm64/kernel/cpufeature.c:889:9: note: in expansion of macro
'MIDR_IS_CPU_MODEL_RANGE'
return MIDR_IS_CPU_MODEL_RANGE(midr, MIDR_THUNDERX,
^~~~~~~~~~~~~~~~~~~~~~~
Fix it by converting MIDR_IS_CPU_MODEL_RANGE to a static inline
function.
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Will Deacon <will@kernel.org>
It is not desirable to relax the ABI to allow tagged user addresses into
the kernel indiscriminately. This patch introduces a prctl() interface
for enabling or disabling the tagged ABI with a global sysctl control
for preventing applications from enabling the relaxed ABI (meant for
testing user-space prctl() return error checking without reconfiguring
the kernel). The ABI properties are inherited by threads of the same
application and fork()'ed children but cleared on execve(). A Kconfig
option allows the overall disabling of the relaxed ABI.
The PR_SET_TAGGED_ADDR_CTRL will be expanded in the future to handle
MTE-specific settings like imprecise vs precise exceptions.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
RELR is a relocation packing format for relative relocations.
The format is described in a generic-abi proposal:
https://groups.google.com/d/topic/generic-abi/bX460iggiKg/discussion
The LLD linker can be instructed to pack relocations in the RELR
format by passing the flag --pack-dyn-relocs=relr.
This patch adds a new config option, CONFIG_RELR. Enabling this option
instructs the linker to pack vmlinux's relative relocations in the RELR
format, and causes the kernel to apply the relocations at startup along
with the RELA relocations. RELA relocations still need to be applied
because the linker will emit RELA relative relocations if they are
unrepresentable in the RELR format (i.e. address not a multiple of 2).
Enabling CONFIG_RELR reduces the size of a defconfig kernel image
with CONFIG_RANDOMIZE_BASE by 3.5MB/16% uncompressed, or 550KB/5%
compressed (lz4).
Signed-off-by: Peter Collingbourne <pcc@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
The ESR.EC encoding of 0b011010 (0x1a) describes an exception generated
by an ERET, ERETAA or ERETAB instruction as a result of a nested
virtualisation trap to EL2.
Add an encoding for this EC and a string description so that we identify
it correctly if we take one unexpectedly.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
In commit b6b2735514
("tracing: Use str_has_prefix() instead of using fixed sizes")
the newly introduced str_has_prefix() was used
to replace error-prone strncmp(str, const, len).
Here fix codes with the same pattern.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
With commit b6664ba42f ("s390, kexec_file: drop arch_kexec_mem_walk()"),
we introduced the KEXEC_BUF_MEM_UNKNOWN macro. If kexec_buf.mem is set
to this value, kexec_locate_mem_hole() will try to allocate free memory.
While other arch(s) like s390 and x86_64 already use this macro to
initialize kexec_buf.mem with, arm64 uses an equivalent value of 0.
Replace it with KEXEC_BUF_MEM_UNKNOWN, to keep the convention of
initializing 'kxec_buf.mem' consistent across various archs.
Cc: takahiro.akashi@linaro.org
Cc: james.morse@arm.com
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
kprobes manipulates the interrupted PSTATE for single step, and
doesn't restore it. Thus, if we put a kprobe where the pstate.D
(debug) masked, the mask will be cleared after the kprobe hits.
Moreover, in the most complicated case, this can lead a kernel
crash with below message when a nested kprobe hits.
[ 152.118921] Unexpected kernel single-step exception at EL1
When the 1st kprobe hits, do_debug_exception() will be called.
At this point, debug exception (= pstate.D) must be masked (=1).
But if another kprobes hits before single-step of the first kprobe
(e.g. inside user pre_handler), it unmask the debug exception
(pstate.D = 0) and return.
Then, when the 1st kprobe setting up single-step, it saves current
DAIF, mask DAIF, enable single-step, and restore DAIF.
However, since "D" flag in DAIF is cleared by the 2nd kprobe, the
single-step exception happens soon after restoring DAIF.
This has been introduced by commit 7419333fa1 ("arm64: kprobe:
Always clear pstate.D in breakpoint exception handler")
To solve this issue, this stores all DAIF bits and restore it
after single stepping.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: 7419333fa1 ("arm64: kprobe: Always clear pstate.D in breakpoint exception handler")
Reviewed-by: James Morse <james.morse@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Remove rcu_read_lock()/rcu_read_unlock() from debug exception
handlers since we are sure those are not preemptible and
interrupts are off.
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Prohibit probing on return_address() and subroutines which
is called from return_address(), since the it is invoked from
trace_hardirqs_off() which is also kprobe blacklisted.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
If CTR_EL0.{CWG,ERG} are 0b0000 then they must be interpreted to have
their architecturally maximum values, which defeats the use of
FTR_HIGHER_SAFE when sanitising CPU ID registers on heterogeneous
machines.
Introduce FTR_HIGHER_OR_ZERO_SAFE so that these fields effectively
saturate at zero.
Fixes: 3c739b5710 ("arm64: Keep track of CPU feature registers")
Cc: <stable@vger.kernel.org> # 4.4.x-
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When fall-through warnings was enabled by default the following warnings
was starting to show up:
../arch/arm64/kernel/module.c: In function ‘apply_relocate_add’:
../arch/arm64/kernel/module.c:316:19: warning: this statement may fall
through [-Wimplicit-fallthrough=]
overflow_check = false;
~~~~~~~~~~~~~~~^~~~~~~
../arch/arm64/kernel/module.c:317:3: note: here
case R_AARCH64_MOVW_UABS_G0:
^~~~
../arch/arm64/kernel/module.c:322:19: warning: this statement may fall
through [-Wimplicit-fallthrough=]
overflow_check = false;
~~~~~~~~~~~~~~~^~~~~~~
../arch/arm64/kernel/module.c:323:3: note: here
case R_AARCH64_MOVW_UABS_G1:
^~~~
Rework so that the compiler doesn't warn about fall-through.
Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
When fall-through warnings was enabled by default the following warning
was starting to show up:
In file included from ../include/linux/kernel.h:15,
from ../include/linux/list.h:9,
from ../include/linux/kobject.h:19,
from ../include/linux/of.h:17,
from ../include/linux/irqdomain.h:35,
from ../include/linux/acpi.h:13,
from ../arch/arm64/kernel/smp.c:9:
../arch/arm64/kernel/smp.c: In function ‘__cpu_up’:
../include/linux/printk.h:302:2: warning: this statement may fall
through [-Wimplicit-fallthrough=]
printk(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../arch/arm64/kernel/smp.c:156:4: note: in expansion of macro ‘pr_crit’
pr_crit("CPU%u: may not have shut down cleanly\n", cpu);
^~~~~~~
../arch/arm64/kernel/smp.c:157:3: note: here
case CPU_STUCK_IN_KERNEL:
^~~~
Rework so that the compiler doesn't warn about fall-through.
Fixes: d93512ef0f0e ("Makefile: Globally enable fall-through warning")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Now that -Wimplicit-fallthrough is passed to GCC by default, the kernel
build has suddenly got noisy. Annotate the two fall-through cases in our
hw_breakpoint implementation, since they are both intentional.
Reported-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Will Deacon <will@kernel.org>
Commit d968d2b801 ("ARM: 7497/1: hw_breakpoint: allow single-byte
watchpoints on all addresses") changed the validation requirements for
hardware watchpoints on arch/arm/. Update our compat layer to implement
the same relaxation.
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
We've added two ESR exception classes for new ARM hardware extensions:
ESR_ELx_EC_PAC and ESR_ELx_EC_SVE, but failed to update the strings
used in tracing and other debug.
Let's update "kvm_arm_exception_class" for these two EC, which the
new EC will be visible to user-space via kvm_exit trace events
Also update to "esr_class_str" for ESR_ELx_EC_PAC, by which we can
get more readable debug info.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
ARMV8_EVENT_ATTR_RESOLVE became unused after commit <4b1a9e6934ec>
("arm64/perf: Filter common events based on PMCEIDn_EL0").
Remove it.
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Signed-off-by: Will Deacon <will@kernel.org>
Both RISC-V & ARM64 are using cpu-map device tree to describe
their cpu topology. It's better to move the relevant code to
a common place instead of duplicate code.
To: Will Deacon <will.deacon@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Atish Patra <atish.patra@wdc.com>
[Tested on QDF2400]
Tested-by: Jeffrey Hugo <jhugo@codeaurora.org>
[Tested on Juno and other embedded platforms.]
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>