linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 10:55:21 +07:00

Author	SHA1	Message	Date
Steve Capper	e842dfb5a2	arm64: mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD Enabling 52-bit VAs on arm64 requires that the PGD table expands from 64 entries (for the 48-bit case) to 1024 entries. This quantity, PTRS_PER_PGD is used as follows to compute which PGD entry corresponds to a given virtual address, addr: pgd_index(addr) -> (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1) Userspace addresses are prefixed by 0's, so for a 48-bit userspace address, uva, the following is true: (uva >> PGDIR_SHIFT) & (1024 - 1) == (uva >> PGDIR_SHIFT) & (64 - 1) In other words, a 48-bit userspace address will have the same pgd_index when using PTRS_PER_PGD = 64 and 1024. Kernel addresses are prefixed by 1's so, given a 48-bit kernel address, kva, we have the following inequality: (kva >> PGDIR_SHIFT) & (1024 - 1) != (kva >> PGDIR_SHIFT) & (64 - 1) In other words a 48-bit kernel virtual address will have a different pgd_index when using PTRS_PER_PGD = 64 and 1024. If, however, we note that: kva = 0xFFFF << 48 + lower (where lower[63:48] == 0b) and, PGDIR_SHIFT = 42 (as we are dealing with 64KB PAGE_SIZE) We can consider: (kva >> PGDIR_SHIFT) & (1024 - 1) - (kva >> PGDIR_SHIFT) & (64 - 1) = (0xFFFF << 6) & 0x3FF - (0xFFFF << 6) & 0x3F // "lower" cancels out = 0x3C0 In other words, one can switch PTRS_PER_PGD to the 52-bit value globally provided that they increment ttbr1_el1 by 0x3C0 * 8 = 0x1E00 bytes when running with 48-bit kernel VAs (TCR_EL1.T1SZ = 16). For kernel configuration where 52-bit userspace VAs are possible, this patch offsets ttbr1_el1 and sets PTRS_PER_PGD corresponding to the 52-bit value. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steve Capper <steve.capper@arm.com> [will: added comment to TTBR1_BADDR_4852_OFFSET calculation] Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 18:42:17 +00:00
Steve Capper	e5d9915745	arm64: mm: Define arch_get_mmap_end, arch_get_mmap_base Now that we have DEFAULT_MAP_WINDOW defined, we can arch_get_mmap_end and arch_get_mmap_base helpers to allow for high addresses in mmap. Signed-off-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 18:42:17 +00:00
Steve Capper	363524d2b1	arm64: mm: Introduce DEFAULT_MAP_WINDOW We wish to introduce a 52-bit virtual address space for userspace but maintain compatibility with software that assumes the maximum VA space size is 48 bit. In order to achieve this, on 52-bit VA systems, we make mmap behave as if it were running on a 48-bit VA system (unless userspace explicitly requests a VA where addr[51:48] != 0). On a system running a 52-bit userspace we need TASK_SIZE to represent the 52-bit limit as it is used in various places to distinguish between kernelspace and userspace addresses. Thus we need a new limit for mmap, stack, ELF loader and EFI (which uses TTBR0) to represent the non-extended VA space. This patch introduces DEFAULT_MAP_WINDOW and DEFAULT_MAP_WINDOW_64 and switches the appropriate logic to use that instead of TASK_SIZE. Signed-off-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 18:42:17 +00:00
Steve Capper	f6795053da	mm: mmap: Allow for "high" userspace addresses This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of arch_get_unmapped_* functions. However, on arm64 we use the generic versions of these functions. Rather than duplicate the generic arch_get_unmapped_* implementations for arm64, this patch instead introduces two architectural helper macros and applies them to arch_get_unmapped_*: arch_get_mmap_end(addr) - get mmap upper limit depending on addr hint arch_get_mmap_base(addr, base) - get mmap_base depending on addr hint If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them. Signed-off-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 18:42:17 +00:00
Qian Cai	6e8830674e	arm64: kasan: Increase stack size for KASAN_EXTRA If the kernel is configured with KASAN_EXTRA, the stack size is increased significantly due to setting the GCC -fstack-reuse option to "none" [1]. As a result, it can trigger a stack overrun quite often with 32k stack size compiled using GCC 8. For example, this reproducer https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c can trigger a "corrupted stack end detected inside scheduler" very reliably with CONFIG_SCHED_STACK_END_CHECK enabled. There are other reports at: https://lore.kernel.org/lkml/1542144497.12945.29.camel@gmx.us/ https://lore.kernel.org/lkml/721E7B42-2D55-4866-9C1A-3E8D64F33F9C@gmx.us/ There are just too many functions that could have a large stack with KASAN_EXTRA due to large local variables that have been called over and over again without being able to reuse the stacks. Some noticiable ones are, size 7536 shrink_inactive_list 7440 shrink_page_list 6560 fscache_stats_show 3920 jbd2_journal_commit_transaction 3216 try_to_unmap_one 3072 migrate_page_move_mapping 3584 migrate_misplaced_transhuge_page 3920 ip_vs_lblcr_schedule 4304 lpfc_nvme_info_show 3888 lpfc_debugfs_nvmestat_data.constprop There are other 49 functions over 2k in size while compiling kernel with "-Wframe-larger-than=" on this machine. Hence, it is too much work to change Makefiles for each object to compile without -fsanitize-address-use-after-scope individually. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23 Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 17:53:12 +00:00
Will Deacon	33309ecda0	arm64: Fix minor issues with the dcache_by_line_op macro The dcache_by_line_op macro suffers from a couple of small problems: First, the GAS directives that are currently being used rely on assembler behavior that is not documented, and probably not guaranteed to produce the correct behavior going forward. As a result, we end up with some undefined symbols in cache.o: $ nm arch/arm64/mm/cache.o ... U civac ... U cvac U cvap U cvau This is due to the fact that the comparisons used to select the operation type in the dcache_by_line_op macro are comparing symbols not strings, and even though it seems that GAS is doing the right thing here (undefined symbols by the same name are equal to each other), it seems unwise to rely on this. Second, when patching in a DC CVAP instruction on CPUs that support it, the fallback path consists of a DC CVAU instruction which may be affected by CPU errata that require ARM64_WORKAROUND_CLEAN_CACHE. Solve these issues by unrolling the various maintenance routines and using the conditional directives that are documented as operating on strings. To avoid the complexity of nested alternatives, we move the DC CVAP patching to __clean_dcache_area_pop, falling back to a branch to __clean_dcache_area_poc if DCPOP is not supported by the CPU. Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Suggested-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 15:03:51 +00:00
Mark Rutland	2a9cee5b7a	arm64: remove arm64ksyms.c Now that arm64ksyms.c has been reduced to a stub, let's remove it entirely. New exports should be associated with their function definition. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 11:50:12 +00:00
Mark Rutland	dbd3196299	arm64: frace: use asm EXPORT_SYMBOL() For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the ftrace exports to the assembly files the functions are defined in. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 11:50:12 +00:00
Mark Rutland	ac0e8c72b0	arm64: string: use asm EXPORT_SYMBOL() For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the string routine exports to the assembly files the functions are defined in. Routines which should only be exported for !KASAN builds are exported using the EXPORT_SYMBOL_NOKASAN() helper. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 11:50:12 +00:00
Mark Rutland	56c08ec516	arm64: uaccess: use asm EXPORT_SYMBOL() For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the uaccess exports to the assembly files the functions are defined in. As we have to include <asm/assembler.h>, the existing includes are fixed to follow the usual ordering conventions. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 11:50:11 +00:00
Mark Rutland	50fdecb292	arm64: page: use asm EXPORT_SYMBOL() For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the copy_page and clear_page exports to the assembly files the functions are defined in. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 11:50:11 +00:00
Mark Rutland	23fe04c0c5	arm64: smccc: use asm EXPORT_SYMBOL() For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the SMCCC exports to the assembly file the functions are defined in. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 11:50:11 +00:00
Mark Rutland	abb77f3d96	arm64: tishift: use asm EXPORT_SYMBOL() For a while now it's been possible to use EXPORT_SYMBOL() in assembly files, which allows us to place exports immediately after assembly functions, as we do for C functions. As a step towards removing arm64ksyms.c, let's move the tishift exports to the assembly file the functions are defined in. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 11:50:11 +00:00
Mark Rutland	386b3c7bda	arm64: add EXPORT_SYMBOL_NOKASAN() So that we can export symbols directly from assembly files, let's make use of the generic <asm/export.h>. We have a few symbols that we'll want to conditionally export for !KASAN kernel builds, so we add a helper for that in <asm/assembler.h>. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 11:50:11 +00:00
Mark Rutland	03ef055fd3	arm64: move memstart_addr export inline Since we define memstart_addr in a C file, we can have the export immediately after the definition of the symbol, as we do elsewhere. As a step towards removing arm64ksyms.c, move the export of memstart_addr to init.c, where the symbol is defined. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 11:50:11 +00:00
Mark Rutland	2d7c89b02c	arm64: remove bitop exports Now that the arm64 bitops are inlines built atop of the regular atomics, we don't need to export anything. Remove the redundant exports. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-10 11:50:11 +00:00
Will Deacon	4230509978	arm64: cmpxchg: Use "K" instead of "L" for ll/sc immediate constraint The "L" AArch64 machine constraint, which we use for the "old" value in an LL/SC cmpxchg(), generates an immediate that is suitable for a 64-bit logical instruction. However, for cmpxchg() operations on types smaller than 64 bits, this constraint can result in an invalid instruction which is correctly rejected by GAS, such as EOR W1, W1, #0xffffffff. Whilst we could special-case the constraint based on the cmpxchg size, it's far easier to change the constraint to "K" and put up with using a register for large 64-bit immediates. For out-of-line LL/SC atomics, this is all moot anyway. Reported-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-07 17:28:13 +00:00
Will Deacon	959bf2fd03	arm64: percpu: Rewrite per-cpu ops to allow use of LSE atomics Our percpu code is a bit of an inconsistent mess: * It rolls its own xchg(), but reuses cmpxchg_local() * It uses various different flavours of preempt_{enable,disable}() * It returns values even for the non-returning RmW operations * It makes no use of LSE atomics outside of the cmpxchg() ops * There are individual macros for different sizes of access, but these are all funneled through a switch statement rather than dispatched directly to the relevant case This patch rewrites the per-cpu operations to address these shortcomings. Whilst the new code is a lot cleaner, the big advantage is that we can use the non-returning ST- atomic instructions when we have LSE. Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-07 17:28:06 +00:00
Will Deacon	b4f9209bfc	arm64: Avoid masking "old" for LSE cmpxchg() implementation The CAS instructions implicitly access only the relevant bits of the "old" argument, so there is no need for explicit masking via type-casting as there is in the LL/SC implementation. Move the casting into the LL/SC code and remove it altogether for the LSE implementation. Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-07 17:28:01 +00:00
Will Deacon	5ef3fe4cec	arm64: Avoid redundant type conversions in xchg() and cmpxchg() Our atomic instructions (either LSE atomics of LDXR/STXR sequences) natively support byte, half-word, word and double-word memory accesses so there is no need to mask the data register prior to being stored. Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-07 17:27:55 +00:00
Will Deacon	3962446922	arm64: preempt: Provide our own implementation of asm/preempt.h The asm-generic/preempt.h implementation doesn't make use of the PREEMPT_NEED_RESCHED flag, since this can interact badly with load/store architectures which rely on the preempt_count word being unchanged across an interrupt. However, since we're a 64-bit architecture and the preempt count is only 32 bits wide, we can simply pack it next to the resched flag and load the whole thing in one go, so that a dec-and-test operation doesn't need to load twice. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-07 12:35:53 +00:00
Will Deacon	08861d33d6	preempt: Move PREEMPT_NEED_RESCHED definition into arch code PREEMPT_NEED_RESCHED is never used directly, so move it into the arch code where it can potentially be implemented using either a different bit in the preempt count or as an entirely separate entity. Cc: Robert Love <rml@tech9.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-07 12:35:46 +00:00
Allen Pais	a21b0b78ea	arm64: hugetlb: Register hugepages during arch init Add hstate for each supported hugepage size using arch initcall. * no hugepage parameters Without hugepage parameters, only a default hugepage size is available for dynamic allocation. It's different, for example, from x86_64 and sparc64 where all supported hugepage sizes are available. * only default_hugepagesz= is specified and set not to HPAGE_SIZE In spite of the fact that default_hugepagesz= is set to a valid hugepage size, it's treated as unsupported and reverted to HPAGE_SIZE. Such behaviour is also different from x86_64 and sparc64. Acked-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Tom Saeger <tom.saeger@oracle.com> Signed-off-by: Dmitry Klochkov <dmitry.klochkov@oracle.com> Signed-off-by: Allen Pais <allen.pais@oracle.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 17:01:13 +00:00
Jackie Liu	cc9f8349cb	arm64: crypto: add NEON accelerated XOR implementation This is a NEON acceleration method that can improve performance by approximately 20%. I got the following data from the centos 7.5 on Huawei's HISI1616 chip: [ 93.837726] xor: measuring software checksum speed [ 93.874039] 8regs : 7123.200 MB/sec [ 93.914038] 32regs : 7180.300 MB/sec [ 93.954043] arm64_neon: 9856.000 MB/sec [ 93.954047] xor: using function: arm64_neon (9856.000 MB/sec) I believe this code can bring some optimization for all arm64 platform. thanks for Ard Biesheuvel's suggestions. Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 16:47:06 +00:00
Jackie Liu	21e28547f6	arm64/neon: add workaround for ambiguous C99 stdint.h types In a way similar to ARM commit `09096f6a0e` ("ARM: 7822/1: add workaround for ambiguous C99 stdint.h types"), this patch redefines the macros that are used in stdint.h so its definitions of uint64_t and int64_t are compatible with those of the kernel. This patch comes from: https://patchwork.kernel.org/patch/3540001/ Wrote by: Ard Biesheuvel <ard.biesheuvel@linaro.org> We mark this file as a private file and don't have to override asm/types.h Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 16:47:05 +00:00
Will Deacon	8cb3451b1f	arm64: entry: Remove confusing comment The comment about SYS_MEMBARRIER_SYNC_CORE relying on ERET being context-synchronizing is confusing and misplaced with kpti. Given that this is already documented under Documentation/ (see arch-support.txt for membarrier), remove the comment altogether. Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 16:47:05 +00:00
Will Deacon	679db70801	arm64: entry: Place an SB sequence following an ERET instruction Some CPUs can speculate past an ERET instruction and potentially perform speculative accesses to memory before processing the exception return. Since the register state is often controlled by a lower privilege level at the point of an ERET, this could potentially be used as part of a side-channel attack. This patch emits an SB sequence after each ERET so that speculation is held up on exception return. Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 16:47:05 +00:00
Will Deacon	bd4fb6d270	arm64: Add support for SB barrier and patch in over DSB; ISB sequences We currently use a DSB; ISB sequence to inhibit speculation in set_fs(). Whilst this works for current CPUs, future CPUs may implement a new SB barrier instruction which acts as an architected speculation barrier. On CPUs that support it, patch in an SB; NOP sequence over the DSB; ISB sequence and advertise the presence of the new instruction to userspace. Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 16:47:04 +00:00
Suzuki K Poulose	0b587c84e4	arm64: capabilities: Batch cpu_enable callbacks We use a stop_machine call for each available capability to enable it on all the CPUs available at boot time. Instead we could batch the cpu_enable callbacks to a single stop_machine() call to save us some time. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 15:12:26 +00:00
Suzuki K Poulose	606f8e7b27	arm64: capabilities: Use linear array for detection and verification Use the sorted list of capability entries for the detection and verification. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 15:12:26 +00:00
Suzuki K Poulose	f7bfc14a08	arm64: capabilities: Optimize this_cpu_has_cap Make use of the sorted capability list to access the capability entry in this_cpu_has_cap() to avoid iterating over the two tables. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 15:12:25 +00:00
Suzuki K Poulose	82a3a21b23	arm64: capabilities: Speed up capability lookup We maintain two separate tables of capabilities, errata and features, which decide the system capabilities. We iterate over each of these tables for various operations (e.g, detection, verification etc.). We do not have a way to map a system "capability" to its entry, (i.e, cap -> struct arm64_cpu_capabilities) which is needed for this_cpu_has_cap(). So we iterate over the table one by one to find the entry and then do the operation. Also, this prevents us from optimizing the way we "enable" the capabilities on the CPUs, where we now issue a stop_machine() for each available capability. One solution is to merge the two tables into a single table, sorted by the capability. But this is has the following disadvantages: - We loose the "classification" of an errata vs. feature - It is quite easy to make a mistake when adding an entry, unless we sort the table at runtime. So we maintain a list of pointers to the capability entry, sorted by the "cap number" in a separate array, initialized at boot time. The only restriction is that we can have one "entry" per capability. While at it, remove the duplicate declaration of arm64_errata table. Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 15:12:25 +00:00
Suzuki K Poulose	a3dcea2c85	arm64: capabilities: Merge duplicate entries for Qualcomm erratum 1003 Remove duplicate entries for Qualcomm erratum 1003. Since the entries are not purely based on generic MIDR checks, use the multi_cap_entry type to merge the entries. Cc: Christopher Covington <cov@codeaurora.org> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 11:47:44 +00:00
Suzuki K Poulose	f58cdf7e3c	arm64: capabilities: Merge duplicate Cavium erratum entries Merge duplicate entries for a single capability using the midr range list for Cavium errata 30115 and 27456. Cc: Andrew Pinski <apinski@cavium.com> Cc: David Daney <david.daney@cavium.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Vladimir Murzin <vladimir.murzin@arm.com> Tested-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 11:47:44 +00:00
Suzuki K Poulose	c9460dcb06	arm64: capabilities: Merge entries for ARM64_WORKAROUND_CLEAN_CACHE We have two entries for ARM64_WORKAROUND_CLEAN_CACHE capability : 1) ARM Errata 826319, 827319, 824069, 819472 on A53 r0p[012] 2) ARM Errata 819472 on A53 r0p[01] Both have the same work around. Merge these entries to avoid duplicate entries for a single capability. Add a new Kconfig entry to control the "capability" entry to make it easier to handle combinations of the CONFIGs. Cc: Will Deacon <will.deacon@arm.com> Cc: Andre Przywara <andre.przywara@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-06 11:47:44 +00:00
Ard Biesheuvel	3bbd3db864	arm64: relocatable: fix inconsistencies in linker script and options readelf complains about the section layout of vmlinux when building with CONFIG_RELOCATABLE=y (for KASLR): readelf: Warning: [21]: Link field (0) should index a symtab section. readelf: Warning: [21]: Info field (0) should index a relocatable section. Also, it seems that our use of '-pie -shared' is contradictory, and thus ambiguous. In general, the way KASLR is wired up at the moment is highly tailored to how ld.bfd happens to implement (and conflate) PIE executables and shared libraries, so given the current effort to support other toolchains, let's fix some of these issues as well. - Drop the -pie linker argument and just leave -shared. In ld.bfd, the differences between them are unclear (except for the ELF type of the produced image [0]) but lld chokes on seeing both at the same time. - Rename the .rela output section to .rela.dyn, as is customary for shared libraries and PIE executables, so that it is not misidentified by readelf as a static relocation section (producing the warnings above). - Pass the -z notext and -z norelro options to explicitly instruct the linker to permit text relocations, and to omit the RELRO program header (which requires a certain section layout that we don't adhere to in the kernel). These are the defaults for current versions of ld.bfd. - Discard .eh_frame and .gnu.hash sections to avoid them from being emitted between .head.text and .text, screwing up the section layout. These changes only affect the ELF image, and produce the same binary image. [0] `b9dce7f1ba` ("arm64: kernel: force ET_DYN ELF type for ...") Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Smith <peter.smith@linaro.org> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-12-04 12:48:25 +00:00
Ard Biesheuvel	efdb25efc7	arm64/lib: improve CRC32 performance for deep pipelines Improve the performance of the crc32() asm routines by getting rid of most of the branches and small sized loads on the common path. Instead, use a branchless code path involving overlapping 16 byte loads to process the first (length % 32) bytes, and process the remainder using a loop that processes 32 bytes at a time. Tested using the following test program: #include <stdlib.h> extern void crc32_le(unsigned short, char const, int); int main(void) { static const char buf[4096]; srand(20181126); for (int i = 0; i < 100 1000 * 1000; i++) crc32_le(0, buf, rand() % 1024); return 0; } On Cortex-A53 and Cortex-A57, the performance regresses but only very slightly. On Cortex-A72 however, the performance improves from $ time ./crc32 real 0m10.149s user 0m10.149s sys 0m0.000s to $ time ./crc32 real 0m7.915s user 0m7.915s sys 0m0.000s Cc: Rui Sun <sunrui26@huawei.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-30 13:58:04 +00:00
Mark Rutland	7dc48bf96a	arm64: ftrace: always pass instrumented pc in x0 The core ftrace hooks take the instrumented PC in x0, but for some reason arm64's prepare_ftrace_return() takes this in x1. For consistency, let's flip the argument order and always pass the instrumented PC in x0. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-30 13:29:05 +00:00
Mark Rutland	49e258e05e	arm64: ftrace: remove return_regs macros The save_return_regs and restore_return_regs macros are only used by return_to_handler, and having them defined out-of-line only serves to obscure the logic. Before we complicate, let's clean this up and fold the logic directly into return_to_handler, saving a few lines of macro boilerplate in the process. At the same time, a missing trailing space is added to the comments, fixing a code style violation. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-30 13:29:05 +00:00
Mark Rutland	6e803e2e6e	arm64: ftrace: don't adjust the LR value The core ftrace code requires that when it is handed the PC of an instrumented function, this PC is the address of the instrumented instruction. This is necessary so that the core ftrace code can identify the specific instrumentation site. Since the instrumented function will be a BL, the address of the instrumented function is LR - 4 at entry to the ftrace code. This fixup is applied in the mcount_get_pc and mcount_get_pc0 helpers, which acquire the PC of the instrumented function. The mcount_get_lr helper is used to acquire the LR of the instrumented function, whose value does not require this adjustment, and cannot be adjusted to anything meaningful. No adjustment of this value is made on other architectures, including arm. However, arm64 adjusts this value by 4. This patch brings arm64 in line with other architectures and removes the adjustment of the LR value. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-30 13:29:05 +00:00
Mark Rutland	5c176aff5b	arm64: ftrace: enable graph FP test The core frace code has an optional sanity check on the frame pointer passed by ftrace_graph_caller and return_to_handler. This is cheap, useful, and enabled unconditionally on x86, sparc, and riscv. Let's do the same on arm64, so that we can catch any problems early. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-30 13:29:04 +00:00
Mark Rutland	e4fe196642	arm64: ftrace: use GLOBAL() The global exports of ftrace_call and ftrace_graph_call are somewhat painful to read. Let's use the generic GLOBAL() macro to ameliorate matters. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-30 13:29:04 +00:00
Mark Rutland	ad697a1aec	linkage: add generic GLOBAL() macro Declaring a global symbol in assembly is tedious, error-prone, and painful to read. While ENTRY() exists, this is supposed to be used for function entry points, and this affects alignment in a potentially undesireable manner. Instead, let's add a generic GLOBAL() macro for this, as x86 added locally in commit: `95695547a7` ("x86: asm linkage - introduce GLOBAL macro") ... thus allowing us to use this more freely in the kernel. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Torsten Duwe <duwe@suse.de> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-30 13:29:04 +00:00
Ard Biesheuvel	dd6846d774	arm64: drop linker script hack to hide __efistub_ symbols Commit `1212f7a16a` ("scripts/kallsyms: filter arm64's __efistub_ symbols") updated the kallsyms code to filter out symbols with the __efistub_ prefix explicitly, so we no longer require the hack in our linker script to emit them as absolute symbols. Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-30 12:49:51 +00:00
Will Deacon	1b57ec8c75	arm64: io: Ensure value passed to __iormb() is held in a 64-bit register As of commit `6460d32014` ("arm64: io: Ensure calls to delay routines are ordered against prior readX()"), MMIO reads smaller than 64 bits fail to compile under clang because we end up mixing 32-bit and 64-bit register operands for the same data processing instruction: ./include/asm-generic/io.h:695:9: warning: value size does not match register size specified by the constraint and modifier [-Wasm-operand-widths] return readb(addr); ^ ./arch/arm64/include/asm/io.h:147:58: note: expanded from macro 'readb' ^ ./include/asm-generic/io.h:695:9: note: use constraint modifier "w" ./arch/arm64/include/asm/io.h:147:50: note: expanded from macro 'readb' ^ ./arch/arm64/include/asm/io.h:118:24: note: expanded from macro '__iormb' asm volatile("eor %0, %1, %1\n" \ ^ Fix the build by casting the macro argument to 'unsigned long' when used as an input to the inline asm. Reported-by: Nick Desaulniers <nick.desaulniers@gmail.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-29 16:36:18 +00:00
Will Deacon	3d65b6bbc0	arm64: tlbi: Set MAX_TLBI_OPS to PTRS_PER_PTE In order to reduce the possibility of soft lock-ups, we bound the maximum number of TLBI operations performed by a single call to flush_tlb_range() to an arbitrary constant of 1024. Whilst this does the job of avoiding lock-ups, we can actually be a bit smarter by defining this as PTRS_PER_PTE. Due to the structure of our page tables, using PTRS_PER_PTE means that an outer loop calling flush_tlb_range() for entire table entries will end up performing just a single TLBI operation for each entry. As an example, mremap()ing a 1GB range mapped using 4k pages now requires only 512 TLBI operations when moving the page tables as opposed to 262144 operations (512*512) when using the current threshold of 1024. Cc: Joel Fernandes <joel@joelfernandes.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-27 19:01:21 +00:00
Ard Biesheuvel	bdb85cd1d2	arm64/module: switch to ADRP/ADD sequences for PLT entries Now that we have switched to the small code model entirely, and reduced the extended KASLR range to 4 GB, we can be sure that the targets of relative branches that are out of range are in range for a ADRP/ADD pair, which is one instruction shorter than our current MOVN/MOVK/MOVK sequence, and is more idiomatic and so it is more likely to be implemented efficiently by micro-architectures. So switch over the ordinary PLT code and the special handling of the Cortex-A53 ADRP errata, as well as the ftrace trampline handling. Reviewed-by: Torsten Duwe <duwe@lst.de> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [will: Added a couple of comments in the plt equality check] Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-27 19:00:45 +00:00
Ard Biesheuvel	7aaf7b2fd2	arm64/insn: add support for emitting ADR/ADRP instructions Add support for emitting ADR and ADRP instructions so we can switch over our PLT generation code in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-27 18:47:33 +00:00
James Morse	d8797b1257	arm64: Use a raw spinlock in __install_bp_hardening_cb() __install_bp_hardening_cb() is called via stop_machine() as part of the cpu_enable callback. To force each CPU to take its turn when allocating slots, they take a spinlock. With the RT patches applied, the spinlock becomes a mutex, and we get warnings about sleeping while in stop_machine(): \| [ 0.319176] CPU features: detected: RAS Extension Support \| [ 0.319950] BUG: scheduling while atomic: migration/3/36/0x00000002 \| [ 0.319955] Modules linked in: \| [ 0.319958] Preemption disabled at: \| [ 0.319969] [<ffff000008181ae4>] cpu_stopper_thread+0x7c/0x108 \| [ 0.319973] CPU: 3 PID: 36 Comm: migration/3 Not tainted 4.19.1-rt3-00250-g330fc2c2a880 #2 \| [ 0.319975] Hardware name: linux,dummy-virt (DT) \| [ 0.319976] Call trace: \| [ 0.319981] dump_backtrace+0x0/0x148 \| [ 0.319983] show_stack+0x14/0x20 \| [ 0.319987] dump_stack+0x80/0xa4 \| [ 0.319989] __schedule_bug+0x94/0xb0 \| [ 0.319991] __schedule+0x510/0x560 \| [ 0.319992] schedule+0x38/0xe8 \| [ 0.319994] rt_spin_lock_slowlock_locked+0xf0/0x278 \| [ 0.319996] rt_spin_lock_slowlock+0x5c/0x90 \| [ 0.319998] rt_spin_lock+0x54/0x58 \| [ 0.320000] enable_smccc_arch_workaround_1+0xdc/0x260 \| [ 0.320001] __enable_cpu_capability+0x10/0x20 \| [ 0.320003] multi_cpu_stop+0x84/0x108 \| [ 0.320004] cpu_stopper_thread+0x84/0x108 \| [ 0.320008] smpboot_thread_fn+0x1e8/0x2b0 \| [ 0.320009] kthread+0x124/0x128 \| [ 0.320010] ret_from_fork+0x10/0x18 Switch this to a raw spinlock, as we know this is only called with IRQs masked. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-27 18:01:34 +00:00
Jeremy Linton	9eb1c92b47	arm64: acpi: Prepare for longer MADTs The BAD_MADT_GICC_ENTRY check is a little too strict because it rejects MADT entries that don't match the currently known lengths. We should remove this restriction to avoid problems if the table length changes. Future code which might depend on additional fields should be written to validate those fields before using them, rather than trying to globally check known MADT version lengths. Link: https://lkml.kernel.org/r/20181012192937.3819951-1-jeremy.linton@arm.com Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> [lorenzo.pieralisi@arm.com: added MADT macro comments] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Al Stone <ahs3@redhat.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-11-27 18:00:14 +00:00

1 2 3 4 5 ...

797192 Commits