linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-23 14:44:31 +07:00

Author	SHA1	Message	Date
Ard Biesheuvel	b7ede5a1f5	ARM: 8662/1: module: split core and init PLT sections Since commit `35fa91eed8` ("ARM: kernel: merge core and init PLTs"), the ARM module PLT code allocates all PLT entries in a single core section, since the overhead of having a separate init PLT section is not justified by the small number of PLT entries usually required for init code. However, the core and init module regions are allocated independently, and there is a corner case where the core region may be allocated from the VMALLOC region if the dedicated module region is exhausted, but the init region, being much smaller, can still be allocated from the module region. This puts the PLT entries out of reach of the relocated branch instructions, defeating the whole purpose of PLTs. So split the core and init PLT regions, and name the latter ".init.plt" so it gets allocated along with (and sufficiently close to) the .init sections that it serves. Also, given that init PLT entries may need to be emitted for branches that target the core module, modify the logic that disregards defined symbols to only disregard symbols that are defined in the same section. Fixes: `35fa91eed8` ("ARM: kernel: merge core and init PLTs") Cc: <stable@vger.kernel.org> # v4.9+ Reported-by: Angus Clark <angus@angusclark.org> Tested-by: Angus Clark <angus@angusclark.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>	2017-03-17 10:01:28 +00:00
Ard Biesheuvel	66e94ba3c8	ARM: kernel: avoid brute force search on PLT generation Given that we now sort the relocation sections in a way that guarantees that entries that can share a single PLT entry end up adjacently, there is no a longer a need to go over the entire list to look for an existing entry that matches our jump target. If such a match exists, it was the last one to be emitted, so we can simply check the preceding slot. Note that this will still work correctly in the [theoretical] presence of call/jump relocations against SHN_UNDEF symbols with non-zero addends, although not optimally. Since the relocations are presented in the same order that we checked them for duplicates, any duplicates that we failed to spot the first time around will be accounted for in the PLT allocation so there is guaranteed to be sufficient space for them when actually emitting the PLT. For instance, the following sequence of relocations: 000004d8 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 000004fc 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 0000050e 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 00000520 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 00000532 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 00000544 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 00000556 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 00000568 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 0000057a 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 0000058c 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 0000059e 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 000005b0 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 000005c2 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null 000005d4 00058b0a R_ARM_THM_CALL 00000000 warn_slowpath_null may result in several PLT entries to be allocated, and also emitted, if any of the entries in the middle refer to a Place that contains a non-zero addend (i.e., one for all the preceding zero-addend relocations, one for all the following zero-addend relocations, and one for the non-zero addend relocation itself) Tested-by: Jongsung Kim <neidhard.kim@lge.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2016-08-30 17:45:34 +01:00
Ard Biesheuvel	1031a7e674	ARM: kernel: sort relocation sections before allocating PLTs The PLT allocation routines try to establish an upper bound on the number of PLT entries that will be required at relocation time, and optimize this by disregarding duplicates (i.e., PLT entries that will end up pointing to the same function). This is currently a O(n^2) algorithm, but we can greatly simplify this by - sorting the relocation section so that relocations that can use the same PLT entry will be listed adjacently, - disregard jump/call relocations with addends; these are highly unusual, for relocations against SHN_UNDEF symbols, and so we can simply allocate a PLT entry for each one we encounter, without trying to optimize away duplicates. Tested-by: Jongsung Kim <neidhard.kim@lge.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2016-08-30 17:45:34 +01:00
Ard Biesheuvel	05123fef09	ARM: kernel: allocate PLT entries only for external symbols When CONFIG_ARM_MODULE_PLTS is enabled, jump and call instructions in modules no longer need to be within 16 MB (8 MB for Thumb2) of their targets. If they are further away, a PLT entry will be generated on the fly for each of them, which extends the range to the entire 32-bit address space. However, since these PLT entries will become the branch targets of the original jump and call instructions, the PLT itself needs to be in range, or we end up in the same situation we started in. Since the PLT is in a separate section, this essentially means that all jumps and calls inside the same module must be resolvable without PLT entries. The PLT allocation code executes before the module itself is loaded in its final location, and so it has to use a worst-case estimate for which jumps and calls will require an entry in the PLT at relocation time. As an optimization, this code deduplicates entries pointing to the same symbol, using a O(n^2) algorithm. However, it does not take the above into account, i.e., that PLT entries will only be needed for jump and call relocations against symbols that are not defined in the module. So disregard relocations against symbols that are defined in the module itself. As an additional minor optimization, ignore input sections that lack the SHF_EXECINSTR flag. Since jump and call relocations operate on executable instructions only, there is no need to look in sections that do not contain executable code. Tested-by: Jongsung Kim <neidhard.kim@lge.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2016-08-30 17:45:34 +01:00
Ard Biesheuvel	35fa91eed8	ARM: kernel: merge core and init PLTs The PLT code uses a separate .init.plt section to allocate PLT entries for jump and call instructions in __init code. However, even for fairly sizable modules like mac80211.ko, we only end up with a couple of PLT entries in the .init section, and so we can simplify the code significantly by emitting all PLT entries into the same section. Tested-by: Jongsung Kim <neidhard.kim@lge.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>	2016-08-30 17:45:34 +01:00
Rusty Russell	7523e4dc50	module: use a structure to encapsulate layout. Makes it easier to handle init vs core cleanly, though the change is fairly invasive across random architectures. It simplifies the rbtree code immediately, however, while keeping the core data together in the same cachline (now iff the rbtree code is enabled). Acked-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2015-12-04 22:46:25 +01:00
Arnd Bergmann	73c430bf9a	ARM: 8364/1: fix BE32 module loading The new veneer support for loadable modules on ARM uses the __opcode_to_mem_thumb32() function to count R_ARM_THM_CALL and R_ARM_THM_JUMP24 relocations. However, this function is not defined for big-endian kernels on ARMv5 or before, causing a compile-time error: arch/arm/kernel/module-plts.c: In function 'count_plts': arch/arm/kernel/module-plts.c:124:9: error: implicit declaration of function '__opcode_to_mem_thumb32' [-Werror=implicit-function-declaration] __opcode_to_mem_thumb32(0x07ff2fff))) ^ As we know that this part of the function is only needed for Thumb2 kernels, and that those can never happen with BE32, we can avoid the error by enclosing the code in an #ifdef. Fixes: `7d485f647c` ("ARM: 8220/1: allow modules outside of bl range") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2015-06-02 09:58:17 +01:00
Ard Biesheuvel	7d485f647c	ARM: 8220/1: allow modules outside of bl range Loading modules far away from the kernel in memory is problematic because the 'bl' instruction only has limited reach, and modules are not built with PLTs. Instead of using the -mlong-calls option (which affects all compiler emitted bl instructions, but not the ones in assembler), this patch allocates some additional space at module load time, and populates it with PLT like veneers when encountering relocations that are out of range. This should work with all relocations against symbols exported by the kernel, including those resulting from GCC generated implicit function calls for ftrace etc. The module memory size increases by about 5% on average, regardless of whether any PLT entries were actually needed. However, due to the page based rounding that occurs when allocating module memory, the average memory footprint increase is negligible. Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2015-05-08 10:42:34 +01:00

8 Commits