mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-28 11:18:45 +07:00
f55f0501cb
With PTI enabled, the LDT must be mapped in the usermode tables somewhere. The LDT is per process, i.e. per mm.

An earlier approach mapped the LDT on context switch into a fixmap area, but that's a big overhead and exhausted the fixmap space when NR_CPUS got big.

Take advantage of the fact that there is an address space hole which provides a completely unused pgd. Use this pgd to manage per-mm LDT mappings.

This has a downside: the LDT isn't (currently) randomized, and an attack that can write the LDT is instant root due to call gates (thanks, AMD, for leaving call gates in AMD64 but designing them wrong so they're only useful for exploits). This can be mitigated by making the LDT read-only or randomizing the mapping, either of which is straightforward on top of this patch.

This will significantly slow down LDT users, but that shouldn't matter for important workloads -- the LDT is only used by DOSEMU(2), Wine, and very old libc implementations.

[ tglx: Cleaned it up. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
76 lines
3.7 KiB
Plaintext

Virtual memory map with 4 level page tables:

0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [47:63] sign extension
ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB)
... unused hole ...
fffffe0000000000 - fffffe7fffffffff (=39 bits) LDT remap for PTI
fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
ffffffffa0000000 - [fixmap start]   (~1526 MB) module mapping space (variable)
[fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
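
The commit message says the LDT remap lives in an otherwise unused PGD. As a sketch of that arithmetic, assuming standard x86-64 4-level paging (512 entries per table, each top-level PGD entry covering 2^39 bytes, i.e. 512 GB), the helpers below compute which PGD slot an address falls in. The names pgd_index_l4() and pgd_slots_l4() are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4-level paging geometry: bits 47:39 select one of 512 PGD
 * (PML4) entries, so each entry maps 2^39 bytes (512 GB). */
#define L4_PGDIR_SHIFT  39
#define L4_PTRS_PER_PGD 512

/* Hypothetical helper: PGD slot used to translate addr. */
static unsigned pgd_index_l4(uint64_t addr)
{
	return (unsigned)((addr >> L4_PGDIR_SHIFT) & (L4_PTRS_PER_PGD - 1));
}

/* Hypothetical helper: number of PGD slots touched by the inclusive
 * range [start, end]; assumes the range does not wrap past slot 511. */
static unsigned pgd_slots_l4(uint64_t start, uint64_t end)
{
	unsigned first = pgd_index_l4(start);
	unsigned last = pgd_index_l4(end);

	assert(start <= end && first <= last);
	return last - first + 1;
}
```

For the LDT remap row, both ends (fffffe0000000000 and fffffe7fffffffff) land in slot 508, so the whole range occupies exactly one PGD entry; by the same arithmetic the 64 TB direct mapping spans 128 slots.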

Virtual memory map with 5 level page tables:

0000000000000000 - 00ffffffffffffff (=56 bits) user space, different per mm
hole caused by [56:63] sign extension
ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor
ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory
ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI
ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB)
ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole
ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB)
... unused hole ...
ffdf000000000000 - fffffbffffffffff (=53 bits) kasan shadow memory (8PB)
... unused hole ...
fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping
ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks
... unused hole ...
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space
... unused hole ...
ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0
ffffffffa0000000 - [fixmap start]   (~1526 MB) module mapping space
[fixmap start]   - ffffffffff5fffff kernel-internal fixmap range
ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole
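
The "(=N bits)" annotations are roughly the base-2 logarithm of each range's size. A small sketch, assuming inclusive end addresses as in the tables and 5-level paging where each top-level (PML5) entry maps 2^48 bytes, can check the sizes and show how many top-level entries the 5-level LDT remap uses. range_size() and pgd_index_l5() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TB (1ULL << 40)

/* Assumed 5-level geometry: bits 56:48 select one of 512 PML5 entries,
 * so each top-level entry maps 2^48 bytes (256 TB). */
#define L5_PGDIR_SHIFT 48

/* Hypothetical helper: size in bytes of the inclusive range [start, end].
 * Would overflow only for the full 2^64-byte range, which no row uses. */
static uint64_t range_size(uint64_t start, uint64_t end)
{
	assert(start <= end);
	return end - start + 1;
}

/* Hypothetical helper: top-level slot that translates addr. */
static unsigned pgd_index_l5(uint64_t addr)
{
	return (unsigned)((addr >> L5_PGDIR_SHIFT) & 511);
}
```

With these helpers the 5-level LDT remap runs from slot 400 to slot 415, i.e. 16 top-level entries. The vmalloc/ioremap range is 12800 TB, which is not a power of two (2^54 bytes would be 16384 TB), so its "(=54 bits)" label is an approximation.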

The architecture defines a 64-bit virtual address, but implementations can
support fewer bits. Currently supported are 48- and 57-bit virtual addresses.
Bits 63 down to the most-significant implemented bit are sign-extended: with
48-bit addresses, bits 63:48 must be copies of bit 47; with 57-bit addresses,
bits 63:57 must be copies of bit 56. This causes a hole between user space
and kernel addresses if you interpret the addresses as unsigned.
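
Addresses that follow this rule are called canonical. A minimal sketch of the check, assuming va_bits is the number of implemented virtual address bits (48 or 57): keep the implemented bits, sign-extend from the top one, and compare with the original address. sign_extend_va() and is_canonical() are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: sign-extend addr from bit (va_bits - 1) into
 * bits 63..va_bits, using only well-defined unsigned arithmetic. */
static uint64_t sign_extend_va(uint64_t addr, unsigned va_bits)
{
	assert(va_bits >= 2 && va_bits <= 63);
	uint64_t low = addr & ((1ULL << va_bits) - 1);	/* implemented bits */
	uint64_t sign = 1ULL << (va_bits - 1);		/* top implemented bit */

	/* (x ^ s) - s wraps modulo 2^64 and fills the upper bits with
	 * copies of the sign bit. */
	return (low ^ sign) - sign;
}

/* Hypothetical helper: canonical means the upper bits already match. */
static bool is_canonical(uint64_t addr, unsigned va_bits)
{
	return sign_extend_va(addr, va_bits) == addr;
}
```

The 48-bit results line up with the 4-level table: 00007fffffffffff is the last user address, ffff800000000000 is the first kernel address, and everything in between is non-canonical.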

The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).
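
Because the direct mapping is one linear window, translating between a physical address and its direct-map virtual address is a constant offset, in the spirit of the kernel's __va() and __pa() macros. The sketch below assumes the non-randomized 4-level base ffff880000000000 from the table (CONFIG_RANDOMIZE_MEMORY moves it); phys_to_direct() and direct_to_phys() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 4-level direct mapping bounds from the table above, assuming
 * CONFIG_RANDOMIZE_MEMORY is disabled. */
#define DIRECT_MAP_BASE 0xffff880000000000ULL
#define DIRECT_MAP_END  0xffffc7ffffffffffULL	/* inclusive */

/* Hypothetical helper: physical address -> direct-map virtual address. */
static uint64_t phys_to_direct(uint64_t phys)
{
	assert(phys <= DIRECT_MAP_END - DIRECT_MAP_BASE);
	return DIRECT_MAP_BASE + phys;
}

/* Hypothetical helper: direct-map virtual address -> physical address. */
static uint64_t direct_to_phys(uint64_t virt)
{
	assert(virt >= DIRECT_MAP_BASE && virt <= DIRECT_MAP_END);
	return virt - DIRECT_MAP_BASE;
}
```

The offset arithmetic does not care whether a physical address is backed by RAM, which is why a linear window sized to the highest memory address can also cover holes.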

The vmalloc space is lazily synchronized into the PML4/PML5 pages of the
individual processes by the page fault handler, with init_top_pgt as the
reference.
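
A toy model of the lazy synchronization, assuming only the top-level table matters: a process's top-level table may lack a vmalloc entry, and a fault in the vmalloc range copies the missing entry from the reference table (standing in for init_top_pgt). struct toy_pgd, toy_pgd_index() and toy_vmalloc_fault() are hypothetical, and the sketch models only the top-level copy, not a real page-table walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTRS_PER_PGD 512
#define TOY_PGDIR_SHIFT  39	/* 4-level paging assumed */

/* Toy top-level table: 0 means "entry not present". */
struct toy_pgd {
	uint64_t entry[TOY_PTRS_PER_PGD];
};

static unsigned toy_pgd_index(uint64_t addr)
{
	return (unsigned)((addr >> TOY_PGDIR_SHIFT) & (TOY_PTRS_PER_PGD - 1));
}

/*
 * Called for a kernel fault at addr inside the vmalloc range. Returns
 * false if the reference table has no entry either (a genuinely bad
 * access). Otherwise copies a missing entry into the process table and
 * returns true if the process entry now matches the reference.
 */
static bool toy_vmalloc_fault(struct toy_pgd *proc, const struct toy_pgd *ref,
			      uint64_t addr)
{
	unsigned i = toy_pgd_index(addr);

	if (ref->entry[i] == 0)
		return false;
	if (proc->entry[i] == 0)
		proc->entry[i] = ref->entry[i];	/* lazy sync on first use */
	return proc->entry[i] == ref->entry[i];
}
```

Deferring the copy to the first fault avoids updating every process's top-level table each time a new top-level vmalloc entry appears in the reference table.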

We map EFI runtime services in the 'efi_pgd' PGD in a 64 GB virtual
memory window (this size is arbitrary; it can be raised later if needed).
The mappings are not part of any other kernel PGD and are only available
during EFI runtime calls.

The size of the module mapping space depends on the configuration: the
space ends where the fixmap range starts, and the size of the fixmap range
depends on which CONFIG options are enabled.

Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all
physical memory, the vmalloc/ioremap space and the virtual memory map are
randomized. Their order is preserved, but their bases are offset early at
boot time.
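
One way to randomize the bases while preserving order is to place the regions back to back and insert a random, aligned gap before each one, giving each remaining region an equal share of the leftover space. The sketch below is a simplified illustration under stated assumptions (1 GB granularity, equal shares of the remaining slack, random values supplied by the caller), not the kernel's implementation; struct region and randomize_regions() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGN_1G (1ULL << 30)	/* assumed randomization granularity */

struct region {
	uint64_t size;	/* bytes, assumed to be a multiple of ALIGN_1G */
	uint64_t base;	/* output: chosen start address */
};

/* Place n regions in order inside [start, end), with a random aligned
 * gap before each one. rnd[i] is the random value used for region i. */
static void randomize_regions(struct region *r, size_t n, const uint64_t *rnd,
			      uint64_t start, uint64_t end)
{
	uint64_t total = 0, vaddr = start, remain;

	assert(start <= end && start % ALIGN_1G == 0);
	for (size_t i = 0; i < n; i++)
		total += r[i].size;
	assert(total <= end - start);
	remain = end - start - total;	/* slack available for gaps */

	for (size_t i = 0; i < n; i++) {
		/* Equal share of the slack for each region still to place. */
		uint64_t share = remain / (n - i);
		uint64_t gap = (rnd[i] % (share + 1)) & ~(ALIGN_1G - 1);

		vaddr += gap;
		r[i].base = vaddr;	/* bases only ever increase */
		vaddr += r[i].size;
		remain -= gap;		/* gap <= share <= remain */
	}
}
```

Since every gap is taken from the remaining slack, the regions never overlap and never run past the end of the window; with all random values zero they end up packed at the start in their original order.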