linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-18 09:58:27 +07:00

Author	SHA1	Message	Date
Christophe Leroy	fca5c1e9eb	powerpc/mm: move slice_mask_for_size() into mmu.h Move slice_mask_for_size() into subarch mmu.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Retain the BUG_ON()s, rather than converting to VM_BUG_ON()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-03 01:20:22 +10:00
Christophe Leroy	6f60cc98df	powerpc/mm: hand a context_t over to slice_mask_for_size() instead of mm_struct slice_mask_for_size() only uses mm->context, so hand directly a pointer to the context. This will help moving the function in subarch mmu.h in the next patch by avoiding having to include the definition of struct mm_struct Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-03 01:20:22 +10:00
Christophe Leroy	02f89aed6b	powerpc/mm: no slice for nohash/64 Only nohash/32 and book3s/64 support mm slices. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-03 01:20:22 +10:00
Christophe Leroy	5ba666d56c	powerpc/mm: fix erroneous duplicate slb_addr_limit init Commit `67fda38f0d` ("powerpc/mm: Move slb_addr_linit to early_init_mmu") moved slb_addr_limit init out of setup_arch(). Commit `701101865f` ("powerpc/mm: Reduce memory usage for mm_context_t for radix") brought it back into setup_arch() by error. This patch reverts that erroneous regress. Fixes: `701101865f` ("powerpc/mm: Reduce memory usage for mm_context_t for radix") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-03 01:20:22 +10:00
Christophe Leroy	27e23b5f5f	powerpc/mm: Move nohash specifics in subdirectory mm/nohash Many files in arch/powerpc/mm are only for nohash. This patch creates a subdirectory for them. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Shorten new filenames] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-03 01:20:22 +10:00
Christophe Leroy	17312f258c	powerpc/mm: Move book3s32 specifics in subdirectory mm/book3s64 Several files in arch/powerpc/mm are only for book3S32. This patch creates a subdirectory for them. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Shorten new filenames] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-03 01:20:22 +10:00
Christophe Leroy	47d99948ee	powerpc/mm: Move book3s64 specifics in subdirectory mm/book3s64 Many files in arch/powerpc/mm are only for book3S64. This patch creates a subdirectory for them. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Update the selftest sym links, shorten new filenames, cleanup some whitespace and formatting in the new files.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-03 01:18:38 +10:00
Christophe Leroy	9d9f2cccde	powerpc/mm: change #include "mmu_decl.h" to <mm/mmu_decl.h> This patch make inclusion of mmu_decl.h independant of the location of the file including it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-02 21:18:58 +10:00
Christophe Leroy	71faf8145c	powerpc/nohash64: clean pgtable.h TRANSPARENT_HUGEPAGE is only supported by book3s VMEMMAP_REGION_ID is never used Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-02 21:18:57 +10:00
Christophe Leroy	a1ac2a9c4f	powerpc/book3e: drop BUG_ON() in map_kernel_page() early_alloc_pgtable() never returns NULL as it panics on failure. This patch drops the three BUG_ON() which check the non nullity of early_alloc_pgtable() returned value. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-02 21:18:57 +10:00
Breno Leitao	e620d45065	powerpc/tm: Avoid machine crash on rt_sigreturn() There is a kernel crash that happens if rt_sigreturn() is called inside a transactional block. This crash happens if the kernel hits an in-kernel page fault when accessing userspace memory, usually through copy_ckvsx_to_user(). A major page fault calls might_sleep() function, which can cause a task reschedule. A task reschedule (switch_to()) reclaim and recheckpoint the TM states, but, in the signal return path, the checkpointed memory was already reclaimed, thus the exception stack has MSR that points to MSR[TS]=0. When the code returns from might_sleep() and a task reschedule happened, then this task is returned with the memory recheckpointed, and CPU MSR[TS] = suspended. This means that there is a side effect at might_sleep() if it is called with CPU MSR[TS] = 0 and the task has regs->msr[TS] != 0. This side effect can cause a TM bad thing, since at the exception entrance, the stack saves MSR[TS]=0, and this is what will be used at RFID, but, the processor has MSR[TS] = Suspended, and this transition will be invalid and a TM Bad thing will be raised, causing the following crash: Unexpected TM Bad Thing exception at c00000000000e9ec (msr 0x8000000302a03031) tm_scratch=800000010280b033 cpu 0xc: Vector: 700 (Program Check) at [c00000003ff1fd70] pc: c00000000000e9ec: fast_exception_return+0x100/0x1bc lr: c000000000032948: handle_rt_signal64+0xb8/0xaf0 sp: c0000004263ebc40 msr: 8000000302a03031 current = 0xc000000415050300 paca = 0xc00000003ffc4080 irqmask: 0x03 irq_happened: 0x01 pid = 25006, comm = sigfuz Linux version 5.0.0-rc1-00001-g3bd6e94bec12 (breno@debian) (gcc version 8.2.0 (Debian 8.2.0-3)) #899 SMP Mon Jan 7 11:30:07 EST 2019 WARNING: exception is not recoverable, can't continue enter ? for help [c0000004263ebc40] c000000000032948 handle_rt_signal64+0xb8/0xaf0 (unreliable) [c0000004263ebd30] c000000000022780 do_notify_resume+0x2f0/0x430 [c0000004263ebe20] c00000000000e844 ret_from_except_lite+0x70/0x74 --- Exception: c00 (System Call) at 00007fffbaac400c SP (7fffeca90f40) is in userspace The solution for this problem is running the sigreturn code with regs->msr[TS] disabled, thus, avoiding hitting the side effect above. This does not seem to be a problem since regs->msr will be replaced by the ucontext value, so, it is being flushed already. In this case, it is flushed earlier. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-01 23:03:29 +10:00
Aneesh Kumar K.V	2c474c0350	powerpc/mm/radix: Fix kernel crash when running subpage protect test This patch fixes the below crash by making sure we touch the subpage protection related structures only if we know they are allocated on the platform. With radix translation we don't allocate hash context at all and trying to access subpage_prot_table results in: Faulting instruction address: 0xc00000000008bdb4 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV .... NIP [c00000000008bdb4] sys_subpage_prot+0x74/0x590 LR [c00000000000b688] system_call+0x5c/0x70 Call Trace: [c00020002c6b7d30] [c00020002c6b7d90] 0xc00020002c6b7d90 (unreliable) [c00020002c6b7e20] [c00000000000b688] system_call+0x5c/0x70 Instruction dump: fb61ffd8 fb81ffe0 fba1ffe8 fbc1fff0 fbe1fff8 f821ff11 e92d1178 f9210068 39200000 e92d0968 ebe90630 e93f03e8 <eb891038> 60000000 3860fffe e9410068 We also move the subpage_prot_table with mmp_sem held to avoid race between two parallel subpage_prot syscall. Fixes: `701101865f` ("powerpc/mm: Reduce memory usage for mm_context_t for radix") Reported-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Tested-by: Sachin Sant <sachinp@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-01 23:01:33 +10:00
Mahesh Salgaonkar	50dbabe06a	powerpc/powernv/mce: Print additional information about MCE error. Print more information about MCE error whether it is an hardware or software error. Some of the MCE errors can be easily categorized as hardware or software errors e.g. UEs are due to hardware error, where as error triggered due to invalid usage of tlbie is a pure software bug. But not all the MCE errors can be easily categorize into either software or hardware. There are errors like multihit errors which are usually result of a software bug, but in some rare cases a hardware failure can cause a multihit error. In past, we have seen case where after replacing faulty chip, multihit errors stopped occurring. Same with parity errors, which are usually due to faulty hardware but there are chances where multihit can also cause an parity error. Such errors are difficult to determine what really caused it. Hence this patch classifies MCE errors into following four categorize: 1. Hardware error: UE and Link timeout failure errors. 2. Probable hardware error (some chance of software cause) SLB/ERAT/TLB Parity errors. 3. Software error Invalid tlbie form. 4. Probable software error (some chance of hardware cause) SLB/ERAT/TLB Multihit errors. Sample output: MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered] MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60] MCE: CPU80: Probable Software error (some chance of hardware cause) Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-01 22:23:20 +10:00
Mahesh Salgaonkar	cda6618d06	powerpc/powernv/mce: Print correct severity for MCE error. Currently all machine check errors are printed as severe errors which isn't correct. Print soft errors as warning instead of severe errors. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-01 22:22:51 +10:00
Mahesh Salgaonkar	d6e8a15085	powerpc/powernv/mce: Reduce MCE console logs to lesser lines. Also add cpu number while displaying MCE log. This will help cleaner logs when MCE hits on multiple cpus simultaneously. Before the changes the MCE output was: Severe Machine check interrupt [Recovered] NIP [d00000000ba80280]: insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb] Initiator: CPU Error type: SLB [Multihit] Effective address: d00000000ba80280 After this patch series changes the MCE output will be: MCE: CPU80: machine check (Warning) Host SLB Multihit [Recovered] MCE: CPU80: NIP: [d00000000b550280] insert_slb_entry.constprop.0+0x278/0x2c0 [mcetest_slb] MCE: CPU80: Probable software error (some chance of hardware cause) UE in host application: MCE: CPU48: machine check (Severe) Host UE Load/Store DAR: 00007fffc6079a80 paddr: 0000000f8e260000 [Not recovered] MCE: CPU48: PID: 4584 Comm: find NIP: [0000000010023368] MCE: CPU48: Hardware error and for MCE in Guest: MCE: CPU80: machine check (Warning) Guest SLB Multihit DAR: 000001001b6e0320 [Recovered] MCE: CPU80: PID: 24765 Comm: qemu-system-ppc Guest NIP: [00007fffa309dc60] MCE: CPU80: Probable software error (some chance of hardware cause) Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-01 22:22:24 +10:00
Anton Blanchard	5b2a152962	powerpc: Add doorbell tracepoints When analysing sources of OS jitter, I noticed that doorbells cannot be traced. Signed-off-by: Anton Blanchard <anton@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-01 16:45:05 +10:00
YueHaibing	32eeb5614d	ocxl: remove set but not used variables 'tid' and 'lpid' Fixes gcc '-Wunused-but-set-variable' warning: drivers/misc/ocxl/link.c: In function 'xsl_fault_handler': drivers/misc/ocxl/link.c:187:17: warning: variable 'tid' set but not used drivers/misc/ocxl/link.c:187:6: warning: variable 'lpid' set but not used They are never used and can be removed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-01 12:09:27 +10:00
Mathieu Malaterre	a5ae043de7	powerpc/64s: Remove 'dummy_copy_buffer' In commit `2bf1071a8d` ("powerpc/64s: Remove POWER9 DD1 support") the function __switch_to remove usage for 'dummy_copy_buffer'. Since it is not used anywhere else, remove it completely. This remove the following warning: arch/powerpc/kernel/process.c:1156:17: error: 'dummy_copy_buffer' defined but not used Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-01 12:08:44 +10:00
Tobin C. Harding	7e8039795a	powerpc/cacheinfo: Fix kobject memleak Currently error return from kobject_init_and_add() is not followed by a call to kobject_put(). This means there is a memory leak. Add call to kobject_put() in error path of kobject_init_and_add(). Signed-off-by: Tobin C. Harding <tobin@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-01 10:52:29 +10:00
Nick Desaulniers	33dda8c327	powerpc/vdso: Drop unnecessary cc-ldoption Towards the goal of removing cc-ldoption, it seems that --hash-style= was added to binutils 2.17.50.0.2 in 2006. The minimal required version of binutils for the kernel according to Documentation/process/changes.rst is 2.20. Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-05-01 10:49:58 +10:00
Alexey Kardashevskiy	b511cdd1c1	powerpc/powernv/ioda: Handle failures correctly in pnv_pci_ioda_iommu_bypass_supported() When the return value type was changed from int to bool, few places were left unchanged, this fixes them. We did not hit these failures as the first one is not happening at all and the second one is little more likely to happen if the user switches a 33..58bit DMA capable device between the VFIO and vendor drivers and there are not so many of these. Fixes: `2d6ad41b2c` ("powerpc/powernv: use the generic iommu bypass code") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-30 23:23:24 +10:00
Michael Ellerman	bdc7c970bc	Merge branch 'topic/ppc-kvm' into next Merge our topic branch shared with KVM. In particular this includes the rewrite of the idle code into C.	2019-04-30 22:52:03 +10:00
Michael Ellerman	e9cef0189c	powerpc/powernv/idle: Restore AMR/UAMOR/AMOR/IAMR after idle This is an implementation of commits `53a712bae5` ("powerpc/powernv/idle: Restore AMR/UAMOR/AMOR after idle") and `a3f3072db6` ("powerpc/powernv/idle: Restore IAMR after idle") using the new C-based idle code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Extract from Nick's patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-30 22:38:56 +10:00
Nicholas Piggin	10d91611f4	powerpc/64s: Reimplement book3s idle code in C Reimplement Book3S idle code in C, moving POWER7/8/9 implementation speific HV idle code to the powernv platform code. Book3S assembly stubs are kept in common code and used only to save the stack frame and non-volatile GPRs before executing architected idle instructions, and restoring the stack and reloading GPRs then returning to C after waking from idle. The complex logic dealing with threads and subcores, locking, SPRs, HMIs, timebase resync, etc., is all done in C which makes it more maintainable. This is not a strict translation to C code, there are some significant differences: - Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs, but saves and restores them itself. - The optimisation where EC=ESL=0 idle modes did not have to save GPRs or change MSR is restored, because it's now simple to do. ESL=1 sleeps that do not lose GPRs can use this optimization too. - KVM secondary entry and cede is now more of a call/return style rather than branchy. nap_state_lost is not required because KVM always returns via NVGPR restoring path. - KVM secondary wakeup from offline sequence is moved entirely into the offline wakeup, which avoids a hwsync in the normal idle wakeup path. Performance measured with context switch ping-pong on different threads or cores, is possibly improved a small amount, 1-3% depending on stop state and core vs thread test for shallow states. Deep states it's in the noise compared with other latencies. KVM improvements: - Idle sleepers now always return to caller rather than branch out to KVM first. - This allows optimisations like very fast return to caller when no state has been lost. - KVM no longer requires nap_state_lost because it controls NVGPR save/restore itself on the way in and out. - The heavy idle wakeup KVM request check can be moved out of the normal host idle code and into the not-performance-critical offline code. - KVM nap code now returns from where it is called, which makes the flow a bit easier to follow. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Squash the KVM changes in] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-30 22:37:48 +10:00
Nicholas Piggin	7ae3f6e130	powerpc/watchdog: Use hrtimers for per-CPU heartbeat Using a jiffies timer creates a dependency on the tick_do_timer_cpu incrementing jiffies. If that CPU has locked up and jiffies is not incrementing, the watchdog heartbeat timer for all CPUs stops and creates false positives and confusing warnings on local CPUs, and also causes the SMP detector to stop, so the root cause is never detected. Fix this by using hrtimer based timers for the watchdog heartbeat, like the generic kernel hardlockup detector. Cc: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reported-by: Ravikumar Bangoria <ravi.bangoria@in.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reported-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-30 11:31:02 +10:00
Nathan Fontenot	b2d3b5ee66	powerpc/pseries: Track LMB nid instead of using device tree When removing memory we need to remove the memory from the node it was added to instead of looking up the node it should be in in the device tree. During testing we have seen scenarios where the affinity for a LMB changes due to a partition migration or PRRN event. In these cases the node the LMB exists in may not match the node the device tree indicates it belongs in. This can lead to a system crash when trying to DLPAR remove the LMB after a migration or PRRN event. The current code looks up the node in the device tree to remove the LMB from, the crash occurs when we try to offline this node and it does not have any data, i.e. node_data[nid] == NULL. 36:mon> e cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810] pc: c00000000036d08c: try_offline_node+0x2c/0x1b0 lr: c0000000003a14ec: remove_memory+0xbc/0x110 sp: c0000001828b7a90 msr: 800000000280b033 dar: 9a28 dsisr: 40000000 current = 0xc0000006329c4c80 paca = 0xc000000007a55200 softe: 0 irq_happened: 0x01 pid = 76926, comm = kworker/u320:3 36:mon> t [link register ] c0000000003a14ec remove_memory+0xbc/0x110 [c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable) [c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110 [c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160 [c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10 [c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160 [c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0 [c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0 [c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620 [c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0 [c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80 To resolve this we need to track the node a LMB belongs to when it is added to the system so we can remove it from that node instead of the node that the device tree indicates it should belong to. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-29 22:27:16 +10:00
Colin Ian King	f341d89790	powerpc/mm: fix spelling mistake "Outisde" -> "Outside" There are several identical spelling mistakes in warning messages, fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-28 16:28:23 +10:00
Aneesh Kumar K.V	26ad26718d	powerpc/mm: Fix section mismatch warning This patch fix the below section mismatch warnings. WARNING: vmlinux.o(.text+0x2d1f44): Section mismatch in reference from the function devm_memremap_pages_release() to the function .meminit.text:arch_remove_memory() WARNING: vmlinux.o(.text+0x2d265c): Section mismatch in reference from the function devm_memremap_pages() to the function .meminit.text:arch_add_memory() Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:40 +10:00
Aneesh Kumar K.V	5f53d28608	powerpc/mm/hash: Rename KERNEL_REGION_ID to LINEAR_MAP_REGION_ID The region actually point to linear map. Rename the #define to clarify thati. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:40 +10:00
Aneesh Kumar K.V	a092a03fa9	powerpc/mm: Print kernel map details to dmesg This helps in debugging. We can look at the dmesg to find out different kernel mapping details. On 4K config this shows kernel vmalloc start = 0xc000100000000000 kernel IO start = 0xc000200000000000 kernel vmemmap start = 0xc000300000000000 On 64K config: kernel vmalloc start = 0xc008000000000000 kernel IO start = 0xc00a000000000000 kernel vmemmap start = 0xc00c000000000000 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:40 +10:00
Aneesh Kumar K.V	1c946c1b7f	powerpc/mm/hash: Simplify the region id calculation. This reduces multiple comparisons in get_region_id to a bit shift operation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:40 +10:00
Aneesh Kumar K.V	53ed7a5947	powerpc/mm: Drop the unnecessary region check All the regions are now mapped with top nibble 0xc. Hence the region id check is not needed for virt_addr_valid() Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:40 +10:00
Aneesh Kumar K.V	e09093927e	powerpc/mm: Validate address values against different region limits This adds an explicit check in various functions. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:39 +10:00
Aneesh Kumar K.V	0034d395f8	powerpc/mm/hash64: Map all the kernel regions in the same 0xc range This patch maps vmalloc, IO and vmemap regions in the 0xc address range instead of the current 0xd and 0xf range. This brings the mapping closer to radix translation mode. With hash 64K page size each of this region is 512TB whereas with 4K config we are limited by the max page table range of 64TB and hence there regions are of 16TB size. The kernel mapping is now: On 4K hash kernel_region_map_size = 16TB kernel vmalloc start = 0xc000100000000000 kernel IO start = 0xc000200000000000 kernel vmemmap start = 0xc000300000000000 64K hash, 64K radix and 4k radix: kernel_region_map_size = 512TB kernel vmalloc start = 0xc008000000000000 kernel IO start = 0xc00a000000000000 kernel vmemmap start = 0xc00c000000000000 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:39 +10:00
Aneesh Kumar K.V	a35a3c6f60	powerpc/mm/hash64: Add a variable to track the end of IO mapping This makes it easy to update the region mapping in the later patch Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:39 +10:00
Aneesh Kumar K.V	ef629cc5bf	powerc/mm/hash: Reduce hash_mm_context size Allocate subpage protect related variables only if we use the feature. This helps in reducing the hash related mm context struct by around 4K Before the patch sizeof(struct hash_mm_context) = 8288 After the patch sizeof(struct hash_mm_context) = 4160 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:39 +10:00
Aneesh Kumar K.V	701101865f	powerpc/mm: Reduce memory usage for mm_context_t for radix Currently, our mm_context_t on book3s64 include all hash specific context details like slice mask and subpage protection details. We can skip allocating these with radix translation. This will help us to save 8K per mm_context with radix translation. With the patch applied we have sizeof(mm_context_t) = 136 sizeof(struct hash_mm_context) = 8288 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:39 +10:00
Aneesh Kumar K.V	67fda38f0d	powerpc/mm: Move slb_addr_linit to early_init_mmu Avoid #ifdef in generic code. Also enables us to do this specific to MMU translation mode on book3s64 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:39 +10:00
Aneesh Kumar K.V	60458fba46	powerpc/mm: Add helpers for accessing hash translation related variables We want to switch to allocating them runtime only when hash translation is enabled. Add helpers so that both book3s and nohash can be adapted to upcoming change easily. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:38 +10:00
Aneesh Kumar K.V	4f40b15f33	powerpc/mm: Remove PPC_MM_SLICES #ifdef for book3s64 Book3s64 always have PPC_MM_SLICES enabled. So remove the unncessary #ifdef Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:38 +10:00
Aneesh Kumar K.V	6161a37307	powerpc/mm: Fix build error with FLATMEM book3s64 config The current value of MAX_PHYSMEM_BITS cannot work with 32 bit configs. We used to have MAX_PHYSMEM_BITS not defined without SPARSEMEM and 32 bit configs never expected a value to be set for MAX_PHYSMEM_BITS. Dependent code such as zsmalloc derived the right values based on other fields. Instead of finding a value that works with different configs, use new values only for book3s_64. For 64 bit booke, use the definition of MAX_PHYSMEM_BITS as per commit `a7df61a0e2` ("[PATCH] ppc64: Increase sparsemem defaults") That change was done in 2005 and hopefully will work with book3e 64. Fixes: `8bc0868998` ("powerpc/mm: Only define MAX_PHYSMEM_BITS in SPARSEMEM configurations") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:12:38 +10:00
Christophe Leroy	a68c31fc01	powerpc/32s: Implement Kernel Userspace Access Protection This patch implements Kernel Userspace Access Protection for book3s/32. Due to limitations of the processor page protection capabilities, the protection is only against writing. read protection cannot be achieved using page protection. The previous patch modifies the page protection so that RW user pages are RW for Key 0 and RO for Key 1, and it sets Key 0 for both user and kernel. This patch changes userspace segment registers are set to Ku 0 and Ks 1. When kernel needs to write to RW pages, the associated segment register is then changed to Ks 0 in order to allow write access to the kernel. In order to avoid having the read all segment registers when locking/unlocking the access, some data is kept in the thread_struct and saved on stack on exceptions. The field identifies both the first unlocked segment and the first segment following the last unlocked one. When no segment is unlocked, it contains value 0. As the hash_page() function is not able to easily determine if a protfault is due to a bad kernel access to userspace, protfaults need to be handled by handle_page_fault when KUAP is set. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Drop allow_read/write_to/from_user() as they're now in kup.h, and adapt allow_user_access() to do nothing when to == NULL] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:11:47 +10:00
Christophe Leroy	f342adca3a	powerpc/32s: Prepare Kernel Userspace Access Protection This patch prepares Kernel Userspace Access Protection for book3s/32. Due to limitations of the processor page protection capabilities, the protection is only against writing. read protection cannot be achieved using page protection. book3s/32 provides the following values for PP bits: PP00 provides RW for Key 0 and NA for Key 1 PP01 provides RW for Key 0 and RO for Key 1 PP10 provides RW for all PP11 provides RO for all Today PP10 is used for RW pages and PP11 for RO pages, and user segment register's Kp and Ks are set to 1. This patch modifies page protection to use PP01 for RW pages and sets user segment registers to Kp 0 and Ks 0. This will allow to setup Userspace write access protection by settng Ks to 1 in the following patch. Kernel space segment registers remain unchanged. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:11:46 +10:00
Christophe Leroy	31ed2b13c4	powerpc/32s: Implement Kernel Userspace Execution Prevention. To implement Kernel Userspace Execution Prevention, this patch sets NX bit on all user segments on kernel entry and clears NX bit on all user segments on kernel exit. Note that powerpc 601 doesn't have the NX bit, so KUEP will not work on it. A warning is displayed at startup. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:11:46 +10:00
Christophe Leroy	2679f9bd0a	powerpc/8xx: Add Kernel Userspace Access Protection This patch adds Kernel Userspace Access Protection on the 8xx. When a page is RO or RW, it is set RO or RW for Key 0 and NA for Key 1. Up to now, the User group is defined with Key 0 for both User and Supervisor. By changing the group to Key 0 for User and Key 1 for Supervisor, this patch prevents the Kernel from being able to access user data. At exception entry, the kernel saves SPRN_MD_AP in the regs struct, and reapply the protection. At exception exit it restores SPRN_MD_AP with the value saved on exception entry. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [mpe: Drop allow_read/write_to/from_user() as they're now in kup.h] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:11:46 +10:00
Christophe Leroy	06fbe81b59	powerpc/8xx: Add Kernel Userspace Execution Prevention This patch adds Kernel Userspace Execution Prevention on the 8xx. When a page is Executable, it is set Executable for Key 0 and NX for Key 1. Up to now, the User group is defined with Key 0 for both User and Supervisor. By changing the group to Key 0 for User and Key 1 for Supervisor, this patch prevents the Kernel from being able to execute user code. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:11:46 +10:00
Christophe Leroy	c341a108a5	powerpc/8xx: Only define APG0 and APG1 Since the 8xx implements hardware page table walk assistance, the PGD entries always point to a 4k aligned page, so the 2 upper bits of the APG are not clobbered anymore and remain 0. Therefore only APG0 and APG1 are used and need a definition. We set the other APG to the lowest permission level. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:11:46 +10:00
Christophe Leroy	e2fb9f5444	powerpc/32: Prepare for Kernel Userspace Access Protection This patch adds ASM macros for saving, restoring and checking the KUAP state, and modifies setup_32 to call them on exceptions from kernel. The macros are defined as empty by default for when CONFIG_PPC_KUAP is not selected and/or for platforms which don't handle (yet) KUAP. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:11:46 +10:00
Christophe Leroy	e291b6d575	powerpc/32: Remove MSR_PR test when returning from syscall syscalls are from user only, so we can account time without checking whether returning to kernel or user as it will only be user. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:11:46 +10:00
Michael Ellerman	5e5be3aed2	powerpc/mm: Detect bad KUAP faults When KUAP is enabled we have logic to detect page faults that occur outside of a valid user access region and are blocked by the AMR. What we don't have at the moment is logic to detect a fault within a valid user access region, that has been incorrectly blocked by AMR. This is not meant to ever happen, but it can if we incorrectly save/restore the AMR, or if the AMR was overwritten for some other reason. Currently if that happens we assume it's just a regular fault that will be corrected by handling the fault normally, so we just return. But there is nothing the fault handling code can do to fix it, so the fault just happens again and we spin forever, leading to soft lockups. So add some logic to detect that case and WARN() if we ever see it. Arguably it should be a BUG(), but it's more polite to fail the access and let the kernel continue, rather than taking down the box. There should be no data integrity issue with failing the fault rather than BUG'ing, as we're just going to disallow an access that should have been allowed. To make the code a little easier to follow, unroll the condition at the end of bad_kernel_fault() and comment each case, before adding the call to bad_kuap_fault(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2019-04-21 23:06:04 +10:00

1 2 3 4 5 ...

825608 Commits