linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-14 14:46:42 +07:00

Author	SHA1	Message	Date
Nicholas Piggin	83a980f7f4	powerpc/64s: Add exception macro that does not enable RI Subsequent patches will add more non-RI variant exceptions, so create a macro for it rather than open-code it. This does not change generated instructions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-28 21:02:25 +10:00
Nicholas Piggin	6e83985b0f	powerpc/cbe: Do not process external or decremeter interrupts from sreset Cell will wake from low power state at the system reset interrupt, with the event encoded in SRR1, rather than waking at the interrupt vector that corresponds to that event. The system reset handler for this platform decodes the SRR1 event reason and calls the interrupt handler to process it directly from the system reset handler. A subsequent change will treat the system reset interrupt as a Linux NMI with its own per-CPU stack, and this will no longer work. Remove the external and decrementer handlers from the system reset handler. - The external exception remains raised and will fire again at the EE interrupt vector when system reset returns. - The decrementer is set to 1 so it will be raised again and fire when the system reset returns. It is possible to branch to an idle handler from the system reset interrupt (like POWER does), then restore a normal stack and restore this optimisation. But simplicity wins for now. Tested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-28 21:02:25 +10:00
Nicholas Piggin	461e96a337	powerpc/pasemi: Do not process external or decrementer interrupts from sreset PA Semi will wake from low power state at the system reset interrupt, with the event encoded in SRR1, rather than waking at the interrupt vector that corresponds to that event. The system reset handler for this platform decodes the SRR1 event reason and calls the interrupt handler to process it directly from the system reset handler. A subsequent change will treat the system reset interrupt as a Linux NMI with its own per-CPU stack, and this will no longer work. Remove the external and decrementer handlers from the system reset handler. - The external exception remains raised and will fire again at the EE interrupt vector when system reset returns. - The decrementer is set to 1 so it will be raised again and fire when the system reset returns. It is possible to branch to an idle handler from the system reset interrupt (like POWER does), then restore a normal stack and restore this optimisation. But simplicity wins for now. Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-28 21:02:25 +10:00
Michael Ellerman	b13f6683ed	Merge branch 'topic/ppc-kvm' into next Merge the topic branch we were sharing with kvm-ppc, Paul has also merged it.	2017-04-28 20:19:37 +10:00
Paul Mackerras	fb7dcf723d	Merge remote-tracking branch 'remotes/powerpc/topic/xive' into kvm-ppc-next This merges in the powerpc topic/xive branch to bring in the code for the in-kernel XICS interrupt controller emulation to use the new XIVE (eXternal Interrupt Virtualization Engine) hardware in the POWER9 chip directly, rather than via a XICS emulation in firmware. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-28 08:23:16 +10:00
Denis Kirjanov	db4b0dfab7	KVM: PPC: Book3S HV: Avoid preemptibility warning in module initialization With CONFIG_DEBUG_PREEMPT, get_paca() produces the following warning in kvmppc_book3s_init_hv() since it calls debug_smp_processor_id(). There is no real issue with the xics_phys field. If paca->kvm_hstate.xics_phys is non-zero on one cpu, it will be non-zero on them all. Therefore this is not fixing any actual problem, just the warning. [ 138.521188] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/5596 [ 138.521308] caller is .kvmppc_book3s_init_hv+0x184/0x350 [kvm_hv] [ 138.521404] CPU: 5 PID: 5596 Comm: modprobe Not tainted 4.11.0-rc3-00022-gc7e790c #1 [ 138.521509] Call Trace: [ 138.521563] [c0000007d018b810] [c0000000023eef10] .dump_stack+0xe4/0x150 (unreliable) [ 138.521694] [c0000007d018b8a0] [c000000001f6ec04] .check_preemption_disabled+0x134/0x150 [ 138.521829] [c0000007d018b940] [d00000000a010274] .kvmppc_book3s_init_hv+0x184/0x350 [kvm_hv] [ 138.521963] [c0000007d018ba00] [c00000000191d5cc] .do_one_initcall+0x5c/0x1c0 [ 138.522082] [c0000007d018bad0] [c0000000023e9494] .do_init_module+0x84/0x240 [ 138.522201] [c0000007d018bb70] [c000000001aade18] .load_module+0x1f68/0x2a10 [ 138.522319] [c0000007d018bd20] [c000000001aaeb30] .SyS_finit_module+0xc0/0xf0 [ 138.522439] [c0000007d018be30] [c00000000191baec] system_call+0x38/0xfc Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-28 08:21:51 +10:00
Ankit Kumar	041939c1ec	pstore: Fix flags to enable dumps on powerpc After commit `c950fd6f20` the kernel registers pstore write based on the flags set. Pstore write for powerpc is broken as the PSTORE_FLAGS_DMESG flag is not set for the powerpc architecture. On panic, the kernel doesn't write messages to /fs/pstore/dmesg* (the entry doesn't get created at all). This patch enables pstore write for the powerpc architecture by setting the PSTORE_FLAGS_DMESG flag. Fixes: `c950fd6f20` ("pstore: Split pstore fragile flags") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Ankit Kumar <ankit@linux.vnet.ibm.com> Signed-off-by: Kees Cook <keescook@chromium.org>	2017-04-27 14:49:05 -07:00
Naveen N. Rao	096ff2ddba	powerpc/ftrace/64: Split further based on -mprofile-kernel Split ftrace_64.S further retaining the core ftrace 64-bit aspects in ftrace_64.S and moving ftrace_caller() and ftrace_graph_caller() into separate files based on -mprofile-kernel. The livepatch routines are all now contained within the mprofile file. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-27 22:20:29 +10:00
Naveen N. Rao	7853f9c029	powerpc: Split ftrace bits into a separate file entry_*.S now includes a lot more than just kernel entry/exit code. As a first step at cleaning this up, let's split out the ftrace bits into separate files. Also move all related tracing code into a new trace/ subdirectory. No functional changes. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-27 22:20:29 +10:00
Christophe Leroy	2505820f7c	powerpc/mm: Rename table dump file name The page table dump debugfs file is named 'kernel_page_tables' on all other architectures implementing it, while it is named 'kernel_pagetables' on powerpc. This patch renames it. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-27 22:20:28 +10:00
Christophe Leroy	78a18dbf01	powerpc/mm: On PPC32, display 32 bits addresses in page table dump Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-27 22:20:28 +10:00
Christophe Leroy	fd893fe56a	powerpc/mm: Fix missing page attributes in page table dump On some targets, _PAGE_RW is 0 and _PAGE_RO is used instead. _PAGE_SHARED is also missing. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-27 22:20:27 +10:00
Christophe Leroy	6c01bbd2cf	powerpc/mm: Fix page table dump build on PPC32 On PPC32 (eg. mpc885_ads_defconfig), page table dump compilation fails as follows. This is because the memory layout is slightly different on PPC32. This patch adapts it. arch/powerpc/mm/dump_linuxpagetables.c: In function 'walk_pagetables': arch/powerpc/mm/dump_linuxpagetables.c:369:10: error: 'KERN_VIRT_START' undeclared (first use in this function) ... Fixes: `8eb07b1870` ("powerpc/mm: Dump linux pagetables") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-27 22:20:26 +10:00
Aneesh Kumar K.V	a5998fcb92	powerpc/mm/radix: Optimise tlbiel flush all case _tlbiel_pid() is called with a ric (Radix Invalidation Control) argument of either RIC_FLUSH_TLB or RIC_FLUSH_ALL. RIC_FLUSH_ALL says to invalidate the entire TLB and the Page Walk Cache (PWC). To flush the whole TLB, we have to iterate over each set (congruence class) of the TLB. Currently we do that and pass RIC_FLUSH_ALL each time. That is not incorrect but it means we flush the PWC 128 times, when once would suffice. Fix it by doing the first flush with the ric value we're passed, and then if it was RIC_FLUSH_ALL, we downgrade it to RIC_FLUSH_TLB, because we know we have just flushed the PWC and don't need to do it again. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Split out of combined patch, tweak logic, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-27 22:20:21 +10:00
Aneesh Kumar K.V	cf4f08bed8	powerpc/mm/radix: Optimise Page Walk Cache flush Currently we implement flushing of the page walk cache (PWC) by calling _tlbiel_pid() with a RIC (Radix Invalidation Control) value of 1 which says to only flush the PWC. But _tlbiel_pid() loops over each set (congruence class) of the TLB, which is not necessary when we're just flushing the PWC. In fact the set argument is ignored for a PWC flush, so essentially we're just flushing the PWC 127 extra times for no benefit. Fix it by adding tlbiel_pwc() which just does a single flush of the PWC. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Split out of combined patch, drop _ in name, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-27 22:20:05 +10:00
Radim Krčmář	72875d8a4d	KVM: add kvm_{test,clear}_request to replace {test,clear}_bit Users were expected to use kvm_check_request() for testing and clearing, but requests have expanded their use since then and some users want to only test or do a faster clear. Make sure that requests are not directly accessed with bit operations. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2017-04-27 14:12:22 +02:00
Benjamin Herrenschmidt	5af5099385	KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller This patch makes KVM capable of using the XIVE interrupt controller to provide the standard PAPR "XICS" style hypercalls. It is necessary for proper operations when the host uses XIVE natively. This has been lightly tested on an actual system, including PCI pass-through with a TG3 device. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and adding empty stubs for the kvm_xive_xxx() routines, fixup subject, integrate fixes from Paul for building PR=y HV=n] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-27 21:37:29 +10:00
Al Viro	2fefc97b21	HAVE_ARCH_HARDENED_USERCOPY is unconditional now Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-04-26 12:11:06 -04:00
Al Viro	701cac61d0	CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now all architectures converted Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-04-26 12:11:01 -04:00
Al Viro	eea86b637a	Merge branches 'uaccess.alpha', 'uaccess.arc', 'uaccess.arm', 'uaccess.arm64', 'uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess	2017-04-26 12:06:59 -04:00
Michael Ellerman	45b21cfeb2	powerpc/powernv: Fix oops on P9 DD1 in cause_ipi() Recently we merged the native xive support for Power9, and then separately some reworks for doorbell IPI support. In isolation both series were OK, but the merged result had a bug in one case. On P9 DD1 we use pnv_p9_dd1_cause_ipi() which tries to use doorbells, and then falls back to the interrupt controller. However the fallback is implemented by calling icp_ops->cause_ipi. But now that xive support is merged we might be using xive, in which case icp_ops is not initialised, it's a xics specific structure. This leads to an oops such as: Unable to handle kernel paging request for data at address 0x00000028 Oops: Kernel access of bad area, sig: 11 [#1] NIP pnv_p9_dd1_cause_ipi+0x74/0xe0 LR smp_muxed_ipi_message_pass+0x54/0x70 To fix it, rather than using icp_ops which might be NULL, have both xics and xive set smp_ops->cause_ipi, and then in the powernv code we save that as ic_cause_ipi before overriding smp_ops->cause_ipi. For paranoia add a WARN_ON() to check if somehow smp_ops->cause_ipi is NULL. Fixes: `b866cc2199` ("powerpc: Change the doorbell IPI calling convention") Tested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-26 23:28:12 +10:00
Michael Ellerman	83c4919058	powerpc/powernv: Fix missing attr initialisation in opal_export_attrs() In opal_export_attrs() we dynamically allocate some bin_attributes. They're allocated with kmalloc() and although we initialise most of the fields, we don't initialise write() or mmap(), and in particular we don't initialise the lockdep related fields in the embedded struct attribute. This leads to a lockdep warning at boot: BUG: key c0000000f11906d8 not in .data! WARNING: CPU: 0 PID: 1 at ../kernel/locking/lockdep.c:3136 lockdep_init_map+0x28c/0x2a0 ... Call Trace: lockdep_init_map+0x288/0x2a0 (unreliable) __kernfs_create_file+0x8c/0x170 sysfs_add_file_mode_ns+0xc8/0x240 __machine_initcall_powernv_opal_init+0x60c/0x684 do_one_initcall+0x60/0x1c0 kernel_init_freeable+0x2f4/0x3d4 kernel_init+0x24/0x160 ret_from_kernel_thread+0x5c/0xb0 Fix it by kzalloc'ing the attr, which fixes the uninitialised write() and mmap(), and calling sysfs_bin_attr_init() on it to initialise the lockdep fields. Fixes: `11fe909d23` ("powerpc/powernv: Add OPAL exports attributes to sysfs") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-26 15:57:19 +10:00
Michael Ellerman	b409946b2a	powerpc/mm: Fix possible out-of-bounds shift in arch_mmap_rnd() The recent patch to add runtime configuration of the ASLR limits added a bug in arch_mmap_rnd() where we may shift an integer (32-bits) by up to 33 bits, leading to undefined behaviour. In practice it exhibits as every process seg faulting instantly, presumably because the rnd value hasn't been restricted by the modulus at all. We didn't notice because it only happens under certain kernel configurations and if the number of bits is actually set to a large value. Fix it by switching to unsigned long. Fixes: `9fea59bd7c` ("powerpc/mm: Add support for runtime configuration of ASLR limits") Reported-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-26 15:35:01 +10:00
Nicholas Piggin	8bf8f2e8c7	powerpc/64s: Revert setting of LPCR[LPES] on POWER9 The XIVE enablement patches included a change to set the LPES (Logical Partitioning Environment Selector) bit (bit # 3) in LPCR (Logical Partitioning Control Register) on POWER9 hosts. This bit sets external interrupts to guest delivery mode, which uses SRR0/1. The host's EE interrupt handler is written to expect HSRR0/1 (for earlier CPUs). This should be fine because XIVE is configured not to deliver EEs to the host (Hypervisor Virtualization Interrupt is used instead) so the EE handler should never be executed. However a bug in interrupt controller code, hardware, or odd configuration of a simulator could result in the host getting an EE incorrectly. Keeping the EE delivery mode matching the host EE handler prevents strange crashes due to using the wrong exception registers. KVM will configure the LPCR to set LPES prior to running a guest so that EEs are delivered to the guest using SRR0/1. Fixes: `08a1e650cc` ("powerpc: Fixup LPCR:PECE and HEIC setting on POWER9") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Massage change log to avoid referring to LPES0 which is now renamed LPES] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-26 11:40:21 +10:00
Dan Williams	d4b29fd78e	block: remove block_device_operations ->direct_access() Now that all the producers and consumers of dax interfaces have been converted to using dax_operations on a dax_device, remove the block device direct_access enabling. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-25 13:20:46 -07:00
David Gibson	9765ad134a	powerpc/mm: Ensure IRQs are off in switch_mm() powerpc expects IRQs to already be (soft) disabled when switch_mm() is called, as made clear in the commit message of `9c1e105238` ("powerpc: Allow perf_counters to access user memory at interrupt time"). Aside from any race conditions that might exist between switch_mm() and an IRQ, there is also an unconditional hard_irq_disable() in switch_slb(). If that isn't followed at some point by an IRQ enable then interrupts will remain disabled until we return to userspace. It is true that when switch_mm() is called from the scheduler IRQs are off, but not when it's called by use_mm(). Looking closer we see that last year in commit `f98db6013c` ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler") this was made more explicit by the addition of switch_mm_irqs_off() which is now called by the scheduler, vs switch_mm() which is used by use_mm(). Arguably it is a bug in use_mm() to call switch_mm() in a different context than it expects, but fixing that will take time. This was discovered recently when vhost started throwing warnings such as: BUG: sleeping function called from invalid context at kernel/mutex.c:578 in_atomic(): 0, irqs_disabled(): 1, pid: 10768, name: vhost-10760 no locks held by vhost-10760/10768. irq event stamp: 10 hardirqs last enabled at (9): _raw_spin_unlock_irq+0x40/0x80 hardirqs last disabled at (10): switch_slb+0x2e4/0x490 softirqs last enabled at (0): copy_process+0x5e8/0x1260 softirqs last disabled at (0): (null) Call Trace: show_stack+0x88/0x390 (unreliable) dump_stack+0x30/0x44 __might_sleep+0x1c4/0x2d0 mutex_lock_nested+0x74/0x5c0 cgroup_attach_task_all+0x5c/0x180 vhost_attach_cgroups_work+0x58/0x80 [vhost] vhost_worker+0x24c/0x3d0 [vhost] kthread+0xec/0x100 ret_from_kernel_thread+0x5c/0xd4 Prior to commit `04b96e5528` ("vhost: lockless enqueuing") (Aug 2016) the vhost_worker() would do a spin_unlock_irq() not long after calling use_mm(), which had the effect of reenabling IRQs. Since that commit removed the locking in vhost_worker() the body of the vhost_worker() loop now runs with interrupts off causing the warnings. This patch addresses the problem by making the powerpc code mirror the x86 code, ie. we disable interrupts in switch_mm(), and optimise the scheduler case by defining switch_mm_irqs_off(). Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [mpe: Flesh out/rewrite change log, add stable] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-25 00:24:59 +10:00
Tyrel Datwyler	e76ca27790	powerpc/sysfs: Fix reference leak of cpu device_nodes present at boot For CPUs present at boot each logical CPU acquires a reference to the associated device node of the core. This happens in register_cpu() which is called by topology_init(). The result of this is that we end up with a reference held by each thread of the core. However, these references are never freed if the CPU core is DLPAR removed. This patch fixes the reference leaks by acquiring and releasing the references in the CPU hotplug callbacks un/register_cpu_online(). With this patch symmetric reference counting is observed with both CPUs present at boot, and those DLPAR added after boot. Fixes: `f86e4718f2` ("driver/core: cpu: initialize of_node in cpu's device struture") Cc: stable@vger.kernel.org # v3.12+ Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-25 00:24:59 +10:00
Tyrel Datwyler	68baf692c4	powerpc/pseries: Fix of_node_put() underflow during DLPAR remove Historically struct device_node references were tracked using a kref embedded as a struct field. Commit `75b57ecf9d` ("of: Make device nodes kobjects so they show up in sysfs") (Mar 2014) refactored device_nodes to be kobjects such that the device tree could be more simply exposed to userspace using sysfs. Commit `0829f6d1f6` ("of: device_node kobject lifecycle fixes") (Mar 2014) followed up these changes to better control the kobject lifecycle and in particular the reference counting via of_node_get(), of_node_put(), and of_node_init(). A result of this second commit was that it introduced an of_node_put() call when a dynamic node is detached, in of_node_remove(), that removes the initial kobj reference created by of_node_init(). Traditionally, as the original dynamic device node user, the pseries code had assumed responsibility for releasing this final reference in its platform specific DLPAR detach code. This patch fixes a refcount underflow introduced by commit `0829f6d1f6`, and recently exposed by the upstreaming of the refcount API. Messages like the following are no longer seen in the kernel log with this patch following DLPAR remove operations of cpus and pci devices. rpadlpar_io: slot PHB 72 removed refcount_t: underflow; use-after-free. ------------[ cut here ]------------ WARNING: CPU: 5 PID: 3335 at lib/refcount.c:128 refcount_sub_and_test+0xf4/0x110 Fixes: `0829f6d1f6` ("of: device_node kobject lifecycle fixes") Cc: stable@vger.kernel.org # v3.15+ Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> [mpe: Make change log commit references more verbose] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-25 00:24:59 +10:00
Michael Ellerman	8567364668	powerpc/xmon: Deindent the SLB dumping logic Currently the code that dumps SLB entries uses a double-nested if. This means the actual dumping logic is a bit squashed. Deindent it by using continue. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-25 00:24:59 +10:00
Michael Ellerman	9fc849144c	Merge branch 'topic/kprobes' into next Although most of these kprobes patches are powerpc specific, there's a couple that touch generic code (with Acks). At the moment there's one conflict with acme's tree, but it's not too bad. Still just in case some other conflicts show up, we've put these in a topic branch so another tree could merge some or all of it if necessary.	2017-04-25 00:24:04 +10:00
Naveen N. Rao	24bd909e94	powerpc/kprobes: Prefer ftrace when probing function entry KPROBES_ON_FTRACE avoids much of the overhead of regular kprobes as it eliminates the need for a trap, as well as the need to emulate or single-step instructions. Though OPTPROBES provides us with similar performance, we have limited optprobes trampoline slots. As such, when asked to probe at a function entry, default to using the ftrace infrastructure. With: # cd /sys/kernel/debug/tracing # echo 'p _do_fork' > kprobe_events before patch: # cat ../kprobes/list c0000000000daf08 k _do_fork+0x8 [DISABLED] c000000000044fc0 k kretprobe_trampoline+0x0 [OPTIMIZED] and after patch: # cat ../kprobes/list c0000000000d074c k _do_fork+0xc [DISABLED][FTRACE] c0000000000412b0 k kretprobe_trampoline+0x0 [OPTIMIZED] Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-24 19:07:59 +10:00
Naveen N. Rao	1b32cd1715	powerpc: Introduce a new helper to obtain function entry points kprobe_lookup_name() is specific to the kprobe subsystem and may not always return the function entry point (in a subsequent patch for KPROBES_ON_FTRACE). For looking up function entry points, introduce a separate helper and use it in optprobes.c Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-24 19:07:58 +10:00
Naveen N. Rao	ead514d5fb	powerpc/kprobes: Add support for KPROBES_ON_FTRACE Allow kprobes to be placed on ftrace _mcount() call sites. This optimization avoids the use of a trap, by riding on ftrace infrastructure. This depends on HAVE_DYNAMIC_FTRACE_WITH_REGS which depends on MPROFILE_KERNEL, which is only currently enabled on powerpc64le with newer toolchains. Based on the x86 code by Masami. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-24 19:07:58 +10:00
Naveen N. Rao	2f59be5b97	powerpc/ftrace: Restore LR from pt_regs Pass the real LR to the ftrace handler. This is needed for KPROBES_ON_FTRACE for the pre handlers. Also, with KPROBES_ON_FTRACE, the link register may be updated by the pre handlers or by a registered kretprobe. Honor the updated LR by restoring it from pt_regs, rather than from the stack save area. Live patch and function graph continue to work fine with this change. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-24 19:07:57 +10:00
Naveen N. Rao	9a914aa682	powerpc/kprobes: Blacklist common exception handlers Blacklist all the exception common/OOL handlers as the kernel stack is not yet setup, which means we can't take a trap at this point. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:32:26 +10:00
Naveen N. Rao	7aa5b018bf	powerpc/kprobes: Blacklist exception handlers Introduce __head_end to mark end of the early fixed sections and use it to blacklist all exception handlers from kprobes. mpe: We do not need to do anything special for relocatable kernels, where the exception vectors are split from the main kernel, as the split vectors are already excluded by the check for kernel_text_address(). Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Move __head_end outside #ifdef 64-bit to unbreak the 32-bit build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:32:25 +10:00
Naveen N. Rao	71f6e58e5e	powerpc/kprobes: Convert __kprobes to NOKPROBE_SYMBOL() Along similar lines as commit `9326638cbe` ("kprobes, x86: Use NOKPROBE_SYMBOL() instead of __kprobes annotation"), convert __kprobes annotation to either NOKPROBE_SYMBOL() or nokprobe_inline. The latter forces inlining, in which case the caller needs to be added to NOKPROBE_SYMBOL(). Also: - blacklist arch_deref_entry_point(), and - convert a few regular inlines to nokprobe_inline in lib/sstep.c A key benefit is the ability to detect such symbols as being blacklisted. Before this patch: $ cat /sys/kernel/debug/kprobes/blacklist | grep read_mem $ perf probe read_mem Failed to write event: Invalid argument Error: Failed to add events. $ dmesg | tail -1 [ 3736.112815] Could not insert probe at _text+10014968: -22 After patch: $ cat /sys/kernel/debug/kprobes/blacklist | grep read_mem 0xc000000000072b50-0xc000000000072d20 read_mem $ perf probe read_mem read_mem is blacklisted function, skip it. Added new events: (null):(null) (on read_mem) probe:read_mem (on read_mem) You can now use it in all perf tools, such as: perf record -e probe:read_mem -aR sleep 1 $ grep " read_mem" /proc/kallsyms c000000000072b50 t read_mem c0000000005f3b40 t read_mem $ cat /sys/kernel/debug/kprobes/list c0000000005f3b48 k read_mem+0x8 [DISABLED] Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Minor change log formatting, fix up some conflicts] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:32:25 +10:00
Naveen N. Rao	700e64377c	powerpc/ftrace: Move stack setup and teardown code into ftrace_graph_caller() Move the stack setup and teardown code into ftrace_graph_caller(). This way, we don't incur the cost of setting it up unless function graph is enabled for this function. Also, remove the extraneous LR restore code after the function graph stub. LR has previously been restored and neither livepatch_handler() nor ftrace_graph_caller() return back here. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Drop bad change to non-mprofile-kernel version of ftrace_graph_caller] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:32:24 +10:00
Naveen N. Rao	d08f8a28bc	powerpc/kprobes: Remove duplicate saving of MSR set_current_kprobe() already saves regs->msr into kprobe_saved_msr. Remove the redundant save. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:32:24 +10:00
Nicholas Piggin	9cba253df4	powerpc/64s: Simplify POWER9 DD1 idle workaround code The idle workaround does not need to load PACATOC, and it does not need to be called within a nested function that requires LR to be saved. Load the PACATOC at entry to the idle wakeup. It does not matter which PACA this comes from, so it's okay to call before the workaround. Then apply the workaround to get the right PACA. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:32:23 +10:00
Nicholas Piggin	0d7720a242	powerpc/64s: Idle POWER8 avoid full state loss recovery where possible If not all threads were in winkle, full state loss recovery is not necessary and can be avoided. A previous patch removed this optimisation due to some complexity with the implementation. Re-implement it by counting the number of threads in winkle with the per-core idle state. Only restore full state loss if all threads were in winkle. This has a small window of false positives right before threads execute winkle and just after they wake up, when the winkle count does not reflect the true number of threads in winkle. This is not a significant problem in comparison with even the minimum winkle duration. For correctness, a false positive is not a problem (only false negatives would be). Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:32:12 +10:00
Nicholas Piggin	e420249d44	powerpc/64s: Idle do not hold reservation longer than required When taking the core idle state lock, grab it immediately like a regular lock, rather than adding more tests in there. Holding the lock keeps it stable, so there is no need to do it while holding the reservation. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:31:57 +10:00
Nicholas Piggin	adbcf8d74f	powerpc/64s: Expand core idle state bits In preparation for adding more bits to the core idle state word, move the lock bit up, and unlock by flipping the lock bit rather than masking off all but the thread bits. Add branch hints for atomic operations while we're here. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:31:49 +10:00
Nicholas Piggin	1945bc4549	powerpc/64s: Fix POWER9 machine check handler from stop state The ISA specifies that a power-save wakeup due to a machine check exception can cause a machine check interrupt (rather than the usual system reset interrupt). The machine check handler copes with this by doing low level machine check recovery without restoring full state from idle, then queues up a machine check event for logging, then directly executes the same idle instruction it woke from. This minimises the work done before recovery is performed. The problem is that it requires machine specific instructions and knowledge of the book3s idle code. Currently it only has code to handle POWER8 idle, so POWER9 crashes when trying to execute the P8 idle instructions which don't exist in ISAv3.0B. cpu 0x0: Vector: e40 (Emulation Assist) at [c0000000008f3810] pc: c000000000008380: machine_check_handle_early+0x130/0x2f0 lr: c00000000053a098: stop_loop+0x68/0xd0 sp: c0000000008f3a90 msr: 9000000000081001 current = 0xc0000000008a1080 paca = 0xc00000000ffd0000 softe: 0 irq_happened: 0x01 pid = 0, comm = swapper/0 Instead of going to sleep after recovery, do the usual idle wakeup and state restoration by calling into the normal idle wakeup path. This reuses the normal idle wakeup paths. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:31:46 +10:00
Nicholas Piggin	10101aa9aa	powerpc/64s: Use alternative feature patching This reduces the number of nops for POWER8. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:31:43 +10:00
Nicholas Piggin	544686cae8	powerpc/64s: Stop using bit in HSPRG0 to test winkle The POWER8 idle code has a neat trick of programming the power on engine to restore a low bit into HSPRG0, so idle wakeup code can test and see if it has been programmed this way and therefore lost all state. Restore time can be reduced if winkle has not been reached. However this messes with our r13 PACA pointer, and requires HSPRG0 to be written to. It also optimizes the slowest and most uncommon case at the expense of another SPR write in the common nap state wakeup. Remove this complexity and assume winkle sleeps always require a state restore. This speedup could be made entirely contained within the winkle idle code by counting per-core winkles and setting a thread bitmap when all have gone to winkle. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:31:39 +10:00
Nicholas Piggin	bf0153c143	powerpc/64s: Move remaining system reset idle code into idle_book3s.S No functional change. Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 20:31:35 +10:00
Ingo Molnar	6dd29b3df9	Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" This reverts commit `2947ba054a`. Dan Williams reported dax-pmem kernel warnings with the following signature: WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155 percpu_ref_switch_to_atomic_rcu+0x1f5/0x200 percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0 (0) after switching to atomic ... and bisected it to this commit, which suggests possible memory corruption caused by the x86 fast-GUP conversion. He also pointed out: " This is similar to the backtrace when we were not properly handling pud faults and was fixed with this commit: `220ced1676` "mm: fix get_user_pages() vs device-dax pud mappings" I've found some missing _devmap checks in the generic get_user_pages_fast() path, but this does not fix the regression [...] " So given that there are known bugs, and a pretty robust looking bisection points to this commit suggesting that there are unknown bugs in the conversion as well, revert it for the time being - we'll re-try in v4.13. Reported-by: Dan Williams <dan.j.williams@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: aneesh.kumar@linux.vnet.ibm.com Cc: dann.frazier@canonical.com Cc: dave.hansen@intel.com Cc: steve.capper@linaro.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-23 11:45:20 +02:00
Ingo Molnar	58d30c36d4	Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Documentation updates. - Miscellaneous fixes. - Parallelize SRCU callback handling (plus overlapping patches). Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-04-23 11:12:44 +02:00
Nicholas Piggin	2563a70c3b	powerpc/64s: Remove unnecessary relocation branch from idle handler The system reset idle handler system_reset_idle_common is relocated, so relocation is not required to branch to kvm_start_guest. The superfluous relocation does not result in incorrect code, but it does not compile outside of exception-64s.S (with fixed section definitions). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-23 17:26:35 +10:00
David S. Miller	fb796707d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Both conflicts were simple overlapping changes. In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the conversion of the driver in net-next to use in-netdev stats. Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-21 20:23:53 -07:00
Linus Torvalds	92b4fc7563	powerpc fixes for 4.11 #8 Just two fixes. The first fixes kprobing a stdu, and is marked for stable as it's been broken for ~ever. In hindsight this could have gone in next. The other is a fix for a change we merged this cycle, where if we take a certain exception when the kernel is running relocated (currently only used for kdump), we checkstop the box. Thanks to: Ravi Bangoria. Merge tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Just two fixes. The first fixes kprobing a stdu, and is marked for stable as it's been broken for ~ever. In hindsight this could have gone in next. The other is a fix for a change we merged this cycle, where if we take a certain exception when the kernel is running relocated (currently only used for kdump), we checkstop the box. Thanks to Ravi Bangoria" * tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction	2017-04-21 09:34:45 -07:00
Michael Ellerman	9fea59bd7c	powerpc/mm: Add support for runtime configuration of ASLR limits Add powerpc support for mmap_rnd_bits and mmap_rnd_compat_bits, which are two sysctls that allow a user to configure the number of bits of randomness used for ASLR. Because of the way the Kconfig for ARCH_MMAP_RND_BITS is defined, we have to construct at least the MIN value in Kconfig, vs in a header which would be more natural. Given that, we just go ahead and do it all in Kconfig. At least according to the code (the documentation makes no mention of it), the value is defined as the number of bits of randomisation of the page, not the address. This makes some sense: with larger page sizes, more of the low bits are forced to zero, which would reduce the randomisation if we didn't take the PAGE_SIZE into account. However it does mean the min/max values have to change depending on the PAGE_SIZE in order to actually limit the amount of address space consumed by the randomisation. The result of that is that we have to define the default values based on both 32-bit vs 64-bit, but also the configured PAGE_SIZE. Furthermore now that we have 128TB address space support on Book3S, we also have to take that into account. Finally we can wire up the value in arch_mmap_rnd(). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> Tested-by: Bhupesh Sharma <bhsharma@redhat.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2017-04-21 22:57:55 +10:00
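The "bits of randomisation of the page, not the address" point above can be checked with a little arithmetic. This is a user-space sketch, not the kernel's arch_mmap_rnd(): the random input is passed in as a parameter so the result is deterministic, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Byte offset added to the mmap base: rnd_bits bits of randomness applied to the
 * page number, then shifted by the page size. The caller keeps
 * rnd_bits + page_shift below 64. */
static uint64_t sketch_mmap_rnd(uint64_t random, unsigned rnd_bits, unsigned page_shift)
{
    uint64_t pages = random & ((UINT64_C(1) << rnd_bits) - 1);
    return pages << page_shift;
}

/* Size of the address range the randomisation can span. */
static uint64_t sketch_mmap_rnd_span(unsigned rnd_bits, unsigned page_shift)
{
    return UINT64_C(1) << (rnd_bits + page_shift);
}
```

With the same 28 bits (an arbitrary example value), 4K pages (shift 12) give a 1 TB span but 64K pages (shift 16) give 16 TB, which is why the Kconfig min/max values have to depend on PAGE_SIZE.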
Michael Ellerman	0f89f6e188	crypto: crct10dif-vpmsum - Fix missing preempt_disable() In crct10dif_vpmsum() we call enable_kernel_altivec() without first disabling preemption, which is not allowed. It used to be sufficient just to call pagefault_disable(), because that also disabled preemption. But the two were decoupled in commit `8222dbe21e` ("sched/preempt, mm/fault: Decouple preemption from the page fault logic") in mid 2015. The crct10dif-vpmsum code inherited this bug from the crc32c-vpmsum code on which it was modelled. So add the missing preempt_disable/enable(). We should also call disable_kernel_fp(). Although it does nothing by default, there is a debug switch to make it active, and all enables should be paired with disables. Fixes: `b01df1c16c` ("crypto: powerpc - Add CRC-T10DIF acceleration") Acked-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2017-04-21 20:30:51 +08:00
Oliver O'Halloran	f855b2f544	powerpc/mm: Wire up ioremap_cache() The default implementation of ioremap_cache() is aliased to ioremap(). On powerpc ioremap() creates cache-inhibited mappings by default which is almost certainly not what you wanted. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-21 21:08:47 +10:00
David Woodhouse	f66e225828	PCI: Add BAR index argument to pci_mmap_page_range() In all cases we know which BAR it is. Passing it in means that arch code (or generic code; watch this space) won't have to go looking for it again. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2017-04-20 08:47:47 -05:00
Naveen N. Rao	22d8b3dec2	powerpc/kprobes: Emulate instructions on kprobe handler re-entry On kprobe handler re-entry, try to emulate the instruction rather than always single-stepping. Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 23:18:56 +10:00
Naveen N. Rao	1cabd2f8f7	powerpc/kprobes: Factor out code to emulate instruction into a helper Factor out code to emulate instruction into a try_to_emulate() helper function. This makes no functional changes. Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 23:18:56 +10:00
Naveen N. Rao	a64e3f35a4	powerpc/kretprobes: Override default function entry offset With ABIv2, we offset 8 bytes into a function to get at the local entry point. mpe: NB this function is currently not called, the change to generic code to call it is being merged via the tip tree. Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 23:18:55 +10:00
Naveen N. Rao	290e307076	powerpc/kprobes: Fix handling of function offsets on ABIv2 commit `239aeba764` ("perf powerpc: Fix kprobe and kretprobe handling with kallsyms on ppc64le") changed how we use the offset field in struct kprobe on ABIv2. perf now offsets from the global entry point if an offset is specified and otherwise chooses the local entry point. Fix the same in the kernel for kprobe API users. We do this by extending kprobe_lookup_name() to accept an additional parameter to indicate the offset specified with the kprobe registration. If offset is 0, we return the local function entry and return the global entry point otherwise. With: # cd /sys/kernel/debug/tracing/ # echo "p _do_fork" >> kprobe_events # echo "p _do_fork+0x10" >> kprobe_events before this patch: # cat ../kprobes/list c0000000000d0748 k _do_fork+0x8 [DISABLED] c0000000000d0758 k _do_fork+0x18 [DISABLED] c0000000000412b0 k kretprobe_trampoline+0x0 [OPTIMIZED] and after: # cat ../kprobes/list c0000000000d04c8 k _do_fork+0x8 [DISABLED] c0000000000d04d0 k _do_fork+0x10 [DISABLED] c0000000000412b0 k kretprobe_trampoline+0x0 [OPTIMIZED] Acked-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 23:18:55 +10:00
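The address rule described above can be sketched in a few lines. The fixed 8-byte local entry offset is taken from a64e3f35a4 above; real ELFv2 binaries encode the local entry offset per function in the symbol table, and the helper names here are hypothetical, not the kernel's kprobe_lookup_name().

```c
#include <stdint.h>

/* Assumed fixed distance from the global to the local entry point. */
#define SKETCH_ABIV2_LOCAL_ENTRY_OFFSET 0x8u

/* Behaviour before the fix: any offset was applied on top of the local entry point. */
static uint64_t sketch_probe_addr_before(uint64_t global_entry, uint64_t offset)
{
    return global_entry + SKETCH_ABIV2_LOCAL_ENTRY_OFFSET + offset;
}

/* Behaviour after the fix: no offset means the local entry point; an explicit
 * offset is taken relative to the global entry point, matching perf. */
static uint64_t sketch_probe_addr_after(uint64_t global_entry, uint64_t offset)
{
    if (offset == 0)
        return global_entry + SKETCH_ABIV2_LOCAL_ENTRY_OFFSET;
    return global_entry + offset;
}
```

Using the "after" listing, where `_do_fork` starts at 0xc0000000000d04c0, the new helper yields `_do_fork+0x8` and `_do_fork+0x10` for the two probes, while the old rule turns `_do_fork+0x10` into `_do_fork+0x18`.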
Naveen N. Rao	49e0b4658f	kprobes: Convert kprobe_lookup_name() to a function The macro is now pretty long and ugly on powerpc. In the light of further changes needed here, convert it to a __weak variant to be overridden with a nicer looking function. Suggested-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 23:18:54 +10:00
Nicholas Piggin	a050d20d02	powerpc/64s: Use relon prolog for EXC_VIRT_OOL_MASKABLE_HV handlers Hypervisor Virtualization and Directed Hypervisor Doorbell interrupt handlers use the macro EXC_VIRT_OOL_MASKABLE_HV for their relocation-on handlers, which calls MASKABLE_RELON_EXCEPTION_HV_OOL, which uses the real mode interrupt prolog. This means we needlessly rfid from virtual mode to virtual mode. For POWER8 it only affects doorbell IPIs. Context switch microbenchmark between threads with snooze disabled (which causes IPI) gets about 3% faster, about 370 cycles. Should be more important on POWER9 with global doorbells and HVI for host interrupts. Use the RELON variant instead to reduce overhead. Fixes: `1707dd1613` ("powerpc: Save CFAR before branching in interrupt entry paths") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fold some more detail into the change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 17:05:11 +10:00
Michael Ellerman	686978b15c	powerpc/xive: Fix missing check of rc != OPAL_BUSY Dan Carpenter noticed that the code in __xive_native_disable_queue() has a for loop with an unconditional break in the middle, which doesn't make a lot of sense. What the code's supposed to do is loop as long as OPAL says it's busy, if we get any other return code, either success or failure, then we should break the loop. So add the missing check. Fixes: `243e25112d` ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-20 14:43:19 +10:00
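The intended loop shape is easy to sketch. The return-code values below are assumptions for illustration (OPAL reports success as 0 and errors, including "busy", as negative values), and the call is abstracted behind a function pointer with test doubles, so none of these names are real OPAL calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Return codes assumed for this sketch. */
#define SKETCH_OPAL_SUCCESS   0
#define SKETCH_OPAL_PARAMETER (-1)
#define SKETCH_OPAL_BUSY      (-2)

typedef int64_t (*sketch_opal_call)(void *ctx);

/* Retry only while firmware says it is busy; success or any other error ends the loop. */
static int64_t sketch_call_until_not_busy(sketch_opal_call call, void *ctx)
{
    int64_t rc;

    for (;;) {
        rc = call(ctx);
        if (rc != SKETCH_OPAL_BUSY)
            break;
        /* a real caller would normally back off briefly here before retrying */
    }
    return rc;
}

/* Test double: reports busy *ctx times, then succeeds. */
static int64_t sketch_busy_then_ok(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return SKETCH_OPAL_BUSY;
    }
    return SKETCH_OPAL_SUCCESS;
}

/* Test double: fails immediately with a non-busy error. */
static int64_t sketch_param_error(void *ctx)
{
    (void)ctx;
    return SKETCH_OPAL_PARAMETER;
}
```

Without the `rc != OPAL_BUSY` test, the loop body runs exactly once, which is the unconditional-break pattern that was reported.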
Thomas Huth	feafd13c96	KVM: PPC: Book3S PR: Do not fail emulation with mtspr/mfspr for unknown SPRs According to the PowerISA 2.07, mtspr and mfspr should not always generate an illegal instruction exception when being used with an undefined SPR, but rather treat the instruction as a NOP or inject a privilege exception in some cases, too - depending on the SPR number. Also turn the printk here into a ratelimited print statement, so that the guest can not flood the dmesg log of the host by issuing lots of illegal mtspr/mfspr instructions. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:32 +10:00
Alexey Kardashevskiy	121f80ba68	KVM: PPC: VFIO: Add in-kernel acceleration for VFIO This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT and H_STUFF_TCE requests targeting an IOMMU TCE table used for VFIO without passing them to user space, which saves time on switching to user space and back. This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM. KVM tries to handle a TCE request in real mode; if that fails, it passes the request to virtual mode to complete the operation. If the virtual mode handler fails, the request is passed to user space; this is not expected to happen, though. To avoid dealing with page use counters (which is tricky in real mode), this only accelerates SPAPR TCE IOMMU v2 clients which are required to pre-register the userspace memory. The very first TCE request will be handled in the VFIO SPAPR TCE driver anyway as the userspace view of the TCE table (iommu_table::it_userspace) is not allocated till the very first mapping happens and we cannot call vmalloc in real mode. If we fail to update a hardware IOMMU table for an unexpected reason, we just clear it and move on, as there is nothing we can really do about it - for example, if we hot plug a VFIO device to a guest, existing TCE tables will be mirrored automatically to the hardware and there is no interface to report to the guest about possible failures. This adds a new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd and associates a physical IOMMU table with the SPAPR TCE table (which is a guest view of the hardware IOMMU table). The iommu_table object is cached and referenced so we do not have to look it up in real mode. This does not implement the UNSET counterpart as there is no use for it - once the acceleration is enabled, the existing userspace won't disable it unless a VFIO container is destroyed; this adds necessary cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.
This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user space. This adds a real mode version of WARN_ON_ONCE() as the generic version causes problems with rcu_sched. Since we are testing what vmalloc_to_phys() returns in the code, this also adds a check for the already existing vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect(). This finally makes use of vfio_external_user_iommu_id() which was introduced quite some time ago and was considered for removal. Tests show that this patch increases transmission speed from 220MB/s to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb ethernet card). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:26 +10:00
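The real mode, then virtual mode, then user space fallback described in 121f80ba68 can be sketched as a simple dispatch chain. Everything here is hypothetical: the enums, the handler type and the test doubles stand in for the KVM handlers and their return codes, which are not reproduced.

```c
/* Where a TCE hypercall ended up being handled. */
enum sketch_tce_level { SKETCH_TCE_REAL_MODE, SKETCH_TCE_VIRTUAL_MODE, SKETCH_TCE_USERSPACE };

/* Result of one handler attempt: done, or "too hard here, try the next level". */
enum sketch_tce_rc { SKETCH_TCE_DONE, SKETCH_TCE_TOO_HARD };

typedef enum sketch_tce_rc (*sketch_tce_handler)(unsigned long ioba, unsigned long tce);

static enum sketch_tce_level sketch_put_tce(sketch_tce_handler real_mode,
                                            sketch_tce_handler virt_mode,
                                            unsigned long ioba, unsigned long tce)
{
    if (real_mode(ioba, tce) == SKETCH_TCE_DONE)
        return SKETCH_TCE_REAL_MODE;    /* cheapest path: no switch out of real mode */
    if (virt_mode(ioba, tce) == SKETCH_TCE_DONE)
        return SKETCH_TCE_VIRTUAL_MODE; /* e.g. work that cannot be done in real mode, such as vmalloc */
    return SKETCH_TCE_USERSPACE;        /* last resort; not expected to happen */
}

/* Test doubles. */
static enum sketch_tce_rc sketch_done(unsigned long ioba, unsigned long tce)
{
    (void)ioba; (void)tce;
    return SKETCH_TCE_DONE;
}

static enum sketch_tce_rc sketch_too_hard(unsigned long ioba, unsigned long tce)
{
    (void)ioba; (void)tce;
    return SKETCH_TCE_TOO_HARD;
}
```

Each level only runs when the cheaper one gives up, so the common case never leaves real mode.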
Alexey Kardashevskiy	b1af23d836	KVM: PPC: iommu: Unify TCE checking This reworks helpers for checking TCE update parameters in a way that they can be used in KVM. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:21 +10:00
Alexey Kardashevskiy	da6f59e192	KVM: PPC: Use preregistered memory API to access TCE list VFIO on sPAPR already implements guest memory pre-registration when the entire guest RAM gets pinned. This can be used to translate the physical address of a guest page containing the TCE list from H_PUT_TCE_INDIRECT. This makes use of the pre-registered memory API to access TCE list pages in order to avoid unnecessary locking on the KVM memory reverse map as we know that all of guest memory is pinned and we have a flat array mapping GPA to HPA which makes it simpler and quicker to index into that array (even with looking up the kernel page tables in vmalloc_to_phys) than it is to find the memslot, lock the rmap entry, look up the user page tables, and unlock the rmap entry. Note that the rmap pointer is initialized to NULL where declared (not in this patch). If a requested chunk of memory has not been preregistered, this will fall back to the non-preregistered case and lock rmap. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:16 +10:00
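Why a flat array is quick to index can be shown with a small translation helper. The region structure, the 64K page size and the names are assumptions for illustration; the kernel's pre-registration code and its exact address types are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 16  /* assumption: 64K pages */

/* Hypothetical stand-in for one pre-registered (pinned) userspace memory region. */
struct sketch_prereg {
    uint64_t ua;            /* userspace start address */
    uint64_t entries;       /* number of pages */
    const uint64_t *hpas;   /* host physical address of each pinned page */
};

/* Translate a userspace address to a host physical address by indexing the flat
 * array. Returns false if the address is outside the region, in which case the
 * caller would fall back to the memslot/rmap path. */
static bool sketch_ua_to_hpa(const struct sketch_prereg *r, uint64_t ua, uint64_t *hpa)
{
    uint64_t mask = (UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1;
    uint64_t idx;

    if (ua < r->ua)
        return false;
    idx = (ua - r->ua) >> SKETCH_PAGE_SHIFT;
    if (idx >= r->entries)
        return false;
    *hpa = r->hpas[idx] | (ua & mask);
    return true;
}
```

The lookup is a bounds check, a shift and an array read, with no locks taken; only addresses outside pre-registered memory need the slower rmap path.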
Alexey Kardashevskiy	503bfcbe18	KVM: PPC: Pass kvm* to kvmppc_find_table() The guest view TCE tables are per KVM anyway (not per VCPU) so pass kvm* there. This will be used in the following patches where we will be attaching VFIO containers to LIOBNs via ioctl() to KVM (rather than to VCPU). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:12 +10:00
Alexey Kardashevskiy	e91aa8e6ec	KVM: PPC: Enable IOMMU_API for KVM_BOOK3S_64 permanently It does not make much sense to have KVM in book3s-64 and not to have IOMMU bits for PCI pass through support as it costs little and allows VFIO to function on book3s KVM. Having IOMMU_API always enabled makes it unnecessary to have a lot of "#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With those ifdef's we could have only user space emulated devices accelerated (but not VFIO) which do not seem to be very useful. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:39:07 +10:00
Paul Mackerras	644d2d6fef	Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next This merges in the commits in the topic/ppc-kvm branch of the powerpc tree to get the changes to arch/powerpc which subsequent patches will rely on. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:38:33 +10:00
Alexey Kardashevskiy	3762d45aa7	KVM: PPC: Align the table size to system page size At the moment the userspace can request a table smaller than a page size and this value will be stored as kvmppc_spapr_tce_table::size. However the actual allocated size will still be aligned to the system page size as alloc_page() is used there. This aligns the table size up to the system page size. It should not change the existing behaviour but when in-kernel TCE acceleration patchset reaches the upstream kernel, this will allow small TCE tables to be accelerated as well: PCI IODA iommu_table allocator already aligns the size and, without this patch, an IOMMU group won't attach to LIOBN due to the mismatching table size. Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:38:20 +10:00
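A sketch of the rounding, assuming the table size is counted in 8-byte TCE entries (so a page holds page_size / 8 of them); the helper names are hypothetical and this is not the kernel macro used by the patch.

```c
#include <stdint.h>

#define SKETCH_TCE_SIZE 8u  /* assumption: each TCE is one 64-bit entry */

/* Round x up to a multiple of a, where a is a power of two. */
static uint64_t sketch_align_up(uint64_t x, uint64_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Round a requested table size, counted in TCEs, up to whole system pages. */
static uint64_t sketch_tce_table_size(uint64_t requested_tces, uint64_t page_size)
{
    return sketch_align_up(requested_tces, page_size / SKETCH_TCE_SIZE);
}
```

A request for a single TCE becomes 512 entries with 4K pages and 8192 with 64K pages, matching what a page-granular allocator hands back anyway.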
Alexey Kardashevskiy	96df226769	KVM: PPC: Book3S PR: Preserve storage control bits PR KVM page fault handler performs eaddr to pte translation for a guest, however kvmppc_mmu_book3s_64_xlate() does not preserve WIMG bits (storage control) in the kvmppc_pte struct. If PR KVM is running as a second level guest under HV KVM, and PR KVM tries inserting an HPT entry, this fails in HV KVM if it already has this mapping. This preserves WIMG bits between kvmppc_mmu_book3s_64_xlate() and kvmppc_mmu_map_page(). Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:38:14 +10:00
Alexey Kardashevskiy	bd9166ffe6	KVM: PPC: Book3S PR: Exit KVM on failed mapping At the moment kvmppc_mmu_map_page() returns -1 if mmu_hash_ops.hpte_insert() fails for any reason so the page fault handler resumes the guest and it faults on the same address again. This adds distinction to kvmppc_mmu_map_page() to return -EIO if mmu_hash_ops.hpte_insert() failed for a reason other than full pteg. At the moment only pSeries_lpar_hpte_insert() returns -2 if plpar_pte_enter() failed with a code other than H_PTEG_FULL. Other mmu_hash_ops.hpte_insert() instances can only fail with -1 "full pteg". With this change, if PR KVM fails to update HPT, it can signal the userspace about this instead of returning to guest and having the very same page fault over and over again. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:38:04 +10:00
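The return convention in the entry above can be expressed as a small mapping function. This is a hypothetical model of the described behaviour, not the kernel's kvmppc_mmu_map_page(): -1 from the insert hook means the PTE group was full, while any other negative value (such as -2 from the pSeries LPAR path) is a hard failure.

```c
#include <errno.h>

/* What the fault path reports for a given hpte_insert-style result (sketch). */
static int sketch_map_page_result(long hpte_insert_ret)
{
    if (hpte_insert_ret >= 0)
        return 0;     /* entry inserted */
    if (hpte_insert_ret == -1)
        return -1;    /* PTE group full: as before, resume the guest and let it refault */
    return -EIO;      /* any other failure: tell userspace instead of refaulting forever */
}
```

Only the "full" case keeps the old resume-and-refault behaviour; every other failure now reaches userspace.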
Alexey Kardashevskiy	9eecec126e	KVM: PPC: Book3S PR: Get rid of unused local variable @is_mmio has never been used since introduction in commit `2f4cf5e42d` ("Add book3s.c") from 2009. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:38:00 +10:00
Markus Elfring	37655490db	KVM: PPC: e500: Use kcalloc() in e500_mmu_host_init() * A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kcalloc". This issue was detected by using the Coccinelle software. * Replace the specification of a data type by a pointer dereference to make the corresponding size determination a bit safer according to the Linux coding style convention. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:37:55 +10:00
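The same pattern in user space, using calloc() as the counterpart of kcalloc(): the allocator takes the element count and element size separately, so it can reject an overflowing product, and it returns zeroed memory. The element type and helper name are hypothetical stand-ins, not the e500 code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type standing in for the per-entry reference structure. */
struct sketch_tlbe_ref {
    uint64_t pfn;
    uint32_t flags;
};

/* Array allocation in the style the patch adopts: count and element size are passed
 * separately, and sizeof(*refs) stays correct even if the pointer's type changes. */
static struct sketch_tlbe_ref *sketch_alloc_refs(size_t n)
{
    struct sketch_tlbe_ref *refs = calloc(n, sizeof(*refs));
    return refs;
}
```

The open-coded alternative, `malloc(n * sizeof(struct sketch_tlbe_ref))`, repeats the type name and lets the multiplication wrap unnoticed.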
Markus Elfring	a1c52e1c7c	KVM: PPC: Book3S HV: Use common error handling code in kvmppc_clr_passthru_irq() Add a jump target so that a bit of exception handling can be better reused at the end of this function. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:37:50 +10:00
Paul Mackerras	9b5ab00513	KVM: PPC: Add MMIO emulation for remaining floating-point instructions For completeness, this adds emulation of the lfiwax and lfiwzx instructions. With this, all floating-point load and store instructions as of Power ISA V2.07 are emulated. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:37:44 +10:00
Paul Mackerras	ceba57df43	KVM: PPC: Emulation for more integer loads and stores This adds emulation for the following integer loads and stores, thus enabling them to be used in a guest for accessing emulated MMIO locations. - lhaux - lwaux - lwzux - ldu - lwa - stdux - stwux - stdu - ldbrx - stdbrx Previously, most of these would cause an emulation failure exit to userspace, though ldu and lwa got treated incorrectly as ld, and stdu got treated incorrectly as std. This also tidies up some of the formatting and updates the comment listing instructions that still need to be implemented. With this, all integer loads and stores that are defined in the Power ISA v2.07 are emulated, except for those that are permitted to trap when used on cache-inhibited or write-through mappings (and which do in fact trap on POWER8), that is, lmw/stmw, lswi/stswi, lswx/stswx, lq/stq, and l[bhwdq]arx/st[bhwdq]cx. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:37:38 +10:00
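The ld/ldu/lwa and std/stdu mix-up comes from the DS instruction form: the primary opcode (58 for the loads, 62 for the stores) is shared, and the low two bits select the variant. A minimal decoder sketch follows; the enum and function names are hypothetical, and stq (opcode 62, XO 2) is deliberately left unhandled.

```c
#include <stdint.h>

enum sketch_ds_insn { SKETCH_LD, SKETCH_LDU, SKETCH_LWA, SKETCH_STD, SKETCH_STDU, SKETCH_OTHER };

/* DS-form: primary opcode in the top 6 bits, a 14-bit displacement (scaled by 4)
 * and a 2-bit extended opcode in the low bits. Ignoring those low bits is how
 * ldu/lwa can be mistaken for ld, and stdu for std. */
static enum sketch_ds_insn sketch_decode_ds(uint32_t insn)
{
    uint32_t opcode = insn >> 26;
    uint32_t xo = insn & 3;

    if (opcode == 58)
        return xo == 0 ? SKETCH_LD : xo == 1 ? SKETCH_LDU : xo == 2 ? SKETCH_LWA : SKETCH_OTHER;
    if (opcode == 62)
        return xo == 0 ? SKETCH_STD : xo == 1 ? SKETCH_STDU : SKETCH_OTHER; /* stq not handled */
    return SKETCH_OTHER;
}

/* Signed byte displacement: the low 2 bits are the XO field, not part of the offset. */
static int32_t sketch_ds_disp(uint32_t insn)
{
    int32_t d = (int32_t)(insn & 0xFFFCu);
    return (d & 0x8000) ? d - 0x10000 : d;
}
```

For example, 0xF821FF91 (the common `stdu r1,-112(r1)` stack frame push) decodes as stdu with a displacement of -112; a decoder that ignores the low bits would see a plain std.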
Alexey Kardashevskiy	91242fd1a3	KVM: PPC: Add MMIO emulation for stdx (store doubleword indexed) This adds missing stdx emulation for emulated MMIO accesses by KVM guests. This allows the Mellanox mlx5_core driver from recent kernels to work when MMIO emulation is enforced by userspace. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:37:33 +10:00
Bin Lu	6f63e81bda	KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions This patch provides the MMIO load/store emulation for instructions of 'double & vector unsigned char & vector signed char & vector unsigned short & vector signed short & vector unsigned int & vector signed int & vector double '. The instructions that this adds emulation for are: - ldx, ldux, lwax, - lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux, - stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx, - lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx, - stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x [paulus@ozlabs.org - some cleanups, fixes and rework, make it compile for Book E, fix build when PR KVM is built in] Signed-off-by: Bin Lu <lblulb@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 11:36:41 +10:00
Paul Mackerras	307d927967	KVM: PPC: Provide functions for queueing up FP/VEC/VSX unavailable interrupts This provides functions that can be used for generating interrupts indicating that a given functional unit (floating point, vector, or VSX) is unavailable. These functions will be used in instruction emulation code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-04-20 10:39:50 +10:00
Dan Williams	60fcd55cc2	axon_ram: add dax_operations support Set up a dax_device to have the same lifetime as the axon_ram block device and add a ->direct_access() method that is equivalent to axon_ram_direct_access(). Once fs/dax.c has been converted to use dax_operations the old axon_ram_direct_access() will be removed. Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-19 15:14:36 -07:00
Yongji Xie	3827463769	powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned Override pcibios_default_alignment() to set default alignment to PAGE_SIZE for all PCI devices on PowerNV platform. Thus sub-page BARs would not share a page and could be mapped into a guest when passed through with VFIO. Signed-off-by: Yongji Xie <elohimes@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2017-04-19 12:51:26 -05:00
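A simplified placement model shows why a page-sized default alignment keeps sub-page BARs apart. It assumes 64K pages and places BARs one after another at their alignment; it is a sketch of the effect, not the PCI core's resource allocator, and the names are hypothetical.

```c
#include <stdint.h>

/* Alignment a BAR gets: BARs are naturally aligned to their own size, and a
 * platform default alignment raises that floor. */
static uint64_t sketch_bar_alignment(uint64_t bar_size, uint64_t default_align)
{
    return bar_size > default_align ? bar_size : default_align;
}

/* Place a BAR at the first suitably aligned address at or after next_free.
 * Alignments are powers of two. */
static uint64_t sketch_place_bar(uint64_t next_free, uint64_t bar_size, uint64_t default_align)
{
    uint64_t a = sketch_bar_alignment(bar_size, default_align);
    return (next_free + a - 1) & ~(a - 1);
}
```

Two 4K BARs placed back to back land at 0x0 and 0x1000 and share one 64K page; with a 64K default alignment the second moves to 0x10000, so each can be mapped into a guest on its own.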
Nicholas Piggin	ca80d5d0a8	powerpc/64s: Remove SAO feature from Power9 DD1 Power9 DD1 does not implement SAO. Although it's not widely used, its presence or absence is visible to user space via arch_validate_prot() so it's moderately important that we get the value right. Fixes: `7dccfbc325` ("powerpc/book3s: Add a cpu table entry for different POWER9 revs") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:48:25 +10:00
Nicholas Piggin	2384d2d7ad	powerpc/64s: Remove ICSWX feature from Power9 Power9 does not implement the icswx instruction. This CPU feature is not visible to userspace and is only used in the CONFIG_PPC_ICSWX code, which is generally not enabled, and can only be triggered by other code using icswx, which should not happen on Power9 systems in the first place. So impact should be minimal. Fixes: `c3ab300ea5` ("powerpc: Add POWER9 cputable entry") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:21:50 +10:00
Madhavan Srinivasan	f2080b9ac3	powerpc/perf: Add Power8 mem_access event to sysfs Patch adds "mem_access" event to sysfs. This is not a raw event supported by the Power8 PMU as-is; instead, it is formed based on the raw event encoding specified in isa207-common.h. Primary PMU event used here is PM_MRK_INST_CMPL. This event tracks only the completed marked instructions. Random sampling mode (MMCRA[SM]) with Random Instruction Sampling (RIS) is enabled to mark the type of instructions. With Random sampling in RLS mode with PM_MRK_INST_CMPL event, the LDST/DATA_SRC fields in SIER identify the memory hierarchy level (e.g. L1, L2, etc.) that satisfied a data-cache miss for a marked instruction. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:23 +10:00
Madhavan Srinivasan	d148c94c27	powerpc/perf: Support to export SIERs bit in Power9 Patch to export SIER bits to userspace via perf_mem_data_src and perf_sample_data struct. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:23 +10:00
Madhavan Srinivasan	453ce7a943	powerpc/perf: Support to export SIERs bit in Power8 Patch to export SIER bits to userspace via perf_mem_data_src and perf_sample_data struct. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:22 +10:00
Madhavan Srinivasan	170a315f41	powerpc/perf: Support to export MMCRA[TEC*] field to userspace Threshold feature when used with MMCRA [Threshold Event Counter Event], MMCRA[Threshold Start event] and MMCRA[Threshold End event] will update MMCRA[Threashold Event Counter Exponent] and MMCRA[Threshold Event Counter Multiplier] with the corresponding threshold event count values. Patch to export MMCRA[TECX/TECM] to userspace in 'weight' field of struct perf_sample_data. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:22 +10:00
Madhavan Srinivasan	79e96f8f93	powerpc/perf: Export memory hierarchy info to user space The LDST field and DATA_SRC in SIER identifies the memory hierarchy level (eg: L1, L2 etc), from which a data-cache miss for a marked instruction was satisfied. Use the 'perf_mem_data_src' object to export this hierarchy level to user space. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:21 +10:00
Alexey Kardashevskiy	e889e96e98	powerpc/iommu: Do not call PageTransHuge() on tail pages The CMA pages migration code does not support compound pages at the moment, so it performs a few tests before proceeding to actual page migration. One of the tests, PageTransHuge(), has VM_BUG_ON_PAGE(PageTail()) as it is designed to be called on head pages only. Since we also test for PageCompound(), which covers both PageTail() and PageHead(), we can simplify the check by leaving just PageCompound() and therefore avoid a possible VM_BUG_ON_PAGE. Fixes: `2e5bbb5461` ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:20 +10:00
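Below is a simplified model of the check. It assumes PageTransHuge() was evaluated before PageCompound() in the old condition. The `sketch_*` names are hypothetical stand-ins for the kernel helpers, and a plain assert() plays the role of VM_BUG_ON_PAGE().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified page state; not the kernel's struct page. */
struct sketch_page {
    bool head; /* first page of a compound page */
    bool tail; /* any later page of a compound page */
};

bool sketch_PageCompound(const struct sketch_page *p)
{
    return p->head || p->tail; /* true for head and tail pages alike */
}

bool sketch_PageTransHuge(const struct sketch_page *p)
{
    assert(!p->tail); /* stands in for VM_BUG_ON_PAGE(PageTail(page), page) */
    return p->head;
}

/* Assumed shape of the old check: PageTransHuge() runs first, so a tail
 * page trips the assertion before PageCompound() is ever reached. */
bool sketch_reject_old(const struct sketch_page *p)
{
    return sketch_PageTransHuge(p) || sketch_PageCompound(p);
}

/* After the fix: PageCompound() alone rejects head and tail pages. */
bool sketch_reject_new(const struct sketch_page *p)
{
    return sketch_PageCompound(p);
}
```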
Aneesh Kumar K.V	321f7d29e5	powerpc/mmap: Any hint > 128TB searches the full VA space As part of the new large address space support, processes start out life with a 128TB virtual address space. However when calling mmap() a process can pass a hint address, and if that hint is > 128TB the kernel will use the full 512TB address space to try and satisfy the mmap() request. Currently we have a check that the hint is > 128TB and < 512TB (TASK_SIZE), which was added as an optimisation to avoid updating addr_limit unnecessarily and also to avoid calling slice_flush_segments() on all CPUs more than necessary. However this has the user-visible side effect that an mmap() hint above 512TB does not search the full address space unless a preceding mmap() used a hint value > 128TB && < 512TB. So fix it to treat any hint above 128TB as a hint to search the full address space: instead of checking the hint against TASK_SIZE, we check whether addr_limit is already == TASK_SIZE. This also brings the ABI in line with what is proposed on x86, i.e. that a hint address above 128TB, up to and including (2^64)-1, is an indication to search the full address space. Fixes: `f4ea6dcb08` (powerpc/mm: Enable mappings above 128TB) Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:19 +10:00
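The following sketch contrasts the old and new addr_limit updates. The constants (a 128TB default window and a 512TB TASK_SIZE) come from the description above. The struct and function names are hypothetical, and the slice_flush_segments() step is only noted in a comment.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DEFAULT_MAP_WINDOW (1ULL << 47) /* 128TB */
#define SKETCH_TASK_SIZE          (1ULL << 49) /* 512TB */

struct sketch_mm {
    uint64_t addr_limit; /* upper bound used for the free-space search */
};

/* Before the fix: only hints strictly between 128TB and 512TB widened the
 * limit, so a hint above 512TB left the search capped at 128TB. */
void sketch_update_limit_old(struct sketch_mm *mm, uint64_t hint)
{
    if (hint > SKETCH_DEFAULT_MAP_WINDOW && hint < SKETCH_TASK_SIZE)
        mm->addr_limit = SKETCH_TASK_SIZE;
}

/* After the fix: any hint above 128TB widens the limit. Comparing
 * addr_limit with TASK_SIZE only skips redundant updates. */
void sketch_update_limit(struct sketch_mm *mm, uint64_t hint)
{
    if (hint > SKETCH_DEFAULT_MAP_WINDOW && mm->addr_limit != SKETCH_TASK_SIZE) {
        mm->addr_limit = SKETCH_TASK_SIZE;
        /* The real code would also flush segments on all CPUs here. */
    }
}
```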
Nicholas Piggin	95dbdf4fa0	powerpc/64s: Minor fix for MCE TLB flush for radix The TLB flush for radix first flushes TLB for radix configuration, then flushes for hash configuration. The second flush is unnecessary but does not affect correctness. Fixes: `1a472c9dba` ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:18 +10:00
Aneesh Kumar K.V	be77e999e3	powerpc/mm/radix: Use mm->task_size for boundary checking instead of addr_limit We don't init addr_limit correctly for 32 bit applications. So default to using mm->task_size for boundary condition checking. We use addr_limit only to control the free space search. This makes sure that we do the right thing with 32 bit applications. We should consolidate the usage of TASK_SIZE/mm->task_size and mm->context.addr_limit later. This partially reverts commit `fbfef9027c` (powerpc/mm: Switch some TASK_SIZE checks to use mm_context addr_limit). Fixes: `fbfef9027c` ("powerpc/mm: Switch some TASK_SIZE checks to use mm_context addr_limit") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:18 +10:00
Nicholas Piggin	8d1b48ef58	powerpc/64s: Revert setting of LPCR[LPES] on POWER9 The XIVE enablement patches included a change to set the LPES (Logical Partitioning Environment Selector) bit (bit # 3) in LPCR (Logical Partitioning Control Register) on POWER9 hosts. This bit sets external interrupts to guest delivery mode, which uses SRR0/1. The host's EE interrupt handler is written to expect HSRR0/1 (for earlier CPUs). This should be fine because XIVE is configured not to deliver EEs to the host (Hypervisor Virtualization Interrupt is used instead) so the EE handler should never be executed. However a bug in interrupt controller code, hardware, or odd configuration of a simulator could result in the host getting an EE incorrectly. Keeping the EE delivery mode matching the host EE handler prevents strange crashes due to using the wrong exception registers. KVM will configure the LPCR to set LPES prior to running a guest so that EEs are delivered to the guest using SRR0/1. Fixes: `08a1e650cc` ("powerpc: Fixup LPCR:PECE and HEIC setting on POWER9") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Massage change log to avoid referring to LPES0 which is now renamed LPES] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-19 20:00:17 +10:00
Laura Abbott	f318dd083c	cma: Store a name in the cma structure Frameworks that may want to enumerate CMA heaps (e.g. Ion) will find it useful to have an explicit name attached to each region. Store the name in each CMA structure. Signed-off-by: Laura Abbott <labbott@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-04-18 20:41:12 +02:00
Paul E. McKenney	77e5849688	rcu: Make arch select smp_mb__after_unlock_lock() strength The definition of smp_mb__after_unlock_lock() is currently smp_mb() for CONFIG_PPC and a no-op otherwise. It would be better to instead provide an architecture-selectable Kconfig option, and select the strength of smp_mb__after_unlock_lock() based on that option. This commit therefore creates ARCH_WEAK_RELEASE_ACQUIRE, has PPC select it, and bases the definition of smp_mb__after_unlock_lock() on this new ARCH_WEAK_RELEASE_ACQUIRE Kconfig option. Reported-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Boqun Feng <boqun.feng@linux.vnet.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: <linuxppc-dev@lists.ozlabs.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org>	2017-04-18 11:20:15 -07:00
David Woodhouse	e854d8b2a8	PCI: Add arch_can_pci_mmap_io() on architectures which can mmap() I/O space This is relatively esoteric, and knowing that we don't have it makes life easier in some cases rather than just an eventual -EINVAL from pci_mmap_page_range(). Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2017-04-18 13:02:26 -05:00
David Woodhouse	11df19546f	PCI: Move multiple declarations of pci_mmap_page_range() to <linux/pci.h> We can declare it in <linux/pci.h> even on platforms where it isn't going to be defined. There's no need to have it littered through the various <asm/pci.h> files. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2017-04-18 13:02:11 -05:00
David Woodhouse	ae749c7ab4	PCI: Add arch_can_pci_mmap_wc() macro Most of the almost-identical versions of pci_mmap_page_range() silently ignore the 'write_combine' argument and give uncached mappings. Yet we allow the PCIIOC_WRITE_COMBINE ioctl in /proc/bus/pci, expose the 'resourceX_wc' file in sysfs, and allow an attempted mapping to apparently succeed. To fix this, introduce a macro arch_can_pci_mmap_wc() which indicates whether the platform can do a write-combining mapping. On x86 this ends up being pat_enabled(), while the few other platforms that support it can just set it to a literal '1'. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2017-04-18 13:01:42 -05:00
Michael Ellerman	be5c5e843c	powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y Prior to commit `2337d20728` ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts"), the branch from hmi_exception_early() to hmi_exception_realmode() was just a bl hmi_exception_realmode, which the linker would turn into a bl to the local entry point of hmi_exception_realmode. This was broken when CONFIG_RELOCATABLE=y because hmi_exception_realmode() is not in the low part of the kernel text that is copied down to 0x0. But in fixing that, we added a new bug on little endian kernels. Because the branch is now a bctrl when CONFIG_RELOCATABLE=y, we branch to the global entry point of hmi_exception_realmode(). The global entry point must be called with r12 containing the address of hmi_exception_realmode(), because it uses that value to calculate the TOC value (r2). This may manifest as a checkstop, because we take a junk value from r12 which came from HSRR1, add a small constant to it and then use that as the TOC pointer. The HSRR1 value will have 0x9 as the top nibble, which puts it above RAM and somewhere in MMIO space. Fix it by changing the BRANCH_LINK_TO_FAR() macro to always use r12 to load the label we're branching to. This means r12 will be set up correctly on LE, fixing this bug, and r12 is also volatile across function calls on BE so it's a good choice anyway. Fixes: `2337d20728` ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts") Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-18 20:19:52 +10:00
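At an ELFv2 global entry point, the TOC is derived as r12 plus a link-time offset, so a junk r12 yields a junk TOC. This sketch models that arithmetic in C. The function address, TOC value and HSRR1 value used with it are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Models the ELFv2 global-entry prologue
 *     addis r2,r12,(.TOC.-func)@ha
 *     addi  r2,r2,(.TOC.-func)@l
 * i.e. r2 = r12 + (TOC - func), where the offset is fixed at link time. */
uint64_t sketch_global_entry_toc(uint64_t r12, int64_t toc_minus_func)
{
    return r12 + (uint64_t)toc_minus_func;
}
```

Called with r12 equal to the function address, this gives the correct TOC. Called with an HSRR1-like value whose top nibble is 0x9, the "TOC" keeps that 0x9 top nibble, which matches the MMIO-range pointer described above.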
Ravi Bangoria	9e1ba4f27f	powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction If we set a kprobe on a 'stdu' instruction on powerpc64, we see a kernel OOPS: Bad kernel stack pointer cd93c840 at c000000000009868 Oops: Bad kernel stack pointer, sig: 6 [#1] ... GPR00: c000001fcd93cb30 00000000cd93c840 c0000000015c5e00 00000000cd93c840 ... NIP [c000000000009868] resume_kernel+0x2c/0x58 LR [c000000000006208] program_check_common+0x108/0x180 On a 64-bit system when the user probes on a 'stdu' instruction, the kernel does not emulate actual store in emulate_step() because it may corrupt the exception frame. So the kernel does the actual store operation in exception return code i.e. resume_kernel(). resume_kernel() loads the saved stack pointer from memory using lwz, which only loads the low 32-bits of the address, causing the kernel crash. Fix this by loading the 64-bit value instead. Fixes: `be96f63375` ("powerpc: Split out instruction analysis part of emulate_step()") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> [mpe: Change log massage, add stable tag] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-18 20:19:21 +10:00
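Below is a small model of the truncation. A 32-bit load keeps only the low word, zero-extended as in the 00000000cd93c840 register value above, while a 64-bit load keeps the whole pointer. The full saved stack pointer value used with it is hypothetical, chosen so that its low half matches the `cd93c840` in the oops.

```c
#include <assert.h>
#include <stdint.h>

/* lwz-like: only the low 32 bits of the saved pointer survive. */
uint64_t sketch_load_lwz(uint64_t saved_sp)
{
    return (uint32_t)saved_sp;
}

/* ld-like: the full 64-bit pointer is loaded. */
uint64_t sketch_load_ld(uint64_t saved_sp)
{
    return saved_sp;
}
```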
David S. Miller	6b6cbc1471	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts were simply overlapping changes. In the net/ipv4/route.c case the code had simply moved around a little bit and the same fix was made in both 'net' and 'net-next'. In the net/sched/sch_generic.c case a fix in 'net' happened at the same time that a new argument was added to qdisc_hash_add(). Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-15 21:16:30 -04:00
Thomas Gleixner	6d11b87d55	powerpc/smp: Replace open coded task affinity logic The init task invokes smp_ops->setup_cpu() from smp_cpus_done(). The init task can run on any online CPU at this point, but the setup_cpu() callback must be invoked on the boot CPU. This is achieved by temporarily setting the affinity of the calling user space thread to the requested CPU and resetting it to the original affinity afterwards. That's racy vs. CPU hotplug and concurrent affinity settings for that thread, resulting in code executing on the wrong CPU and overwriting the new affinity setting. That's actually not a problem in this context as neither CPU hotplug nor affinity settings can happen, but access to task_struct::cpus_allowed is about to be restricted. Replace it with a call to work_on_cpu_safe(), which achieves the same result. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Tejun Heo <tj@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/20170412201042.518053336@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-04-15 12:20:54 +02:00
Nicolai Stange	115631c350	powerpc/time: Set ->min_delta_ticks and ->max_delta_ticks In preparation for making the clockevents core NTP correction aware, all clockevent device drivers must set ->min_delta_ticks and ->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a clockevent device's rate is going to change dynamically and thus, the ratio of ns to ticks ceases to stay invariant. Make the powerpc arch's clockevent driver initialize these fields properly. This patch alone doesn't introduce any change in functionality as the clockevents core still looks exclusively at the (untouched) ->min_delta_ns and ->max_delta_ns. As soon as this has changed, a followup patch will purge the initialization of ->min_delta_ns and ->max_delta_ns from this driver. Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Oliver O'Halloran <oohall@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2017-04-14 13:11:10 -07:00
Michael Ellerman	270e2dc9b8	powerpc/pseries: Always enable SMP when building pseries The pseries platform supports Power4 and later CPUs, all of which are multithreaded and/or multicore. In practice no one ever builds a SMP=n kernel for these machines. So as we did for powernv, have the pseries platform imply SMP=y. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:37:23 +10:00
Michael Ellerman	40e275653e	powerpc/powernv: Always enable SMP when building powernv The powernv platform supports Power7 and later CPUs, all of which are multithreaded and multicore. As such we never build a SMP=n kernel for those machines, other than possibly for debugging or running in a simulator. In the debugging case we can get a similar effect by booting with nr_cpus=1, or there's always the option of building a custom kernel with SMP hacked out. For running in simulators the code size reduction from building without SMP is not particularly important, what matters is the number of instructions executed. A quick test shows that a SMP=y kernel takes ~6% more instructions to boot to a shell. Booting with nr_cpus=1 recovers about half that deficit. On the flip side, keeping the SMP=n kernel building can be a pain at times. And although we've mostly kept it building in recent years, no one is regularly testing that the SMP=n kernel actually boots and works well on these machines. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:37:17 +10:00
Michael Ellerman	ebbe9d7d3a	powerpc: Allow platforms to force-enable CONFIG_SMP Of the 64-bit Book3S platforms, only powermac supports booting on an actual non-SMP system. The other platforms can be built with SMP disabled, but it doesn't make a lot of sense given the CPUs they support are all multicore or multithreaded. So give platforms the option of forcing SMP=y. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:36:43 +10:00
Michael Ellerman	590c369e7e	powerpc: Drop include of linux/io.h from asm/io.h Currently powerpc's asm/io.h includes linux/io.h, and linux/io.h includes asm/io.h. This can cause problems because depending on which is included first the order of definitions between the two files will change. The include of linux/io.h was added back in 2008 in commit `b41e5fffe8` ("[POWERPC] devres: Add devm_ioremap_prot()"). It's not entirely clear it was needed then, but devm_ioremap_prot() has since been removed entirely as unused, in `dedd24a12f` ("powerpc: Remove unused devm_ioremap_prot()"). It therefore seems to be unnecessary and can potentially cause problems, so remove the include of linux/io.h from asm/io.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:34:35 +10:00
Nicholas Piggin	6b3edefefa	powerpc/powernv: POWER9 support for msgsnd/doorbell IPI POWER9 requires msgsync for receiver-side synchronization, and a DD1 workaround restricts IPIs to core-local. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Drop no longer needed asm feature macro changes] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:34:34 +10:00
Nicholas Piggin	a5adf28246	powerpc/64s: Avoid a branch for ppc_msgsnd IPIs are a pretty hot path and we already have the ability to do asm feature patching, so use it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change log detail] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:34:34 +10:00
Nicholas Piggin	b87ac02183	powerpc: Introduce msgsnd/doorbell barrier primitives POWER9 changes requirements and adds new instructions for synchronization. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:34:33 +10:00
Nicholas Piggin	b866cc2199	powerpc: Change the doorbell IPI calling convention Change the doorbell callers to know about their msgsnd addressing, rather than have them set a per-cpu target data tag at boot that gets sent to the cause_ipi functions. The data is only used for doorbell IPI functions, no other IPI types, so it makes sense to keep that detail local to doorbell. Have the platform code understand doorbell IPIs, rather than the interrupt controller code understand them. Platform code can look at capabilities it has available and decide which to use. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:34:33 +10:00
Nicholas Piggin	9b7ff0c658	powerpc/64s: Add SCV FSCR bit for ISA v3.0 Add the bit definition and use it in facility_unavailable_exception() so we can intelligently report the cause if we take a fault for SCV. This doesn't actually enable SCV. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Drop whitespace changes to the existing entries, flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:34:32 +10:00
Nicholas Piggin	794464f4de	powerpc/64s: Add msgp facility unavailable log string Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:34:32 +10:00
Aneesh Kumar K.V	5f8122611b	powerpc/mm/hash: Don't open code VMALLOC_INDEX We have a #define for it, so use it. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-13 23:34:31 +10:00
Rashmica Gupta	9e4114b391	powerpc/mm: Fix hash table dump when memory is not contiguous The current behaviour of the hash table dump assumes that memory is contiguous and iterates from the start of memory to (start + size of memory). When memory isn't physically contiguous, this doesn't work. If memory exists at 0-5 GB and 6-10 GB then the current approach will check if entries exist in the hash table from 0GB to 9GB. This patch changes the behaviour to iterate over any holes up to the end of memory. Fixes: `1515ab9321` ("powerpc/mm: Dump hash table") Signed-off-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-12 23:03:32 +10:00
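The 0-5GB / 6-10GB example can be checked with a short sketch. It assumes the regions are sorted by start address. The region table and both helpers are hypothetical and only reproduce the end-of-scan arithmetic, not the real dumper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GB (1ULL << 30)

/* A populated physical memory range; regions are assumed sorted by start. */
struct sketch_region {
    uint64_t start, size;
};

/* Old behaviour: start of memory plus the total amount of memory, which
 * stops short of the real end whenever there are holes. */
uint64_t sketch_scan_end_old(const struct sketch_region *r, size_t n)
{
    uint64_t total = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        total += r[i].size;
    return r[0].start + total;
}

/* New behaviour: walk up to the end of the last region, holes included. */
uint64_t sketch_scan_end_new(const struct sketch_region *r, size_t n)
{
    assert(n > 0);
    return r[n - 1].start + r[n - 1].size;
}
```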
Oliver O'Halloran	aaa2295292	powerpc/mm: Add physical address to Linux page table dump The current page table dumper scans the Linux page tables and coalesces mappings with adjacent virtual addresses and similar PTE flags. This behaviour is somewhat broken when you consider the IOREMAP space where entirely unrelated mappings will appear to be virtually contiguous. This patch modifies the range coalescing so that only ranges that are both physically and virtually contiguous are combined. This patch also adds to the dump output the physical address at the start of each range. Fixes: `8eb07b1870` ("powerpc/mm: Dump linux pagetables") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> [mpe: Print the physical address with 0x like the other addresses] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-12 23:00:25 +10:00
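The new coalescing rule amounts to a three-way test. The record layout and `sketch_can_merge()` below are hypothetical. They are only meant to show that virtual contiguity alone, as with unrelated IOREMAP mappings, is no longer enough to merge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dump-range record: start addresses, length and PTE flags. */
struct sketch_range {
    uint64_t va, pa, size, flags;
};

/* A page at (va, pa) extends the current range only if it continues the
 * range both virtually and physically and carries the same flags. */
bool sketch_can_merge(const struct sketch_range *cur,
                      uint64_t va, uint64_t pa, uint64_t flags)
{
    return cur->flags == flags &&
           cur->va + cur->size == va &&
           cur->pa + cur->size == pa;
}
```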
Oliver O'Halloran	70538eaa70	powerpc/mm: Fix missing _PAGE_NON_IDEMPOTENT in pgtable dump On Book3s we have two PTE flags used to mark cache-inhibited mappings: _PAGE_TOLERANT and _PAGE_NON_IDEMPOTENT. Currently the kernel page table dumper only looks at the generic _PAGE_NO_CACHE which is defined to be _PAGE_TOLERANT. This patch modifies the dumper so both flags are shown in the dump. Fixes: `8eb07b1870` ("powerpc/mm: Dump linux pagetables") Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-12 22:47:27 +10:00
Balbir Singh	9c355917fc	powerpc/tracing: Allow tracing of mmap syscalls Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the syscall tracing machinery. This means users are not able to see the execution of mmap() syscalls using the syscall tracer. Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the meta-data associated with these syscalls is visible to the syscall tracer. A side-effect of this change is that the return type has changed from unsigned long to long. However this should have no effect, the only code in the kernel which uses the result of these syscalls is in the syscall return path, which is written in asm and treats the result as unsigned regardless. Example output: cat-3399 [001] .... 196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000) cat-3399 [001] .... 196.542443: sys_mmap -> 0x7fff922a0000 cat-3399 [001] .... 196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c) cat-3399 [001] .... 196.542677: sys_munmap -> 0x0 Signed-off-by: Balbir Singh <bsingharora@gmail.com> [mpe: Massage change log, add detail on return type change] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-12 22:32:43 +10:00
Michael Ellerman	03dfee6d5f	powerpc/mm: Fix swapper_pg_dir size on 64-bit hash w/64K pages Recently in commit `f6eedbba7a` ("powerpc/mm/hash: Increase VA range to 128TB"), we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong. The end result is that the PGD (Page Global Directory, ie top level page table) of the kernel (aka. swapper_pg_dir), is too small. This generally doesn't lead to a crash, as we don't use the full range in normal operation. However if we try to dump the kernel pagetables we can trigger a crash because we walk off the end of the pgd into other memory and eventually try to dereference something bogus: $ cat /sys/kernel/debug/kernel_pagetables Unable to handle kernel paging request for data at address 0xe8fece0000000000 Faulting instruction address: 0xc000000000072314 cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890] pc: c000000000072314: ptdump_show+0x164/0x430 lr: c000000000072550: ptdump_show+0x3a0/0x430 dar: e802cf0000000000 seq_read+0xf8/0x560 full_proxy_read+0x84/0xc0 __vfs_read+0x6c/0x1d0 vfs_read+0xbc/0x1b0 SyS_read+0x6c/0x110 system_call+0x38/0xfc The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move the calculation into asm-offsets.c where we can do it easily using max(). Fixes: `f6eedbba7a` ("powerpc/mm/hash: Increase VA range to 128TB") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-12 22:32:43 +10:00
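Computing the PGD size from the larger of the two index sizes can be sketched with a plain max. The 8-byte entry size is an assumption (a 64-bit pgd_t). The `SKETCH_*` macros are illustrative, not the definitions added to asm-offsets.c.

```c
#include <assert.h>
#include <stdint.h>

/* Index sizes quoted in the commit for 64K pages. */
#define SKETCH_H_PGD_INDEX_SIZE     15
#define SKETCH_RADIX_PGD_INDEX_SIZE 13

#define SKETCH_MAX(a, b) ((a) > (b) ? (a) : (b))
#define SKETCH_MAX_PGD_INDEX_SIZE \
    SKETCH_MAX(SKETCH_H_PGD_INDEX_SIZE, SKETCH_RADIX_PGD_INDEX_SIZE)

/* Bytes needed for a PGD with 2^index_size entries of entry_size bytes. */
uint64_t sketch_pgd_bytes(unsigned index_size, uint64_t entry_size)
{
    assert(index_size < 64);
    return entry_size << index_size;
}
```

Sizing the table from index size 13 gives 64KB, a quarter of the 256KB the hash layout needs with index size 15. That is consistent with the dumper walking off the end of the PGD.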
Michael Ellerman	3c19d5ada1	Merge branch 'topic/xive' (early part) into next This merges the arch part of the XIVE support, leaving the final commit with the KVM specific pieces dangling on the branch for Paul to merge via the kvm-ppc tree.	2017-04-12 22:31:37 +10:00
Gautham R. Shenoy	17ed4c8f81	powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1 POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up from stop 0,1,2 with ESL=1 can end up being misplaced in the core. Thus the HSPRG0 of a thread waking up from stop can contain the paca pointer of its sibling. This patch implements a context recovery framework within the threads of a core, by provisioning space in paca_struct for saving every sibling thread's paca pointer. Basically, we should be able to arrive at the right paca pointer from any of the threads' existing paca pointers. At bootup, during powernv idle-init, we save the paca address of every CPU in each of its siblings' paca_struct, in the slot corresponding to this CPU's index in the core. On wakeup from a stop, the thread will determine its index in the core from the TIR register and recover its PACA pointer by indexing into the correct slot in the provisioned space in the current PACA. Furthermore, ensure that the NVGPRs are restored from the stack on the way out by setting NAPSTATELOST in the paca. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Call it a bug] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 08:45:09 +10:00
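Below is a simplified model of the recovery scheme. Each paca carries a table of its siblings' paca pointers, so indexing any sibling's table with the thread's own index yields the right paca. SMT4 cores, the struct layout and the function names are assumptions, and the TIR read is replaced by a plain parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed SMT4 core for this sketch. */
#define SKETCH_THREADS_PER_CORE 4

/* Hypothetical, heavily reduced paca. */
struct sketch_paca {
    int cpu;
    /* Slot i holds the paca of the thread whose index in the core is i. */
    struct sketch_paca *sibling_pacas[SKETCH_THREADS_PER_CORE];
};

/* Boot-time setup: every thread records every sibling's paca. */
void sketch_init_core(struct sketch_paca core[SKETCH_THREADS_PER_CORE])
{
    for (int i = 0; i < SKETCH_THREADS_PER_CORE; i++)
        for (int j = 0; j < SKETCH_THREADS_PER_CORE; j++)
            core[i].sibling_pacas[j] = &core[j];
}

/* Wakeup: HSPRG0 may point at a sibling's paca, but indexing its table
 * with this thread's index (read from TIR in the real code) still finds
 * the correct paca. */
struct sketch_paca *sketch_recover_paca(struct sketch_paca *from_hsprg0,
                                        unsigned tir)
{
    assert(tir < SKETCH_THREADS_PER_CORE);
    return from_hsprg0->sibling_pacas[tir];
}
```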
Gautham R. Shenoy	f3b3f28493	powerpc/powernv/idle: Don't override default/deepest directly in kernel Currently during idle-init on power9, if we don't find suitable stop states in the device tree that can be used as the default_stop/deepest_stop, we set stop0 (ESL=1,EC=1) as the default stop state psscr to be used by power9_idle and deepest stop state which is used by CPU-Hotplug. However, if the platform firmware has not configured or enabled a stop state, the kernel should not make any assumptions or fall back to a default choice. If the kernel uses a stop state that is not configured by the platform firmware, it may lead to further failures which should be avoided. In this patch, we modify the init code to ensure that the kernel uses only the stop states exposed by the firmware through the device tree. When a suitable default stop state isn't found, we disable ppc_md.power_save for power9. Similarly, when a suitable deepest_stop_state is not found in the device tree exported by the firmware, fall back to the default busy-wait loop in the CPU-Hotplug code. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 08:45:09 +10:00
Gautham R. Shenoy	9006123157	powerpc/powernv/smp: Add busy-wait loop as fall back for CPU-Hotplug Currently, the powernv cpu-offline function assumes that platform idle states such as stop on POWER9, winkle/sleep/nap on POWER8 are always available. On POWER8, it picks nap as the default state if other deep idle states like sleep/winkle are not available and enabled in the platform. On POWER9, nap is not available and all idle states are managed by the STOP instruction. The parameters to the idle state are passed through the processor stop status control register (PSSCR). Hence executing STOP takes its parameters from the current PSSCR. We do not want to make any assumptions in kernel on what STOP states and PSSCR features are configured by the platform. Ideally platform will configure a good set of stop states that can be used in the kernel. We would like to start with a clean slate if the platform chooses not to configure any state, or if there is an error in platform firmware that leads to no stop states being configured or allowed to be requested. This patch adds a fallback method for CPU-Hotplug that is similar to the snooze loop at idle, where the threads are left to spin at low priority and hence reduce the cycles consumed. This is a safe fallback mechanism for the case where no stop state can be requested because the platform firmware did not configure any, most likely due to an error condition. Requesting a stop state when the platform has not configured or enabled it would lead to further error conditions which could be difficult to debug. [Changelog written with inputs from svaidy@linux.vnet.ibm.com] Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 08:45:09 +10:00
Gautham R. Shenoy	a7cd88da97	powerpc/powernv: Move CPU-Offline idle state invocation from smp.c to idle.c Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which transitions the CPU to the deepest available platform idle state to a new function named pnv_cpu_offline() in powernv/idle.c. The rationale behind this code movement is that the data required to determine the deepest available platform state resides in powernv/idle.c. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 08:45:09 +10:00
Anshuman Khandual	2c9faa7675	powerpc/hugetlb: Add ABI defines for supported HugeTLB page sizes Add user space exported API definitions for 512KB, 1MB, 2MB, 8MB, 16MB, 1GB, 16GB non default huge page sizes to be used with mmap() system call. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Reword the comment to emphasise that these are only needed to use the non-default huge page size, and updated the change log.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 07:46:06 +10:00
Anshuman Khandual	ea61455574	powerpc/mm: Remove redundant initmem information from log Generic core VM already prints this information in the log buffer, hence there is no need for a second print. This just removes the second print from the arch powerpc NUMA init path. Before the patch: $ dmesg | grep "Initmem" numa: Initmem setup node 0 [mem 0x00000000-0xffffffff] numa: Initmem setup node 1 [mem 0x100000000-0x1ffffffff] numa: Initmem setup node 2 [mem 0x200000000-0x2ffffffff] numa: Initmem setup node 3 [mem 0x300000000-0x3ffffffff] numa: Initmem setup node 4 [mem 0x400000000-0x4ffffffff] numa: Initmem setup node 5 [mem 0x500000000-0x5ffffffff] numa: Initmem setup node 6 [mem 0x600000000-0x6ffffffff] numa: Initmem setup node 7 [mem 0x700000000-0x7ffffffff] Initmem setup node 0 [mem 0x0000000000000000-0x00000000ffffffff] Initmem setup node 1 [mem 0x0000000100000000-0x00000001ffffffff] Initmem setup node 2 [mem 0x0000000200000000-0x00000002ffffffff] Initmem setup node 3 [mem 0x0000000300000000-0x00000003ffffffff] Initmem setup node 4 [mem 0x0000000400000000-0x00000004ffffffff] Initmem setup node 5 [mem 0x0000000500000000-0x00000005ffffffff] Initmem setup node 6 [mem 0x0000000600000000-0x00000006ffffffff] Initmem setup node 7 [mem 0x0000000700000000-0x00000007ffffffff] After the patch, only the latter set is printed. Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 07:46:05 +10:00
Michael Ellerman	7b3912f422	powerpc: Make sparsemem the default on 64-bit Book3S Make sparsemem the default on all 64-bit Book3S platforms. It already is for pseries and ps3, and we need to enable it for powernv because on POWER9 memory between chips is discontiguous. For the other platforms sparsemem should work fine, though it might add a small amount of overhead. We can always force FLATMEM in the defconfigs if necessary. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 07:46:05 +10:00
Michael Ellerman	4868e3508d	powerpc/nohash: Fix use of mmu_has_feature() in setup_initial_memory_limit() setup_initial_memory_limit() is called from early_init_devtree(), which runs prior to feature patching. If the kernel is built with CONFIG_JUMP_LABEL=y and CONFIG_JUMP_LABEL_FEATURE_CHECKS=y then we will potentially get the wrong value. If we also have CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG=y we get a warning and backtrace: Warning! mmu_has_feature() used prior to jump label init! CPU: 0 PID: 0 Comm: swapper Not tainted 4.11.0-rc4-gccN-next-20170331-g6af2434 #1 Call Trace: [c000000000fc3d50] [c000000000a26c30] .dump_stack+0xa8/0xe8 (unreliable) [c000000000fc3de0] [c00000000002e6b8] .setup_initial_memory_limit+0xa4/0x104 [c000000000fc3e60] [c000000000d5c23c] .early_init_devtree+0xd0/0x2f8 [c000000000fc3f00] [c000000000d5d3b0] .early_setup+0x90/0x11c [c000000000fc3f90] [c000000000000520] start_here_multiplatform+0x68/0x80 Fix it by using early_mmu_has_feature(). Fixes: `c12e6f24d4` ("powerpc: Add option to use jump label for mmu_has_feature()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 07:46:04 +10:00
Michael Ellerman	3ae05fb3cc	powerpc: Remove unnecessary includes of asm/debug.h These files don't seem to have any need for asm/debug.h, now that all it includes are the debugger hooks and breakpoint definitions. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 07:46:04 +10:00
Michael Ellerman	7644d5819c	powerpc: Create asm/debugfs.h and move powerpc_debugfs_root there powerpc_debugfs_root is the dentry representing the root of the "powerpc" directory tree in debugfs. Currently it sits in asm/debug.h, along with some other things that have "debug" in the name, but are otherwise unrelated. Pull it out into a separate header, which also includes linux/debugfs.h, and convert all the users to include debugfs.h instead of debug.h. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 07:46:03 +10:00
Alistair Popple	abfe8026b5	powerpc/powernv: Require MMU_NOTIFIER to fix NPU build In the recent commit `1ab66d1fba` ("powerpc/powernv: Introduce address translation services for Nvlink2") the NPU code gained a dependency on MMU notifiers. All our defconfigs have KVM enabled, which selects MMU_NOTIFIER, but if KVM is not enabled then the build breaks. Fix it by always selecting MMU_NOTIFIER when we're building powernv. Fixes: `1ab66d1fba` ("powerpc/powernv: Introduce address translation services for Nvlink2") Signed-off-by: Alistair Popple <alistair@popple.id.au> [mpe: Reword change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 07:46:03 +10:00
Aneesh Kumar K.V	f7327e0ba3	powerpc/mm/radix: Remove unnecessary ptesync For a tlbiel with pid, we need to issue tlbiel with set number encoded. We don't need to do ptesync for each of those. Instead we need one for the entire tlbiel pid operation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 07:46:02 +10:00
Aneesh Kumar K.V	f6b0df55ca	powerpc/mm/radix: Don't do page walk cache flush when doing full mm flush For fullmm tlb flush, we do a flush with RIC_FLUSH_ALL which will invalidate all related caches (radix__tlb_flush()). Hence the pwc flush is not needed. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-11 07:46:02 +10:00
Borislav Petkov	e3c4ff6d8c	EDAC: Remove EDAC_MM_EDAC Move all the EDAC core functionality behind CONFIG_EDAC and get rid of that indirection. Update defconfigs which had it. While at it, fix dependencies such that EDAC depends on RAS for the tracepoints. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: linux-edac@vger.kernel.org	2017-04-10 17:14:41 +02:00
Benjamin Herrenschmidt	08a1e650cc	powerpc: Fixup LPCR:PECE and HEIC setting on POWER9 We need to set LPES in order for normal external interrupts (0x500) to be directed to the guest while running in guest state. We also need HEIC set to prevent them from being sent to the host while in host state. With XIVE the host never gets one of these and wouldn't know how to handle it. All host external interrupts come in via the new hypervisor virtualization interrupts vector. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-10 21:43:17 +10:00
Benjamin Herrenschmidt	d381d7caf8	powerpc: Consolidate variants of real-mode MMIOs We have all sorts of variants of MMIO accessors for the real mode instructions. This creates a clean set of accessors based on Linux normal naming conventions, replacing all occurrences of the old ones in the tree. I have purposefully removed the "out/in" variants in favor of only including __raw variants. Any code using these is already pretty much hand tuned to operate in a very specific environment. I've fixed up the 2 users (only one of them actually needed a barrier in the first place). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-10 21:43:16 +10:00
Benjamin Herrenschmidt	f50d6bd344	powerpc/kvm: Remove obsolete kvm_vm_ioctl_xics_irq declaration The function doesn't exist anymore Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-10 21:43:16 +10:00
Benjamin Herrenschmidt	936774cd3f	powerpc/kvm: Make kvmppc_xics_create_icp static It's only used within the same file it's defined Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-10 21:43:15 +10:00
Benjamin Herrenschmidt	d3989143d0	powerpc/kvm: Massage order of #include We traditionally have linux/ before asm/ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-10 21:43:15 +10:00
Benjamin Herrenschmidt	243e25112d	powerpc/xive: Native exploitation of the XIVE interrupt controller The XIVE interrupt controller is the new interrupt controller found in POWER9. It supports advanced virtualization capabilities among other things. Currently we use a set of firmware calls that simulate the old "XICS" interrupt controller but this is fairly inefficient. This adds the framework for using XIVE along with a native backend which uses OPAL for configuration. Later, a backend allowing the use in a KVM or PowerVM guest will also be provided. This disables some fast paths for interrupts in KVM when XIVE is enabled, as these rely on the firmware emulation code which is no longer available when the XIVE is used natively by Linux. A later patch will make KVM also directly exploit the XIVE, thus recovering the lost performance (and more). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Fixup pr_xxx("XIVE:"...), don't split pr_xxx() strings, tweak Kconfig so XIVE_NATIVE selects XIVE and depends on POWERNV, fix build errors when SMP=n, fold in fixes from Ben: Don't call cpu_online() on an invalid CPU number Fix irq target selection returning out of bounds cpu# Extra sanity checks on cpu numbers ] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-10 21:41:34 +10:00
Linus Torvalds	894ca30cf6	powerpc fixes for 4.11 #7 Headed to stable: - disable HFSCR[TM] if TM is not supported, fixes a potential host kernel crash triggered by a hostile guest, but only in configurations that no one uses - don't try to fix up misaligned load-with-reservation instructions - fix flush_(d|i)cache_range() called from modules on little endian kernels - add missing global TLB invalidate if cxl is active - fix missing preempt_disable() in crc32c-vpmsum And a fix for selftests build changes that went in this release: - selftests/powerpc: Fix standalone powerpc build Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran, Paul Mackerras. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJY6LIKAAoJEFHr6jzI4aWAhfcQAKORHx/tJf9w8KqcfSfKfeEL O8cZEl5/N3ArNXVM5J5QK5KnMVHnoWWR3FWYwntOjt3RJywjJYJ02YvhOVvt4q+M YinRS34KzAhnT1f526zx97v0BGqi//UJamrcFBUBTd4rLuHGbol7fdtWHVrsMYa0 KWQ+ooPLEpGDk4I3sDz37yeJBQXVpyhC/UF8vzHpvHGPvIQ8Dw8rfWwOZ0HooJuZ ewKdkeIsYF8SrM461c1GhOI0VXB0q+CMn9mzIaEKMuZMhHDKyiaM5rm8mWXapzcT HsCQKlF9X9YHAbhbSbz9DGvNCEYaW7T4vnudSNHjQaAJlA4HsmeRwWXy4+zqZuPc rIbRIFZAyV3wYowN7j3P6Se3lLBDMmlHZvVkygJnwoaR4rmoujePGwdAv8ZH4Udn hrbieC41HKVxcm5t3whIDOcHmxaAo1MDqmrVhyxJSjgnkdBtN/gnZXvHDb0VeOJV 9wFGGE8WvMXnTKEcjM2l+a14CuOrV/wRbHQ1B1O0Kfk613cPrukMYab6eLPqyJzF lmkCm1o46bib5oBOmvlqK+5oVuwNyfHmJSzvL+VOylhLVbJPmFJUhHQFssCvsTUf k36ZAUxH4fbz1TzAPipXl+wrkE/yzthGmA9FTC9hLkYE/rzvrZt9IKowFw1mq5n/ 2zFabXQBl5JBQ4hdL54f =bTuf -----END PGP SIGNATURE----- Merge tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 4.11: Headed to stable: - disable HFSCR[TM] if TM is not supported, fixes a potential host kernel crash triggered by a hostile guest, but only in configurations that no one uses - don't try to fix up misaligned load-with-reservation instructions - fix flush_(d|i)cache_range() called from modules on little endian kernels - add missing global TLB invalidate if cxl is active - fix missing preempt_disable() in crc32c-vpmsum And a fix for selftests build changes that went in this release: - selftests/powerpc: Fix standalone powerpc build Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran, Paul Mackerras" * tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable() powerpc/mm: Add missing global TLB invalidate if cxl is active powerpc/64: Fix flush_(d|i)cache_range() called from modules powerpc: Don't try to fix up misaligned load-with-reservation instructions powerpc: Disable HFSCR[TM] if TM is not supported selftests/powerpc: Fix standalone powerpc build	2017-04-08 11:06:12 -07:00
Chenbo Feng	5daab9db7b	New getsockopt option to get socket cookie Introduce a new getsockopt operation to retrieve the socket cookie for a specific socket based on the socket fd. It returns a unique non-decreasing cookie for each socket. Tested: https://android-review.googlesource.com/#/c/358163/ Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Chenbo Feng <fengc@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2017-04-08 08:07:01 -07:00
Paolo Bonzini	4b4357e025	kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET public Its value has never changed; we might as well make it part of the ABI instead of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO). Because PPC does not always make MMIO available, the code has to be made dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-07 16:49:01 +02:00
Paolo Bonzini	3042255899	kvm: make KVM_CAP_COALESCED_MMIO architecture agnostic Remove code from architecture files that can be moved to virt/kvm, since there is already common code for coalesced MMIO. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Removed a pointless 'break' after 'return'.] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2017-04-07 16:49:00 +02:00
Michael Ellerman	4749228f02	powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable() In crc32c_vpmsum() we call enable_kernel_altivec() without first disabling preemption, which is not allowed: WARNING: CPU: 9 PID: 2949 at ../arch/powerpc/kernel/process.c:277 enable_kernel_altivec+0x100/0x120 Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c vmx_crypto ... CPU: 9 PID: 2949 Comm: docker Not tainted 4.11.0-rc5-compiler_gcc-6.3.1-00033-g308ac7563944 #381 ... NIP [c00000000001e320] enable_kernel_altivec+0x100/0x120 LR [d000000003df0910] crc32c_vpmsum+0x108/0x150 [crc32c_vpmsum] Call Trace: 0xc138fd09 (unreliable) crc32c_vpmsum+0x108/0x150 [crc32c_vpmsum] crc32c_vpmsum_update+0x3c/0x60 [crc32c_vpmsum] crypto_shash_update+0x88/0x1c0 crc32c+0x64/0x90 [libcrc32c] dm_bm_checksum+0x48/0x80 [dm_persistent_data] sb_check+0x84/0x120 [dm_thin_pool] dm_bm_validate_buffer.isra.0+0xc0/0x1b0 [dm_persistent_data] dm_bm_read_lock+0x80/0xf0 [dm_persistent_data] __create_persistent_data_objects+0x16c/0x810 [dm_thin_pool] dm_pool_metadata_open+0xb0/0x1a0 [dm_thin_pool] pool_ctr+0x4cc/0xb60 [dm_thin_pool] dm_table_add_target+0x16c/0x3c0 table_load+0x184/0x400 ctl_ioctl+0x2f0/0x560 dm_ctl_ioctl+0x38/0x50 do_vfs_ioctl+0xd8/0x920 SyS_ioctl+0x68/0xc0 system_call+0x38/0xfc It used to be sufficient just to call pagefault_disable(), because that also disabled preemption. But the two were decoupled in commit `8222dbe21e` ("sched/preempt, mm/fault: Decouple preemption from the page fault logic") in mid 2015. So add the missing preempt_disable/enable(). We should also call disable_kernel_fp(), although it does nothing by default, there is a debug switch to make it active and all enables should be paired with disables. Fixes: `6dd7a82cc5` ("crypto: powerpc - Add POWER8 optimised crc32c") Cc: stable@vger.kernel.org # v4.8+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-07 21:12:58 +10:00
Benjamin Herrenschmidt	a978e13965	powerpc/smp: Remove migrate_irq() custom implementation Some powerpc platforms use this to move IRQs away from a CPU being unplugged. This function has several bugs such as not taking the right locks or failing to NULL check pointers. There's a new generic function doing exactly the same thing without all the bugs, so let's use it instead. mpe: The obvious place for the select of GENERIC_IRQ_MIGRATION is on HOTPLUG_CPU, but that doesn't work. On some configs PM_SLEEP_SMP will select HOTPLUG_CPU even though its dependencies are not met, which means the select of GENERIC_IRQ_MIGRATION doesn't happen. That leads to the build breaking. Fix it by moving the select of GENERIC_IRQ_MIGRATION to SMP. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-04-07 12:01:27 +10:00
Al Viro	fccfb99508	Merge commit 'b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e' into uaccess.ia64 backmerge of mainline ia64 fix	2017-04-06 19:35:03 -04:00
Al Viro	3448890c32	powerpc: get rid of zeroing, switch to RAW_COPY_USER Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-04-06 15:08:42 -04:00


16495 Commits