linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Linus Torvalds	49a695ba72	powerpc updates for 4.17 Notable changes: - Support for 4PB user address space on 64-bit, opt-in via mmap(). - Removal of POWER4 support, which was accidentally broken in 2016 and no one noticed, and blocked use of some modern instructions. - Workarounds so that the hypervisor can enable Transactional Memory on Power9. - A series to disable the DAWR (Data Address Watchpoint Register) on Power9. - More information displayed in the meltdown/spectre_v1/v2 sysfs files. - A vpermxor (Power8 Altivec) implementation for the raid6 Q Syndrome. - A big series to make the allocation of our pacas (per cpu area), kernel page tables, and per-cpu stacks NUMA aware when using the Radix MMU on Power9. And as usual many fixes, reworks and cleanups. Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook, Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe, Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring, Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun. -----BEGIN PGP SIGNATURE----- iQIwBAABCAAaBQJayKxDExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAr JQ/6A9Xs4zHDn9OeT9esEIxciETqUlrP0Wp64c4JVC7EkG1E7xRDZ4Xb4m8R2nNt 9sPhtNO1yCtEk6kFQtPNB0N8v6pud4I6+aMcYnn+tP8mJRYQ4x9bYaF3Hw98IKmE Kd6TglmsUQvh2GpwPiF93KpzzWu1HB2kZzzqJcAMTMh7C79Qz00BjrTJltzXB2jx tJ+B4lVy8BeU8G5nDAzJEEwb5Ypkn8O40rS/lpAwVTYOBJ8Rbyq8Fj82FeREK9YO 4EGaEKPkC/FdzX7OJV3v2/nldCd8pzV471fAoGuBUhJiJBMBoBybcTHIdDex7LlL zMLV1mUtGo8iolRPhL8iCH+GGifZz2WzstYCozz7hgIraWtc/frq9rZp6q0LdH/K trk7UbPGlVb92ecWZVpZyEcsMzKrCgZqnAe9wRNh1uEKScEdzd/bmRaMhENUObRh Hili6AVvmSKExpy7k2sZP/oUMaeC15/xz8Lk7l8a/iCkYhNmPYh5iSXM5+UKpcRT FYOcO0o3DwXsN46Whow3nJ7TqAsDy9/ecPUG71JQi3ZrHnRrm8jxkn8MCG5pZ1Fi KvKDxlg6RiJo3DF9/fSOpJUokvMwqBS5dJo4eh5eiDy94aBTqmBKFecvPxQm7a0L l3uXCF/6JuXEvMukFjGBO4RiYhw8i+B2uKsh81XUh7HKrgE= =HAB1 -----END PGP SIGNATURE----- Merge tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Notable changes: - Support for 4PB user address space on 64-bit, opt-in via mmap(). - Removal of POWER4 support, which was accidentally broken in 2016 and no one noticed, and blocked use of some modern instructions. - Workarounds so that the hypervisor can enable Transactional Memory on Power9. - A series to disable the DAWR (Data Address Watchpoint Register) on Power9. - More information displayed in the meltdown/spectre_v1/v2 sysfs files. - A vpermxor (Power8 Altivec) implementation for the raid6 Q Syndrome. - A big series to make the allocation of our pacas (per cpu area), kernel page tables, and per-cpu stacks NUMA aware when using the Radix MMU on Power9. And as usual many fixes, reworks and cleanups. Thanks to: Aaro Koskinen, Alexandre Belloni, Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Aneesh Kumar K.V, Anshuman Khandual, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe Lombard, Cyril Bur, Daniel Axtens, Dave Young, Finn Thain, Frederic Barrat, Gustavo Romero, Horia Geantă, Jonathan Neuschäfer, Kees Cook, Larry Finger, Laurent Dufour, Laurent Vivier, Logan Gunthorpe, Madhavan Srinivasan, Mark Greer, Mark Hairgrove, Markus Elfring, Mathieu Malaterre, Matt Brown, Matt Evans, Mauricio Faria de Oliveira, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Rob Herring, Sam Bobroff, Segher Boessenkool, Simon Guo, Simon Horman, Stewart Smith, Sukadev Bhattiprolu, Suraj Jitindar Singh, Thiago Jung Bauermann, Vaibhav Jain, Vaidyanathan Srinivasan, Vasant Hegde, Wei Yongjun" * tag 'powerpc-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (207 commits) powerpc/64s/idle: Fix restore of AMOR on POWER9 after deep sleep powerpc/64s: Fix POWER9 DD2.2 and above in cputable features powerpc/64s: Fix pkey support in dt_cpu_ftrs, add CPU_FTR_PKEY bit powerpc/64s: Fix dt_cpu_ftrs to have restore_cpu clear unwanted LPCR bits Revert "powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead" powerpc: iomap.c: introduce io{read\|write}64_{lo_hi\|hi_lo} powerpc: io.h: move iomap.h include so that it can use readq/writeq defs cxl: Fix possible deadlock when processing page faults from cxllib powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it powerpc/mm/radix: Update command line parsing for disable_radix powerpc/mm/radix: Parse disable_radix commandline correctly. powerpc/mm/hugetlb: initialize the pagetable cache correctly for hugetlb powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix powerpc/mm/keys: Update documentation and remove unnecessary check powerpc/64s/idle: POWER9 ESL=0 stop avoid save/restore overhead powerpc/64s/idle: Consolidate power9_offline_stop()/power9_idle_stop() powerpc/powernv: Always stop secondaries before reboot/shutdown powerpc: hard disable irqs in smp_send_stop loop powerpc: use NMI IPI for smp_send_stop powerpc/powernv: Fix SMT4 forcing idle code ...	2018-04-07 12:08:19 -07:00
Linus Torvalds	3c0d551e02	pci-v4.17-changes -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlrHeY8UHGJoZWxnYWFz QGdvb2dsZS5jb20ACgkQWYigwDrT+vxhLRAAndV/0NDyWZU0eZNM6twri2SEFnF7 E4ar+YthxDxxJG4TLJbIA12jc5NgHZy4WuttDa6Jb99KreBXIHJFlNi/V/tme6zf +yXUuxWae7wJzBiaay57VqLGSc80gt/LTgjLa1siwQqjTbO3wSXR6JJXNaE9FtQ4 /jL61t8bD1Peb5cWTpt9p0hrnKI0/pHwASdReyFS4F/HDKdvpof7BxE/OU3HSxxA XKC2v6RjY4S93vkzvApDXQ+vhKquVRK7/ojyTXQUO/GIzcARprO7H4k62N4ar0x/ qbXLkR8IMkwA8ecsNmcL92ftb/cXoHfd+wdK8WpijqzF4kW4SdteVWbIhUzI0gbr 0gjDYIzjplvH3pZGv/qvx+8sFtAP95OdPjuAAW2qJ9TCVfmiS8naNFCvcxg87RhD gjyQD3If1X7F8wy309lhq7VNyRexTHgIMgTXHyFvuZMzn/Qe1huL2XCwDcEAg/OX AvU2iuSE5tWAh7gIUMF/aWi3uoeJUyyoru5ZR//gqdFfx9YxpSimO1UDXnpPi8SR Iz/jzHJc0aWGYdQ9l6HiSbJF3P/QQcWYs9igt0A7BRGB05SPdWCh7sSO70FJa8ME f4WID5/qEiaH26kiSRX4cUqpc8Amk8bT0DXw2OT57qy3JM0ZdV5ENQX11pSpr9hv uLEf0DU7AEmdvzQ= =T++R -----END PGP SIGNATURE----- Merge tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - move pci_uevent_ers() out of pci.h (Michael Ellerman) - skip ASPM common clock warning if BIOS already configured it (Sinan Kaya) - fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva) - remove last user of pci_get_bus_and_slot() and the function itself (Sinan Kaya) - add decoding for 16 GT/s link speed (Jay Fang) - add interfaces to get max link speed and width (Tal Gilboa) - add pcie_bandwidth_capable() to compute max supported link bandwidth (Tal Gilboa) - add pcie_bandwidth_available() to compute bandwidth available to device (Tal Gilboa) - add pcie_print_link_status() to log link speed and whether it's limited (Tal Gilboa) - use PCI core interfaces to report when device performance may be limited by its slot instead of doing it in each driver (Tal Gilboa) - fix possible cpqphp NULL pointer dereference (Shawn Lin) - rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI hotplug (Mika Westerberg) - add support for PCI I/O port space that's neither directly accessible via CPU in/out instructions nor directly mapped into CPU physical memory space. This is fairly intrusive and includes minor changes to interfaces used for I/O space on most platforms (Zhichang Yuan, John Garry) - add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan, John Garry) - use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas) - remove possible NULL pointer dereference in of_pci_bus_find_domain_nr() (Shawn Lin) - report quirk timings with dev_info (Bjorn Helgaas) - report quirks that take longer than 10ms (Bjorn Helgaas) - add and use Altera Vendor ID (Johannes Thumshirn) - tidy Makefiles and comments (Bjorn Helgaas) - don't set up INTx if MSI or MSI-X is enabled to align cris, frv, ia64, and mn10300 with x86 (Bjorn Helgaas) - move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick Lawler) - merge pcieport_if.h into portdrv.h (Bjorn Helgaas) - move workaround for BIOS PME issue from portdrv to PCI core (Bjorn Helgaas) - completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas) - remove portdrv link order dependency (Bjorn Helgaas) - remove support for unused VC portdrv service (Bjorn Helgaas) - simplify portdrv feature permission checking (Bjorn Helgaas) - remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn Helgaas) - remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas) - use cached AER capability offset (Frederick Lawler) - don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg) - rename pcie-dpc.c to dpc.c (Bjorn Helgaas) - use generic pci_mmap_resource_range() instead of powerpc and xtensa arch-specific versions (David Woodhouse) - support arbitrary PCI host bridge offsets on sparc (Yinghai Lu) - remove System and Video ROM reservations on sparc (Bjorn Helgaas) - probe for device reset support during enumeration instead of runtime (Bjorn Helgaas) - add ACS quirk for Ampere (née APM) root ports (Feng Kan) - add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas Vincent-Cross) - protect device restore with device lock (Sinan Kaya) - handle failure of FLR gracefully (Sinan Kaya) - handle CRS (config retry status) after device resets (Sinan Kaya) - skip various config reads for SR-IOV VFs as an optimization (KarimAllah Ahmed) - consolidate VPD code in vpd.c (Bjorn Helgaas) - add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann) - add DT support for R-Car r8a7743 (Biju Das) - fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host bridge driver that causes a general protection fault (Dexuan Cui) - fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV (Dexuan Cui) - fix Hyper-V host bridge hang when ejecting a VF before setting up MSI (Dexuan Cui) - make several structures static (Fengguang Wu) - increase number of MSI IRQs supported by Synopsys DesignWare bridges from 32 to 256 (Gustavo Pimentel) - implemented multiplexed IRQ domain API and remove obsolete MSI IRQ API from DesignWare drivers (Gustavo Pimentel) - add Tegra power management support (Manikanta Maddireddy) - add Tegra loadable module support (Manikanta Maddireddy) - handle 64-bit BARs correctly in endpoint support (Niklas Cassel) - support optional regulator for HiSilicon STB (Shawn Guo) - use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla) - support power supplies for Qualcomm msm8996 (Srinivas Kandagatla) * tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits) MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver HISI LPC: Add ACPI support ACPI / scan: Do not enumerate Indirect IO host children ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings of: Add missing I/O range exception for indirect-IO devices PCI: Apply the new generic I/O management on PCI IO hosts PCI: Add fwnode handler as input param of pci_register_io_range() PCI: Remove __weak tag from pci_register_io_range() MAINTAINERS: Add missing /drivers/pci/cadence directory entry fm10k: Report PCIe link properties with pcie_print_link_status() net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth net/mlx5: Report PCIe link properties with pcie_print_link_status() net/mlx4_core: Report PCIe link properties with pcie_print_link_status() PCI: Add pcie_print_link_status() to log link speed and whether it's limited PCI: Add pcie_bandwidth_available() to compute bandwidth available to device misc: pci_endpoint_test: Handle 64-bit BARs properly PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar ...	2018-04-06 18:31:06 -07:00
Dan Williams	09135cc594	mm, powerpc: use vma_kernel_pagesize() in vma_mmu_pagesize() Patch series "mm, smaps: MMUPageSize for device-dax", v3. Similar to commit `31383c6865` ("mm, hugetlbfs: introduce ->split() to vm_operations_struct") here is another occasion where we want special-case hugetlbfs/hstate enabling to also apply to device-dax. This prompts the question what other hstate conversions we might do beyond ->split() and ->pagesize(), but this appears to be the last of the usages of hstate_vma() in generic/non-hugetlbfs specific code paths. This patch (of 3): The current powerpc definition of vma_mmu_pagesize() open codes looking up the page size via hstate. It is identical to the generic vma_kernel_pagesize() implementation. Now, vma_kernel_pagesize() is growing support for determining the page size of Device-DAX vmas in addition to the existing Hugetlbfs page size determination. Ideally, if the powerpc vma_mmu_pagesize() used vma_kernel_pagesize() it would automatically benefit from any new vma-type support that is added to vma_kernel_pagesize(). However, the powerpc vma_mmu_pagesize() is prevented from calling vma_kernel_pagesize() due to a circular header dependency that requires vma_mmu_pagesize() to be defined before including <linux/hugetlb.h>. Break this circular dependency by defining the default vma_mmu_pagesize() as a __weak symbol to be overridden by the powerpc version. Link: http://lkml.kernel.org/r/151996254179.27922.2213728278535578744.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Jane Chu <jane.chu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-05 21:36:26 -07:00
Nicholas Piggin	3a52f6014d	powerpc/64s: Fix POWER9 DD2.2 and above in cputable features The CPU_FTR_POWER9_DD2_1 flag is intended to be set for DD2.1 and above (which is what the dt_cpu_ftrs setup does). Fix cputable for DD2.2 to match. This came about due to patches `b5af4f2793` ("powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2"), and `9e9626ed3a` ("powerpc/64s: Fix POWER9 DD2.2 and above in DT CPU features") being in-flight at once. The latter patch fixed dt_cpu_ftrs like this one does. The former changed cputable to match dt_cpu_ftrs. Fixes: `b5af4f2793` ("powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-05 16:28:13 +10:00
Logan Gunthorpe	ef237039c5	powerpc: io.h: move iomap.h include so that it can use readq/writeq defs Subsequent patches in this series makes use of the readq and writeq defines in iomap.h. However, as is, they get missed on the powerpc platform seeing the include comes before the define. This patch moves the include down to fix this. Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-05 14:58:52 +10:00
Naveen N. Rao	5d6a03ebc8	powerpc/hw_breakpoint: Only disable hw breakpoint if cpu supports it We get the below warning if we try to use kexec on P9: kexec_core: Starting new kernel WARNING: CPU: 0 PID: 1223 at arch/powerpc/kernel/process.c:826 __set_breakpoint+0xb4/0x140 [snip] NIP __set_breakpoint+0xb4/0x140 LR kexec_prepare_cpus_wait+0x58/0x150 Call Trace: 0xc0000000ee70fb20 (unreliable) 0xc0000000ee70fb20 default_machine_kexec+0x234/0x2c0 machine_kexec+0x84/0x90 kernel_kexec+0xd8/0xe0 SyS_reboot+0x214/0x2c0 system_call+0x58/0x6c This happens since we are trying to clear hw breakpoint on POWER9, though we don't have CPU_FTR_DAWR enabled. Guard __set_breakpoint() within hw_breakpoint_disable() with ppc_breakpoint_available() to address this. Fixes: `9654153158` ("powerpc: Disable DAWR in the base POWER9 CPU features") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-04 21:54:02 +10:00
Aneesh Kumar K.V	fb4e5dbd44	powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix With split PTL (page table lock) config, we allocate the level 4 (leaf) page table using pte fragment framework instead of slab cache like other levels. This was done to enable us to have split page table lock at the level 4 of the page table. We use page->plt backing the all the level 4 pte fragment for the lock. Currently with Radix, we use only 16 fragments out of the allocated page. In radix each fragment is 256 bytes which means we use only 4k out of the allocated 64K page wasting 60k of the allocated memory. This was done earlier to keep it closer to hash. This patch update the pte fragment count to 256, thereby using the full 64K page and reducing the memory usage. Performance tests shows really low impact even with THP disabled. With THP disabled we will be contenting further less on level 4 ptl and hence the impact should be further low. 256 threads: without patch (10 runs of ./ebizzy -m -n 1000 -s 131072 -S 100) median = 15678.5 stdev = 42.1209 with patch: median = 15354 stdev = 194.743 This is with THP disabled. With THP enabled the impact of the patch will be less. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-04 16:58:06 +10:00
Linus Torvalds	4608f06453	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next Pull sparc updates from David Miller: 1) Add support for ADI (Application Data Integrity) found in more recent sparc64 cpus. Essentially this is keyed based access to virtual memory, and if the key encoded in the virual address is wrong you get a trap. The mm changes were reviewed by Andrew Morton and others. Work by Khalid Aziz. 2) Validate DAX completion index range properly, from Rob Gardner. 3) Add proper Kconfig deps for DAX driver. From Guenter Roeck. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: sparc64: Make atomic_xchg() an inline function rather than a macro. sparc64: Properly range check DAX completion index sparc: Make auxiliary vectors for ADI available on 32-bit as well sparc64: Oracle DAX driver depends on SPARC64 sparc64: Update signal delivery to use new helper functions sparc64: Add support for ADI (Application Data Integrity) mm: Allow arch code to override copy_highpage() mm: Clear arch specific VM flags on protection change mm: Add address parameter to arch_validate_prot() sparc64: Add auxiliary vectors to report platform ADI properties sparc64: Add handler for "Memory Corruption Detected" trap sparc64: Add HV fault type handlers for ADI related faults sparc64: Add support for ADI register fields, ASIs and traps mm, swap: Add infrastructure for saving page metadata on swap signals, sparc: Add signal codes for ADI violations	2018-04-03 14:08:58 -07:00
Nicholas Piggin	f2748bdfe1	powerpc/powernv: Always stop secondaries before reboot/shutdown Currently powernv reboot and shutdown requests just leave secondaries to do their own things. This is undesirable because they can trigger any number of watchdogs while waiting for reboot, but also we don't know what else they might be doing -- they might be causing trouble, trampling memory, etc. The opal scheduled flash update code already ran into watchdog problems due to flashing taking a long time, and it was fixed with `2196c6f1ed` ("powerpc/powernv: Return secondary CPUs to firmware before FW update"), which returns secondaries to opal. It's been found that regular reboots can take over 10 seconds, which can result in the hard lockup watchdog firing, reboot: Restarting system [ 360.038896709,5] OPAL: Reboot request... Watchdog CPU:0 Hard LOCKUP Watchdog CPU:44 detected Hard LOCKUP other CPUS:16 Watchdog CPU:16 Hard LOCKUP watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0] This patch removes the special case for flash update, and calls smp_send_stop in all cases before calling reboot/shutdown. smp_send_stop could return CPUs to OPAL, the main reason not to is that the request could come from a NMI that interrupts OPAL code, so re-entry to OPAL can cause a number of problems. Putting secondaries into simple spin loops improves the chances of a successful reboot. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 22:59:57 +10:00
Mauricio Faria de Oliveira	e7347a8683	powerpc: Move default security feature flags This moves the definition of the default security feature flags (i.e., enabled by default) closer to the security feature flags. This can be used to restore current flags to the default flags. Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 21:50:08 +10:00
Aneesh Kumar K.V	a6201da34f	powerpc: Fix oops due to bad access of lppaca on bare metal Commit `8e0b634b13` ("powerpc/64s: Do not allocate lppaca if we are not virtualized") removed allocation of lppaca on bare metal platforms. But with CONFIG_PPC_SPLPAR enabled, we still access the lppaca on bare metal in some code paths. Fix this but adding runtime checks for SPLPAR (shared processor LPAR). Fixes: `8e0b634b13` ("powerpc/64s: Do not allocate lppaca if we are not virtualized") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-03 21:50:07 +10:00
Linus Torvalds	2fcd2b306a	Merge branch 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 dma mapping updates from Ingo Molnar: "This tree, by Christoph Hellwig, switches over the x86 architecture to the generic dma-direct and swiotlb code, and also unifies more of the dma-direct code between architectures. The now unused x86-only primitives are removed" * 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dma-mapping: Don't clear GFP_ZERO in dma_alloc_attrs swiotlb: Make swiotlb_{alloc,free}_buffer depend on CONFIG_DMA_DIRECT_OPS dma/swiotlb: Remove swiotlb_{alloc,free}_coherent() dma/direct: Handle force decryption for DMA coherent buffers in common code dma/direct: Handle the memory encryption bit in common code dma/swiotlb: Remove swiotlb_set_mem_attributes() set_memory.h: Provide set_memory_{en,de}crypted() stubs x86/dma: Remove dma_alloc_coherent_gfp_flags() iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent() iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}() x86/dma/amd_gart: Use dma_direct_{alloc,free}() x86/dma/amd_gart: Look at dev->coherent_dma_mask instead of GFP_DMA x86/dma: Use generic swiotlb_ops x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y) x86/dma: Remove dma_alloc_coherent_mask()	2018-04-02 17:18:45 -07:00
Nicholas Piggin	db5ae1c155	powerpc/64s: Refine feature sets for little endian builds This reduces vmlinux text size by 1kB and data by 1.5kB with a small build! Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add the recently added CPU_FTRS_POWER9_DD2_2 to the little endian possible mask as noticed by Nick.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 22:14:40 +10:00
Nicholas Piggin	471d7ff8b5	powerpc/64s: Remove POWER4 support POWER4 has been broken since at least the change `49d09bf2a6` ("powerpc/64s: Optimise MSR handling in exception handling"), which requires mtmsrd L=1 support. This was introduced in ISA v2.01, and POWER4 supports ISA v2.00. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:50 +11:00
Nicholas Piggin	3735eb850e	powerpc: Remove unused CPU_FTR_ARCH_201 The last usage was removed in `c17b98cf60` ("KVM: PPC: Book3S HV: Remove code for PPC970 processors") (Dec 2014). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:50 +11:00
Nicholas Piggin	15a3204d24	powerpc/64s: Set assembler machine type to POWER4 Rather than override the machine type in .S code (which can hide wrong or ambiguous code generation for the target), set the type to power4 for all assembly. This also means we need to be careful not to build power4-only code when we're not building for Book3S, such as the "power7" versions of copyuser/page/memcpy. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix Book3E build, don't build the "power7" variants for non-Book3S] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:49 +11:00
Nicholas Piggin	d50614fa45	powerpc/64s: Explicitly add vector features to CPU_FTRS_POSSIBLE ALTIVEC and VSX features are not added by to default to the POWERx CPU feature sets because they are intended to be enabled by firmware. Currently they end up in CPU_FTRS_POSSIBLE due to their inclusion in other the set for other CPUs, eg. PPC970. But they should be added individually to the CPU_FTRS_POSSIBLE set, because if we reduce the set of CPUs that are built-for they may disappear from the possible mask. It already contains CPU_FTR_VSX, so add ALTIVEC. The _COMP features should be used because they won't be present if compiled out. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add detail to change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:48 +11:00
Nicholas Piggin	b842bd0f7a	powerpc/64s: Add all POWER9 features to CPU_FTRS_ALWAYS It's not a bug to have features missing in CPU_FTR_ALWAYS, but it is a missed opportunity for optimisation. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:48 +11:00
Nicholas Piggin	3d4fbffdd7	powerpc/64s/idle: POWER9 implement a separate idle stop function for hotplug Implement a new function to invoke stop, power9_offline_stop, which is like power9_idle_stop but used by the cpu hotplug code. Move KVM secondary state manipulation code to the offline case. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-04-01 00:47:46 +11:00
Michael Ellerman	f437c51748	Merge branch 'topic/paca' into next Bring in yet another series that touches KVM code, and might need to be merged into the kvm-ppc branch to resolve conflicts. This required some changes in pnv_power9_force_smt4_catch/release() due to the paca array becomming an array of pointers.	2018-03-31 09:09:36 +11:00
Aneesh Kumar K.V	872a100a49	powerpc/mm/hash: Don't memset pgd table if not needed We need to zero-out pgd table only if we share the slab cache with pud/pmd level caches. With the support of 4PB, we don't share the slab cache anymore. Instead of removing the code completely hide it within an #ifdef. We don't need to do this with any other page table level, because they all allocate table of double the size and we take of initializing the first half corrrectly during page table zap. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Consolidate multiple #if / #ifdef into one] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:39 +11:00
Aneesh Kumar K.V	c2b4d8b741	powerpc/mm/hash64: Increase the VA range This patch increases the max virtual (effective) address value to 4PB. With 4K page size config we continue to limit ourself to 64TB. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Keep the H_PGTABLE_RANGE test, update it to work] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:38 +11:00
Aneesh Kumar K.V	f384796c40	powerpc/mm: Add support for handling > 512TB address in SLB miss For addresses above 512TB we allocate additional mmu contexts. To make it all easy, addresses above 512TB are handled with IR/DR=1 and with stack frame setup. The mmu_context_t is also updated to track the new extended_ids. To support upto 4PB we need a total 8 contexts. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Minor formatting tweaks and comment wording, switch BUG to WARN in get_ea_context().] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:38 +11:00
Aneesh Kumar K.V	1a2f778970	powerpc/mm/keys: Move pte bits to correct headers Memory keys are supported only with hash translation mode. Instead of using #ifdef in generic code move the key related pte bits to respective headers Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:36 +11:00
Nicholas Piggin	0bfdf59890	powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently asm/barrier.h is not always included after asm/synch.h, which meant it was missing __SUBARCH_HAS_LWSYNC, so in some files smp_wmb() would be eieio when it should be lwsync. kernel/time/hrtimer.c is one case. __SUBARCH_HAS_LWSYNC is only used in one place, so just fold it in to where it's used. Previously with my small simulator config, 377 instances of eieio in the tree. After this patch there are 55. Fixes: `46d075be58` ("powerpc: Optimise smp_wmb") Cc: stable@vger.kernel.org # v2.6.29+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:10:34 +11:00
Nicholas Piggin	29ab6c4708	powerpc/mm: Pass node id into create_section_mapping Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move __map_kernel_page_nid() inside #ifdef SPARSEMEM_VMEMMAP] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-31 00:07:10 +11:00
Nicholas Piggin	59f577743d	powerpc/64: Defer paca allocation until memory topology is discovered Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Rename the dummy allocate_pacas() to fix 32-bit build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:28 +11:00
Nicholas Piggin	9f593f131e	powerpc/setup: Add cpu_to_phys_id array Build an array that finds hardware CPU number from logical CPU number in firmware CPU discovery. Use that rather than setting paca of other CPUs directly, to begin with. Subsequent patch will not have pacas allocated at this point. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix SMP=n build by adding #ifdef in arch_match_cpu_phys_id()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:27 +11:00
Nicholas Piggin	9bd9be006c	powerpc/mm/numa: move numa topology discovery earlier Split sparsemem initialisation from basic numa topology discovery. Move the parsing earlier in boot, before pacas are allocated. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:26 +11:00
Nicholas Piggin	499dcd4137	powerpc/64s: Allocate LPPACAs individually We no longer allocate lppacas in an array, so this patch removes the 1kB static alignment for the structure, and enforces the PAPR alignment requirements at allocation time. We can not reduce the 1kB allocation size however, due to existing KVM hypervisors. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:24 +11:00
Nicholas Piggin	d2e60075a3	powerpc/64: Use array of paca pointers and allocate pacas individually Change the paca array into an array of pointers to pacas. Allocate pacas individually. This allows flexibility in where the PACAs are allocated. Future work will allocate them node-local. Platforms that don't have address limits on PACAs would be able to defer PACA allocations until later in boot rather than allocate all possible ones up-front then freeing unused. This is slightly more overhead (one additional indirection) for cross CPU paca references, but those aren't too common. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:23 +11:00
Nicholas Piggin	8e0b634b13	powerpc/64s: Do not allocate lppaca if we are not virtualized The "lppaca" is a structure registered with the hypervisor. This is unnecessary when running on non-virtualised platforms. One field from the lppaca (pmcregs_in_use) is also used by the host, so move the host part out into the paca (lppaca field is still updated in guest mode). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fix non-pseries build with some #ifdefs] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-30 23:34:22 +11:00
Radim Krčmář	27aa896281	KVM PPC update for 4.17 - Improvements for the radix page fault handler for HV KVM on POWER9. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJavHlMAAoJEJ2a6ncsY3GfWTAIANbY6b8xvhUY6c3WM1dt6o78 6MWJW1vCcsDYuM5yyIjYTds2xjY6vm7oWbo9thDdLIE9sWP2uKBDbdem8KxZb6JL t/tdzuLOYkB60BfwQL0z77UmLlHSQYF5RfJjbYVe5oXt7OU5TCe43udkHsT9QtLH HTpxvl7ebf2TIoex1+XqrD0eJ93tSOVFWB7Ay7WRUQu08CMEQRcHaszyMqdNfHfs LHoPwLgxyWf+7/zt2T4++ebfysQDFpQgsEuEBugXkaHkw6pSGi6R+BOrZwVVmcGm jiHq5+mdho3wL+B47Rt7UTjkpsMLyFbWR6TrhMD7y84/CislizKhEdnEYymKwH4= =0igD -----END PGP SIGNATURE----- Merge tag 'kvm-ppc-next-4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc KVM PPC update for 4.17 - Improvements for the radix page fault handler for HV KVM on POWER9.	2018-03-29 20:20:13 +02:00
Michael Ellerman	95dff480bb	Merge branch 'fixes' into next Merge our fixes branch from the 4.16 cycle. There were a number of important fixes merged, in particular some Power9 workarounds that we want in next for testing purposes. There's also been some conflicting changes in the CPU features code which are best merged and tested before going upstream.	2018-03-28 22:59:50 +11:00
Michael Ellerman	c0b346729b	Merge branch 'topic/ppc-kvm' into next Merge the DAWR series, which touches arch code and KVM code and may need to be merged into the kvm-ppc tree.	2018-03-27 23:55:49 +11:00
Michael Neuling	9654153158	powerpc: Disable DAWR in the base POWER9 CPU features Using the DAWR on POWER9 can cause xstops, hence we need to disable it. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:55:33 +11:00
Michael Neuling	e8ebedbf31	KVM: PPC: Book3S HV: Return error from h_set_dabr() on POWER9 POWER7 compat mode guests can use h_set_dabr on POWER9. POWER9 should use the DAWR but since it's disabled there we can't. This returns H_UNSUPPORTED on a h_set_dabr() on POWER9 where the DAWR is disabled. Current Linux guests ignore this error, so they will silently not get the DAWR (sigh). The same error code is being used by POWERVM in this case. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:55:32 +11:00
Michael Neuling	404b27d66e	powerpc: Add ppc_breakpoint_available() Add ppc_breakpoint_available() to determine if a breakpoint is available currently via the DAWR or DABR. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:52:43 +11:00
Sam Bobroff	34a286a4ac	powerpc/eeh: Add eeh_state_active() helper Checking for a "fully active" device state requires testing two flag bits, which is open coded in several places, so add a function to do it. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:45:19 +11:00
Sam Bobroff	37fd812587	powerpc/eeh: Manage EEH_PE_RECOVERING inside eeh_handle_normal_event() Currently the EEH_PE_RECOVERING flag for a PE is managed by both the caller and callee of eeh_handle_normal_event() (among other places not considered here). This is complicated by the fact that the PE may or may not have been invalidated by the call. So move the callee's handling into eeh_handle_normal_event(), which clarifies it and allows the return type to be changed to void (because it no longer needs to indicate at the PE has been invalidated). This should not change behaviour except in eeh_event_handler() where it was previously possible to cause eeh_pe_state_clear() to be called on an invalid PE, which is now avoided. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:58 +11:00
Sam Bobroff	6870178071	powerpc/eeh: Remove eeh_handle_event() The function eeh_handle_event(pe) does nothing other than switching between calling eeh_handle_normal_event(pe) and eeh_handle_special_event(). However it is only called in two places, one where pe can't be NULL and the other where it must be NULL (see eeh_event_handler()) so it does nothing but obscure the flow of control. So, remove it. Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:57 +11:00
Michael Ellerman	ff348355e9	powerpc/64s: Enhance the information in cpu_show_meltdown() Now that we have the security feature flags we can make the information displayed in the "meltdown" file more informative. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:53 +11:00
Michael Ellerman	9a868f6343	powerpc: Add security feature flags for Spectre/Meltdown This commit adds security feature flags to reflect the settings we receive from firmware regarding Spectre/Meltdown mitigations. The feature names reflect the names we are given by firmware on bare metal machines. See the hostboot source for details. Arguably these could be firmware features, but that then requires them to be read early in boot so they're available prior to asm feature patching, but we don't actually want to use them for patching. We may also want to dynamically update them in future, which would be incompatible with the way firmware features work (at the moment at least). So for now just make them separate flags. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:51 +11:00
Michael Ellerman	c4bc36628d	powerpc/pseries: Add new H_GET_CPU_CHARACTERISTICS flags Add some additional values which have been defined for the H_GET_CPU_CHARACTERISTICS hypercall. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 23:44:51 +11:00
Michael Ellerman	abf110f3e1	powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again For PowerVM migration we want to be able to call setup_rfi_flush() again after we've migrated the partition. To support that we need to check that we're not trying to allocate the fallback flush area after memblock has gone away (i.e., boot-time only). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:12 +11:00
Madhavan Srinivasan	b58064da04	powerpc/perf: Infrastructure to support addition of blacklisted events Introduce code to support addition of blacklisted events for a processor version. Blacklisted events are events that are known to not count correctly on that CPU revision, and so should be prevented from being counted so as to avoid user confusion. A 'pointer' and 'int' variable to hold the number of events are added to 'struct power_pmu', along with a generic function to loop through the list to validate the given event. Generic function 'is_event_blacklisted' is called in power_pmu_event_init() to detect and reject early. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-27 19:25:10 +11:00
Michael Ellerman	a26cf1c9fe	Merge branch 'topic/ppc-kvm' into next This brings in two series from Paul, one of which touches KVM code and may need to be merged into the kvm-ppc tree to resolve conflicts.	2018-03-24 08:43:18 +11:00
Paul Mackerras	4bb3c7a020	KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9 POWER9 has hardware bugs relating to transactional memory and thread reconfiguration (changes to hardware SMT mode). Specifically, the core does not have enough storage to store a complete checkpoint of all the architected state for all four threads. The DD2.2 version of POWER9 includes hardware modifications designed to allow hypervisor software to implement workarounds for these problems. This patch implements those workarounds in KVM code so that KVM guests see a full, working transactional memory implementation. The problems center around the use of TM suspended state, where the CPU has a checkpointed state but execution is not transactional. The workaround is to implement a "fake suspend" state, which looks to the guest like suspended state but the CPU does not store a checkpoint. In this state, any instruction that would cause a transition to transactional state (rfid, rfebb, mtmsrd, tresume) or would use the checkpointed state (treclaim) causes a "soft patch" interrupt (vector 0x1500) to the hypervisor so that it can be emulated. The trechkpt instruction also causes a soft patch interrupt. On POWER9 DD2.2, we avoid returning to the guest in any state which would require a checkpoint to be present. The trechkpt in the guest entry path which would normally create that checkpoint is replaced by either a transition to fake suspend state, if the guest is in suspend state, or a rollback to the pre-transactional state if the guest is in transactional state. Fake suspend state is indicated by a flag in the PACA plus a new bit in the PSSCR. The new PSSCR bit is write-only and reads back as 0. On exit from the guest, if the guest is in fake suspend state, we still do the treclaim instruction as we would in real suspend state, in order to get into non-transactional state, but we do not save the resulting register state since there was no checkpoint. Emulation of the instructions that cause a softpatch interrupt is handled in two paths. If the guest is in real suspend mode, we call kvmhv_p9_tm_emulation_early() to handle the cases where the guest is transitioning to transactional state. This is called before we do the treclaim in the guest exit path; because we haven't done treclaim, we can get back to the guest with the transaction still active. If the instruction is a case that kvmhv_p9_tm_emulation_early() doesn't handle, or if the guest is in fake suspend state, then we proceed to do the complete guest exit path and subsequently call kvmhv_p9_tm_emulation() in host context with the MMU on. This handles all the cases including the cases that generate program interrupts (illegal instruction or TM Bad Thing) and facility unavailable interrupts. The emulation is reasonably straightforward and is mostly concerned with checking for exception conditions and updating the state of registers such as MSR and CR0. The treclaim emulation takes care to ensure that the TEXASR register gets updated as if it were the guest treclaim instruction that had done failure recording, not the treclaim done in hypervisor state in the guest exit path. With this, the KVM_CAP_PPC_HTM capability returns true (1) even if transactional memory is not available to host userspace. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:39:13 +11:00
Paul Mackerras	7672691a08	powerpc/powernv: Provide a way to force a core into SMT4 mode POWER9 processors up to and including "Nimbus" v2.2 have hardware bugs relating to transactional memory and thread reconfiguration. One of these bugs has a workaround which is to get the core into SMT4 state temporarily. This workaround is only needed when running bare-metal. This patch provides a function which gets the core into SMT4 mode by preventing threads from going to a stop state, and waking up those which are already in a stop state. Once at least 3 threads are not in a stop state, the core will be in SMT4 and we can continue. To do this, we add a "dont_stop" flag to the paca to tell the thread not to go into a stop state. If this flag is set, power9_idle_stop() just returns immediately with a return value of 0. The pnv_power9_force_smt4_catch() function does the following: 1. Set the dont_stop flag for each thread in the core, except ourselves (in fact we use an atomic_inc() in case more than one thread is calling this function concurrently). 2. See how many threads are awake, indicated by their requested_psscr field in the paca being 0. If this is at least 3, skip to step 5. 3. Send a doorbell interrupt to each thread that was seen as being in a stop state in step 2. 4. Until at least 3 threads are awake, scan the threads to which we sent a doorbell interrupt and check if they are awake now. This relies on the following properties: - Once dont_stop is non-zero, requested_psccr can't go from zero to non-zero, except transiently (and without the thread doing stop). - requested_psscr being zero guarantees that the thread isn't in a state-losing stop state where thread reconfiguration could occur. - Doing stop with a PSSCR value of 0 won't be a state-losing stop and thus won't allow thread reconfiguration. - Once threads_per_core/2 + 1 (i.e. 3) threads are awake, the core must be in SMT4 mode, since SMT modes are powers of 2. This does add a sync to power9_idle_stop(), which is necessary to provide the correct ordering between setting requested_psscr and checking dont_stop. The overhead of the sync should be unnoticeable compared to the latency of going into and out of a stop state. Because some objected to incurring this extra latency on systems where the XER[SO] bug is not relevant, I have put the test in power9_idle_stop inside a feature section. This means that pnv_power9_force_smt4_catch() WILL NOT WORK correctly on systems without the CPU_FTR_P9_TM_XER_SO_BUG feature bit set, and will probably hang the system. In order to cater for uses where the caller has an operation that has to be done while the core is in SMT4, the core continues to be kept in SMT4 after pnv_power9_force_smt4_catch() function returns, until the pnv_power9_force_smt4_release() function is called. It undoes the effect of step 1 above and allows the other threads to go into a stop state. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:39:11 +11:00
Paul Mackerras	b5af4f2793	powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2 This adds a CPU feature bit which is set for POWER9 "Nimbus" DD2.2 processors which will be used to enable the hypervisor to assist hardware with the handling of checkpointed register values while the CPU is in suspend state, in order to work around hardware bugs. The hardware assistance for these workarounds introduced a new hardware bug relating to the XER[SO] bit. We add a separate feature bit for this bug in case future chips fix it while still requiring the hypervisor assistance with suspend state. When the dt_cpu_ftrs subsystem is in use, the software assistance can be enabled using a "tm-suspend-hypervisor-assist" node in the device tree, and a "tm-suspend-xer-so-bug" node enables the workarounds for the XER[SO] bug. In the absence of such nodes, a quirk enables both for POWER9 "Nimbus" DD2.2 processors. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:39:09 +11:00
Paul Mackerras	9bbf0b576d	powerpc: Free up CPU feature bits on 64-bit machines This moves all the CPU feature bits that are only used on 32-bit machines to the top 20 bits of the CPU feature word and arranges for them to be defined only in 32-bit builds. The features that are common to 32-bit and 64-bit machines are moved to bits 0-11 of the CPU feature word. This means that for 64-bit platforms, bits 44-63 can now be used for new features that only exist on 64-bit machines. (These bit numbers are counting from the right, i.e. the LSB is bit 0.) Because CPU_FTR_L3_DISABLE_NAP moved from the low 16 bits to the high 16 bits, we have to adjust some assembly code. Also, CPU_FTR_EMB_HV moved from the high 16 bits to the low 16 bits. Note that CPU_FTR_REAL_LE only applies to 64-bit chips, because only 64-bit chips (POWER6, 7, 8, 9) have a true little-endian mode that is a CPU execution mode as opposed to being a page attribute. With this we now have 20 free CPU feature bits on 64-bit machines. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:38:51 +11:00
Paul Mackerras	dd0efb3f11	powerpc: Book E: Remove unused CPU_FTR_L2CSR bit The CPU_FTR_L2CSR bit is never tested anywhere, so let's reclaim the bit. The last usage was removed in `86d63363de` ("powerpc/e500mc: Remove dead L2 flushing code in idle_e500.S") (Jun 2015). Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:38:00 +11:00
Paul Mackerras	c0d64cf9fe	powerpc: Use feature bit for RTC presence rather than timebase presence All PowerPC CPUs other than the original PPC601 have a timebase register rather than the "real-time clock" (RTC) register that the PPC601 (and the original POWER and POWER2 CPUs) had. Currently we have a CPU feature bit to indicate the presence of the timebase, but it makes more sense to use a bit to indicate the unusual situation rather than the common situation. This therefore defines a CPU_FTR_USE_RTC bit in place of the CPU_FTR_USE_TB bit, and arranges for it to be set on PPC601 systems. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-24 00:36:45 +11:00
Aneesh Kumar K.V	a5d4b5891c	powerpc/mm: Fixup tlbie vs store ordering issue on POWER9 On POWER9, under some circumstances, a broadcast TLB invalidation might complete before all previous stores have drained, potentially allowing stale stores from becoming visible after the invalidation. This works around it by doubling up those TLB invalidations which was verified by HW to be sufficient to close the risk window. This will be documented in a yet-to-be-published errata. Fixes: `1a472c9dba` ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Enable the feature in the DT CPU features code for all Power9, rename the feature to CPU_FTR_P9_TLBIE_BUG per benh.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-23 20:48:03 +11:00
Aneesh Kumar K.V	99491e2d0e	powerpc/mm/radix: Remove unused code These function are not used in the code. Remove them. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-23 16:17:39 +11:00
Benjamin Herrenschmidt	aff6f8cb3e	powerpc/mm: Add tracking of the number of coprocessors using a context Currently, when using coprocessors (which use the Nest MMU), we simply increment the active_cpu count to force all TLB invalidations to be come broadcast. Unfortunately, due to an errata in POWER9, we will need to know more specifically that coprocessors are in use. This maintains a separate copros counter in the MMU context for that purpose. NB. The commit mentioned in the fixes tag below is not at fault for the bug we're fixing in this commit and the next, but this fix applies on top the infrastructure it introduced. Fixes: `03b8abedf4` ("cxl: Enable global TLBIs for cxl contexts") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-23 14:14:31 +11:00
Christoph Hellwig	b6e05477c1	dma/direct: Handle the memory encryption bit in common code Give the basic phys_to_dma() and dma_to_phys() helpers a __-prefix and add the memory encryption mask to the non-prefixed versions. Use the __-prefixed versions directly instead of clearing the mask again in various places. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-13-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:59 +01:00
Matt Brown	31513207ce	powerpc: Remove unused flush_dcache_phys_range() The flush_dcache_phys_range() function is no longer used in the kernel. The last usage was removed in `c40785ad30` ("powerpc/dart: Use a cachable DART"). This patch removes the function and declaration. Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> [mpe: Munge change log, include commit that removed last user] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-20 16:47:53 +11:00
Matt Brown	751ba79cc5	lib/raid6/altivec: Add vpermxor implementation for raid6 Q syndrome This patch uses the vpermxor instruction to optimise the raid6 Q syndrome. This instruction was made available with POWER8, ISA version 2.07. It allows for both vperm and vxor instructions to be done in a single instruction. This has been tested for correctness on a ppc64le vm with a basic RAID6 setup containing 5 drives. The performance benchmarks are from the raid6test in the /lib/raid6/test directory. These results are from an IBM Firestone machine with ppc64le architecture. The benchmark results show a 35% speed increase over the best existing algorithm for powerpc (altivec). The raid6test has also been run on a big-endian ppc64 vm to ensure it also works for big-endian architectures. Performance benchmarks: raid6: altivecx4 gen() 18773 MB/s raid6: altivecx8 gen() 19438 MB/s raid6: vpermxor4 gen() 25112 MB/s raid6: vpermxor8 gen() 26279 MB/s Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> [mpe: Add VPERMXOR macro so we can build with old binutils] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-20 16:47:25 +11:00
Paul Mackerras	39c983ea0f	KVM: PPC: Remove unused kvm_unmap_hva callback Since commit `fb1522e099` ("KVM: update to new mmu_notifier semantic v2", 2017-08-31), the MMU notifier code in KVM no longer calls the kvm_unmap_hva callback. This removes the PPC implementations of kvm_unmap_hva(). Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-03-19 10:08:29 +11:00
Khalid Aziz	9035cf9a97	mm: Add address parameter to arch_validate_prot() A protection flag may not be valid across entire address space and hence arch_validate_prot() might need the address a protection bit is being set on to ensure it is a valid protection flag. For example, sparc processors support memory corruption detection (as part of ADI feature) flag on memory addresses mapped on to physical RAM but not on PFN mapped pages or addresses mapped on to devices. This patch adds address to the parameters being passed to arch_validate_prot() so protection bits can be validated in the relevant context. Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Khalid Aziz <khalid@gonehiking.org> Reviewed-by: Anthony Yznaga <anthony.yznaga@oracle.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-18 07:38:47 -07:00
Nicholas Piggin	014a32b30e	powerpc/mm/slice: remove radix calls to the slice code This is a tidy up which removes radix MMU calls into the slice code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:08 +11:00
Nicholas Piggin	5709f7cfd8	powerpc/mm/slice: implement a slice mask cache Calculating the slice mask can become a signifcant overhead for get_unmapped_area. This patch adds a struct slice_mask for each page size in the mm_context, and keeps these in synch with the slices psize arrays and slb_addr_limit. On Book3S/64 this adds 288 bytes to the mm_context_t for the slice mask caches. On POWER8, this increases vfork+exec+exit performance by 9.9% and reduces time to mmap+munmap a 64kB page by 28%. Reduces time to mmap+munmap by about 10% on 8xx. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:06 +11:00
Nicholas Piggin	1753dd1830	powerpc/mm/slice: Simplify and optimise slice context initialisation The slice state of an mm gets zeroed then initialised upon exec. This is the only caller of slice_set_user_psize now, so that can be removed and instead implement a faster and simplified approach that requires no locking or checking existing state. This speeds up vfork+exec+exit performance on POWER8 by 3%. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:05 +11:00
Michael Ellerman	ab83dc794c	powerpc/xmon: Move empty plpar_set_ciabr() into plpar_wrappers.h Now that plpar_wrappers.h has an #ifdef PSERIES we can move the empty version of plpar_set_ciabr() which xmon wants into there. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:04 +11:00
Michael Ellerman	7c09c1869c	powerpc: Rename plapr routines to plpar Back in 2013 we added some hypercall wrappers which misspelled "plpar" (P-series Logical PARtition) as "plapr". Visually they're hard to distinguish and it almost doesn't matter, but it is confusing when grepping to miss some calls because of the typo. They've also started spreading, so before they take over let's fix them all to be "plpar". Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:04 +11:00
Michael Ellerman	5017e875e4	powerpc/pseries: Make plpar_wrappers.h safe to include when PSERIES=n Currently plpar_wrappers.h is not safe to include when CONFIG_PPC_PSERIES=n, or at least it can be depending on other config options and so on. Fix that by wrapping the entire content in an ifdef. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:04 +11:00
Michael Ellerman	16560e8832	powerpc/pseries: Move smp_query_cpu_stopped() etc. out of plpar_wrappers.h smp_query_cpu_stopped() and related #defines are currently in plpar_wrappers.h. The function actually does an RTAS call, not an hcall, and basically has nothing to do with plpar_wrappers.h Move it into pseries.h, where it can easily be used by the only two callers in pseries/smp.c and pseries/hotplug-cpu.c. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 23:43:03 +11:00
Mathieu Malaterre	e82d70cf96	powerpc/32: Add missing prototypes for (early\|machine)_init() early_init() and machine_init() have no prototype, add one in asm-prototypes.h. Fixes the following warnings (treated as error in W=1): arch/powerpc/kernel/setup_32.c:68:30: error: no previous prototype for ‘early_init’ arch/powerpc/kernel/setup_32.c:99:21: error: no previous prototype for ‘machine_init’ Signed-off-by: Mathieu Malaterre <malat@debian.org> [mpe: Move them to asm-prototypes.h, drop other functions] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:42 +11:00
Mathieu Malaterre	ef85dffd42	powerpc: Avoid comparison of unsigned long >= 0 in __access_ok() Rewrite function-like macro into regular static inline function to avoid a warning during macro expansion. Fix warning (treated as error in W=1): ./arch/powerpc/include/asm/uaccess.h:52:35: error: comparison of unsigned expression >= 0 is always true (((size) == 0) \|\| (((size) - 1) <= ((segment).seg - (addr))))) ^ Suggested-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Mathieu Malaterre <malat@debian.org> Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:41 +11:00
Mathieu Malaterre	603b892200	powerpc: Avoid comparison of unsigned long >= 0 in pfn_valid() Rewrite comparison since all values compared are of type `unsigned long`. Instead of using unsigned properties and rewriting the original code as: (originally suggested by Segher Boessenkool <segher@kernel.crashing.org>) #define pfn_valid(pfn) \ (((pfn) - ARCH_PFN_OFFSET) < (max_mapnr - ARCH_PFN_OFFSET)) Prefer a static inline function to make code as readable as possible. Fix a warning (treated as error in W=1): arch/powerpc/include/asm/page.h:129:32: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] #define pfn_valid(pfn) ((pfn) >= ARCH_PFN_OFFSET && (pfn) < max_mapnr) ^ Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:41 +11:00
Mathieu Malaterre	bf7fb32dd5	powerpc: Add missing prototypes for ppc_select() & ppc_fadvise64_64() Add missing prototypes for ppc_select() & ppc_fadvise64_64() to header asm-prototypes.h. Fix the following warnings (treated as errors in W=1) arch/powerpc/kernel/syscalls.c:87:1: error: no previous prototype for ‘ppc_select’ arch/powerpc/kernel/syscalls.c:119:6: error: no previous prototype for ‘ppc_fadvise64_64’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:40 +11:00
Mathieu Malaterre	b0d876da1d	powerpc: Add missing prototypes for hw_breakpoint_handler() & arch_unregister_hw_breakpoint() In commit `5aae8a5370` ("powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors") function hw_breakpoint_handler() and arch_unregister_hw_breakpoint() were added without function prototypes in hw_breakpoint.h header. Fix the following warning(s) (treated as error in W=1): arch/powerpc/kernel/hw_breakpoint.c:106:6: error: no previous prototype for ‘arch_unregister_hw_breakpoint’ arch/powerpc/kernel/hw_breakpoint.c:209:5: error: no previous prototype for ‘hw_breakpoint_handler’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:39 +11:00
Mathieu Malaterre	0d60619e1c	powerpc: Add missing prototype for sys_debug_setcontext() In commit `81e7009ea4` ("powerpc: merge ppc signal.c and ppc64 signal32.c") the function sys_debug_setcontext was added without a prototype. Fix compilation warning (treated as error in W=1): arch/powerpc/kernel/signal_32.c:1227:5: error: no previous prototype for ‘sys_debug_setcontext’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:38 +11:00
Mathieu Malaterre	23a6d8b963	powerpc: Add missing prototype for init_IRQ() A function init_IRQ() was added without a prototype declared in header irq.h. Fix the following warning (treated as error in W=1): arch/powerpc/kernel/irq.c:662:13: error: no previous prototype for ‘init_IRQ’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:38 +11:00
Mathieu Malaterre	f5246862f8	powerpc: Add missing prototype for arch_irq_work_raise() In commit `4f8b50bbbe` ("irq_work, ppc: Fix up arch hooks") a new function arch_irq_work_raise() was added without a prototype in header irq_work.h. Fix the following warning (treated as error in W=1): arch/powerpc/kernel/time.c:523:6: error: no previous prototype for ‘arch_irq_work_raise’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:37 +11:00
Mathieu Malaterre	fd70d9f96d	powerpc: Add missing prototype for arch_dup_task_struct() In commit `55ccf3fe3f` ("fork: move the real prepare_to_copy() users to arch_dup_task_struct()") a new arch_dup_task_struct() was added without a prototype declared in thread_info.h header. Fix the following warning (treated as error in W=1): arch/powerpc/kernel/process.c:1609:5: error: no previous prototype for ‘arch_dup_task_struct’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:37 +11:00
Mathieu Malaterre	848092faa0	powerpc: Add missing prototype for time_init() The function time_init did not have a prototype defined in the time.h header. Fix the following warning (treated as error in W=1): arch/powerpc/kernel/time.c:1068:13: error: no previous prototype for ‘time_init’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:36 +11:00
Mathieu Malaterre	8b604faff7	powerpc: Add missing prototype for hdec_interrupt In commit `dabe859ec6` ("powerpc: Give hypervisor decrementer interrupts their own handler") an empty body function was added, but no prototype was declared. Fix warning (treated as error in W=1): arch/powerpc/kernel/time.c:629:6: error: no previous prototype for ‘hdec_interrupt’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:35 +11:00
Mathieu Malaterre	45b4d27a38	powerpc: Add missing prototype for slb_miss_bad_addr() In commit `f0f558b131` ("powerpc/mm: Preserve CFAR value on SLB miss caused by access to bogus address"), the function slb_miss_bad_addr() was added without a prototype. This commit adds it. Fix a warning (treated as error in W=1): arch/powerpc/kernel/traps.c:1498:6: error: no previous prototype for ‘slb_miss_bad_addr’ Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:35 +11:00
Mathieu Malaterre	1cdf039bf8	powerpc/kernel: Make function __giveup_fpu() static __giveup_fpu() is never called outside process.c, so it can be static. That also means we don't need an empty definition in switch_to.h Signed-off-by: Mathieu Malaterre <malat@debian.org> [mpe: Also drop the empty version, rewrite change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:35 +11:00
Mathieu Malaterre	65e13c202d	powerpc/epapr: Move register keyword at the beginning of declaration Fix warning for all register unsigned long (0,3-12) that appear during W=1 compilation: ./arch/powerpc/include/asm/epapr_hcalls.h:479:2: warning: ‘register’ is not at beginning of declaration [-Wold-style-declaration] unsigned long register r[\d] asm("r[\d]"); Signed-off-by: Mathieu Malaterre <malat@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:32 +11:00
Philippe Bergheaud	d6a90bb83b	powerpc/powernv: Enable tunneled operations P9 supports PCI tunneled operations (atomics and as_notify). This patch adds support for tunneled operations on powernv, with a new API, to be called by device drivers: pnv_pci_enable_tunnel() Enable tunnel operations, tell driver the 16-bit ASN indication used by kernel. pnv_pci_disable_tunnel() Disable tunnel operations. pnv_pci_set_tunnel_bar() Tell kernel the Tunnel BAR Response address used by driver. This function uses two new OPAL calls, as the PBCQ Tunnel BAR register is configured by skiboot. pnv_pci_get_as_notify_info() Return the ASN info of the thread to be woken up. Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-13 15:50:30 +11:00
Wanpeng Li	a4429e53c9	KVM: Introduce paravirtualization hints and KVM_HINTS_DEDICATED This patch introduces kvm_para_has_hint() to query for hints about the configuration of the guests. The first hint KVM_HINTS_DEDICATED, is set if the guest has dedicated physical CPUs for each vCPU (i.e. pinning and no over-commitment). This allows optimizing spinlocks and tells the guest to avoid PV TLB flush. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-06 18:40:44 +01:00
Christophe Leroy	4bd13772ee	powerpc/8xx: Increase number of slices to 64 On the 8xx, the minimum slice size is the size of the area covered by a single PMD entry, ie 4M in 4K pages mode and 64M in 16K pages mode. This patch increases the number of slices from 16 to 64 on the 8xx. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:24 +11:00
Christophe Leroy	15472423ce	powerpc/mm/slice: Allow up to 64 low slices While the implementation of the "slices" address space allows a significant amount of high slices, it limits the number of low slices to 16 due to the use of a single u64 low_slices_psize element in struct mm_context_t On the 8xx, the minimum slice size is the size of the area covered by a single PMD entry, ie 4M in 4K pages mode and 64M in 16K pages mode. This means we could have at least 64 slices. In order to override this limitation, this patch switches the handling of low_slices_psize to char array as done already for high_slices_psize. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:23 +11:00
Christophe Leroy	aa0ab02ba9	powerpc/mm/slice: Fix hugepage allocation at hint address on 8xx On the 8xx, the page size is set in the PMD entry and applies to all pages of the page table pointed by the said PMD entry. When an app has some regular pages allocated (e.g. see below) and tries to mmap() a huge page at a hint address covered by the same PMD entry, the kernel accepts the hint allthough the 8xx cannot handle different page sizes in the same PMD entry. 10000000-10001000 r-xp 00000000 00:0f 2597 /root/malloc 10010000-10011000 rwxp 00000000 00:0f 2597 /root/malloc mmap(0x10080000, 524288, PROT_READ\|PROT_WRITE, MAP_PRIVATE\|MAP_ANONYMOUS\|0x40000, -1, 0) = 0x10080000 This results the app remaining forever in do_page_fault()/hugetlb_fault() and when interrupting that app, we get the following warning: [162980.035629] WARNING: CPU: 0 PID: 2777 at arch/powerpc/mm/hugetlbpage.c:354 hugetlb_free_pgd_range+0xc8/0x1e4 [162980.035699] CPU: 0 PID: 2777 Comm: malloc Tainted: G W 4.14.6 #85 [162980.035744] task: c67e2c00 task.stack: c668e000 [162980.035783] NIP: c000fe18 LR: c00e1eec CTR: c00f90c0 [162980.035830] REGS: c668fc20 TRAP: 0700 Tainted: G W (4.14.6) [162980.035854] MSR: 00029032 <EE,ME,IR,DR,RI> CR: 24044224 XER: 20000000 [162980.036003] [162980.036003] GPR00: c00e1eec c668fcd0 c67e2c00 00000010 c6869410 10080000 00000000 77fb4000 [162980.036003] GPR08: ffff0001 0683c001 00000000 ffffff80 44028228 10018a34 00004008 418004fc [162980.036003] GPR16: c668e000 00040100 c668e000 c06c0000 c668fe78 c668e000 c6835ba0 c668fd48 [162980.036003] GPR24: 00000000 73ffffff 74000000 00000001 77fb4000 100fffff 10100000 10100000 [162980.036743] NIP [c000fe18] hugetlb_free_pgd_range+0xc8/0x1e4 [162980.036839] LR [c00e1eec] free_pgtables+0x12c/0x150 [162980.036861] Call Trace: [162980.036939] [c668fcd0] [c00f0774] unlink_anon_vmas+0x1c4/0x214 (unreliable) [162980.037040] [c668fd10] [c00e1eec] free_pgtables+0x12c/0x150 [162980.037118] [c668fd40] [c00eabac] exit_mmap+0xe8/0x1b4 [162980.037210] [c668fda0] [c0019710] mmput.part.9+0x20/0xd8 [162980.037301] [c668fdb0] [c001ecb0] do_exit+0x1f0/0x93c [162980.037386] [c668fe00] [c001f478] do_group_exit+0x40/0xcc [162980.037479] [c668fe10] [c002a76c] get_signal+0x47c/0x614 [162980.037570] [c668fe70] [c0007840] do_signal+0x54/0x244 [162980.037654] [c668ff30] [c0007ae8] do_notify_resume+0x34/0x88 [162980.037744] [c668ff40] [c000dae8] do_user_signal+0x74/0xc4 [162980.037781] Instruction dump: [162980.037821] 7fdff378 81370000 54a3463a 80890020 7d24182e 7c841a14 712a0004 4082ff94 [162980.038014] 2f890000 419e0010 712a0ff0 408200e0 <0fe00000> 54a9000a 7f984840 419d0094 [162980.038216] ---[ end trace c0ceeca8e7a5800a ]--- [162980.038754] BUG: non-zero nr_ptes on freeing mm: 1 [162985.363322] BUG: non-zero nr_ptes on freeing mm: -1 In order to fix this, this patch uses the address space "slices" implemented for BOOK3S/64 and enhanced to support PPC32 by the preceding patch. This patch modifies the context.id on the 8xx to be in the range [1:16] instead of [0:15] in order to identify context.id == 0 as not initialised contexts as done on BOOK3S This patch activates CONFIG_PPC_MM_SLICES when CONFIG_HUGETLB_PAGE is selected for the 8xx Alltough we could in theory have as many slices as PMD entries, the current slices implementation limits the number of low slices to 16. This limitation is not preventing us to fix the initial issue allthough it is suboptimal. It will be cured in a subsequent patch. Fixes: `4b91428699` ("powerpc/8xx: Implement support of hugepages") Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:23 +11:00
Christophe Leroy	db3a528db4	powerpc/mm/slice: Enhance for supporting PPC32 In preparation for the following patch which will fix an issue on the 8xx by re-using the 'slices', this patch enhances the 'slices' implementation to support 32 bits CPUs. On PPC32, the address space is limited to 4Gbytes, hence only the low slices will be used. The high slices use bitmaps. As bitmap functions are not prepared to handle bitmaps of size 0, this patch ensures that bitmap functions are called only when SLICE_NUM_HIGH is not nul. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:23 +11:00
Christophe Leroy	a3286f05bc	powerpc/mm/slice: create header files dedicated to slices In preparation for the following patch which will enhance 'slices' for supporting PPC32 in order to fix an issue on hugepages on 8xx, this patch takes out of page*.h all bits related to 'slices' and put them into newly created slice.h header files. While common parts go into asm/slice.h, subarch specific parts go into respective books3s/64/slice.c and nohash/64/slice.c 'slices' Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-03-06 09:21:22 +11:00
David Woodhouse	28f8f1833b	powerpc/pci: Use generic pci_mmap_resource_range() Commit `f719582435` ("PCI: Add pci_mmap_resource_range() and use it for ARM64") added this generic function with the intent of using it everywhere and ultimately killing the old arch-specific implementations. Remove the powerpc-specific pci_mmap_page_range() and use the generic pci_mmap_resource_range() instead. Powerpc can mmap I/O port space, so supply the powerpc-specific pci_iobar_pfn() required to make that work. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>	2018-02-28 16:18:53 -06:00
Michael Ellerman	5539d31a04	powerpc/pseries: Fix duplicate firmware feature for DRC_INFO We had a mid-air collision between two new firmware features, DRMEM_V2 and DRC_INFO, and they ended up with the same value. No one's actually reported any problems, presumably because the new firmware that supports both properties is not widely available, and the two properties tend to be enabled together. Still if we ever had one enabled but not the other, the bugs that could result are many and varied. So fix it. Fixes: `3f38000eda` ("powerpc/firmware: Add definitions for new drc-info firmware feature") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>	2018-02-22 14:32:32 +11:00
Corentin Labbe	c1e150ceb6	powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=n When CONFIG_NUMA is not set, the build fails with: arch/powerpc/platforms/pseries/hotplug-cpu.c:335:4: error: déclaration implicite de la fonction « update_numa_cpu_lookup_table » So we have to add update_numa_cpu_lookup_table() as an empty function when CONFIG_NUMA is not set. Fixes: `1d9a090783` ("powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove") Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-02-15 10:10:02 +11:00
Linus Torvalds	694a20dae6	powerpc fixes for 4.16 #2 A larger batch of fixes than we'd like. Roughly 1/3 fixes for new code, 1/3 fixes for stable and 1/3 minor things. There's four commits fixing bugs when using 16GB huge pages on hash, caused by some of the preparatory changes for pkeys. Two fixes for bugs in the enhanced IRQ soft masking for local_t, one of which broke KVM in some circumstances. Four fixes for Power9. The most bizarre being a bug where futexes stopped working because a NULL pointer dereference didn't trap during early boot (it aliased the kernel mapping). A fix for memory hotplug when using the Radix MMU, and a fix for live migration of guests using the Radix MMU. Two fixes for hotplug on pseries machines. One where we weren't correctly updating NUMA info when CPUs are added and removed. And the other fixes crashes/hangs seen when doing memory hot remove during boot, which is apparently a thing people do. Finally a handful of build fixes for obscure configs and other minor fixes. Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Balbir Singh, Colin Ian King, Daniel Henrique Barboza, Florian Weimer, Guenter Roeck, Harish, Laurent Vivier, Madhavan Srinivasan, Mauricio Faria de Oliveira, Nathan Fontenot, Nicholas Piggin, Sam Bobroff. -----BEGIN PGP SIGNATURE----- iQIwBAABCAAaBQJahDTmExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAd chAAtVe8hmkEJefTbU63GBeqva0JHSiTu2DENZAlN/epWtbtyl05PLETMdTcwGCv nK2zzR+xbSFN1DzZK8KQfDBW33McKZE+YkHwYOC8Kff/N0SKdHK4zvxYr7FTZGzG 9uSG5vrxVEsPLT/yANabl0d0vKWMsJ1jZquvJAU0eLNUbA/skGjEPADtXqYQUXiA EnW4xeczsMLjuzTleoRqrBx74Gulovuq9LVAjfDvkydWlCU9MQkrodCgP0V2hQtw RAJ/QLY+NS/vMCBnvVOGBaKzIqrfeQTHF3P0j4pyBeBq/2kNuidM5n25uoc31wUq DE4Ebe2FJA6CHP5KEyf7dr9y7gsks/ak3/CKs+l6Yz3/0BqenEMhu6WKJ1tgf9cC qAmi1dIjtpw6JZ6baCbkloUdAGNjKVfLWB9ld9VIfg0C+C3y4L7+TKJukxrCBGI6 hopfT/3p8xUdla3euiRXRLZzajyKDGrqk71hk5J/J0ChXfWB0B51X0F6NIfH41Mn YsVUQ95p3zS79Pl942ijGScFX/bNVLfEEGzlI/nwU/wbTxF5g/XNXm5PjBsGSr/W zFcCwCpFV2b/kypQoxQA5CbrKRCLOleDA/lLOxW/1NMYOQsNj05DM9wYAw5Bl+lX AVj2c5jM9heNN4scxDiufRNfqZbyjZ4fFUpXLNqs7N5vcks= =BmuL -----END PGP SIGNATURE----- Merge tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A larger batch of fixes than we'd like. Roughly 1/3 fixes for new code, 1/3 fixes for stable and 1/3 minor things. There's four commits fixing bugs when using 16GB huge pages on hash, caused by some of the preparatory changes for pkeys. Two fixes for bugs in the enhanced IRQ soft masking for local_t, one of which broke KVM in some circumstances. Four fixes for Power9. The most bizarre being a bug where futexes stopped working because a NULL pointer dereference didn't trap during early boot (it aliased the kernel mapping). A fix for memory hotplug when using the Radix MMU, and a fix for live migration of guests using the Radix MMU. Two fixes for hotplug on pseries machines. One where we weren't correctly updating NUMA info when CPUs are added and removed. And the other fixes crashes/hangs seen when doing memory hot remove during boot, which is apparently a thing people do. Finally a handful of build fixes for obscure configs and other minor fixes. Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Balbir Singh, Colin Ian King, Daniel Henrique Barboza, Florian Weimer, Guenter Roeck, Harish, Laurent Vivier, Madhavan Srinivasan, Mauricio Faria de Oliveira, Nathan Fontenot, Nicholas Piggin, Sam Bobroff" * tag 'powerpc-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: selftests/powerpc: Fix to use ucontext_t instead of struct ucontext powerpc/kdump: Fix powernv build break when KEXEC_CORE=n powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplug powerpc/mm/hash64: Zero PGD pages on allocation powerpc/mm/hash64: Store the slot information at the right offset for hugetlb powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled powerpc/mm: Fix crashes with 16G huge pages powerpc/mm: Flush radix process translations when setting MMU type powerpc/vas: Don't set uses_vas for kernel windows powerpc/pseries: Enable RAS hotplug events later powerpc/mm/radix: Split linear mapping on hot-unplug powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID ocxl: fix signed comparison with less than zero powerpc/64s: Fix may_hard_irq_enable() for PMI soft masking powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macro powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove	2018-02-14 10:06:41 -08:00
Guenter Roeck	9109617545	powerpc/kdump: Fix powernv build break when KEXEC_CORE=n If KEXEC_CORE is not enabled, powernv builds fail as follows. arch/powerpc/platforms/powernv/smp.c: In function 'pnv_smp_cpu_kill_self': arch/powerpc/platforms/powernv/smp.c:236:4: error: implicit declaration of function 'crash_ipi_callback' Add dummy function calls, similar to kdump_in_progress(), to solve the problem. Fixes: `4145f35864` ("powernv/kdump: Fix cases where the kdump kernel can get HMI's") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-02-13 22:39:37 +11:00
Guenter Roeck	82343484a2	powerpc/pseries: Fix build break for SPLPAR=n and CPU hotplug Commit `e67e02a544` ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes") adds an unconditional call to find_and_online_cpu_nid(), which is only declared if CONFIG_PPC_SPLPAR is enabled. This results in the following build error if this is not the case. arch/powerpc/platforms/pseries/hotplug-cpu.o: In function `dlpar_online_cpu': arch/powerpc/platforms/pseries/hotplug-cpu.c:369: undefined reference to `.find_and_online_cpu_nid' Follow the guideline provided by similar functions and provide a dummy function if CONFIG_PPC_SPLPAR is not enabled. This also moves the external function declaration into an include file where it should be. Fixes: `e67e02a544` ("powerpc/pseries: Fix cpu hotplug crash with memoryless nodes") Signed-off-by: Guenter Roeck <linux@roeck-us.net> [mpe: Change subject to emphasise the build fix] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-02-13 22:38:39 +11:00
Aneesh Kumar K.V	fc5c2f4a55	powerpc/mm/hash64: Zero PGD pages on allocation On powerpc we allocate page table pages from slab caches of different sizes. Currently we have a constructor that zeroes out the objects when we allocate them for the first time. We expect the objects to be zeroed out when we free the the object back to slab cache. This happens in the unmap path. For hugetlb pages we call huge_pte_get_and_clear() to do that. With the current configuration of page table size, both PUD and PGD level tables are allocated from the same slab cache. At the PUD level, we use the second half of the table to store the slot information. But we never clear that when unmapping. When such a freed object is then allocated for a PGD page, the second half of the page table page will not be zeroed as expected. This results in a kernel crash. Fix it by always clearing PGD pages when they're allocated. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Change log wording and formatting, add whitespace] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-02-13 22:37:48 +11:00
Aneesh Kumar K.V	ff31e10546	powerpc/mm/hash64: Store the slot information at the right offset for hugetlb The hugetlb pte entries are at the PMD and PUD level, so we can't use PTRS_PER_PTE to find the second half of the page table. Use the right offset for PUD/PMD to get to the second half of the table. Fixes: `bf9a95f9a6` ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-02-13 22:37:48 +11:00
Aneesh Kumar K.V	4a7aa4fecb	powerpc/mm/hash64: Allocate larger PMD table if hugetlb config is enabled We use the second half of the page table to store slot information, so we must allocate it always if hugetlb is possible. Fixes: `bf9a95f9a6` ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-02-13 22:37:48 +11:00
Aneesh Kumar K.V	fae2211697	powerpc/mm: Fix crashes with 16G huge pages To support memory keys, we moved the hash pte slot information to the second half of the page table. This was ok with PTE entries at level 4 (PTE page) and level 3 (PMD). We already allocate larger page table pages at those levels to accomodate extra details. For level 4 we already have the extra space which was used to track 4k hash page table entry details and at level 3 the extra space was allocated to track the THP details. With hugetlbfs PTE, we used this extra space at the PMD level to store the slot details. But we also support hugetlbfs PTE at PUD level for 16GB pages and PUD level page didn't allocate extra space. This resulted in memory corruption. Fix this by allocating extra space at PUD level when HUGETLB is enabled. Fixes: `bf9a95f9a6` ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-02-13 22:37:47 +11:00
Linus Torvalds	15303ba5d1	KVM changes for 4.16 ARM: - Include icache invalidation optimizations, improving VM startup time - Support for forwarded level-triggered interrupts, improving performance for timers and passthrough platform devices - A small fix for power-management notifiers, and some cosmetic changes PPC: - Add MMIO emulation for vector loads and stores - Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization of older CPU versions - Improve the handling of escalation interrupts with the XIVE interrupt controller - Support decrement register migration - Various cleanups and bugfixes. s390: - Cornelia Huck passed maintainership to Janosch Frank - Exitless interrupts for emulated devices - Cleanup of cpuflag handling - kvm_stat counter improvements - VSIE improvements - mm cleanup x86: - Hypervisor part of SEV - UMIP, RDPID, and MSR_SMI_COUNT emulation - Paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit - Allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512 features - Show vcpu id in its anonymous inode name - Many fixes and cleanups - Per-VCPU MSR bitmaps (already merged through x86/pti branch) - Stable KVM clock when nesting on Hyper-V (merged through x86/hyperv) -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJafvMtAAoJEED/6hsPKofo6YcH/Rzf2RmshrWaC3q82yfIV0Qz Z8N8yJHSaSdc3Jo6cmiVj0zelwAxdQcyjwlT7vxt5SL2yML+/Q0st9Hc3EgGGXPm Il99eJEl+2MYpZgYZqV8ff3mHS5s5Jms+7BITAeh6Rgt+DyNbykEAvzt+MCHK9cP xtsIZQlvRF7HIrpOlaRzOPp3sK2/MDZJ1RBE7wYItK3CUAmsHim/LVYKzZkRTij3 /9b4LP1yMMbziG+Yxt1o682EwJB5YIat6fmDG9uFeEVI5rWWN7WFubqs8gCjYy/p FX+BjpOdgTRnX+1m9GIj0Jlc/HKMXryDfSZS07Zy4FbGEwSiI5SfKECub4mDhuE= =C/uD -----END PGP SIGNATURE----- Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Radim Krčmář: "ARM: - icache invalidation optimizations, improving VM startup time - support for forwarded level-triggered interrupts, improving performance for timers and passthrough platform devices - a small fix for power-management notifiers, and some cosmetic changes PPC: - add MMIO emulation for vector loads and stores - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization of older CPU versions - improve the handling of escalation interrupts with the XIVE interrupt controller - support decrement register migration - various cleanups and bugfixes. s390: - Cornelia Huck passed maintainership to Janosch Frank - exitless interrupts for emulated devices - cleanup of cpuflag handling - kvm_stat counter improvements - VSIE improvements - mm cleanup x86: - hypervisor part of SEV - UMIP, RDPID, and MSR_SMI_COUNT emulation - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512 features - show vcpu id in its anonymous inode name - many fixes and cleanups - per-VCPU MSR bitmaps (already merged through x86/pti branch) - stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)" * tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits) KVM: PPC: Book3S: Add MMIO emulation for VMX instructions KVM: PPC: Book3S HV: Branch inside feature section KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code KVM: PPC: Book3S PR: Fix broken select due to misspelling KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs() KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled KVM: PPC: Book3S HV: Drop locks before reading guest memory kvm: x86: remove efer_reload entry in kvm_vcpu_stat KVM: x86: AMD Processor Topology Information x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested kvm: embed vcpu id to dentry of vcpu anon inode kvm: Map PFN-type memory regions as writable (if possible) x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n KVM: arm/arm64: Fixup userspace irqchip static key optimization KVM: arm/arm64: Fix userspace_irqchip_in_use counting KVM: arm/arm64: Fix incorrect timer_is_pending logic MAINTAINERS: update KVM/s390 maintainers MAINTAINERS: add Halil as additional vfio-ccw maintainer MAINTAINERS: add David as a reviewer for KVM/s390 ...	2018-02-10 13:16:35 -08:00
Radim Krčmář	1ab03c072f	Second PPC KVM update for 4.16 Seven fixes that are either trivial or that address bugs that people are actually hitting. The main ones are: - Drop spinlocks before reading guest memory - Fix a bug causing corruption of VCPU state in PR KVM with preemption enabled - Make HPT resizing work on POWER9 - Add MMIO emulation for vector loads and stores, because guests now use these instructions in memcpy and similar routines. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJafWn0AAoJEJ2a6ncsY3GfaMsIANF0hQD8SS78WNKnoy0vnZ/X PUXdjwHEsfkg5KdQ7o0oaa2BJHHqO3vozddmMiG14r2L1mNCHJpnVZCVV0GaEJcZ eU8++OPK6yrsPNNpAjnrtQ0Vk4LwzoT0bftEjS3TtLt1s2uSo+R1+HLmxbxGhQUX bZngo9wQ3cjUfAXLrPtAVhE5wTmgVOiufVRyfRsBRdFzRsAWqjY4hBtJAfwdff4r AA5H0RCrXO6e1feKr5ElU8KzX6b7IjH9Xu868oJ1r16zZfE05PBl1X5n4XG7XDm7 xWvs8uLAB7iRv2o/ecFznYJ+Dz1NCBVzD0RmAUTqPCcVKDrxixaTkqMPFW97IAA= =HOJR -----END PGP SIGNATURE----- Merge tag 'kvm-ppc-next-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc Second PPC KVM update for 4.16 Seven fixes that are either trivial or that address bugs that people are actually hitting. The main ones are: - Drop spinlocks before reading guest memory - Fix a bug causing corruption of VCPU state in PR KVM with preemption enabled - Make HPT resizing work on POWER9 - Add MMIO emulation for vector loads and stores, because guests now use these instructions in memcpy and similar routines.	2018-02-09 22:03:06 +01:00
Jose Ricardo Ziviani	09f984961c	KVM: PPC: Book3S: Add MMIO emulation for VMX instructions This patch provides the MMIO load/store vector indexed X-Form emulation. Instructions implemented: lvx: the quadword in storage addressed by the result of EA & 0xffff_ffff_ffff_fff0 is loaded into VRT. stvx: the contents of VRS are stored into the quadword in storage addressed by the result of EA & 0xffff_ffff_ffff_fff0. Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com> Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-02-09 16:51:51 +11:00
Nicholas Piggin	6cc3f91bf6	powerpc/64s: Fix may_hard_irq_enable() for PMI soft masking The soft IRQ masking code has to hard-disable interrupts in cases where the exception is not cleared by the masked handler. External interrupts used this approach for soft masking. Now recently PMU interrupts do the same thing. The soft IRQ masking code additionally allowed for interrupt handlers to hard-enable interrupts after soft-disabling them. The idea is to allow PMU interrupts through to profile interrupt handlers. So when interrupts are being replayed when there is a pending interrupt that requires hard-disabling, there is a test to prevent those handlers from hard-enabling them if there is a pending external interrupt. may_hard_irq_enable() handles this. After `f442d00480` ("powerpc/64s: Add support to mask perf interrupts and replay them"), may_hard_irq_enable() could prematurely enable MSR[EE] when a PMU exception exists, which would result in the interrupt firing again while masked, and MSR[EE] being disabled again. I haven't seen that this could cause a serious problem, but it's more consistent to handle these soft-masked interrupts in the same way. So introduce a define for all types of interrupts that require MSR[EE] masking in their soft-disable handlers, and use that in may_hard_irq_enable(). Fixes: `f442d00480` ("powerpc/64s: Add support to mask perf interrupts and replay them") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-02-08 23:56:10 +11:00
Madhavan Srinivasan	5c11d1e52d	powerpc/64s: Fix MASKABLE_RELON_EXCEPTION_HV_OOL macro Commit `f14e953b19` ("powerpc/64s: Add support to take additional parameter in MASKABLE_* macro") messed up MASKABLE_RELON_EXCEPTION_HV_OOL macro by adding the wrong SOFTEN test which caused guest kernel crash at boot. Patch to fix the macro to use SOFTEN_TEST_HV instead of SOFTEN_NOTEST_HV. Fixes: `f14e953b19` ("powerpc/64s: Add support to take additional parameter in MASKABLE_* macro") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Fix-Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-02-08 23:56:10 +11:00
Nathan Fontenot	1d9a090783	powerpc/numa: Invalidate numa_cpu_lookup_table on cpu remove When DLPAR removing a CPU, the unmapping of the cpu from a node in unmap_cpu_from_node() should also invalidate the CPUs entry in the numa_cpu_lookup_table. There is not a guarantee that on a subsequent DLPAR add of the CPU the associativity will be the same and thus could be in a different node. Invalidating the entry in the numa_cpu_lookup_table causes the associativity to be read from the device tree at the time of the add. The current behavior of not invalidating the CPUs entry in the numa_cpu_lookup_table can result in scenarios where the the topology layout of CPUs in the partition does not match the device tree or the topology reported by the HMC. This bug looks like it was introduced in 2004 in the commit titled "ppc64: cpu hotplug notifier for numa", which is 6b15e4e87e32 in the linux-fullhist tree. Hence tag it for all stable releases. Cc: stable@vger.kernel.org Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-02-08 23:56:10 +11:00
Ingo Molnar	8284507916	Merge branch 'linus' into sched/urgent, to resolve conflicts Conflicts: arch/arm64/kernel/entry.S arch/x86/Kconfig include/linux/sched/mm.h kernel/fork.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-02-06 21:12:31 +01:00
Mathieu Desnoyers	c5f58bd58f	membarrier: Provide GLOBAL_EXPEDITED command Allow expedited membarrier to be used for data shared between processes through shared memory. Processes wishing to receive the membarriers register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. Those which want to issue membarrier invoke MEMBARRIER_CMD_GLOBAL_EXPEDITED. This allows extremely simple kernel-level implementation: we have almost everything we need with the PRIVATE_EXPEDITED barrier code. All we need to do is to add a flag in the mm_struct that will be used to check whether we need to send the IPI to the current thread of each CPU. There is a slight downside to this approach compared to targeting specific shared memory users: when performing a membarrier operation, all registered "global" receivers will get the barrier, even if they don't share a memory mapping with the sender issuing MEMBARRIER_CMD_GLOBAL_EXPEDITED. This registration approach seems to fit the requirement of not disturbing processes that really deeply care about real-time: they simply should not register with MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED. In order to align the membarrier command names, the "MEMBARRIER_CMD_SHARED" command is renamed to "MEMBARRIER_CMD_GLOBAL", keeping an alias of MEMBARRIER_CMD_SHARED to MEMBARRIER_CMD_GLOBAL for UAPI header backward compatibility. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Link: http://lkml.kernel.org/r/20180129202020.8515-5-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-02-05 21:34:31 +01:00
Mathieu Desnoyers	3ccfebedd8	powerpc, membarrier: Skip memory barrier in switch_mm() Allow PowerPC to skip the full memory barrier in switch_mm(), and only issue the barrier when scheduling into a task belonging to a process that has registered to use expedited private. Threads targeting the same VM but which belong to different thread groups is a tricky case. It has a few consequences: It turns out that we cannot rely on get_nr_threads(p) to count the number of threads using a VM. We can use (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1) instead to skip the synchronize_sched() for cases where the VM only has a single user, and that user only has a single thread. It also turns out that we cannot use for_each_thread() to set thread flags in all threads using a VM, as it only iterates on the thread group. Therefore, test the membarrier state variable directly rather than relying on thread flags. This means membarrier_register_private_expedited() needs to set the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows private expedited membarrier commands to succeed. membarrier_arch_switch_mm() now tests for the MEMBARRIER_STATE_PRIVATE_EXPEDITED flag. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Andrew Hunter <ahh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Avi Kivity <avi@scylladb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Dave Watson <davejwatson@fb.com> Cc: David Sehr <sehr@google.com> Cc: Greg Hackmann <ghackmann@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maged Michael <maged.michael@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20180129202020.8515-3-mathieu.desnoyers@efficios.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-02-05 21:34:02 +01:00
Linus Torvalds	03f51d4efa	powerpc updates for 4.16 Highlights: - Enable support for memory protection keys aka "pkeys" on Power7/8/9 when using the hash table MMU. - Extend our interrupt soft masking to support masking PMU interrupts as well as "normal" interrupts, and then use that to implement local_t for a ~4x speedup vs the current atomics-based implementation. - A new driver "ocxl" for "Open Coherent Accelerator Processor Interface (OpenCAPI)" devices. - Support for new device tree properties on PowerVM to describe hotpluggable memory and devices. - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit VDSO. - Freescale updates from Scott: "Contains fixes for CPM GPIO and an FSL PCI erratum workaround, plus a minor cleanup patch." As well as quite a lot of other changes all over the place, and small fixes and cleanups as always. Thanks to: Alan Modra, Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual, Anton Blanchard, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Bhaktipriya Shridhar, Bryant G. Ly, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Cyril Bur, David Gibson, Desnes A. Nunes do Rosario, Dmitry Torokhov, Frederic Barrat, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo A. R. Silva, Gustavo Romero, Ivan Mikhaylov, Joakim Tjernlund, Joe Perches, Josh Poimboeuf, Juan J. Alvarez, Julia Cartwright, Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee, Simon Guo, Stewart Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann, Vaibhav Jain, Vasyl Gomonovych. -----BEGIN PGP SIGNATURE----- iQIwBAABCAAaBQJadF6wExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYA2 nBAAnguCEyAIYpc+ffE3WU9xJEWxa6bKuVufHcUFVntGiGD+igmMS+SHp4ay3Aos HcA4WFrpzNb2KZ++kmFWtAKWnMfCiW9xuYJNicjr7X5ZiVBEhLWN/mQCwBKs3p6L 5+HhvytcdkKVbEcyVjEGvRL40AyxXNOI02o6Co9X8vanHsmWB4q0eWe4PHstZqlg 6K6kazMp+NTvEFYwKNXDOvuHouKSL57l14SLROH7CpJkNTOQ9s+W59/LmnuCjRlu o70b7iWOAEbF9tvMma1ksDZVNj7mSyaymLYCyOXu4CkuuleJacZYJ9oQGNddoIbC wk7l93vPT/yze7DYg8x3uXpKcaDEvEepPuQ/ubz+UXFQWuJtl5ej6Cv+0eOmyZIs +bjWhGHKdNttnsiPlTRCX/gWD13RE1dB6xXJlfOJ7Oz9OnXXK8ZKc1NTREbQXRWM 8tClAwf9upWpm86GHPVnyrgYbgZo5b1os4SoS8e3kESzakrQVQP7J376u2DtccRq 2AGqjJ+tl5tYPnhm8zG1cNrpqHHpgkNGqLS7DvWRg3EPmEKVQcltN1b/0aKaAjHA aTRofjrVo+jJ4MX1uyEo59yNCEQPfjkmHRQdLwm+xjWTzEPfIMzpWyXm14tawDQf OjcAe90W/qQ18brw4z+2BI14J76XziOSX/QcunOn1u/sqaM= =3rYn -----END PGP SIGNATURE----- Merge tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Enable support for memory protection keys aka "pkeys" on Power7/8/9 when using the hash table MMU. - Extend our interrupt soft masking to support masking PMU interrupts as well as "normal" interrupts, and then use that to implement local_t for a ~4x speedup vs the current atomics-based implementation. - A new driver "ocxl" for "Open Coherent Accelerator Processor Interface (OpenCAPI)" devices. - Support for new device tree properties on PowerVM to describe hotpluggable memory and devices. - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 64-bit VDSO. - Freescale updates from Scott: fixes for CPM GPIO and an FSL PCI erratum workaround, plus a minor cleanup patch. As well as quite a lot of other changes all over the place, and small fixes and cleanups as always. Thanks to: Alan Modra, Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andreas Schwab, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual, Anton Blanchard, Arnd Bergmann, Balbir Singh, Benjamin Herrenschmidt, Bhaktipriya Shridhar, Bryant G. Ly, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Cyril Bur, David Gibson, Desnes A. Nunes do Rosario, Dmitry Torokhov, Frederic Barrat, Geert Uytterhoeven, Guilherme G. Piccoli, Gustavo A. R. Silva, Gustavo Romero, Ivan Mikhaylov, Joakim Tjernlund, Joe Perches, Josh Poimboeuf, Juan J. Alvarez, Julia Cartwright, Kamalesh Babulal, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Malaterre, Michael Bringmann, Michael Hanselmann, Michael Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Philippe Bergheaud, Ram Pai, Russell Currey, Santosh Sivaraj, Scott Wood, Seth Forshee, Simon Guo, Stewart Smith, Sukadev Bhattiprolu, Thiago Jung Bauermann, Vaibhav Jain, Vasyl Gomonovych" * tag 'powerpc-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (199 commits) powerpc/mm/radix: Fix build error when RADIX_MMU=n macintosh/ams-input: Use true and false for boolean values macintosh: change some data types from int to bool powerpc/watchdog: Print the NIP in soft_nmi_interrupt() powerpc/watchdog: regs can't be null in soft_nmi_interrupt() powerpc/watchdog: Tweak watchdog printks powerpc/cell: Remove axonram driver rtc-opal: Fix handling of firmware error codes, prevent busy loops powerpc/mpc52xx_gpt: make use of raw_spinlock variants macintosh/adb: Properly mark continued kernel messages powerpc/pseries: Fix cpu hotplug crash with memoryless nodes powerpc/numa: Ensure nodes initialized for hotplug powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes powerpc/kernel: Block interrupts when updating TIDR powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn powerpc/mm/nohash: do not flush the entire mm when range is a single page powerpc/pseries: Add Initialization of VF Bars powerpc/pseries/pci: Associate PEs to VFs in configure SR-IOV powerpc/eeh: Add EEH notify resume sysfs powerpc/eeh: Add EEH operations to notify resume ...	2018-02-02 10:01:04 -08:00
Linus Torvalds	3879ae653a	The core framework has a handful of patches this time around, mostly due to the clk rate protection support added by Jerome Brunet. This feature will allow consumers to lock in a certain rate on the output of a clk so that things like audio playback don't hear pops when the clk frequency changes due to shared parent clks changing rates. Currently the clk API doesn't guarantee the rate of a clk stays at the rate you request after clk_set_rate() is called, so this new API will allow drivers to express that requirement. Beyond this, the core got some debugfs pretty printing patches and a couple minor non-critical fixes. Looking outside of the core framework diff we have some new driver additions and the removal of a legacy TI clk driver. Both of these hit high in the dirstat. Also, the removal of the asm-generic/clkdev.h file causes small one-liners in all the architecture Kbuild files. Overall, the driver diff seems to be the normal stuff that comes all the time to fix little problems here and there and to support new hardware. Core: - Clk rate protection - Symbolic clk flags in debugfs output - Clk registration enabled clks while doing bookkeeping updates New Drivers: - Spreadtrum SC9860 - HiSilicon hi3660 stub - Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS - Amlogic Meson-AXG - ASPEED BMC Removed Drivers: - TI OMAP 3xxx legacy clk (non-DT) support - asm/clkdev.h got removed (not really a driver) Updates: - Renesas FDP1-0 module clock on R-Car M3-W - Renesas LVDS module clock on R-Car V3M - Misc fixes to pr_err() prints - Qualcomm MSM8916 audio fixes - Qualcomm IPQ8074 rounded out support for more peripherals - Qualcomm Alpha PLL variants - Divider code was using container_of() on bad pointers - Allwinner DE2 clks on H3 - Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED - Mediatek clk driver compile test support - AT91 PMC clk suspend/resume restoration support - PLL issues fixed on si5351 - Broadcom IProc PLL calculation updates - DVFS support for Armada mvebu CPU clks - Allwinner fixed post-divider support - TI clkctrl fixes and support for newer SoCs -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJac5vRAAoJEK0CiJfG5JUlUaIP/Riq0tbApfc4k4GMvSvaieR/ AwZFIMCxOxO+KGdUsBWj7UUoDfBYmxyknHZkVUA/m+Lm7cRH/YHHMghEceZLaBYW zPQmDfkTl/QkwysXZMCw9vg4vO0tt5gWbHljQnvVhxVVTCkIRpaE8Vkktj1RZzpY WU/TkvPbVGY3SNm504TRXKWC9KpMTEXVvzqlg6zLDJ/jE7PGzBKtewqMoLDCBH2L q6b50BSXDo2Hep0vm6e5xneXKjLNR4kgN4PkbM4Yoi4iWLLbgAu79NfyOvvr/imS HxOHRms9tejtyaiR6bQSF0pbLOERZ3QSbMFEbxdxnCTuPEfy3Nw/2W7mNJlhJa8g EGLMnLL4WdloL4Z83dAcMrj9OmxYf7Yobf5dMidLrQT5EYuafdj0ParbI8TQpWSB eTqaffSUGPE/7xuKouYBcbvocpXXWCcokrP/mEn3OEHXkIeeut1Jd3RmEvsi3gtJ pNraJTIpvt4c05rj6yLUOhWfyqlA+fH3p4Fx3rrH1tmKEiG+lrhKoxF26uALZe0V OvarhG+LPIE10pCIYlQjZjQVnYLGCxsGAIoK1uz7VYvFPh2T0cxQlzzeqFgrlTyN 32hMj3LhkQw82FG9xZqjTX1935R35mySRlx63x7HStI1YFief2X9+RHjJR/lofG0 nC0JWTp5sC/pKf54QBXj =bGPp -----END PGP SIGNATURE----- Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "The core framework has a handful of patches this time around, mostly due to the clk rate protection support added by Jerome Brunet. This feature will allow consumers to lock in a certain rate on the output of a clk so that things like audio playback don't hear pops when the clk frequency changes due to shared parent clks changing rates. Currently the clk API doesn't guarantee the rate of a clk stays at the rate you request after clk_set_rate() is called, so this new API will allow drivers to express that requirement. Beyond this, the core got some debugfs pretty printing patches and a couple minor non-critical fixes. Looking outside of the core framework diff we have some new driver additions and the removal of a legacy TI clk driver. Both of these hit high in the dirstat. Also, the removal of the asm-generic/clkdev.h file causes small one-liners in all the architecture Kbuild files. Overall, the driver diff seems to be the normal stuff that comes all the time to fix little problems here and there and to support new hardware. Summary: Core: - Clk rate protection - Symbolic clk flags in debugfs output - Clk registration enabled clks while doing bookkeeping updates New Drivers: - Spreadtrum SC9860 - HiSilicon hi3660 stub - Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS - Amlogic Meson-AXG - ASPEED BMC Removed Drivers: - TI OMAP 3xxx legacy clk (non-DT) support - asm/clkdev.h got removed (not really a driver) Updates: - Renesas FDP1-0 module clock on R-Car M3-W - Renesas LVDS module clock on R-Car V3M - Misc fixes to pr_err() prints - Qualcomm MSM8916 audio fixes - Qualcomm IPQ8074 rounded out support for more peripherals - Qualcomm Alpha PLL variants - Divider code was using container_of() on bad pointers - Allwinner DE2 clks on H3 - Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED - Mediatek clk driver compile test support - AT91 PMC clk suspend/resume restoration support - PLL issues fixed on si5351 - Broadcom IProc PLL calculation updates - DVFS support for Armada mvebu CPU clks - Allwinner fixed post-divider support - TI clkctrl fixes and support for newer SoCs" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits) clk: aspeed: Handle inverse polarity of USB port 1 clock gate clk: aspeed: Fix return value check in aspeed_cc_init() clk: aspeed: Add reset controller clk: aspeed: Register gated clocks clk: aspeed: Add platform driver and register PLLs clk: aspeed: Register core clocks clk: Add clock driver for ASPEED BMC SoCs clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built clk: fix reentrancy of clk_enable() on UP systems clk: meson-axg: fix potential NULL dereference in axg_clkc_probe() clk: Simplify debugfs registration clk: Fix debugfs_create_*() usage clk: Show symbolic clock flags in debugfs clk: renesas: r8a7796: Add FDP clock clk: Move __clk_{get,put}() into private clk.h API clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks clk: Improve flags doc for of_clk_detect_critical() arch: Remove clkdev.h asm-generic from Kbuild clk: sunxi-ng: a83t: Add M divider to TCON1 clock clk: Prepare to remove asm-generic/clkdev.h ...	2018-02-01 16:56:07 -08:00
Linus Torvalds	ab486bc9a5	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Add a console_msg_format command line option: The value "default" keeps the old "[time stamp] text\n" format. The value "syslog" allows to see the syslog-like "<log level>[timestamp] text" format. This feature was requested by people doing regression tests, for example, 0day robot. They want to have both filtered and full logs at hands. - Reduce the risk of softlockup: Pass the console owner in a busy loop. This is a new approach to the old problem. It was first proposed by Steven Rostedt on Kernel Summit 2017. It marks a context in which the console_lock owner calls console drivers and could not sleep. On the other side, printk() callers could detect this state and use a busy wait instead of a simple console_trylock(). Finally, the console_lock owner checks if there is a busy waiter at the end of the special context and eventually passes the console_lock to the waiter. The hand-off works surprisingly well and helps in many situations. Well, there is still a possibility of the softlockup, for example, when the flood of messages stops and the last owner still has too much to flush. There is increasing number of people having problems with printk-related softlockups. We might eventually need to get better solution. Anyway, this looks like a good start and promising direction. - Do not allow to schedule in console_unlock() called from printk(): This reverts an older controversial commit. The reschedule helped to avoid softlockups. But it also slowed down the console output. This patch is obsoleted by the new console waiter logic described above. In fact, the reschedule made the hand-off less effective. - Deprecate "%pf" and "%pF" format specifier: It was needed on ia64, ppc64 and parisc64 to dereference function descriptors and show the real function address. It is done transparently by "%ps" and "pS" format specifier now. Sergey Senozhatsky found that all the function descriptors were in a special elf section and could be easily detected. - Remove printk_symbol() API: It has been obsoleted by "%pS" format specifier, and this change helped to remove few continuous lines and a less intuitive old API. - Remove redundant memsets: Sergey removed unnecessary memset when processing printk.devkmsg command line option. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits) printk: drop redundant devkmsg_log_str memsets printk: Never set console_may_schedule in console_trylock() printk: Hide console waiter logic into helpers printk: Add console owner and waiter logic to load balance console writes kallsyms: remove print_symbol() function checkpatch: add pF/pf deprecation warning symbol lookup: introduce dereference_symbol_descriptor() parisc64: Add .opd based function descriptor dereference powerpc64: Add .opd based function descriptor dereference ia64: Add .opd based function descriptor dereference sections: split dereference_function_descriptor() openrisc: Fix conflicting types for _exext and _stext lib: do not use print_symbol() irq debug: do not use print_symbol() sysfs: do not use print_symbol() drivers: do not use print_symbol() x86: do not use print_symbol() unicore32: do not use print_symbol() sh: do not use print_symbol() mn10300: do not use print_symbol() ...	2018-02-01 13:36:15 -08:00
Radim Krčmář	d2b9b2079e	PPC KVM update for 4.16 - Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization that earlier CPU versions required. - A series from Ben Herrenschmidt to improve the handling of escalation interrupts with the XIVE interrupt controller. - Provide for the decrementer register to be copied across on migration. - Various minor cleanups and bugfixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJaYXViAAoJEJ2a6ncsY3GfDhgIAIDVBZH/Ftq7eJiUSxDpqyCQ DF/x7fNKzK/J33pu+3ntOI2gZsldExAy7vH2M27I4qLIkbI5y3vu4v8l3CDlS1LK 9dKi72zg7baozoVF5mGUNm0B1sSvZiIQlC/kaami2aPTF1GcrJ561GthzfZwxENX TSLqOA4LkeUZh2tUsvbcUrPi6v+E4Em2lgacQcx2ioMblWz56sZu79VsUbSSw/a3 P8+pIv7EbHw+TrOZMehjCbZkOdBeZ3IRLJsdlIAfe7y4vWME/5b9uVnQS/+XQj/B 6f3rQrduGvF2P6GMjsm8gDkgE5oZ1zbKlgO4i5WApnu80MMLFlfEUN+GWuGJ95Q= =OjGs -----END PGP SIGNATURE----- Merge tag 'kvm-ppc-next-4.16-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc PPC KVM update for 4.16 - Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization that earlier CPU versions required. - A series from Ben Herrenschmidt to improve the handling of escalation interrupts with the XIVE interrupt controller. - Provide for the decrementer register to be copied across on migration. - Various minor cleanups and bugfixes.	2018-02-01 16:13:07 +01:00
Alexander Graf	07ae5389e9	KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled When copying between the vcpu and svcpu, we may get scheduled away onto a different host CPU which in turn means our svcpu pointer may change. That means we need to atomically copy to and from the svcpu with preemption disabled, so that all code around it always sees a coherent state. Reported-by: Simon Guo <wei.guo.simon@gmail.com> Fixes: `3d3319b45e` ("KVM: PPC: Book3S: PR: Enable interrupts earlier") Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-02-01 13:35:33 +11:00
Aneesh Kumar K.V	423ac9af3c	mm/thp: remove pmd_huge_split_prepare() Instead of marking the pmd ready for split, invalidate the pmd. This should take care of powerpc requirement. Only side effect is that we mark the pmd invalid early. This can result in us blocking access to the page a bit longer if we race against a thp split. [kirill.shutemov@linux.intel.com: rebased, dirty THP once] Link: http://lkml.kernel.org/r/20171213105756.69879-13-kirill.shutemov@linux.intel.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Daney <david.daney@cavium.com> Cc: David Miller <davem@davemloft.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Nitin Gupta <nitin.m.gupta@oracle.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-01-31 17:18:38 -08:00
Aneesh Kumar K.V	8cc931e033	powerpc/mm: update pmdp_invalidate to return old pmd value It's required to avoid losing dirty and accessed bits. Link: http://lkml.kernel.org/r/20171213105756.69879-7-kirill.shutemov@linux.intel.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-01-31 17:18:37 -08:00
Linus Torvalds	2382dc9a3e	dma mapping changes for Linux 4.16: This pull requests contains a consolidation of the generic no-IOMMU code, a well as the glue code for swiotlb. All the code is based on the x86 implementation with hooks to allow all architectures that aren't cache coherent to use it. The x86 conversion itself has been deferred because the x86 maintainers were a little busy in the last months. -----BEGIN PGP SIGNATURE----- iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlpxcVoLHGhjaEBsc3Qu ZGUACgkQD55TZVIEUYN/Lw/+Je9teM4NPQ8lU/ncbJN/bUzCFGJ6dFt2eVX/6xs3 sfl8vBdeHt6CBM02rRNecEr31z3+orjQes5JnlEJFYeG3jumV0zCPw/zbxqjzbJ1 3n6cckLxbxzy8Ca1G/BVjHLAUX5eWp1ujn/Q4d03VKVQZhJvFYlqDbP3TrNVx7xn k86u37p/o+ngjwX66UdZ3C4iIBF8zqy6n2kkpv4HUQtHHzPwEvliN39eNilovb56 iGOzjDX1UWHAu4xCTVnPHSG4fA4XU41NWzIN3DIVPE25lYSISSl9TFAdR8GeZA0G 0Yj6sW53pRSoUwco1ocoS44/FgrPOB5/vHIL06pABvicXBiomje1QylqcK7zAczk esjkfPEZrmZuu99GtqFyDNKEvKKdy+aBGaTZ3y+NxsuBs+0xS2Owz1IE4Tk28xaw xh7zn+CVdk2fJh6ZIdw5Eu9b9VN08UriqDmDzO/ylDlcNGcDi7wcxiSTEkHJ1ON/ g9nletV6f3egL0wljDcOnhCJCHTvmWEeq3z8lE55QzPzSH0hHpnGQ2WD0tKrroxz kjOZp0TdXa4F5iysOHe2xl2sftOH0zIkBQJ+oBcK12mTaLu21+yeuCggQXJ/CBdk 1Ol7l9g9T0TDuZPfiTHt5+6jmECQs92LElWA8x7uF7Fpix3BpnafWaaSMSsosF3F D1Y= =Nrl9 -----END PGP SIGNATURE----- Merge tag 'dma-mapping-4.16' of git://git.infradead.org/users/hch/dma-mapping Pull dma mapping updates from Christoph Hellwig: "Except for a runtime warning fix from Christian this is all about consolidation of the generic no-IOMMU code, a well as the glue code for swiotlb. All the code is based on the x86 implementation with hooks to allow all architectures that aren't cache coherent to use it. The x86 conversion itself has been deferred because the x86 maintainers were a little busy in the last months" * tag 'dma-mapping-4.16' of git://git.infradead.org/users/hch/dma-mapping: (57 commits) MAINTAINERS: add the iommu list for swiotlb and xen-swiotlb arm64: use swiotlb_alloc and swiotlb_free arm64: replace ZONE_DMA with ZONE_DMA32 mips: use swiotlb_{alloc,free} mips/netlogic: remove swiotlb support tile: use generic swiotlb_ops tile: replace ZONE_DMA with ZONE_DMA32 unicore32: use generic swiotlb_ops ia64: remove an ifdef around the content of pci-dma.c ia64: clean up swiotlb support ia64: use generic swiotlb_ops ia64: replace ZONE_DMA with ZONE_DMA32 swiotlb: remove various exports swiotlb: refactor coherent buffer allocation swiotlb: refactor coherent buffer freeing swiotlb: wire up ->dma_supported in swiotlb_dma_ops swiotlb: add common swiotlb_map_ops swiotlb: rename swiotlb_free to swiotlb_exit x86: rename swiotlb_dma_ops powerpc: rename swiotlb_dma_ops ...	2018-01-31 11:32:27 -08:00
Linus Torvalds	d4173023e6	Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo cleanups from Eric Biederman: "Long ago when 2.4 was just a testing release copy_siginfo_to_user was made to copy individual fields to userspace, possibly for efficiency and to ensure initialized values were not copied to userspace. Unfortunately the design was complex, it's assumptions unstated, and humans are fallible and so while it worked much of the time that design failed to ensure unitialized memory is not copied to userspace. This set of changes is part of a new design to clean up siginfo and simplify things, and hopefully make the siginfo handling robust enough that a simple inspection of the code can be made to ensure we don't copy any unitializied fields to userspace. The design is to unify struct siginfo and struct compat_siginfo into a single definition that is shared between all architectures so that anyone adding to the set of information shared with struct siginfo can see the whole picture. Hopefully ensuring all future si_code assignments are arch independent. The design is to unify copy_siginfo_to_user32 and copy_siginfo_from_user32 so that those function are complete and cope with all of the different cases documented in signinfo_layout. I don't think there was a single implementation of either of those functions that was complete and correct before my changes unified them. The design is to introduce a series of helpers including force_siginfo_fault that take the values that are needed in struct siginfo and build the siginfo structure for their callers. Ensuring struct siginfo is built correctly. The remaining work for 4.17 (unless someone thinks it is post -rc1 material) is to push usage of those helpers down into the architectures so that architecture specific code will not need to deal with the fiddly work of intializing struct siginfo, and then when struct siginfo is guaranteed to be fully initialized change copy siginfo_to_user into a simple wrapper around copy_to_user. Further there is work in progress on the issues that have been documented requires arch specific knowledge to sort out. The changes below fix or at least document all of the issues that have been found with siginfo generation. Then proceed to unify struct siginfo the 32 bit helpers that copy siginfo to and from userspace, and generally clean up anything that is not arch specific with regards to siginfo generation. It is a lot but with the unification you can of siginfo you can already see the code reduction in the kernel" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (45 commits) signal/memory-failure: Use force_sig_mceerr and send_sig_mceerr mm/memory_failure: Remove unused trapno from memory_failure signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap signal: Helpers for faults with specialized siginfo layouts signal: Add send_sig_fault and force_sig_fault signal: Replace memset(info,...) with clear_siginfo for clarity signal: Don't use structure initializers for struct siginfo signal/arm64: Better isolate the COMPAT_TASK portion of ptrace_hbptriggered ptrace: Use copy_siginfo in setsiginfo and getsiginfo signal: Unify and correct copy_siginfo_to_user32 signal: Remove the code to clear siginfo before calling copy_siginfo_from_user32 signal: Unify and correct copy_siginfo_from_user32 signal/blackfin: Remove pointless UID16_SIGINFO_COMPAT_NEEDED signal/blackfin: Move the blackfin specific si_codes to asm-generic/siginfo.h signal/tile: Move the tile specific si_codes to asm-generic/siginfo.h signal/frv: Move the frv specific si_codes to asm-generic/siginfo.h signal/ia64: Move the ia64 specific si_codes to asm-generic/siginfo.h signal/powerpc: Remove redefinition of NSIGTRAP on powerpc signal: Move addr_lsb into the _sigfault union for clarity ...	2018-01-30 14:18:52 -08:00
Michael Ellerman	015eb1b89e	powerpc/mm/radix: Fix build error when RADIX_MMU=n The recent TLB flush rework broke the build when the Radix MMU is disabled at build time, eg: (.text+0x264): undefined reference to `.radix__tlbiel_all' We could add an empty version, but if we ever called it by accident that would indicate a bad bug, so add a stub that just WARNs if we do. Fixes: `d4748276ae` ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-30 20:41:30 +11:00
Linus Torvalds	49f9c3552c	init_task out-of-lining -----BEGIN PGP SIGNATURE----- iQIVAwUAWl80tvSw1s6N8H32AQJq8A//ViRN5fExrd678Eh2Bz1ytrJYMUfYY3Hv QTH5TH9zFyLFyWLB1Iwe13sdLVTTM88O0qcDb54Lx9fWUqeMZyYvBhLtWPc00lTU 0m3EyYR87MFWaEV+VxaVWgWaWkMDkd39KubDitcS+YIBDszTuMpYodhPUsgLt7lr pePX7eurXKdQPTh4NUOjGA2NaZot3tga76J6D8NKruGYUstQCGxpP1ryiFfACnwf NLWNO8ZBMtlDwX1mHYOOMFMaBzFzXorPm7jY4HJDf3mUM84xI3ach6CuH9RTSzfq A+qB1U3QILPVFo2HtqOHui4bFjRwqOf6uIrI/KcnioJ37w1O+KFcMJeDnX2I211q f2lXehJLQA7kPmxQw8T3//HDRaLXc0Qxt7IPZRFinrlkcN4oh3DD5euMfCFBSoZG PTbjxlgMfzJPoZtqAcy0rV5L54a/F4h915OQPJCKLwujIsXD2nT993vNmGDyq4zh BzNMxSXJC8p+jYvQpNhWyyxwDBBT/YsVQo/ACwg4eJnD3blVTAioRT9ZZcAcsY0F 0z1eWW5RiknzIaXQWvjfK0gYKpO+aMSu9+gipHfMbU3yXG+sPj/H6zAHYzqX3uQZ jb5Iujjnu49W/YD+RiMenuu59lNXUnLSeRnlV7dw0qxGK1FzGo24+ZzKFhJhKvzG tdfUsev1Mc8= =jhWg -----END PGP SIGNATURE----- Merge tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull init_task initializer cleanups from David Howells: "It doesn't seem useful to have the init_task in a header file rather than in a normal source file. We could consolidate init_task handling instead and expand out various macros. Here's a series of patches that consolidate init_task handling: (1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and openrisc. (2) Alter the INIT_TASK_DATA linker script macro to set init_thread_union and init_stack rather than defining these in C. Insert init_task and init_thread_into into the init_stack area in the linker script as appropriate to the configuration, with different section markers so that they end up correctly ordered. We can then get merge ia64's init_task.c into the main one. We then have a bunch of single-use INIT_() macros that seem only to be macros because they used to be used per-arch. We can then expand these in place of the user and get rid of a few lines and a lot of backslashes. (3) Expand INIT_TASK() in place. (4) Expand in place various small INIT_() macros that are defined conditionally. Expand them and surround them by #if[n]def/#endif in the .c file as it takes fewer lines. (5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place. (6) Expand INIT_STRUCT_PID in place. These macros can then be discarded" * tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: Expand INIT_STRUCT_PID and remove Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove Expand various INIT_* macros and remove Expand INIT_TASK() in init/init_task.c and remove Construct init thread stack in the linker script rather than by union openrisc: Make THREAD_SIZE available to vmlinux.lds hexagon: Make THREAD_SIZE available to vmlinux.lds cris: Make THREAD_SIZE available to vmlinux.lds	2018-01-29 09:08:34 -08:00
Alexey Kardashevskiy	902bdc5745	powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn The pcidev value stored in pci_dn is only used for NPU/NPU2 initialization. We can easily drop the cached pointer and use an ancient helper - pci_get_domain_bus_and_slot() instead in order to reduce complexity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-27 20:39:01 +11:00
Bryant G. Ly	fc5f622163	powerpc/pseries: Add Initialization of VF Bars When enabling SR-IOV in pseries platform, the VF bar properties for a PF are reported on the device node in the device tree. This patch adds the IOV Bar resources to Linux structures from the device tree for later use when configuring SR-IOV by PF driver. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-27 20:02:53 +11:00
Bryant G. Ly	67923cfcfa	powerpc/eeh: Add EEH operations to notify resume When pseries SR-IOV is enabled and after a PF driver has resumed from EEH, platform has to be notified of the event so the child VFs can be allowed to resume their normal recovery path. This patch makes the EEH operation allow unfreeze platform dependent code and adds the call to pseries EEH code. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-27 20:02:52 +11:00
Bryant G. Ly	565a744dd2	powerpc/pseries: Set eeh_pe of EEH_PE_VF type To correctly use EEH code one has to make sure that the EEH_PE_VF is set for dynamic created VFs. Therefore this patch allocates an eeh_pe of eeh type EEH_PE_VF and associates PE with parent. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-27 20:02:51 +11:00
Bryant G. Ly	64ba3dc7bf	powerpc/eeh: Update VF config space after EEH Add EEH platform operations for pseries to update VF config space. With this change after EEH, the VF will have updated config space for pseries platform. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@linux.vnet.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-27 20:02:51 +11:00
Frederic Barrat	aeddad1760	ocxl: Add AFU interrupt support Add user APIs through ioctl to allocate, free, and be notified of an AFU interrupt. For opencapi, an AFU can trigger an interrupt on the host by sending a specific command targeting a 64-bit object handle. On POWER9, this is implemented by mapping a special page in the address space of a process and a write to that page will trigger an interrupt. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-24 11:42:58 +11:00
Frederic Barrat	2cb3d64b26	powerpc/powernv: Capture actag information for the device In the opencapi protocol, host memory contexts are referenced by a 'actag'. During setup, a driver must tell the device how many actags it can used, and what values are acceptable. On POWER9, the NPU can handle 64 actags per link, so they must be shared between all the PCI functions of the link. To get a global picture of how many actags are used by each AFU of every function, we capture some data at the end of PCI enumeration, so that actags can be shared fairly if needed. This is not powernv specific per say, but rather a consequence of the opencapi configuration specification being quite general. The number of available actags on POWER9 makes it more likely to be hit. This is somewhat mitigated by the fact that existing AFUs are coded by requesting a reasonable count of actags and existing devices carry only one AFU. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-24 11:42:57 +11:00
Frederic Barrat	6914c75711	powerpc/powernv: Add platform-specific services for opencapi Implement a few platform-specific calls which can be used by drivers: - provide the Transaction Layer capabilities of the host, so that the driver can find some common ground and configure the device and host appropriately. - provide the hw interrupt to be used for translation faults raised by the NPU - map/unmap some NPU mmio registers to get the fault context when the NPU raises an address translation fault The rest are wrappers around the previously-introduced opal calls. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-24 11:42:57 +11:00
Frederic Barrat	74d656d219	powerpc/powernv: Add opal calls for opencapi Add opal calls to interact with the NPU: OPAL_NPU_SPA_SETUP: set the Shared Process Area (SPA) The SPA is a table containing one entry (Process Element) per memory context which can be accessed by the opencapi device. OPAL_NPU_SPA_CLEAR_CACHE: clear the context cache The NPU keeps a cache of recently accessed memory contexts. When a Process Element is removed from the SPA, the cache for the link must be cleared. OPAL_NPU_TL_SET: configure the Transaction Layer The Transaction Layer specification defines several templates for messages to be exchanged on the link. During link setup, the host and device must negotiate what templates are supported on both sides and at what rates those messages can be sent. Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-24 11:42:56 +11:00
Nicholas Piggin	bdcb1aefc5	powerpc/64s: Improve RFI L1-D cache flush fallback The fallback RFI flush is used when firmware does not provide a way to flush the cache. It's a "displacement flush" that evicts useful data by displacing it with an uninteresting buffer. The flush has to take care to work with implementation specific cache replacment policies, so the recipe has been in flux. The initial slow but conservative approach is to touch all lines of a congruence class, with dependencies between each load. It has since been determined that a linear pattern of loads without dependencies is sufficient, and is significantly faster. Measuring the speed of a null syscall with RFI fallback flush enabled gives the relative improvement: P8 - 1.83x P9 - 1.75x The flush also becomes simpler and more adaptable to different cache geometries. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-23 16:16:33 +11:00
Eric W. Biederman	47355040d2	signal/powerpc: Remove unnecessary signal_code parameter of do_send_trap signal_code is always TRAP_HWBKPT Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-01-22 19:07:10 -06:00
Nicholas Piggin	35adacd6fc	powerpc/pseries, ps3: panic flush kernel messages before halting system Platforms with a panic handler that halts the system can have problems getting kernel messages out, because the panic notifiers are called before kernel/panic.c does its flushing of printk buffers an console etc. This was attempted to be solved with commit `a3b2cb30f2` ("powerpc: Do not call ppc_md.panic in fadump panic notifier"), but that wasn't the right approach and caused other problems, and was reverted by commit `ab9dbf771f`. Instead, the powernv shutdown paths have already had a similar problem, fixed by taking the message flushing sequence from kernel/panic.c. That's a little bit ugly, but while we have the code duplicated, it will work for this case as well. So have ppc panic handlers do the same flushing before they terminate. Without this patch, a qemu pseries_le_defconfig guest stops silently when issued the nmi command when xmon is off and no crash dumpers enabled. Afterwards, an oops is printed by each CPU as expected. Fixes: `ab9dbf771f` ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-22 11:44:24 +11:00
Mathieu Malaterre	38833faa11	powerpc/xive: Properly use static keyword for inline function Fix fatal warning during compilation: In file included from arch/powerpc/xmon/xmon.c:54:0: ./arch/powerpc/include/asm/xive.h:157:20: error: no previous prototype for ‘xive_smp_prepare_cpu’ [-Werror=missing-prototypes] extern inline int xive_smp_prepare_cpu(unsigned int cpu) { return -EINVAL; } ^ Signed-off-by: Mathieu Malaterre <malat@debian.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-21 23:37:43 +11:00
Michael Ellerman	4afa0f3a89	Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Contains fixes for CPM GPIO and an FSL PCI erratum workaround, plus a minor cleanup patch."	2018-01-21 23:32:02 +11:00
Michael Ellerman	ebf0b6a8b1	Merge branch 'fixes' into next Merge our fixes branch from the 4.15 cycle. Unusually the fixes branch saw some significant features merged, notably the RFI flush patches, so we want the code in next to be tested against that, to avoid any surprises when the two are merged. There's also some other work on the panic handling that was reverted in fixes and we now want to do properly in next, which would conflict. And we also fix a few other minor merge conflicts.	2018-01-21 23:21:14 +11:00
Michael Ellerman	5400fc229e	Merge branch 'topic/ppc-kvm' into next Merge the topic branch we share with kvm-ppc, this brings in two xive commits, one from Paul to rework HMI handling, and a minor cleanup to drop an unused flag.	2018-01-21 22:43:43 +11:00
Christophe Leroy	c095ff93f9	powerpc/sysdev: change CPM GPIO to platform_device Since commit `9427ecbed4` ("gpio: Rework of_gpiochip_set_names() to use device property accessors"), gpio chips have to have a parent, otherwise devprop_gpiochip_set_names() prematurely exists with message "GPIO chip parent is NULL" and doesn't proceed 'gpio-line-names' DT property. This patch wraps the CPM GPIO into a platform driver to allow assignment of the parent device. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>	2018-01-20 23:29:02 -06:00
Michael Bringmann	e83636ac33	pseries/drc-info: Search DRC properties for CPU indexes pseries/drc-info: Provide parallel routines to convert between drc_index and CPU numbers at runtime, using the older device-tree properties ("ibm,drc-indexes", "ibm,drc-names", "ibm,drc-types" and "ibm,drc-power-domains"), or the new property "ibm,drc-info". Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-21 16:21:46 +11:00
Michael Bringmann	3f38000eda	powerpc/firmware: Add definitions for new drc-info firmware feature Firmware Features: Define new bit flag representing the presence of new device tree property "ibm,drc-info". The flag is used to tell the front end processor whether the Linux kernel supports the new property, and by the front end processor to tell the Linux kernel that the new property is present in the device tree. Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-21 16:21:40 +11:00
Christophe Leroy	8183d99f4a	powerpc/lib/feature-fixups: use raw_patch_instruction() feature fixups need to use patch_instruction() early in the boot, even before the code is relocated to its final address, requiring patch_instruction() to use PTRRELOC() in order to address data. But feature fixups applies on code before it is set to read only, even for modules. Therefore, feature fixups can use raw_patch_instruction() instead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-21 15:06:25 +11:00
Arnd Bergmann	11ed8c5569	powerpc/mpic_timer: avoid struct timeval In an effort to remove all instances of 'struct timeval' from the kernel, I'm changing the powerpc mpic_timer interface to use plain seconds instead. There is only one user of this interface, and that doesn't use the microseconds portion, so the code gets noticeably simpler in the process. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-21 15:06:16 +11:00
Linus Torvalds	24b6124047	KVM fixes for v4.15-rc9 ARM: * fix incorrect huge page mappings on systems using the contiguous hint for hugetlbfs * support alternative GICv4 init sequence * correctly implement the ARM SMCC for HVC and SMC handling PPC: * add KVM IOCTL for reporting vulnerability and workaround status s390: * provide userspace interface for branch prediction changes in firmware x86: * use correct macros for bits -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJaY3/eAAoJEED/6hsPKofo64kH/16SCSA9pKJTf39+jLoCPzbp tlhzxoaqb9cPNMQBAk8Cj5xNJ6V4Clwnk8iRWaE6dRI5nWQxnxRHiWxnrobHwUbK I0zSy+SywynSBnollKzLzQrDUBZ72fv3oLwiYEYhjMvs0zW6Q/vg10WERbav912Q bv8nb5e8TbvU500ErndKTXOa8/B6uZYkMVjBNvAHwb+4AQ7bJgDQs5/qOeXllm8A MT/SNYop/fkjRP7mQng5XYzoO+70tbe0hWpOQGgBnduzrbkNNvZtYtovusHYytLX PAB7DDPbLZm5L2HBo4zvKgTHIoHTxU0X2yfUDzt7O151O2WSyqBRC3y1tpj6xa8= =GnNJ -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Radim Krčmář: "ARM: - fix incorrect huge page mappings on systems using the contiguous hint for hugetlbfs - support alternative GICv4 init sequence - correctly implement the ARM SMCC for HVC and SMC handling PPC: - add KVM IOCTL for reporting vulnerability and workaround status s390: - provide userspace interface for branch prediction changes in firmware x86: - use correct macros for bits" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: wire up bpb feature KVM: PPC: Book3S: Provide information about hardware/firmware CVE workarounds KVM/x86: Fix wrong macro references of X86_CR0_PG_BIT and X86_CR4_PAE_BIT in kvm_valid_sregs() arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls KVM: arm64: Fix GICv4 init when called from vgic_its_create KVM: arm/arm64: Check pagesize when allocating a hugepage at Stage 2	2018-01-20 11:41:09 -08:00
Ram Pai	3350eb2ea1	powerpc: sys_pkey_mprotect() system call Patch provides the ability for a process to associate a pkey with a address range. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-21 01:06:16 +11:00
Ram Pai	9499ec1b5e	powerpc: sys_pkey_alloc() and sys_pkey_free() system calls Finally this patch provides the ability for a process to allocate and free a protection key. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-21 01:06:15 +11:00
Ram Pai	cf43d3b264	powerpc: Enable pkey subsystem PAPR defines 'ibm,processor-storage-keys' property. It exports two values. The first value holds the number of data-access keys and the second holds the number of instruction-access keys. Due to a bug in the firmware, instruction-access keys is always reported as zero. However any key can be configured to disable data-access and/or disable execution-access. The inavailablity of the second value is not a big handicap, though it could have been used to determine if the platform supported disable-execution-access. Non-PAPR platforms do not define this property in the device tree yet. Fortunately power8 is the only released Non-PAPR platform that is supported. Here, we hardcode the number of supported pkey to 32, by consulting the PowerISA3.0 This patch calculates the number of keys supported by the platform. Also it determines the platform support for read/write/execution access support for pkeys. Signed-off-by: Ram Pai <linuxram@us.ibm.com> [mpe: Use a PVR check instead of CPU_FTR for execute. Restrict to Power7/8/9 for now until older CPUs are tested.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-21 01:06:10 +11:00
Thiago Jung Bauermann	c5cc1f4df6	powerpc/ptrace: Add memory protection key regset The AMR/IAMR/UAMOR are part of the program context. Allow it to be accessed via ptrace and through core files. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:06 +11:00
Ram Pai	99cd130232	powerpc: Deliver SEGV signal on pkey violation The value of the pkey, whose protection got violated, is made available in si_pkey field of the siginfo structure. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:05 +11:00
Ram Pai	087003e9ef	powerpc: introduce get_mm_addr_key() helper get_mm_addr_key() helper returns the pkey associated with an address corresponding to a given mm_struct. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:05 +11:00
Ram Pai	e6c2a4797e	powerpc: Handle exceptions caused by pkey violation Handle Data and Instruction exceptions caused by memory protection-key. The CPU will detect the key fault if the HPTE is already programmed with the key. However if the HPTE is not hashed, a key fault will not be detected by the hardware. The software will detect pkey violation in such a case. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:04 +11:00
Ram Pai	1137573acf	powerpc: implementation for arch_vma_access_permitted() This patch provides the implementation for arch_vma_access_permitted(). Returns true if the requested access is allowed by pkey associated with the vma. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:04 +11:00
Ram Pai	bca7aacfe8	powerpc: check key protection for user page access Make sure that the kernel does not access user pages without checking their key-protection. Signed-off-by: Ram Pai <linuxram@us.ibm.com> [mpe: Integrate with upstream version of pte_access_permitted()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:04 +11:00
Ram Pai	f2407ef3ba	powerpc: helper to validate key-access permissions of a pte helper function that checks if the read/write/execute is allowed on the pte. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:03 +11:00
Ram Pai	a6590ca55f	powerpc: Program HPTE key protection bits Map the PTE protection key bits to the HPTE key protection bits, while creating HPTE entries. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:03 +11:00
Ram Pai	eb95d016ce	powerpc: map vma key-protection bits to pte key bits. Map the key protection bits of the vma to the pkey bits in the PTE. The PTE bits used for pkey are 3,4,5,6 and 57. The first four bits are the same four bits that were freed up initially in this patch series. remember? :-) Without those four bits this patch wouldn't be possible. BUT, on 4k kernel, bit 3, and 4 could not be freed up. remember? Hence we have to be satisfied with 5, 6 and 7. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:02 +11:00
Ram Pai	87bbabbed8	powerpc: implementation for arch_override_mprotect_pkey() arch independent code calls arch_override_mprotect_pkey() to return a pkey that best matches the requested protection. This patch provides the implementation. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:02 +11:00
Ram Pai	013a91b39c	powerpc: ability to associate pkey to a vma arch-independent code expects the arch to map a pkey into the vma's protection bit setting. The patch provides that ability. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:01 +11:00
Ram Pai	5586cf61e1	powerpc: introduce execute-only pkey This patch provides the implementation of execute-only pkey. The architecture-independent layer expects the arch-dependent layer, to support the ability to create and enable a special key which has execute-only permission. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:01 +11:00
Ram Pai	06bb53b338	powerpc: store and restore the pkey state across context switches Store and restore the AMR, IAMR and UAMOR register state of the task before scheduling out and after scheduling in, respectively. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:00 +11:00
Ram Pai	dcf872956d	powerpc: ability to create execute-disabled pkeys powerpc has hardware support to disable execute on a pkey. This patch enables the ability to create execute-disabled keys. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:59:00 +11:00
Ram Pai	2ddc53f3a7	powerpc: implementation for arch_set_user_pkey_access() This patch provides the detailed implementation for a user to allocate a key and enable it in the hardware. It provides the plumbing, but it cannot be used till the system call is implemented. The next patch will do so. Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:58:59 +11:00
Ram Pai	0685f217af	powerpc: cleanup AMR, IAMR when a key is allocated or freed Cleanup the bits corresponding to a key in the AMR, and IAMR register, when the key is newly allocated/activated or is freed. We dont want some residual bits cause the hardware enforce unintended behavior when the key is activated or freed. Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:58:59 +11:00
Ram Pai	4fb158f65a	powerpc: track allocation status of all pkeys Total 32 keys are available on power7 and above. However pkey 0,1 are reserved. So effectively we have 30 pkeys. On 4K kernels, we do not have 5 bits in the PTE to represent all the keys; we only have 3bits. Two of those keys are reserved; pkey 0 and pkey 1. So effectively we have 6 pkeys. This patch keeps track of reserved keys, allocated keys and keys that are currently free. Also it adds skeletal functions and macros, that the architecture-independent code expects to be available. Reviewed-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 22:58:35 +11:00
Ram Pai	92e3da3cf1	powerpc: initial pkey plumbing Basic plumbing to initialize the pkey system. Nothing is enabled yet. A later patch will enable it once all the infrastructure is in place. Signed-off-by: Ram Pai <linuxram@us.ibm.com> [mpe: Rework copyrights to use SPDX tags] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-20 21:45:03 +11:00
Linus Torvalds	4917d5df38	powerpc fixes for 4.15 #8 More than we'd like after rc8, but nothing very alarming either, just tying up loose ends before the release: Since we changed powernv to use cpufreq_get() from show_cpuinfo(), we see warnings with PREEMPT enabled. But the preempt_disable() in show_cpuinfo() doesn't actually prevent CPU hotplug as it suggests, so remove it. Two updates to the recently merged RFI flush code. Wire up the generic sysfs file to report the status, and add a debugfs file to allow enabling/disabling it at runtime. Two updates to xmon, one to add the RFI flush related fields to the paca dump, and another to not use hashed pointers in the paca dump. And one minor fix to add a missing include of linux/types.h in asm/hvcall.h, not seen to break the build in upstream, but correct anyway. Thanks to: Benjamin Herrenschmidt, Michal Suchanek, Nicholas Piggin. -----BEGIN PGP SIGNATURE----- iQIwBAABCAAaBQJaYcINExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYDQ /w//S5OeowmRncTPsiCNlZSLlGsynE7/3QiB3qS0+hGK9vzcdl9Zw5nBxPmA16g+ 53z2pgRpsJvagqR3JFHwYsoOKg157heMZKbrFyV/EWXPUoZQD8Iko88BkNQfV9kH dZM4gOuPnn7lLJQJrdc13SRwlOlc1oqOiPqFQtwH5CTlH/I4kVe3iR+oaReA71+U bKZQWwXjRqjexLgxfPhaMH9CzfgK6LpM/TEc2tG1YBbKxbTCbXeFYhrNFvLKILHg sC/8QDYzFbdgqTrcw6rXpwoxA65Eu8I3puOLpTl/oJ/OQZZBLN6sjjJ33+Aq1yNR cXOakJgC+WjNYmw0C2eRBttMG1o0t/g52C9e0DWN2cXU9nwZ5mdFtbQos1kOdaLv V5sOpX6HH0+K8jefvw9qdmx4+kma38nyBY1hlsSXkMG5FwXVOx5V/hLFdH4zJs4+ /9q3Bnwklc7X7Y+Qvqz/5rwi2T4++cIQpngRE2B7uUwjqNQR/iQUrXcb1MBie8LB RO7JD8D0ut+Utj75/b6PmBh2tvIHjWvC8BY9mexGfYgU4e/kHbpUa0tN8/OZhYv2 pm1LP7Qiq+Lt8ii9UjvRKRdXWuxP/8dZ5EI7l7DJkwDItHTzQCDnfCRYjLqvXuDQ CGSsyMpQzzReWhWHDn4WJp4f8wRqDyN24d2a95ozaY7HJ4M= =zgy0 -----END PGP SIGNATURE----- Merge tag 'powerpc-4.15-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "More than we'd like after rc8, but nothing very alarming either, just tying up loose ends before the release: Since we changed powernv to use cpufreq_get() from show_cpuinfo(), we see warnings with PREEMPT enabled. But the preempt_disable() in show_cpuinfo() doesn't actually prevent CPU hotplug as it suggests, so remove it. Two updates to the recently merged RFI flush code. Wire up the generic sysfs file to report the status, and add a debugfs file to allow enabling/disabling it at runtime. Two updates to xmon, one to add the RFI flush related fields to the paca dump, and another to not use hashed pointers in the paca dump. And one minor fix to add a missing include of linux/types.h in asm/hvcall.h, not seen to break the build in upstream, but correct anyway. Thanks to: Benjamin Herrenschmidt, Michal Suchanek, Nicholas Piggin" * tag 'powerpc-4.15-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/pseries: include linux/types.h in asm/hvcall.h powerpc/64s: Allow control of RFI flush via debugfs powerpc/64s: Wire up cpu_show_meltdown() powerpc: Don't preempt_disable() in show_cpuinfo() powerpc/xmon: Don't print hashed pointers in paca dump powerpc/xmon: Add RFI flush related fields to paca dump	2018-01-19 11:19:11 -08:00
Anju T Sudhakar	684d984038	powerpc/powernv: Add debugfs interface for imc-mode and imc-command In memory Collection (IMC) counter pmu driver controls the ucode's execution state. At the system boot, IMC perf driver pause the ucode. Ucode state is changed to "running" only when any of the nest units are monitored or profiled using perf tool. Nest units support only limited set of hardware counters and ucode is always programmed in the "production mode" ("accumulation") mode. This mode is configured to provide key performance metric data for most of the nest units. But ucode also supports other modes which would be used for "debug" to drill down specific nest units. That is, ucode when switched to "powerbus" debug mode (for example), will dynamically reconfigure the nest counters to target only "powerbus" related events in the hardware counters. This allows the IMC nest unit to focus on powerbus related transactions in the system in more detail. At this point, production mode events may or may not be counted. IMC nest counters has both in-band (ucode access) and out of band access to it. Since not all nest counter configurations are supported by ucode, out of band tools are used to characterize other nest counter configurations. Patch provides an interface via "debugfs" to enable the switching of ucode modes in the system. To switch ucode mode, one has to first pause the microcode (imc_cmd), and then write the target mode value to the "imc_mode" file. Proposed Approach: In the proposed approach, the function (export_imc_mode_and_cmd) which creates the debugfs interface for imc mode and command is implemented in opal-imc.c. Thus we can use imc_get_mem_addr() to get the homer base address for each chip. The interface to expose imc mode and command is required only if we have nest pmu units registered. Employing the existing data structures to track whether we have any nest units registered will require to extend data from perf side to opal-imc.c. Instead an integer is introduced to hold that information by counting successful nest unit registration. Debugfs interface is removed based on the integer count. Example for the interface: $ ls /sys/kernel/debug/imc imc_cmd_0 imc_cmd_8 imc_mode_0 imc_mode_8 Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 23:05:00 +11:00
Anju T Sudhakar	8b4e6deaff	powerpc/perf: Pass struct imc_events as a parameter to imc_parse_event() Remove the allocation of struct imc_events from imc_parse_event(). Instead pass imc_events as a parameter to imc_parse_event(), which is a pointer to a slot in the array allocated in update_events_in_group(). Reported-by: Dan Carpenter ("powerpc/perf: Fix a sizeof() typo so we allocate less memory") Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 23:04:44 +11:00
Madhavan Srinivasan	6cd74d2b61	powerpc/64s: Implement local_t using irq soft masking local_t is used for atomic modifications for per-CPU data, versus re-entrant modifications via interrupts. local_t read-modify-write atomic operations are currently implemented with hardware atomics (larx/stcx), which are quite slow. This patch implements them by masking all types of interrupts that may do local_t operations ("standard" and perf interrupts). Rusty's benchmark (https://lkml.org/lkml/2008/12/16/450) gives the following timings for the local_t test, in nanoseconds per iteration: larx/stcx irq+pmu disable _inc 38 10 _add 38 10 _read 4 4 _add_return 38 10 There are still some interrupt types (system reset, machine check, and watchdog), which can not safely use local_t operations, because they are not masked. An alternative approach was proposed, using a CR bit to mark a critical section, which is tested in the interrupt return path, and would then branch to a fixup handler (similar to exception fixups), which re-starts the operation. The problem with this was the complexity of the fixup handler and the latency of the slow path. https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-November/123024.html Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:37:04 +11:00
Madhavan Srinivasan	3b7e302086	powerpc: use generic atomic implementation for local_t powerpc implements local_t with atomic operations. There is already an asm-generic implementation which does this using atomic_t. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:37:04 +11:00
Madhavan Srinivasan	c6424382de	powerpc/64s: Add new set of irq_soft_mask_ functions for PMI masking To support soft-masking of the performance monitor interrupt, a set of new powerpc_local_irq_pmu_save() and powerpc_local_irq_restore() functions are added. And powerpc_local_irq_save() implemented, by adding a new irq_soft_mask manipulation function irq_soft_mask_or_return(). Local_irq_pmu_* macros are provided to access these powerpc_local_irq_pmu* functions which includes trace_hardirqs_on\|off() to match what we have in include/linux/irqflags.h. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:37:03 +11:00
Madhavan Srinivasan	9aa88188ee	powerpc: Add new kconfig CONFIG_PPC_IRQ_SOFT_MASK_DEBUG New Kconfig is added "CONFIG_PPC_IRQ_SOFT_MASK_DEBUG" to add WARN_ON to alert the invalid transitions. Also moved the code under the CONFIG_TRACE_IRQFLAGS in arch_local_irq_restore() to new Kconfig. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> [mpe: Fix name of CONFIG option in change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:37:03 +11:00
Madhavan Srinivasan	f442d00480	powerpc/64s: Add support to mask perf interrupts and replay them Two new bit mask field "IRQ_DISABLE_MASK_PMU" is introduced to support the masking of PMI and "IRQ_DISABLE_MASK_ALL" to aid interrupt masking checking. Couple of new irq #defs "PACA_IRQ_PMI" and "SOFTEN_VALUE_0xf0*" added to use in the exception code to check for PMI interrupts. In the masked_interrupt handler, for PMIs we reset the MSR[EE] and return. In the __check_irq_replay(), replay the PMI interrupt by calling performance_monitor_common handler. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:37:02 +11:00
Madhavan Srinivasan	f14e953b19	powerpc/64s: Add support to take additional parameter in MASKABLE_* macro To support addition of "bitmask" to MASKABLE_* macros, factor out the EXCPETION_PROLOG_1 macro. Make it explicit the interrupt masking supported by a gievn interrupt handler. Patch correspondingly extends the MASKABLE_* macros with an addition's parameter. "bitmask" parameter is passed to SOFTEN_TEST macro to decide on masking the interrupt. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:37:02 +11:00
Madhavan Srinivasan	d32eb1b550	powerpc/64s: Avoid using EXCEPTION_PROLOG_1 macro in MASKABLE_* Currently we use both EXCEPTION_PROLOG_1 and __EXCEPTION_PROLOG_1 in the MASKABLE_* macros. As a cleanup, this patch makes MASKABLE_* to use only __EXCEPTION_PROLOG_1. There is not logic change. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:37:01 +11:00
Madhavan Srinivasan	4e26bc4a4e	powerpc/64: Rename soft_enabled to irq_soft_mask Rename the paca->soft_enabled to paca->irq_soft_mask as it is no longer used as a flag for interrupt state, but a mask. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:37:01 +11:00
Madhavan Srinivasan	01417c6cc7	powerpc/64: Change soft_enabled from flag to bitmask "paca->soft_enabled" is used as a flag to mask some of interrupts. Currently supported flags values and their details: soft_enabled MSR[EE] 0 0 Disabled (PMI and HMI not masked) 1 1 Enabled "paca->soft_enabled" is initialized to 1 to make the interripts as enabled. arch_local_irq_disable() will toggle the value when interrupts needs to disbled. At this point, the interrupts are not actually disabled, instead, interrupt vector has code to check for the flag and mask it when it occurs. By "mask it", it update interrupt paca->irq_happened and return. arch_local_irq_restore() is called to re-enable interrupts, which checks and replays interrupts if any occured. Now, as mentioned, current logic doesnot mask "performance monitoring interrupts" and PMIs are implemented as NMI. But this patchset depends on local_irq_* for a successful local_* update. Meaning, mask all possible interrupts during local_* update and replay them after the update. So the idea here is to reserve the "paca->soft_enabled" logic. New values and details: soft_enabled MSR[EE] 1 0 Disabled (PMI and HMI not masked) 0 1 Enabled Reason for the this change is to create foundation for a third mask value "0x2" for "soft_enabled" to add support to mask PMIs. When ->soft_enabled is set to a value "3", PMI interrupts are mask and when set to a value of "1", PMI are not mask. With this patch also extends soft_enabled as interrupt disable mask. Current flags are renamed from IRQ_[EN?DIS}ABLED to IRQS_ENABLED and IRQS_DISABLED. Patch also fixes the ptrace call to force the user to see the softe value to be alway 1. Reason being, even though userspace has no business knowing about softe, it is part of pt_regs. Like-wise in signal context. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:37:00 +11:00
Madhavan Srinivasan	acb396d7c2	powerpc/64: Cleanup hard_irq_disable() macro Minor cleanup to use helper function for manipulating paca->soft_enabled variable. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:37:00 +11:00
Madhavan Srinivasan	a67c543aac	powerpc/64: Implement and use soft_enabled_set_return API Add a new wrapper function, soft_enabled_set_return(), added to do the paca->soft_enabled updates requiring a set-return. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:36:59 +11:00
Madhavan Srinivasan	e0b5687bed	powerpc/64: Implement and use soft_enabled_return API Add a new wrapper function, soft_enabled_return(), added to return paca->soft_enabled value. Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:36:59 +11:00
Madhavan Srinivasan	0b63acf4a0	powerpc/64: Move set_soft_enabled() and rename Move set_soft_enabled() from powerpc/kernel/irq.c to asm/hw_irq.c, to encourage updates to paca->soft_enabled done via these access function. Add "memory" clobber to hint compiler since paca->soft_enabled memory is the target here. Renaming it as soft_enabled_set() will make namespaces works better as prefix than a postfix when new soft_enabled manipulation functions are introduced. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:36:58 +11:00
Madhavan Srinivasan	b5c1bd62c0	powerpc/64: Fix arch_local_irq_disable() prototype In powerpc/64, the arch_local_irq_disable() function returns unsigned long, which is not consistent with other architectures. Move that set-return asm implementation into arch_local_irq_save(), and make arch_local_irq_disable() return void, simplifying the assembly. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:36:58 +11:00
Nicholas Piggin	9f83e00f4c	powerpc/64: Improve inline asm in arch_local_irq_disable arch_local_irq_disable is implemented strangely, with a temporary output register being set to the desired soft_enabled value via an immediate input, which is then used to store to memory. This is not required, the immediate can be specified directly as a register input. For simple cases at least, assembly is unchanged except register mapping. Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:36:57 +11:00
Madhavan Srinivasan	c2e480ba82	powerpc/64: Add #defines for paca->soft_enabled flags Two #defines IRQS_ENABLED and IRQS_DISABLED are added to be used when updating paca->soft_enabled. Replace the hardcoded values used when updating paca->soft_enabled with IRQ_(EN\|DIS)ABLED #define. No logic change. Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:36:56 +11:00
Michael Ellerman	7a074fc083	powerpc/64s: Fix ps3 build error due to tlbiel_all() The recent changes to TLB handling broke the PS3 build: arch/powerpc/include/asm/book3s/64/tlbflush.h:30: undefined reference to `.hash__tlbiel_all' Fix it by adding an fallback version of tlbiel_all() for non-native builds. It should never be called, due to checks in callers so it calls BUG(). We should probably clean it up further but this will suffice for now. Fixes: `d4748276ae` ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-19 22:36:45 +11:00
Paul Mackerras	3214d01f13	KVM: PPC: Book3S: Provide information about hardware/firmware CVE workarounds This adds a new ioctl, KVM_PPC_GET_CPU_CHAR, that gives userspace information about the underlying machine's level of vulnerability to the recently announced vulnerabilities CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754, and whether the machine provides instructions to assist software to work around the vulnerabilities. The ioctl returns two u64 words describing characteristics of the CPU and required software behaviour respectively, plus two mask words which indicate which bits have been filled in by the kernel, for extensibility. The bit definitions are the same as for the new H_GET_CPU_CHARACTERISTICS hypercall. There is also a new capability, KVM_CAP_PPC_GET_CPU_CHAR, which indicates whether the new ioctl is available. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-01-19 15:17:01 +11:00
Benjamin Herrenschmidt	9b9b13a6d1	KVM: PPC: Book3S HV: Keep XIVE escalation interrupt masked unless ceded This works on top of the single escalation support. When in single escalation, with this change, we will keep the escalation interrupt disabled unless the VCPU is in H_CEDE (idle). In any other case, we know the VCPU will be rescheduled and thus there is no need to take escalation interrupts in the host whenever a guest interrupt fires. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-01-19 12:10:21 +11:00
Benjamin Herrenschmidt	35c2405efc	KVM: PPC: Book3S HV: Make xive_pushed a byte, not a word Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-01-19 12:10:21 +11:00
Benjamin Herrenschmidt	2267ea7661	KVM: PPC: Book3S HV: Don't use existing "prodded" flag for XIVE escalations The prodded flag is only cleared at the beginning of H_CEDE, so every time we have an escalation, we will cause the next H_CEDE to return immediately. Instead use a dedicated "irq_pending" flag to indicate that a guest interrupt is pending for the VCPU. We don't reuse the existing exception bitmap so as to avoid expensive atomic ops. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-01-19 12:10:21 +11:00
Benjamin Herrenschmidt	bf4159da47	KVM: PPC: Book3S HV: Enable use of the new XIVE "single escalation" feature That feature, provided by Power9 DD2.0 and later, when supported by newer OPAL versions, allows us to sacrifice a queue (priority 7) in favor of merging all the escalation interrupts of the queues of a single VP into a single interrupt. This reduces the number of host interrupts used up by KVM guests especially when those guests use multiple priorities. It will also enable a future change to control the masking of the escalation interrupts more precisely to avoid spurious ones. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-01-19 12:10:21 +11:00
Paul Mackerras	d27998185d	Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next This merges in the ppc-kvm topic branch of the powerpc tree to get two patches which are prerequisites for the following patch series, plus another patch which touches both powerpc and KVM code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-01-19 12:09:57 +11:00
Nicholas Piggin	c16bee4bde	powerpc: define __ARCH_IRQ_EXIT_IRQS_DISABLED powerpc calls irq_exit() with local irqs disabled, therefore it can define __ARCH_IRQ_EXIT_IRQS_DISABLED. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-18 15:43:43 +11:00
Nicholas Piggin	47712a921b	powerpc/watchdog: remove arch_trigger_cpumask_backtrace The powerpc NMI IPIs may not be recoverable if they are taken in some sections of code, and also there have been and still are issues with taking NMIs (in KVM guest code, in firmware, etc) which makes them a bit dangerous to use. Generic code like softlockup detector and rcu stall detectors really hammer on trigger_*_backtrace, which has lead to further problems because we've implemented it with the NMI. So stop providing NMI backtraces for now. Importantly, the powerpc code uses NMI IPIs in crash/debug, and the SMP hardlockup watchdog. So if the softlockup and rcu hang detection traces are not being printed because the CPU is stuck with interrupts off, then the hard lockup watchdog should get it with the NMI IPI. Fixes: `2104180a53` ("powerpc/64s: implement arch-specific hardlockup watchdog") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-18 15:43:43 +11:00
Paul Mackerras	d075745d89	KVM: PPC: Book3S HV: Improve handling of debug-trigger HMIs on POWER9 Hypervisor maintenance interrupts (HMIs) are generated by various causes, signalled by bits in the hypervisor maintenance exception register (HMER). In most cases calling OPAL to handle the interrupt is the correct thing to do, but the "debug trigger" HMIs signalled by PPC bit 17 (bit 46) of HMER are used to invoke software workarounds for hardware bugs, and OPAL does not have any code to handle this cause. The debug trigger HMI is used in POWER9 DD2.0 and DD2.1 chips to work around a hardware bug in executing vector load instructions to cache inhibited memory. In POWER9 DD2.2 chips, it is generated when conditions are detected relating to threads being in TM (transactional memory) suspended mode when the core SMT configuration needs to be reconfigured. The kernel currently has code to detect the vector CI load condition, but only when the HMI occurs in the host, not when it occurs in a guest. If a HMI occurs in the guest, it is always passed to OPAL, and then we always re-sync the timebase, because the HMI cause might have been a timebase error, for which OPAL would re-sync the timebase, thus removing the timebase offset which KVM applied for the guest. Since we don't know what OPAL did, we don't know whether to subtract the timebase offset from the timebase, so instead we re-sync the timebase. This adds code to determine explicitly what the cause of a debug trigger HMI will be. This is based on a new device-tree property under the CPU nodes called ibm,hmi-special-triggers, if it is present, or otherwise based on the PVR (processor version register). The handling of debug trigger HMIs is pulled out into a separate function which can be called from the KVM guest exit code. If this function handles and clears the HMI, and no other HMI causes remain, then we skip calling OPAL and we proceed to subtract the guest timebase offset from the timebase. The overall handling for HMIs that occur in the host (i.e. not in a KVM guest) is largely unchanged, except that we now don't set the flag for the vector CI load workaround on DD2.2 processors. This also removes a BUG_ON in the KVM code. BUG_ON is generally not useful in KVM guest entry/exit code since it is difficult to handle the resulting trap gracefully. Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-18 15:31:25 +11:00
Nicholas Piggin	d4748276ae	powerpc/64s: Improve local TLB flush for boot and MCE on POWER9 There are several cases outside the normal address space management where a CPU's entire local TLB is to be flushed: 1. Booting the kernel, in case something has left stale entries in the TLB (e.g., kexec). 2. Machine check, to clean corrupted TLB entries. One other place where the TLB is flushed, is waking from deep idle states. The flush is a side-effect of calling ->cpu_restore with the intention of re-setting various SPRs. The flush itself is unnecessary because in the first case, the TLB should not acquire new corrupted TLB entries as part of sleep/wake (though they may be lost). This type of TLB flush is coded inflexibly, several times for each CPU type, and they have a number of problems with ISA v3.0B: - The current radix mode of the MMU is not taken into account, it is always done as a hash flushn For IS=2 (LPID-matching flush from host) and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if the R field does not match the current radix mode. - ISA v3.0B hash must flush the partition and process table caches as well. - ISA v3.0B radix must flush partition and process scoped translations, partition and process table caches, and also the page walk cache. So consolidate the flushing code and implement it in C and inline asm under the mm/ directory with the rest of the flush code. Add ISA v3.0B cases for radix and hash, and use the radix flush in radix environment. Provide a way for IS=2 (LPID flush) to specify the radix mode of the partition. Have KVM pass in the radix mode of the guest. Take out the flushes from early cputable/dt_cpu_ftrs detection hooks, and move it later in the boot process after, the MMU registers are set up and before relocation is first turned on. The TLB flush is no longer called when restoring from deep idle states. This was not be done as a separate step because booting secondaries uses the same cpu_restore as idle restore, which needs the TLB flush. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-18 00:40:31 +11:00
Michal Suchanek	1b689a95ce	powerpc/pseries: include linux/types.h in asm/hvcall.h Commit `6e032b350c` ("powerpc/powernv: Check device-tree for RFI flush settings") uses u64 in asm/hvcall.h without including linux/types.h This breaks hvcall.h users that do not include the header themselves. Fixes: `6e032b350c` ("powerpc/powernv: Check device-tree for RFI flush settings") Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-17 23:30:46 +11:00
Benjamin Herrenschmidt	872e2ae4bd	powerpc: Remove useless EXC_COMMON_HV The only difference between EXC_COMMON_HV and EXC_COMMON is that the former adds "2" to the trap number which is supposed to represent the fact that this is an "HV" interrupt which uses HSRR0/1. However KVM is the only one who cares and it has its own separate macros. In fact, we only have one user of EXC_COMMON_HV and it's for an unknown interrupt case. All the other ones already using EXC_COMMON. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:51:42 +11:00
Christophe Leroy	4f94b2c746	powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP When CONFIG_SWAP is set, the TLB miss handlers have to also take into account _PAGE_ACCESSED flag. At the moment it is done by anding _PAGE_ACCESSED into _PAGE_PRESENT using 3 instructions. This patch uses APG for handling _PAGE_ACCESSED, allowing to just copy _PAGE_ACCESSED bit into APG field, hence reducing the action to a single instruction. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:47:15 +11:00
Christophe Leroy	de0f938739	powerpc/8xx: Remove _PAGE_USER and handle user access at PMD level As Linux kernel separates KERNEL and USER address spaces, there is therefore no need to flag USER access at page level. Today, the 8xx TLB handlers already handle user access in the L1 entry through Access Protection Groups, it is then natural to move the user access handling at PMD level once _PAGE_NA allows to handle PAGE_NONE protection without _PAGE_USER In the mean time, as we free up one bit in the PTE, we can use it to include SPS (page size flag) in the PTE and avoid handling it at every TLB miss hence removing special handling based on compiled page size. For _PAGE_EXEC, we rework it to use PP PTE bits, avoiding the copy of _PAGE_EXEC bit into the L1 entry. Unfortunatly we are not able to put it at the correct location as it conflicts with NA/RO/RW bits for data entries. Upper bits of APG in L1 entry overlap with PMD base address. In order to avoid having to filter that out, we set up all groups so that upper bits can have any value. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:47:14 +11:00
Christophe Leroy	351750331f	powerpc/mm: Introduce _PAGE_NA Today, PAGE_NONE is defined as a page not having _PAGE_USER. In some circunstances, when the CPU supports it, it might be better to be able to flag a page with NO ACCESS. In a following patch, the 8xx will switch user access being flagged in the PMD, therefore it will not be possible anymore to use _PAGE_USER as a way to flag a page with no access. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:47:14 +11:00
Christophe Leroy	812fadcb94	powerpc/mm: extend _PAGE_PRIVILEGED to all CPUs commit `ac29c64089` ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED") introduced _PAGE_PRIVILEGED for BOOK3S/64 This patch generalises _PAGE_PRIVILEGED for all CPUs, allowing to have either _PAGE_PRIVILEGED or _PAGE_USER or both. PPC_8xx has a _PAGE_SHARED flag which is set for and only for all non user pages. Lets rename it _PAGE_PRIVILEGED to remove confusion as it has nothing to do with Linux shared pages. On BookE, there's a _PAGE_BAP_SR which has to be set for kernel pages: defining _PAGE_PRIVILEGED as _PAGE_BAP_SR will make this generic Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:47:13 +11:00
Christophe Leroy	5f356497c3	powerpc/8xx: remove unused _PAGE_WRITETHRU _PAGE_WRITETHRU is only used in: * AMIGA_Z2RAM block driver which is never activated on powerPC * Video/FB driver which is for PPC_PMAC Therefore, no need to spend time in 8xx TLB miss handlers for handling it. And by removing it, we free up bit 20 which then avoids having to clear it on each TLB miss. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:47:13 +11:00
Christophe Leroy	cd99ddbea2	powerpc/8xx: Only perform perf counting when perf is in use. In TLB miss handlers, updating the perf counter is only useful when performing a perf analysis. As it has a noticeable overhead, let's only do it when needed. In order to do so, the exit of the miss handlers will be patched when starting/stopping 'perf': the first register restore instruction of each exit point will be replaced by a jump to the counting code. Once this is done, CONFIG_PPC_8xx_PERF_EVENT becomes useless as this feature doesn't add any overhead. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:47:12 +11:00
Christophe Leroy	2a45addd21	powerpc/8xx: Remove CPU6 ERRATA Workaround CPU6 ERRATA affects only MPC860 revisions prior to C.0. Manufacturing of those revisiosn was stopped in 1999-2000. Therefore, it has been almost 20 years since this ERRATA has been fixed in the silicon. This patch removes the workaround for that ERRATA. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:47:12 +11:00
Balbir Singh	4145f35864	powernv/kdump: Fix cases where the kdump kernel can get HMI's Certain HMI's such as malfunction error propagate through all threads/core on the system. If a thread was offline prior to us crashing the system and jumping to the kdump kernel, bad things happen when it wakes up due to an HMI in the kdump kernel. There are several possible ways to solve this problem 1. Put the offline cores in a state such that they are not woken up for machine check and HMI errors. This does not work, since we might need to wake up offline threads to handle TB errors 2. Ignore HMI errors, setup HMEER to mask HMI errors, but this still leads the window open for any MCEs and masking them for the duration of the dump might be a concern 3. Wake up offline CPUs, as in send them to crash_ipi_callback (not wake them up as in mark them online as seen by the hotplug). kexec does a wake_online_cpus() call, this patch does something similar, but instead sends an IPI and forces them to crash_ipi_callback() This patch takes approach #3. Care is taken to enable this only for powenv platforms via crash_wake_offline (a global value set at setup time). The crash code sends out IPI's to all CPU's which then move to crash_ipi_callback and kexec_smp_wait(). Signed-off-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:47:11 +11:00
Nathan Fontenot	0c38ed6f6f	powerpc/pseries: Enable support of ibm,dynamic-memory-v2 Add required bits to the architecture vector to enable support of the ibm,dynamic-memory-v2 device tree property. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:26:30 +11:00
Nathan Fontenot	2b31e3aec1	powerpc/drmem: Add support for ibm, dynamic-memory-v2 property The Power Hypervisor has introduced a new device tree format for the property describing the dynamic reconfiguration LMBs for a system, ibm,dynamic-memory-v2. This new format condenses the size of the property, especially on large memory systems, by reporting sets of LMBs that have the same properties (flags and associativity array index). This patch updates the powerpc/mm/drmem.c code to provide routines that can parse the new device tree format during the walk_drmem_lmb* routines used during boot, the creation of the LMB array, and updating the device tree to create a new property in the proper format for ibm,dynamic-memory-v2. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:26:29 +11:00
Nathan Fontenot	2c77721552	powerpc: Move of_drconf_cell struct to asm/drmem.h Now that the powerpc code parses dynamic reconfiguration memory LMB information from the LMB array and not the device tree directly we can move the of_drconf_cell struct to drmem.h where it fits better. In addition, the struct is renamed to of_drconf_cell_v1 in anticipation of upcoming support for version 2 of the dynamic reconfiguration property and the members are typed as __be* values to reflect how they exist in the device tree. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:26:29 +11:00
Nathan Fontenot	6195a5001f	powerpc/pseries: Update memory hotplug code to use drmem LMB array Update the pseries memory hotplug code to use the newly added dynamic reconfiguration LMB array. Doing this is required for the upcoming support of version 2 of the dynamic reconfiguration device tree property. In addition, making this change cleans up the code that parses the LMB information as we no longer need to worry about device tree format. This allows us to discard one of the first steps on memory hotplug where we make a working copy of the device tree property and convert the entire property to cpu format. Instead we just use the LMB array directly while holding the memory hotplug lock. This patch also moves the updating of the device tree property to powerpc/mm/drmem.c. This allows to the hotplug code to work without needing to know the device tree format and provides a single routine for updating the device tree property. This new routine will handle determination of the proper device tree format and generate a properly formatted device tree property. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:26:28 +11:00
Nathan Fontenot	514a9cb331	powerpc/numa: Update numa code use walk_drmem_lmbs Update code in powerpc/numa.c to use the walk_drmem_lmbs() routine instead of parsing the device tree directly. This is in anticipation of introducing a new ibm,dynamic-memory-v2 property with a different format. This will allow the numa code to use a single initialization routine per-LMB irregardless of the device tree format. Additionally, to support additional routines in numa.c that need to look up LMB information, an late_init routine is added to drmem.c to allocate the array of LMB information. This LMB array will provide per-LMB information to separate the LMB data from the device tree format. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:26:28 +11:00
Nathan Fontenot	6c6ea53725	powerpc/mm: Separate ibm, dynamic-memory data from DT format We currently have code to parse the dynamic reconfiguration LMB information from the ibm,dynamic-meory device tree property in multiple locations; numa.c, prom.c, and pseries/hotplug-memory.c. In anticipation of adding support for a version 2 of the ibm,dynamic-memory property this patch aims to separate the device tree information from the device tree format. Doing this requires a two step process to avoid a possibly very large bootmem allocation early in boot. During initial boot, new routines are provided to walk the device tree property and make a call-back for each LMB. The second step (introduced in later patches) will allocate an array of LMB information that can be used directly without needing to know the DT format. This approach provides the benefit of consolidating the device tree property parsing to a single location and (eventually) providing a common data structure for retrieving LMB information. This patch introduces a routine to walk the ibm,dynamic-memory property in the flattened device tree and updates the prom.c code to use this to initialize memory. Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-16 23:26:27 +11:00
Eric W. Biederman	ea64d5acc8	signal: Unify and correct copy_siginfo_to_user32 Among the existing architecture specific versions of copy_siginfo_to_user32 there are several different implementation problems. Some architectures fail to handle all of the cases in in the siginfo union. Some architectures perform a blind copy of the siginfo union when the si_code is negative. A blind copy suggests the data is expected to be in 32bit siginfo format, which means that receiving such a signal via signalfd won't work, or that the data is in 64bit siginfo and the code is copying nonsense to userspace. Create a single instance of copy_siginfo_to_user32 that all of the architectures can share, and teach it to handle all of the cases in the siginfo union correctly, with the assumption that siginfo is stored internally to the kernel is 64bit siginfo format. A special case is made for x86 x32 format. This is needed as presence of both x32 and ia32 on x86_64 results in two different 32bit signal formats. By allowing this small special case there winds up being exactly one code base that needs to be maintained between all of the architectures. Vastly increasing the testing base and the chances of finding bugs. As the x86 copy of copy_siginfo_to_user32 the call of the x86 signal_compat_build_tests were moved into sigaction_compat_abi, so that they will keep running. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-01-15 19:56:20 -06:00
Paul Mackerras	5855564c8a	KVM: PPC: Book3S HV: Enable migration of decrementer register This adds a register identifier for use with the one_reg interface to allow the decrementer expiry time to be read and written by userspace. The decrementer expiry time is in guest timebase units and is equal to the sum of the decrementer and the guest timebase. (The expiry time is used rather than the decrementer value itself because the expiry time is not constantly changing, though the decrementer value is, while the guest vcpu is not running.) Without this, a guest vcpu migrated to a new host will see its decrementer set to some random value. On POWER8 and earlier, the decrementer is 32 bits wide and counts down at 512MHz, so the guest vcpu will potentially see no decrementer interrupts for up to about 4 seconds, which will lead to a stall. With POWER9, the decrementer is now 56 bits side, so the stall can be much longer (up to 2.23 years) and more noticeable. To help work around the problem in cases where userspace has not been updated to migrate the decrementer expiry time, we now set the default decrementer expiry at vcpu creation time to the current time rather than the maximum possible value. This should mean an immediate decrementer interrupt when a migrated vcpu starts running. In cases where the decrementer is 32 bits wide and more than 4 seconds elapse between the creation of the vcpu and when it first runs, the decrementer would have wrapped around to positive values and there may still be a stall - but this is no worse than the current situation. In the large-decrementer case, we are sure to get an immediate decrementer interrupt (assuming the time from vcpu creation to first run is less than 2.23 years) and we thus avoid a very long stall. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2018-01-16 11:54:45 +11:00
Eric W. Biederman	ad2b1ab57d	signal/powerpc: Remove redefinition of NSIGTRAP on powerpc NSIGTRAP is 4 in the generic siginfo and powerpc just undefines NSGTRAP and redefine it as 4. That accomplishes nothing so remove the duplication. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-01-15 17:42:33 -06:00
Al Viro	b713da69e4	signal: unify compat_siginfo_t --EWB Added #ifdef CONFIG_X86_X32_ABI to arch/x86/kernel/signal_compat.c Changed #ifdef CONFIG_X86_X32 to #ifdef CONFIG_X86_X32_ABI in linux/compat.h CONFIG_X86_X32 is set when the user requests X32 support. CONFIG_X86_X32_ABI is set when the user requests X32 support and the tool-chain has X32 allowing X32 support to be built. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2018-01-15 17:40:31 -06:00
Christoph Hellwig	a37a3710f3	powerpc: rename swiotlb_dma_ops We'll need that name for a generic implementation soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2018-01-15 09:35:26 +01:00
Linus Torvalds	6bb821193b	powerpc fixes for 4.15 #7 One fix for an oops at boot if we take a hotplug interrupt before we are ready to handle it. The bulk is patches to implement mitigation for Meltdown, see the change logs for more details. Thanks to: Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon Masters, Jose Ricardo Ziviani, David Gibson. -----BEGIN PGP SIGNATURE----- iQIwBAABCAAaBQJaWXZ+ExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYBe ew/+KMp80VAIGSyFeYHLg8oClnQjKcEDMzlNoHo8WU6NwwNzk2XXBXNaGT7Osg7B Ny69R3/VRrGmi/2uule9OzF2eZt8BUmRn3cHDcUB5OwFjMysv4M2jc+OK0STQxLU F769rX+ZlkyAvQn4UxtNCUob+JDfVbE5Lurrx475ZGL1YHk6QqBGhlN/niXKtkBz upeBEpu4JgwsLLD9H7c4QQha9PkhxY51jtvxL24cgwDuKJKhTdxLahPlA3QmLafu FJxR9R0YzTj9Ivuae5yM7xRKr5wROTPtTRbG9sN7XOgFNrQ+LSgemXGrlV/fzdhV 6iPn2hHs4bu45SOIM+Hhf7OwHTtnMetXmKo6NldlKHVCw1HwBLigRohnOTPjiHfZ P6brR4T/cglakBiE29+8T0UFqOPizEJiQRHnsoWzwIa/aTqCGEE3m3rgTxU6ovt5 3JggC51ZRDFsjGvj+JFnxGxCjYC4tDbfBI28M+GuWZZPBKLkFnEk8PlSzYyHPW2p 5uVA/UN4KHVGYUfJVUC+LMP+ZIOSEyEISKk7RuK0C0mwHSXcvxbeEefXbc18Xjai J/pl/c9/4CRnEbdPm305xBxKKmU6gWxWRK3KbRFsoTI/wp+Ryx5TjPpgAa5GnaLv nrtrfKs8skWfGMoDWH7+KILt0fgRR2x+91GTSxc5aaHtYBI= =UZHv -----END PGP SIGNATURE----- Merge tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One fix for an oops at boot if we take a hotplug interrupt before we are ready to handle it. The bulk is patches to implement mitigation for Meltdown, see the change logs for more details. Thanks to: Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon Masters, Jose Ricardo Ziviani, David Gibson" * tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv: Check device-tree for RFI flush settings powerpc/pseries: Query hypervisor for RFI flush settings powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti powerpc/64s: Add support for RFI flush of L1-D cache powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL powerpc/64s: Simple RFI macro conversions powerpc/64: Add macros for annotating the destination of rfid/hrfid powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper powerpc/pseries: Make RAS IRQ explicitly dependent on DLPAR WQ	2018-01-14 15:03:17 -08:00
Eric W. Biederman	2f82a46f66	signal: Remove _sys_private and _overrun_incr from struct compat_siginfo We have never passed either field to or from userspace so just remove them. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-01-12 14:34:46 -06:00
Eric W. Biederman	cf4674c46c	signal/powerpc: Document conflicts with SI_USER and SIGFPE and SIGTRAP Setting si_code to 0 results in a userspace seeing an si_code of 0. This is the same si_code as SI_USER. Posix and common sense requires that SI_USER not be a signal specific si_code. As such this use of 0 for the si_code is a pretty horribly broken ABI. Further use of si_code == 0 guaranteed that copy_siginfo_to_user saw a value of __SI_KILL and now sees a value of SIL_KILL with the result that uid and pid fields are copied and which might copying the si_addr field by accident but certainly not by design. Making this a very flakey implementation. Utilizing FPE_FIXME and TRAP_FIXME, siginfo_layout() will now return SIL_FAULT and the appropriate fields will be reliably copied. Possible ABI fixes includee: - Send the signal without siginfo - Don't generate a signal - Possibly assign and use an appropriate si_code - Don't handle cases which can't happen Cc: Paul Mackerras <paulus@samba.org> Cc: Kumar Gala <kumar.gala@freescale.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: linuxppc-dev@lists.ozlabs.org Ref: 9bad068c24d7 ("[PATCH] ppc32: support for e500 and 85xx") Ref: 0ed70f6105ef ("PPC32: Provide proper siginfo information on various exceptions.") History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2018-01-12 14:21:04 -06:00
Benjamin Herrenschmidt	7f1c410da5	powerpc/xive: Add interrupt flag to disable automatic EOI This will be used by KVM in order to keep escalation interrupts in the non-EOI (masked) state after they fire. They will be re-enabled directly in HW by KVM when needed. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-12 15:24:44 +11:00
Benjamin Herrenschmidt	12c1f339cd	powerpc/xive: Move definition of ESB bits From xive.h to xive-regs.h since it's a HW register definition and it can be used from assembly Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-12 15:24:41 +11:00
Christoph Hellwig	2d9d6f6c9e	powerpc: rename dma_direct_ to dma_nommu_ We want to use the dma_direct_ namespace for a generic implementation, so rename powerpc to the second best choice: dma_nommu_. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-01-10 16:41:14 +01:00
Christoph Hellwig	b49efd7624	dma-mapping: move dma_mark_clean to dma-direct.h And unlike the other helpers we don't require a <asm/dma-direct.h> as this helper is a special case for ia64 only, and this keeps it as simple as possible. Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-01-10 16:41:12 +01:00
Christoph Hellwig	ea8c64ace8	dma-mapping: move swiotlb arch helpers to a new header phys_to_dma, dma_to_phys and dma_capable are helpers published by architecture code for use of swiotlb and xen-swiotlb only. Drivers are not supposed to use these directly, but use the DMA API instead. Move these to a new asm/dma-direct.h helper, included by a linux/dma-direct.h wrapper that provides the default linear mapping unless the architecture wants to override it. In the MIPS case the existing dma-coherent.h is reused for now as untangling it will take a bit of work. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Robin Murphy <robin.murphy@arm.com>	2018-01-10 16:40:54 +01:00
Michael Ellerman	aa8a5e0062	powerpc/64s: Add support for RFI flush of L1-D cache On some CPUs we can prevent the Meltdown vulnerability by flushing the L1-D cache on exit from kernel to user mode, and from hypervisor to guest. This is known to be the case on at least Power7, Power8 and Power9. At this time we do not know the status of the vulnerability on other CPUs such as the 970 (Apple G5), pasemi CPUs (AmigaOne X1000) or Freescale CPUs. As more information comes to light we can enable this, or other mechanisms on those CPUs. The vulnerability occurs when the load of an architecturally inaccessible memory region (eg. userspace load of kernel memory) is speculatively executed to the point where its result can influence the address of a subsequent speculatively executed load. In order for that to happen, the first load must hit in the L1, because before the load is sent to the L2 the permission check is performed. Therefore if no kernel addresses hit in the L1 the vulnerability can not occur. We can ensure that is the case by flushing the L1 whenever we return to userspace. Similarly for hypervisor vs guest. In order to flush the L1-D cache on exit, we add a section of nops at each (h)rfi location that returns to a lower privileged context, and patch that with some sequence. Newer firmwares are able to advertise to us that there is a special nop instruction that flushes the L1-D. If we do not see that advertised, we fall back to doing a displacement flush in software. For guest kernels we support migration between some CPU versions, and different CPUs may use different flush instructions. So that we are prepared to migrate to a machine with a different flush instruction activated, we may have to patch more than one flush instruction at boot if the hypervisor tells us to. In the end this patch is mostly the work of Nicholas Piggin and Michael Ellerman. However a cast of thousands contributed to analysis of the issue, earlier versions of the patch, back ports testing etc. Many thanks to all of them. Tested-by: Jon Masters <jcm@redhat.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-10 21:27:06 +11:00
David Howells	0500871f21	Construct init thread stack in the linker script rather than by union Construct the init thread stack in the linker script rather than doing it by means of a union so that ia64's init_task.c can be got rid of. The following symbols are then made available from INIT_TASK_DATA() linker script macro: init_thread_union init_stack INIT_TASK_DATA() also expands the region to THREAD_SIZE to accommodate the size of the init stack. init_thread_union is given its own section so that it can be placed into the stack space in the right order. I'm assuming that the ia64 ordering is correct and that the task_struct is first and the thread_info second. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Will Deacon <will.deacon@arm.com> (arm64) Tested-by: Palmer Dabbelt <palmer@sifive.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>	2018-01-09 23:21:02 +00:00
Nicholas Piggin	222f20f140	powerpc/64s: Simple RFI macro conversions This commit does simple conversions of rfi/rfid to the new macros that include the expected destination context. By simple we mean cases where there is a single well known destination context, and it's simply a matter of substituting the instruction for the appropriate macro. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-10 03:07:30 +11:00
Nicholas Piggin	50e51c13b3	powerpc/64: Add macros for annotating the destination of rfid/hrfid The rfid/hrfid ((Hypervisor) Return From Interrupt) instruction is used for switching from the kernel to userspace, and from the hypervisor to the guest kernel. However it can and is also used for other transitions, eg. from real mode kernel code to virtual mode kernel code, and it's not always clear from the code what the destination context is. To make it clearer when reading the code, add macros which encode the expected destination context. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-10 03:07:30 +11:00
Christoph Hellwig	c91a7a405b	powerpc: remove unused flush_write_buffers definition Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-01-09 16:28:36 +01:00
Michael Ellerman	a6978f405d	Merge branch 'topic/ppc-kvm' into fixes Merge the topic branch with share with the kvm-ppc tree. In this case we need to share the definition of a new hypervisor call and associated flags.	2018-01-10 02:24:34 +11:00
Michael Neuling	191eccb158	powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper A new hypervisor call has been defined to communicate various characteristics of the CPU to guests. Add definitions for the hcall number, flags and a wrapper function. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-01-10 01:46:34 +11:00
Sergey Senozhatsky	5633e85b2c	powerpc64: Add .opd based function descriptor dereference We are moving towards separate kernel and module function descriptor dereference callbacks. This patch enables it for powerpc64. For pointers that belong to the kernel - Added __start_opd and __end_opd pointers, to track the kernel .opd section address range; - Added dereference_kernel_function_descriptor(). Now we will dereference only function pointers that are within [__start_opd, __end_opd); For pointers that belong to a module - Added dereference_module_function_descriptor() to handle module function descriptor dereference. Now we will dereference only pointers that are within [module->opd.start, module->opd.end). Link: http://lkml.kernel.org/r/20171109234830.5067-4-sergey.senozhatsky@gmail.com To: Tony Luck <tony.luck@intel.com> To: Fenghua Yu <fenghua.yu@intel.com> To: Helge Deller <deller@gmx.de> To: Benjamin Herrenschmidt <benh@kernel.crashing.org> To: Paul Mackerras <paulus@samba.org> To: Michael Ellerman <mpe@ellerman.id.au> To: James Bottomley <jejb@parisc-linux.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linux-ia64@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-kernel@vger.kernel.org Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Tested-by: Santosh Sivaraj <santosh@fossix.org> #powerpc Signed-off-by: Petr Mladek <pmladek@suse.com>	2018-01-09 10:45:37 +01:00
Stephen Boyd	e0af0c1610	arch: Remove clkdev.h asm-generic from Kbuild Now that every architecture is using the generic clkdev.h file and we no longer include asm/clkdev.h anywhere in the tree, we can remove it. Cc: Russell King <linux@armlinux.org.uk> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: <linux-arch@vger.kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>	2018-01-03 09:02:11 -08:00
Linus Torvalds	caf9a82657	Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 PTI preparatory patches from Thomas Gleixner: "Todays Advent calendar window contains twentyfour easy to digest patches. The original plan was to have twenty three matching the date, but a late fixup made that moot. - Move the cpu_entry_area mapping out of the fixmap into a separate address space. That's necessary because the fixmap becomes too big with NRCPUS=8192 and this caused already subtle and hard to diagnose failures. The top most patch is fresh from today and cures a brain slip of that tall grumpy german greybeard, who ignored the intricacies of 32bit wraparounds. - Limit the number of CPUs on 32bit to 64. That's insane big already, but at least it's small enough to prevent address space issues with the cpu_entry_area map, which have been observed and debugged with the fixmap code - A few TLB flush fixes in various places plus documentation which of the TLB functions should be used for what. - Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for more than sysenter now and keeping the name makes backtraces confusing. - Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(), which is only invoked on fork(). - Make vysycall more robust. - A few fixes and cleanups of the debug_pagetables code. Check PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the C89 initialization of the address hint array which already was out of sync with the index enums. - Move the ESPFIX init to a different place to prepare for PTI. - Several code moves with no functional change to make PTI integration simpler and header files less convoluted. - Documentation fixes and clarifications" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit init: Invoke init_espfix_bsp() from mm_init() x86/cpu_entry_area: Move it out of the fixmap x86/cpu_entry_area: Move it to a separate unit x86/mm: Create asm/invpcid.h x86/mm: Put MMU to hardware ASID translation in one place x86/mm: Remove hard-coded ASID limit checks x86/mm: Move the CR3 construction functions to tlbflush.h x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what x86/mm: Remove superfluous barriers x86/mm: Use __flush_tlb_one() for kernel memory x86/microcode: Dont abuse the TLB-flush interface x86/uv: Use the right TLB-flush API x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation x86/mm/64: Improve the memory map documentation x86/ldt: Prevent LDT inheritance on exec x86/ldt: Rework locking arch, mm: Allow arch_dup_mmap() to fail x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode ...	2017-12-23 11:53:04 -08:00
Thomas Gleixner	c10e83f598	arch, mm: Allow arch_dup_mmap() to fail In order to sanitize the LDT initialization on x86 arch_dup_mmap() must be allowed to fail. Fix up all instances. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: dan.j.williams@intel.com Cc: hughd@google.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-12-22 20:13:01 +01:00
Aneesh Kumar K.V	5769beaf18	powerpc/mm: Add proper pte access check helper for other platforms pte_access_premitted get called in get_user_pages_fast path. If we have marked the pte PROT_NONE, we should not allow a read access on the address. With the current implementation we are not checking the READ and only check for WRITE. This is needed on archs like ppc64 that implement PROT_NONE using _PAGE_USER access instead of _PAGE_PRESENT. Also add pte_user check just to make sure we are not accessing kernel mapping. Even though there is code duplication, keeping the low level pte accessors different for different platforms helps in code readability. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-22 22:28:40 +11:00
Aneesh Kumar K.V	f72a85e347	powerpc/mm/book3s/64: Add proper pte access check helper pte_access_premitted get called in get_user_pages_fast path. If we have marked the pte PROT_NONE, we should not allow a read access on the address. With the current implementation we are not checking the READ and only check for WRITE. This is needed on archs like ppc64 that implement PROT_NONE using RWX access instead of _PAGE_PRESENT. Also add pte_user check just to make sure we are not accessing kernel mapping. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-22 22:28:35 +11:00
Ram Pai	273b493689	powerpc: Swizzle around 4K PTE bits to free up bit 5 and bit 6 We need PTE bits 3 ,4, 5, 6 and 57 to support protection-keys, because these are the bits we want to consolidate on across all configuration to support protection keys. Bit 3,4,5 and 6 are currently used on 4K-pte kernels. But bit 9 and 10 are available. Hence we use the two available bits and free up bit 5 and 6. We will still not be able to free up bit 3 and 4. In the absence of any other free bits, we will have to stay satisfied with what we have :-(. This means we will not be able to support 32 protection keys, but only 8. The bit numbers are big-endian as defined in the ISA3.0 This patch does the following change to 4K PTE. H_PAGE_F_SECOND (S) which occupied bit 4 moves to bit 7. H_PAGE_F_GIX (G,I,X) which occupied bit 5, 6 and 7 also moves to bit 8,9, 10 respectively. H_PAGE_HASHPTE (H) which occupied bit 8 moves to bit 4. Before the patch, the 4k PTE format was as follows 0 1 2 3 4 5 6 7 8 9 10....................57.....63 : : : : : : : : : : : : : v v v v v v v v v v v v v ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-, \|x\|x\|x\|B\|S \|G \|I \|X \|H\| \| \|x\|x\|................\| \|x\|x\|x\| '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_' After the patch, the 4k PTE format is as follows 0 1 2 3 4 5 6 7 8 9 10....................57.....63 : : : : : : : : : : : : : v v v v v v v v v v v v v ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-, \|x\|x\|x\|B\|H \| \| \|S \|G\|I\|X\|x\|x\|................\| \|.\|.\|.\| '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_' The patch has no code changes; just swizzles around bits. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-20 18:57:24 +11:00
Ram Pai	7b84947cad	powerpc: shifted-by-one hidx value 0xf is considered invalid hidx value. It indicates absence of a backing HPTE. A PTE is initialized to 0xf either a) when it is new it is newly allocated to hold 4k-backing-HPTE or b) Any time it gets demoted to a 4k-backing-HPTE This patch shifts the representation by one-modulo-0xf; i.e hidx 0 is represented as 1, 1 as 2,... , and 0xf as 0. This convention lets us initialize the secondary-part of the PTE to all zeroes. PTEs are anyway zero'd when allocated. We do not have to zero them again; thus saving on the initialization. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-20 18:57:23 +11:00
Ram Pai	bf9a95f9a6	powerpc: Free up four 64K PTE bits in 64K backed HPTE pages Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6 in the 64K backed HPTE pages. This along with the earlier patch will entirely free up the four bits from 64K PTE. The bit numbers are big-endian as defined in the ISA3.0 This patch does the following change to 64K PTE backed by 64K HPTE. H_PAGE_F_SECOND (S) which occupied bit 4 moves to the second part of the pte to bit 60. H_PAGE_F_GIX (G,I,X) which occupied bit 5, 6 and 7 also moves to the second part of the pte to bit 61, 62, 63, 64 respectively since bit 7 is now freed up, we move H_PAGE_BUSY (B) from bit 9 to bit 7. The second part of the PTE will hold (H_PAGE_F_SECOND\|H_PAGE_F_GIX) at bit 60,61,62,63. NOTE: None of the bits in the secondary PTE were not used by 64k-HPTE backed PTE. Before the patch, the 64K HPTE backed 64k PTE format was as follows 0 1 2 3 4 5 6 7 8 9 10...........................63 : : : : : : : : : : : : v v v v v v v v v v v v ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-, \|x\|x\|x\| \|S \|G \|I \|X \|x\|B\| \|x\|x\|................\|x\|x\|x\|x\| <- primary pte '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_' \| \| \| \| \| \| \| \| \| \| \| \| \|..................\| \| \| \| \| <- secondary pte '_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_' After the patch, the 64k HPTE backed 64k PTE format is as follows 0 1 2 3 4 5 6 7 8 9 10...........................63 : : : : : : : : : : : : v v v v v v v v v v v v ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-, \|x\|x\|x\| \| \| \| \|B \|x\| \| \|x\|x\|................\|.\|.\|.\|.\| <- primary pte '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_' \| \| \| \| \| \| \| \| \| \| \| \| \|..................\|S\|G\|I\|X\| <- secondary pte '_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_' The above PTE changes is applicable to hugetlbpages aswell. The patch does the following code changes: a) moves the H_PAGE_F_SECOND and H_PAGE_F_GIX to 4k PTE header since it is no more needed b the 64k PTEs. b) abstracts out __real_pte() and __rpte_to_hidx() so the caller need not know the bit location of the slot. c) moves the slot bits to the secondary pte. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-20 18:57:22 +11:00
Ram Pai	9d2edb1848	powerpc: Free up four 64K PTE bits in 4K backed HPTE pages Rearrange 64K PTE bits to free up bits 3, 4, 5 and 6, in the 4K backed HPTE pages.These bits continue to be used for 64K backed HPTE pages in this patch, but will be freed up in the next patch. The bit numbers are big-endian as defined in the ISA3.0 The patch does the following change to the 4k HTPE backed 64K PTE's format. H_PAGE_BUSY moves from bit 3 to bit 9 (B bit in the figure below) V0 which occupied bit 4 is not used anymore. V1 which occupied bit 5 is not used anymore. V2 which occupied bit 6 is not used anymore. V3 which occupied bit 7 is not used anymore. Before the patch, the 4k backed 64k PTE format was as follows 0 1 2 3 4 5 6 7 8 9 10...........................63 : : : : : : : : : : : : v v v v v v v v v v v v ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-, \|x\|x\|x\|B\|V0\|V1\|V2\|V3\|x\| \| \|x\|x\|................\|x\|x\|x\|x\| <- primary pte '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_' \|S\|G\|I\|X\|S \|G \|I \|X \|S\|G\|I\|X\|..................\|S\|G\|I\|X\| <- secondary pte '_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_' After the patch, the 4k backed 64k PTE format is as follows 0 1 2 3 4 5 6 7 8 9 10...........................63 : : : : : : : : : : : : v v v v v v v v v v v v ,-,-,-,-,--,--,--,--,-,-,-,-,-,------------------,-,-,-, \|x\|x\|x\| \| \| \| \| \|x\|B\| \|x\|x\|................\|.\|.\|.\|.\| <- primary pte '_'_'_'_'__'__'__'__'_'_'_'_'_'________________'_'_'_'_' \|S\|G\|I\|X\|S \|G \|I \|X \|S\|G\|I\|X\|..................\|S\|G\|I\|X\| <- secondary pte '_'_'_'_'__'__'__'__'_'_'_'_'__________________'_'_'_'_' the four bits S,G,I,X (one quadruplet per 4k HPTE) that cache the hash-bucket slot value, is initialized to 1,1,1,1 indicating -- an invalid slot. If a HPTE gets cached in a 1111 slot(i.e 7th slot of secondary hash bucket), it is released immediately. In other words, even though 1111 is a valid slot value in the hash bucket, we consider it invalid and release the slot and the HPTE. This gives us the opportunity to determine the validity of S,G,I,X bits based on its contents and not on any of the bits V0,V1,V2 or V3 in the primary PTE When we release a HPTE cached in the 1111 slot we also release a legitimate slot in the primary hash bucket and unmap its corresponding HPTE. This is to ensure that we do get a HPTE cached in a slot of the primary hash bucket, the next time we retry. Though treating 1111 slot as invalid, reduces the number of available slots in the hash bucket and may have an effect on the performance, the probabilty of hitting a 1111 slot is extermely low. Compared to the current scheme, the above scheme reduces the number of false hash table updates significantly and has the added advantage of releasing four valuable PTE bits for other purpose. NOTE:even though bits 3, 4, 5, 6, 7 are not used when the 64K PTE is backed by 4k HPTE, they continue to be used if the PTE gets backed by 64k HPTE. The next patch will decouple that aswell, and truely release the bits. This idea was jointly developed by Paul Mackerras, Aneesh, Michael Ellermen and myself. 4K PTE format remains unchanged currently. The patch does the following code changes a) PTE flags are split between 64k and 4k header files. b) __hash_page_4K() is reimplemented to reflect the above logic. Acked-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-20 18:57:20 +11:00
Ram Pai	318995b4f5	powerpc: introduce pte_get_hash_gslot() helper Introduce pte_get_hash_gslot()() which returns the global slot number of the HPTE in the global hash table. This function will come in handy as we work towards re-arranging the PTE bits in the later patches. Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-20 18:57:19 +11:00
Ram Pai	59aa31fd6f	powerpc: introduce pte_set_hidx() helper Introduce pte_set_hidx().It sets the (H_PAGE_F_SECOND\|H_PAGE_F_GIX) bits at the appropriate location in the PTE of 4K PTE. For 64K PTE, it sets the bits in the second part of the PTE. Though the implementation for the former just needs the slot parameter, it does take some additional parameters to keep the prototype consistent. This function will be handy as we work towards re-arranging the bits in the subsequent patches. Acked-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-20 18:57:19 +11:00
Bryant G. Ly	988fc3ba56	powerpc/pci: Separate SR-IOV Calls SR-IOV can now be enabled for the powernv platform and pseries platform. Therefore move the appropriate calls to machine dependent code instead of relying on definition at compile time. Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com> Signed-off-by: Juan J. Alvarez <jjalvare@us.ibm.com> Acked-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-11 13:03:35 +11:00
Josh Poimboeuf	b9eab08d01	powerpc/modules: Don't try to restore r2 after a sibling call When attempting to load a livepatch module, I got the following error: module_64: patch_module: Expect noop after relocate, got 3c820000 The error was triggered by the following code in unregister_netdevice_queue(): 14c: 00 00 00 48 b 14c <unregister_netdevice_queue+0x14c> 14c: R_PPC64_REL24 net_set_todo 150: 00 00 82 3c addis r4,r2,0 GCC didn't insert a nop after the branch to net_set_todo() because it's a sibling call, so it never returns. The nop isn't needed after the branch in that case. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-and-tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-11 13:03:29 +11:00
Linus Torvalds	e9ef1fe312	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb drivers). 2) Revert returning -EEXIST from __dev_alloc_name() as this propagates to userspace and broke some apps. From Johannes Berg. 3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong Wang. 4) Gianfar MAC can't do EEE so don't advertise it by default, from Claudiu Manoil. 5) Relax strict netlink attribute validation, but emit a warning. From David Ahern. 6) Fix regression in checksum offload of thunderx driver, from Florian Westphal. 7) Fix UAPI bpf issues on s390, from Hendrik Brueckner. 8) New card support in iwlwifi, from Ihab Zhaika. 9) BBR congestion control bug fixes from Neal Cardwell. 10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren. 11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan. 12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni. 13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits) net: mvpp2: fix the RSS table entry offset tcp: evaluate packet losses upon RTT change tcp: fix off-by-one bug in RACK tcp: always evaluate losses in RACK upon undo tcp: correctly test congestion state in RACK bnxt_en: Fix sources of spurious netpoll warnings tcp_bbr: reset long-term bandwidth sampling on loss recovery undo tcp_bbr: reset full pipe detection on loss recovery undo tcp_bbr: record "full bw reached" decision in new full_bw_reached bit sfc: pass valid pointers from efx_enqueue_unwind gianfar: Disable EEE autoneg by default tcp: invalidate rate samples during SACK reneging can: peak/pcie_fd: fix potential bug in restarting tx queue can: usb_8dev: cancel urb on -EPIPE and -EPROTO can: kvaser_usb: cancel urb on -EPIPE and -EPROTO can: esd_usb2: cancel urb on -EPIPE and -EPROTO can: ems_usb: cancel urb on -EPIPE and -EPROTO can: mcba_usb: cancel urb on -EPROTO usbnet: fix alignment for frames with no ethernet header tcp: use current time in tcp_rcv_space_adjust() ...	2017-12-08 13:32:44 -08:00
Hendrik Brueckner	c895f6f703	bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type Commit `0515e5999a` ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type") introduced the bpf_perf_event_data structure which exports the pt_regs structure. This is OK for multiple architectures but fail for s390 and arm64 which do not export pt_regs. Programs using them, for example, the bpf selftest fail to compile on these architectures. For s390, exporting the pt_regs is not an option because s390 wants to allow changes to it. For arm64, there is a user_pt_regs structure that covers parts of the pt_regs structure for use by user space. To solve the broken uapi for s390 and arm64, introduce an abstract type for pt_regs and add an asm/bpf_perf_event.h file that concretes the type. An asm-generic header file covers the architectures that export pt_regs today. The arch-specific enablement for s390 and arm64 follows in separate commits. Reported-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Fixes: `0515e5999a` ("bpf: introduce BPF_PROG_TYPE_PERF_EVENT program type") Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2017-12-05 15:02:40 +01:00
David Gibson	ab9dbf771f	Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier" This reverts commit `a3b2cb30f2`. That commit tried to fix problems with panic on powerpc in certain circumstances, where some output from the generic panic code was being dropped. Unfortunately, it breaks things worse in other circumstances. In particular when running a PAPR guest, it will now attempt to reboot instead of informing the hypervisor (KVM or PowerVM) that the guest has crashed. The crash notification is important to some virtualization management layers. Revert it for now until we can come up with a better solution. Fixes: `a3b2cb30f2` ("powerpc: Do not call ppc_md.panic in fadump panic notifier") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: David Gibson <david@gibson.dropbear.id.au> [mpe: Tweak change log a bit] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2017-12-05 23:21:46 +11:00
Linus Torvalds	9e0600f5cf	* x86 bugfixes: APIC, nested virtualization, IOAPIC * PPC bugfix: HPT guests on a POWER9 radix host -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJaICi1AAoJEL/70l94x66DjvEIAIML/e9YX1YrJZi0rsB9cbm0 Le3o5b3wKxPrlZdnpOZQ2mVWubUQdiHMPGX6BkpgyiJWUchnbj5ql1gUf5S0i3jk TOk6nae6DU94xBuboeqZJlmx2VfPY/fqzLWsX3HFHpnzRl4XvXL5o7cWguIxVcVO yU6bPgbAXyXSBennLWZxC3aQ2Ojikr3uxZQpUZTAPOW5hFINpCKCpqJBMxsb67wq rwI0cJhRl92mHpbe8qeNJhavqY5eviy9iPUaZrOW9P4yw1uqjTAjgsUc1ydiaZSV rOHeKBOgVfY/KBaNJKyKySfuL1MJ+DLcQqm9RlGpKNpFIeB0vvSf0gtmmqIAXIk= =kh2y -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: - x86 bugfixes: APIC, nested virtualization, IOAPIC - PPC bugfix: HPT guests on a POWER9 radix host * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (26 commits) KVM: Let KVM_SET_SIGNAL_MASK work as advertised KVM: VMX: Fix vmx->nested freeing when no SMI handler KVM: VMX: Fix rflags cache during vCPU reset KVM: X86: Fix softlockup when get the current kvmclock KVM: lapic: Fixup LDR on load in x2apic KVM: lapic: Split out x2apic ldr calculation KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts KVM: vmx: use X86_CR4_UMIP and X86_FEATURE_UMIP KVM: x86: Fix CPUID function for word 6 (80000001_ECX) KVM: nVMX: Fix vmx_check_nested_events() return value in case an event was reinjected to L2 KVM: x86: ioapic: Preserve read-only values in the redirection table KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq KVM: x86: ioapic: Don't fire level irq when Remote IRR set KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race KVM: x86: inject exceptions produced by x86_decode_insn KVM: x86: Allow suppressing prints on RDMSR/WRMSR of unhandled MSRs KVM: x86: fix em_fxstor() sleeping while in atomic KVM: nVMX: Fix mmu context after VMLAUNCH/VMRESUME failure KVM: nVMX: Validate the IA32_BNDCFGS on nested VM-entry ...	2017-11-30 08:15:19 -08:00
Dan Williams	e4e40e0263	mm: switch to 'define pmd_write' instead of __HAVE_ARCH_PMD_WRITE In response to compile breakage introduced by a series that added the pud_write helper to x86, Stephen notes: did you consider using the other paradigm: In arch include files: #define pud_write pud_write static inline int pud_write(pud_t pud) ..... Then in include/asm-generic/pgtable.h: #ifndef pud_write tatic inline int pud_write(pud_t pud) { .... } #endif If you had, then the powerpc code would have worked ... ;-) and many of the other interfaces in include/asm-generic/pgtable.h are protected that way ... Given that some architecture already define pmd_write() as a macro, it's a net reduction to drop the definition of __HAVE_ARCH_PMD_WRITE. Link: http://lkml.kernel.org/r/151129126721.37405.13339850900081557813.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Suggested-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Oliver OHalloran <oliveroh@au1.ibm.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2017-11-29 18:40:42 -08:00
Linus Torvalds	83ada03196	powerpc fixes for 4.15 #2 A small batch of fixes, about 50% tagged for stable and the rest for recently merged code. There's one more fix for the >128T handling on hash. Once a process had requested a single mmap above 128T we would then always search above 128T. The correct behaviour is to consider the hint address in isolation for each mmap request. Then a couple of fixes for the IMC PMU, a missing EXPORT_SYMBOL in VAS, a fix for STRICT_KERNEL_RWX on 32-bit, and a fix to correctly identify P9 DD2.1 but in code that is currently not used by default. Thanks to: Aneesh Kumar K.V, Christophe Leroy, Madhavan Srinivasan, Sukadev Bhattiprolu. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJaF/VqAAoJEFHr6jzI4aWA994P/3NNXkSASJHjLrIlQAKXtmx9 lrv1v+6MbPWhyB8Q8LVnnC3Ab2LTHnkccjq2Jw0bP0RQ86HF4mH7Sb7N5Wj0cG+M 5NioikvGE057ncLfxVhesOK0C9Lhc7Zb+zphXZliYP76IGxwbxorJRepeZctVkyO KPMv4eaImdblVn71aoQQSlepON4+/rtiW2yo5u98uCqR+Ttds4J1fiDZ4TNrBYRP Ilh6DmA//CWvN+KsGT+brRd/PjEkxQKHyS8px3lxRl4cwCJucXPCik/Gn9t6OiMw 3S6y1Mu8nrh4z+YepKv6APvl2DEwwXn8w9f85kn+QiE9Qp3Z/wckW9/4LT5FeuKE L8E3dKq2NzJ9oDs/20sVbBvVR7CUvBoyWytsXVkmmlC6sVReTrYAJ1UP9HnNvcF6 be4zYUKusU83uG6saGgchRrPUrD31XKXw8Piv9EoWo1Uz7VgWCkxidclRNocgeDO k5VxYnRd9jPsv2pCzXH2YmuQAypGUh12IPTxEOnSt5uzXSXcamZJBLKp5fAJ/9dl jD6GlRQMX8JpNRJzxOBLly3CmwQBw2ekOuPLXI+M/ilks66AGK8lp4bg5cWwDGNe puzmRJ2mO3dnFlVUHBQ5LyX8ne4yunin1JZB1YQ4xm8yxZbGO2AdypEWMSkPKNPN fkrGPlwQ1JwFheMbHHLj =gv70 -----END PGP SIGNATURE----- Merge tag 'powerpc-4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A small batch of fixes, about 50% tagged for stable and the rest for recently merged code. There's one more fix for the >128T handling on hash. Once a process had requested a single mmap above 128T we would then always search above 128T. The correct behaviour is to consider the hint address in isolation for each mmap request. Then a couple of fixes for the IMC PMU, a missing EXPORT_SYMBOL in VAS, a fix for STRICT_KERNEL_RWX on 32-bit, and a fix to correctly identify P9 DD2.1 but in code that is currently not used by default. Thanks to: Aneesh Kumar K.V, Christophe Leroy, Madhavan Srinivasan, Sukadev Bhattiprolu" * tag 'powerpc-4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Fix Power9 DD2.1 logic in DT CPU features powerpc/perf: Fix IMC_MAX_PMU macro powerpc/perf: Fix pmu_count to count only nest imc pmus powerpc: Fix boot on BOOK3S_32 with CONFIG_STRICT_KERNEL_RWX powerpc/perf/imc: Use cpu_to_node() not topology_physical_package_id() powerpc/vas: Export chip_to_vas_id() powerpc/64s/slice: Use addr limit when computing slice mask	2017-11-24 19:40:12 -10:00
Paul Mackerras	cda2eaa359	KVM: PPC: Book3S HV: Avoid shifts by negative amounts The kvmppc_hpte_page_shifts function decodes the actual and base page sizes for a HPTE, returning -1 if it doesn't recognize the page size encoding. This then gets used as a shift amount in various places, which is undefined behaviour. This was reported by Coverity. In fact this should never occur, since we should only get HPTEs in the HPT which have a recognized page size encoding. The only place where this might not be true is in the call to kvmppc_actual_pgsz() near the beginning of kvmppc_do_h_enter(), where we are validating the HPTE value passed in from the guest. So to fix this and eliminate the undefined behaviour, we make kvmppc_hpte_page_shifts return 0 for unrecognized page size encodings, and make kvmppc_actual_pgsz() detect that case and return 0 for the page size, which will then cause kvmppc_do_h_enter() to return an error and refuse to insert any HPTE with an unrecognized page size encoding. To ensure that we don't get undefined behaviour in compute_tlbie_rb(), we take the 4k page size path for any unrecognized page size encoding. This should never be hit in practice because it is only used on HPTE values which have previously been checked for having a recognized page size encoding. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-11-23 14:23:00 +11:00
Paul Mackerras	ded13fc11b	KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts This fixes two errors that prevent a guest using the HPT MMU from successfully migrating to a POWER9 host in radix MMU mode, or resizing its HPT when running on a radix host. The first bug was that commit `8dc6cca556` ("KVM: PPC: Book3S HV: Don't rely on host's page size information", 2017-09-11) missed two uses of hpte_base_page_size(), one in the HPT rehashing code and one in kvm_htab_write() (which is used on the destination side in migrating a HPT guest). Instead we use kvmppc_hpte_base_page_shift(). Having the shift count means that we can use left and right shifts instead of multiplication and division in a few places. Along the way, this adds a check in kvm_htab_write() to ensure that the page size encoding in the incoming HPTEs is recognized, and if not return an EINVAL error to userspace. The second bug was that kvm_htab_write was performing some but not all of the functions of kvmhv_setup_mmu(), resulting in the destination VM being left in radix mode as far as the hardware is concerned. The simplest fix for now is make kvm_htab_write() call kvmppc_setup_partition_table() like kvmppc_hv_setup_htab_rma() does. In future it would be better to refactor the code more extensively to remove the duplication. Fixes: `8dc6cca556` ("KVM: PPC: Book3S HV: Don't rely on host's page size information") Fixes: `7a84084c60` ("KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9") Reported-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2017-11-23 14:17:54 +11:00

... 3 4 5 6 7 ...

4579 Commits