Commit Graph

125267 Commits

Author SHA1 Message Date
Amitoj Kaur Chawla
33799a6d1a MIPS: Modify error handling
debugfs_create_file returns NULL on error so an IS_ERR test is
incorrect here and a NULL check is required.

The Coccinelle semantic patch used to make this change is as follows:
@@
expression e;
@@

  e = debugfs_create_file(...);
if(
-    IS_ERR(e)
+    !e
    )
    {
  <+...
  return
- PTR_ERR(e)
+ -ENOMEM
  ;
  ...+>
  }

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Cc: julia.lawall@lip6.fr
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13834/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-08-01 22:54:48 +02:00
Linus Torvalds
aeb35d6b74 Merge branch 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 header cleanups from Ingo Molnar:
 "This tree is a cleanup of the x86 tree reducing spurious uses of
  module.h - which should improve build performance a bit"

* 'x86-headers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, crypto: Restore MODULE_LICENSE() to glue_helper.c so it loads
  x86/apic: Remove duplicated include from probe_64.c
  x86/ce4100: Remove duplicated include from ce4100.c
  x86/headers: Include spinlock_types.h in x8664_ksyms_64.c for missing spinlock_t
  x86/platform: Delete extraneous MODULE_* tags fromm ts5500
  x86: Audit and remove any remaining unnecessary uses of module.h
  x86/kvm: Audit and remove any unnecessary uses of module.h
  x86/xen: Audit and remove any unnecessary uses of module.h
  x86/platform: Audit and remove any unnecessary uses of module.h
  x86/lib: Audit and remove any unnecessary uses of module.h
  x86/kernel: Audit and remove any unnecessary uses of module.h
  x86/mm: Audit and remove any unnecessary uses of module.h
  x86: Don't use module.h just for AUTHOR / LICENSE tags
2016-08-01 14:23:42 -04:00
Sam Bobroff
23528bb21e KVM: PPC: Introduce KVM_CAP_PPC_HTM
Introduce a new KVM capability, KVM_CAP_PPC_HTM, that can be queried to
determine if a PowerPC KVM guest should use HTM (Hardware Transactional
Memory).

This will be used by QEMU to populate the pa-features bits in the
guest's device tree.

Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 19:42:06 +02:00
Bjorn Helgaas
9454c23852 Merge branch 'pci/msi-affinity' into next
Conflicts:
	drivers/nvme/host/pci.c
2016-08-01 12:34:01 -05:00
Bjorn Helgaas
a04bee8285 Merge branches 'pci/host-aardvark', 'pci/host-altera', 'pci/host-dra7xx', 'pci/host-hv', 'pci/host-vmd' and 'pci/host-xilinx' into next
* pci/host-aardvark:
  arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700
  PCI: aardvark: Add Aardvark PCI host controller driver
  dt-bindings: add DT binding for the Aardvark PCIe controller

* pci/host-altera:
  PCI: altera: Poll for link up status after retraining the link
  PCI: altera: Check link status before retrain link
  PCI: altera: Reorder read/write functions

* pci/host-dra7xx:
  PCI: dra7xx: Fix return value in case of error

* pci/host-hv:
  PCI: hv: Fix interrupt cleanup path
  PCI: hv: Handle all pending messages in hv_pci_onchannelcallback()
  PCI: hv: Don't leak buffer in hv_pci_onchannelcallback()

* pci/host-vmd:
  x86/PCI: VMD: Separate MSI and MSI-X vector sharing
  x86/PCI: VMD: Use x86_vector_domain as parent domain
  x86/PCI: VMD: Use lock save/restore in interrupt enable path
  x86/PCI: VMD: Initialize list item in IRQ disable
  x86/PCI: VMD: Select device dma ops to override

* pci/host-xilinx:
  PCI: xilinx: Fix return value in case of error

Manually apply changes from pci/demodularize-hosts and
pci/host-request-windows to drivers/pci/host/pci-aardvark.c
2016-08-01 12:32:13 -05:00
Bjorn Helgaas
79dd993461 Merge branches 'pci/demodularize-hosts' and 'pci/host-request-windows' into next
* pci/demodularize-hosts:
  PCI: xgene: Make explicitly non-modular
  PCI: thunder-pem: Make explicitly non-modular
  PCI: thunder-ecam: Make explicitly non-modular
  PCI: tegra: Make explicitly non-modular
  PCI: rcar-gen2: Make explicitly non-modular
  PCI: rcar: Make explicitly non-modular
  PCI: mvebu: Make explicitly non-modular
  PCI: layerscape: Make explicitly non-modular
  PCI: keystone: Make explicitly non-modular
  PCI: hisi: Make explicitly non-modular
  PCI: generic: Make explicitly non-modular
  PCI: designware-plat: Make it explicitly non-modular
  PCI: artpec6: Make explicitly non-modular
  PCI: armada8k: Make explicitly non-modular
  PCI: artpec: Add PCI_MSI_IRQ_DOMAIN dependency
  PCI: artpec: Add Axis ARTPEC-6 PCIe controller driver
  PCI: Add DT binding for Axis ARTPEC-6 PCIe controller
  PCI: generic: Select IRQ_DOMAIN

* pci/host-request-windows:
  PCI: versatile: Simplify host bridge window iteration
  PCI: versatile: Request host bridge window resources with core function
  PCI: tegra: Request host bridge window resources with core function
  PCI: tegra: Remove top-level resource from hierarchy
  PCI: rcar: Simplify host bridge window iteration
  PCI: rcar: Request host bridge window resources with core function
  PCI: rcar Gen2: Request host bridge window resources
  PCI: rcar: Drop gen2 dummy I/O port region
  ARM: Make PCI I/O space optional
  PCI: mvebu: Request host bridge window resources with core function
  PCI: generic: Simplify host bridge window iteration
  PCI: generic: Request host bridge window resources with core function
  PCI: altera: Simplify host bridge window iteration
  PCI: altera: Request host bridge window resources with core function
  PCI: xilinx-nwl: Use dev_printk() when possible
  PCI: xilinx-nwl: Request host bridge window resources
  PCI: xilinx-nwl: Free bridge resource list on failure
  PCI: xilinx: Request host bridge window resources
  PCI: xilinx: Free bridge resource list on failure
  PCI: xgene: Request host bridge window resources
  PCI: xgene: Free bridge resource list on failure
  PCI: iproc: Request host bridge window resources
  PCI: designware: Simplify host bridge window iteration
  PCI: designware: Request host bridge window resources
  PCI: designware: Free bridge resource list on failure
  PCI: Add devm_request_pci_bus_resources()
2016-08-01 12:23:57 -05:00
Bjorn Helgaas
3efc702378 Merge branch 'pci/resource' into next
* pci/resource:
  unicore32/PCI: Remove pci=firmware command line parameter handling
  ARM/PCI: Remove arch-specific pcibios_enable_device()
  ARM64/PCI: Remove arch-specific pcibios_enable_device()
  MIPS/PCI: Claim bus resources on PCI_PROBE_ONLY set-ups
  ARM/PCI: Claim bus resources on PCI_PROBE_ONLY set-ups
  PCI: generic: Claim bus resources on PCI_PROBE_ONLY set-ups
  PCI: Add generic pci_bus_claim_resources()
  alx: Use pci_(request|release)_mem_regions
  ethernet/intel: Use pci_(request|release)_mem_regions
  GenWQE: Use pci_(request|release)_mem_regions
  lpfc: Use pci_(request|release)_mem_regions
  NVMe: Use pci_(request|release)_mem_regions
  PCI: Add helpers to request/release memory and I/O regions
  PCI: Extending pci=resource_alignment to specify device/vendor IDs
  sparc/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()
  powerpc/pci: Implement pci_resource_to_user() with pcibios_resource_to_bus()
  microblaze/PCI: Implement pci_resource_to_user() with pcibios_resource_to_bus()
  PCI: Unify pci_resource_to_user() declarations
  microblaze/PCI: Remove useless __pci_mmap_set_pgprot()
  powerpc/pci: Remove __pci_mmap_set_pgprot()
  PCI: Ignore write combining when mapping I/O port space
2016-08-01 12:23:44 -05:00
Bjorn Helgaas
a00c74c166 Merge branches 'pci/aspm', 'pci/dpc', 'pci/hotplug', 'pci/misc', 'pci/msi', 'pci/pm' and 'pci/virtualization' into next
* pci/aspm:
  PCI/ASPM: Remove redundant check of pcie_set_clkpm

* pci/dpc:
  PCI: Remove DPC tristate module option
  PCI: Bind DPC to Root Ports as well as Downstream Ports
  PCI: Fix whitespace in struct dpc_dev
  PCI: Convert Downstream Port Containment driver to use devm_* functions

* pci/hotplug:
  PCI: Allow additional bus numbers for hotplug bridges

* pci/misc:
  PCI: Include <asm/dma.h> for isa_dma_bridge_buggy
  PCI: Make bus_attr_resource_alignment static
  MAINTAINERS: Add file patterns for PCI device tree bindings
  PCI: Fix comment typo

* pci/msi:
  PCI/MSI: irqchip: Fix PCI_MSI dependencies

* pci/pm:
  PCI: pciehp: Ignore interrupts during D3cold
  PCI: Document connection between pci_power_t and hardware PM capability
  PCI: Add runtime PM support for PCIe ports
  ACPI / hotplug / PCI: Runtime resume bridge before rescan
  PCI: Power on bridges before scanning new devices
  PCI: Put PCIe ports into D3 during suspend
  PCI: Don't clear d3cold_allowed for PCIe ports
  PCI / PM: Enforce type casting for pci_power_t

* pci/virtualization:
  PCI: Add ACS quirk for Solarflare SFC9220
  PCI: Add DMA alias quirk for Adaptec 3805
  PCI: Mark Atheros AR9485 and QCA9882 to avoid bus reset
  PCI: Add function 1 DMA alias quirk for Marvell 88SE9182
2016-08-01 12:23:31 -05:00
James Hogan
40a2df4985 MIPS: Select HAVE_KVM for MIPS64_R{2,6}
We are now able to support KVM T&E with MIPS32 guests on some MIPS64r2
and MIPS64r6 hosts, so select HAVE_KVM so it can be enabled.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:27 +02:00
James Hogan
a700434d80 MIPS: KVM: Reset CP0_PageMask during host TLB flush
KVM sometimes flushes host TLB entries, reading each one to check if it
corresponds to a guest KSeg0 address. In the absence of EntryHi.EHInv
bits to invalidate the whole entry, the entries will be set to unique
virtual addresses in KSeg0 (which is not TLB mapped), spaced 2*PAGE_SIZE
apart.

The TLB read however will clobber the CP0_PageMask register with
whatever page size that TLB entry had, and that same page size will be
written back into the TLB entry along with the unique address.

This would cause breakage when transparent huge pages are enabled on
64-bit host kernels, since huge page entries will overlap other nearby
entries when separated by only 2*PAGE_SIZE, causing a machine check
exception.

Fix this by restoring the old CP0_PageMask value (which should be set to
the normal page size) after reading the TLB entry if we're going to go
ahead and invalidate it.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:27 +02:00
James Hogan
8296963e6e MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX()
kvm_mips_trans_replace() passes a pointer to KVM_GUEST_KSEGX(). This
breaks on 64-bit builds due to the cast of that 64-bit pointer to a
different sized 32-bit int. Cast the pointer argument to an unsigned
long to work around the warning.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:26 +02:00
James Hogan
172e02d147 MIPS: KVM: Sign extend MFC0/RDHWR results
When emulating MFC0 instructions to load 32-bit values from guest COP0
registers and the RDHWR instruction to read the CC (Count) register,
sign extend the result to comply with the MIPS64 architecture. The
result must be in canonical 32-bit form or the guest may malfunction.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:25 +02:00
James Hogan
5808844f03 MIPS: KVM: Fix 64-bit big endian dynamic translation
The MFC0 and MTC0 instructions in the guest which cause traps can be
replaced with 32-bit loads and stores to the commpage, however on big
endian 64-bit builds the offset needs to have 4 added so as to
load/store the least significant half of the long instead of the most
significant half.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:25 +02:00
James Hogan
2a06dab877 MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase
Fail if the address of the allocated exception base doesn't fit into the
CP0_EBase register. This can happen on MIPS64 if CP0_EBase.WG isn't
implemented but RAM is available outside of the range of KSeg0.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:24 +02:00
James Hogan
0d17aea5c2 MIPS: KVM: Use 64-bit CP0_EBase when appropriate
Update the KVM entry point to write CP0_EBase as a 64-bit register when
it is 64-bits wide, and to set the WG (write gate) bit if it exists in
order to write bits 63:30 (or 31:30 on MIPS32).

Prior to MIPS64r6 it was UNDEFINED to perform a 64-bit read or write of
a 32-bit COP0 register. Since this is dynamically generated code,
generate the right type of access depending on whether the kernel is
64-bit and cpu_has_ebase_wg.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:24 +02:00
James Hogan
1d75694253 MIPS: KVM: Set CP0_Status.KX on MIPS64
Update the KVM entry code to set the CP0_Entry.KX bit on 64-bit kernels.
This is important to allow the entry code, running in kernel mode, to
access the full 64-bit address space right up to the point of entering
the guest, and immediately after exiting the guest, so it can safely
restore & save the guest context from 64-bit segments.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:23 +02:00
James Hogan
e41637d858 MIPS: KVM: Make entry code MIPS64 friendly
The MIPS KVM entry code (originally kvm_locore.S, later locore.S, and
now entry.c) has never quite been right when built for 64-bit, using
32-bit instructions when 64-bit instructions were needed for handling
64-bit registers and pointers. Fix several cases of this now.

The changes roughly fall into the following categories.

- COP0 scratch registers contain guest register values and the VCPU
  pointer, and are themselves full width. Similarly CP0_EPC and
  CP0_BadVAddr registers are full width (even though technically we
  don't support 64-bit guest address spaces with trap & emulate KVM).
  Use MFC0/MTC0 for accessing them.

- Handling of stack pointers and the VCPU pointer must match the pointer
  size of the kernel ABI (always o32 or n64), so use ADDIU.

- The CPU number in thread_info, and the guest_{user,kernel}_asid arrays
  in kvm_vcpu_arch are all 32 bit integers, so use lw (instead of LW) to
  load them.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:22 +02:00
James Hogan
28cc5bd568 MIPS: KVM: Use kmap instead of CKSEG0ADDR()
There are several unportable uses of CKSEG0ADDR() in MIPS KVM, which
implicitly assume that a host physical address will be in the low 512MB
of the physical address space (accessible in KSeg0). These assumptions
don't hold for highmem or on 64-bit kernels.

When interpreting the guest physical address when reading or overwriting
a trapping instruction, use kmap_atomic() to get a usable virtual
address to access guest memory, which is portable to 64-bit and highmem
kernels.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:22 +02:00
James Hogan
cfacaced0c MIPS: KVM: Use virt_to_phys() to get commpage PFN
Calculate the PFN of the commpage using virt_to_phys() instead of
CPHYSADDR(). This is more portable as kzalloc() may allocate from XKPhys
instead of KSeg0 on 64-bit kernels, which CPHYSADDR() doesn't handle.
This is sufficient for highmem kernels too since kzalloc() will allocate
from lowmem in KSeg0.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:21 +02:00
James Hogan
6002bdd3e6 MIPS: Fix definition of KSEGX() for 64-bit
The KSEGX() macro is defined to 32-bit sign extend the address argument
and logically AND the result with 0xe0000000, with the final result
usually compared against one of the CKSEG macros. However the literal
0xe0000000 is unsigned as the high bit is set, and is therefore
zero-extended on 64-bit kernels, resulting in the sign extension bits of
the argument being masked to zero. This results in the odd situation
where:

  KSEGX(CKSEG) != CKSEG
  (0xffffffff80000000 & 0x00000000e0000000) != 0xffffffff80000000)

Fix this by 32-bit sign extending the 0xe0000000 literal using
_ACAST32_.

This will help some MIPS KVM code handling 32-bit guest addresses to
work on 64-bit host kernels, but will also affect KSEGX in
dec_kn01_be_backend() on a 64-bit DECstation kernel, and the SiByte DMA
page ops KSEGX check in clear_page() and copy_page() on 64-bit SB1
kernels, neither of which appear to be designed with 64-bit segments in
mind anyway.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-mips@linux-mips.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 18:42:20 +02:00
Andrew Jones
89581f06b2 arm64: KVM: Set cpsr before spsr on fault injection
We need to set cpsr before determining the spsr bank, as the bank
depends on the target exception level of the injection, not the
current mode of the vcpu. Normally this is one in the same (EL1),
but not when we manage to trap an EL0 fault. It still doesn't really
matter for the 64-bit EL0 case though, as vcpu_spsr() unconditionally
uses the EL1 bank for that. However the 32-bit EL0 case gets fun, as
that path will lead to the BUG() in vcpu_spsr32().

This patch fixes the assignment order and also modifies some white
space in order to better group pairs of lines that have strict order.

Cc: stable@vger.kernel.org # v4.5
Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2016-08-01 15:09:47 +01:00
Ard Biesheuvel
04a8481061 arm64: mm: avoid fdt_check_header() before the FDT is fully mapped
As reported by Zijun, the fdt_check_header() call in __fixmap_remap_fdt()
is not safe since it is not guaranteed that the FDT header is mapped
completely. Due to the minimum alignment of 8 bytes, the only fields we
can assume to be mapped are 'magic' and 'totalsize'.

Since the OF layer is in charge of validating the FDT image, and we are
only interested in making reasonably sure that the size field contains
a meaningful value, replace the fdt_check_header() call with an explicit
comparison of the magic field's value against the expected value.

Cc: <stable@vger.kernel.org>
Reported-by: Zijun Hu <zijun_hu@htc.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-08-01 14:17:01 +01:00
Jim Mattson
b80c76ec98 KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD
Kexec needs to know the addresses of all VMCSs that are active on
each CPU, so that it can flush them from the VMCS caches. It is
safe to record superfluous addresses that are not associated with
an active VMCS, but it is not safe to omit an address associated
with an active VMCS.

After a call to vmcs_load, the VMCS that was loaded is active on
the CPU. The VMCS should be added to the CPU's list of active
VMCSs before it is loaded.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2016-08-01 15:14:24 +02:00
David Matlack
4f2777bc97 kvm: x86: nVMX: maintain internal copy of current VMCS
KVM maintains L1's current VMCS in guest memory, at the guest physical
page identified by the argument to VMPTRLD. This makes hairy
time-of-check to time-of-use bugs possible,as VCPUs can be writing
the the VMCS page in memory while KVM is emulating VMLAUNCH and
VMRESUME.

The spec documents that writing to the VMCS page while it is loaded is
"undefined". Therefore it is reasonable to load the entire VMCS into
an internal cache during VMPTRLD and ignore writes to the VMCS page
-- the guest should be using VMREAD and VMWRITE to access the current
VMCS.

To adhere to the spec, KVM should flush the current VMCS during VMPTRLD,
and the target VMCS during VMCLEAR (as given by the operand to VMCLEAR).
Since this implementation of VMCS caching only maintains the the current
VMCS, VMCLEAR will only do a flush if the operand to VMCLEAR is the
current VMCS pointer.

KVM will also flush during VMXOFF, which is not mandated by the spec,
but also not in conflict with the spec.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-08-01 14:49:05 +02:00
Juergen Gross
005bd0077a perf/x86: Modify error message in virtualized environment
It is known that PMU isn't working in some virtualized environments.

Modify the message issued in that case to mention why hardware PMU
isn't usable instead of reporting it to be broken.

As a side effect this will correct a little bug in the error message:
The error message was meant to be either of level err or info
depending on the environment (native or virtualized). As the level is
taken from the format string and not the printed string, specifying
it via %s and a conditional argument didn't work the way intended.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Link: http://lkml.kernel.org/r/1470051427-16795-1-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-01 14:30:34 +02:00
David Howells
f7d665627e x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace
x86_64 needs to use compat_sys_keyctl for 32-bit userspace rather than
calling sys_keyctl(). The latter will work in a lot of cases, thereby
hiding the issue.

Reported-by: Stephan Mueller <smueller@chronox.de>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keyrings@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/146961615805.14395.5581949237156769439.stgit@warthog.procyon.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-01 11:31:24 +02:00
Ingo Molnar
7946c2beac Merge branch 'x86/asm' into x86/urgent, to pick up fix
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-08-01 09:30:19 +02:00
Linus Torvalds
27acbec338 Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog updates from Wim Van Sebroeck:
 "Core:
   - min and max timeout improvements, WDOG_HW_RUNNING improvements,
     status funtionality
   - Add a device managed API for watchdog_register_device()

  New watchdog drivers:
   -  Aspeed SoCs
   -  Maxim PMIC MAX77620
   -  Amlogic Meson GXBB SoC

  Enhancements:
   - support for the r8a7796 watchdog device
   - support for F81866 watchdog device
   - support for 5th variation of Apollo Lake
   - support for MCP78S chipset
   - clean-up of softdog.c watchdog device driver
   - pic32-wdt and pic32-dmt fixes
   - Documentation/watchdog: watchdog-test improvements
   - several other fixes and improvements"

* git://www.linux-watchdog.org/linux-watchdog: (50 commits)
  watchdog: gpio_wdt: Fix missing platform_set_drvdata() in gpio_wdt_probe()
  watchdog: core: Clear WDOG_HW_RUNNING before calling the stop function
  watchdog: core: Fix error handling of watchdog_dev_init()
  watchdog: pic32-wdt: Fix return value check in pic32_wdt_drv_probe()
  watchdog: pic32-dmt: Remove .owner field for driver
  watchdog: pic32-wdt: Remove .owner field for driver
  watchdog: renesas-wdt: Add support for the r8a7796 wdt
  Documentation/watchdog: check return value for magic close
  watchdog: sbsa: Drop status function
  watchdog: Implement status function in watchdog core
  watchdog: tangox: Set max_hw_heartbeat_ms instead of max_timeout
  watchdog: change watchdog_need_worker logic
  watchdog: add support for MCP78S chipset in nv_tco
  watchdog: bcm2835_wdt: remove redundant ->set_timeout callback
  watchdog: bcm2835_wdt: constify _ops and _info structures
  dt-bindings: watchdog: Add Meson GXBB Watchdog bindings
  watchdog: Add Meson GXBB Watchdog Driver
  watchdog: qcom: configure BARK time in addition to BITE time
  watchdog: qcom: add option for standalone watchdog not in timer block
  watchdog: qcom: update device tree bindings
  ...
2016-07-31 21:32:22 -04:00
Anshuman Khandual
a67ae75802 powerpc/ptrace: Enable support for Performance Monitor registers
This patch enables support for Performance monitor registers related
ELF core note NT_PPC_PMU based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding one new register sets REGSET_PMU in powerpc
corresponding to the ELF core note sections added in this
regard. It also implements the get, set and active functions
for this new register sets added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:24 +10:00
Anshuman Khandual
cf89d4e1b1 powerpc/ptrace: Enable support for EBB registers
This patch enables support for EBB state registers related
ELF core note NT_PPC_EBB based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding one new register sets REGSET_EBB in powerpc
corresponding to the ELF core note sections added in this
regard. It also implements the get, set and active functions
for this new register sets added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:23 +10:00
Anshuman Khandual
fa439810cc powerpc/ptrace: Enable support for NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR
This patch enables support for running TAR, PPR, DSCR registers
related ELF core notes NT_PPPC_TAR, NT_PPC_PPR, NT_PPC_DSCR based
ptrace requests through PTRACE_GETREGSET, PTRACE_SETREGSET calls.
This is achieved through adding three new register sets REGSET_TAR,
REGSET_PPR, REGSET_DSCR in powerpc corresponding to the ELF core
note sections added in this regad. It implements the get, set and
active functions for all these new register sets added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:22 +10:00
Anshuman Khandual
c45dc9003a powerpc/ptrace: Enable NT_PPC_TM_CTAR, NT_PPC_TM_CPPR, NT_PPC_TM_CDSCR
This patch enables support for all three TM checkpointed SPR
states related ELF core note  NT_PPC_TM_CTAR, NT_PPC_TM_CPPR,
NT_PPC_TM_CDSCR based ptrace requests through PTRACE_GETREGSET,
PTRACE_SETREGSET calls. This is achieved through adding three
new register sets REGSET_TM_CTAR, REGSET_TM_CPPR and
REGSET_TM_CDSCR in powerpc corresponding to the ELF core note
sections added. It implements the get, set and active functions
for all these new register sets added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:22 +10:00
Anshuman Khandual
08e1c01d6a powerpc/ptrace: Enable support for TM SPR state
This patch enables support for TM SPR state related ELF core
note NT_PPC_TM_SPR based ptrace requests through PTRACE_GETREGSET,
PTRACE_SETREGSET calls. This is achieved through adding a register
set REGSET_TM_SPR in powerpc corresponding to the ELF core note
section added. It implements the get, set and active functions for
this new register set added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:21 +10:00
Anshuman Khandual
9d3918f7c0 powerpc/ptrace: Enable support for NT_PPC_CVSX
This patch enables support for TM checkpointed VSX register
set ELF core note NT_PPC_CVSX based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding a register set REGSET_CVSX in powerpc
corresponding to the ELF core note section added. It
implements the get, set and active functions for this new
register set added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:20 +10:00
Anshuman Khandual
8c13f59999 powerpc/ptrace: Enable support for NT_PPC_CVMX
This patch enables support for TM checkpointed VMX register
set ELF core note NT_PPC_CVMX based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding a register set REGSET_CVMX in powerpc
corresponding to the ELF core note section added. It
implements the get, set and active functions for this new
register set added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:19 +10:00
Anshuman Khandual
19cbcbf75a powerpc/ptrace: Enable support for NT_PPC_CFPR
This patch enables support for TM checkpointed FPR register
set ELF core note NT_PPC_CFPR based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding a register set REGSET_CFPR in powerpc
corresponding to the ELF core note section added. It
implements the get, set and active functions for this new
register set added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:19 +10:00
Anshuman Khandual
25847fb195 powerpc/ptrace: Enable support for NT_PPC_CGPR
This patch enables support for TM checkpointed GPR register
set ELF core note NT_PPC_CGPR based ptrace requests through
PTRACE_GETREGSET, PTRACE_SETREGSET calls. This is achieved
through adding a register set REGSET_CGPR in powerpc
corresponding to the ELF core note section added. It
implements the get, set and active functions for this new
register set added.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:18 +10:00
Anshuman Khandual
04fcadce0e powerpc/ptrace: Adapt gpr32_get, gpr32_set functions for transaction
This patch splits gpr32_get, gpr32_set functions to accommodate
in transaction ptrace requests implemented in patches later in
the series.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:17 +10:00
Anshuman Khandual
94b7d3610e powerpc/ptrace: Enable in transaction NT_PPC_VSX ptrace requests
This patch enables in transaction NT_PPC_VSX ptrace requests. The
function vsr_get which gets the running value of all VSX registers
and the function vsr_set which sets the running value of of all VSX
registers work on the running set of VMX registers whose location
will be different if transaction is active. This patch makes these
functions adapt to situations when the transaction is active.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:17 +10:00
Anshuman Khandual
d844e27915 powerpc/ptrace: Enable in transaction NT_PPC_VMX ptrace requests
This patch enables in transaction NT_PPC_VMX ptrace requests. The
function vr_get which gets the running value of all VMX registers
and the function vr_set which sets the running value of of all VMX
registers work on the running set of VMX registers whose location
will be different if transaction is active. This patch makes these
functions adapt to situations when the transaction is active.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:16 +10:00
Anshuman Khandual
1ec8549d44 powerpc/ptrace: Enable in transaction NT_PRFPREG ptrace requests
This patch enables in transaction NT_PRFPREG ptrace requests.
The function fpr_get which gets the running value of all FPR
registers and the function fpr_set which sets the running
value of of all FPR registers work on the running set of FPR
registers whose location will be different if transaction is
active. This patch makes these functions adapt to situations
when the transaction is active.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:15 +10:00
Anshuman Khandual
8d460f6156 powerpc/process: Add the function flush_tmregs_to_thread
This patch creates a function flush_tmregs_to_thread which
will then be used by subsequent patches in this series. The
function checks for self tracing ptrace interface attempts
while in the TM context and logs appropriate warning message.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:15 +10:00
Aneesh Kumar K.V
703b41ad1a powerpc/mm: remove flush_tlb_page_nohash
This should be same as flush_tlb_page except for hash32. For hash32
I guess the existing code is wrong, because we don't seem to be
flushing tlb for Hash != 0 case at all. Fix this by switching to
calling flush_tlb_page() which does the right thing by flushing
tlb for both hash and nohash case with hash32

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:13 +10:00
Aneesh Kumar K.V
5491ae7b6f powerpc/mm/hugetlb: Add flush_hugetlb_tlb_range
Some archs like ppc64 need to do special things when flushing tlb for
hugepage. Add a new helper to flush hugetlb tlb range. This helps us to
avoid flushing the entire tlb mapping for the pid.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:13 +10:00
Aneesh Kumar K.V
fbfa26d854 powerpc/mm/radix/hugetlb: Add helper for finding page size from hstate
Use the helper instead of open coding the same at multiple place

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:12 +10:00
Aneesh Kumar K.V
f22dfc9158 powerpc/mm/radix: Rename function and drop unused arg
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:11 +10:00
Aneesh Kumar K.V
d8e91e93e9 powerpc/mm/radix: Add tlb flush of THP ptes
Instead of flushing the entire mm, implement a flush_pmd_tlb_range

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:10 +10:00
Aneesh Kumar K.V
9d4dab1158 powerpc/mm: Drop multiple definition of mm_is_core_local
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:10 +10:00
Aneesh Kumar K.V
13dce03363 powerpc/mm: Use hugetlb flush functions
Use flush_hugetlb_page instead of flush_tlb_page when we clear flush the
pte.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:09 +10:00
Aneesh Kumar K.V
138ee7ee01 powerpc/mm/hash: Add helper for finding SLBE LLP encoding
Replace opencoding of the same at multiple places with the helper.
No functional change with this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:08 +10:00
Aneesh Kumar K.V
8cb8140c4c powerpc/mm/radix: Implement tlb mmu gather flush efficiently
Now that we track page size in mmu_gather, we can use address based
tlbie format when doing a tlb_flush(). We don't do this if we are
invalidating the full address space.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:08 +10:00
Michael Ellerman
e2985fd9b8 powerpc/jump_label: Annotate jump label assembly
Add a comment to the generated assembler for jump labels. This makes it
easier to identify them in asm listings (generated with $ make foo.s).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:07 +10:00
Aneesh Kumar K.V
c812c7d8f1 powerpc/mm: Catch usage of cpu/mmu_has_feature() before jump label init
This allows us to catch incorrect usage of cpu_has_feature() and
mmu_has_feature() prior to jump labels being initialised.

mpe: Use printk() and dump_stack() rather than WARN_ON(), because
WARN_ON() may not work this early in boot. Rename the Kconfig.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:06 +10:00
Kevin Hao
c12e6f24d4 powerpc: Add option to use jump label for mmu_has_feature()
As we just did for CPU features.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:06 +10:00
Kevin Hao
4db7327194 powerpc: Add option to use jump label for cpu_has_feature()
We do binary patching of asm code using CPU features, which is a
one-time operation, done during early boot. However checks of CPU
features in C code are currently done at run time, even though the set
of CPU features can never change after boot.

We can optimise this by using jump labels to implement cpu_has_feature(),
meaning checks in C code are binary patched into a single nop or branch.

For a C sequence along the lines of:

    if (cpu_has_feature(FOO))
         return 2;

The generated code before is roughly:

    ld      r9,-27640(r2)
    ld      r9,0(r9)
    lwz     r9,32(r9)
    cmpwi   cr7,r9,0
    bge     cr7, 1f
    li      r3,2
    blr
1:  ...

After (true):
    nop
    li      r3,2
    blr

After (false):
    b	1f
    li      r3,2
    blr
1:  ...

mpe: Rename MAX_CPU_FEATURES as we already have a #define with that
name, and define it simply as a constant, rather than doing tricks with
sizeof and NULL pointers. Rename the array to cpu_feature_keys. Use the
kconfig we added to guard it. Add BUILD_BUG_ON() if the feature is not a
compile time constant. Rewrite the change log.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:05 +10:00
Michael Ellerman
bfbfc8a43c powerpc: Add kconfig option to use jump labels for cpu/mmu_has_feature()
Add a kconfig option to control whether we use jump label for the
cpu/mmu_has_feature() checks. Currently this does nothing, but we will
enabled it in the subsequent patches.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:04 +10:00
Kevin Hao
b92a226e52 powerpc: Move cpu_has_feature() to a separate file
We plan to use jump label for cpu_has_feature(). In order to implement
this we need to include the linux/jump_label.h in asm/cputable.h.

Unfortunately if we do that it leads to an include loop. The root of the
problem seems to be that reg.h needs cputable.h (for CPU_FTRs), and then
cputable.h via jump_label.h eventually pulls in hw_irq.h which needs
reg.h (for MSR_EE).

So move cpu_has_feature() to a separate file on its own.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rename to cpu_has_feature.h and flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:03 +10:00
Kevin Hao
905259e33d powerpc: Remove mfvtb()
This function is only used by get_vtb(). They are almost the same except
the reading from the real register. Move the mfspr() to get_vtb() and
kill the function mfvtb(). With this, we can eliminate the use of
cpu_has_feature() in very core header file like reg.h. This is a
preparation for the use of jump label for cpu_has_feature().

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:03 +10:00
Aneesh Kumar K.V
309b315b6e powerpc: Call jump_label_init() in apply_feature_fixups()
Call jump_label_init() early so that we can use static keys for CPU and
MMU feature checks.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:02 +10:00
Aneesh Kumar K.V
b8f1b4f860 powerpc/mm: Convert early cpu/mmu feature check to use the new helpers
This switches early feature checks to use the non static key variant of
the function. In later patches we will be switching cpu_has_feature()
and mmu_has_feature() to use static keys and we can use them only after
static key/jump label is initialized. Any check for feature before jump
label init should be done using this new helper.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:01 +10:00
Michael Ellerman
a141cca389 powerpc/mm: Add early_[cpu|mmu]_has_feature()
In later patches, we will be switching CPU and MMU feature checks to
use static keys.

For checks in early boot before jump label is initialized we need a
variant of [cpu|mmu]_has_feature() that doesn't use jump labels.

So create those called, unimaginatively, early_[cpu|mmu]_has_feature().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:15:00 +10:00
Michael Ellerman
bab4c8de62 powerpc/mm: Define radix_enabled() in one place & use static inline
Currently we have radix_enabled() three times, twice in asm/book3s/64/mmu.h
and then a fallback in asm/mmu.h.

Consolidate them in asm/mmu.h. While we're at it convert them to be
static inlines, and change the fallback case to returning a bool, like
mmu_has_feature().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:59 +10:00
Michael Ellerman
6574ba950b powerpc/kernel: Convert cpu_has_feature() to returning bool
The intention is that the result is only used as a boolean, so enforce
that by changing the return type to bool.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:58 +10:00
Michael Ellerman
a81dc9d995 powerpc/kernel: Convert mmu_has_feature() to returning bool
The intention is that the result is only used as a boolean, so enforce
that by changing the return type to bool.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:58 +10:00
Aneesh Kumar K.V
5a25b6f527 powerpc/mm: Make MMU_FTR_RADIX a MMU family feature
MMU feature bits are defined such that we use the lower half to
present MMU family features. Remove the strict split of half and
also move Radix to a mmu family feature. Radix introduce a new MMU
model and strictly speaking it is a new MMU family. This also free
up bits which can be used for individual features later.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:57 +10:00
Michael Ellerman
a28e46f109 powerpc/kernel: Check features don't change after patching
Early in boot we binary patch some sections of code based on the CPU and
MMU feature bits. But it is a one-time patching, there is no facility
for repatching the code later if the set of features change.

It is a major bug if the set of features changes after we've done the
code patching - so add a check for it.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:56 +10:00
Michael Ellerman
9e8066f398 powerpc/64: Do feature patching before MMU init
Up until now we needed to do the MMU init before feature patching,
because part of the MMU init was scanning the device tree and setting
and/or clearing some MMU feature bits.

Now that we have split that MMU feature modification out into routines
called from early_init_devtree() (called earlier) we can now do feature
patching before calling MMU init.

The advantage of this is it means the remainder of the MMU init runs
with the final set of features which will apply for the rest of the life
of the system. This means we don't have to special case anything called
from MMU init to deal with a changing set of feature bits.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:56 +10:00
Michael Ellerman
2537b09c93 powerpc/mm: Do radix device tree scanning earlier
Like we just did for hash, split the device tree scanning parts out and
call them from mmu_early_init_devtree().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:55 +10:00
Michael Ellerman
bacf9cf883 powerpc/mm: Do hash device tree scanning earlier
Currently MMU initialisation (early_init_mmu()) consists of a mixture of
scanning the device tree, setting MMU feature bits, and then also doing
actual initialisation of MMU data structures.

We'd like to decouple the setting of the MMU features from the actual
setup. So split out the device tree scanning, and associated code, and
call it from mmu_init_early_devtree().

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:54 +10:00
Michael Ellerman
c610ec60ed powerpc/mm: Move disable_radix handling into mmu_early_init_devtree()
Move the handling of the disable_radix command line argument into the
newly created mmu_early_init_devtree().

It's an MMU option so it's preferable to have it in an mm related file,
and it also means platforms that don't support radix don't have to carry
the code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:53 +10:00
Michael Ellerman
1a01dc87e0 powerpc/mm: Add mmu_early_init_devtree()
Empty for now, but we'll add to it in the next patch.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-01 11:14:53 +10:00
Jiri Olsa
e64a5470dc s390/ftrace/jprobes: Fix conflict between jprobes and function graph tracing
This fixes the same issue Steven already fixed for x86
in following commit:

  237d28db03 ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing

It fixes the crash, that happens when function graph tracing
and jprobes are used simultaneously. Please refer to above
commit for details.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
2016-07-31 09:28:12 -04:00
James Hogan
68c5cf5a60 s390: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
for s390 at all even though ARCH_DLINFO can contain one NEW_AUX_ENT when
VDSO is enabled.

This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
AT_BASE_PLATFORM which s390 doesn't use, but lets define it now and add
the comment above ARCH_DLINFO as found in several other architectures to
remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
date.

Fixes: b020632e40 ("[S390] introduce vdso on s390")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
2016-07-31 09:28:09 -04:00
Heiko Carstens
ef4423ce70 s390/numa: only set possible nodes within node_possible_map
Make sure that only those nodes appear in the node_possible_map that
may actually be used. Usually that means that the node online and
possible maps are identical. For mode "plain" we only have one node,
for mode "emu" we have "emu_nodes" nodes.

Before this the possible map included (with default config) 16 nodes
while usually only one was used. That made a couple of loops that
iterated over all possible nodes do more work than necessary.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:28:00 -04:00
Heiko Carstens
8814309163 s390/als: fix compile with gcov enabled
Fix this one when gcov is enabled:

arch/s390/kernel/als.o:(.data+0x118): undefined reference to `__gcov_merge_add'
arch/s390/kernel/als.o: In function `_GLOBAL__sub_I_65535_0_verify_facilities':
(.text.startup+0x8): undefined reference to `__gcov_init'

Please merge with "s390/als: convert architecture level set code to C".

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:28:00 -04:00
Heiko Carstens
fbd6534ce0 s390/facilities: do not generate DWORDS define anymore
The architecture level set code has been converted to C and doesn't
need a define to figure out array sizes. Since the old code was the
only user of the DWORDS define, we can get rid of it again.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:59 -04:00
Heiko Carstens
a02d1988b5 s390/als: print missing facilities on facility mismatch
If the kernel needs more facilities to run than the machine provides
it is running on, print the facility bit numbers which are missing.

This allows to easily tell what went wrong and if simply the machine
does not provide a required facility or if either the kernel or the
hypervisor may have a bug.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:59 -04:00
Heiko Carstens
06ed5512a2 s390/als: print machine type on facility mismatch
If we have a facility mismatch the kernel only emits a warning that
the processor is not recent enough and stops operating. This doesn't
give us a lot of an idea of what actually went wrong.

As a first step print the machine type in addition.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:59 -04:00
Heiko Carstens
be2412c247 s390/als: convert architecture level set code to C
There is no reason to have this code in assembly language. Therefore
convert it to C.

Note that this code needs special treatment: it is called very early
and one of the side effects is that e.g. the bss section is not
cleared. Therefore the preferred way for static variables is to put
them on the stack which has a size of 16KB.

There is no functional change with this patch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:58 -04:00
Heiko Carstens
cf9fdfea5f s390/sclp: move uninitialized data to data section
The early sclp code may be called before the bss section is
cleared. Therefore move all variables to the data section.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:58 -04:00
Gerald Schaefer
bc29b7ac1d s390/mm: clean up pte/pmd encoding
The hugetlbfs pte<->pmd conversion functions currently assume that the pmd
bit layout is consistent with the pte layout, which is not really true.

The SW read and write bits are encoded as the sequence "wr" in a pte, but
in a pmd it is "rw". The hugetlbfs conversion assumes that the sequence
is identical in both cases, which results in swapped read and write bits
in the pmd. In practice this is not a problem, because those pmd bits are
only relevant for THP pmds and not for hugetlbfs pmds. The hugetlbfs code
works on (fake) ptes, and the converted pte bits are correct.

There is another variation in pte/pmd encoding which affects dirty
prot-none ptes/pmds. In this case, a pmd has both its HW read-only and
invalid bit set, while it is only the invalid bit for a pte. This also has
no effect in practice, but it should better be consistent.

This patch fixes both inconsistencies by changing the SW read/write bit
layout for pmds as well as the PAGE_NONE encoding for ptes. It also makes
the hugetlbfs conversion functions more robust by introducing a
move_set_bit() macro that uses the pte/pmd bit #defines instead of
constant shifts.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-07-31 05:27:57 -04:00
Linus Torvalds
bad60e6f25 powerpc updates for 4.8 # 1
Highlights:
  - PowerNV PCI hotplug support.
  - Lots more Power9 support.
  - eBPF JIT support on ppc64le.
  - Lots of cxl updates.
  - Boot code consolidation.
 
 Bug fixes:
  - Fix spin_unlock_wait() from Boqun Feng
  - Fix stack pointer corruption in __tm_recheckpoint() from Michael Neuling
  - Fix multiple bugs in memory_hotplug_max() from Bharata B Rao
  - mm: Ensure "special" zones are empty from Oliver O'Halloran
  - ftrace: Separate the heuristics for checking call sites from Michael Ellerman
  - modules: Never restore r2 for a mprofile-kernel style mcount() call from Michael Ellerman
  - Fix endianness when reading TCEs from Alexey Kardashevskiy
  - start rtasd before PCI probing from Greg Kurz
  - PCI: rpaphp: Fix slot registration for multiple slots under a PHB from Tyrel Datwyler
  - powerpc/mm: Add memory barrier in __hugepte_alloc() from Sukadev Bhattiprolu
 
 Cleanups & fixes:
  - Drop support for MPIC in pseries from Rashmica Gupta
  - Define and use PPC64_ELF_ABI_v2/v1 from Michael Ellerman
  - Remove unused symbols in asm-offsets.c from Rashmica Gupta
  - Fix SRIOV not building without EEH enabled from Russell Currey
  - Remove kretprobe_trampoline_holder. from Thiago Jung Bauermann
  - Reduce log level of PCI I/O space warning from Benjamin Herrenschmidt
  - Add array bounds checking to crash_shutdown_handlers from Suraj Jitindar Singh
  - Avoid -maltivec when using clang integrated assembler from Anton Blanchard
  - Fix array overrun in ppc_rtas() syscall from Andrew Donnellan
  - Fix error return value in cmm_mem_going_offline() from Rasmus Villemoes
  - export cpu_to_core_id() from Mauricio Faria de Oliveira
  - Remove old symbols from defconfigs from Andrew Donnellan
  - Update obsolete comments in setup_32.c about entry conditions from Benjamin Herrenschmidt
  - Add comment explaining the purpose of setup_kdump_trampoline() from Benjamin Herrenschmidt
  - Merge the RELOCATABLE config entries for ppc32 and ppc64 from Kevin Hao
  - Remove RELOCATABLE_PPC32 from Kevin Hao
  - Fix .long's in tlb-radix.c to more meaningful from Balbir Singh
 
 Minor cleanups & fixes:
  - Andrew Donnellan, Anna-Maria Gleixner, Anton Blanchard, Benjamin
    Herrenschmidt, Bharata B Rao, Christophe Leroy, Colin Ian King, Geliang
    Tang, Greg Kurz, Madhavan Srinivasan, Michael Ellerman, Michael Ellerman,
    Stephen Rothwell, Stewart Smith.
 
 Freescale updates from Scott:
  - "Highlights include more 8xx optimizations, device tree updates,
    and MVME7100 support."
 
 PowerNV PCI hotplug from Gavin Shan:
  - PCI: Add pcibios_setup_bridge()
  - Override pcibios_setup_bridge()
  - Remove PCI_RESET_DELAY_US
  - Move pnv_pci_ioda_setup_opal_tce_kill() around
  - Increase PE# capacity
  - Allocate PE# in reverse order
  - Create PEs in pcibios_setup_bridge()
  - Setup PE for root bus
  - Extend PCI bridge resources
  - Make pnv_ioda_deconfigure_pe() visible
  - Dynamically release PE
  - Update bridge windows on PCI plug
  - Delay populating pdn
  - Support PCI slot ID
  - Use PCI slot reset infrastructure
  - Introduce pnv_pci_get_slot_id()
  - Functions to get/set PCI slot state
  - PCI/hotplug: PowerPC PowerNV PCI hotplug driver
  - Print correct PHB type names
 
 Power9 idle support from Shreyas B. Prabhu:
  - set power_save func after the idle states are initialized
  - Use PNV_THREAD_WINKLE macro while requesting for winkle
  - make hypervisor state restore a function
  - Rename idle_power7.S to idle_book3s.S
  - Rename reusable idle functions to hardware agnostic names
  - Make pnv_powersave_common more generic
  - abstraction for saving SPRs before entering deep idle states
  - Add platform support for stop instruction
  - cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES
  - cpuidle/powernv: cleanup cpuidle-powernv.c
  - cpuidle/powernv: Add support for POWER ISA v3 idle states
  - Use deepest stop state when cpu is offlined
 
 Power9 PMU from Madhavan Srinivasan:
  - factor out power8 pmu macros and defines
  - factor out power8 pmu functions
  - factor out power8 __init_pmu code
  - Add power9 event list macros for generic and cache events
  - Power9 PMU support
  - Export Power9 generic and cache events to sysfs
 
 Power9 preliminary interrupt & PCI support from Benjamin Herrenschmidt:
  - Add XICS emulation APIs
  - Move a few exception common handlers to make room
  - Add support for HV virtualization interrupts
  - Add mechanism to force a replay of interrupts
  - Add ICP OPAL backend
  - Discover IODA3 PHBs
  - pci: Remove obsolete SW invalidate
  - opal: Add real mode call wrappers
  - Rename TCE invalidation calls
  - Remove SWINV constants and obsolete TCE code
  - Rework accessing the TCE invalidate register
  - Fallback to OPAL for TCE invalidations
  - Use the device-tree to get available range of M64's
  - Check status of a PHB before using it
  - pci: Don't try to allocate resources that will be reassigned
 
 Other Power9:
  - Send SIGBUS on unaligned copy and paste from Chris Smart
  - Large Decrementer support from Oliver O'Halloran
  - Load Monitor Register Support from Jack Miller
 
 Performance improvements from Anton Blanchard:
  - Avoid load hit store in __giveup_fpu() and __giveup_altivec()
  - Avoid load hit store in setup_sigcontext()
  - Remove assembly versions of strcpy, strcat, strlen and strcmp
  - Align hot loops of some string functions
 
 eBPF JIT from Naveen N. Rao:
  - Fix/enhance 32-bit Load Immediate implementation
  - Optimize 64-bit Immediate loads
  - Introduce rotate immediate instructions
  - A few cleanups
  - Isolate classic BPF JIT specifics into a separate header
  - Implement JIT compiler for extended BPF
 
 Operator Panel driver from Suraj Jitindar Singh:
  - devicetree/bindings: Add binding for operator panel on FSP machines
  - Add inline function to get rc from an ASYNC_COMP opal_msg
  - Add driver for operator panel on FSP machines
 
 Sparse fixes from Daniel Axtens:
  - make some things static
  - Introduce asm-prototypes.h
  - Include headers containing prototypes
  - Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
  - kvm: Clarify __user annotations
  - Pass endianness to sparse
  - Make ppc_md.{halt, restart} __noreturn
 
 MM fixes & cleanups from Aneesh Kumar K.V:
  - radix: Update LPCR HR bit as per ISA
  - use _raw variant of page table accessors
  - Compile out radix related functions if RADIX_MMU is disabled
  - Clear top 16 bits of va only on older cpus
  - Print formation regarding the the MMU mode
  - hash: Update SDR1 size encoding as documented in ISA 3.0
  - radix: Update PID switch sequence
  - radix: Update machine call back to support new HCALL.
  - radix: Add LPID based tlb flush helpers
  - radix: Add a kernel command line to disable radix
  - Cleanup LPCR defines
 
 Boot code consolidation from Benjamin Herrenschmidt:
  - Move epapr_paravirt_early_init() to early_init_devtree()
  - cell: Don't use flat device-tree after boot
  - ge_imp3a: Don't use the flat device-tree after boot
  - mpc85xx_ds: Don't use the flat device-tree after boot
  - mpc85xx_rdb: Don't use the flat device-tree after boot
  - Don't test for machine type in rtas_initialize()
  - Don't test for machine type in smp_setup_cpu_maps()
  - dt: Add of_device_compatible_match()
  - Factor do_feature_fixup calls
  - Move 64-bit feature fixup earlier
  - Move 64-bit memory reserves to setup_arch()
  - Use a cachable DART
  - Move FW feature probing out of pseries probe()
  - Put exception configuration in a common place
  - Remove early allocation of the SMU command buffer
  - Move MMU backend selection out of platform code
  - pasemi: Remove IOBMAP allocation from platform probe()
  - mm/hash: Don't use machine_is() early during boot
  - Don't test for machine type to detect HEA special case
  - pmac: Remove spurrious machine type test
  - Move hash table ops to a separate structure
  - Ensure that ppc_md is empty before probing for machine type
  - Move 64-bit probe_machine() to later in the boot process
  - Move 32-bit probe() machine to later in the boot process
  - Get rid of ppc_md.init_early()
  - Move the boot time info banner to a separate function
  - Move setting of {i,d}cache_bsize to initialize_cache_info()
  - Move the content of setup_system() to setup_arch()
  - Move cache info inits to a separate function
  - Re-order the call to smp_setup_cpu_maps()
  - Re-order setup_panic()
  - Make a few boot functions __init
  - Merge 32-bit and 64-bit setup_arch()
 
 Other new features:
  - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles from Sam Mendoza-Jonas
  - tty/hvc: Use opal irqchip interface if available from Sam Mendoza-Jonas
  - powerpc: Add module autoloading based on CPU features from Alastair D'Silva
  - crypto: vmx - Convert to CPU feature based module autoloading from Alastair D'Silva
  - Wake up kopald polling thread before waiting for events from Benjamin Herrenschmidt
  - xmon: Dump ISA 2.06 SPRs from Michael Ellerman
  - xmon: Dump ISA 2.07 SPRs from Michael Ellerman
  - Add a parameter to disable 1TB segs from Oliver O'Halloran
  - powerpc/boot: Add OPAL console to epapr wrappers from Oliver O'Halloran
  - Assign fixed PHB number based on device-tree properties from Guilherme G. Piccoli
  - pseries: Add pseries hotplug workqueue from John Allen
  - pseries: Add support for hotplug interrupt source from John Allen
  - pseries: Use kernel hotplug queue for PowerVM hotplug events from John Allen
  - pseries: Move property cloning into its own routine from Nathan Fontenot
  - pseries: Dynamic add entires to associativity lookup array from Nathan Fontenot
  - pseries: Auto-online hotplugged memory from Nathan Fontenot
  - pseries: Remove call to memblock_add() from Nathan Fontenot
 
 cxl:
  - Add set and get private data to context struct from Michael Neuling
  - make base more explicitly non-modular from Paul Gortmaker
  - Use for_each_compatible_node() macro from Wei Yongjun
  - Frederic Barrat
    - Abstract the differences between the PSL and XSL
    - Make vPHB device node match adapter's
  - Philippe Bergheaud
    - Add mechanism for delivering AFU driver specific events
    - Ignore CAPI adapters misplaced in switched slots
    - Refine slice error debug messages
  - Andrew Donnellan
    - static-ify variables to fix sparse warnings
    - PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
    - PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state
    - Add cxl_check_and_switch_mode() API to switch bi-modal cards
    - remove dead Kconfig options
    - fix potential NULL dereference in free_adapter()
  - Ian Munsie
    - Update process element after allocating interrupts
    - Add support for CAPP DMA mode
    - Fix allowing bogus AFU descriptors with 0 maximum processes
    - Fix allocating a minimum of 2 pages for the SPA
    - Fix bug where AFU disable operation had no effect
    - Workaround XSL bug that does not clear the RA bit after a reset
    - Fix NULL pointer dereference on kernel contexts with no AFU interrupts
    - powerpc/powernv: Split cxl code out into a separate file
    - Add cxl_slot_is_supported API
    - Enable bus mastering for devices using CAPP DMA mode
    - Move cxl_afu_get / cxl_afu_put to base
    - Allow a default context to be associated with an external pci_dev
    - Do not create vPHB if there are no AFU configuration records
    - powerpc/powernv: Add support for the cxl kernel api on the real phb
    - Add support for using the kernel API with a real PHB
    - Add kernel APIs to get & set the max irqs per context
    - Add preliminary workaround for CX4 interrupt limitation
    - Add support for interrupts on the Mellanox CX4
    - Workaround PE=0 hardware limitation in Mellanox CX4
    - powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n
 
 selftests:
  - Test unaligned copy and paste from Chris Smart
  - Load Monitor Register Tests from Jack Miller
  - Cyril Bur
    - exec() with suspended transaction
    - Use signed long to read perf_event_paranoid
    - Fix usage message in context_switch
    - Fix generation of vector instructions/types in context_switch
  - Michael Ellerman
    - Use "Delta" rather than "Error" in normal output
    - Import Anton's mmap & futex micro benchmarks
    - Add a test for PROT_SAO
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXnWchAAoJEFHr6jzI4aWAe64P/36Vd9yJLptjkoyZp8/IQtu1
 Cv8buQwGdKuSMzdkcUAOXcC3fe2u70ZWXMKKLfY3koIV1IAiqdWk5/XWRKMP2XmE
 dG0LhSf0uu7uh+mE0WvQnRu46ImeKtQ+mPp4Hbs/s9SxMSeYjruv3vdWWmgUq0cl
 Gac2qJSRtAMmgLuHWMjf7N5mxOTOnKejU4o2i9cJ+YHmWKOdCigv2Ge1UadOQFlC
 E7tRPiUR3asfDfj+e+LVTTdToH6p8pk+mOUzIoZ8jIkQ+IXzi62UDl5+Rw9mqiuX
 1CtqEMUXxo2qwX+d4TcV/QUOp0YKPuIcUZ9NMMS+S3lOyJ4NFt+j2Izk7QJp5kNP
 gKVqB68TjDQsBuDr3P9ynlHbduxTIhZAqopbTrLe0FIg48nUe4n1yHJBVzqaVajX
 rFBJSsSUffBLAARNPSXJJhIgc2C1/qOC8dgMeDMcR2kPirDHaQZ/lY1yEpq1yiqR
 q6e3v5hvIAm4IjbYk0mF7TUxBrPGVE/ExyBINyASRoYxAJ1PyeD/iljZ9vI3asRA
 s+hhxT8H3f7lnqTrmJqMjHgAdGkmag07EdmvFNX4xK4aADSy7Y6g4dw25ffRopo9
 p9Jf9HX+dZv65Y3UjbV/6HuXcaSEBJJLSVWvii65PebqSN0LuHEFvNeIJ6Iblx0B
 AWh/hd0Iin2gdkcG39Mr
 =Z5kM
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights:
   - PowerNV PCI hotplug support.
   - Lots more Power9 support.
   - eBPF JIT support on ppc64le.
   - Lots of cxl updates.
   - Boot code consolidation.

  Bug fixes:
   - Fix spin_unlock_wait() from Boqun Feng
   - Fix stack pointer corruption in __tm_recheckpoint() from Michael
     Neuling
   - Fix multiple bugs in memory_hotplug_max() from Bharata B Rao
   - mm: Ensure "special" zones are empty from Oliver O'Halloran
   - ftrace: Separate the heuristics for checking call sites from
     Michael Ellerman
   - modules: Never restore r2 for a mprofile-kernel style mcount() call
     from Michael Ellerman
   - Fix endianness when reading TCEs from Alexey Kardashevskiy
   - start rtasd before PCI probing from Greg Kurz
   - PCI: rpaphp: Fix slot registration for multiple slots under a PHB
     from Tyrel Datwyler
   - powerpc/mm: Add memory barrier in __hugepte_alloc() from Sukadev
     Bhattiprolu

  Cleanups & fixes:
   - Drop support for MPIC in pseries from Rashmica Gupta
   - Define and use PPC64_ELF_ABI_v2/v1 from Michael Ellerman
   - Remove unused symbols in asm-offsets.c from Rashmica Gupta
   - Fix SRIOV not building without EEH enabled from Russell Currey
   - Remove kretprobe_trampoline_holder from Thiago Jung Bauermann
   - Reduce log level of PCI I/O space warning from Benjamin
     Herrenschmidt
   - Add array bounds checking to crash_shutdown_handlers from Suraj
     Jitindar Singh
   - Avoid -maltivec when using clang integrated assembler from Anton
     Blanchard
   - Fix array overrun in ppc_rtas() syscall from Andrew Donnellan
   - Fix error return value in cmm_mem_going_offline() from Rasmus
     Villemoes
   - export cpu_to_core_id() from Mauricio Faria de Oliveira
   - Remove old symbols from defconfigs from Andrew Donnellan
   - Update obsolete comments in setup_32.c about entry conditions from
     Benjamin Herrenschmidt
   - Add comment explaining the purpose of setup_kdump_trampoline() from
     Benjamin Herrenschmidt
   - Merge the RELOCATABLE config entries for ppc32 and ppc64 from Kevin
     Hao
   - Remove RELOCATABLE_PPC32 from Kevin Hao
   - Fix .long's in tlb-radix.c to more meaningful from Balbir Singh

  Minor cleanups & fixes:
   - Andrew Donnellan, Anna-Maria Gleixner, Anton Blanchard, Benjamin
     Herrenschmidt, Bharata B Rao, Christophe Leroy, Colin Ian King,
     Geliang Tang, Greg Kurz, Madhavan Srinivasan, Michael Ellerman,
     Michael Ellerman, Stephen Rothwell, Stewart Smith.

  Freescale updates from Scott:
   - "Highlights include more 8xx optimizations, device tree updates,
     and MVME7100 support."

  PowerNV PCI hotplug from Gavin Shan:
   - PCI: Add pcibios_setup_bridge()
   - Override pcibios_setup_bridge()
   - Remove PCI_RESET_DELAY_US
   - Move pnv_pci_ioda_setup_opal_tce_kill() around
   - Increase PE# capacity
   - Allocate PE# in reverse order
   - Create PEs in pcibios_setup_bridge()
   - Setup PE for root bus
   - Extend PCI bridge resources
   - Make pnv_ioda_deconfigure_pe() visible
   - Dynamically release PE
   - Update bridge windows on PCI plug
   - Delay populating pdn
   - Support PCI slot ID
   - Use PCI slot reset infrastructure
   - Introduce pnv_pci_get_slot_id()
   - Functions to get/set PCI slot state
   - PCI/hotplug: PowerPC PowerNV PCI hotplug driver
   - Print correct PHB type names

  Power9 idle support from Shreyas B. Prabhu:
   - set power_save func after the idle states are initialized
   - Use PNV_THREAD_WINKLE macro while requesting for winkle
   - make hypervisor state restore a function
   - Rename idle_power7.S to idle_book3s.S
   - Rename reusable idle functions to hardware agnostic names
   - Make pnv_powersave_common more generic
   - abstraction for saving SPRs before entering deep idle states
   - Add platform support for stop instruction
   - cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES
   - cpuidle/powernv: cleanup cpuidle-powernv.c
   - cpuidle/powernv: Add support for POWER ISA v3 idle states
   - Use deepest stop state when cpu is offlined

  Power9 PMU from Madhavan Srinivasan:
   - factor out power8 pmu macros and defines
   - factor out power8 pmu functions
   - factor out power8 __init_pmu code
   - Add power9 event list macros for generic and cache events
   - Power9 PMU support
   - Export Power9 generic and cache events to sysfs

  Power9 preliminary interrupt & PCI support from Benjamin Herrenschmidt:
   - Add XICS emulation APIs
   - Move a few exception common handlers to make room
   - Add support for HV virtualization interrupts
   - Add mechanism to force a replay of interrupts
   - Add ICP OPAL backend
   - Discover IODA3 PHBs
   - pci: Remove obsolete SW invalidate
   - opal: Add real mode call wrappers
   - Rename TCE invalidation calls
   - Remove SWINV constants and obsolete TCE code
   - Rework accessing the TCE invalidate register
   - Fallback to OPAL for TCE invalidations
   - Use the device-tree to get available range of M64's
   - Check status of a PHB before using it
   - pci: Don't try to allocate resources that will be reassigned

  Other Power9:
   - Send SIGBUS on unaligned copy and paste from Chris Smart
   - Large Decrementer support from Oliver O'Halloran
   - Load Monitor Register Support from Jack Miller

  Performance improvements from Anton Blanchard:
   - Avoid load hit store in __giveup_fpu() and __giveup_altivec()
   - Avoid load hit store in setup_sigcontext()
   - Remove assembly versions of strcpy, strcat, strlen and strcmp
   - Align hot loops of some string functions

  eBPF JIT from Naveen N. Rao:
   - Fix/enhance 32-bit Load Immediate implementation
   - Optimize 64-bit Immediate loads
   - Introduce rotate immediate instructions
   - A few cleanups
   - Isolate classic BPF JIT specifics into a separate header
   - Implement JIT compiler for extended BPF

  Operator Panel driver from Suraj Jitindar Singh:
   - devicetree/bindings: Add binding for operator panel on FSP machines
   - Add inline function to get rc from an ASYNC_COMP opal_msg
   - Add driver for operator panel on FSP machines

  Sparse fixes from Daniel Axtens:
   - make some things static
   - Introduce asm-prototypes.h
   - Include headers containing prototypes
   - Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE
   - kvm: Clarify __user annotations
   - Pass endianness to sparse
   - Make ppc_md.{halt, restart} __noreturn

  MM fixes & cleanups from Aneesh Kumar K.V:
   - radix: Update LPCR HR bit as per ISA
   - use _raw variant of page table accessors
   - Compile out radix related functions if RADIX_MMU is disabled
   - Clear top 16 bits of va only on older cpus
   - Print formation regarding the the MMU mode
   - hash: Update SDR1 size encoding as documented in ISA 3.0
   - radix: Update PID switch sequence
   - radix: Update machine call back to support new HCALL.
   - radix: Add LPID based tlb flush helpers
   - radix: Add a kernel command line to disable radix
   - Cleanup LPCR defines

  Boot code consolidation from Benjamin Herrenschmidt:
   - Move epapr_paravirt_early_init() to early_init_devtree()
   - cell: Don't use flat device-tree after boot
   - ge_imp3a: Don't use the flat device-tree after boot
   - mpc85xx_ds: Don't use the flat device-tree after boot
   - mpc85xx_rdb: Don't use the flat device-tree after boot
   - Don't test for machine type in rtas_initialize()
   - Don't test for machine type in smp_setup_cpu_maps()
   - dt: Add of_device_compatible_match()
   - Factor do_feature_fixup calls
   - Move 64-bit feature fixup earlier
   - Move 64-bit memory reserves to setup_arch()
   - Use a cachable DART
   - Move FW feature probing out of pseries probe()
   - Put exception configuration in a common place
   - Remove early allocation of the SMU command buffer
   - Move MMU backend selection out of platform code
   - pasemi: Remove IOBMAP allocation from platform probe()
   - mm/hash: Don't use machine_is() early during boot
   - Don't test for machine type to detect HEA special case
   - pmac: Remove spurrious machine type test
   - Move hash table ops to a separate structure
   - Ensure that ppc_md is empty before probing for machine type
   - Move 64-bit probe_machine() to later in the boot process
   - Move 32-bit probe() machine to later in the boot process
   - Get rid of ppc_md.init_early()
   - Move the boot time info banner to a separate function
   - Move setting of {i,d}cache_bsize to initialize_cache_info()
   - Move the content of setup_system() to setup_arch()
   - Move cache info inits to a separate function
   - Re-order the call to smp_setup_cpu_maps()
   - Re-order setup_panic()
   - Make a few boot functions __init
   - Merge 32-bit and 64-bit setup_arch()

  Other new features:
   - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles from Sam Mendoza-Jonas
   - tty/hvc: Use opal irqchip interface if available from Sam Mendoza-Jonas
   - powerpc: Add module autoloading based on CPU features from Alastair D'Silva
   - crypto: vmx - Convert to CPU feature based module autoloading from Alastair D'Silva
   - Wake up kopald polling thread before waiting for events from Benjamin Herrenschmidt
   - xmon: Dump ISA 2.06 SPRs from Michael Ellerman
   - xmon: Dump ISA 2.07 SPRs from Michael Ellerman
   - Add a parameter to disable 1TB segs from Oliver O'Halloran
   - powerpc/boot: Add OPAL console to epapr wrappers from Oliver O'Halloran
   - Assign fixed PHB number based on device-tree properties from Guilherme G. Piccoli
   - pseries: Add pseries hotplug workqueue from John Allen
   - pseries: Add support for hotplug interrupt source from John Allen
   - pseries: Use kernel hotplug queue for PowerVM hotplug events from John Allen
   - pseries: Move property cloning into its own routine from Nathan Fontenot
   - pseries: Dynamic add entires to associativity lookup array from Nathan Fontenot
   - pseries: Auto-online hotplugged memory from Nathan Fontenot
   - pseries: Remove call to memblock_add() from Nathan Fontenot

  cxl:
   - Add set and get private data to context struct from Michael Neuling
   - make base more explicitly non-modular from Paul Gortmaker
   - Use for_each_compatible_node() macro from Wei Yongjun
   - Frederic Barrat
   - Abstract the differences between the PSL and XSL
   - Make vPHB device node match adapter's
   - Philippe Bergheaud
   - Add mechanism for delivering AFU driver specific events
   - Ignore CAPI adapters misplaced in switched slots
   - Refine slice error debug messages
   - Andrew Donnellan
   - static-ify variables to fix sparse warnings
   - PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl
   - PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state
   - Add cxl_check_and_switch_mode() API to switch bi-modal cards
   - remove dead Kconfig options
   - fix potential NULL dereference in free_adapter()
   - Ian Munsie
   - Update process element after allocating interrupts
   - Add support for CAPP DMA mode
   - Fix allowing bogus AFU descriptors with 0 maximum processes
   - Fix allocating a minimum of 2 pages for the SPA
   - Fix bug where AFU disable operation had no effect
   - Workaround XSL bug that does not clear the RA bit after a reset
   - Fix NULL pointer dereference on kernel contexts with no AFU interrupts
   - powerpc/powernv: Split cxl code out into a separate file
   - Add cxl_slot_is_supported API
   - Enable bus mastering for devices using CAPP DMA mode
   - Move cxl_afu_get / cxl_afu_put to base
   - Allow a default context to be associated with an external pci_dev
   - Do not create vPHB if there are no AFU configuration records
   - powerpc/powernv: Add support for the cxl kernel api on the real phb
   - Add support for using the kernel API with a real PHB
   - Add kernel APIs to get & set the max irqs per context
   - Add preliminary workaround for CX4 interrupt limitation
   - Add support for interrupts on the Mellanox CX4
   - Workaround PE=0 hardware limitation in Mellanox CX4
   - powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n

  selftests:
   - Test unaligned copy and paste from Chris Smart
   - Load Monitor Register Tests from Jack Miller
   - Cyril Bur
   - exec() with suspended transaction
   - Use signed long to read perf_event_paranoid
   - Fix usage message in context_switch
   - Fix generation of vector instructions/types in context_switch
   - Michael Ellerman
   - Use "Delta" rather than "Error" in normal output
   - Import Anton's mmap & futex micro benchmarks
   - Add a test for PROT_SAO"

* tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (263 commits)
  powerpc/mm: Parenthesise IS_ENABLED() in if condition
  tty/hvc: Use opal irqchip interface if available
  tty/hvc: Use IRQF_SHARED for OPAL hvc consoles
  selftests/powerpc: exec() with suspended transaction
  powerpc: Improve comment explaining why we modify VRSAVE
  powerpc/mm: Drop unused externs for hpte_init_beat[_v3]()
  powerpc/mm: Rename hpte_init_lpar() and move the fallback to a header
  powerpc/mm: Fix build break when PPC_NATIVE=n
  crypto: vmx - Convert to CPU feature based module autoloading
  powerpc: Add module autoloading based on CPU features
  powerpc/powernv/ioda: Fix endianness when reading TCEs
  powerpc/mm: Add memory barrier in __hugepte_alloc()
  powerpc/modules: Never restore r2 for a mprofile-kernel style mcount() call
  powerpc/ftrace: Separate the heuristics for checking call sites
  powerpc: Merge 32-bit and 64-bit setup_arch()
  powerpc/64: Make a few boot functions __init
  powerpc: Re-order setup_panic()
  powerpc: Re-order the call to smp_setup_cpu_maps()
  powerpc/32: Move cache info inits to a separate function
  powerpc/64: Move the content of setup_system() to setup_arch()
  ...
2016-07-30 21:01:36 -07:00
Grygorii Strashko
213fa10db2 ARM: OMAP2+: omap_device: fix crash on omap_device removal
Below call chain causes system crash when OMAP device is
removed by calling of_platform_depopulate()/device_del():

device_del()
- blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 			     BUS_NOTIFY_DEL_DEVICE, dev);
  - _omap_device_notifier_call()
    - omap_device_delete()
      - od->pdev->archdata.od = NULL;
	kfree(od->hwmods);
	kfree(od);
  - bus_remove_device()
    - device_release_driver()
      - __device_release_driver()
	- pm_runtime_get_sync()
	   - _od_runtime_resume()
	     - omap_hwmod_enable() <- OOPS od's delted already

Backtrace:
Unable to handle kernel NULL pointer dereference at virtual address 0000000d
pgd = eb100000
[0000000d] *pgd=ad6e1831, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
CPU: 1 PID: 1273 Comm: modprobe Not tainted 4.4.15-rt19-00115-ge4d3cd3-dirty #68
Hardware name: Generic DRA74X (Flattened Device Tree)
task: eb1ee800 ti: ec962000 task.ti: ec962000
PC is at omap_device_enable+0x10/0x90
LR is at _od_runtime_resume+0x10/0x24
[...]
[<c00299dc>] (omap_device_enable) from [<c0029a6c>] (_od_runtime_resume+0x10/0x24)
[<c0029a6c>] (_od_runtime_resume) from [<c04ad404>] (__rpm_callback+0x20/0x34)
[<c04ad404>] (__rpm_callback) from [<c04ad438>] (rpm_callback+0x20/0x80)
[<c04ad438>] (rpm_callback) from [<c04aee28>] (rpm_resume+0x48c/0x964)
[<c04aee28>] (rpm_resume) from [<c04af360>] (__pm_runtime_resume+0x60/0x88)
[<c04af360>] (__pm_runtime_resume) from [<c04a4974>] (__device_release_driver+0x30/0x100)
[<c04a4974>] (__device_release_driver) from [<c04a4a60>] (device_release_driver+0x1c/0x28)
[<c04a4a60>] (device_release_driver) from [<c04a38c0>] (bus_remove_device+0xec/0x144)
[<c04a38c0>] (bus_remove_device) from [<c04a0764>] (device_del+0x10c/0x210)
[<c04a0764>] (device_del) from [<c04a67b0>] (platform_device_del+0x18/0x84)
[<c04a67b0>] (platform_device_del) from [<c04a6828>] (platform_device_unregister+0xc/0x20)
[<c04a6828>] (platform_device_unregister) from [<c05adcfc>] (of_platform_device_destroy+0x8c/0x90)
[<c05adcfc>] (of_platform_device_destroy) from [<c04a02f0>] (device_for_each_child+0x4c/0x78)
[<c04a02f0>] (device_for_each_child) from [<c05adc5c>] (of_platform_depopulate+0x30/0x44)
[<c05adc5c>] (of_platform_depopulate) from [<bf123920>] (cpsw_remove+0x68/0xf4 [ti_cpsw])
[<bf123920>] (cpsw_remove [ti_cpsw]) from [<c04a68d8>] (platform_drv_remove+0x24/0x3c)
[<c04a68d8>] (platform_drv_remove) from [<c04a49c8>] (__device_release_driver+0x84/0x100)
[<c04a49c8>] (__device_release_driver) from [<c04a4b20>] (driver_detach+0xac/0xb0)
[<c04a4b20>] (driver_detach) from [<c04a3be8>] (bus_remove_driver+0x60/0xd4)
[<c04a3be8>] (bus_remove_driver) from [<c00d9870>] (SyS_delete_module+0x184/0x20c)
[<c00d9870>] (SyS_delete_module) from [<c0010540>] (ret_fast_syscall+0x0/0x1c)
Code: e3500000 e92d4070 1590630c 01a06000 (e5d6300d)

Hence, fix it by using BUS_NOTIFY_REMOVED_DEVICE event for OMAP device
deletion which is sent when DD has finished processing of device
deletion.

Cc: Tony Lindgren <tony@atomide.com>
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-30 21:00:34 -07:00
Rich Felker
03767daa13 sh: fix build regression with CONFIG_OF && !CONFIG_OF_FLATTREE
Such a configuration could only be selected by manually selecting
CONFIG_OF; SH_DEVICE_TREE selects both. The affected code is using the
flat DTB at boot time and thus rightfully should depend on
OF_FLATTREE, not just OF.

Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Rich Felker
b46ed37042 sh: allow clocksource drivers to register sched_clock backends
There is no arch-specific sched_clock implementation for sh, resulting
in use of the old default jiffies-based implementation. Instead, use
the modern generic sched_clock framework so that drivers can register
better backends.

Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Paul Gortmaker
e75438e2a2 sh: make heartbeat driver explicitly non-modular
The Kconfig for this driver is currently:

config HEARTBEAT
        bool "Heartbeat LED"

....meaning that it currently is not being built as a module by anyone.
Lets remove the modular code that is essentially orphaned, so that
when reading the driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.

We explicitly disallow a driver unbind, since that doesn't have a
sensible use case anyway, and it allows us to drop the ".remove"
code for non-modular drivers.

We also delete the MODULE_LICENSE tag etc. since all that information
is already contained at the top of the file in the comments.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Paul Gortmaker
f368d47587 sh: make board-secureedge5410 explicitly non-modular
The Kconfig currently controlling compilation of this code is:

config SH_SECUREEDGE5410
        bool "SecureEdge5410"

....meaning that it currently is not being built as a module by anyone.

Lets remove the couple traces of modularity so that when reading the
driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.

We don't replace module.h with init.h since the file already has that.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Paul Gortmaker
f15412aa08 sh: make mm/asids-debugfs explicitly non-modular
The Makefile/Kconfig currently controlling compilation of this code is:

obj-$(CONFIG_DEBUG_FS)          += $(debugfs-y)
debugfs-y                       := asids-debugfs.o

lib/Kconfig.debug:config DEBUG_FS
lib/Kconfig.debug:      bool "Debug Filesystem"

....meaning that it currently is not being built as a module by anyone.

Lets remove the couple traces of modular code, so that when reading the
driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Paul Gortmaker
7a65a34fae sh: make time.c explicitly non-modular
The Makefile currently controlling compilation of this code is:

obj-y   := debugtraps.o dma-nommu.o dumpstack.o                 \
[...]
           syscalls_$(BITS).o time.o topology.o traps.o         \
           traps_$(BITS).o unwinder.o

....meaning that it currently is not being built as a module by anyone.

Lets remove the couple traces of modular code, so that when reading
the driver there is no doubt it is builtin-only.

Since module_init translates to device_initcall in the non-modular
case, the init ordering remains unchanged with this commit.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Rich Felker
72cc564f16 sh: fix futex/robust_list on nommu models
The futex cmpxchg runtime testing in kernel/futex.c depends on
accesses to address 0 producing EFAULT, which obviously does not work
on nommu. Since SH always has cmpxchg, disable the broken runtime
detection.

At some point this should be fixed at the kernel/futex.c level. UP
machines can always provide a working cmpxchg with interrupt masking,
and SMP cannot function without a working cmpxchg anyway.

Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Rich Felker
57155c6523 sh: disable aliased page logic on NOMMU models
SH3/4 (with MMU) have a virtually indexed cache, requiring explicit
work to avoid consistency problems arising from having the same
physical address range cached in multiple cache lines. This is
unneeded for the NOMMU case, and some of the resulting code paths
(kmap_coherent) don't work. SH2 only avoided this problem by having a
4-way associative cache with way size equal to the page size (4k),
yielding no cache index bits outside of the page offset and thus no
aliases.

Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Rich Felker
bbe6c77857 sh: make sigcontext definition consistent across fpu/nofpu models
Up until now, the SH version of the sigcontext structure, and thus
mcontext_t/ucontext_t, varied depending on the cpu model the kernel
was built to run on. SH-4 (including SH-4A) and SH-2A used the form
with space for FPU registers, and everything else used a form that
omitted them.

From a userspace perspective, however, the structure layout must be
fixed for a given ABI. Traditionally glibc and uClibc used the form
with space for FPU registers only when __SH4__ (which implies FPU;
__SH4_NOFPU__ is the predefined macro for SH-4 but with no-FPU ABI)
was defined. As a result:

- SH-4 no-FPU programs never matched kernel sigcontext.

- SH-3 programs did not match kernel sigcontext if run on SH-4,
  despite an apparent intent that they be compatible.

- SH-2 and SH-2A programs (using uClibc) did not match kernel
  sigcontext if run on SH-2A.

The mismatch might seem inconsequential because it occurs at the end
of the sigcontext structure, but sigcontext is embedded as uc_mcontext
in ucontext_t, where it is followed by uc_sigmask, an important member
for signal handlers to have access to. In particular, access to
uc_sigmask is necessary for a correct implementation of thread
cancellation.

It would be possible to retain support for both sigcontext ABIs via a
personality mechanism, but since many configurations were already
broken and nobody noticed, and since there are very few if any users
of legacy no-FPU models anymore, I have opted to just remove the
variation and always include space for the FPU registers in
sigcontext. This was proposed and discussed on a thread "SH sigcontext
ABI is broken" cross-posted to linux-sh, libc-alpha, and musl libc
lists in June 2015, and no objections were raised.

Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Rich Felker
190fe191cf sh: add support for linking a builtin device tree blob in the kernel
Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Pan Xinhui
ff18143cee sh: cmpxchg: fix a bit shift bug in big_endian os
Correct bitoff in big endian OS.
Current code works correctly for 1 byte but not for 2 bytes.

Fixes: 3226aad81a ("sh: support 1 and 2 byte xchg")
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2016-07-31 03:33:32 +00:00
Linus Torvalds
d761f3ed6e Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 microcode updates from Thomas Gleixner:

 - more work to make the microcode loader robust

 - a fix for the micro code load precedence

 - fixes for initrd loading with randomized memory

 - less printk noise on SMP machines

* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm, x86/microcode: Add __PAGE_OFFSET_BASE define on 32-bit
  x86/microcode/intel: Fix initrd loading with CONFIG_RANDOMIZE_MEMORY=y
  x86/microcode: Remove unused symbol exports
  x86/microcode/intel: Do not issue microcode updates messages on each CPU
  Documentation/microcode: Document some aspects for more clarity
  x86/microcode/AMD: Make amd_ucode_patch[] static
  x86/microcode/intel: Unexport save_mc_for_early()
  x86/microcode/intel: Rename load_microcode_early() to find_microcode_patch()
  x86/microcode: Propagate save_microcode_in_initrd() retval
  x86/microcode: Get rid of find_cpio_data()'s dummy offset arg
  lib/cpio: Make find_cpio_data()'s offset arg optional
  x86/microcode: Fix suspend to RAM with builtin microcode
  x86/microcode: Fix loading precedence
2016-07-30 13:18:33 -07:00
Linus Torvalds
b325e04ea2 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpufeature updates from Thomas Gleixner:

 - a workaround for the MONITOR instruction erratum of Goldmont CPUs

 - small fixes and cleanups here and there

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/cpu: Add workaround for MONITOR instruction erratum on Goldmont based CPUs
  x86/cpu: Rename "WESTMERE2" family to "NEHALEM_G"
  x86/amd_nb: Clean up init path
  x86/cpufeature: Add helper macro for mask check macros
  x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated
  x86/cpufeature: Update cpufeaure macros
2016-07-30 12:56:26 -07:00
Linus Torvalds
f64d6e2aaa DeviceTree update for 4.8:
- Removal of most of_platform_populate() calls in arch code. Now the DT
 core code calls it in the default case and platforms only need to call
 it if they have special needs.
 
 - Use pr_fmt on all the DT core print statements.
 
 - CoreSight binding doc improvements to block name descriptions.
 
 - Add dt_to_config script which can parse dts files and list
 corresponding kernel config options.
 
 - Fix memory leak hit with a PowerMac DT.
 
 - Correct a bunch of STMicro compatible strings to use the correct
 vendor prefix.
 
 - Fix DA9052 PMIC binding doc to match what is actually used in dts
 files.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXm9KcAAoJEPr7XbWNvGHDRT4QAIIIOSB4AWHardnMLROgGge9
 aOQKZ/05O9feOcxYKe8FkQbcH+IujJjrUL+yrRD36yGQPAyBP21gtcmmfrkCcwFM
 kH915f/JbGvXpfwEf8dcarHhzYH6FFJiQGduPpWfwSSWynx+xq5EKPwCqYzMg8bN
 SExxt7vUx1MKFOExZ0K8BNCo8VMVLUWQoJ1DNeJDuL25Op4EU3i2l1HQNYV/3XDk
 BSA3x7Lw3GjrWEH20VWYn2Azq1OFLY+E2FC2lnG4nbkk5X8dZbUH9PR1Sk7uTQDj
 uxTjWe59NBpliCxKSAbMbTAU/WwSB1pJ0I+zDJBiQsdFT+nb5F4zOrs3qSKHa/A9
 Rv6AC8k5gdSMrDB1dOspfF2vWvOOInXgNV4/Kza0D92mbCpwyUuF+vhE6rfcMrZU
 OiD7rj2/fvO7Y9fUAhrp6zrfrOfH9B1Z9vS+940AlK96YwPE2+J0SA2vBxR/wg8H
 7fj4Ud5X+SFisXWQhh5Wlv0W9o6e7C7fsi8vpkQ7gufmezLFWVnJKsUfQaxGEwhG
 Hkhm9kuSHHMd+6dEnn2756DnNfJAtQv6rSR0/QR4Lf9y5L4dvR3kAQIci8X/nx4P
 sIk+IJWGZG6wziZq59hh+SO6HEqdSNuvh+5sbR0iUimdE/1HsDBdPiocXf/r8iwK
 NY9nGeZPRrXmFgdpoZfm
 =wLMr
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull DeviceTree updates from Rob Herring:

 - remove most of_platform_populate() calls in arch code.  Now the DT
   core code calls it in the default case and platforms only need to
   call it if they have special needs

 - use pr_fmt on all the DT core print statements

 - CoreSight binding doc improvements to block name descriptions

 - add dt_to_config script which can parse dts files and list
   corresponding kernel config options

 - fix memory leak hit with a PowerMac DT

 - correct a bunch of STMicro compatible strings to use the correct
   vendor prefix

 - fix DA9052 PMIC binding doc to match what is actually used in dts
   files

* tag 'devicetree-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (35 commits)
  documentation: da9052: Update regulator bindings names to match DA9052/53 DTS expectations
  xtensa: Partially Revert "xtensa: Remove unnecessary of_platform_populate with default match table"
  xtensa: Fix build error due to missing include file
  MIPS: ath79: Add missing include file
  Fix spelling errors in Documentation/devicetree
  ARM: dts: fix STMicroelectronics compatible strings
  powerpc/dts: fix STMicroelectronics compatible strings
  Documentation: dt: i2c: use correct STMicroelectronics vendor prefix
  scripts/dtc: dt_to_config - kernel config options for a devicetree
  of: fdt: mark unflattened tree as detached
  of: overlay: add resolver error prints
  coresight: document binding acronyms
  Documentation/devicetree: document cavium-pip rx-delay/tx-delay properties
  of: use pr_fmt prefix for all console printing
  of/irq: Mark initialised interrupt controllers as populated
  of: fix memory leak related to safe_name()
  Revert "of/platform: export of_default_bus_match_table"
  of: unittest: use of_platform_default_populate() to populate default bus
  memory: omap-gpmc: use of_platform_default_populate() to populate default bus
  bus: uniphier-system-bus: use of_platform_default_populate() to populate default bus
  ...
2016-07-30 11:32:01 -07:00
Linus Torvalds
1056c9bd27 The bulk of the changes are updates and fixes to existing clk provider
drivers, along with a pretty standard number of new drivers. The core
 recieved a small number of updates as well.
 
 Core changes of note:
 * Removed CLK_IS_ROOT flag
 
 New clk provider drivers:
 * Renesas r8a7796 Clock Pulse Generator / Module Standby and Software
   Reset
 * Allwinner sun8i H3 Clock Controller Unit
 * AmLogic meson8b clock controller (rewritten)
 * AmLogic gxbb clock controller
 * Support for some new ICs was added by simple changes to static data
   tables for chips sharing the same family
 
 Driver updates of note:
 * the Allwinner sunxi clock driver infrastucture was rewritten to
   comform to the state of the art at drivers/clk/sunxi-ng. The old
   implementation is still supported for backwards compatibility with the
   DT ABI
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJXm9LNAAoJEKI6nJvDJaTUHYcP/1oKTHA4uSPux2+5l9vApEJc
 eSV0oSEP/zDDwYfV4xRt1byAOtzHe3R4p5zDGShnk3+CCwXbQSLGmFJFmZDoSs5E
 ulftXA8uRV4ac0SEh86BOlstdJHGOVo7Q38XtdPvgKRTv58+ZqHuFcvB726bWY/I
 GMzVTkpugbn5U7e3MLM58gkBoqa8BS9uIdsf5q/JIxdpe+VgUtjmioH9Rz6G5U5Y
 T6ObeU6HNusDaz6kUJIiAEkl4UNHk+aebnY7FTbwd9JGGySp5nEknVY7tSMeC6iU
 hG/knzdMiasa5yv0xdW3/OxnKCQtEeuPPHnJDSXWc9yb20vNs2QhRtd8uckMGHOH
 X7fXq9xoLxDH81AggW188g7+xfhx7J+ASpjHmHs74itUIoqhhy61PM3I95yQ8m6s
 QDkInqSszFd2C7OiYsy6m5XS0K8g0jyA2tzWhxlIZNVAl7yP71+HVLJ45NNKCPkb
 oYbMx4ho538Tl6+c6HHJG0vC3XWSdHbvwMOKyE9k/GyrIJ7FxLZS3BLOYPm6u9t4
 9n041MbtBb/UrWVi8kMSdmBjIIs6VSGjq5gwRFAqoPQt3VkPuy0IfnwIpjdE1aFz
 PbDJQF4ze7D3JgH2rWtivXbCTRthTANmoar299xjSwwb99TSlP2IABY3cNnJrhPr
 9/PGShM3hqRpBEUnEmt+
 =1dpY
 -----END PGP SIGNATURE-----

Merge tag 'clk-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk updates from Michael Turquette:
 "The bulk of the changes are updates and fixes to existing clk provider
  drivers, along with a pretty standard number of new drivers.  The core
  recieved a small number of updates as well.

  Core changes of note:
   - removed CLK_IS_ROOT flag

  New clk provider drivers:
   - Renesas r8a7796 clock pulse generator / module standby and
     software reset
   - Allwinner sun8i H3 clock controller unit
   - AmLogic meson8b clock controller (rewritten)
   - AmLogic gxbb clock controller
   - support for some new ICs was added by simple changes to static
     data tables for chips sharing the same family

  Driver updates of note:
   - the Allwinner sunxi clock driver infrastucture was rewritten to
     comform to the state of the art at drivers/clk/sunxi-ng.  The old
     implementation is still supported for backwards compatibility with
     the DT ABI"

* tag 'clk-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (162 commits)
  clk: Makefile: re-sort and clean up
  Revert "clk: gxbb: expose CLKID_MMC_PCLK"
  clk: samsung: Allow modular build of the Audio Subsystem CLKCON driver
  clk: samsung: make clk-s5pv210-audss explicitly non-modular
  clk: exynos5433: remove CLK_IGNORE_UNUSED flag from SPI clocks
  clk: oxnas: Add hardware dependencies
  clk: imx7d: do not set parent of ethernet time/ref clocks
  ARM: dt: sun8i: switch the H3 to the new CCU driver
  clk: sunxi-ng: h3: Fix Kconfig symbol typo
  clk: sunxi-ng: h3: Fix audio clock divider offset
  clk: sunxi-ng: Add H3 clocks
  clk: sunxi-ng: Add N-K-M-P factor clock
  clk: sunxi-ng: Add N-K-M Factor clock
  clk: sunxi-ng: Add N-M-factor clock support
  clk: sunxi-ng: Add N-K-factor clock support
  clk: sunxi-ng: Add M-P factor clock support
  clk: sunxi-ng: Add divider
  clk: sunxi-ng: Add phase clock support
  clk: sunxi-ng: Add mux clock support
  clk: sunxi-ng: Add gate clock support
  ...
2016-07-30 11:20:02 -07:00
Michael Ellerman
719dbb2df7 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include more 8xx optimizations, device tree updates,
and MVME7100 support."
2016-07-30 13:43:19 +10:00
Linus Torvalds
797cee982e Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
 "Six audit patches for 4.8.

  There are a couple of style and minor whitespace tweaks for the logs,
  as well as a minor fixup to catch errors on user filter rules, however
  the major improvements are a fix to the s390 syscall argument masking
  code (reviewed by the nice s390 folks), some consolidation around the
  exclude filtering (less code, always a win), and a double-fetch fix
  for recording the execve arguments"

* 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
  audit: fix a double fetch in audit_log_single_execve_arg()
  audit: fix whitespace in CWD record
  audit: add fields to exclude filter by reusing user filter
  s390: ensure that syscall arguments are properly masked on s390
  audit: fix some horrible switch statement style crimes
  audit: fixup: log on errors from filter user rules
2016-07-29 17:54:17 -07:00
Linus Torvalds
7a1e8b80fb Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "Highlights:

   - TPM core and driver updates/fixes
   - IPv6 security labeling (CALIPSO)
   - Lots of Apparmor fixes
   - Seccomp: remove 2-phase API, close hole where ptrace can change
     syscall #"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits)
  apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling
  tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family)
  tpm: Factor out common startup code
  tpm: use devm_add_action_or_reset
  tpm2_i2c_nuvoton: add irq validity check
  tpm: read burstcount from TPM_STS in one 32-bit transaction
  tpm: fix byte-order for the value read by tpm2_get_tpm_pt
  tpm_tis_core: convert max timeouts from msec to jiffies
  apparmor: fix arg_size computation for when setprocattr is null terminated
  apparmor: fix oops, validate buffer size in apparmor_setprocattr()
  apparmor: do not expose kernel stack
  apparmor: fix module parameters can be changed after policy is locked
  apparmor: fix oops in profile_unpack() when policy_db is not present
  apparmor: don't check for vmalloc_addr if kvzalloc() failed
  apparmor: add missing id bounds check on dfa verification
  apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task
  apparmor: use list_next_entry instead of list_entry_next
  apparmor: fix refcount race when finding a child profile
  apparmor: fix ref count leak when profile sha1 hash is read
  apparmor: check that xindex is in trans_table bounds
  ...
2016-07-29 17:38:46 -07:00
Linus Torvalds
601f887d61 Power management fix for v4.8-rc1
Fix a nasty (and really hard to debug) memory corruption during
 resume from hibernation on x86-64 (that leads to a kernel panic
 most of the time) due to the use of a stale stack pointer value
 in FRAME_BEGIN (Josh Poimboeuf).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXm76YAAoJEILEb/54YlRxBu0QAIwOnS0YJO/AuXNNEUxzK3xu
 /+nRda38MJVUeoU/Z8wBkf5l8XEzNgYZPabj87r4Dkyg32dWKXHwW8R9ApyO1RfA
 7CReTgO2rKfOrDWCq2uYdea0hKM36vavjUHGJ+fSScaqHWaGSqvqeHusroUDL+3y
 grRI8GDNyfG1eV9IxrJwcE0Oegp0QKX5saSKXXVoeaM63YnNUwv9usdEOpdE1lmk
 r6WwpYNZSh/vjaNlLEcrmEAbU3Nv/wBNqGBkoZoATgTT0YsONfqUJcU3fhVXeNMR
 u3Tnk2HJZUmJ3HQgFBcpGU5r1/c/qX8PzPK7e3D60QDgztkADCroW9WOYfwICs+A
 qmsCUs0bVyqxUOBWLIbyJ/n9/VzkeE/cD3lOjkGh/ChRZWpUeDobJWeoseMxDzG5
 21kOwEQvel9Itk8C57zoWCvuSNddg9M2f/GL7yoYtyqKrMriRwwqftIOhJocySP0
 2eONtBn2be9IUX/cdSuVVnqvWP9pGCXqr4IJZoormX3lf1Zi+/G9buPo0At35k5F
 kHxRhnAGsnYxolAhVboxQdkGItrP0LlDvXLnLzPhWjz8qOLOkZpnIfpjsj+bwF7K
 2S+Yr3oBldxIYu1A+jne/fOiTUiQ5iAicWnLwNQpmETWu/+Rz72bweWRUYiE71mW
 H0M0HaD9UzhockpqpeEM
 =O25M
 -----END PGP SIGNATURE-----

Merge tag 'pm-urgent-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fix from Rafael Wysocki:
 "Fix a nasty (and really hard to debug) memory corruption during resume
  from hibernation on x86-64 (that leads to a kernel panic most of the
  time) due to the use of a stale stack pointer value in FRAME_BEGIN
  (Josh Poimboeuf)"

* tag 'pm-urgent-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  x86/power/64: Fix hibernation return address corruption
2016-07-29 14:34:55 -07:00
Linus Torvalds
a6408f6cb6 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull smp hotplug updates from Thomas Gleixner:
 "This is the next part of the hotplug rework.

   - Convert all notifiers with a priority assigned

   - Convert all CPU_STARTING/DYING notifiers

     The final removal of the STARTING/DYING infrastructure will happen
     when the merge window closes.

  Another 700 hundred line of unpenetrable maze gone :)"

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
  timers/core: Correct callback order during CPU hot plug
  leds/trigger/cpu: Move from CPU_STARTING to ONLINE level
  powerpc/numa: Convert to hotplug state machine
  arm/perf: Fix hotplug state machine conversion
  irqchip/armada: Avoid unused function warnings
  ARC/time: Convert to hotplug state machine
  clocksource/atlas7: Convert to hotplug state machine
  clocksource/armada-370-xp: Convert to hotplug state machine
  clocksource/exynos_mct: Convert to hotplug state machine
  clocksource/arm_global_timer: Convert to hotplug state machine
  rcu: Convert rcutree to hotplug state machine
  KVM/arm/arm64/vgic-new: Convert to hotplug state machine
  smp/cfd: Convert core to hotplug state machine
  x86/x2apic: Convert to CPU hotplug state machine
  profile: Convert to hotplug state machine
  timers/core: Convert to hotplug state machine
  hrtimer: Convert to hotplug state machine
  x86/tboot: Convert to hotplug state machine
  arm64/armv8 deprecated: Convert to hotplug state machine
  hwtracing/coresight-etm4x: Convert to hotplug state machine
  ...
2016-07-29 13:55:30 -07:00
Linus Torvalds
86505fc06b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc updates from David Miller:

 1) Double spin lock bug in sunhv serial driver, from Dan Carpenter.

 2) Use correct RSS estimate when determining whether to grow the huge
    TSB or not, from Mike Kravetz.

 3) Don't use full three level page tables for hugepages, PMD level is
    sufficient.  From Nitin Gupta.

 4) Mask out extraneous bits from TSB_TAG_ACCESS register, we only want
    the address bits.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Trim page tables for 8M hugepages
  sparc64 mm: Fix base TSB sizing when hugetlb pages are used
  sparc: serial: sunhv: fix a double lock bug
  sparc32: off by ones in BUG_ON()
  sparc: Don't leak context bits into thread->fault_address
2016-07-29 13:23:18 -07:00
Linus Torvalds
9d3bc3d4a4 ARC updates for 4.8-rc1
Things have been calm here - nothing much except for a few fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXm7MUAAoJEGnX8d3iisJeTbkP/3z5vrczERECRsaHGU6KpaUP
 F9+SpDG+BmCzcKhz60GzY64p6gl2KWtbE3es66lQf45yHD7s3UhAH1Zxc6pFgZjn
 JTF8z3LFRDlKQ0H4gVxWsDp4+QXn3LOklckUpgqTocLxNpg9qXVXCVZhoUb12A3d
 AxPtOUGsFTtC1e6wnYofGknJjApRls8f11CYmJdQ8aS9lLC/pA+fC0U0fM7LUzx5
 DE+KLB3LithYxQ9TBfVrFSbCTbyxxDmYE59v9DEZQftn9pwVMtZLpGEs4BVO31fw
 bQLvROfx5xn1yElN4yH2XT6q+N47XA6MtJ3qDvjPN59yWYwP9mOCWlfoIJ0q8UY0
 sduU+9K1qZ5WQkXjh66+tPUKpm01YrZy2vghkCJ6YWXnOg9WzbYZQtkwia5TyU8h
 lQ36ri72ncK9gPfOSxQGmxY19o7ujX9our9T+bQ4JyjtMicCVKlxSGJLruyTd2ma
 LqOjqpLuZv2Ryf/2UbpoOmCsynfjqSLLc2CR+jlGJvH1vD9ycvfjMS1dcTcpcJ3Z
 AcsEDBoviMRbM2mWCybtT5gs35vzWWCpgsi+hG4lg0kYtrclGWsc2/uG13BFNK3w
 N9/aKgy6a8hYWWOEpKzqzuonR13oob91pDbkUD/m8uS9PSg/5W4WFlvgAjOIog9G
 pfL3M6oF/gEZ4CByLS7J
 =/rhI
 -----END PGP SIGNATURE-----

Merge tag 'arc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC updates from Vineet Gupta:
 "Things have been calm here - nothing much except for a few fixes"

* tag 'arc-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: mm: don't loose PTE_SPECIAL in pte_modify()
  ARC: dma: fix address translation in arc_dma_free
  ARC: typo fix in mm/ioremap.c
  ARC: fix linux-next build breakage
2016-07-29 13:17:34 -07:00
Rafael J. Wysocki
e148d0f85c Merge branch 'pm-sleep'
* pm-sleep:
  x86/power/64: Fix hibernation return address corruption
2016-07-29 22:12:54 +02:00
Linus Torvalds
befff3bfb3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32
Pull AVR32 updates from Hans-Christian Noren Egtvedt.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32:
  avr32: off by one in at32_init_pio()
  avr32: fixup code style in unistd.h and syscall_table.S
  avr32: wire up preadv2 and pwritev2 syscalls
2016-07-29 13:09:55 -07:00
Linus Torvalds
b5f00d18cc Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
 "Included in this update are:

   - Patches from Gregory Clement to fix the coherent DMA cases in our
     dma-mapping code.

   - A number of CPU errata updates and fixes.

   - ARM cpuidle improvements from Jisheng Zhang.

   - Fix from Kees for the location of _etext.

   - Cleanups from Masahiro Yamada to avoid duplicated messages during
     the kernel build, and remove CONFIG_ARCH_HAS_BARRIERS.

   - Remove a udelay loop limitation, allowing for faster CPUs to
     calibrate the delay correctly.

   - Cleanup some left-overs from the SW PAN implementation.

   - Ensure that a modified address limit is not visible to exception
     handlers"

* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (21 commits)
  ARM: 8586/1: cpuidle: make arm_cpuidle_suspend() a bit more efficient
  ARM: 8585/1: cpuidle: fix !cpuidle_ops[cpu].init case during init
  ARM: 8561/4: dma-mapping: Fix the coherent case when iommu is used
  ARM: 8561/3: dma-mapping: Don't use outer_flush_range when the L2C is coherent
  ARM: 8560/1: errata: Workaround errata A12 825619 / A17 852421
  ARM: 8559/1: errata: Workaround erratum A12 821420
  ARM: 8558/1: errata: Workaround errata A12 818325/852422 A17 852423
  ARM: save and reset the address limit when entering an exception
  ARM: 8577/1: Fix Cortex-A15 798181 errata initialization
  ARM: 8584/1: floppy: avoid gcc-6 warning
  ARM: 8583/1: mm: fix location of _etext
  ARM: 8582/1: remove unused CONFIG_ARCH_HAS_BARRIERS
  ARM: 8306/1: loop_udelay: remove bogomips value limitation
  ARM: 8581/1: add missing <asm/prom.h> to arch/arm/kernel/devtree.c
  ARM: 8576/1: avoid duplicating "Kernel: arch/arm/boot/*Image is ready"
  ARM: 8556/1: on a generic DT system: do not touch l2x0
  ARM: uaccess: remove put_user() code duplication
  ARM: 8580/1: Remove orphaned __addr_ok() definition
  ARM: get rid of horrible *(unsigned int *)(regs + 1)
  ARM: introduce svc_pt_regs structure
  ...
2016-07-29 13:03:49 -07:00
Nitin Gupta
7bc3777ca1 sparc64: Trim page tables for 8M hugepages
For PMD aligned (8M) hugepages, we currently allocate
all four page table levels which is wasteful. We now
allocate till PMD level only which saves memory usage
from page tables.

Also, when freeing page table for 8M hugepage backed region,
make sure we don't try to access non-existent PTE level.

Orabug: 22630259

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-29 10:49:16 -07:00
Josh Poimboeuf
4ce827b4cc x86/power/64: Fix hibernation return address corruption
In kernel bug 150021, a kernel panic was reported when restoring a
hibernate image.  Only a picture of the oops was reported, so I can't
paste the whole thing here.  But here are the most interesting parts:

  kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
  BUG: unable to handle kernel paging request at ffff8804615cfd78
  ...
  RIP: ffff8804615cfd78
  RSP: ffff8804615f0000
  RBP: ffff8804615cfdc0
  ...
  Call Trace:
   do_signal+0x23
   exit_to_usermode_loop+0x64
   ...

The RIP is on the same page as RBP, so it apparently started executing
on the stack.

The bug was bisected to commit ef0f3ed5a4 (x86/asm/power: Create
stack frames in hibernate_asm_64.S), which in retrospect seems quite
dangerous, since that code saves and restores the stack pointer from a
global variable ('saved_context').

There are a lot of moving parts in the hibernate save and restore paths,
so I don't know exactly what caused the panic.  Presumably, a FRAME_END
was executed without the corresponding FRAME_BEGIN, or vice versa.  That
would corrupt the return address on the stack and would be consistent
with the details of the above panic.

[ rjw: One major problem is that by the time the FRAME_BEGIN in
  restore_registers() is executed, the stack pointer value may not
  be valid any more.  Namely, the stack area pointed to by it
  previously may have been overwritten by some image memory contents
  and that page frame may now be used for whatever different purpose
  it had been allocated for before hibernation.  In that case, the
  FRAME_BEGIN will corrupt that memory. ]

Instead of doing the frame pointer save/restore around the bounds of the
affected functions, just do it around the call to swsusp_save().

That has the same effect of ensuring that if swsusp_save() sleeps, the
frame pointers will be correct.  It's also a much more obviously safe
way to do it than the original patch.  And objtool still doesn't report
any warnings.

Fixes: ef0f3ed5a4 (x86/asm/power: Create stack frames in hibernate_asm_64.S)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=150021
Cc: 4.6+ <stable@vger.kernel.org> # 4.6+
Reported-by: Andre Reinke <andre.reinke@mailbox.org>
Tested-by: Andre Reinke <andre.reinke@mailbox.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-07-29 13:38:59 +02:00
Dan Carpenter
55f1cf83d5 avr32: off by one in at32_init_pio()
The pio_dev[] array has MAX_NR_PIO_DEVICES elements so the > should be
>=.

Fixes: 5f97f7f940 ('[PATCH] avr32 architecture')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2016-07-29 11:55:57 +02:00
Hans-Christian Noren Egtvedt
6ad4a21b67 avr32: fixup code style in unistd.h and syscall_table.S
This patch swaps the mix of tabs and space for alignment of comment
after code to use spaces only.

Also document why recvmmsg was defined twice in the syscall_table.S
table, but only once in unistd.h. In short, wired in the table by
generic arch patch, but forgotten in unistd.h (review slip).
2016-07-29 11:55:57 +02:00
Hans-Christian Noren Egtvedt
389ce5a961 avr32: wire up preadv2 and pwritev2 syscalls
This patch wires up the new preadv2 and pwritev2 syscall on AVR32.

On AVR32, all parameters beyond the 5th are passed on the stack. System
calls don't use the stack -- they borrow a callee-saved register
instead. This means that syscalls that take 6 parameters must be called
through a stub that pushes the last parameter on the stack.

Signed-off-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
2016-07-29 11:55:57 +02:00
James Hogan
3146bc64d1 arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
for arm64 at all even though ARCH_DLINFO will contain one NEW_AUX_ENT
for the VDSO address.

This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
AT_BASE_PLATFORM which arm64 doesn't use, but lets define it now and add
the comment above ARCH_DLINFO as found in several other architectures to
remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
date.

Fixes: f668cd1673 ("arm64: ELF definitions")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-07-29 10:45:58 +01:00
Ard Biesheuvel
08cc55b2af arm64: relocatable: suppress R_AARCH64_ABS64 relocations in vmlinux
The linker routines that we rely on to produce a relocatable PIE binary
treat it as a shared ELF object in some ways, i.e., it emits symbol based
R_AARCH64_ABS64 relocations into the final binary since doing so would be
appropriate when linking a shared library that is subject to symbol
preemption. (This means that an executable can override certain symbols
that are exported by a shared library it is linked with, and that the
shared library *must* update all its internal references as well, and point
them to the version provided by the executable.)

Symbol preemption does not occur for OS hosted PIE executables, let alone
for vmlinux, and so we would prefer to get rid of these symbol based
relocations. This would allow us to simplify the relocation routines, and
to strip the .dynsym, .dynstr and .hash sections from the binary. (Note
that these are tiny, and are placed in the .init segment, but they clutter
up the vmlinux binary.)

Note that these R_AARCH64_ABS64 relocations are only emitted for absolute
references to symbols defined in the linker script, all other relocatable
quantities are covered by anonymous R_AARCH64_RELATIVE relocations that
simply list the offsets to all 64-bit values in the binary that need to be
fixed up based on the offset between the link time and run time addresses.

Fortunately, GNU ld has a -Bsymbolic option, which is intended for shared
libraries to allow them to ignore symbol preemption, and unconditionally
bind all internal symbol references to its own definitions. So set it for
our PIE binary as well, and get rid of the asoociated sections and the
relocation code that processes them.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: fixed conflict with __dynsym_offset linker script entry]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-07-29 10:45:01 +01:00
Ard Biesheuvel
d6732fc402 arm64: vmlinux.lds: make __rela_offset and __dynsym_offset ABSOLUTE
Due to the untyped KIMAGE_VADDR constant, the linker may not notice
that the __rela_offset and __dynsym_offset expressions are absolute
values (i.e., are not subject to relocation). This does not matter for
KASLR, but it does confuse kallsyms in relative mode, since it uses
the lowest non-absolute symbol address as the anchor point, and expects
all other symbol addresses to be within 4 GB of it.

Fix this by qualifying these expressions as ABSOLUTE() explicitly.

Fixes: 0cd3defe0a ("arm64: kernel: perform relocation processing from ID map")
Cc: <stable@vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-07-29 10:44:53 +01:00
James Hogan
11f769039e MIPS: c-r4k: Use SMP calls for CM indexed cache ops
The MIPS Coherence Manager (CM) can propagate address-based ("hit")
cache operations to other cores in the coherent system, alleviating
software of the need to use SMP calls, however indexed cache operations
are not propagated by hardware since doing so makes no sense for
separate caches.

Update r4k_op_needs_ipi() to report that only hit cache operations are
globalized by the CM, requiring indexed cache operations to be
globalized by software via an SMP call.

r4k_on_each_cpu() previously had a special case for CONFIG_MIPS_MT_SMP,
intended to avoid the SMP calls when the only other CPUs in the system
were other VPEs in the same core, and hence sharing the same caches.
This was changed by commit cccf34e941 ("MIPS: c-r4k: Fix cache
flushing for MT cores") to apparently handle multi-core multi-VPE
systems, but it focussed mainly on hit cache ops, so the SMP calls were
still disabled entirely for CM systems.

This doesn't normally cause problems, but tests can be written to hit
these corner cases by using multiple threads, or changing task
affinities to force the process to migrate cores. For example the
failure of mprotect RW->RX to globally sync icaches (via
flush_cache_range) can be detected by modifying and mprotecting a code
page on one core, and migrating to a different core to execute from it.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13807/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:30 +02:00
James Hogan
f70ddc07b6 MIPS: c-r4k: Avoid small flush_icache_range SMP calls
Avoid SMP calls for flushing small icache ranges. On non-CM platforms,
and CM platforms too after we make r4k_on_each_cpu() take the cache op
type into account, it will be called on multiple CPUs due to the
possibility that local_r4k_flush_icache_range_ipi() could do
non-globalized indexed cache ops. This rougly copies the range size
check out into r4k_flush_icache_range(), which can disallow indexed
cache ops and allow r4k_on_each_cpu() to skip the SMP call.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13805/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:30 +02:00
James Hogan
27b93d9c1d MIPS: c-r4k: Local flush_icache_range cache op override
Allow the permitted cache op types used by
local_r4k_flush_icache_range_ipi() to be overridden by the SMP caller.
This will allow SMP calls to be avoided under certain circumstances,
falling back to a single CPU performing globalized hit cache ops only.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13803/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:30 +02:00
James Hogan
a9341ae241 MIPS: c-r4k: Split r4k_flush_kernel_vmap_range()
Split the operation of r4k_flush_kernel_vmap_range() into separate
SMP callbacks for the indexed cache flush and hit cache flush cases,
since the logic to determine which to use can be determined by the
initiating CPU prior to doing any SMP calls.

This will help when we change r4k_on_each_cpu() to distinguish indexed
and hit cache ops in a later patch, preventing globalized hit cache ops
being performed redundantly on multiple CPUs.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13806/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:29 +02:00
James Hogan
640511ae92 MIPS: c-r4k: Exclude sibling CPUs in SMP calls
When performing SMP calls to foreign cores, exclude sibling CPUs from
the provided map, as we already handle the local core on the current
CPU. This prevents an SMP call from for example core 0, VPE 1 to VPE 0
on the same core.

In the process the cpu_foreign_map cpumask is turned into an array of
cpumasks, so that each CPU has its own version of it which excludes
sibling CPUs. r4k_op_needs_ipi() is also updated to reflect that cache
management SMP calls are not needed when all CPUs are siblings (i.e.
there are no foreign CPUs according to the new cpu_foreign_map[]
semantics which exclude siblings).

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Jayachandran C. <jchandra@broadcom.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13801/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:29 +02:00
James Hogan
6d758bfc7b MIPS: c-r4k: Fix valid ASID optimisation
Several cache operations are optimised to return early from the SMP call
handler if the memory map in question has no valid ASID on the current
CPU, or any online CPU in the case of MIPS_MT_SMP. The idea is that if a
memory map has never been used on a CPU it shouldn't have cache lines in
need of flushing.

However this doesn't cover all cases when ASIDs for other CPUs need to
be checked:
- Offline VPEs may have recently been online and brought lines into the
  (shared) cache, so they should also be checked, rather than only
  online CPUs.
- SMP systems with a Coherence Manager (CM), but with MT disabled still
  have globalized hit cache ops, but don't use SMP calls, so all present
  CPUs should be taken into account.
- R6 systems have a different multithreading implementation, so
  MIPS_MT_SMP won't be set, but as above may still have a CM which
  globalizes hit cache ops.

Additionally for non-globalized cache operations where an SMP call to a
single VPE in each foreign core is used, it is not necessary to check
every CPU in the system, only sibling CPUs sharing the same first level
cache.

Fix this by making has_valid_asid() take a cache op type argument like
r4k_on_each_cpu(), so it can determine whether r4k_on_each_cpu() will
have done SMP calls to other cores. It can then determine which set of
CPUs to check the ASIDs of based on that, excluding foreign CPUs if an
SMP call will have been performed.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13804/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:29 +02:00
James Hogan
d374d93742 MIPS: c-r4k: Add r4k_on_each_cpu cache op type arg
The r4k_on_each_cpu() function calls the specified cache flush helper on
other CPUs if deemed necessary due to the cache ops not being
globalized by hardware. However this really depends on the cache op
addressing type, as the MIPS Coherence Manager (CM) if present will
globalize "hit" cache ops (addressed by virtual address), but not
"index" cache ops (addressed by cache index). This results in index
cache ops only being performed on a single CPU when CM is present.

Most (but not all) of the functions called by r4k_on_each_cpu() perform
cache operations exclusively with a single cache op type, so add a type
argument and modify the callers to pass in some combination of R4K_HIT
(global kernel virtual addressing or user virtual addressing
conditional upon matching active_mm) and R4K_INDEX (index into cache).

This will allow r4k_on_each_cpu() to later distinguish these cases and
decide whether to perform an SMP call based on it.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13798/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:29 +02:00
James Hogan
8bd646e92b MIPS: c-r4k: Avoid dcache flush for sigtramps
Avoid the dcache and scache flush in local_r4k_flush_cache_sigtramp() if
the icache fills straight from the dcache.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13802/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:29 +02:00
James Hogan
e523f289fe MIPS: c-r4k: Fix sigtramp SMP call to use kmap
Fix r4k_flush_cache_sigtramp() and local_r4k_flush_cache_sigtramp() to
flush the delay slot emulation trampoline cacheline through a kmap
rather than directly when the active_mm doesn't match that of the task
initiating the flush, a bit like local_r4k_flush_cache_page() does.

This would fix a corner case on SMP systems without hardware globalized
hit cache ops, where a migration to another CPU after the flush, where
that CPU did not have the same mm active at the time of the flush, could
result in stale icache content being executed instead of the trampoline,
e.g. from a previous delay slot emulation with a similar stack pointer.

This case was artificially triggered by replacing the icache flush with
a full indexed flush (not globalized on CM systems) and forcing the SMP
call to take place, with a test program that alternated two FPU delay
slots with a parent process repeatedly changing scheduler affinity.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13797/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:29 +02:00
James Hogan
0758b116b4 MIPS: c-r4k: Fix protected_writeback_scache_line for EVA
The protected_writeback_scache_line() function is used by
local_r4k_flush_cache_sigtramp() to flush an FPU delay slot emulation
trampoline on the userland stack from the caches so it is visible to
subsequent instruction fetches.

Commit de8974e3f7 ("MIPS: asm: r4kcache: Add EVA cache flushing
functions") updated some protected_ cache flush functions to use EVA
CACHEE instructions via protected_cachee_op(), and commit 83fd43449b
("MIPS: r4kcache: Add EVA case for protected_writeback_dcache_line") did
the same thing for protected_writeback_dcache_line(), but
protected_writeback_scache_line() never got updated. Lets fix that now
to flush the right user address from the secondary cache rather than
some arbitrary kernel unmapped address.

This issue was spotted through code inspection, and it seems unlikely to
be possible to hit this in practice. It theoretically affect EVA kernels
on EVA capable cores with an L2 cache, where the icache fetches straight
from RAM (cpu_icache_snoops_remote_store == 0), running a hard float
userland with FPU disabled (nofpu). That both Malta and Boston platforms
override cpu_icache_snoops_remote_store to 1 suggests that all MIPS
cores fetch instructions into icache straight from L2 rather than RAM.

Fixes: de8974e3f7 ("MIPS: asm: r4kcache: Add EVA cache flushing functions")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13800/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:29 +02:00
James Hogan
926963160c MIPS: SMP: Drop stop_this_cpu() cpu_foreign_map hack
Commit cccf34e941 ("MIPS: c-r4k: Fix cache flushing for MT cores")
added the cpu_foreign_map cpumask containing a single VPE from each
online core, and recalculated it when secondary CPUs are brought up.

stop_this_cpu() was also updated to recalculate cpu_foreign_map, but
with an additional hack before marking the CPU as offline to copy
cpu_online_mask into cpu_foreign_map and perform an SMP memory barrier.

This appears to have been intended to prevent cache management IPIs
being missed when the VPE representing the core in cpu_foreign_map is
taken offline while other VPEs remain online. Unfortunately there is
nothing in this hack to prevent r4k_on_each_cpu() from reading the old
cpu_foreign_map, and smp_call_function_many() from reading that new
cpu_online_mask with the core's representative VPE marked offline. It
then wouldn't send an IPI to any online VPEs of that core.

stop_this_cpu() is only actually called in panic and system shutdown /
halt / reboot situations, in which case all CPUs are going down and we
don't really need to care about cache management, so drop this hack.

Note that the __cpu_disable() case for CPU hotplug is handled in the
previous commit, and no synchronisation is needed there due to the use
of stop_machine() which prevents hotplug from taking place while any CPU
has disabled preemption (as r4k_on_each_cpu() does).

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13796/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:28 +02:00
James Hogan
826e99be6a MIPS: SMP: Update cpu_foreign_map on CPU disable
When a CPU is disabled via CPU hotplug, cpu_foreign_map is not updated.
This could result in cache management SMP calls being sent to offline
CPUs instead of online siblings in the same core.

Add a call to calculate_cpu_foreign_map() in the various MIPS cpu
disable callbacks after set_cpu_online(). All cases are updated for
consistency and to keep cpu_foreign_map strictly up to date, not just
those which may support hardware multithreading.

Fixes: cccf34e941 ("MIPS: c-r4k: Fix cache flushing for MT cores")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: Hongliang Tao <taohl@lemote.com>
Cc: Hua Yan <yanh@lemote.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13799/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:28 +02:00
James Hogan
a05c392032 MIPS: SMP: Clear ASID without confusing has_valid_asid()
The SMP flush_tlb_*() functions may clear the memory map's ASIDs for
other CPUs if the mm has only a single user (the current CPU) in order
to avoid SMP calls. However this makes it appear to has_valid_asid(),
which is used by various cache flush functions, as if the CPUs have
never run in the mm, and therefore can't have cached any of its memory.

For flush_tlb_mm() this doesn't sound unreasonable.

flush_tlb_range() corresponds to flush_cache_range() which does do full
indexed cache flushes, but only on the icache if the specified mapping
is executable, otherwise it doesn't guarantee that there are no cache
contents left for the mm.

flush_tlb_page() corresponds to flush_cache_page(), which will perform
address based cache ops on the specified page only, and also only
touches the icache if the page is executable. It does not guarantee that
there are no cache contents left for the mm.

For example, this affects flush_cache_range() which uses the
has_valid_asid() optimisation. It is required to flush the icache when
mappings are made executable (e.g. using mprotect) so they are
immediately usable. If some code is changed to non executable in order
to be modified then it will not be flushed from the icache during that
time, but the ASID on other CPUs may still be cleared for TLB flushing.
When the code is changed back to executable, flush_cache_range() will
assume the code hasn't run on those other CPUs due to the zero ASID, and
won't invalidate the icache on them.

This is fixed by clearing the other CPUs ASIDs to 1 instead of 0 for the
above two flush_tlb_*() functions when the corresponding cache flushes
are likely to be incomplete (non executable range flush, or any page
flush). This ASID appears valid to has_valid_asid(), but still triggers
ASID regeneration due to the upper ASID version bits being 0, which is
less than the minimum ASID version of 1 and so always treated as stale.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13795/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-29 10:19:28 +02:00
Mike Kravetz
af1b1a9b36 sparc64 mm: Fix base TSB sizing when hugetlb pages are used
do_sparc64_fault() calculates both the base and huge page RSS sizes and
uses this information in calls to tsb_grow().  The calculation for base
page TSB size is not correct if the task uses hugetlb pages.  hugetlb
pages are not accounted for in RSS, therefore the call to get_mm_rss(mm)
does not include hugetlb pages.  However, the number of pages based on
huge_pte_count (which does include hugetlb pages) is subtracted from
this value.  This will result in an artificially small and often negative
RSS calculation.  The base TSB size is then often set to max_tsb_size
as the passed RSS is unsigned, so a negative value looks really big.

THP pages are also accounted for in huge_pte_count, and THP pages are
accounted for in RSS so the calculation in do_sparc64_fault() is correct
if a task only uses THP pages.

A single huge_pte_count is not sufficient for TSB sizing if both hugetlb
and THP pages can be used.  Instead of a single counter, use two:  one
for hugetlb and one for THP.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-28 21:22:12 -07:00
Linus Torvalds
f0c98ebc57 libnvdimm for 4.8
1/ Replace pcommit with ADR / directed-flushing:
    The pcommit instruction, which has not shipped on any product, is
    deprecated. Instead, the requirement is that platforms implement either
    ADR, or provide one or more flush addresses per nvdimm. ADR
    (Asynchronous DRAM Refresh) flushes data in posted write buffers to the
    memory controller on a power-fail event. Flush addresses are defined in
    ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure:
    "Flush Hint Address Structure". A flush hint is an mmio address that
    when written and fenced assures that all previous posted writes
    targeting a given dimm have been flushed to media.
 
 2/ On-demand ARS (address range scrub):
    Linux uses the results of the ACPI ARS commands to track bad blocks
    in pmem devices.  When latent errors are detected we re-scrub the media
    to refresh the bad block list, userspace can also request a re-scrub at
    any time.
 
 3/ Support for the Microsoft DSM (device specific method) command format.
 
 4/ Support for EDK2/OVMF virtual disk device memory ranges.
 
 5/ Various fixes and cleanups across the subsystem.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXmXBsAAoJEB7SkWpmfYgCEwwP/1IOt9ocP+iHLMDH9KE7VaTZ
 NmUDR+Zy6g5cRQM7SgcuU5BXUcx+OsSrSrUTVF1cW994o9Gbz1mFotkv0ZAsPcYY
 ZVRQxo2oqHrssyOcg+PsgKWiXn68rJOCgmpEyzaJywl5qTMst7pzsT1s1f7rSh6h
 trCf4VaJJwxZR8fARGtlHUnnhPe2Orp99EZRKEWprAsIv2kPuWpPHSjRjuEgN1JG
 KW8AYwWqFTtiLRUk86I4KBB0wcDrfctsjgN9Ogd6+aHyQBRnVSr2U+vDCFkC8KLu
 qiDCpYp+yyxBjclnljz7tRRT3GtzfCUWd4v2KVWqgg2IaobUc0Lbukp/rmikUXQP
 WLikT2OCQ994eFK5OX3Q3cIU/4j459TQnof8q14yVSpjAKrNUXVSR5puN7Hxa+V7
 41wKrAsnsyY1oq+Yd/rMR8VfH7PHx3bFkrmRCGZCufLX1UQm4aYj+sWagDKiV3yA
 DiudghbOnhfurfGsnXUVw7y7GKs+gNWNBmB6ndAD6ZEHmKoGUhAEbJDLCc3DnANl
 b/2mv1MIdIcC1DlCmnbbcn6fv6bICe/r8poK3VrCK3UgOq/EOvKIWl7giP+k1JuC
 6DdVYhlNYIVFXUNSLFAwz8OkLu8byx7WDm36iEqrKHtPw+8qa/2bWVgOU6OBgpjV
 cN3edFVIdxvZeMgM5Ubq
 =xCBG
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:

 - Replace pcommit with ADR / directed-flushing.

   The pcommit instruction, which has not shipped on any product, is
   deprecated.  Instead, the requirement is that platforms implement
   either ADR, or provide one or more flush addresses per nvdimm.

   ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
   to the memory controller on a power-fail event.

   Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
   Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
   A flush hint is an mmio address that when written and fenced assures
   that all previous posted writes targeting a given dimm have been
   flushed to media.

 - On-demand ARS (address range scrub).

   Linux uses the results of the ACPI ARS commands to track bad blocks
   in pmem devices.  When latent errors are detected we re-scrub the
   media to refresh the bad block list, userspace can also request a
   re-scrub at any time.

 - Support for the Microsoft DSM (device specific method) command
   format.

 - Support for EDK2/OVMF virtual disk device memory ranges.

 - Various fixes and cleanups across the subsystem.

* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
  libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
  nfit: do an ARS scrub on hitting a latent media error
  nfit: move to nfit/ sub-directory
  nfit, libnvdimm: allow an ARS scrub to be triggered on demand
  libnvdimm: register nvdimm_bus devices with an nd_bus driver
  pmem: clarify a debug print in pmem_clear_poison
  x86/insn: remove pcommit
  Revert "KVM: x86: add pcommit support"
  nfit, tools/testing/nvdimm/: unify shutdown paths
  libnvdimm: move ->module to struct nvdimm_bus_descriptor
  nfit: cleanup acpi_nfit_init calling convention
  nfit: fix _FIT evaluation memory leak + use after free
  tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
  tools/testing/nvdimm: add virtual ramdisk range
  acpi, nfit: treat virtual ramdisk SPA as pmem region
  pmem: kill __pmem address space
  pmem: kill wmb_pmem()
  libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
  fs/dax: remove wmb_pmem()
  libnvdimm, pmem: flush posted-write queues on shutdown
  ...
2016-07-28 17:38:16 -07:00
Linus Torvalds
1c88e19b0f Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "The rest of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (101 commits)
  mm, compaction: simplify contended compaction handling
  mm, compaction: introduce direct compaction priority
  mm, thp: remove __GFP_NORETRY from khugepaged and madvised allocations
  mm, page_alloc: make THP-specific decisions more generic
  mm, page_alloc: restructure direct compaction handling in slowpath
  mm, page_alloc: don't retry initial attempt in slowpath
  mm, page_alloc: set alloc_flags only once in slowpath
  lib/stackdepot.c: use __GFP_NOWARN for stack allocations
  mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
  mm, kasan: account for object redzone in SLUB's nearest_obj()
  mm: fix use-after-free if memory allocation failed in vma_adjust()
  zsmalloc: Delete an unnecessary check before the function call "iput"
  mm/memblock.c: fix index adjustment error in __next_mem_range_rev()
  mem-hotplug: alloc new page from a nearest neighbor node when mem-offline
  mm: optimize copy_page_to/from_iter_iovec
  mm: add cond_resched() to generic_swapfile_activate()
  Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
  mm, compaction: don't isolate PageWriteback pages in MIGRATE_SYNC_LIGHT mode
  mm: hwpoison: remove incorrect comments
  make __section_nr() more efficient
  ...
2016-07-28 16:36:48 -07:00
Dennis Chen
cb0a650213 arm64:acpi: fix the acpi alignment exception when 'mem=' specified
When booting an ACPI enabled kernel with 'mem=x', there is the
possibility that ACPI data regions from the firmware will lie above the
memory limit.  Ordinarily these will be removed by
memblock_enforce_memory_limit(.).

Unfortunately, this means that these regions will then be mapped by
acpi_os_ioremap(.) as device memory (instead of normal) thus unaligned
accessess will then provoke alignment faults.

In this patch we adopt memblock_mem_limit_remove_map instead, and this
preserves these ACPI data regions (marked NOMAP) thus ensuring that
these regions are not mapped as device memory.

For example, below is an alignment exception observed on ARM platform
when booting the kernel with 'acpi=on mem=8G':

  ...
  Unable to handle kernel paging request at virtual address ffff0000080521e7
  pgd = ffff000008aa0000
  [ffff0000080521e7] *pgd=000000801fffe003, *pud=000000801fffd003, *pmd=000000801fffc003, *pte=00e80083ff1c1707
  Internal error: Oops: 96000021 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc3-next-20160616+ #172
  Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1001A 02/09/2016
  task: ffff800001ef0000 ti: ffff800001ef8000 task.ti: ffff800001ef8000
  PC is at acpi_ns_lookup+0x520/0x734
  LR is at acpi_ns_lookup+0x4a4/0x734
  pc : [<ffff0000083b8b10>] lr : [<ffff0000083b8a94>] pstate: 60000045
  sp : ffff800001efb8b0
  x29: ffff800001efb8c0 x28: 000000000000001b
  x27: 0000000000000001 x26: 0000000000000000
  x25: ffff800001efb9e8 x24: ffff000008a10000
  x23: 0000000000000001 x22: 0000000000000001
  x21: ffff000008724000 x20: 000000000000001b
  x19: ffff0000080521e7 x18: 000000000000000d
  x17: 00000000000038ff x16: 0000000000000002
  x15: 0000000000000007 x14: 0000000000007fff
  x13: ffffff0000000000 x12: 0000000000000018
  x11: 000000001fffd200 x10: 00000000ffffff76
  x9 : 000000000000005f x8 : ffff000008725fa8
  x7 : ffff000008a8df70 x6 : ffff000008a8df70
  x5 : ffff000008a8d000 x4 : 0000000000000010
  x3 : 0000000000000010 x2 : 000000000000000c
  x1 : 0000000000000006 x0 : 0000000000000000
  ...
    acpi_ns_lookup+0x520/0x734
    acpi_ds_load1_begin_op+0x174/0x4fc
    acpi_ps_build_named_op+0xf8/0x220
    acpi_ps_create_op+0x208/0x33c
    acpi_ps_parse_loop+0x204/0x838
    acpi_ps_parse_aml+0x1bc/0x42c
    acpi_ns_one_complete_parse+0x1e8/0x22c
    acpi_ns_parse_table+0x8c/0x128
    acpi_ns_load_table+0xc0/0x1e8
    acpi_tb_load_namespace+0xf8/0x2e8
    acpi_load_tables+0x7c/0x110
    acpi_init+0x90/0x2c0
    do_one_initcall+0x38/0x12c
    kernel_init_freeable+0x148/0x1ec
    kernel_init+0x10/0xec
    ret_from_fork+0x10/0x40
  Code: b9009fbc 2a00037b 36380057 3219037b (b9400260)
  ---[ end trace 03381e5eb0a24de4 ]---
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

With 'efi=debug', we can see those ACPI regions loaded by firmware on
that board as:

  efi:   0x0083ff185000-0x0083ff1b4fff [Reserved           |   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
  efi:   0x0083ff1b5000-0x0083ff1c2fff [ACPI Reclaim Memory|   |  |  |  |  |  |  |   |WB|WT|WC|UC]*
  efi:   0x0083ff223000-0x0083ff224fff [ACPI Memory NVS    |   |  |  |  |  |  |  |   |WB|WT|WC|UC]*

Link: http://lkml.kernel.org/r/1468475036-5852-3-git-send-email-dennis.chen@arm.com
Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Dennis Chen <dennis.chen@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Kaly Xin <kaly.xin@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Mel Gorman
11fb998986 mm: move most file-based accounting to the node
There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone.  This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounted.  Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.

[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
  Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Mel Gorman
50658e2e04 mm: move page mapped accounting to the node
Reclaim makes decisions based on the number of pages that are mapped but
it's mixing node and zone information.  Account NR_FILE_MAPPED and
NR_ANON_PAGES pages on the node.

Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Mel Gorman
599d0c954f mm, vmscan: move LRU lists to node
This moves the LRU lists from the zone to the node and related data such
as counters, tracing, congestion tracking and writeback tracking.

Unfortunately, due to reclaim and compaction retry logic, it is
necessary to account for the number of LRU pages on both zone and node
logic.  Most reclaim logic is based on the node counters but the retry
logic uses the zone counters which do not distinguish inactive and
active sizes.  It would be possible to leave the LRU counters on a
per-zone basis but it's a heavier calculation across multiple cache
lines that is much more frequent than the retry checks.

Other than the LRU counters, this is mostly a mechanical patch but note
that it introduces a number of anomalies.  For example, the scans are
per-zone but using per-node counters.  We also mark a node as congested
when a zone is congested.  This causes weird problems that are fixed
later but is easier to review.

In the event that there is excessive overhead on 32-bit systems due to
the nodes being on LRU then there are two potential solutions

1. Long-term isolation of highmem pages when reclaim is lowmem

   When pages are skipped, they are immediately added back onto the LRU
   list. If lowmem reclaim persisted for long periods of time, the same
   highmem pages get continually scanned. The idea would be that lowmem
   keeps those pages on a separate list until a reclaim for highmem pages
   arrives that splices the highmem pages back onto the LRU. It potentially
   could be implemented similar to the UNEVICTABLE list.

   That would reduce the skip rate with the potential corner case is that
   highmem pages have to be scanned and reclaimed to free lowmem slab pages.

2. Linear scan lowmem pages if the initial LRU shrink fails

   This will break LRU ordering but may be preferable and faster during
   memory pressure than skipping LRU pages.

Link: http://lkml.kernel.org/r/1467970510-21195-4-git-send-email-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-28 16:07:41 -07:00
Linus Torvalds
69c4289449 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  fat: fix error message for bogus number of directory entries
  fat: fix typo s/supeblock/superblock/
  ASoC: max9877: Remove unused function declaration
  dw2102: don't output spurious blank lines to the kernel log
  init: fix Kconfig text
  ARM: io: fix comment grammar
  ocfs: fix ocfs2_xattr_user_get() argument name
  scsi/qla2xxx: Remove erroneous unused macro qla82xx_get_temp_val1()
2016-07-28 14:22:25 -07:00
Vineet Gupta
3925a16ae9 ARC: mm: don't loose PTE_SPECIAL in pte_modify()
LTP madvise05 was generating mm splat

| [ARCLinux]# /sd/ltp/testcases/bin/madvise05
| BUG: Bad page map in process madvise05  pte:80e08211 pmd:9f7d4000
| page:9fdcfc90 count:1 mapcount:-1 mapping:  (null) index:0x0 flags: 0x404(referenced|reserved)
| page dumped because: bad pte
| addr:200b8000 vm_flags:00000070 anon_vma:  (null) mapping:  (null) index:1005c
| file:  (null) fault:  (null) mmap:  (null) readpage:  (null)
| CPU: 2 PID: 6707 Comm: madvise05

And for newer kernels, the system was rendered unusable afterwards.

The problem was mprotect->pte_modify() clearing PTE_SPECIAL (which is
set to identify the special zero page wired to the pte).
When pte was finally unmapped, special casing for zero page was not
done, and instead it was treated as a "normal" page, tripping on the
map counts etc.

This fixes ARC STAR 9001053308

Cc: <stable@vger.kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2016-07-28 12:38:17 -07:00
Russell King
5f5a00eaa1 Merge branches 'cpuidle', 'fixes' and 'misc' into for-linus 2016-07-28 15:28:11 +01:00
James Hogan
233b2ca181 MIPS: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
AT_VECTOR_SIZE_ARCH should be defined with the maximum number of
NEW_AUX_ENT entries that ARCH_DLINFO can contain, but it wasn't defined
for MIPS at all even though ARCH_DLINFO will contain one NEW_AUX_ENT for
the VDSO address.

This shouldn't be a problem as AT_VECTOR_SIZE_BASE includes space for
AT_BASE_PLATFORM which MIPS doesn't use, but lets define it now and add
the comment above ARCH_DLINFO as found in several other architectures to
remind future modifiers of ARCH_DLINFO to keep AT_VECTOR_SIZE_ARCH up to
date.

Fixes: ebb5e78cc6 ("MIPS: Initial implementation of a VDSO")
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13823/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28 12:06:16 +02:00
Steven J. Hill
7e78db9975 MIPS: Octeon: Improve USB reset code for OCTEON II.
At boot time, do a better job of resetting the USB host controller
to make the frequency "eye" diagram more compliant with the USB
standard while making the controller more reliable.

Signed-off-by: Steven J. Hill <steven.hill@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13831/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28 12:01:53 +02:00
Steven J. Hill
8552b5b4da MIPS: Octeon: Put restrictions on DMA descriptors.
Set the DMA mask such that all descriptors stay in the
lower 4GB of memory.

Signed-off-by: Steven J. Hill <steven.hill@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13830/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28 12:01:34 +02:00
Steven J. Hill
71471e2866 MIPS: Octeon: Remove forced mappings of USB interrupts.
Get rid of unnecessary forced interrupt mappings for
the USB host controller on OCTEON II.

Signed-off-by: Steven J. Hill <steven.hill@cavium.com>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13824/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28 12:01:06 +02:00
David Howells
20f06ed9f6 KEYS: 64-bit MIPS needs to use compat_sys_keyctl for 32-bit userspace
MIPS64 needs to use compat_sys_keyctl for 32-bit userspace rather than
calling sys_keyctl.  The latter will work in a lot of cases, thereby hiding
the issue.

Reported-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-security-module@vger.kernel.org
Cc: keyrings@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/13832/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28 11:56:48 +02:00
James Hogan
5573f6ad3e MIPS: Print segment physical address when EU=1
Currently the debugfs interface to print the segment configuration
refuses to print the physical address of mapped segments. However if the
EU bit is set these become unmapped at error level (when
CP0_Status.ERL=1), so the physical address is still relevant.

Update the logic to print the physical address of mapped segments when
the EU bit is set, while still hiding the Cache Coherency Attribute
(since EU overrides that to uncached when ERL=1 too).

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/13833/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2016-07-28 11:44:30 +02:00
Paul Mackerras
93d17397e4 KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
It turns out that if the guest does a H_CEDE while the CPU is in
a transactional state, and the H_CEDE does a nap, and the nap
loses the architected state of the CPU (which is is allowed to do),
then we lose the checkpointed state of the virtual CPU.  In addition,
the transactional-memory state recorded in the MSR gets reset back
to non-transactional, and when we try to return to the guest, we take
a TM bad thing type of program interrupt because we are trying to
transition from non-transactional to transactional with a hrfid
instruction, which is not permitted.

The result of the program interrupt occurring at that point is that
the host CPU will hang in an infinite loop with interrupts disabled.
Thus this is a denial of service vulnerability in the host which can
be triggered by any guest (and depending on the guest kernel, it can
potentially triggered by unprivileged userspace in the guest).

This vulnerability has been assigned the ID CVE-2016-5412.

To fix this, we save the TM state before napping and restore it
on exit from the nap, when handling a H_CEDE in real mode.  The
case where H_CEDE exits to host virtual mode is already OK (as are
other hcalls which exit to host virtual mode) because the exit
path saves the TM state.

Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-07-28 16:10:07 +10:00
Paul Mackerras
f024ee0984 KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
This moves the transactional memory state save and restore sequences
out of the guest entry/exit paths into separate procedures.  This is
so that these sequences can be used in going into and out of nap
in a subsequent patch.

The only code changes here are (a) saving and restore LR on the
stack, since these new procedures get called with a bl instruction,
(b) explicitly saving r1 into the PACA instead of assuming that
HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary
and redundant setting of MSR[TM] that should have been removed by
commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory
support", 2013-09-24) but wasn't.

Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2016-07-28 16:09:34 +10:00
Dan Carpenter
fa1608285b sparc32: off by ones in BUG_ON()
Smatch complains that these tests are off by one, which is true but not
life threatening.

	arch/sparc/kernel/irq_32.c:169 irq_link()
	error: buffer overflow 'irq_map' 384 <= 384

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-27 22:53:17 -07:00
Nicolas Pitre
002d2f01f1 m68k: enable binfmt_flat on systems with an MMU
Now that the generic changes are in place, this can be enabled on m68k
with the use of proper user space accessors in the flat_get_addr_from_rp()
and flat_put_addr_at_rp() handlers as rp actually holds a user space
address.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Greg Ungerer <gerg@linux-m68k.org>
2016-07-28 13:29:13 +10:00
Stephen Rothwell
fbef66f0ad powerpc/mm: Parenthesise IS_ENABLED() in if condition
Currently IS_ENABLED() produces an expression surrounded by parentheses,
which allows this code to compile, generating eg:

    else if (1 || 0)
        hpte_init_native();

However a change to the macro in the kbuild tree will break this in
future by removing the parentheses.

Fixes: 7353644fa9 ("powerpc/mm: Fix build break when PPC_NATIVE=n")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-28 12:39:15 +10:00