Commit Graph

4390 Commits

Author SHA1 Message Date
Balbir Singh
d1e1b351f5 powerpc/xmon: Add ISA v3.0 SPRs to SPR dump
Add support for printing the PIDR/TIDR for ISA 300 and PSSCR and PTCR
in ISA 3.0 hypervisor mode.

SPRN_PSSCR_PR is the privileged mode access and is used when we are
not in hypervisor mode.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:45 +10:00
Sukadev Bhattiprolu
2392c8c8c0 powerpc/powernv/vas: Define copy/paste interfaces
Define interfaces (wrappers) to the 'copy' and 'paste'
instructions (which are new in PowerISA 3.0). These are intended to be
used to by NX driver(s) to submit Coprocessor Request Blocks (CRBs) to
the NX hardware engines.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:38 +10:00
Sukadev Bhattiprolu
5239af679a powerpc/powernv/vas: Define vas_tx_win_open()
Define an interface to open a VAS send window. This interface is
intended to be used the Nest Accelerator (NX) driver(s) to open
a send window and use it to submit compression/encryption requests
to a VAS receive window.

The receive window, identified by the [vasid, cop] parameters, must
already be open in VAS (i.e connected to an NX engine).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:37 +10:00
Sukadev Bhattiprolu
98271d4198 powerpc/powernv/vas: Define vas_win_close() interface
Define the vas_win_close() interface which should be used to close a
send or receive windows.

While the hardware configurations required to open send and receive
windows differ, the configuration to close a window is the same for
both. So we use a single interface to close the window.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:37 +10:00
Sukadev Bhattiprolu
62c4eda4fa powerpc/powernv/vas: Define vas_rx_win_open() interface
Define the vas_rx_win_open() interface. This interface is intended to
be used by the Nest Accelerator (NX) driver(s) to setup receive
windows for one or more NX engines (which implement compression &
encryption algorithms in the hardware).

Follow-on patches will provide an interface to close the window and to
open a send window that kernel subsystems can use to access the NX
engines.

The interface to open a receive window is expected to be invoked for
each instance of VAS in the system.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:36 +10:00
Sukadev Bhattiprolu
b6622a339e powerpc/powernv: Move GET_FIELD/SET_FIELD to vas.h
Move the GET_FIELD and SET_FIELD macros to vas.h as VAS and other
users of VAS, including NX-842 can use those macros.

There is a lot of related code between the VAS/NX kernel drivers
and skiboot. For consistency, switch the order of parameters in
SET_FIELD to match the order in skiboot.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:20 +10:00
Sukadev Bhattiprolu
967689141e powerpc/powernv/vas: Define macros, register fields and structures
Define macros for the VAS hardware registers and bit-fields as well
as couple of data structures needed by the VAS driver.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[mpe: Fixup include guard to use _ASM_POWERPC_VAS_H]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:17 +10:00
Alexey Kardashevskiy
f1e08232ed powerpc/pci: Remove OF node back pointer from pci_dn
The check_req() helper uses pci_get_pdn() to get an OF node pointer.
pci_get_pdn() returns a pci_dn pointer which either:
1) from the OF node returned by pci_device_to_OF_node();
2) from the parent child_list where entries don't have OF node pointers.
Since check_req() does not care about 2), it can call
pci_device_to_OF_node() directly, hence the change.

The find_pe_dn() helper uses embedded pci_dn to get an OF node which is
also stored in edev->pdev so let's take a shortcut and call
pci_device_to_OF_node() directly.

With these 2 changes, we can finally get rid of the OF node back pointer.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:12 +10:00
Alexey Kardashevskiy
405b33a76d powerpc/eeh: Remove unnecessary config_addr from eeh_dev
The eeh_dev struct hold a config space address of an associated node
and the very same address is also stored in the pci_dn struct which
is always present during the eeh_dev lifetime.

This uses bus:devfn directly from pci_dn instead of cached and packed
config_addr.

Since config_addr is made from device's bus:dev.fn, there is no point
in keeping it in the debugfs either so remove that too.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:09 +10:00
Alexey Kardashevskiy
69672bd748 powerpc/eeh: Remove unnecessary pointer to phb from eeh_dev
The eeh_dev struct already holds a pointer to pci_dn which it does not
exist without and pci_dn itself holds the very same pointer so just
use it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:09 +10:00
Alexey Kardashevskiy
8bae6a2319 powerpc/eeh: Reduce to one the number of places where edev is allocated
arch/powerpc/kernel/eeh_dev.c:57 is the only legit place where edev
is allocated; other 2 places allocate it on stack and in the heap for
a very short period of time to use eeh_pe_get() as takes edev.

This changes eeh_pe_get() to receive required parameters explicitly.

This removes unnecessary temporary allocation of edev.

This uses the "pe_no" name instead of the "pe_config_addr" name as
it actually is a PE number and not a config space address as it seemed.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:08 +10:00
Nicholas Piggin
6fcd6baa90 powerpc/powernv: Use kernel crash path for machine checks
There are quite a few machine check exceptions that can be caused by
kernel bugs. To make debugging easier, use the kernel crash path in
cases of synchronous machine checks that occur in kernel mode, if that
would not result in the machine going straight to panic or crash dump.

There is a downside here that die()ing the process in kernel mode can
still leave the system unstable. panic_on_oops will always force the
system to fail-stop, so systems where that behaviour is important will
still do the right thing.

As a test, when triggering an i-side 0111b error (ifetch from foreign
address) in kernel mode process context on POWER9, the kernel currently
dies quickly like this:

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
  [  127.426651616,0] OPAL: Reboot requested due to Platform error.
      Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
  opal: Reboot type 1 not supported
  Kernel panic - not syncing: PowerNV Unrecovered Machine Check
  CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #35
  Call Trace:
  [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
    buggy/mising code in OPAL for this BMC
    Rebooting in 10 seconds..
  Trying to free IRQ 496 from IRQ context!

After this patch, the process is killed and the kernel continues with
this message, which gives enough information to identify the offending
branch (i.e., with CFAR):

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
      Effective address: ffff000000000000
  Oops: Machine check, sig: 7 [#1]
  SMP NR_CPUS=2048
  NUMA
  PowerNV
  Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
  CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #36
  task: c000000932300000 task.stack: c000000932380000
  NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
  REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a261072-dirty)
  MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
    CR: 24000484  XER: 20000000
  CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
  GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
  GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
  GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
  GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
  NIP [ffff000000000000] 0xffff000000000000
  LR [00000000217706a4] 0x217706a4
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:04 +10:00
Nicholas Piggin
b746e3e01e powerpc/powernv: Flush console before platform error reboot
Unrecovered MCE and HMI errors are sent through a special restart OPAL
call to log the platform error. The downside is that they don't go
through normal Linux crash paths, so they don't give much information
to the Linux console.

Change this by providing a special crash function which does some of
the console flushing from the panic() path before calling firmware to
reboot.

The downside of this is a little more code to execute before reaching
the firmware reboot. However in practice, it's critical to get the
Linux console messages output in order to debug a problem. So this is
a desirable tradeoff.

Note on the implementation: It is difficult to plumb a custom reboot
handler into the panic path, because panic does a little bit too much
work. For example, it will try to delay with the timebase, but that
may be corrupted in some cases resulting in a hang without reaching
the platform reboot. Another problem is that panic can invoke the
crash dump code which is not what we want in the case of a hardware
platform error. Long-term the best solution will be to rework the
panic path so it can be suitable for this kind of panic, but for now
we just duplicate a bit of the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:03 +10:00
Nicholas Piggin
a3b2cb30f2 powerpc: Do not call ppc_md.panic in fadump panic notifier
If fadump is not registered, and no other crash or debug handlers are
registered, the powerpc panic handler stops the guest before the
generic panic code can push out debug information to the console.

Currently, system reset injection causes the guest to silently stop.

Stop calling ppc_md.panic in the panic notifier. crash_fadump already
does rtas_os_term() to terminate the guest if fadump is registered.

Remove ppc_md.panic. Move fadump panic notifier into fadump code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:01 +10:00
Nicholas Piggin
70412c55d4 powerpc/64: Fix watchdog configuration regressions
This fixes a couple more bits of fallout from the new hard lockup watchdog
patch.

It restores the required hw_nmi_get_sample_period() function for the
perf watchdog, and removes some function declarations on 64e that are only
defined for 64s. This fixes the 64e build when the hardlockup detector is
enabled.

It restores the default behaviour of disabling the perf watchdog, and also
fixes disabling the 64s watchdog when running as a guest.

Fixes: 2104180a53 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:00 +10:00
Paul Mackerras
4dafecde44 Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in the 'ppc-kvm' topic branch from the powerpc tree in
order to bring in some fixes which touch both powerpc and KVM code.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-08-31 12:37:03 +10:00
Ram Pai
d182b8fd60 KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTER
In handling a H_ENTER hypercall, the code in kvmppc_do_h_enter
clobbers the high-order two bits of the storage key, which is stored
in a split field in the second doubleword of the HPTE.  Any storage
key number above 7 hence fails to operate correctly.

This makes sure we preserve all the bits of the storage key.

Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-08-31 12:36:44 +10:00
Nicholas Piggin
aafc8a8300 powerpc/64s: Move IDLE_STATE_ENTER_SEQ[_NORET] into idle_book3s.S
This macro is only used in idle_book3s.S, move it in there and add a
more descriptive comment.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch and write change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29 21:38:47 +10:00
Michael Ellerman
82b7fcc005 Merge branch 'topic/ppc-kvm' into next
Merge Nicks commit to rework the KVM thread management, shared with the
KVM tree via the ppc-kvm topic branch.
2017-08-29 21:26:30 +10:00
Nicholas Piggin
94a04bc25a KVM: PPC: Book3S HV: POWER9 does not require secondary thread management
POWER9 CPUs have independent MMU contexts per thread, so KVM does not
need to quiesce secondary threads, so the hwthread_req/hwthread_state
protocol does not have to be used. So patch it away on POWER9, and patch
away the branch from the Linux idle wakeup to kvm_start_guest that is
never used.

Add a warning and error out of kvmppc_grab_hwthread in case it is ever
called on POWER9.

This avoids a hwsync in the idle wakeup path on POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Use WARN(...) instead of WARN_ON()/pr_err(...)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29 14:48:59 +10:00
Linus Torvalds
42e6d5e5ee powerpc fixes for 4.13 #8
Just one fix, to add a barrier in the switch_mm() code to make sure the mm
 cpumask update is ordered vs the MMU starting to load translations. As far as we
 know no one's actually hit the bug, but that's just luck.
 
 Thanks to:
   Benjamin Herrenschmidt, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZoAZDAAoJEFHr6jzI4aWAi3AQAJq4boEBqdmL042oNK4PWW0M
 uGfehNmtzCw9Hp8bPfzOf8NypJ51Kw7eDQELaeSaazKW+gffUCBeEsKGS7kmHvc+
 x1tHxkXxI7PXuNIRojJg9y7rlKXdRym5SecvPSo1cm/c46RRWOlNGZaIwiHyrXSh
 eBjyP5EHu1HXpRxkcUh+//PQp2b+7SmgUYzSf0hA9UCtzSZSJr19DuY8uhetI9Ws
 AfjkO1uvb2KETqBVegGBpAruZzQtxqdtffd2HToSaCHUnAKma2iqUZqkqBNjL6OQ
 gSXWpXVInng/7ktrrfEgSiwlHns7pgHkxYHS8thDZqQpIt3GNsUg2UwpHGf6oL7V
 L+GtRp36LM91Ueq6KdlU7bJkmoiJ798Hnp3FOjpkqo+j/MGuCQDDDK4Ge1popehJ
 a17K7lE/FKGqNaFINc1Q6hnXg4MPyawAOLDlV839Ap5+ISPS6WcHaa1AgKjdQNkH
 fIkZZsYT531FIf853AjUGFw8frSlVfrHmIx9/HJOhEa1KHQhBqGRV1sWYEjuN6IB
 av+tQDlleG5aT641qhHlA/hN5DGrGZXLp8e6cFRufF+CSsRayL27u0Qw9pP9VZ3S
 bgfdnmZZyP23+bzaq/m/bjhRiOf0snSQPxIKe56KmNCJ8buTrGWDw4IuiPKB7Y6V
 06vBFn7ZUP5aeHIZkS62
 =IClj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "Just one fix, to add a barrier in the switch_mm() code to make sure
  the mm cpumask update is ordered vs the MMU starting to load
  translations. As far as we know no one's actually hit the bug, but
  that's just luck.

  Thanks to Benjamin Herrenschmidt, Nicholas Piggin"

* tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Ensure cpumask update is ordered
2017-08-25 17:32:35 -07:00
Jiri Slaby
30d6e0a419 futex: Remove duplicated code and fix undefined behaviour
There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
and comparison of the result.

Remove this duplication and leave up to the arches only the needed
assembly which is now in arch_futex_atomic_op_inuser.

This effectively distributes the Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

And as suggested by Thomas, check for negative oparg too, because it was
also reported to cause undefined behaviour report.

Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64]
Cc: linux-mips@linux-mips.org
Cc: Rich Felker <dalias@libc.org>
Cc: linux-ia64@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: peterz@infradead.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: sparclinux@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-hexagon@vger.kernel.org
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-xtensa@linux-xtensa.org
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Stafford Horne <shorne@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-parisc@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-alpha@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
2017-08-25 22:49:59 +02:00
Benjamin Herrenschmidt
3a2df3798d powerpc/mm: Make switch_mm_irqs_off() out of line
It's too big to be inline, there is no reason to keep it
that way.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rework to incorporate the comment changes via fixes branch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:48:51 +10:00
Benjamin Herrenschmidt
a619e59c07 powerpc/mm: Optimize detection of thread local mm's
Instead of comparing the whole CPU mask every time, let's
keep a counter of how many bits are set in the mask. Thus
testing for a local mm only requires testing if that counter
is 1 and the current CPU bit is set in the mask.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:28:38 +10:00
Benjamin Herrenschmidt
058ccc3465 powerpc/mm: Avoid double irq save/restore in activate_mm
It calls switch_mm() which already does the irq save/restore
these days.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:27:44 +10:00
Benjamin Herrenschmidt
43ed84a891 powerpc/mm: Move pgdir setting into a helper
Makes switch_mm_irqs_off() a bit more readable

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:27:42 +10:00
Michael Ellerman
15c659ff9d Merge branch 'fixes' into next
There's a non-trivial dependency between some commits we want to put in
next and the KVM prefetch work around that went into fixes. So merge
fixes into next.
2017-08-23 22:20:10 +10:00
Ingo Molnar
94edf6f3c2 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Removal of spin_unlock_wait()
 - SRCU updates
 - Torture-test updates
 - Documentation updates
 - Miscellaneous fixes
 - CPU-hotplug fixes
 - Miscellaneous non-RCU fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-21 09:45:19 +02:00
Benjamin Herrenschmidt
1a92a80ad3 powerpc/mm: Ensure cpumask update is ordered
There is no guarantee that the various isync's involved with
the context switch will order the update of the CPU mask with
the first TLB entry for the new context being loaded by the HW.

Be safe here and add a memory barrier to order any subsequent
load/store which may bring entries into the TLB.

The corresponding barrier on the other side already exists as
pte updates use pte_xchg() which uses __cmpxchg_u64 which has
a sync after the atomic operation.

Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add comments in the code]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-18 13:07:16 +10:00
Paul E. McKenney
952111d7db arch: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore removes the underlying arch-specific
arch_spin_unlock_wait() for all architectures providing them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
2017-08-17 08:08:59 -07:00
Aneesh Kumar K.V
fa4531f753 powerpc/mm: Don't send IPI to all cpus on THP updates
Now that we made sure that lockless walk of linux page table is mostly
limitted to current task(current->mm->pgdir) we can update the THP
update sequence to only send IPI to CPUs on which this task has run.
This helps in reducing the IPI overload on systems with large number
of CPUs.

WRT kvm even though kvm is walking page table with vpc->arch.pgdir,
it is done only on secondary CPUs and in that case we have primary CPU
added to task's mm cpumask. Sending an IPI to primary will force the
secondary to do a vm exit and hence this mm cpumask usage is safe
here.

WRT CAPI, we still end up walking linux page table with capi context
MM. For now the pte lookup serialization sends an IPI to all CPUs in
CPI is in use. We can further improve this by adding the CAPI
interrupt handling CPU to task mm cpumask. That will be done in a
later patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17 23:31:13 +10:00
Michael Ellerman
8434f0892e Merge branch 'topic/ppc-kvm' into next
Bring in the commit to rename find_linux_pte_or_hugepte() which touches
arch and KVM code, and might need to be merged with the kvmppc tree to
avoid conflicts.
2017-08-17 23:14:17 +10:00
Aneesh Kumar K.V
94171b19c3 powerpc/mm: Rename find_linux_pte_or_hugepte()
Add newer helpers to make the function usage simpler. It is always
recommended to use find_current_mm_pte() for walking the page table.
If we cannot use find_current_mm_pte(), it should be documented why
the said usage of __find_linux_pte() is safe against a parallel THP
split.

For now we have KVM code using __find_linux_pte(). This is because kvm
code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
with PACA soft_enabled = 1. We may want to fix that later and make
sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
we can switch kvm to use find_linux_pte().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17 23:13:46 +10:00
Naveen N. Rao
694fc88ce2 powerpc/string: Implement optimized memset variants
Based on Matthew Wilcox's patches for other architectures.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17 23:04:35 +10:00
Dou Liyang
7baebe54a6 powerpc/topology: Remove the unused parent_node() macro
Commit a7be6e5a7f ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().

The parent_node() macro in POWERPC platform is unnecessary.

Remove it for cleanup.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 21:12:41 +10:00
Aneesh Kumar K.V
4ae279c2c9 powerpc/mm/hugetlb: Allow runtime allocation of 16G.
Now that we have GIGANTIC_PAGE enabled on powerpc, use this for 16G hugepages
with hash translation mode. Depending on the total system memory we have, we may
be able to allocate 16G hugepages runtime. This also remove the hugetlb setup
difference between hash/radix translation mode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 14:56:13 +10:00
Aneesh Kumar K.V
79cc38ded1 powerpc/mm/hugetlb: Add support for reserving gigantic huge pages via kernel command line
With commit aa888a7497 ("hugetlb: support larger than MAX_ORDER") we added
support for allocating gigantic hugepages via kernel command line. Switch
ppc64 arch specific code to use that.

W.r.t FSL support, we now limit our allocation range using BOOTMEM_ALLOC_ACCESSIBLE.

We use the kernel command line to do reservation of hugetlb pages on powernv
platforms. On pseries hash mmu mode the supported gigantic huge page size is
16GB and that can only be allocated with hypervisor assist. For pseries the
command line option doesn't do the allocation. Instead pseries does gigantic
hugepage allocation based on hypervisor hint that is specified via
"ibm,expected#pages" property of the memory node.

Cc: Scott Wood <oss@buserror.net>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 14:56:12 +10:00
Christophe Leroy
ca8afd4046 powerpc/hugetlb: fix page rights verification in gup_hugepte()
gup_hugepte() checks if pages are present and readable, and
when  'write' is set, also checks if the pages are writable.

Initially this was done by checking if _PAGE_PRESENT and
_PAGE_READ were set. In addition, _PAGE_WRITE was verified for write
accesses.

The problem is that we have to handle the three following cases:
1/ The target defines __PAGE_READ and __PAGE_WRITE
2/ The target defines __PAGE_RW
3/ The target defines __PAGE_RO

In case 1/, this is obvious
In case 2/, __PAGE_READ is defined as 0 and __PAGE_WRITE as __PAGE_RW
so it works as well.
But in case 3, __PAGE_RW is defined as 0, which means __PAGE_WRITE is 0
and then the test returns true (page writable) in all cases.

A first correction was attempted in commit 6b8cb66a6a ("powerpc: Fix
usage of _PAGE_RO in hugepage"), but that fix is wrong:
instead of checking that the page is writable when write is requested,
it checks that the page is NOT writable when write is NOT requested.

This patch adds a new pte_read() helper to check whether a page is
readable or not. This avoids handling all possible cases in
gup_hugepte().

Then gup_hugepte() is modified to use pte_present(), pte_read()
and pte_write() instead of the raw flags.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:58 +10:00
Christophe Leroy
4cfac2f9c7 powerpc/mm: Simplify __set_fixmap()
__set_fixmap() uses __fix_to_virt() then does the boundary checks
by it self. Instead, we can use fix_to_virt() which does the
verification at build time. For this, we need to use it inline
so that GCC can see the real value of idx at buildtime.

In the meantime, we remove the 'fixmaps' variable.
This variable is set but has never been used from the beginning
(commit 2c419bdeca ("[POWERPC] Port fixmap from x86 and use
for kmap_atomic"))

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:58 +10:00
Christophe Leroy
86b19520e7 powerpc/mm: declare some local functions static
get_pteptr() and __mapin_ram_chunk() are only used locally,
so define them static

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:57 +10:00
Christophe Leroy
3184cc4b6f powerpc/mm: Fix kernel RAM protection after freeing unused memory on PPC32
As seen below, allthough the init sections have been freed, the
associated memory area is still marked as executable in the
page tables.

~ dmesg
[    5.860093] Freeing unused kernel memory: 592K (c0570000 - c0604000)

~ cat /sys/kernel/debug/kernel_page_tables
---[ Start of kernel VM ]---
0xc0000000-0xc0497fff        4704K  rw  X  present dirty accessed shared
0xc0498000-0xc056ffff         864K  rw     present dirty accessed shared
0xc0570000-0xc059ffff         192K  rw  X  present dirty accessed shared
0xc05a0000-0xc7ffffff      125312K  rw     present dirty accessed shared
---[ vmalloc() Area ]---

This patch fixes that.

The implementation is done by reusing the change_page_attr()
function implemented for CONFIG_DEBUG_PAGEALLOC

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:56 +10:00
Michael Ellerman
5b6c133e08 powerpc/mm/nohash: Move definition of PGALLOC_GFP to fix build errors
In some obscure Book3E configs (randconfig) we can end up missing a
definition for PGALLOC_GFP in pgtable_64.c.

Fix it by moving the definition to asm/pgalloc.h.

Fixes: de3b87611d ("powerpc/mm/book(e)(3s)/64: Add page table accounting")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 20:02:56 +10:00
Christophe Leroy
3ee87674e0 powerpc/8xx: Use symbolic PVR value
For the 8xx, PVR values defined in arch/powerpc/include/asm/reg.h
are nowhere used.

Remove all defines and add PVR_8xx

Use it in arch/powerpc/kernel/cputable.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:18 +10:00
Christophe Leroy
968159c003 powerpc/8xx: Getting rid of remaining use of CONFIG_8xx
Two config options exist to define powerpc MPC8xx:
* CONFIG_PPC_8xx
* CONFIG_8xx

arch/powerpc/platforms/Kconfig.cputype has contained the following
comment about CONFIG_8xx item for some years:
"# this is temp to handle compat with arch=ppc"

arch/powerpc is now the only place with remaining use of
CONFIG_8xx: get rid of them.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:12 +10:00
Christophe Leroy
72e4b2cdf0 powerpc/time: refactor MFTB() to limit number of ifdefs
The 8xx cannot access the TBL and TBU registers using mfspr/mtspr
It must be accessed using mftb/mftbu

Due to this, there is a number of places with #ifdef CONFIG_8xx

This patch defines new macros MFTBL(x) and MFTBU(x) on the same model
as MFTB(x) and tries to make use of them as much as possible.

In arch/powerpc/include/asm/timex.h, we also remove the ifdef
for the asm() operands as the compiler doesn't mind unused operands

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:09 +10:00
Michael Ellerman
d30a5a5262 powerpc/traps: Use SRR1 defines for program check reasons
Currently we open code the reason codes for program checks. Instead use
the existing SRR1 defines.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:31:32 +10:00
Shilpasri G Bhat
bf9571550f powerpc/powernv: Add support to clear sensor groups data
Adds support for clearing different sensor groups. OCC inband sensor
groups like CSM, Profiler, Job Scheduler can be cleared using this
driver. The min/max of all sensors belonging to these sensor groups
will be cleared.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:40:05 +10:00
Shilpasri G Bhat
8e84b2d1f0 powerpc/powernv: Add support to set power-shifting-ratio
This patch adds support to set power-shifting-ratio which hints the
firmware how to distribute/throttle power between different entities
in a system (e.g CPU v/s GPU). This ratio is used by OCC for power
capping algorithm.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:40:01 +10:00
Shilpasri G Bhat
cb8b340de2 powerpc/powernv: Add support for powercap framework
Adds a generic powercap framework to change the system powercap
inband through OPAL-OCC command/response interface.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:39:53 +10:00
Nicholas Piggin
04019bf8eb powerpc: Add irq accounting for watchdog interrupts
This adds an irq counter for the watchdog soft-NMI. This interrupt
only fires when interrupts are soft-disabled, so it will not
increment much even when the watchdog is running. However it's
useful for debugging and sanity checking.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:30:02 +10:00
Nicholas Piggin
ca41ad4377 powerpc: Add irq accounting for system reset interrupts
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:30:01 +10:00
Andreas Schwab
8a583c0a8d powerpc: Fix invalid use of register expressions
binutils >= 2.26 now warns about misuse of register expressions in
assembler operands that are actually literals, for example:

  arch/powerpc/kernel/entry_64.S:535: Warning: invalid register expression

In practice these are almost all uses of r0 that should just be a
literal 0.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
[mpe: Mention r0 is almost always the culprit, fold in purgatory change]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:29:41 +10:00
Peter Zijlstra
a9668cd6ee locking: Remove smp_mb__before_spinlock()
Now that there are no users of smp_mb__before_spinlock() left, remove
it entirely.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:03 +02:00
Peter Zijlstra
d89e588ca4 locking: Introduce smp_mb__after_spinlock()
Since its inception, our understanding of ACQUIRE, esp. as applied to
spinlocks, has changed somewhat. Also, I wonder if, with a simple
change, we cannot make it provide more.

The problem with the comment is that the STORE done by spin_lock isn't
itself ordered by the ACQUIRE, and therefore a later LOAD can pass over
it and cross with any prior STORE, rendering the default WMB
insufficient (pointed out by Alan).

Now, this is only really a problem on PowerPC and ARM64, both of
which already defined smp_mb__before_spinlock() as a smp_mb().

At the same time, we can get a much stronger construct if we place
that same barrier _inside_ the spin_lock(). In that case we upgrade
the RCpc spinlock to an RCsc.  That would make all schedule() calls
fully transitive against one another.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:02 +02:00
Michael Ellerman
21a0e8c14b powerpc/mm/hash64: Make vmalloc 56T on hash
On 64-bit book3s, with the hash MMU, we currently define the kernel
virtual space (vmalloc, ioremap etc.), to be 16T in size. This is a
leftover from pre v3.7 when our user VM was also 16T.

Of that 16T we split it 50/50, with half used for PCI IO and ioremap
and the other 8T for vmalloc.

We never bothered to make it any bigger because 8T of vmalloc ought to
be enough for anybody. But it turns out that's not true, the per cpu
allocator wants large amounts of vmalloc space, not to make large
allocations, but to allow a large stride between allocations, because
we use pcpu_embed_first_chunk().

With a bit of juggling we can increase the entire kernel virtual space
to 64T. The only real complication is the check of the address in the
SLB miss handler, see the comment in the code.

Although we could continue to split virtual space 50/50 as we do now,
no one seems to be running out of PCI IO or ioremap space. So instead
keep that as 8T, and use the remaining 56T for vmalloc.

In future we should be able to increase the kernel virtual space to
512T, the code already supports that, it just needs testing on older
hardware.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2017-08-08 19:37:05 +10:00
Michael Ellerman
63ee9b2ff9 powerpc/mm/book3s64: Make KERN_IO_START a variable
Currently KERN_IO_START is defined as:

 #define KERN_IO_START  (KERN_VIRT_START + (KERN_VIRT_SIZE >> 1))

Although it looks like a constant, both the components are actually
variables, to allow us to have a different value between Radix and
Hash with a single kernel.

However that still requires both Radix and Hash to place the kernel IO
region at the same location relative to the start and end of the
kernel virtual region (namely 1/2 way through it), and we'd like to
change that.

So split KERN_IO_START out into its own variable, and initialise it
for Radix and Hash. In the medium term we should be able to
reconsolidate this, by doing a more involved rearrangement of the
location of the regions.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08 19:37:04 +10:00
Matt Brown
e66ca3db59 powerpc/powernv: Use darn instruction for get_random_seed() on Power9
This adds powernv_get_random_darn() which utilises the darn instruction,
introduced in ISA v3.0/POWER9.

The darn instruction can potentially return an error, which is supported
by the get_random_seed() API, in normal usage if we see an error we just
return that to the caller.

However when detecting whether darn is functional at boot we try up to
10 times, before deciding that darn doesn't work and failing the
registration of get_random_seed(). That way an intermittent failure
at boot doesn't deprive the system of randomness until the next reboot.

Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[mpe: Move init into a function, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08 19:37:03 +10:00
Frederic Barrat
2552910084 powerpc/powernv: Enable PCI peer-to-peer
P9 has support for PCI peer-to-peer, enabling a device to write in the
MMIO space of another device directly, without interrupting the CPU.

This patch adds support for it on powernv, by adding a new API to be
called by drivers. The pnv_pci_set_p2p(...) call configures an
'initiator', i.e the device which will issue the MMIO operation, and a
'target', i.e. the device on the receiving side.

P9 really only supports MMIO stores for the time being but that's
expected to change in the future, so the API allows to define both
load and store operations.

  /* PCI p2p descriptor */
  #define OPAL_PCI_P2P_ENABLE           0x1
  #define OPAL_PCI_P2P_LOAD             0x2
  #define OPAL_PCI_P2P_STORE            0x4

  int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
                      u64 desc)

It uses a new OPAL call, as the configuration magic is done on the
PHBs by skiboot.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
[mpe: Drop unrelated OPAL calls, s/uint64_t/u64/, minor formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08 11:27:30 +10:00
Linus Torvalds
841fe95303 powerpc fixes for 4.13 #5
Fixes for recently merged code:
  - a fix for the _PAGE_DEVMAP support, which was breaking KVM on Power9 radix
  - avoid a (harmless) lockdep warning in the early SMP code
  - return failure for some uses of dma_set_mask() rather than falling back to 32-bits
  - fix stack setup in watchdog soft_nmi_common() to use emergency stack
  - fix of_irq_to_resource() error check in of_fsl_spi_probe()
 
 Two fixes going to stable:
  - fix saving of Transactional Memory SPRs in core dump
  - fix __check_irq_replay missing decrementer interrupt
 
 And two misc:
  - fix 64-bit boot wrapper build with non-biarch compiler
  - work around a POWER9 PMU hang after state-loss idle
 
 Thanks to:
   Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo Romero, Jose Ricardo
   Ziviani, Laurent Vivier, Nicholas Piggin, Oliver O'Halloran, Sergei Shtylyov,
   Suraj Jitindar Singh, Thomas Gleixner.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZhEH/AAoJEFHr6jzI4aWA8GoP+wdtaBYZIElFMnPay/BY8lSQ
 MODF6FlaexdNuVFo5In2BnMAs12bIXDO69wMziqc+uZIA+vI3K7lqsi+8p1R38Ts
 FG5OFLzX/gx1ykoeDe0mC8GPUHGVBdMr6+rHzt/9M7PrQWApCoBJLakYIElOrN5B
 EiAxunwJQISKNQYYmyVhjMVFq8ydMHSgtR7JVDHTh1AoStaO9M5hoJegD4Dnytaf
 F0/0Y5U+NGGWwPIyDrjB4wE9L8NCk0JfvpBhjORMkFSpAOe4VhxZrAGDyaCzVKVh
 YkxSirVKuVwEmvV2prmq4SAJi61EMUEPx2WL7hRN3ol/dF+2lU9nl2lyL3yRnGPY
 9mtNHur07qoMqyXI8750ayTzbMG75aakJF2be9G+zwNc3Sw8dqJvs2PwpSVdnima
 jWmWnRikws4qmEOEBV6nxtWf8qYACHevyK7YzIHDU8SVXLjAkX/u6LZGHAfTQ8Xo
 jYcISpiA2ko8YnMR7yy9FZqs4KGqeOotMamgJRGUcSY/9Xn8WDIrtQ4tIuJtGjzW
 XFOFafZOzyt6s1dNz01C4hzP60J+/CxPNRxx6zWnbijY6puADu/Mv1g+ai3nYdKA
 e/CYt6O8tzooFHCHtVLUPgWA82heDwNtjmlnSikWTdT07KLRhP+0w9Qq8gBzztKT
 7B/vPMjCMPQaJrrnj6GX
 =Rdy8
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fixes for recently merged code:
   - a fix for the _PAGE_DEVMAP support, which was breaking KVM on
     Power9 radix
   - avoid a (harmless) lockdep warning in the early SMP code
   - return failure for some uses of dma_set_mask() rather than falling
     back to 32-bits
   - fix stack setup in watchdog soft_nmi_common() to use emergency
     stack
   - fix of_irq_to_resource() error check in of_fsl_spi_probe()

  Two fixes going to stable:
   - fix saving of Transactional Memory SPRs in core dump
   - fix __check_irq_replay missing decrementer interrupt

  And two misc:
   - fix 64-bit boot wrapper build with non-biarch compiler
   - work around a POWER9 PMU hang after state-loss idle

  Thanks to: Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo
  Romero, Jose Ricardo Ziviani, Laurent Vivier, Nicholas Piggin, Oliver
  O'Halloran, Sergei Shtylyov, Suraj Jitindar Singh, Thomas Gleixner"

* tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64: Fix __check_irq_replay missing decrementer interrupt
  powerpc/perf: POWER9 PMU stops after idle workaround
  powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check
  powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
  powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
  powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
  powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
  powerpc/tm: Fix saving of TM SPRs in core dump
  powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries
2017-08-04 09:56:54 -07:00
Benjamin Herrenschmidt
6ff4d3e966 powerpc: Remove old unused icswx based coprocessor support
We have a whole pile of unused code to maintain the ACOP register,
allocate coprocessor PIDs and handle ACOP faults. This mechanism
was used for the HFI adapter on POWER7 which is dead and gone and
whose driver never went upstream. It was used on some A2 core based
stuff that also never saw the light of day.

Take out all that code.

There is still some POWER8 coprocessor code that uses icswx but it's
kernel only and thus doesn't use any of that infrastructure.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03 16:06:52 +10:00
Benjamin Herrenschmidt
870cfe77a9 powerpc/mm: Update definitions of DSISR bits
This updates the definitions for the various DSISR bits to
match both some historical stuff and to match new bits on
POWER9.

In addition, we define some masks corresponding to the "bad"
faults on Book3S, and some masks corresponding to the bits
that match between DSISR and SRR1 for a DSI and an ISI.

This comes with a small code update to change the definition
of DSISR_PGDIRFAULT which becomes DSISR_PRTABLE_FAULT to
match architecture 3.0B

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03 16:06:43 +10:00
Benjamin Herrenschmidt
424de9c6e3 powerpc/mm/radix: Avoid flushing the PWC on every flush_tlb_range
We do that because it's used by THP pmd collapsing, so use
instead a dedicated flush function.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-02 13:11:06 +10:00
Benjamin Herrenschmidt
a46cc7a90f powerpc/mm/radix: Improve TLB/PWC flushes
At the moment we have to rather sub-optimal flushing behaviours:

 - flush_tlb_mm() will flush the PWC which is unnecessary (for example
   when doing a fork)

 - A large unmap will call flush_tlb_pwc() multiple times causing us
   to perform that fairly expensive operation repeatedly. This happens
   often in batches of 3 on every new process.

So we change flush_tlb_mm() to only flush the TLB, and we use the
existing "need_flush_all" flag in struct mmu_gather to indicate
that the PWC needs flushing.

Unfortunately, flush_tlb_range() still needs to do a full flush
for now as it's used by the THP collapsing. We will fix that later.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-02 13:11:06 +10:00
Gautham R. Shenoy
e1c1cfed54 powerpc/powernv: Save/Restore additional SPRs for stop4 cpuidle
The stop4 idle state on POWER9 is a deep idle state which loses
hypervisor resources, but whose latency is low enough that it can be
exposed via cpuidle.

Until now, the deep idle states which lose hypervisor resources (eg:
winkle) were only exposed via CPU-Hotplug.  Hence currently on wakeup
from such states, barring a few SPRs which need to be restored to
their older value, rest of the SPRS are reinitialized to their values
corresponding to that at boot time.

When stop4 is used in the context of cpuidle, we want these additional
SPRs to be restored to their older value, to ensure that the context
on the CPU coming back from idle is same as it was before going idle.

In this patch, we define a SPR save area in PACA (since we have used
up the volatile register space in the stack) and on POWER9, we restore
SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
SPRN_MMCR2 to the values they had before entering stop.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-01 21:01:20 +10:00
Michael Ellerman
bb272221e9 Linux v4.13-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZapWhAAoJEHm+PkMAQRiGKb0IAJM6b7SbWaw69Og7+qiFB+zZ
 xp29iXqbE9fPISC6a5BRQV1ONjeDM6opGixGHqGC8Hla6k2IYz25VDNoF8wd0MXN
 cz/Ih20vd3C5afxXGe5cTT8lsPAlV0mWXxForlu6j8jPeL62FPfq6RhEkw7AcrYL
 yfYy3k3qSdOrrvBdII0WAAUi46UfIs+we9BQgbsMbkHOiqV2K0MOrzKE84Xbgepq
 RAy2xg6P4b4+hTx8xTrYc1MXwpnqjRc0oJ08gdmiwW3AOOU7LxYFn7zDkLPWi9Rr
 g4x6r4YhBTGxT4wNvovLIiqd9QFs//dMCuPWYwEtTICG48umIqqq24beQ0mvCdg=
 =08Ic
 -----END PGP SIGNATURE-----

Merge tag 'v4.13-rc1' into fixes

The fixes branch is based off a random pre-rc1 commit, because we had
some fixes that needed to go in before rc1 was released.

However we now need to fix some code that went in after that point, but
before rc1, so merge rc1 to get that code into fixes so we can fix it!
2017-07-31 20:20:29 +10:00
Linus Torvalds
080012bad6 powerpc fixes for 4.13 #4
The highlight is Ben's patch to work around a host killing bug when running KVM
 guests with the Radix MMU on Power9. See the long change log of that commit for
 more detail.
 
 And then three fairly minor fixes:
  - Fix of_node_put() underflow during reconfig remove, using old DLPAR tools.
  - Fix recently introduced ld version check with 64-bit LE-only toolchain.
  - Free the subpage_prot_table correctly, avoiding a memory leak.
 
 Thanks to:
   Aneesh Kumar K.V, Benjamin Herrenschmidt, Laurent Vivier.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZezIGAAoJEFHr6jzI4aWAyukP/3mxlQ3WdQlYPByZ18cj6YL5
 L0kbRxDgAosD9HzcOPqku1um1l7D6Gk5KZFZfsol7SasSmCZEwV4MbdTrAiRxS6K
 tF0V14hP/BDQeIKfxlnUepzfL8PY7CkO6sDAa6BjHXvBk4+POI+37uw9+2GEV8DY
 tA45fHjA/Zq3eUXsK0WTHIcd09lJXXarf9Tlx+YNZ+3yJ1OMfOji3CXgTkjtwYM9
 XTtsKzsagY1zLwr5gXJu1P05+OGna2VmY6+Tn2lnf7scTFW3qYGF3eWRx71diiKS
 PpZCjqfzWF4+TDIGPoYIrkTE+ZKR0lyo6F38GYwae0cYZMs9pGPEpeNahd8Nun+v
 MLU6TnhNfOI40GEYgmOMNKHPJJLSx59Qr/GnrAi/h2nUEocuN76jzNbaeFBtj3jD
 /vrRTmVUtt1wGqORX7BK4YZFHqcHmZBCM7bQnxibJtLv7fMue0sk58fs2jAaZ1iD
 NacpzsXG7CWYgj6ApclVCYuF99dXTpjrw/WPxilXDg84Pxb7Dv1SpIvLb2T5Guq+
 iqqavViRHP1ng+5/giIsOvF9CnsCzbRYLb0zZTP91nckMmYI6wX2zc56lofjcI5j
 Qc5o/aJvBk4vSM9sibBGEdrZJ1Vt16gGorQ5NZUurZund/cVqvQFhm/4Tvnc0cVN
 yvLNZI8am35pI9CCJ2im
 =Z6uF
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "The highlight is Ben's patch to work around a host killing bug when
  running KVM guests with the Radix MMU on Power9. See the long change
  log of that commit for more detail.

  And then three fairly minor fixes:

   - fix of_node_put() underflow during reconfig remove, using old DLPAR
     tools.

   - fix recently introduced ld version check with 64-bit LE-only
     toolchain.

   - free the subpage_prot_table correctly, avoiding a memory leak.

  Thanks to: Aneesh Kumar K.V, Benjamin Herrenschmidt, Laurent Vivier"

* tag 'powerpc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm/hash: Free the subpage_prot_table correctly
  powerpc/Makefile: Fix ld version check with 64-bit LE-only toolchain
  powerpc/pseries: Fix of_node_put() underflow during reconfig remove
  powerpc/mm/radix: Workaround prefetch issue with KVM
2017-07-28 13:25:15 -07:00
Oliver O'Halloran
c9c98bc5cc powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries
The Radix MMU translation tree as defined in ISA v3.0 contains two
different types of entry, directories and leaves. Leaves are
identified by _PAGE_PTE being set.

The formats of the two entries are different, with the directory
entries containing no spare bits for use by software. In particular
the bit we use for _PAGE_DEVMAP is not reserved for software, and is
part of the NLB (Next Level Base) field, essentially the address of
the next level in the tree.

Note that the Linux pte_t is not == _PAGE_PTE. A huge page pmd
entry (or devmap!) is also a leaf and so has _PAGE_PTE set, even
though we use a pmd_t for it in Linux.

The fix is to ensure that the pmd/pte_devmap() confirm they are
looking at a leaf entry (_PAGE_PTE) as well as checking _PAGE_DEVMAP.

Fixes: ebd3119793 ("powerpc/mm: Add devmap support for ppc64")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Add a comment in the code and flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-28 15:55:30 +10:00
Benjamin Herrenschmidt
a25bd72bad powerpc/mm/radix: Workaround prefetch issue with KVM
There's a somewhat architectural issue with Radix MMU and KVM.

When coming out of a guest with AIL (Alternate Interrupt Location, ie,
MMU enabled), we start executing hypervisor code with the PID register
still containing whatever the guest has been using.

The problem is that the CPU can (and will) then start prefetching or
speculatively load from whatever host context has that same PID (if
any), thus bringing translations for that context into the TLB, which
Linux doesn't know about.

This can cause stale translations and subsequent crashes.

Fixing this in a way that is neither racy nor a huge performance
impact is difficult. We could just make the host invalidations always
use broadcast forms but that would hurt single threaded programs for
example.

We chose to fix it instead by partitioning the PID space between guest
and host. This is possible because today Linux only use 19 out of the
20 bits of PID space, so existing guests will work if we make the host
use the top half of the 20 bits space.

We additionally add support for a property to indicate to Linux the
size of the PID register which will be useful if we eventually have
processors with a larger PID space available.

There is still an issue with malicious guests purposefully setting the
PID register to a value in the hosts PID range. Hopefully future HW
can prevent that, but in the meantime, we handle it with a pair of
kludges:

 - On the way out of a guest, before we clear the current VCPU in the
   PACA, we check the PID and if it's outside of the permitted range
   we flush the TLB for that PID.

 - When context switching, if the mm is "new" on that CPU (the
   corresponding bit was set for the first time in the mm cpumask), we
   check if any sibling thread is in KVM (has a non-NULL VCPU pointer
   in the PACA). If that is the case, we also flush the PID for that
   CPU (core).

This second part is needed to handle the case where a process is
migrated (or starts a new pthread) on a sibling thread of the CPU
coming out of KVM, as there's a window where stale translations can
exist before we detect it and flush them out.

A future optimization could be added by keeping track of whether the
PID has ever been used and avoid doing that for completely fresh PIDs.
We could similarily mark PIDs that have been the subject of a global
invalidation as "fresh". But for now this will do.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop
      unneeded include of kvm_book3s_asm.h]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-26 16:41:52 +10:00
Madhavan Srinivasan
8f95faaac5 powerpc/powernv: Detect and create IMC device
Code to create platform device for the In-Memory Collection (IMC)
counters. Platform devices are created based on the IMC compatibility.
New header file created to contain the data structures and macros
needed for In-Memory Collection (IMC) counter pmu devices.

The device tree for IMC counters starts at the node "imc-counters".
This node contains all the IMC PMU nodes and event nodes for these IMC
PMUs. Device probe() parses the device to locate three possible IMC
device types (Nest/Core/Thread). Function then branch to parse each
unit nodes to populate vital information such as device memory sizes,
event nodes information, base address for reserve memory access (if
any) and so on. Simple bare-minimum shutdown function added which only
"stops" the engines.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Fix build with CONFIG_PERF_EVENTS=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-25 22:55:27 +10:00
Madhavan Srinivasan
28a5db0061 powerpc/powernv: Add IMC OPAL APIs
In-Memory Collection (IMC) counters are performance monitoring
infrastructure. These counters need special sequence of SCOMs to
init/start/stop which is handled by OPAL. And OPAL provides three APIs
to init and control these IMC engines.

OPAL API documentation:
  https://github.com/open-power/skiboot/blob/master/doc/opal-api/opal-imc-counters.rst

Patch updates the kernel side powernv platform code to support the new
OPAL APIs

Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-24 23:00:22 +10:00
Laurentiu Tudor
8f36479d0e powerpc: allow compiling with GENERIC_MSI_IRQ_DOMAIN
This allows building powerpc with the GENERIC_MSI_IRQ_DOMAIN
Kconfig by enabling the asm-generic msi.h in Kbuild. Without
this, there's a compilation error [1] because powerpc, as most
arches, doesn't provide an asm/msi.h.

[1] In file included from ./include/linux/kvm_host.h:20:0,
                 from ./arch/powerpc/include/asm/kvm_ppc.h:30,
                 from arch/powerpc/kernel/dbell.c:20:
./include/linux/msi.h:195:21: fatal error: asm/msi.h: No such file or directory

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-24 21:19:32 +10:00
Linus Torvalds
ae75d1aefe TTY/Serial fixes for 4.13-rc2
Here are some small tty and serial driver fixes for 4.13-rc2.  Nothing
 huge at all, a revert of a patch that turned out to break things, a fix
 up for a new tty ioctl we added in 4.13-rc1 to get the uapi definition
 correct, and a few minor serial driver fixes for reported issues.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWXMebA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yn9MgCdGsIEQlTvTeG4QikxUPB7dkEJQacAoLWOKJzQ
 A65mBAeOfiJztczi8uo4
 =KEXp
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial fixes from Greg KH:
 "Here are some small tty and serial driver fixes for 4.13-rc2. Nothing
  huge at all, a revert of a patch that turned out to break things, a
  fix up for a new tty ioctl we added in 4.13-rc1 to get the uapi
  definition correct, and a few minor serial driver fixes for reported
  issues.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-4.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: Fix TIOCGPTPEER ioctl definition
  tty: hide unused pty_get_peer function
  tty: serial: lpuart: Fix the logic for detecting the 32-bit type UART
  serial: imx: Prevent TX buffer PIO write when a DMA has been started
  Revert "serial: imx-serial - move DMA buffer configuration to DT"
  serial: sh-sci: Uninitialized variables in sysfs files
  serial: st-asc: Potential error pointer dereference
2017-07-22 09:00:24 -07:00
Linus Torvalds
10fc95547f powerpc fixes for 4.13 #3
A handful of fixes, mostly for new code.
 
 Some reworking of the new STRICT_KERNEL_RWX support to make sure we also remove
 executable permission from __init memory before it's freed.
 
 A fix to some recent optimisations to the hypercall entry where we were
 clobbering r12, this was breaking nested guests (PR KVM).
 
 A fix for the recent patch to opal_configure_cores(). This could break booting
 on bare metal Power8 boxes if the kernel was built without
 CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG.
 
 And finally a workaround for spurious PMU interrupts on Power9 DD2.
 
 Thanks to:
   Nicholas Piggin, Anton Blanchard, Balbir Singh.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZcctRAAoJEFHr6jzI4aWAji8P/RaG3/vGmhP4HGsAh7jHx/3v
 EIBsjXPozQ16AIs4235JcQ2Vg54LLms6IaFfw1w9oYZxi98sNC3b56Rtd3AGxLbq
 WHEGAD+mE9aXUOXkycLRkKZ+THiNzR0ZQKiMDqU+oJ9XbIAoof2gI4nc7UzUxGnt
 EqxERiL2RrYqRBvO3W2ErtL29yXhopUhv+HKdbuMlIcoH4uooSdn435MPleIJWxl
 WRFFv17DFw7PN6juwlff2YWvbTgrM1ND8czLfINzSZQDy0F+HEbKDFbtHlgAvOFN
 GjbODX9OuU/QRrPQv6MoY2UumZNtLSCUsw5BUg0y4Qaak8p0gAEb4D/gmvIZNp/r
 9ZqJDPfFKh/oA08apLTPJBKQmDUwCaYHyHVJNw10CU9GZ9CzW/2/B2/h2Egpgqao
 vt0TgR3FXYizhXGAEpmBZzadAg9VKZTXH/irN6X62wMxNfvRIDOS57829QFHr0ka
 wbsOK1gI1rE1g1GwsZafyURRSe95Sv84RDHW1fFQAtvFgpA3eHl26LUkpcvwOOFI
 53g9zMedYOyZXXIdBn05QojwxgXZ0ry1Wc93oDHOn3rgDL6XBeOXFfFI5jIwPJDL
 VZhOSvN1v+Ti3Q+I05zH+ll6BMhlU8RKBf+esq419DiLUeaNLHZR6GF0+UVJyJo+
 WcLsJ1x/GDa9pOBfMEJy
 =Oy4/
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "A handful of fixes, mostly for new code:

   - some reworking of the new STRICT_KERNEL_RWX support to make sure we
     also remove executable permission from __init memory before it's
     freed.

   - a fix to some recent optimisations to the hypercall entry where we
     were clobbering r12, this was breaking nested guests (PR KVM).

   - a fix for the recent patch to opal_configure_cores(). This could
     break booting on bare metal Power8 boxes if the kernel was built
     without CONFIG_JUMP_LABEL_FEATURE_CHECK_DEBUG.

   - .. and finally a workaround for spurious PMU interrupts on Power9
     DD2.

  Thanks to: Nicholas Piggin, Anton Blanchard, Balbir Singh"

* tag 'powerpc-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y
  powerpc/mm/hash: Refactor hash__mark_rodata_ro()
  powerpc/mm/radix: Refactor radix__mark_rodata_ro()
  powerpc/64s: Fix hypercall entry clobbering r12 input
  powerpc/perf: Avoid spurious PMU interrupts after idle
  powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()
2017-07-21 13:54:37 -07:00
Linus Torvalds
0a6109fd1b Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
 "A fix to WARN_ON_ONCE() done by modules, plus a MAINTAINERS update"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debug: Fix WARN_ON_ONCE() for modules
  MAINTAINERS: Update the PTRACE entry
2017-07-21 10:41:19 -07:00
Josh Poimboeuf
325cdacd03 debug: Fix WARN_ON_ONCE() for modules
Mike Galbraith reported a situation where a WARN_ON_ONCE() call in DRM
code turned into an oops.  As it turns out, WARN_ON_ONCE() seems to be
completely broken when called from a module.

The bug was introduced with the following commit:

  19d436268d ("debug: Add _ONCE() logic to report_bug()")

That commit changed WARN_ON_ONCE() to move its 'once' logic into the bug
trap handler.  It requires a writable bug table so that the BUGFLAG_DONE
bit can be written to the flags to indicate the first warning has
occurred.

The bug table was made writable for vmlinux, which relies on
vmlinux.lds.S and vmlinux.lds.h for laying out the sections.  However,
it wasn't made writable for modules, which rely on the ELF section
header flags.

Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 19d436268d ("debug: Add _ONCE() logic to report_bug()")
Link: http://lkml.kernel.org/r/a53b04235a65478dd9afc51f5b329fdc65c84364.1500095401.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-20 12:31:04 +02:00
Michael Ellerman
029d9252b1 powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y
Currently even with STRICT_KERNEL_RWX we leave the __init text marked
executable after init, which is bad.

Add a hook to mark it NX (no-execute) before we free it, and implement
it for radix and hash.

Note that we use __init_end as the end address, not _einittext,
because overlaps_kernel_text() uses __init_end, because there are
additional executable sections other than .init.text between
__init_begin and __init_end.

Tested on radix and hash with:

  0:mon> p $__init_begin
  *** 400 exception occurred

Fixes: 1e0fc9d1eb ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-18 19:54:24 +10:00
Gleb Fotengauer-Malinovskiy
c632517923 tty: Fix TIOCGPTPEER ioctl definition
This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag
because it doesn't copy anything from/to userspace to access the
argument.

Fixes: 54ebbfb160 ("tty: add TIOCGPTPEER ioctl")
Signed-off-by: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Acked-by: Aleksa Sarai <asarai@suse.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-17 17:04:41 +02:00
Linus Torvalds
89cbec71fe Merge branch 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uacess-unaligned removal from Al Viro:
 "That stuff had just one user, and an exotic one, at that - binfmt_flat
  on arm and m68k"

* 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  kill {__,}{get,put}_user_unaligned()
  binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail
2017-07-15 11:17:52 -07:00
Linus Torvalds
deed9deb62 powerpc fixes for 4.13 #2
Nothing that really stands out, just a bunch of fixes that have come in in the
 last couple of weeks.
 
 None of these are actually fixes for code that is new in 4.13. It's roughly half
 older bugs, with fixes going to stable, and half fixes/updates for Power9.
 
 Thanks to:
   Aneesh Kumar K.V, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
   Madhavan Srinivasan, Michael Neuling, Nicholas Piggin, Oliver O'Halloran.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZaJKuAAoJEFHr6jzI4aWAojsQAI/Mnu3T9XCPbcuWJVHiNZu2
 CubNKFb9H+HdjRZlgdT7dpx0XITCNWziQmBpQZO+w1kwa3s5uv5qNwj8I30HyTFH
 qAArMq2Yb7GIW4PR8IpbSdCEWZii4YcIiLYk3kiEMyC2UIWeznBq+j79jFyITd9T
 beevhLLq7dMoLO8T2VRSdp4YhdLKNYmgjQG1YFh3YIh8s/tyPQ0OsDaxNk5z8wgt
 /htWiVQ2X/xWRHBnVKT4olc97pXqTvtaCdBWjD/fKNZK50YIwAjr2cJVoXpJidAW
 POwwMto7rjSJDYpE6IgqQnhz0VqRTNB/4QkyaeHJkt6BPhAaafkrHZYtDgc2GaUJ
 G96J6xT9bwHn8Jz1tpsxSgSA9A2GfBhdOUKMpIUVn1l9Q1ELPA0dJzzagLhgRDRG
 w97UQ+CJqvRNfJkRHcuJbNqP5i++6yOHcHGB8QLkmkSYVcCm9Hj/fMXj/DJ5tHt/
 c9t6uw25eqI7raFcxfh09+QTX8VDV96fa+GTo1HnCH71nGG2GqCoJFXyb3HEW64s
 gD7uOPOlFlqfpuGDmV1kmdwH0R15EIJb2b6QsDssrcHEg5fA6z/xcAcV839c9y47
 U8tcBiqFF0vWgB1QljTDrKuNsjg8dIeZfkSekBGD0WuBVLSADBl7f3b2k3BZB5H3
 WxosyR4ufSVfw53HaDsb
 =6eIT
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Nothing that really stands out, just a bunch of fixes that have come
  in in the last couple of weeks.

  None of these are actually fixes for code that is new in 4.13. It's
  roughly half older bugs, with fixes going to stable, and half
  fixes/updates for Power9.

  Thanks to: Aneesh Kumar K.V, Anton Blanchard, Balbir Singh, Benjamin
  Herrenschmidt, Madhavan Srinivasan, Michael Neuling, Nicholas Piggin,
  Oliver O'Halloran"

* tag 'powerpc-4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64: Fix atomic64_inc_not_zero() to return an int
  powerpc: Fix emulation of mfocrf in emulate_step()
  powerpc: Fix emulation of mcrf in emulate_step()
  powerpc/perf: Add POWER9 alternate PM_RUN_CYC and PM_RUN_INST_CMPL events
  powerpc/perf: Fix SDAR_MODE value for continous sampling on Power9
  powerpc/asm: Mark cr0 as clobbered in mftb()
  powerpc/powernv: Fix local TLB flush for boot and MCE on POWER9
  powerpc/mm/radix: Synchronize updates to the process table
  powerpc/mm/radix: Properly clear process table entry
  powerpc/powernv: Tell OPAL about our MMU mode on POWER9
  powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR
2017-07-14 15:33:15 -07:00
Michal Hocko
dcda9b0471 mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
the page allocator.  This has been true but only for allocations
requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has been always
ignored for smaller sizes.  This is a bit unfortunate because there is
no way to express the same semantic for those requests and they are
considered too important to fail so they might end up looping in the
page allocator for ever, similarly to GFP_NOFAIL requests.

Now that the whole tree has been cleaned up and accidental or misled
usage of __GFP_REPEAT flag has been removed for !costly requests we can
give the original flag a better name and more importantly a more useful
semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
that the allocator would try really hard but there is no promise of a
success.  This will work independent of the order and overrides the
default allocator behavior.  Page allocator users have several levels of
guarantee vs.  cost options (take GFP_KERNEL as an example)

 - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
   attempt to free memory at all. The most light weight mode which even
   doesn't kick the background reclaim. Should be used carefully because
   it might deplete the memory and the next user might hit the more
   aggressive reclaim

 - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
   allocation without any attempt to free memory from the current
   context but can wake kswapd to reclaim memory if the zone is below
   the low watermark. Can be used from either atomic contexts or when
   the request is a performance optimization and there is another
   fallback for a slow path.

 - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
   non sleeping allocation with an expensive fallback so it can access
   some portion of memory reserves. Usually used from interrupt/bh
   context with an expensive slow path fallback.

 - GFP_KERNEL - both background and direct reclaim are allowed and the
   _default_ page allocator behavior is used. That means that !costly
   allocation requests are basically nofail but there is no guarantee of
   that behavior so failures have to be checked properly by callers
   (e.g. OOM killer victim is allowed to fail currently).

 - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
   and all allocation requests fail early rather than cause disruptive
   reclaim (one round of reclaim in this implementation). The OOM killer
   is not invoked.

 - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
   behavior and all allocation requests try really hard. The request
   will fail if the reclaim cannot make any progress. The OOM killer
   won't be triggered.

 - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
   and all allocation requests will loop endlessly until they succeed.
   This might be really dangerous especially for larger orders.

Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
because they already had their semantic.  No new users are added.
__alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
there is no progress and we have already passed the OOM point.

This means that all the reclaim opportunities have been exhausted except
the most disruptive one (the OOM killer) and a user defined fallback
behavior is more sensible than keep retrying in the page allocator.

[akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
[mhocko@suse.com: semantic fix]
  Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
[mhocko@kernel.org: address other thing spotted by Vlastimil]
  Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alex Belits <alex.belits@cavium.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:03 -07:00
Nicholas Piggin
2104180a53 powerpc/64s: implement arch-specific hardlockup watchdog
Implement an arch-speicfic watchdog rather than use the perf-based
hardlockup detector.

The new watchdog takes the soft-NMI directly, rather than going through
perf.  Perf interrupts are to be made maskable in future, so that would
prevent the perf detector from working in those regions.

Additionally, implement a SMP based detector where all CPUs watch one
another by pinging a shared cpumask.  This is because powerpc Book3S
does not have a true periodic local NMI, but some platforms do implement
a true NMI IPI.

If a CPU is stuck with interrupts hard disabled, the soft-NMI watchdog
does not work, but the SMP watchdog will.  Even on platforms without a
true NMI IPI to get a good trace from the stuck CPU, other CPUs will
notice the lockup sufficiently to report it and panic.

[npiggin@gmail.com: honor watchdog disable at boot/hotplug]
  Link: http://lkml.kernel.org/r/20170621001346.5bb337c9@roar.ozlabs.ibm.com
[npiggin@gmail.com: fix false positive warning at CPU unplug]
  Link: http://lkml.kernel.org/r/20170630080740.20766-1-npiggin@gmail.com
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20170616065715.18390-6-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Tested-by: Babu Moger <babu.moger@oracle.com>	[sparc]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:02 -07:00
Michael Ellerman
01e6a61ace powerpc/64: Fix atomic64_inc_not_zero() to return an int
Although it's not documented anywhere, there is an expectation that
atomic64_inc_not_zero() returns a result which fits in an int. This is
the behaviour implemented on all arches except powerpc.

This has caused at least one bug in practice, in the percpu-refcount
code, where the long result from our atomic64_inc_not_zero() was
truncated to an int leading to lost references and stuck systems. That
was worked around in that code in commit 966d2b04e0 ("percpu-refcount:
fix reference leak during percpu-atomic transition").

To the best of my grepping abilities there are no other callers
in-tree which truncate the value, but we should fix it anyway. Because
the breakage is subtle and potentially very harmful I'm also tagging
it for stable.

Code generation is largely unaffected because in most cases the
callers are just using the result for a test anyway. In particular the
case of fget() that was mentioned in commit a6cf7ed511
("powerpc/atomic: Implement atomic*_inc_not_zero") generates exactly
the same code.

Fixes: a6cf7ed511 ("powerpc/atomic: Implement atomic*_inc_not_zero")
Cc: stable@vger.kernel.org # v3.4
Noticed-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-12 21:49:55 +10:00
Oliver O'Halloran
2400fd822f powerpc/asm: Mark cr0 as clobbered in mftb()
The workaround for the CELL timebase bug does not correctly mark cr0 as
being clobbered. This means GCC doesn't know that the asm block changes cr0 and
might leave the result of an unrelated comparison in cr0 across the block, which
we then trash, leading to basically random behaviour.

Fixes: 859deea949 ("[POWERPC] Cell timebase bug workaround")
Cc: stable@vger.kernel.org # v2.6.19+
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Tweak change log and flag for stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-11 12:53:53 +10:00
Kees Cook
47ebb09d54 powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
Now that explicitly executed loaders are loaded in the mmap region, we
have more freedom to decide where we position PIE binaries in the
address space to avoid possible collisions with mmap or stack regions.

For 64-bit, align to 4GB to allow runtimes to use the entire 32-bit
address space for 32-bit pointers.  On 32-bit use 4MB, which is the
traditional x86 minimum load location, likely to avoid historically
requiring a 4MB page table entry when only a portion of the first 4MB
would be used (since the NULL address is avoided).

Link: http://lkml.kernel.org/r/1498154792-49952-4-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Pratyush Anand <panand@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-10 16:32:36 -07:00
Benjamin Herrenschmidt
1c0eaf0f56 powerpc/powernv: Tell OPAL about our MMU mode on POWER9
That will allow OPAL to configure the CPU in an optimal way.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-10 21:12:27 +10:00
Linus Torvalds
d691b7e7d1 powerpc updates for 4.13
Highlights include:
 
  - Support for STRICT_KERNEL_RWX on 64-bit server CPUs.
 
  - Platform support for FSP2 (476fpe) board
 
  - Enable ZONE_DEVICE on 64-bit server CPUs.
 
  - Generic & powerpc spin loop primitives to optimise busy waiting
 
  - Convert VDSO update function to use new update_vsyscall() interface
 
  - Optimisations to hypercall/syscall/context-switch paths
 
  - Improvements to the CPU idle code on Power8 and Power9.
 
 As well as many other fixes and improvements.
 
 Thanks to:
   Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman Khandual, Anton
   Blanchard, Balbir Singh, Benjamin Herrenschmidt, Christophe Leroy, Christophe
   Lombard, Colin Ian King, Dan Carpenter, Gautham R. Shenoy, Hari Bathini, Ian
   Munsie, Ivan Mikhaylov, Javier Martinez Canillas, Madhavan Srinivasan,
   Masahiro Yamada, Matt Brown, Michael Neuling, Michal Suchanek, Murilo
   Opsfelder Araujo, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul
   Mackerras, Pavel Machek, Russell Currey, Santosh Sivaraj, Stephen Rothwell,
   Thiago Jung Bauermann, Yang Li.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZXyPCAAoJEFHr6jzI4aWAI9QQAISf2x5y//cqCi4ISyQB5pTq
 KLS/yQajNkQOw7c0fzBZOaH5Xd/SJ6AcKWDg8yDlpDR3+sRRsr98iIRECgKS5I7/
 DxD9ywcbSoMXFQQo1ZMCp5CeuMUIJRtugBnUQM+JXCSUCPbznCY5DchDTLyTBTpO
 MeMVhI//JxthhoOMA9MudiEGaYCU9ho442Z4OJUSiLUv8WRbvQX9pTqoc4vx1fxA
 BWf2mflztBVcIfKIyxIIIlDLukkMzix6gSYPMCbC7lzkbnU7JSqKiheJXjo1gJS2
 ePHKDxeNR2/QP0g/j3aT/MR1uTt9MaNBSX3gANE1xQ9OoJ8m1sOtCO4gNbSdLWae
 eXhDnoiEp930DRZOeEioOItuWWoxFaMyYk3BMmRKV4QNdYL3y3TRVeL2/XmRGqYL
 Lxz4IY/x5TteFEJNGcRX90uizNSi8AaEXPF16pUk8Ctt6eH3ZSwPMv2fHeYVCMr0
 KFlKHyaPEKEoztyzLcUR6u9QB56yxDN58bvLpd32AeHvKhqyxFoySy59x9bZbatn
 B2y8mmDItg860e0tIG6jrtplpOVvL8i5jla5RWFVoQDuxxrLAds3vG9JZQs+eRzx
 Fiic93bqeUAS6RzhXbJ6QQJYIyhE7yqpcgv7ME1W87SPef3HPBk9xlp3yIDwdA2z
 bBDvrRnvupusz8qCWrxe
 =w8rj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights include:

   - Support for STRICT_KERNEL_RWX on 64-bit server CPUs.

   - Platform support for FSP2 (476fpe) board

   - Enable ZONE_DEVICE on 64-bit server CPUs.

   - Generic & powerpc spin loop primitives to optimise busy waiting

   - Convert VDSO update function to use new update_vsyscall() interface

   - Optimisations to hypercall/syscall/context-switch paths

   - Improvements to the CPU idle code on Power8 and Power9.

  As well as many other fixes and improvements.

  Thanks to: Akshay Adiga, Andrew Donnellan, Andrew Jeffery, Anshuman
  Khandual, Anton Blanchard, Balbir Singh, Benjamin Herrenschmidt,
  Christophe Leroy, Christophe Lombard, Colin Ian King, Dan Carpenter,
  Gautham R. Shenoy, Hari Bathini, Ian Munsie, Ivan Mikhaylov, Javier
  Martinez Canillas, Madhavan Srinivasan, Masahiro Yamada, Matt Brown,
  Michael Neuling, Michal Suchanek, Murilo Opsfelder Araujo, Naveen N.
  Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pavel Machek,
  Russell Currey, Santosh Sivaraj, Stephen Rothwell, Thiago Jung
  Bauermann, Yang Li"

* tag 'powerpc-4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (158 commits)
  powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs
  powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix
  powerpc/mm/hash: Implement mark_rodata_ro() for hash
  powerpc/vmlinux.lds: Align __init_begin to 16M
  powerpc/lib/code-patching: Use alternate map for patch_instruction()
  powerpc/xmon: Add patch_instruction() support for xmon
  powerpc/kprobes/optprobes: Use patch_instruction()
  powerpc/kprobes: Move kprobes over to patch_instruction()
  powerpc/mm/radix: Fix execute permissions for interrupt_vectors
  powerpc/pseries: Fix passing of pp0 in updatepp() and updateboltedpp()
  powerpc/64s: Blacklist rtas entry/exit from kprobes
  powerpc/64s: Blacklist functions invoked on a trap
  powerpc/64s: Un-blacklist system_call() from kprobes
  powerpc/64s: Move system_call() symbol to just after setting MSR_EE
  powerpc/64s: Blacklist system_call() and system_call_common() from kprobes
  powerpc/64s: Convert .L__replay_interrupt_return to a local label
  powerpc64/elfv1: Only dereference function descriptor for non-text symbols
  cxl: Export library to support IBM XSL
  powerpc/dts: Use #include "..." to include local DT
  powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8
  ...
2017-07-07 13:55:45 -07:00
Linus Torvalds
9f45efb928 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few hotfixes

 - various misc updates

 - ocfs2 updates

 - most of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (108 commits)
  mm, memory_hotplug: move movable_node to the hotplug proper
  mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
  mm, memory_hotplug: drop artificial restriction on online/offline
  mm: memcontrol: account slab stats per lruvec
  mm: memcontrol: per-lruvec stats infrastructure
  mm: memcontrol: use generic mod_memcg_page_state for kmem pages
  mm: memcontrol: use the node-native slab memory counters
  mm: vmstat: move slab statistics from zone to node counters
  mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()
  mm/zswap.c: improve a size determination in zswap_frontswap_init()
  mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create()
  mm/swapfile.c: sort swap entries before free
  mm/oom_kill: count global and memory cgroup oom kills
  mm: per-cgroup memory reclaim stats
  mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects
  mm: kmemleak: factor object reference updating out of scan_block()
  mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures
  mm, mempolicy: don't check cpuset seqlock where it doesn't matter
  mm, cpuset: always use seqlock when changing task's nodemask
  mm, mempolicy: simplify rebinding mempolicies when updating cpusets
  ...
2017-07-06 22:27:08 -07:00
Linus Torvalds
dc502142b6 Merge branch 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull user access str* updates from Al Viro:
 "uaccess str...() dead code removal"

* 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  s390 keyboard.c: don't open-code strndup_user()
  mips: get rid of unused __strnlen_user()
  get rid of unused __strncpy_from_user() instances
  kill strlen_user()
2017-07-06 22:07:44 -07:00
Linus Torvalds
c856863988 Merge branch 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc compat stuff updates from Al Viro:
 "This part is basically untangling various compat stuff. Compat
  syscalls moved to their native counterparts, getting rid of quite a
  bit of double-copying and/or set_fs() uses. A lot of field-by-field
  copyin/copyout killed off.

   - kernel/compat.c is much closer to containing just the
     copyin/copyout of compat structs. Not all compat syscalls are gone
     from it yet, but it's getting there.

   - ipc/compat_mq.c killed off completely.

   - block/compat_ioctl.c cleaned up; floppy compat ioctls moved to
     drivers/block/floppy.c where they belong. Yes, there are several
     drivers that implement some of the same ioctls. Some are m68k and
     one is 32bit-only pmac. drivers/block/floppy.c is the only one in
     that bunch that can be built on biarch"

* 'misc.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mqueue: move compat syscalls to native ones
  usbdevfs: get rid of field-by-field copyin
  compat_hdio_ioctl: get rid of set_fs()
  take floppy compat ioctls to sodding floppy.c
  ipmi: get rid of field-by-field __get_user()
  ipmi: get COMPAT_IPMICTL_RECEIVE_MSG in sync with the native one
  rt_sigtimedwait(): move compat to native
  select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()
  put_compat_rusage(): switch to copy_to_user()
  sigpending(): move compat to native
  getrlimit()/setrlimit(): move compat to native
  times(2): move compat to native
  compat_{get,put}_bitmap(): use unsafe_{get,put}_user()
  fb_get_fscreeninfo(): don't bother with do_fb_ioctl()
  do_sigaltstack(): lift copying to/from userland into callers
  take compat_sys_old_getrlimit() to native syscall
  trim __ARCH_WANT_SYS_OLD_GETRLIMIT
2017-07-06 20:57:13 -07:00
Linus Torvalds
f72e24a124 This is the first pull request for the new dma-mapping subsystem
In this new subsystem we'll try to properly maintain all the generic
 code related to dma-mapping, and will further consolidate arch code
 into common helpers.
 
 This pull request contains:
 
  - removal of the DMA_ERROR_CODE macro, replacing it with calls
    to ->mapping_error so that the dma_map_ops instances are
    more self contained and can be shared across architectures (me)
  - removal of the ->set_dma_mask method, which duplicates the
    ->dma_capable one in terms of functionality, but requires more
    duplicate code.
  - various updates for the coherent dma pool and related arm code
    (Vladimir)
  - various smaller cleanups (me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlldmw0LHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOiKA/+Ln1mFLSf3nfTzIHa24Bbk8ZTGr0B8TD4Vmyyt8iG
 oO3AeaTLn3d6ugbH/uih/tPz8PuyXsdiTC1rI/ejDMiwMTSjW6phSiIHGcStSR9X
 VFNhmMFacp7QpUpvxceV0XZYKDViAoQgHeGdp3l+K5h/v4AYePV/v/5RjQPaEyOh
 YLbCzETO+24mRWdJxdAqtTW4ovYhzj6XsiJ+pAjlV0+SWU6m5L5E+VAPNi1vqv1H
 1O2KeCFvVYEpcnfL3qnkw2timcjmfCfeFAd9mCUAc8mSRBfs3QgDTKw3XdHdtRml
 LU2WuA5cpMrOdBO4mVra2plo8E2szvpB1OZZXoKKdCpK3VGwVpVHcTvClK2Ks/3B
 GDLieroEQNu2ZIUIdWXf/g2x6le3BcC9MmpkAhnGPqCZ7skaIBO5Cjpxm0zTJAPl
 PPY3CMBBEktAvys6DcudOYGixNjKUuAm5lnfpcfTEklFdG0AjhdK/jZOplAFA6w4
 LCiy0rGHM8ZbVAaFxbYoFCqgcjnv6EjSiqkJxVI4fu/Q7v9YXfdPnEmE0PJwCVo5
 +i7aCLgrYshTdHr/F3e5EuofHN3TDHwXNJKGh/x97t+6tt326QMvDKX059Kxst7R
 rFukGbrYvG8Y7yXwrSDbusl443ta0Ht7T1oL4YUoJTZp0nScAyEluDTmrH1JVCsT
 R4o=
 =0Fso
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping infrastructure from Christoph Hellwig:
 "This is the first pull request for the new dma-mapping subsystem

  In this new subsystem we'll try to properly maintain all the generic
  code related to dma-mapping, and will further consolidate arch code
  into common helpers.

  This pull request contains:

   - removal of the DMA_ERROR_CODE macro, replacing it with calls to
     ->mapping_error so that the dma_map_ops instances are more self
     contained and can be shared across architectures (me)

   - removal of the ->set_dma_mask method, which duplicates the
     ->dma_capable one in terms of functionality, but requires more
     duplicate code.

   - various updates for the coherent dma pool and related arm code
     (Vladimir)

   - various smaller cleanups (me)"

* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
  ARM: dma-mapping: Remove traces of NOMMU code
  ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
  ARM: NOMMU: Introduce dma operations for noMMU
  drivers: dma-mapping: allow dma_common_mmap() for NOMMU
  drivers: dma-coherent: Introduce default DMA pool
  drivers: dma-coherent: Account dma_pfn_offset when used with device tree
  dma: Take into account dma_pfn_offset
  dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
  dma-mapping: remove dmam_free_noncoherent
  crypto: qat - avoid an uninitialized variable warning
  au1100fb: remove a bogus dma_free_nonconsistent call
  MAINTAINERS: add entry for dma mapping helpers
  powerpc: merge __dma_set_mask into dma_set_mask
  dma-mapping: remove the set_dma_mask method
  powerpc/cell: use the dma_supported method for ops switching
  powerpc/cell: clean up fixed mapping dma_ops initialization
  tile: remove dma_supported and mapping_error methods
  xen-swiotlb: remove xen_swiotlb_set_dma_mask
  arm: implement ->dma_supported instead of ->set_dma_mask
  mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
  ...
2017-07-06 19:20:54 -07:00
Linus Torvalds
c136b84393 PPC:
- Better machine check handling for HV KVM
 - Ability to support guests with threads=2, 4 or 8 on POWER9
 - Fix for a race that could cause delayed recognition of signals
 - Fix for a bug where POWER9 guests could sleep with interrupts pending.
 
 ARM:
 - VCPU request overhaul
 - allow timer and PMU to have their interrupt number selected from userspace
 - workaround for Cavium erratum 30115
 - handling of memory poisonning
 - the usual crop of fixes and cleanups
 
 s390:
 - initial machine check forwarding
 - migration support for the CMMA page hinting information
 - cleanups and fixes
 
 x86:
 - nested VMX bugfixes and improvements
 - more reliable NMI window detection on AMD
 - APIC timer optimizations
 
 Generic:
 - VCPU request overhaul + documentation of common code patterns
 - kvm_stat improvements
 
 There is a small conflict in arch/s390 due to an arch-wide field rename.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZW4XTAAoJEL/70l94x66DkhMH/izpk54KI17PtyQ9VYI2sYeZ
 BWK6Kl886g3ij4pFi3pECqjDJzWaa3ai+vFfzzpJJ8OkCJT5Rv4LxC5ERltVVmR8
 A3T1I/MRktSC0VJLv34daPC2z4Lco/6SPipUpPnL4bE2HATKed4vzoOjQ3tOeGTy
 dwi7TFjKwoVDiM7kPPDRnTHqCe5G5n13sZ49dBe9WeJ7ttJauWqoxhlYosCGNPEj
 g8ZX8+cvcAhVnz5uFL8roqZ8ygNEQq2mgkU18W8ZZKuiuwR0gdsG0gSBFNTdwIMK
 NoreRKMrw0+oLXTIB8SZsoieU6Qi7w3xMAMabe8AJsvYtoersugbOmdxGCr1lsA=
 =OD7H
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "PPC:
   - Better machine check handling for HV KVM
   - Ability to support guests with threads=2, 4 or 8 on POWER9
   - Fix for a race that could cause delayed recognition of signals
   - Fix for a bug where POWER9 guests could sleep with interrupts pending.

  ARM:
   - VCPU request overhaul
   - allow timer and PMU to have their interrupt number selected from userspace
   - workaround for Cavium erratum 30115
   - handling of memory poisonning
   - the usual crop of fixes and cleanups

  s390:
   - initial machine check forwarding
   - migration support for the CMMA page hinting information
   - cleanups and fixes

  x86:
   - nested VMX bugfixes and improvements
   - more reliable NMI window detection on AMD
   - APIC timer optimizations

  Generic:
   - VCPU request overhaul + documentation of common code patterns
   - kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
  Update my email address
  kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
  x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
  kvm: x86: mmu: allow A/D bits to be disabled in an mmu
  x86: kvm: mmu: make spte mmio mask more explicit
  x86: kvm: mmu: dead code thanks to access tracking
  KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
  KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
  KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
  KVM: x86: remove ignored type attribute
  KVM: LAPIC: Fix lapic timer injection delay
  KVM: lapic: reorganize restart_apic_timer
  KVM: lapic: reorganize start_hv_timer
  kvm: nVMX: Check memory operand to INVVPID
  KVM: s390: Inject machine check into the nested guest
  KVM: s390: Inject machine check into the guest
  tools/kvm_stat: add new interactive command 'b'
  tools/kvm_stat: add new command line switch '-i'
  tools/kvm_stat: fix error on interactive command 'g'
  KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
  ...
2017-07-06 18:38:31 -07:00
Aneesh Kumar K.V
40692eb5ee powerpc/mm/hugetlb: add support for 1G huge pages
POWER9 supports hugepages of size 2M and 1G in radix MMU mode.  This
patch enables the usage of 1G page size for hugetlbfs.  This also update
the helper such we can do 1G page allocation at runtime.

We still don't enable 1G page size on DD1 version.  This is to avoid
doing workaround mentioned in commit 6d3a0379eb ("powerpc/mm: Add
radix__tlb_flush_pte_p9_dd1()").

Link: http://lkml.kernel.org/r/1494995292-4443-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-06 16:24:33 -07:00
Linus Torvalds
5518b69b76 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Reasonably busy this cycle, but perhaps not as busy as in the 4.12
  merge window:

   1) Several optimizations for UDP processing under high load from
      Paolo Abeni.

   2) Support pacing internally in TCP when using the sch_fq packet
      scheduler for this is not practical. From Eric Dumazet.

   3) Support mutliple filter chains per qdisc, from Jiri Pirko.

   4) Move to 1ms TCP timestamp clock, from Eric Dumazet.

   5) Add batch dequeueing to vhost_net, from Jason Wang.

   6) Flesh out more completely SCTP checksum offload support, from
      Davide Caratti.

   7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
      Neira Ayuso, and Matthias Schiffer.

   8) Add devlink support to nfp driver, from Simon Horman.

   9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
      Prabhu.

  10) Add stack depth tracking to BPF verifier and use this information
      in the various eBPF JITs. From Alexei Starovoitov.

  11) Support XDP on qed device VFs, from Yuval Mintz.

  12) Introduce BPF PROG ID for better introspection of installed BPF
      programs. From Martin KaFai Lau.

  13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.

  14) For loads, allow narrower accesses in bpf verifier checking, from
      Yonghong Song.

  15) Support MIPS in the BPF selftests and samples infrastructure, the
      MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
      Daney.

  16) Support kernel based TLS, from Dave Watson and others.

  17) Remove completely DST garbage collection, from Wei Wang.

  18) Allow installing TCP MD5 rules using prefixes, from Ivan
      Delalande.

  19) Add XDP support to Intel i40e driver, from Björn Töpel

  20) Add support for TC flower offload in nfp driver, from Simon
      Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
      Kicinski, and Bert van Leeuwen.

  21) IPSEC offloading support in mlx5, from Ilan Tayari.

  22) Add HW PTP support to macb driver, from Rafal Ozieblo.

  23) Networking refcount_t conversions, From Elena Reshetova.

  24) Add sock_ops support to BPF, from Lawrence Brako. This is useful
      for tuning the TCP sockopt settings of a group of applications,
      currently via CGROUPs"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
  net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
  dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
  cxgb4: Support for get_ts_info ethtool method
  cxgb4: Add PTP Hardware Clock (PHC) support
  cxgb4: time stamping interface for PTP
  nfp: default to chained metadata prepend format
  nfp: remove legacy MAC address lookup
  nfp: improve order of interfaces in breakout mode
  net: macb: remove extraneous return when MACB_EXT_DESC is defined
  bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
  bpf: fix return in load_bpf_file
  mpls: fix rtm policy in mpls_getroute
  net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
  net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
  net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
  net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
  ...
2017-07-05 12:31:59 -07:00
Linus Torvalds
9a715cd543 TTY/Serial patches for 4.13-rc1
Here is the large tty/serial patchset for 4.13-rc1.
 
 A lot of tty and serial driver updates are in here, along with some
 fixups for some __get/put_user usages that were reported.  Nothing huge,
 just lots of development by a number of different developers, full
 details in the shortlog.
 
 All of these have been in linux-next for a while.  There will be a merge
 issue with the arm-soc tree in the include/linux/platform_data/atmel.h
 file.  Stephen has sent out a fixup for it, so it shouldn't be that
 difficult to merge.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWVpZ9w8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ylkTgCfV2HhbxIph/aEL1nJmwW64oCXFrMAoK59ZH65
 tBZIosv0d91K1A+mObBT
 =adPL
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial updates from Greg KH:
 "Here is the large tty/serial patchset for 4.13-rc1.

  A lot of tty and serial driver updates are in here, along with some
  fixups for some __get/put_user usages that were reported. Nothing
  huge, just lots of development by a number of different developers,
  full details in the shortlog.

  All of these have been in linux-next for a while"

* tag 'tty-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (71 commits)
  tty: serial: lpuart: add a more accurate baud rate calculation method
  tty: serial: lpuart: add earlycon support for imx7ulp
  tty: serial: lpuart: add imx7ulp support
  dt-bindings: serial: fsl-lpuart: add i.MX7ULP support
  tty: serial: lpuart: add little endian 32 bit register support
  tty: serial: lpuart: refactor lpuart32_{read|write} prototype
  tty: serial: lpuart: introduce lpuart_soc_data to represent SoC property
  serial: imx-serial - move DMA buffer configuration to DT
  serial: imx: Enable RTSD only when needed
  serial: imx: Remove unused members from imx_port struct
  serial: 8250: 8250_omap: Fix race b/w dma completion and RX timeout
  serial: 8250: Fix THRE flag usage for CAP_MINI
  tty/serial: meson_uart: update to stable bindings
  dt-bindings: serial: Add bindings for the Amlogic Meson UARTs
  serial: Delete dead code for CIR serial ports
  serial: sirf: make of_device_ids const
  serial/mpsc: switch to dma_alloc_attrs
  tty: serial: Add Actions Semi Owl UART earlycon
  dt-bindings: serial: Document Actions Semi Owl UARTs
  tty/serial: atmel: make the driver DT only
  ...
2017-07-03 20:04:16 -07:00
Balbir Singh
cd65d69713 powerpc/mm/hash: Implement mark_rodata_ro() for hash
With hash we update the bolted pte to mark it read-only. We rely
on the MMU_FTR_KERNEL_RO to generate the correct permissions
for read-only text. The radix implementation just prints a warning
in this implementation

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Make the warning louder when we don't have MMU_FTR_KERNEL_RO]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-04 11:35:16 +10:00
Linus Torvalds
9a9594efe5 Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP hotplug updates from Thomas Gleixner:
 "This update is primarily a cleanup of the CPU hotplug locking code.

  The hotplug locking mechanism is an open coded RWSEM, which allows
  recursive locking. The main problem with that is the recursive nature
  as it evades the full lockdep coverage and hides potential deadlocks.

  The rework replaces the open coded RWSEM with a percpu RWSEM and
  establishes full lockdep coverage that way.

  The bulk of the changes fix up recursive locking issues and address
  the now fully reported potential deadlocks all over the place. Some of
  these deadlocks have been observed in the RT tree, but on mainline the
  probability was low enough to hide them away."

* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
  cpu/hotplug: Constify attribute_group structures
  powerpc: Only obtain cpu_hotplug_lock if called by rtasd
  ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
  cpu/hotplug: Remove unused check_for_tasks() function
  perf/core: Don't release cred_guard_mutex if not taken
  cpuhotplug: Link lock stacks for hotplug callbacks
  acpi/processor: Prevent cpu hotplug deadlock
  sched: Provide is_percpu_thread() helper
  cpu/hotplug: Convert hotplug locking to percpu rwsem
  s390: Prevent hotplug rwsem recursion
  arm: Prevent hotplug rwsem recursion
  arm64: Prevent cpu hotplug rwsem recursion
  kprobes: Cure hotplug lock ordering issues
  jump_label: Reorder hotplug lock and jump_label_lock
  perf/tracing/cpuhotplug: Fix locking order
  ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
  PCI: Replace the racy recursion prevention
  PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
  perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
  x86/perf: Drop EXPORT of perf_check_microcode
  ...
2017-07-03 18:08:06 -07:00
Al Viro
3170d8d226 kill {__,}{get,put}_user_unaligned()
no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-07-03 18:44:22 -04:00
Naveen N. Rao
83e840c770 powerpc64/elfv1: Only dereference function descriptor for non-text symbols
Currently, we assume that the function pointer we receive in
ppc_function_entry() points to a function descriptor. However, this is
not always the case. In particular, assembly symbols without the right
annotation do not have an associated function descriptor. Some of these
symbols are added to the kprobe blacklist using _ASM_NOKPROBE_SYMBOL().

When such addresses are subsequently processed through
arch_deref_entry_point() in populate_kprobe_blacklist(), we see the
below errors during bootup:
    [    0.663963] Failed to find blacklist at 7d9b02a648029b6c
    [    0.663970] Failed to find blacklist at a14d03d0394a0001
    [    0.663972] Failed to find blacklist at 7d5302a6f94d0388
    [    0.663973] Failed to find blacklist at 48027d11e8610178
    [    0.663974] Failed to find blacklist at f8010070f8410080
    [    0.663976] Failed to find blacklist at 386100704801f89d
    [    0.663977] Failed to find blacklist at 7d5302a6f94d00b0

Fix this by checking if the function pointer we receive in
ppc_function_entry() already points to kernel text. If so, we just
return it as is. If not, we assume that this is a function descriptor
and proceed to dereference it.

Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:08:50 +10:00
Christophe Lombard
3ced8d7300 cxl: Export library to support IBM XSL
This patch exports a in-kernel 'library' API which can be called by
other drivers to help interacting with an IBM XSL on a POWER9 system.

The XSL (Translation Service Layer) is a stripped down version of the
PSL (Power Service Layer) used in some cards such as the Mellanox CX5.
Like the PSL, it implements the CAIA architecture, but has a number
of differences, mostly in it's implementation dependent registers.

The XSL also uses a special DMA cxl mode, which uses a slightly
different init sequence for the CAPP and PHB.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-03 23:07:03 +10:00
Michael Ellerman
218ea31039 Merge branch 'fixes' into next
Merge our fixes branch, a few of them are tripping people up while
working on top of next, and we also have a dependency between the CXL
fixes and new CXL code we want to merge into next.
2017-07-03 23:05:43 +10:00
Paolo Bonzini
8a53e7e572 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts
  pending.
2017-07-03 10:41:59 +02:00
Oliver O'Halloran
ebd3119793 powerpc/mm: Add devmap support for ppc64
Add support for the devmap bit on PTEs and PMDs for PPC64 Book3S.  This
is used to differentiate device backed memory from transparent huge
pages since they are handled in more or less the same manner by the core
mm code.

Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:28 +10:00
Nicholas Piggin
4e287e655e powerpc: use spin loop primitives in some functions
Use the different spin loop primitives in some simple powerpc
spin loops, including those which will spin as a common case.

This will help to test the spin loop primitives before more
conversions are done.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add some includes of <linux/processor.h>]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:24 +10:00
Nicholas Piggin
ede8e2bbb0 powerpc/64: implement spin loop primitives
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-02 20:40:17 +10:00
Paul Mackerras
898b25b202 KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
Since commit b009031f74 ("KVM: PPC: Book3S HV: Take out virtual
core piggybacking code", 2016-09-15), we only have at most one
vcore per subcore.  Previously, the fact that there might be more
than one vcore per subcore meant that we had the notion of a
"master vcore", which was the vcore that controlled thread 0 of
the subcore.  We also needed a list per subcore in the core_info
struct to record which vcores belonged to each subcore.  Now that
there can only be one vcore in the subcore, we can replace the
list with a simple pointer and get rid of the notion of the
master vcore (and in fact treat every vcore as a master vcore).

We can also get rid of the subcore_vm[] field in the core_info
struct since it is never read.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-07-01 18:59:01 +10:00
Linus Torvalds
b4df2e3537 powerpc fixes for 4.12 #8
Two fixes for code we merged this cycle:
 
  - cxl: Fixes for Coherent Accelerator Interface Architecture 2.0
  - Avoid miscompilation w/GCC 4.6.3 on 32-bit - don't inline copy_to/from_user()
 
 Thanks to:
   Al Viro, Larry Finger, Christophe Lombard.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZViCDAAoJEFHr6jzI4aWA5eMQAMBGbDU3k+OHT2kuZg1Obnyo
 HADdBg1ZcCZ4MI0xOTiFb4ETsUcXcazGle6N1z/RjNYLA0KJobV5b+t/i+ybGtz2
 0a+35j7G7i+rxBMkWFfGUgZewwWPZkOry4BmXyQHHHeVnEOyF6jj/pbm22oedf1o
 NCogUbWKhxm2YqYzftfur09dG00T59mAKQ7BeHMkhR3p6lbOD/sMZPiquXO2cV2C
 78buxYCl1SqAx2yyPrmSBbVxUF5+PKvANaniQL+jYe7fC9GVNUoJJ5Dh0NCgvqKJ
 r9u8/1K9hSCAZDGhOWePPCFnqLH4hnyFN8m8S94tMNFnK3VDhoy+45GJ+7x6RCGH
 7Xvi6qef6n2jqrj7pggsPu3NKGtd8mmBVcPOxjdyPI6R2QZeRbdrx7NyvNB3xDDF
 rUsju/aHjJJPKDIq4hbDJTMSWQMe5+Bb8aEKOYupEQ/X//MFqz8gukVcQCJNU6Pn
 0TbOE+FUSgICY8IB2rI7UBa+rKKM8VDcg1rz0YYSCGfDOccMfq9IxAlihe4y3fpz
 KzuKnkCQBVT6+Q6AayqZlqVttWU+eIG/cm9dHS9bPXDKb0XyoOSl0ZcytflmlFR9
 xsZxD7/69DoRpdV0t0kpiLK9lWd3QhPaSukhn/aoUGXsFcMeJTYpsinuvVNi3hFh
 ldhIKrQbvY7k0s7xGOCi
 =Yq9i
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Hopefully the last two powerpc fixes for 4.12.

  The CXL one is larger than I'd usually send at rc7, but it fixes new
  code this cycle, so better to have it working for the release. It was
  actually sent a few weeks back but got blocked in testing behind
  another fix that was causing issues.

  We are still tracking one crash in v4.12-rc7, but only one person has
  reproduced it and the commit identified by bisect doesn't touch any of
  the relevant code, so I think it's 50/50 whether that commit is
  actually the problem or it's some code layout / toolchain issue.

  Two fixes for code we merged this cycle:

   - cxl: Fixes for Coherent Accelerator Interface Architecture 2.0

   - Avoid miscompilation w/GCC 4.6.3 on 32-bit - don't inline
     copy_to/from_user()

  Thanks to Al Viro, Larry Finger, Christophe Lombard"

* tag 'powerpc-4.12-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Avoid miscompilation w/GCC 4.6.3 - don't inline copy_to/from_user()
  cxl: Fixes for Coherent Accelerator Interface Architecture 2.0
2017-06-30 10:55:34 -07:00
David S. Miller
b079115937 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
A set of overlapping changes in macvlan and the rocker
driver, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30 12:43:08 -04:00
Paolo Bonzini
04a7ea04d5 KVM/ARM updates for 4.13
- vcpu request overhaul
 - allow timer and PMU to have their interrupt number
   selected from userspace
 - workaround for Cavium erratum 30115
 - handling of memory poisonning
 - the usual crop of fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAllWCM0VHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDjJ0QAI16x6+trKhH31lTSYekYfqm4hZ2
 Fp7IbALW9KNCaY35tZov2Zuh99qGRduxTh7ewqhKpON8kkU+UKj0F7zH22+vfN4m
 yas/+uNr8R9VLyvea4ysPsgx8Q8v1Ix9setohHYNZIL9/klVqtaHpYvArHVF/mzq
 p2j/NxRS2dlp9r2TtoMRMhA05u6r0wolhUuh+z9v2ipib0gfOBIG24jsqCTEcD9n
 5A/cVd+ztYshkrV95h3y9peahwt3zOA4QBGzrQ2K25jp0s54nqhmC7JTNSa8dtar
 YGW2MuAMoIFTwCFAlpwCzrwpOJFzF3Q6A8bOxei2fjclzjPMgT1xQxuhOoe4ntFa
 lTPxSHalm5W6dFTW90YSo2DBcPe+N7sQkhjR0cCeY3GYsOFhXMLTlOl5Pt1YK1or
 +3FAI74tFRKvVmb9mhZeGTvuzhDgRvtf3Qq5rjwlGzKc2BBOEgtMyj/Wgwo4N6Dz
 IjOnoRaUGELoBCWoTorMxLpsPBdPVSUxNyJTdAhqZ/ZtT1xqjhFNLZcrVWmOTzDM
 1cav+jZkla4sLmJSNDD54aCSvvtPHis0nZn9PRlh12xgOyYiAVx4K++MNuWP0P37
 hbh1gbPT+FcoVxPurUsX/pjNlTucPZcBwFytZDQlpwtPBpEFzJiImLYe/PldRb0f
 9WQOH1Y1+q14MF+N
 =6hNK
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM updates for 4.13

- vcpu request overhaul
- allow timer and PMU to have their interrupt number
  selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups

Conflicts:
	arch/s390/include/asm/kvm_host.h
2017-06-30 12:38:26 +02:00
Tobias Klauser
6474924e2b arch: remove unused macro/function thread_saved_pc()
The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d55977 ("sched/core: Remove pointless printout in
sched_show_task()").  Remove the implementations as well.

Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-06-28 16:13:57 -07:00
Christoph Hellwig
a9a7b06f58 powerpc: merge __dma_set_mask into dma_set_mask
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-06-28 06:54:55 -07:00
Christoph Hellwig
6009faa43f powerpc: implement ->mapping_error
DMA_ERROR_CODE is going to go away, so don't rely on it.  Instead
define a ->mapping_error method for all IOMMU based dma operation
instances.  The direct ops don't ever return an error and don't
need a ->mapping_error method.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 06:54:33 -07:00
Hari Bathini
eae0dfcc44 powerpc/fadump: avoid holes in boot memory area when fadump is registered
To register fadump, boot memory area - the size of low memory chunk that
is required for a kernel to boot successfully when booted with restricted
memory, is assumed to have no holes. But this memory area is currently
not protected from hot-remove operations. So, fadump could fail to
re-register after a memory hot-remove operation, if memory is removed
from boot memory area. To avoid this, ensure that memory from boot
memory area is not hot-removed when fadump is registered.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-28 13:08:09 +10:00
Russell Currey
a4b48ba904 powerpc/powernv/pci: Add support for PHB4 diagnostics
As with P7IOC and PHB3, add kernel-side support for decoding and printing
diagnostic data for PHB4.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-27 12:14:27 +10:00
Michael Ellerman
d6bd8194e2 powerpc/32: Avoid miscompilation w/GCC 4.6.3 - don't inline copy_to/from_user()
Larry Finger reported that his Powerbook G4 was no longer booting with v4.12-rc,
userspace was up but giving weird errors such as:

  udevd[64]: starting version 175
  udevd[64]: Unable to receive ctrl message: Bad address.
  modprobe: chdir(4.12-rc1): No such file or directory

He bisected the problem to commit 3448890c32 ("powerpc: get rid of zeroing,
switch to RAW_COPY_USER").

Al identified that the problem is actually a miscompilation by GCC 4.6.3, which
is exposed by the above commit.

Al also pointed out that inlining copy_to/from_user() is probably of little or
no benefit, which is correct. Using Anton's copy_to_user benchmark, with a
pathological single byte copy, we see a small increase in performance
by *removing* inlining:

  Before (inlined):
  # time ./copy_to_user -w -l 1 -i 10000000	( x 3 )
  real	0m22.063s
  real	0m22.059s
  real	0m22.076s

  After:
  # time ./copy_to_user -w -l 1 -i 10000000	( x 3 )
  real	0m21.325s
  real	0m21.299s
  real	0m21.364s

So as a small performance improvement and to avoid the miscompilation, drop
inlining copy_to/from_user() on 32-bit.

Fixes: 3448890c32 ("powerpc: get rid of zeroing, switch to RAW_COPY_USER")
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-26 23:25:08 +10:00
Balbir Singh
0428491cba powerpc/mm: Trace tlbie(l) instructions
Add a trace point for tlbie(l) (Translation Lookaside Buffer Invalidate
Entry (Local)) instructions.

The tlbie instruction has changed over the years, so not all versions
accept the same operands. Use the ISA v3 field operands because they are
the most verbose, we may change them in future.

Example output:

  qemu-system-ppc-5371  [016]  1412.369519: tlbie:
  	tlbie with lpid 0, local 1, rb=67bd8900174c11c1, rs=0, ric=0 prs=0 r=0

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Add some missing trace_tlbie()s, reword change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-23 21:14:49 +10:00
Thiago Jung Bauermann
3e401f7a2e powerpc: Only obtain cpu_hotplug_lock if called by rtasd
Calling arch_update_cpu_topology from a CPU hotplug state machine callback
hits a deadlock because the function tries to get a read lock on
cpu_hotplug_lock while the state machine still holds a write lock on it.

Since all callers of arch_update_cpu_topology except rtasd already hold
cpu_hotplug_lock, this patch changes the function to use
stop_machine_cpuslocked and creates a separate function for rtasd which
still tries to obtain the lock.

Michael Bringmann investigated the bug and provided a detailed analysis
of the deadlock on this previous RFC for an alternate solution:

Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: John Allen <jallen@linux.vnet.ibm.com>
Cc: Michael Bringmann <mwb@linux.vnet.ibm.com>
Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1497996510-4032-1-git-send-email-bauerman@linux.vnet.ibm.com
Link: https://patchwork.ozlabs.org/patch/771293/
2017-06-23 09:32:11 +02:00
Aravinda Prasad
e20bbd3d8d KVM: PPC: Book3S HV: Exit guest upon MCE when FWNMI capability is enabled
Enhance KVM to cause a guest exit with KVM_EXIT_NMI
exit reason upon a machine check exception (MCE) in
the guest address space if the KVM_CAP_PPC_FWNMI
capability is enabled (instead of delivering a 0x200
interrupt to guest). This enables QEMU to build error
log and deliver machine check exception to guest via
guest registered machine check handler.

This approach simplifies the delivery of machine
check exception to guest OS compared to the earlier
approach of KVM directly invoking 0x200 guest interrupt
vector.

This design/approach is based on the feedback for the
QEMU patches to handle machine check exception. Details
of earlier approach of handling machine check exception
in QEMU and related discussions can be found at:

https://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg00813.html

Note:

This patch now directly invokes machine_check_print_event_info()
from kvmppc_handle_exit_hv() to print the event to host console
at the time of guest exit before the exception is passed on to the
guest. Hence, the host-side handling which was performed earlier
via machine_check_fwnmi is removed.

The reasons for this approach is (i) it is not possible
to distinguish whether the exception occurred in the
guest or the host from the pt_regs passed on the
machine_check_exception(). Hence machine_check_exception()
calls panic, instead of passing on the exception to
the guest, if the machine check exception is not
recoverable. (ii) the approach introduced in this
patch gives opportunity to the host kernel to perform
actions in virtual mode before passing on the exception
to the guest. This approach does not require complex
tweaks to machine_check_fwnmi and friends.

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-22 11:24:57 +10:00
David S. Miller
3d09198243 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two entries being added at the same time to the IFLA
policy table, whilst parallel bug fixes to decnet
routing dst handling overlapping with the dst gc removal
in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-21 17:35:22 -04:00
Aravinda Prasad
134764ed6e KVM: PPC: Book3S HV: Add new capability to control MCE behaviour
This introduces a new KVM capability to control how KVM behaves
on machine check exception (MCE) in HV KVM guests.

If this capability has not been enabled, KVM redirects machine check
exceptions to guest's 0x200 vector, if the address in error belongs to
the guest. With this capability enabled, KVM will cause a guest exit
with the exit reason indicating an NMI.

The new capability is required to avoid problems if a new kernel/KVM
is used with an old QEMU, running a guest that doesn't issue
"ibm,nmi-register".  As old QEMU does not understand the NMI exit
type, it treats it as a fatal error.  However, the guest could have
handled the machine check error if the exception was delivered to
guest's 0x200 interrupt vector instead of NMI exit in case of old
QEMU.

[paulus@ozlabs.org - Reworded the commit message to be clearer,
 enable only on HV KVM.]

Signed-off-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-21 13:37:08 +10:00
Nicholas Piggin
8568f1e026 powerpc/64s/paca: EX_CTR is not used with RELOCATABLE=n, remove it
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:02 +10:00
Nicholas Piggin
635942ae53 powerpc/64s/paca: EX_R3 can be merged with EX_DAR
EX_R3 is used only for a small section of the bad stack handler.
Merge it with EX_DAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:01 +10:00
Nicholas Piggin
dbeea1d6b4 powerpc/64s/paca: EX_LR can be merged with EX_DAR
EX_LR is used only for a small section of the SLB miss handler.
Merge it with EX_DAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:01 +10:00
Nicholas Piggin
36670fcf01 powerpc/64s/paca: EX_SRR0 is unused, remove it
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:00 +10:00
Nicholas Piggin
8c38851415 powerpc/64s: Add EX_SIZE definition for paca exception save areas
Rather than open-coding it 4 times.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move __ASSEMBLY__ guards into head-64.h where they're really needed]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-20 22:22:00 +10:00
Nicholas Piggin
b51351e264 powerpc/64s/idle: Branch to handler with virtual mode offset
Have the system reset idle wakeup handlers branched to in real mode
with the 0xc... kernel address applied. This allows simplifications of
avoiding rfid when switching to virtual mode in the wakeup handler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:28 +10:00
Nicholas Piggin
a9af97aa0a powerpc/64s: msgclr when handling doorbell exceptions from system reset
msgsnd doorbell exceptions are cleared when the doorbell interrupt is
taken. However if a doorbell exception causes a system reset interrupt
wake from power saving state, the message is not cleared. Processing
the doorbell from the system reset interrupt requires msgclr to avoid
taking the exception again.

Testing this plus the previous wakup direct patch gives:

                                original         wakeup direct     msgclr
Different threads, same core:   315k/s           264k/s            345k/s
Different cores:                235k/s           242k/s            242k/s

Net speedup is +10% for same core, and +3% for different core.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:27 +10:00
Nicholas Piggin
771d4304d0 powerpc/64s/idle: Process interrupts from system reset wakeup
When the CPU wakes from low power state, it begins at the system reset
interrupt with the exception that caused the wakeup encoded in SRR1.

Today, powernv idle wakeup ignores the wakeup reason (except a special
case for HMI), and the regular interrupt corresponding to the
exception will fire after the idle wakeup exits.

Change this to replay the interrupt from the idle wakeup before
interrupts are hard-enabled.

Test on POWER8 of context_switch selftests benchmark with polling idle
disabled (e.g., always nap, giving cross-CPU IPIs) gives the following
results:

                                original         wakeup direct
Different threads, same core:   315k/s           264k/s
Different cores:                235k/s           242k/s

There is a slowdown for doorbell IPI (same core) case because system
reset wakeup does not clear the message and the doorbell interrupt
fires again needlessly.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:27 +10:00
Nicholas Piggin
2201f994a5 powerpc/64s/idle: Move soft interrupt mask logic into C code
This simplifies the asm and fixes irq-off tracing over sleep
instructions.

Also move powersave_nap check for POWER8 into C code, and move
PSSCR register value calculation for POWER9 into C.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-19 19:46:26 +10:00
Paul Mackerras
579006944e KVM: PPC: Book3S HV: Virtualize doorbell facility on POWER9
On POWER9, we no longer have the restriction that we had on POWER8
where all threads in a core have to be in the same partition, so
the CPU threads are now independent.  However, we still want to be
able to run guests with a virtual SMT topology, if only to allow
migration of guests from POWER8 systems to POWER9.

A guest that has a virtual SMT mode greater than 1 will expect to
be able to use the doorbell facility; it will expect the msgsndp
and msgclrp instructions to work appropriately and to be able to read
sensible values from the TIR (thread identification register) and
DPDES (directed privileged doorbell exception status) special-purpose
registers.  However, since each CPU thread is a separate sub-processor
in POWER9, these instructions and registers can only be used within
a single CPU thread.

In order for these instructions to appear to act correctly according
to the guest's virtual SMT mode, we have to trap and emulate them.
We cause them to trap by clearing the HFSCR_MSGP bit in the HFSCR
register.  The emulation is triggered by the hypervisor facility
unavailable interrupt that occurs when the guest uses them.

To cause a doorbell interrupt to occur within the guest, we set the
DPDES register to 1.  If the guest has interrupts enabled, the CPU
will generate a doorbell interrupt and clear the DPDES register in
hardware.  The DPDES hardware register for the guest is saved in the
vcpu->arch.vcore->dpdes field.  Since this gets written by the guest
exit code, other VCPUs wishing to cause a doorbell interrupt don't
write that field directly, but instead set a vcpu->arch.doorbell_request
flag.  This is consumed and set to 0 by the guest entry code, which
then sets DPDES to 1.

Emulating reads of the DPDES register is somewhat involved, because
it requires reading the doorbell pending interrupt status of all of the
VCPU threads in the virtual core, and if any of those VCPUs are
running, their doorbell status is only up-to-date in the hardware
DPDES registers of the CPUs where they are running.  In order to get
a reasonable approximation of the current doorbell status, we send
those CPUs an IPI, causing an exit from the guest which will update
the vcpu->arch.vcore->dpdes field.  We then use that value in
constructing the emulated DPDES register value.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-19 14:34:37 +10:00
Paul Mackerras
3c31352460 KVM: PPC: Book3S HV: Allow userspace to set the desired SMT mode
This allows userspace to set the desired virtual SMT (simultaneous
multithreading) mode for a VM, that is, the number of VCPUs that
get assigned to each virtual core.  Previously, the virtual SMT mode
was fixed to the number of threads per subcore, and if userspace
wanted to have fewer vcpus per vcore, then it would achieve that by
using a sparse CPU numbering.  This had the disadvantage that the
vcpu numbers can get quite large, particularly for SMT1 guests on
a POWER8 with 8 threads per core.  With this patch, userspace can
set its desired virtual SMT mode and then use contiguous vcpu
numbering.

On POWER8, where the threading mode is "strict", the virtual SMT mode
must be less than or equal to the number of threads per subcore.  On
POWER9, which implements a "loose" threading mode, the virtual SMT
mode can be any power of 2 between 1 and 8, even though there is
effectively one thread per subcore, since the threads are independent
and can all be in different partitions.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-19 14:34:20 +10:00
Paul Mackerras
769377f77c KVM: PPC: Book3S HV: Context-switch HFSCR between host and guest on POWER9
This adds code to allow us to use a different value for the HFSCR
(Hypervisor Facilities Status and Control Register) when running the
guest from that which applies in the host.  The reason for doing this
is to allow us to trap the msgsndp instruction and related operations
in future so that they can be virtualized.  We also save the value of
HFSCR when a hypervisor facility unavailable interrupt occurs, because
the high byte of HFSCR indicates which facility the guest attempted to
access.

We save and restore the host value on guest entry/exit because some
bits of it affect host userspace execution.

We only do all this on POWER9, not on POWER8, because we are not
intending to virtualize any of the facilities controlled by HFSCR on
POWER8.  In particular, the HFSCR bit that controls execution of
msgsndp and related operations does not exist on POWER8.  The HFSCR
doesn't exist at all on POWER7.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-19 14:08:02 +10:00
Paul Mackerras
1bc3fe818c KVM: PPC: Book3S HV: Enable guests to use large decrementer mode on POWER9
This allows userspace (e.g. QEMU) to enable large decrementer mode for
the guest when running on a POWER9 host, by setting the LPCR_LD bit in
the guest LPCR value.  With this, the guest exit code saves 64 bits of
the guest DEC value on exit.  Other places that use the guest DEC
value check the LPCR_LD bit in the guest LPCR value, and if it is set,
omit the 32-bit sign extension that would otherwise be done.

This doesn't change the DEC emulation used by PR KVM because PR KVM
is not supported on POWER9 yet.

This is partly based on an earlier patch by Oliver O'Halloran.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-06-19 14:02:04 +10:00
Naveen N. Rao
c05b8c4474 powerpc/kprobes: Skip livepatch_handler() for jprobes
ftrace_caller() depends on a modified regs->nip to detect if a certain
function has been livepatched. However, with KPROBES_ON_FTRACE, it is
possible for regs->nip to have been modified by the kprobes pre_handler
(jprobes, for instance). In this case, we do not want to invoke the
livepatch_handler so as not to consume the livepatch stack.

To distinguish between the two (kprobes and livepatch), we check if
there is an active kprobe on the current function. If there is, then we
know for sure that it must have modified the NIP as we don't support
livepatching a kprobe'd function. In this case, we simply skip the
livepatch_handler and branch to the new NIP. Otherwise, the
livepatch_handler is invoked.

Fixes: ead514d5fb ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 19:49:43 +10:00
Alexey Kardashevskiy
a093c92dc7 powerpc/debug: Add missing warn flag to WARN_ON's non-builtin path
When trapped on WARN_ON(), report_bug() is expected to return
BUG_TRAP_TYPE_WARN so the caller will increment NIP by 4 and continue.
The __builtin_constant_p() path of the PPC's WARN_ON()
calls (indirectly) __WARN_FLAGS() which has BUGFLAG_WARNING set,
however the other branch does not which makes report_bug() report a
bug rather than a warning.

Fixes: f26dee1510 ("debug: Avoid setting BUGFLAG_WARNING twice")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-16 16:10:37 +10:00
David S. Miller
0ddead90b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 11:59:32 -04:00
Benjamin Herrenschmidt
25642705b2 powerpc/xive: Fix offset for store EOI MMIOs
Architecturally we should apply a 0x400 offset for these. Not doing
it will break future HW implementations.

The offset of 0 is supposed to remain for "triggers" though not all
sources support both trigger and store EOI, and in P9 specifically,
some sources will treat 0 as a store EOI. But future chips will not.
So this makes us use the properly architected offset which should work
always.

Fixes: 243e25112d ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 23:29:39 +10:00
Nicholas Piggin
07d2a628bc powerpc/64s: Avoid cpabort in context switch when possible
The ISA v3.0B copy-paste facility only requires cpabort when switching
to a process that has foreign real addresses mapped (direct access to
accelerators), to clear a potential copy buffer filled by a previous
thread. There is no accelerator driver implemented yet, so cpabort can
be removed. It can be be re-added when a driver is implemented.

POWER9 DD1 requires the copy buffer to always be cleared on context
switch, but if accelerators are not in use, then an unpaired copy from
a dummy region is sufficient to clear data out of the copy buffer.

This increases context switch performance by about 5% on POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Nicholas Piggin
9145effd62 powerpc/64: Drop explicit hwsync in context switch
The sync (aka. hwsync, aka. heavyweight sync) in the context switch
code to prevent MMIO access being reordered from the point of view of
a single process if it gets migrated to a different CPU is not
required because there is an hwsync performed earlier in the context
switch path.

Comment this so it's clear enough if anything changes on the scheduler
or the powerpc sides. Remove the hwsync from _switch.

This improves context switch performance by 2-3% on POWER8.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-15 16:34:39 +10:00
Thomas Falcon
40c9db8ad8 ibmvnic: Client-initiated failover
The IBM vNIC protocol provides support for the user to initiate
a failover from the client LPAR in case the current backing infrastructure
is deemed inadequate or in an error state.

Support for two H_VIOCTL sub-commands for vNIC devices are required
to implement this function. These commands are H_GET_SESSION_TOKEN
and H_SESSION_ERR_DETECTED.

"[H_GET_SESSION_TOKEN] is used to obtain a session token from a VNIC client
adapter.  This token is opaque to the caller and is intended to be used in
tandem with the SESSION_ERROR_DETECTED vioctl subfunction."

"[H_SESSION_ERR_DETECTED] is used to report that the currently active
backing device for a VNIC client adapter is behaving poorly, and that
the hypervisor should attempt to fail over to a different backing device,
if one is available."

To provide tools access to this functionality the vNIC driver creates a
sysfs file that, when written to, will send a request to pHyp to failover
to a different backing device.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-13 12:53:35 -04:00
Aleksa Sarai
54ebbfb160 tty: add TIOCGPTPEER ioctl
When opening the slave end of a PTY, it is not possible for userspace to
safely ensure that /dev/pts/$num is actually a slave (in cases where the
mount namespace in which devpts was mounted is controlled by an
untrusted process). In addition, there are several unresolvable
race conditions if userspace were to attempt to detect attacks through
stat(2) and other similar methods [in addition it is not clear how
userspace could detect attacks involving FUSE].

Resolve this by providing an interface for userpace to safely open the
"peer" end of a PTY file descriptor by using the dentry cached by
devpts. Since it is not possible to have an open master PTY without
having its slave exposed in /dev/pts this interface is safe. This
interface currently does not provide a way to get the master pty (since
it is not clear whether such an interface is safe or even useful).

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Valentin Rothberg <vrothberg@suse.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-09 12:27:54 +02:00
Aneesh Kumar K.V
92d9dfda8b powerpc/mm/4k: Limit 4k page size config to 64TB virtual address space
Supporting 512TB requires us to do a order 3 allocation for level 1 page
table (pgd). This results in page allocation failures with certain workloads.
For now limit 4k linux page size config to 64TB.

Fixes: f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-08 20:42:56 +10:00
David S. Miller
216fe8f021 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just some simple overlapping changes in marvell PHY driver
and the DSA core code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 22:20:08 -04:00
Michael Ellerman
ba4a648f12 powerpc/numa: Fix percpu allocations to be NUMA aware
In commit 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we
switched to the generic implementation of cpu_to_node(), which uses a percpu
variable to hold the NUMA node for each CPU.

Unfortunately we neglected to notice that we use cpu_to_node() in the allocation
of our percpu areas, leading to a chicken and egg problem. In practice what
happens is when we are setting up the percpu areas, cpu_to_node() reports that
all CPUs are on node 0, so we allocate all percpu areas on node 0.

This is visible in the dmesg output, as all pcpu allocs being in group 0:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31
  pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39
  pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47

To fix it we need an early_cpu_to_node() which can run prior to percpu being
setup. We already have the numa_cpu_lookup_table we can use, so just plumb it
in. With the patch dmesg output shows two groups, 0 and 1:

  pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07
  pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15
  pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23
  pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31
  pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39
  pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47

We can also check the data_offset in the paca of various CPUs, with the fix we
see:

  CPU 0:  data_offset = 0x0ffe8b0000
  CPU 24: data_offset = 0x1ffe5b0000

And we can see from dmesg that CPU 24 has an allocation on node 1:

  node   0: [mem 0x0000000000000000-0x0000000fffffffff]
  node   1: [mem 0x0000001000000000-0x0000001fffffffff]

Cc: stable@vger.kernel.org # v3.16+
Fixes: 8c27226119 ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 21:19:46 +10:00
Nicholas Piggin
90df4bfb4d powerpc/64s: Machine check handle ifetch from foreign real address for POWER9
The i-side 0111b machine check, which is "Instruction Fetch to foreign
address space", was missed by 7b9f71f974 ("powerpc/64s: POWER9 machine
check handler").

    The POWER9 processor core considers host real addresses with a
    nonzero value in RA(8:12) as foreign address space, accessible only
    by the copy and paste instructions. The copy and paste instruction
    pair can be used to invoke the Nest accelerators via the Virtual
    Accelerator Switchboard (VAS).

It is an error for any regular load/store or ifetch to go to a foreign
addresses. When relocation is on, this causes an MMU exception. When
relocation is off, a machine check exception. It is possible to trigger
this machine check by branching to a foreign address with MSR[IR]=0.

Fixes: 7b9f71f974 ("powerpc/64s: POWER9 machine check handler")
Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-06 21:17:15 +10:00
Christophe Leroy
4386c096c2 powerpc/mm: Rename map_page() to map_kernel_page() on 32-bit
These two functions implement the same semantics, so unify their naming so we
can share code that calls them. The longer name is more descriptive so use it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:59:03 +10:00
Balbir Singh
abd667be15 powerpc/mm/book(e)(3s)/32: Add page table accounting
Add support in pte_alloc_one() and pgd_alloc() by
passing __GFP_ACCOUNT in the flags

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:03:11 +10:00
Balbir Singh
de3b87611d powerpc/mm/book(e)(3s)/64: Add page table accounting
Introduce a helper pgtable_gfp_flags() which
just returns the current gfp flags and adds
__GFP_ACCOUNT to account for page table allocation.
The generic helper is added to include/asm/pgalloc.h
and has two variants - WARNING ugly bits ahead

1. If the header is included from a module, no check
for mm == &init_mm is done, since init_mm is not
exported
2. For kernel includes, the check is done and required
see (3e79ec7 arch: x86: charge page tables to kmemcg)

The fundamental assumption is that no module should be
doing pgd/pud/pmd and pte alloc's on behalf of init_mm
directly.

NOTE: This adds an overhead to pmd/pud/pgd allocations
similar to x86.  The other alternative was to implement
pmd_alloc_kernel/pud_alloc_kernel and pgd_alloc_kernel
with their offset variants.

For 4k page size, pte_alloc_one no longer calls
pte_alloc_one_kernel.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-05 19:03:10 +10:00
Andrew Jones
2387149ead KVM: improve arch vcpu request defining
Marc Zyngier suggested that we define the arch specific VCPU request
base, rather than requiring each arch to remember to start from 8.
That suggestion, along with Radim Krcmar's recent VCPU request flag
addition, snowballed into defining something of an arch VCPU request
defining API.

No functional change.

(Looks like x86 is running out of arch VCPU request bits.  Maybe
 someday we'll need to extend to 64.)

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
2017-06-04 16:53:00 +02:00
Hari Bathini
48a316e350 powerpc/fadump: Set an upper limit for boot memory size
By default, 5% of system RAM is reserved for preserving boot memory.
Alternatively, a user can specify the amount of memory to reserve.
See Documentation/powerpc/firmware-assisted-dump.txt for details. In
addition to the memory reserved for preserving boot memory, some more
memory is reserved, to save HPTE region, CPU state data and ELF core
headers.

Memory Reservation during first kernel looks like below:

  Low memory                                        Top of memory
  0      boot memory size                                       |
  |           |                       |<--Reserved dump area -->|
  V           V                       |   Permanent Reservation V
  +-----------+----------/ /----------+---+----+-----------+----+
  |           |                       |CPU|HPTE|  DUMP     |ELF |
  +-----------+----------/ /----------+---+----+-----------+----+
        |                                           ^
        |                                           |
        \                                           /
         -------------------------------------------
          Boot memory content gets transferred to
          reserved area by firmware at the time of
          crash

This implicitly means that the sum of the sizes of boot memory, CPU
state data, HPTE region, DUMP preserving area and ELF core headers
can't be greater than the total memory size. But currently, a user is
allowed to specify any value as boot memory size. So, the above rule
is violated when a boot memory size around 50% of the total available
memory is specified. As the kernel is not handling this currently, it
may lead to undefined behavior. Fix it by setting an upper limit for
boot memory size to 25% of the total available memory. Also, instead
of using memblock_end_of_DRAM(), which doesn't take the holes, if any,
in the memory layout into account, use memblock_phys_mem_size() to
calculate the percentage of total available memory.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 20:16:50 +10:00
Christophe Leroy
f782ddf297 powerpc: Remove __ilog2()s and use generic ones
With the __ilog2() function as defined in
arch/powerpc/include/asm/bitops.h, GCC will not optimise the code
in case of constant parameter.

The generic ilog2() function in include/linux/log2.h is written
to handle the case of the constant parameter.

This patch discards the three __ilog2() functions and
defines __ilog2() as ilog2()

For non constant calls, the generated code is doing the same:
int test__ilog2(unsigned long x)
{
	return __ilog2(x);
}

int test__ilog2_u32(u32 n)
{
	return __ilog2_u32(n);
}

int test__ilog2_u64(u64 n)
{
	return __ilog2_u64(n);
}

On PPC32 before the patch:
00000000 <test__ilog2>:
   0:	7c 63 00 34 	cntlzw  r3,r3
   4:	20 63 00 1f 	subfic  r3,r3,31
   8:	4e 80 00 20 	blr

0000000c <test__ilog2_u32>:
   c:	7c 63 00 34 	cntlzw  r3,r3
  10:	20 63 00 1f 	subfic  r3,r3,31
  14:	4e 80 00 20 	blr

On PPC32 after the patch:
00000000 <test__ilog2>:
   0:	7c 63 00 34 	cntlzw  r3,r3
   4:	20 63 00 1f 	subfic  r3,r3,31
   8:	4e 80 00 20 	blr

0000000c <test__ilog2_u32>:
   c:	7c 63 00 34 	cntlzw  r3,r3
  10:	20 63 00 1f 	subfic  r3,r3,31
  14:	4e 80 00 20 	blr

On PPC64 before the patch:
0000000000000000 <.test__ilog2>:
   0:	7c 63 00 74 	cntlzd  r3,r3
   4:	20 63 00 3f 	subfic  r3,r3,63
   8:	7c 63 07 b4 	extsw   r3,r3
   c:	4e 80 00 20 	blr

0000000000000010 <.test__ilog2_u32>:
  10:	7c 63 00 34 	cntlzw  r3,r3
  14:	20 63 00 1f 	subfic  r3,r3,31
  18:	7c 63 07 b4 	extsw   r3,r3
  1c:	4e 80 00 20 	blr

0000000000000020 <.test__ilog2_u64>:
  20:	7c 63 00 74 	cntlzd  r3,r3
  24:	20 63 00 3f 	subfic  r3,r3,63
  28:	7c 63 07 b4 	extsw   r3,r3
  2c:	4e 80 00 20 	blr

On PPC64 after the patch:
0000000000000000 <.test__ilog2>:
   0:	7c 63 00 74 	cntlzd  r3,r3
   4:	20 63 00 3f 	subfic  r3,r3,63
   8:	7c 63 07 b4 	extsw   r3,r3
   c:	4e 80 00 20 	blr

0000000000000010 <.test__ilog2_u32>:
  10:	7c 63 00 34 	cntlzw  r3,r3
  14:	20 63 00 1f 	subfic  r3,r3,31
  18:	7c 63 07 b4 	extsw   r3,r3
  1c:	4e 80 00 20 	blr

0000000000000020 <.test__ilog2_u64>:
  20:	7c 63 00 74 	cntlzd  r3,r3
  24:	20 63 00 3f 	subfic  r3,r3,63
  28:	7c 63 07 b4 	extsw   r3,r3
  2c:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:56 +10:00
Christophe Leroy
22ef33b368 powerpc: Replace ffz() by equivalent generic function
With the ffz() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter.

This patch replaces ffz() by the generic function.

The generic ffz(x) expects to never be called with ~x == 0
as written in the comment in include/asm-generic/bitops/ffz.h
The only user of ffz() within arch/powerpc/ is
platforms/512x/mpc5121_ads_cpld.c, which checks if x is not 0xff

For non constant calls, the generated code is doing the same:

unsigned long testffz(unsigned long x)
{
	return ffz(x);
}

On PPC32, before the patch:
00000018 <testffz>:
  18:	7c 63 18 f9 	not.    r3,r3
  1c:	40 82 00 0c 	bne     28 <testffz+0x10>
  20:	38 60 00 20 	li      r3,32
  24:	4e 80 00 20 	blr
  28:	7d 23 00 d0 	neg     r9,r3
  2c:	7d 23 18 38 	and     r3,r9,r3
  30:	7c 63 00 34 	cntlzw  r3,r3
  34:	20 63 00 1f 	subfic  r3,r3,31
  38:	4e 80 00 20 	blr

On PPC32, after the patch:
00000018 <testffz>:
  18:	39 23 00 01 	addi    r9,r3,1
  1c:	7d 23 18 78 	andc    r3,r9,r3
  20:	7c 63 00 34 	cntlzw  r3,r3
  24:	20 63 00 1f 	subfic  r3,r3,31
  28:	4e 80 00 20 	blr

On PPC64, before the patch:
0000000000000030 <.testffz>:
  30:	7c 60 18 f9 	not.    r0,r3
  34:	38 60 00 40 	li      r3,64
  38:	4d 82 00 20 	beqlr
  3c:	7c 60 00 d0 	neg     r3,r0
  40:	7c 63 00 38 	and     r3,r3,r0
  44:	7c 63 00 74 	cntlzd  r3,r3
  48:	20 63 00 3f 	subfic  r3,r3,63
  4c:	7c 63 07 b4 	extsw   r3,r3
  50:	4e 80 00 20 	blr

On PPC64, after the patch:
0000000000000030 <.testffz>:
  30:	38 03 00 01 	addi    r0,r3,1
  34:	7c 03 18 78 	andc    r3,r0,r3
  38:	7c 63 00 74 	cntlzd  r3,r3
  3c:	20 63 00 3f 	subfic  r3,r3,63
  40:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:55 +10:00
Christophe Leroy
2fcff790dc powerpc: Use builtin functions for fls()/__fls()/fls64()
With the fls() functions as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter.

This patch replaces __fls() by the builtin function, and modifies
fls() and fls64() to use builtins instead of inline assembly

For non constant calls, the generated code is doing the same:

int testfls(unsigned int x)
{
	return fls(x);
}

unsigned long test__fls(unsigned long x)
{
	return __fls(x);
}

int testfls64(__u64 x)
{
	return fls64(x);
}

On PPC32, before the patch:
00000064 <testfls>:
  64:	7c 63 00 34 	cntlzw  r3,r3
  68:	20 63 00 20 	subfic  r3,r3,32
  6c:	4e 80 00 20 	blr

00000070 <test__fls>:
  70:	7c 63 00 34 	cntlzw  r3,r3
  74:	20 63 00 1f 	subfic  r3,r3,31
  78:	4e 80 00 20 	blr

0000007c <testfls64>:
  7c:	2c 03 00 00 	cmpwi   r3,0
  80:	40 82 00 10 	bne     90 <testfls64+0x14>
  84:	7c 83 00 34 	cntlzw  r3,r4
  88:	20 63 00 20 	subfic  r3,r3,32
  8c:	4e 80 00 20 	blr
  90:	7c 63 00 34 	cntlzw  r3,r3
  94:	20 63 00 40 	subfic  r3,r3,64
  98:	4e 80 00 20 	blr

On PPC32, after the patch:
00000054 <testfls>:
  54:	7c 63 00 34 	cntlzw  r3,r3
  58:	20 63 00 20 	subfic  r3,r3,32
  5c:	4e 80 00 20 	blr

00000060 <test__fls>:
  60:	7c 63 00 34 	cntlzw  r3,r3
  64:	20 63 00 1f 	subfic  r3,r3,31
  68:	4e 80 00 20 	blr

0000006c <testfls64>:
  6c:	2c 03 00 00 	cmpwi   r3,0
  70:	41 82 00 10 	beq     80 <testfls64+0x14>
  74:	7c 63 00 34 	cntlzw  r3,r3
  78:	20 63 00 40 	subfic  r3,r3,64
  7c:	4e 80 00 20 	blr
  80:	7c 83 00 34 	cntlzw  r3,r4
  84:	20 63 00 40 	subfic  r3,r3,32
  88:	4e 80 00 20 	blr

On PPC64, before the patch:
00000000000000a0 <.testfls>:
  a0:	7c 63 00 34 	cntlzw  r3,r3
  a4:	20 63 00 20 	subfic  r3,r3,32
  a8:	7c 63 07 b4 	extsw   r3,r3
  ac:	4e 80 00 20 	blr

00000000000000b0 <.test__fls>:
  b0:	7c 63 00 74 	cntlzd  r3,r3
  b4:	20 63 00 3f 	subfic  r3,r3,63
  b8:	7c 63 07 b4 	extsw   r3,r3
  bc:	4e 80 00 20 	blr

00000000000000c0 <.testfls64>:
  c0:	7c 63 00 74 	cntlzd  r3,r3
  c4:	20 63 00 40 	subfic  r3,r3,64
  c8:	7c 63 07 b4 	extsw   r3,r3
  cc:	4e 80 00 20 	blr

On PPC64, after the patch:
0000000000000090 <.testfls>:
  90:	7c 63 00 34 	cntlzw  r3,r3
  94:	20 63 00 20 	subfic  r3,r3,32
  98:	7c 63 07 b4 	extsw   r3,r3
  9c:	4e 80 00 20 	blr

00000000000000a0 <.test__fls>:
  a0:	7c 63 00 74 	cntlzd  r3,r3
  a4:	20 63 00 3f 	subfic  r3,r3,63
  a8:	4e 80 00 20 	blr
  ac:	60 00 00 00 	nop

00000000000000b0 <.testfls64>:
  b0:	7c 63 00 74 	cntlzd  r3,r3
  b4:	20 63 00 40 	subfic  r3,r3,64
  b8:	7c 63 07 b4 	extsw   r3,r3
  bc:	4e 80 00 20 	blr

Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:55 +10:00
Christophe Leroy
f83647d642 powerpc: Discard ffs()/__ffs() function and use builtin functions instead
With the ffs() function as defined in arch/powerpc/include/asm/bitops.h
GCC will not optimise the code in case of constant parameter, as shown
by the small exemple below.

int ffs_test(void)
{
	return 4 << ffs(31);
}

c0012334 <ffs_test>:
c0012334:       39 20 00 01     li      r9,1
c0012338:       38 60 00 04     li      r3,4
c001233c:       7d 29 00 34     cntlzw  r9,r9
c0012340:       21 29 00 20     subfic  r9,r9,32
c0012344:       7c 63 48 30     slw     r3,r3,r9
c0012348:       4e 80 00 20     blr

With this patch, the same function will compile as follows:

c0012334 <ffs_test>:
c0012334:       38 60 00 08     li      r3,8
c0012338:       4e 80 00 20     blr

The same happens with __ffs()

For non constant calls, the generated code is doing the same,
allthought it is slightly different on 64 bits for ffs():

unsigned long test__ffs(unsigned long x)
{
	return __ffs(x);
}

int testffs(int x)
{
	return ffs(x);
}

On PPC32, before the patch:
0000003c <test__ffs>:
  3c:	7d 23 00 d0 	neg     r9,r3
  40:	7d 23 18 38 	and     r3,r9,r3
  44:	7c 63 00 34 	cntlzw  r3,r3
  48:	20 63 00 1f 	subfic  r3,r3,31
  4c:	4e 80 00 20 	blr

00000050 <testffs>:
  50:	7d 23 00 d0 	neg     r9,r3
  54:	7d 23 18 38 	and     r3,r9,r3
  58:	7c 63 00 34 	cntlzw  r3,r3
  5c:	20 63 00 20 	subfic  r3,r3,32
  60:	4e 80 00 20 	blr

On PPC32, after the patch:
0000002c <test__ffs>:
  2c:	7d 23 00 d0 	neg     r9,r3
  30:	7d 23 18 38 	and     r3,r9,r3
  34:	7c 63 00 34 	cntlzw  r3,r3
  38:	20 63 00 1f 	subfic  r3,r3,31
  3c:	4e 80 00 20 	blr

00000040 <testffs>:
  40:	7d 23 00 d0 	neg     r9,r3
  44:	7d 23 18 38 	and     r3,r9,r3
  48:	7c 63 00 34 	cntlzw  r3,r3
  4c:	20 63 00 20 	subfic  r3,r3,32
  50:	4e 80 00 20 	blr

On PPC64, before the patch:
0000000000000060 <.test__ffs>:
  60:	7c 03 00 d0 	neg     r0,r3
  64:	7c 03 18 38 	and     r3,r0,r3
  68:	7c 63 00 74 	cntlzd  r3,r3
  6c:	20 63 00 3f 	subfic  r3,r3,63
  70:	7c 63 07 b4 	extsw   r3,r3
  74:	4e 80 00 20 	blr

0000000000000080 <.testffs>:
  80:	7c 03 00 d0 	neg     r0,r3
  84:	7c 03 18 38 	and     r3,r0,r3
  88:	7c 63 00 74 	cntlzd  r3,r3
  8c:	20 63 00 40 	subfic  r3,r3,64
  90:	7c 63 07 b4 	extsw   r3,r3
  94:	4e 80 00 20 	blr

On PPC64, after the patch:
0000000000000050 <.test__ffs>:
  50:	7c 03 00 d0 	neg     r0,r3
  54:	7c 03 18 38 	and     r3,r0,r3
  58:	7c 63 00 74 	cntlzd  r3,r3
  5c:	20 63 00 3f 	subfic  r3,r3,63
  60:	4e 80 00 20 	blr

0000000000000070 <.testffs>:
  70:	7c 03 00 d0 	neg     r0,r3
  74:	7c 03 18 38 	and     r3,r0,r3
  78:	7c 63 00 34 	cntlzw  r3,r3
  7c:	20 63 00 20 	subfic  r3,r3,32
  80:	7c 63 07 b4 	extsw   r3,r3
  84:	4e 80 00 20 	blr
(ffs() operates on an int so cntlzw is equivalent to cntlzd)

In addition, when reading the generated vmlinux, we can observe
that with the builtin functions, GCC sometimes efficiently spreads
the instructions within the generated functions while the inline
assembly force them to remain grouped together.

__builtin_ffs() is already used in arch/powerpc/include/asm/page_32.h

Those builtins have been in GCC since at least 3.4.6 (see
https://gcc.gnu.org/onlinedocs/gcc-3.4.6/gcc/Other-Builtins.html )

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-02 19:23:54 +10:00
Stephen Rothwell
042cc40934 powerpc: use asm-generic/socket.h as much as possible
asm-generic/socket.h already has an exception for the differences that
powerpc needs, so just include it after defining the differences.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-01 14:48:05 -04:00
Michael Ellerman
0e5e7f5e97 powerpc/64: Reclaim CPU_FTR_SUBCORE
We are running low on CPU feature bits, so we only want to use them when
it's really necessary.

CPU_FTR_SUBCORE is only used in one place, and only in C, so we don't
need it in order to make asm patching work. It can only be set on
"Power8" CPUs, which in practice means POWER8, POWER8E and POWER8NVL.
There are no plans to implement it on future CPUs, but if there ever
were we could retrofit it then.

Although KVM uses subcores, it never looks at the CPU feature, it either
looks at the ISA level or the threads_per_subcore value.

So drop the CPU feature and do a PVR check instead. Drop the device tree
"subcore" feature as we no longer support doing anything with it, and we
will drop it from skiboot too.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-06-01 19:56:28 +10:00
Nicholas Piggin
c494adefef powerpc/64: Tool to check head sections location sanity
Use a tool to check that the location of "fixed sections" are where
we expected them to be, which catches cases the linker script can't
(stubs being added to start of .text section), and which ends up
being neater.

Sample output:

  ERROR: start_text address is c000000000008100, should be c000000000008000
  ERROR: see comments in arch/powerpc/tools/head_check.sh

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold in fix from Nick for 4.6 era toolchains]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin
951eedebcd powerpc/64: Handle linker stubs in low .text code
Very large kernels may require linker stubs for branches from HEAD
text code. The linker may place these stubs before the HEAD text
sections, which breaks the assumption that HEAD text is located at 0
(or the .text section being located at 0x7000/0x8000 on Book3S
kernels).

Provide an option to create a small section just before the .text
section with an empty 256 - 4 bytes, and adjust the start of the .text
section to match. The linker will tend to put stubs in that section
and not break our relative-to-absolute offset assumptions.

This causes a small waste of space on common kernels, but allows large
kernels to build and boot. For now, it is an EXPERT config option,
defaulting to =n, but a reference is provided for it in the build-time
check for such breakage. This is good enough for allyesconfig and
custom users / hackers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Stephen Rothwell
308d263d3f powerpc: Use uapi/asm-generic/sockios.h
The arch version is identical except for comments and white space.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Stephen Rothwell
b87901e6ec powerpc: Use the asm-generic versions of some uapi includes
These are completely obvious as all they do is include the asm-generic
versions.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Gautham R. Shenoy
22c6663dc6 powerpc/powernv/idle: Use Requested Level for restoring state on P9 DD1
On Power9 DD1 due to a hardware bug the Power-Saving Level Status
field (PLS) of the PSSCR for a thread waking up from a deep state can
under-report if some other thread in the core is in a shallow stop
state. The scenario in which this can manifest is as follows:

   1) All the threads of the core are in deep stop.
   2) One of the threads is woken up. The PLS for this thread will
      correctly reflect that it is waking up from deep stop.
   3) The thread that has woken up now executes a shallow stop.
   4) When some other thread in the core is woken, its PLS will reflect
      the shallow stop state.

Thus, the subsequent thread for which the PLS is under-reporting the
wakeup state will not restore the hypervisor resources.

Hence, on DD1 systems, use the Requested Level (RL) field as a
workaround to restore the contents of the hypervisor resources on the
wakeup from the stop state.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Nicholas Piggin
f1fe525201 powerpc/64s: Fix FIXUP_ENDIAN non-maskable interrupt reentrancy
FIXUP_ENDIAN uses SRR[01] with MSR_RI=1, which gets corrupted if there
is an interleaving system reset or machine check interrupt.

Set MSR_RI=0 before setting SRRs. The rfid will restore MSR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-30 14:59:51 +10:00
Al Viro
613763a1f0 take compat_sys_old_getrlimit() to native syscall
... and sanitize the ifdefs in there

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-27 15:38:06 -04:00
Nicholas Piggin
a4700a2610 powerpc: Add PPC_FEATURE userspace bits for SCV and DARN instructions
Providing "scv" support to userspace requires kernel support, so it
must be advertised as independently to the base ISA 3 instruction set.

The darn instruction relies on firmware enablement, so it has been
decided to split this out from the core ISA 3 feature as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-25 23:07:45 +10:00
David S. Miller
218b6a5b23 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-05-22 23:32:48 -04:00
David S. Miller
1c4f676a68 net: Define SCM_TIMESTAMPING_PKTINFO on all architectures.
A definition was only provided for asm-generic/socket.h
using platforms, define it for the others as well

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-05-21 23:13:37 -04:00
Michael Ellerman
e41e53cd4f powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash
virt_addr_valid() is supposed to tell you if it's OK to call virt_to_page() on
an address. What this means in practice is that it should only return true for
addresses in the linear mapping which are backed by a valid PFN.

We are failing to properly check that the address is in the linear mapping,
because virt_to_pfn() will return a valid looking PFN for more or less any
address. That bug is actually caused by __pa(), used in virt_to_pfn().

eg: __pa(0xc000000000010000) = 0x10000  # Good
    __pa(0xd000000000010000) = 0x10000  # Bad!
    __pa(0x0000000000010000) = 0x10000  # Bad!

This started happening after commit bdbc29c19b ("powerpc: Work around gcc
miscompilation of __pa() on 64-bit") (Aug 2013), where we changed the definition
of __pa() to work around a GCC bug. Prior to that we subtracted PAGE_OFFSET from
the value passed to __pa(), meaning __pa() of a 0xd or 0x0 address would give
you something bogus back.

Until we can verify if that GCC bug is no longer an issue, or come up with
another solution, this commit does the minimal fix to make virt_addr_valid()
work, by explicitly checking that the address is in the linear mapping region.

Fixes: bdbc29c19b ("powerpc: Work around gcc miscompilation of __pa() on 64-bit")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Tested-by: Breno Leitao <breno.leitao@gmail.com>
2017-05-19 13:04:35 +10:00
Al Viro
8298525839 kill strlen_user()
no callers, no consistent semantics, no sane way to use it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-15 23:40:22 -04:00
Michael Ellerman
43e24e82f3 powerpc/modules: If mprofile-kernel is enabled add it to vermagic
On powerpc we can build the kernel with two different ABIs for mcount(), which
is used by ftrace. Kernels built with one ABI do not know how to load modules
built with the other ABI. The new style ABI is called "mprofile-kernel", for
want of a better name.

Currently if we build a module using the old style ABI, and the kernel with
mprofile-kernel, when we load the module we'll oops something like:

  # insmod autofs4-no-mprofile-kernel.ko
  ftrace-powerpc: Unexpected instruction f8810028 around bl _mcount
  ------------[ cut here ]------------
  WARNING: CPU: 6 PID: 3759 at ../kernel/trace/ftrace.c:2024 ftrace_bug+0x2b8/0x3c0
  CPU: 6 PID: 3759 Comm: insmod Not tainted 4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 #11
  ...
  NIP [c0000000001eaa48] ftrace_bug+0x2b8/0x3c0
  LR [c0000000001eaff8] ftrace_process_locs+0x4a8/0x590
  Call Trace:
    alloc_pages_current+0xc4/0x1d0 (unreliable)
    ftrace_process_locs+0x4a8/0x590
    load_module+0x1c8c/0x28f0
    SyS_finit_module+0x110/0x140
    system_call+0x38/0xfc
  ...
  ftrace failed to modify
  [<d000000002a31024>] 0xd000000002a31024
   actual:   35:65:00:48

We can avoid this by including in the vermagic whether the kernel/module was
built with mprofile-kernel. Which results in:

  # insmod autofs4-pg.ko
  autofs4: version magic
  '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269 SMP mod_unload modversions '
  should be
  '4.11.0-rc3-gcc-5.4.1-00017-g5a61ef74f269-dirty SMP mod_unload modversions mprofile-kernel'
  insmod: ERROR: could not insert module autofs4-pg.ko: Invalid module format

Fixes: 8c50b72a3b ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-15 19:31:38 +10:00
Linus Torvalds
dc2a248166 powerpc updates for 4.12 part 2
Highlights include:
 
  - rework the Linux page table geometry to lower memory usage on 64-bit Book3S
    (IBM chips) using the Hash MMU.
 
  - support for a new device tree binding for discovering CPU features on future
    firmwares.
 
  - Freescale updates from Scott: "Includes a fix for a powerpc/next mm regression
    on 64e, a fix for a kernel hang on 64e when using a debugger inside a
    relocated kernel, a qman fix, and misc qe improvements."
 
 Thanks to:
   Christophe Leroy, Gavin Shan, Horia Geantă, LiuHailong, Nicholas Piggin, Roy
   Pledge, Scott Wood, Valentin Longchamp.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZFXjPAAoJEFHr6jzI4aWAgG4QAJoF7G5Txj0Du2I2/wQDkVq1
 InJ+BNji0xnOrFpz2EcIIlbIwBeJbY9cSIbmKUEPQU4hxtQgI8Q5WNEl2btWq8xz
 I0Ej3uc5obc9ltUdQoGxgXih/XDd8UN3fscSE2/SSuPY/A7JwAVZMsCEJ1tWdxpM
 hx+R9wlaUT3I6jmQwj9gg6zuBdIOL5szvZXKh9ruPKNyZWbPmPSUwIqiyT0YHsiD
 01OZsFYpdSH6Ka/eNHSNx5HC+kK8aDVaqd5E2fkHeH9+sxerpEzMo2PmK4T8vChh
 mSD4nhfqRwC2WRpPF/MY+zGBeXrFkCkR+nYhaqVDXXACKzfHgU58NOfvrmtRj52X
 vTW+cn92wqFTmi0TNUfhEFt8elcOO7/fKh1OVhsFx+bD+bgj8G1ZkLoBU/0QUzRf
 R4hiKKuOMnDHriNPdlAOKjHpR+ewh8Q679INThEJzEQpn7VBY72hcQwapQ3MjMnd
 E7LfsGwqGPkTc6gy1bFbWum5HMGOcmE0qkrnZo5VyFhNNwBs1Kx/B1GHjUOiucVu
 km5GEVNTfCkZqeabdca7fwbGcMH7zchR1ootqH2m18PZJAzr85A+aTqfrdJ5fDBs
 v/nznfcPVNEgvEW0im2jhpPoAlQE6/YvYa+kG4zjjxWA5FKVKdTzINexD82jlcqP
 +fDtIDxNcFkzlt4gacjh
 =YOQs
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:
 "The change to the Linux page table geometry was delayed for more
  testing with 16G pages, and there's the new CPU features stuff which
  just needed one more polish before going in. Plus a few changes from
  Scott which came in a bit late. And then various fixes, mostly minor.

  Summary highlights:

   - rework the Linux page table geometry to lower memory usage on
     64-bit Book3S (IBM chips) using the Hash MMU.

   - support for a new device tree binding for discovering CPU features
     on future firmwares.

   - Freescale updates from Scott:
      "Includes a fix for a powerpc/next mm regression on 64e, a fix for
       a kernel hang on 64e when using a debugger inside a relocated
       kernel, a qman fix, and misc qe improvements."

  Thanks to: Christophe Leroy, Gavin Shan, Horia Geantă, LiuHailong,
  Nicholas Piggin, Roy Pledge, Scott Wood, Valentin Longchamp"

* tag 'powerpc-4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Support new device tree binding for discovering CPU features
  powerpc: Don't print cpu_spec->cpu_name if it's NULL
  of/fdt: introduce of_scan_flat_dt_subnodes and of_get_flat_dt_phandle
  powerpc/64s: Fix unnecessary machine check handler relocation branch
  powerpc/mm/book3s/64: Rework page table geometry for lower memory usage
  powerpc: Fix distclean with Makefile.postlink
  powerpc/64e: Don't place the stack beyond TASK_SIZE
  powerpc/powernv: Block PCI config access on BCM5718 during EEH recovery
  powerpc/8xx: Adding support of IRQ in MPC8xx GPIO
  soc/fsl/qbman: Disable IRQs for deferred QBMan work
  soc/fsl/qe: add EXPORT_SYMBOL for the 2 qe_tdm functions
  soc/fsl/qe: only apply QE_General4 workaround on affected SoCs
  soc/fsl/qe: round brg_freq to 1kHz granularity
  soc/fsl/qe: get rid of immrbar_virt_to_phys()
  net: ethernet: ucc_geth: fix MEM_PART_MURAM mode
  powerpc/64e: Fix hang when debugging programs with relocated kernel
2017-05-12 10:04:09 -07:00
Linus Torvalds
791a9a666d Kbuild UAPI header export updates for v4.12
Improvement of headers_install by Nicolas Dichtel.
 
 It has been long since the introduction of uapi directories,
 but the de-coupling of exported headers has not been completed.
 Headers listed in header-y are exported whether they exist in
 uapi directories or not.  His work fixes this inconsistency.
 
 All (and only) headers under uapi directories are now exported.
 The asm-generic wrappers are still exceptions, but this is a big
 step forward.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZE7MBAAoJED2LAQed4NsGroAP/iARejrIFmxuH96D5h2aiP1j
 c8KHQ+5fuq4w2KBmfbfkNvWbazlVheT6RrYWBUh/GABGsSqQC07d8New6B8TaUkE
 K0E48RsuYxouP18Ys6BOO4/zyRhEFD7Ta72PGQ/gDQY+6hAu4jYQnMdG0wipTblS
 QWgnUxTqfCbTjnRpRKXpcwRff+OeTWtOv3s0V8UashJUxnFVQ7Br2uRsm/KKkU/k
 jQC65KyHL4HlsFeeAiMmQ9IQPVwLsd6+d5crs0nydHaJ2XrFlNNQ7EEMyG8FxPdx
 9b/VpS+XY6DO+jeqkcpFrdL9IgcmCn72Qc5/4vrHuQO2dpWW5mVaVPq9RAGP0Yq/
 FB0vZRTp/tOIkD+0esirZW2gJtU3DWMY1A9rc5jjLRabdnRXVTdLfhEnksYJEfES
 yPbDEuKyzo6a+zBSqNtMquJPmYVYEDS2mcmgxY5sB58qtXkUN2Yr+uUALxC8XhXW
 SHHwIAV3a+UX5ZU9Ys8dp2hI4EXYXtdvsz2zvl4qPIn/Q9d1YoEJRe7/Y0p8gBXM
 5pVJ1yohKoYrNZVGBe0LO/gHGVAVgMj0cKn0Xg51bbvjxY2U5djUbMY0uw1gFrrM
 O9ld3C6O8zH5BsExCfwp9iPz2SW5W9N80kgnKfjCHBRUKuMTkm02DJf8Hx+pyfVQ
 DCy9lYTi76IgZ1uflKq9
 =Rqdo
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild UAPI updates from Masahiro Yamada:
 "Improvement of headers_install by Nicolas Dichtel.

  It has been long since the introduction of uapi directories, but the
  de-coupling of exported headers has not been completed. Headers listed
  in header-y are exported whether they exist in uapi directories or
  not. His work fixes this inconsistency.

  All (and only) headers under uapi directories are now exported. The
  asm-generic wrappers are still exceptions, but this is a big step
  forward"

* tag 'kbuild-uapi-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  arch/include: remove empty Kbuild files
  uapi: export all arch specifics directories
  uapi: export all headers under uapi directories
  smc_diag.h: fix include from userland
  btrfs_tree.h: fix include from userland
  uapi: includes linux/types.h before exporting files
  Makefile.headersinst: remove destination-y option
  Makefile.headersinst: cleanup input files
  x86: stop exporting msr-index.h to userland
  nios2: put setup.h in uapi
  h8300: put bitsperlong.h in uapi
2017-05-10 20:45:36 -07:00
Nicolas Dichtel
fcc8487d47 uapi: export all headers under uapi directories
Regularly, when a new header is created in include/uapi/, the developer
forgets to add it in the corresponding Kbuild file. This error is usually
detected after the release is out.

In fact, all headers under uapi directories should be exported, thus it's
useless to have an exhaustive list.

After this patch, the following files, which were not exported, are now
exported (with make headers_install_all):
asm-arc/kvm_para.h
asm-arc/ucontext.h
asm-blackfin/shmparam.h
asm-blackfin/ucontext.h
asm-c6x/shmparam.h
asm-c6x/ucontext.h
asm-cris/kvm_para.h
asm-h8300/shmparam.h
asm-h8300/ucontext.h
asm-hexagon/shmparam.h
asm-m32r/kvm_para.h
asm-m68k/kvm_para.h
asm-m68k/shmparam.h
asm-metag/kvm_para.h
asm-metag/shmparam.h
asm-metag/ucontext.h
asm-mips/hwcap.h
asm-mips/reg.h
asm-mips/ucontext.h
asm-nios2/kvm_para.h
asm-nios2/ucontext.h
asm-openrisc/shmparam.h
asm-parisc/kvm_para.h
asm-powerpc/perf_regs.h
asm-sh/kvm_para.h
asm-sh/ucontext.h
asm-tile/shmparam.h
asm-unicore32/shmparam.h
asm-unicore32/ucontext.h
asm-x86/hwcap2.h
asm-xtensa/kvm_para.h
drm/armada_drm.h
drm/etnaviv_drm.h
drm/vgem_drm.h
linux/aspeed-lpc-ctrl.h
linux/auto_dev-ioctl.h
linux/bcache.h
linux/btrfs_tree.h
linux/can/vxcan.h
linux/cifs/cifs_mount.h
linux/coresight-stm.h
linux/cryptouser.h
linux/fsmap.h
linux/genwqe/genwqe_card.h
linux/hash_info.h
linux/kcm.h
linux/kcov.h
linux/kfd_ioctl.h
linux/lightnvm.h
linux/module.h
linux/nbd-netlink.h
linux/nilfs2_api.h
linux/nilfs2_ondisk.h
linux/nsfs.h
linux/pr.h
linux/qrtr.h
linux/rpmsg.h
linux/sched/types.h
linux/sed-opal.h
linux/smc.h
linux/smc_diag.h
linux/stm.h
linux/switchtec_ioctl.h
linux/vfio_ccw.h
linux/wil6210_uapi.h
rdma/bnxt_re-abi.h

Note that I have removed from this list the files which are generated in every
exported directories (like .install or .install.cmd).

Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
subdirs with a pure makefile command.

For the record, note that exported files for asm directories are a mix of
files listed by:
 - include/uapi/asm-generic/Kbuild.asm;
 - arch/<arch>/include/uapi/asm/Kbuild;
 - arch/<arch>/include/asm/Kbuild.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Salter <msalter@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-05-11 00:21:54 +09:00
Nicholas Piggin
5a61ef74f2 powerpc/64s: Support new device tree binding for discovering CPU features
The ibm,powerpc-cpu-features device tree binding describes CPU features with
ASCII names and extensible compatibility, privilege, and enablement metadata
that allows improved flexibility and compatibility with new hardware.

The interface is described in detail in ibm,powerpc-cpu-features.txt in this
patch.

Currently this code is not enabled by default, and there are no released
firmwares that provide the binding.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-05-09 23:42:55 +10:00
Michael Ellerman
0b382fb3d9 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Includes a fix for a powerpc/next mm regression on 64e, a fix for a
kernel hang on 64e when using a debugger inside a relocated kernel, a
qman fix, and misc qe improvements."
2017-05-09 22:54:35 +10:00
Paolo Bonzini
4415b33528 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
The main thing here is a new implementation of the in-kernel
XICS interrupt controller emulation for POWER9 machines, from Ben
Herrenschmidt.

POWER9 has a new interrupt controller called XIVE (eXternal Interrupt
Virtualization Engine) which is able to deliver interrupts directly
to guest virtual CPUs in hardware without hypervisor intervention.
With this new code, the guest still sees the old XICS interface but
performance is better because the XICS emulation in the host uses the
XIVE directly rather than going through a XICS emulation in firmware.

Conflicts:
	arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix]
	arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
2017-05-09 11:50:01 +02:00
Michael Ellerman
ba95b5d035 powerpc/mm/book3s/64: Rework page table geometry for lower memory usage
Recently in commit f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB")
we increased the virtual address space for user processes to 128TB by default,
and up to 512TB if user space opts in.

This obviously required expanding the range of the Linux page tables. For Book3s
64-bit using hash and with PAGE_SIZE=64K, we increased the PGD to 2^15 entries.
This meant we could cover the full address range, while still being able to
insert a 16G hugepage at the PGD level and a 16M hugepage in the PMD.

The downside of that geometry is that it uses a lot of memory for the PGD, and
in particular makes the PGD a 4-page allocation, which means it's much more
likely to fail under memory pressure.

Instead we can make the PMD larger, so that a single PUD entry maps 16G,
allowing the 16G hugepages to sit at that level in the tree. We're then able to
split the remaining bits between the PUG and PGD. We make the PGD slightly
larger as that results in lower memory usage for typical programs.

When THP is enabled the PMD actually doubles in size, to 2^11 entries, or 2^14
bytes, which is large but still < PAGE_SIZE.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2017-05-09 19:24:23 +10:00
Linus Torvalds
857f864014 pci-v4.12-changes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z
 akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G
 tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St
 svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM
 KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5
 gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG
 0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f
 oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+
 IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE
 iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3
 IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq
 ags00JtMLitfNPBH3eSl
 =eE4D
 -----END PGP SIGNATURE-----

Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:

 - add framework for supporting PCIe devices in Endpoint mode (Kishon
   Vijay Abraham I)

 - use non-postable PCI config space mappings when possible (Lorenzo
   Pieralisi)

 - clean up and unify mmap of PCI BARs (David Woodhouse)

 - export and unify Function Level Reset support (Christoph Hellwig)

 - avoid FLR for Intel 82579 NICs (Sasha Neftin)

 - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)

 - short-circuit config access failures for disconnected devices (Keith
   Busch)

 - remove D3 sleep delay when possible (Adrian Hunter)

 - freeze PME scan before suspending devices (Lukas Wunner)

 - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)

 - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)

 - add arch-specific alignment control to improve device passthrough by
   avoiding multiple BARs in a page (Yongji Xie)

 - add sysfs sriov_drivers_autoprobe to control VF driver binding
   (Bodong Wang)

 - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)

 - fix crashes when unbinding host controllers that don't support
   removal (Brian Norris)

 - add driver for MicroSemi Switchtec management interface (Logan
   Gunthorpe)

 - add driver for Faraday Technology FTPCI100 host bridge (Linus
   Walleij)

 - add i.MX7D support (Andrey Smirnov)

 - use generic MSI support for Aardvark (Thomas Petazzoni)

 - make Rockchip driver modular (Brian Norris)

 - advertise 128-byte Read Completion Boundary support for Rockchip
   (Shawn Lin)

 - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)

 - convert atomic_t to refcount_t in HV driver (Elena Reshetova)

 - add CPU IRQ affinity in HV driver (K. Y. Srinivasan)

 - fix PCI bus removal in HV driver (Long Li)

 - add support for ThunderX2 DMA alias topology (Jayachandran C)

 - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)

 - add ITE 8893 bridge DMA alias quirk (Jarod Wilson)

 - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
   (Manish Jaggi)

* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
  PCI: Don't allow unbinding host controllers that aren't prepared
  ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
  MAINTAINERS: Add PCI Endpoint maintainer
  Documentation: PCI: Add userguide for PCI endpoint test function
  tools: PCI: Add sample test script to invoke pcitest
  tools: PCI: Add a userspace tool to test PCI endpoint
  Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
  misc: Add host side PCI driver for PCI test function device
  PCI: Add device IDs for DRA74x and DRA72x
  dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
  PCI: dwc: dra7xx: Workaround for errata id i870
  dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
  PCI: dwc: dra7xx: Add EP mode support
  PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
  dt-bindings: PCI: Add DT bindings for PCI designware EP mode
  PCI: dwc: designware: Add EP mode support
  Documentation: PCI: Add binding documentation for pci-test endpoint function
  ixgbe: Use pcie_flr() instead of duplicating it
  IB/hfi1: Use pcie_flr() instead of duplicating it
  PCI: imx6: Fix spelling mistake: "contol" -> "control"
  ...
2017-05-08 19:03:25 -07:00
Linus Torvalds
bf5f89463f Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - the rest of MM

 - various misc things

 - procfs updates

 - lib/ updates

 - checkpatch updates

 - kdump/kexec updates

 - add kvmalloc helpers, use them

 - time helper updates for Y2038 issues. We're almost ready to remove
   current_fs_time() but that awaits a btrfs merge.

 - add tracepoints to DAX

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
  drivers/staging/ccree/ssi_hash.c: fix build with gcc-4.4.4
  selftests/vm: add a test for virtual address range mapping
  dax: add tracepoint to dax_insert_mapping()
  dax: add tracepoint to dax_writeback_one()
  dax: add tracepoints to dax_writeback_mapping_range()
  dax: add tracepoints to dax_load_hole()
  dax: add tracepoints to dax_pfn_mkwrite()
  dax: add tracepoints to dax_iomap_pte_fault()
  mtd: nand: nandsim: convert to memalloc_noreclaim_*()
  treewide: convert PF_MEMALLOC manipulations to new helpers
  mm: introduce memalloc_noreclaim_{save,restore}
  mm: prevent potential recursive reclaim due to clearing PF_MEMALLOC
  mm/huge_memory.c: deposit a pgtable for DAX PMD faults when required
  mm/huge_memory.c: use zap_deposited_table() more
  time: delete CURRENT_TIME_SEC and CURRENT_TIME
  gfs2: replace CURRENT_TIME with current_time
  apparmorfs: replace CURRENT_TIME with current_time()
  lustre: replace CURRENT_TIME macro
  fs: ubifs: replace CURRENT_TIME_SEC with current_time
  fs: ufs: use ktime_get_real_ts64() for birthtime
  ...
2017-05-08 18:17:56 -07:00
Hari Bathini
22bd0177bd powerpc/fadump: remove dependency with CONFIG_KEXEC
Now that crashkernel parameter parsing and vmcoreinfo related code is
moved under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE, remove
dependency with CONFIG_KEXEC for CONFIG_FA_DUMP.  While here, get rid of
definitions of fadump_append_elf_note() & fadump_final_note() functions
to reuse similar functions compiled under CONFIG_CRASH_CORE.

Link: http://lkml.kernel.org/r/149035343956.6881.1536459326017709354.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:11 -07:00
Linus Torvalds
2d3e4866de * ARM: HYP mode stub supports kexec/kdump on 32-bit; improved PMU
support; virtual interrupt controller performance improvements; support
 for userspace virtual interrupt controller (slower, but necessary for
 KVM on the weird Broadcom SoCs used by the Raspberry Pi 3)
 
 * MIPS: basic support for hardware virtualization (ImgTec
 P5600/P6600/I6400 and Cavium Octeon III)
 
 * PPC: in-kernel acceleration for VFIO
 
 * s390: support for guests without storage keys; adapter interruption
 suppression
 
 * x86: usual range of nVMX improvements, notably nested EPT support for
 accessed and dirty bits; emulation of CPL3 CPUID faulting
 
 * generic: first part of VCPU thread request API; kvm_stat improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJZEHUkAAoJEL/70l94x66DBeYH/09wrpJ2FjU4Rqv7FxmqgWfH
 9WGi4wvn/Z+XzQSyfMJiu2SfZVzU69/Y67OMHudy7vBT6knB+ziM7Ntoiu/hUfbG
 0g5KsDX79FW15HuvuuGh9kSjUsj7qsQdyPZwP4FW/6ZoDArV9mibSvdjSmiUSMV/
 2wxaoLzjoShdOuCe9EABaPhKK0XCrOYkygT6Paz1pItDxaSn8iW3ulaCuWMprUfG
 Niq+dFemK464E4yn6HVD88xg5j2eUM6bfuXB3qR3eTR76mHLgtwejBzZdDjLG9fk
 32PNYKhJNomBxHVqtksJ9/7cSR6iNPs7neQ1XHemKWTuYqwYQMlPj1NDy0aslQU=
 =IsiZ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "ARM:
   - HYP mode stub supports kexec/kdump on 32-bit
   - improved PMU support
   - virtual interrupt controller performance improvements
   - support for userspace virtual interrupt controller (slower, but
     necessary for KVM on the weird Broadcom SoCs used by the Raspberry
     Pi 3)

  MIPS:
   - basic support for hardware virtualization (ImgTec P5600/P6600/I6400
     and Cavium Octeon III)

  PPC:
   - in-kernel acceleration for VFIO

  s390:
   - support for guests without storage keys
   - adapter interruption suppression

  x86:
   - usual range of nVMX improvements, notably nested EPT support for
     accessed and dirty bits
   - emulation of CPL3 CPUID faulting

  generic:
   - first part of VCPU thread request API
   - kvm_stat improvements"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
  kvm: nVMX: Don't validate disabled secondary controls
  KVM: put back #ifndef CONFIG_S390 around kvm_vcpu_kick
  Revert "KVM: Support vCPU-based gfn->hva cache"
  tools/kvm: fix top level makefile
  KVM: x86: don't hold kvm->lock in KVM_SET_GSI_ROUTING
  KVM: Documentation: remove VM mmap documentation
  kvm: nVMX: Remove superfluous VMX instruction fault checks
  KVM: x86: fix emulation of RSM and IRET instructions
  KVM: mark requests that need synchronization
  KVM: return if kvm_vcpu_wake_up() did wake up the VCPU
  KVM: add explicit barrier to kvm_vcpu_kick
  KVM: perform a wake_up in kvm_make_all_cpus_request
  KVM: mark requests that do not need a wakeup
  KVM: remove #ifndef CONFIG_S390 around kvm_vcpu_wake_up
  KVM: x86: always use kvm_make_request instead of set_bit
  KVM: add kvm_{test,clear}_request to replace {test,clear}_bit
  s390: kvm: Cpu model support for msa6, msa7 and msa8
  KVM: x86: remove irq disablement around KVM_SET_CLOCK/KVM_GET_CLOCK
  kvm: better MWAIT emulation for guests
  KVM: x86: virtualize cpuid faulting
  ...
2017-05-08 12:37:56 -07:00
Linus Torvalds
7246f60068 powerpc updates for 4.12 part 1.
Highlights include:
 
  - Larger virtual address space on 64-bit server CPUs. By default we use a 128TB
    virtual address space, but a process can request access to the full 512TB by
    passing a hint to mmap().
 
  - Support for the new Power9 "XIVE" interrupt controller.
 
  - TLB flushing optimisations for the radix MMU on Power9.
 
  - Support for CAPI cards on Power9, using the "Coherent Accelerator Interface
    Architecture 2.0".
 
  - The ability to configure the mmap randomisation limits at build and runtime.
 
  - Several small fixes and cleanups to the kprobes code, as well as support for
    KPROBES_ON_FTRACE.
 
  - Major improvements to handling of system reset interrupts, correctly treating
    them as NMIs, giving them a dedicated stack and using a new hypervisor call
    to trigger them, all of which should aid debugging and robustness.
 
 Many fixes and other minor enhancements.
 
 Thanks to:
   Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
   Aneesh Kumar K.V, Anshuman Khandual, Anton Blanchard, Balbir Singh, Ben
   Hutchings, Benjamin Herrenschmidt, Bhupesh Sharma, Chris Packham, Christian
   Zigotzky, Christophe Leroy, Christophe Lombard, Daniel Axtens, David Gibson,
   Gautham R. Shenoy, Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli,
   Hamish Martin, Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan,
   Mahesh J Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew
   R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran,
   Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell Currey, Sukadev
   Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C. Harding, Tyrel Datwyler,
   Uma Krishnan, Vaibhav Jain, Vipin K Parashar, Yang Shi.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZDHUMAAoJEFHr6jzI4aWAT7oQALkE2Nj3gjcn1z0SkFhq/1iO
 Py9Elmqm4E+L6NKYtBY5dS8xVAJ088ffzERyqJ1FY1LHkB8tn8bWRcMQmbjAFzTI
 V4TAzDNI890BN/F4ptrYRwNFxRBHAvZ4NDunTzagwYnwmTzW9PYHmOi4pvWTo3Tw
 KFUQ0joLSEgHzyfXxYB3fyj41u8N0FZvhfazdNSqia2Y5Vwwv/ION5jKplDM+09Y
 EtVEXFvaKAS1sjbM/d/Jo5rblHfR0D9/lYV10+jjyIokjzslIpyTbnj3izeYoM5V
 I4h99372zfsEjBGPPXyM3khL3zizGMSDYRmJHQSaKxjtecS9SPywPTZ8ufO/aSzV
 Ngq6nlND+f1zep29VQ0cxd3Jh40skWOXzxJaFjfDT25xa6FbfsWP2NCtk8PGylZ7
 EyqTuCWkMgIP02KlX3oHvEB2LRRPCDmRU2zECecRGNJrIQwYC2xjoiVi7Q8Qe8rY
 gr7Ib5Jj/a+uiTcCIy37+5nXq2s14/JBOKqxuYZIxeuZFvKYuRUipbKWO05WDOAz
 m/pSzeC3J8AAoYiqR0gcSOuJTOnJpGhs7zrQFqnEISbXIwLW+ICumzOmTAiBqOEY
 Rt8uW2gYkPwKLrE05445RfVUoERaAjaE06eRMOWS6slnngHmmnRJbf3PcoALiJkT
 ediqGEj0/N1HMB31V5tS
 =vSF3
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights include:

   - Larger virtual address space on 64-bit server CPUs. By default we
     use a 128TB virtual address space, but a process can request access
     to the full 512TB by passing a hint to mmap().

   - Support for the new Power9 "XIVE" interrupt controller.

   - TLB flushing optimisations for the radix MMU on Power9.

   - Support for CAPI cards on Power9, using the "Coherent Accelerator
     Interface Architecture 2.0".

   - The ability to configure the mmap randomisation limits at build and
     runtime.

   - Several small fixes and cleanups to the kprobes code, as well as
     support for KPROBES_ON_FTRACE.

   - Major improvements to handling of system reset interrupts,
     correctly treating them as NMIs, giving them a dedicated stack and
     using a new hypervisor call to trigger them, all of which should
     aid debugging and robustness.

   - Many fixes and other minor enhancements.

  Thanks to: Alastair D'Silva, Alexey Kardashevskiy, Alistair Popple,
  Andrew Donnellan, Aneesh Kumar K.V, Anshuman Khandual, Anton
  Blanchard, Balbir Singh, Ben Hutchings, Benjamin Herrenschmidt,
  Bhupesh Sharma, Chris Packham, Christian Zigotzky, Christophe Leroy,
  Christophe Lombard, Daniel Axtens, David Gibson, Gautham R. Shenoy,
  Gavin Shan, Geert Uytterhoeven, Guilherme G. Piccoli, Hamish Martin,
  Hari Bathini, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mahesh J
  Salgaonkar, Mahesh Salgaonkar, Masami Hiramatsu, Matt Brown, Matthew
  R. Ochs, Michael Neuling, Naveen N. Rao, Nicholas Piggin, Oliver
  O'Halloran, Pan Xinhui, Paul Mackerras, Rashmica Gupta, Russell
  Currey, Sukadev Bhattiprolu, Thadeu Lima de Souza Cascardo, Tobin C.
  Harding, Tyrel Datwyler, Uma Krishnan, Vaibhav Jain, Vipin K Parashar,
  Yang Shi"

* tag 'powerpc-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (214 commits)
  powerpc/64s: Power9 has no LPCR[VRMASD] field so don't set it
  powerpc/powernv: Fix TCE kill on NVLink2
  powerpc/mm/radix: Drop support for CPUs without lockless tlbie
  powerpc/book3s/mce: Move add_taint() later in virtual mode
  powerpc/sysfs: Move #ifdef CONFIG_HOTPLUG_CPU out of the function body
  powerpc/smp: Document irq enable/disable after migrating IRQs
  powerpc/mpc52xx: Don't select user-visible RTAS_PROC
  powerpc/powernv: Document cxl dependency on special case in pnv_eeh_reset()
  powerpc/eeh: Clean up and document event handling functions
  powerpc/eeh: Avoid use after free in eeh_handle_special_event()
  cxl: Mask slice error interrupts after first occurrence
  cxl: Route eeh events to all drivers in cxl_pci_error_detected()
  cxl: Force context lock during EEH flow
  powerpc/64: Allow CONFIG_RELOCATABLE if COMPILE_TEST
  powerpc/xmon: Teach xmon oops about radix vectors
  powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids
  powerpc/pseries: Enable VFIO
  powerpc/powernv: Fix iommu table size calculation hook for small tables
  powerpc/powernv: Check kzalloc() return value in pnv_pci_table_alloc
  powerpc: Add arch/powerpc/tools directory
  ...
2017-05-05 11:36:44 -07:00
Scott Wood
61baf15555 powerpc/64e: Don't place the stack beyond TASK_SIZE
Commit f4ea6dcb08 ("powerpc/mm: Enable mappings above 128TB") increased
the task size on book3s, and introduced a mechanism to dynamically
control whether a task uses these larger addresses.  While the change to
the task size itself was ifdef-protected to only apply on book3s, the
change to STACK_TOP_USER64 was not.  On book3e, this had the effect of
trying to use addresses up to 128TiB for the stack despite a 64TiB task
size limit -- which broke 64-bit userspace producing the following errors:

Starting init: /sbin/init exists but couldn't execute it (error -14)
Starting init: /bin/sh exists but couldn't execute it (error -14)
Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.

Fixes: f4ea6dcb08 ("powerpc/mm: Enable mappings above 128TB")
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Scott Wood <oss@buserror.net>
2017-05-05 01:22:06 -05:00
Christophe Leroy
726bd22310 powerpc/8xx: Adding support of IRQ in MPC8xx GPIO
This patch allows the use of IRQ to notify the change of GPIO status
on MPC8xx CPM IO ports. This then allows to associate IRQs to GPIOs
in the Device Tree.

Ex:
	CPM1_PIO_C: gpio-controller@960 {
		#gpio-cells = <2>;
		compatible = "fsl,cpm1-pario-bank-c";
		reg = <0x960 0x10>;
		fsl,cpm1-gpio-irq-mask = <0x0fff>;
		interrupts = <1 2 6 9 10 11 14 15 23 24 26 31>;
		interrupt-parent = <&CPM_PIC>;
		gpio-controller;
	};

The property 'fsl,cpm1-gpio-irq-mask' defines which of the 16 GPIOs
have the associated interrupts defined in the 'interrupts' property.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
2017-05-02 22:35:00 -05:00
Linus Torvalds
76f1948a79 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatch updates from Jiri Kosina:

 - a per-task consistency model is being added for architectures that
   support reliable stack dumping (extending this, currently rather
   trivial set, is currently in the works).

   This extends the nature of the types of patches that can be applied
   by live patching infrastructure. The code stems from the design
   proposal made [1] back in November 2014. It's a hybrid of SUSE's
   kGraft and RH's kpatch, combining advantages of both: it uses
   kGraft's per-task consistency and syscall barrier switching combined
   with kpatch's stack trace switching. There are also a number of
   fallback options which make it quite flexible.

   Most of the heavy lifting done by Josh Poimboeuf with help from
   Miroslav Benes and Petr Mladek

   [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz

 - module load time patch optimization from Zhou Chengming

 - a few assorted small fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: add missing printk newlines
  livepatch: Cancel transition a safe way for immediate patches
  livepatch: Reduce the time of finding module symbols
  livepatch: make klp_mutex proper part of API
  livepatch: allow removal of a disabled patch
  livepatch: add /proc/<pid>/patch_state
  livepatch: change to a per-task consistency model
  livepatch: store function sizes
  livepatch: use kstrtobool() in enabled_store()
  livepatch: move patching functions into patch.c
  livepatch: remove unnecessary object loaded check
  livepatch: separate enabled and patched states
  livepatch/s390: add TIF_PATCH_PENDING thread flag
  livepatch/s390: reorganize TIF thread flag bits
  livepatch/powerpc: add TIF_PATCH_PENDING thread flag
  livepatch/x86: add TIF_PATCH_PENDING thread flag
  livepatch: create temporary klp_update_patch_state() stub
  x86/entry: define _TIF_ALLWORK_MASK flags explicitly
  stacktrace/x86: add function for detecting reliable stack traces
2017-05-02 18:24:16 -07:00
Linus Torvalds
8d65b08deb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Millar:
 "Here are some highlights from the 2065 networking commits that
  happened this development cycle:

   1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)

   2) Add a generic XDP driver, so that anyone can test XDP even if they
      lack a networking device whose driver has explicit XDP support
      (me).

   3) Sparc64 now has an eBPF JIT too (me)

   4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
      Starovoitov)

   5) Make netfitler network namespace teardown less expensive (Florian
      Westphal)

   6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)

   7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)

   8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)

   9) Multiqueue support in stmmac driver (Joao Pinto)

  10) Remove TCP timewait recycling, it never really could possibly work
      well in the real world and timestamp randomization really zaps any
      hint of usability this feature had (Soheil Hassas Yeganeh)

  11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
      Aleksandrov)

  12) Add socket busy poll support to epoll (Sridhar Samudrala)

  13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
      and several others)

  14) IPSEC hw offload infrastructure (Steffen Klassert)"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
  tipc: refactor function tipc_sk_recv_stream()
  tipc: refactor function tipc_sk_recvmsg()
  net: thunderx: Optimize page recycling for XDP
  net: thunderx: Support for XDP header adjustment
  net: thunderx: Add support for XDP_TX
  net: thunderx: Add support for XDP_DROP
  net: thunderx: Add basic XDP support
  net: thunderx: Cleanup receive buffer allocation
  net: thunderx: Optimize CQE_TX handling
  net: thunderx: Optimize RBDR descriptor handling
  net: thunderx: Support for page recycling
  ipx: call ipxitf_put() in ioctl error path
  net: sched: add helpers to handle extended actions
  qed*: Fix issues in the ptp filter config implementation.
  qede: Fix concurrency issue in PTP Tx path processing.
  stmmac: Add support for SIMATIC IOT2000 platform
  net: hns: fix ethtool_get_strings overflow in hns driver
  tcp: fix wraparound issue in tcp_lp
  bpf, arm64: fix jit branch offset related to ldimm64
  bpf, arm64: implement jiting of BPF_XADD
  ...
2017-05-02 16:40:27 -07:00
Linus Torvalds
d3b5d35290 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The main x86 MM changes in this cycle were:

   - continued native kernel PCID support preparation patches to the TLB
     flushing code (Andy Lutomirski)

   - various fixes related to 32-bit compat syscall returning address
     over 4Gb in applications, launched from 64-bit binaries - motivated
     by C/R frameworks such as Virtuozzo. (Dmitry Safonov)

   - continued Intel 5-level paging enablement: in particular the
     conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov)

   - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel)

   - ... plus misc updates, fixes and cleanups"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
  mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
  x86/mm: Fix flush_tlb_page() on Xen
  x86/mm: Make flush_tlb_mm_range() more predictable
  x86/mm: Remove flush_tlb() and flush_tlb_current_task()
  x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
  x86/mm/64: Fix crash in remove_pagetable()
  Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"
  x86/boot/e820: Remove a redundant self assignment
  x86/mm: Fix dump pagetables for 4 levels of page tables
  x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow
  x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
  Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
  x86/espfix: Add support for 5-level paging
  x86/kasan: Extend KASAN to support 5-level paging
  x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
  x86/paravirt: Add 5-level support to the paravirt code
  x86/mm: Define virtual memory map for 5-level paging
  x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
  x86/boot: Detect 5-level paging support
  x86/mm/numa: Remove numa_nodemask_from_meminfo()
  ...
2017-05-01 23:54:56 -07:00
Linus Torvalds
3fb9268e43 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - unwinder fixes and enhancements

   - improve ftrace interaction with the unwinder

   - optimize the code footprint of WARN() and related debugging
     constructs

   - ... plus misc updates, cleanups and fixes"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/unwind: Dump all stacks in unwind_dump()
  x86/unwind: Silence more entry-code related warnings
  x86/ftrace: Fix ebp in ftrace_regs_caller that screws up unwinder
  x86/unwind: Remove unused 'sp' parameter in unwind_dump()
  x86/unwind: Prepend hex mask value with '0x' in unwind_dump()
  x86/unwind: Properly zero-pad 32-bit values in unwind_dump()
  x86/unwind: Ensure stack pointer is aligned
  debug: Avoid setting BUGFLAG_WARNING twice
  x86/unwind: Silence entry-related warnings
  x86/unwind: Read stack return address in update_stack_state()
  x86/unwind: Move common code into update_stack_state()
  debug: Fix __bug_table[] in arch linker scripts
  debug: Add _ONCE() logic to report_bug()
  x86/debug: Define BUG() again for !CONFIG_BUG
  x86/debug: Implement __WARN() using UD0
  x86/ftrace: Use Makefile logic instead of #ifdef for compiling ftrace_*.o
  x86/ftrace: Add -mfentry support to x86_32 with DYNAMIC_FTRACE set
  x86/ftrace: Clean up ftrace_regs_caller
  x86/ftrace: Add stack frame pointer to ftrace_caller
  x86/ftrace: Move the ftrace specific code out of entry_32.S
  ...
2017-05-01 22:07:51 -07:00
Linus Torvalds
5db6db0d40 Merge branch 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess unification updates from Al Viro:
 "This is the uaccess unification pile. It's _not_ the end of uaccess
  work, but the next batch of that will go into the next cycle. This one
  mostly takes copy_from_user() and friends out of arch/* and gets the
  zero-padding behaviour in sync for all architectures.

  Dealing with the nocache/writethrough mess is for the next cycle;
  fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
  sold on access_ok() in there, BTW; just not in this pile), same for
  reducing __copy_... callsites, strn*... stuff, etc. - there will be a
  pile about as large as this one in the next merge window.

  This one sat in -next for weeks. -3KLoC"

* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
  HAVE_ARCH_HARDENED_USERCOPY is unconditional now
  CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
  m32r: switch to RAW_COPY_USER
  hexagon: switch to RAW_COPY_USER
  microblaze: switch to RAW_COPY_USER
  get rid of padding, switch to RAW_COPY_USER
  ia64: get rid of copy_in_user()
  ia64: sanitize __access_ok()
  ia64: get rid of 'segment' argument of __do_{get,put}_user()
  ia64: get rid of 'segment' argument of __{get,put}_user_check()
  ia64: add extable.h
  powerpc: get rid of zeroing, switch to RAW_COPY_USER
  esas2r: don't open-code memdup_user()
  alpha: fix stack smashing in old_adjtimex(2)
  don't open-code kernel_setsockopt()
  mips: switch to RAW_COPY_USER
  mips: get rid of tail-zeroing in primitives
  mips: make copy_from_user() zero tail explicitly
  mips: clean and reorder the forest of macros...
  mips: consolidate __invoke_... wrappers
  ...
2017-05-01 14:41:04 -07:00
Bjorn Helgaas
889e4dd916 Merge branch 'pci/resource-mmap' into next
* pci/resource-mmap:
  ia64: Use generic pci_mmap_resource_range()
  ia64: Remove redundant checks for WC in pci_mmap_page_range()
  ia64: Remove redundant valid_mmap_phys_addr_range() from pci_mmap_page_range()
  PCI: Add I/O BAR support to generic pci_mmap_resource_range()
  x86/PCI: Use generic pci_mmap_resource_range()
  unicore32/PCI: Use generic pci_mmap_resource_range()
  sh/PCI: Use generic pci_mmap_resource_range()
  parisc: Use generic pci_mmap_resource_range()
  mn10300/PCI: Use generic pci_mmap_resource_range()
  MIPS: PCI: Use generic pci_mmap_resource_range()
  cris/PCI: Use generic pci_mmap_resource_range()
  ARM/PCI: Use generic pci_mmap_resource_range()
  PCI: Add pci_mmap_resource_range() and use it for ARM64
  PCI: Add BAR index argument to pci_mmap_page_range()
  PCI: Use BAR index in sysfs attr->private instead of resource pointer
  PCI: Add arch_can_pci_mmap_io() on architectures which can mmap() I/O space
  PCI: Move multiple declarations of pci_mmap_page_range() to <linux/pci.h>
  PCI: Add arch_can_pci_mmap_wc() macro
  xtensa/PCI: Do not mmap PCI BARs to userspace as write-through
  PCI: Only allow WC mmap on prefetchable resources
  PCI: Fix another sanity check bug in /proc/pci mmap
  PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms
2017-04-28 10:34:34 -05:00
Michael Ellerman
add2e1e585 powerpc/mm/hash: Fix off-by-one in comment about kernel contexts ids
Michal Suchánek noticed a comment in book3s/64/mmu-hash.h about the context ids
we use for the kernel was inconsistent with the code and other comments in the
same file.

It should read 1-4 not 1-5.

While we're touching it, update "address" to "addresses" which makes more sense
as it's referring to more than one address below.

Reported-by: Michal Suchánek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 22:02:55 +10:00
Nicholas Piggin
c64af6458e powerpc: Add struct smp_ops_t.cause_nmi_ipi operation
Have the NMI IPI code use this op when the platform defines it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:02:25 +10:00
Nicholas Piggin
ddd703ca06 powerpc: Add NMI IPI infrastructure
Add a simple NMI IPI system that handles concurrency and reentrancy.

The platform does not have to implement a true non-maskable interrupt,
the default is to simply use the debugger break IPI message. This has
now been co-opted for a general IPI message, and users (debugger and
crash) have been reimplemented on top of the NMI system.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Incorporate incremental fixes from Nick]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:02:25 +10:00
Nicholas Piggin
b1ee8a3de5 powerpc/64s: Dedicated system reset interrupt stack
The system reset interrupt is used for crash/debug situations, so it is
desirable to have as little impact on the normal state of the system as
possible.

Currently it uses the current kernel stack to process the exception.
This stores into the stack which may be involved with the crash. The
stack pointer may be corrupted, or it may have overflowed.

Avoid or minimise these problems by creating a dedicated NMI stack for
the system reset interrupt to use.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:02:25 +10:00
Nicholas Piggin
c4f3b52ce7 powerpc/64s: Disallow system reset vs system reset reentrancy
In preparation for using a dedicated stack for system reset interrupts,
prevent a nested system reset from recovering, in order to simplify
code that is called in crash/debug path. This allows a system reset
interrupt to just use the base stack pointer.

Keep an in_nmi nesting counter similarly to the in_mce counter. Consider
the interrrupt non-recoverable if it is taken inside another system
reset.

Interrupt nesting could be allowed similarly to MCE, but system reset
is a special case that's not for normal operation, so simplicity wins
until there is requirement for nested system reset interrupts.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:02:25 +10:00
Nicholas Piggin
a3d96f70c1 powerpc/64s: Fix system reset vs general interrupt reentrancy
The system reset interrupt can occur when MSR_EE=0, and it currently
uses the PACA_EXGEN save area.

Some PACA_EXGEN interrupts have a window where MSR_RI=1 and MSR_EE=0
when the save area is still in use. A system reset interrupt in this
window can lead to undetected corruption when the save area gets
overwritten.

This patch introduces PACA_EXNMI save area for system reset exceptions,
which closes this corruption window. It's also helpful to retain the
EXGEN state for debugging situations, even if not considering the
recoverability aspect.

This patch also moves the PACA_EXMC area down to a less frequently used
part of the paca with the new save area.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:02:25 +10:00
Nicholas Piggin
a4087a4d38 powerpc/64s: Exception macro for stack frame and initial register save
This code is common to a few exceptions, and another user will be added.
This causes a trivial change to generated code:

-     604: std     r9,416(r1)
-     608: mfspr   r11,314
-     60c: std     r11,368(r1)
-     610: mfspr   r12,315
+     604: mfspr   r11,314
+     608: mfspr   r12,315
+     60c: std     r9,416(r1)
+     610: std     r11,368(r1)

machine_check_powernv_early could also use this, but that requires non
trivial changes to generated code, so that's for another patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:02:25 +10:00
Nicholas Piggin
83a980f7f4 powerpc/64s: Add exception macro that does not enable RI
Subsequent patches will add more non-RI variant exceptions, so
create a macro for it rather than open-code it.

This does not change generated instructions.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-28 21:02:25 +10:00
Michael Ellerman
b13f6683ed Merge branch 'topic/ppc-kvm' into next
Merge the topic branch we were sharing with kvm-ppc, Paul has also
merged it.
2017-04-28 20:19:37 +10:00
Paul Mackerras
fb7dcf723d Merge remote-tracking branch 'remotes/powerpc/topic/xive' into kvm-ppc-next
This merges in the powerpc topic/xive branch to bring in the code for
the in-kernel XICS interrupt controller emulation to use the new XIVE
(eXternal Interrupt Virtualization Engine) hardware in the POWER9 chip
directly, rather than via a XICS emulation in firmware.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-28 08:23:16 +10:00
Christophe Leroy
fd893fe56a powerpc/mm: Fix missing page attributes in page table dump
On some targets, _PAGE_RW is 0 and this is _PAGE_RO which is used.
There is also _PAGE_SHARED that is missing.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-27 22:20:27 +10:00
Benjamin Herrenschmidt
5af5099385 KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller
This patch makes KVM capable of using the XIVE interrupt controller
to provide the standard PAPR "XICS" style hypercalls. It is necessary
for proper operations when the host uses XIVE natively.

This has been lightly tested on an actual system, including PCI
pass-through with a TG3 device.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build
 failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and
 adding empty stubs for the kvm_xive_xxx() routines, fixup subject,
 integrate fixes from Paul for building PR=y HV=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-27 21:37:29 +10:00
Al Viro
eea86b637a Merge branches 'uaccess.alpha', 'uaccess.arc', 'uaccess.arm', 'uaccess.arm64', 'uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess 2017-04-26 12:06:59 -04:00
David Gibson
9765ad134a powerpc/mm: Ensure IRQs are off in switch_mm()
powerpc expects IRQs to already be (soft) disabled when switch_mm() is
called, as made clear in the commit message of 9c1e105238 ("powerpc: Allow
perf_counters to access user memory at interrupt time").

Aside from any race conditions that might exist between switch_mm() and an IRQ,
there is also an unconditional hard_irq_disable() in switch_slb(). If that isn't
followed at some point by an IRQ enable then interrupts will remain disabled
until we return to userspace.

It is true that when switch_mm() is called from the scheduler IRQs are off, but
not when it's called by use_mm(). Looking closer we see that last year in commit
f98db6013c ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
this was made more explicit by the addition of switch_mm_irqs_off() which is now
called by the scheduler, vs switch_mm() which is used by use_mm().

Arguably it is a bug in use_mm() to call switch_mm() in a different context than
it expects, but fixing that will take time.

This was discovered recently when vhost started throwing warnings such as:

  BUG: sleeping function called from invalid context at kernel/mutex.c:578
  in_atomic(): 0, irqs_disabled(): 1, pid: 10768, name: vhost-10760
  no locks held by vhost-10760/10768.
  irq event stamp: 10
  hardirqs last  enabled at (9):  _raw_spin_unlock_irq+0x40/0x80
  hardirqs last disabled at (10): switch_slb+0x2e4/0x490
  softirqs last  enabled at (0):  copy_process+0x5e8/0x1260
  softirqs last disabled at (0):  (null)
  Call Trace:
    show_stack+0x88/0x390 (unreliable)
    dump_stack+0x30/0x44
    __might_sleep+0x1c4/0x2d0
    mutex_lock_nested+0x74/0x5c0
    cgroup_attach_task_all+0x5c/0x180
    vhost_attach_cgroups_work+0x58/0x80 [vhost]
    vhost_worker+0x24c/0x3d0 [vhost]
    kthread+0xec/0x100
    ret_from_kernel_thread+0x5c/0xd4

Prior to commit 04b96e5528 ("vhost: lockless enqueuing") (Aug 2016) the
vhost_worker() would do a spin_unlock_irq() not long after calling use_mm(),
which had the effect of reenabling IRQs. Since that commit removed the locking
in vhost_worker() the body of the vhost_worker() loop now runs with interrupts
off causing the warnings.

This patch addresses the problem by making the powerpc code mirror the x86 code,
ie. we disable interrupts in switch_mm(), and optimise the scheduler case by
defining switch_mm_irqs_off().

Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[mpe: Flesh out/rewrite change log, add stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-25 00:24:59 +10:00
Michael Ellerman
9fc849144c Merge branch 'topic/kprobes' into next
Although most of these kprobes patches are powerpc specific, there's a couple
that touch generic code (with Acks). At the moment there's one conflict with
acme's tree, but it's not too bad. Still just in case some other conflicts show
up, we've put these in a topic branch so another tree could merge some or all of
it if necessary.
2017-04-25 00:24:04 +10:00
Naveen N. Rao
1b32cd1715 powerpc: Introduce a new helper to obtain function entry points
kprobe_lookup_name() is specific to the kprobe subsystem and may not always
return the function entry point (in a subsequent patch for KPROBES_ON_FTRACE).
For looking up function entry points, introduce a separate helper and use it
in optprobes.c

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-24 19:07:58 +10:00
Naveen N. Rao
ead514d5fb powerpc/kprobes: Add support for KPROBES_ON_FTRACE
Allow kprobes to be placed on ftrace _mcount() call sites. This optimization
avoids the use of a trap, by riding on ftrace infrastructure.

This depends on HAVE_DYNAMIC_FTRACE_WITH_REGS which depends on MPROFILE_KERNEL,
which is only currently enabled on powerpc64le with newer toolchains.

Based on the x86 code by Masami.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-24 19:07:58 +10:00
Naveen N. Rao
9a914aa682 powerpc/kprobes: Blacklist common exception handlers
Blacklist all the exception common/OOL handlers as the kernel stack is not yet
setup, which means we can't take a trap at this point.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23 20:32:26 +10:00
Naveen N. Rao
7aa5b018bf powerpc/kprobes: Blacklist exception handlers
Introduce __head_end to mark end of the early fixed sections and use it to
blacklist all exception handlers from kprobes.

mpe: We do not need to do anything special for relocatable kernels, where the
exception vectors are split from the main kernel, as the split vectors are
already excluded by the check for kernel_text_address().

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Move __head_end outside #ifdef 64-bit to unbreak the 32-bit build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23 20:32:25 +10:00
Nicholas Piggin
9cba253df4 powerpc/64s: Simplify POWER9 DD1 idle workaround code
The idle workaround does not need to load PACATOC, and it does not
need to be called within a nested function that requires LR to be
saved.

Load the PACATOC at entry to the idle wakeup. It does not matter which
PACA this comes from, so it's okay to call before the workaround. Then
apply the workaround to get the right PACA.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23 20:32:23 +10:00
Nicholas Piggin
0d7720a242 powerpc/64s: Idle POWER8 avoid full state loss recovery where possible
If not all threads were in winkle, full state loss recovery is not
necessary and can be avoided. A previous patch removed this optimisation
due to some complexity with the implementation. Re-implement it by
counting the number of threads in winkle with the per-core idle state.
Only restore full state loss if all threads were in winkle.

This has a small window of false positives right before threads execute
winkle and just after they wake up, when the winkle count does not
reflect the true number of threads in winkle. This is not a significant
problem in comparison with even the minimum winkle duration. For
correctness, a false positive is not a problem (only false negatives
would be).

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23 20:32:12 +10:00
Nicholas Piggin
adbcf8d74f powerpc/64s: Expand core idle state bits
In preparation for adding more bits to the core idle state word, move
the lock bit up, and unlock by flipping the lock bit rather than masking
off all but the thread bits.

Add branch hints for atomic operations while we're here.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23 20:31:49 +10:00
Nicholas Piggin
1945bc4549 powerpc/64s: Fix POWER9 machine check handler from stop state
The ISA specifies power save wakeup due to a machine check exception can
cause a machine check interrupt (rather than the usual system reset
interrupt).

The machine check handler copes with this by doing low level machine
check recovery without restoring full state from idle, then queues up a
machine check event for logging, then directly executes the same idle
instruction it woke from. This minimises the work done before recovery
is performed.

The problem is that it requires machine specific instructions and
knowledge of the book3s idle code. Currently it only has code to handle
POWER8 idle, so POWER9 crashes when trying to execute the P8 idle
instructions which don't exist in ISAv3.0B.

cpu 0x0: Vector: e40 (Emulation Assist) at [c0000000008f3810]
    pc: c000000000008380: machine_check_handle_early+0x130/0x2f0
    lr: c00000000053a098: stop_loop+0x68/0xd0
    sp: c0000000008f3a90
   msr: 9000000000081001
  current = 0xc0000000008a1080
  paca    = 0xc00000000ffd0000   softe: 0        irq_happened: 0x01
    pid   = 0, comm = swapper/0

Instead of going to sleep after recovery, do the usual idle wakeup and
state restoration by calling into the normal idle wakeup path. This
reuses the normal idle wakeup paths.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23 20:31:46 +10:00
Nicholas Piggin
544686cae8 powerpc/64s: Stop using bit in HSPRG0 to test winkle
The POWER8 idle code has a neat trick of programming the power on engine
to restore a low bit into HSPRG0, so idle wakeup code can test and see
if it has been programmed this way and therefore lost all state. Restore
time can be reduced if winkle has not been reached.

However this messes with our r13 PACA pointer, and requires HSPRG0 to be
written to. It also optimizes the slowest and most uncommon case at the
expense of another SPR write in the common nap state wakeup.

Remove this complexity and assume winkle sleeps always require a state
restore. This speedup could be made entirely contained within the winkle
idle code by counting per-core winkles and setting a thread bitmap when
all have gone to winkle.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23 20:31:39 +10:00
Nicholas Piggin
2563a70c3b powerpc/64s: Remove unnecessary relocation branch from idle handler
The system reset idle handler system_reset_idle_common is relocated, so
relocation is not required to branch to kvm_start_guest. The superfluous
relocation does not result in incorrect code, but it does not compile
outside of exception-64s.S (with fixed section definitions).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-23 17:26:35 +10:00
David S. Miller
fb796707d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Both conflict were simple overlapping changes.

In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21 20:23:53 -07:00
Oliver O'Halloran
f855b2f544 powerpc/mm: Wire up ioremap_cache()
The default implementation of ioremap_cache() is aliased to ioremap().
On powerpc ioremap() creates cache-inhibited mappings by default which
is almost certainly not what you wanted.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-21 21:08:47 +10:00
Naveen N. Rao
49e0b4658f kprobes: Convert kprobe_lookup_name() to a function
The macro is now pretty long and ugly on powerpc. In the light of further
changes needed here, convert it to a __weak variant to be over-ridden with a
nicer looking function.

Suggested-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-20 23:18:54 +10:00
Nicholas Piggin
a050d20d02 powerpc/64s: Use relon prolog for EXC_VIRT_OOL_MASKABLE_HV handlers
Hypervisor Virtualization and Directed Hypervisor Doorbell interrupt handlers
use the macro EXC_VIRT_OOL_MASKABLE_HV for their relocation-on handlers, which
calls MASKABLE_RELON_EXCEPTION_HV_OOL, which uses the *real mode* interrupt
prolog. This means we needlessly rfid from virtual mode to virtual mode.

For POWER8 it only affects doorbell IPIs. Context switch microbenchmark between
threads with snooze disabled (which causes IPI) gets about 3% faster, about 370
cycles. Should be more important on POWER9 with global doorbells and HVI for
host interrupts.

Use the RELON variant instead to reduce overhead.

Fixes: 1707dd1613 ("powerpc: Save CFAR before branching in interrupt entry paths")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fold some more detail into the change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-20 17:05:11 +10:00
Alexey Kardashevskiy
121f80ba68 KVM: PPC: VFIO: Add in-kernel acceleration for VFIO
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted an IOMMU TCE table used for VFIO
without passing them to user space which saves time on switching
to user space and back.

This adds H_PUT_TCE/H_PUT_TCE_INDIRECT/H_STUFF_TCE handlers to KVM.
KVM tries to handle a TCE request in the real mode, if failed
it passes the request to the virtual mode to complete the operation.
If it a virtual mode handler fails, the request is passed to
the user space; this is not expected to happen though.

To avoid dealing with page use counters (which is tricky in real mode),
this only accelerates SPAPR TCE IOMMU v2 clients which are required
to pre-register the userspace memory. The very first TCE request will
be handled in the VFIO SPAPR TCE driver anyway as the userspace view
of the TCE table (iommu_table::it_userspace) is not allocated till
the very first mapping happens and we cannot call vmalloc in real mode.

If we fail to update a hardware IOMMU table unexpected reason, we just
clear it and move on as there is nothing really we can do about it -
for example, if we hot plug a VFIO device to a guest, existing TCE tables
will be mirrored automatically to the hardware and there is no interface
to report to the guest about possible failures.

This adds new attribute - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE - to
the VFIO KVM device. It takes a VFIO group fd and SPAPR TCE table fd
and associates a physical IOMMU table with the SPAPR TCE table (which
is a guest view of the hardware IOMMU table). The iommu_table object
is cached and referenced so we do not have to look up for it in real mode.

This does not implement the UNSET counterpart as there is no use for it -
once the acceleration is enabled, the existing userspace won't
disable it unless a VFIO container is destroyed; this adds necessary
cleanup to the KVM_DEV_VFIO_GROUP_DEL handler.

This advertises the new KVM_CAP_SPAPR_TCE_VFIO capability to the user
space.

This adds real mode version of WARN_ON_ONCE() as the generic version
causes problems with rcu_sched. Since we testing what vmalloc_to_phys()
returns in the code, this also adds a check for already existing
vmalloc_to_phys() call in kvmppc_rm_h_put_tce_indirect().

This finally makes use of vfio_external_user_iommu_id() which was
introduced quite some time ago and was considered for removal.

Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:39:26 +10:00
Alexey Kardashevskiy
b1af23d836 KVM: PPC: iommu: Unify TCE checking
This reworks helpers for checking TCE update parameters in way they
can be used in KVM.

This should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:39:21 +10:00
Alexey Kardashevskiy
503bfcbe18 KVM: PPC: Pass kvm* to kvmppc_find_table()
The guest view TCE tables are per KVM anyway (not per VCPU) so pass kvm*
there. This will be used in the following patches where we will be
attaching VFIO containers to LIOBNs via ioctl() to KVM (rather than
to VCPU).

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:39:12 +10:00
Paul Mackerras
644d2d6fef Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in the commits in the topic/ppc-kvm branch of the powerpc
tree to get the changes to arch/powerpc which subsequent patches will
rely on.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:38:33 +10:00
Alexey Kardashevskiy
96df226769 KVM: PPC: Book3S PR: Preserve storage control bits
PR KVM page fault handler performs eaddr to pte translation for a guest,
however kvmppc_mmu_book3s_64_xlate() does not preserve WIMG bits
(storage control) in the kvmppc_pte struct. If PR KVM is running as
a second level guest under HV KVM, and PR KVM tries inserting HPT entry,
this fails in HV KVM if it already has this mapping.

This preserves WIMG bits between kvmppc_mmu_book3s_64_xlate() and
kvmppc_mmu_map_page().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:38:14 +10:00
Paul Mackerras
9b5ab00513 KVM: PPC: Add MMIO emulation for remaining floating-point instructions
For completeness, this adds emulation of the lfiwax and lfiwzx
instructions.  With this, all floating-point load and store instructions
as of Power ISA V2.07 are emulated.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:37:44 +10:00
Paul Mackerras
ceba57df43 KVM: PPC: Emulation for more integer loads and stores
This adds emulation for the following integer loads and stores,
thus enabling them to be used in a guest for accessing emulated
MMIO locations.

- lhaux
- lwaux
- lwzux
- ldu
- lwa
- stdux
- stwux
- stdu
- ldbrx
- stdbrx

Previously, most of these would cause an emulation failure exit to
userspace, though ldu and lwa got treated incorrectly as ld, and
stdu got treated incorrectly as std.

This also tidies up some of the formatting and updates the comment
listing instructions that still need to be implemented.

With this, all integer loads and stores that are defined in the Power
ISA v2.07 are emulated, except for those that are permitted to trap
when used on cache-inhibited or write-through mappings (and which do
in fact trap on POWER8), that is, lmw/stmw, lswi/stswi, lswx/stswx,
lq/stq, and l[bhwdq]arx/st[bhwdq]cx.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:37:38 +10:00
Alexey Kardashevskiy
91242fd1a3 KVM: PPC: Add MMIO emulation for stdx (store doubleword indexed)
This adds missing stdx emulation for emulated MMIO accesses by KVM
guests.  This allows the Mellanox mlx5_core driver from recent kernels
to work when MMIO emulation is enforced by userspace.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:37:33 +10:00
Bin Lu
6f63e81bda KVM: PPC: Book3S: Add MMIO emulation for FP and VSX instructions
This patch provides the MMIO load/store emulation for instructions
of 'double & vector unsigned char & vector signed char & vector
unsigned short & vector signed short & vector unsigned int & vector
signed int & vector double '.

The instructions that this adds emulation for are:

- ldx, ldux, lwax,
- lfs, lfsx, lfsu, lfsux, lfd, lfdx, lfdu, lfdux,
- stfs, stfsx, stfsu, stfsux, stfd, stfdx, stfdu, stfdux, stfiwx,
- lxsdx, lxsspx, lxsiwax, lxsiwzx, lxvd2x, lxvw4x, lxvdsx,
- stxsdx, stxsspx, stxsiwx, stxvd2x, stxvw4x

[paulus@ozlabs.org - some cleanups, fixes and rework, make it
 compile for Book E, fix build when PR KVM is built in]

Signed-off-by: Bin Lu <lblulb@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 11:36:41 +10:00
Paul Mackerras
307d927967 KVM: PPC: Provide functions for queueing up FP/VEC/VSX unavailable interrupts
This provides functions that can be used for generating interrupts
indicating that a given functional unit (floating point, vector, or
VSX) is unavailable.  These functions will be used in instruction
emulation code.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-04-20 10:39:50 +10:00
Yongji Xie
3827463769 powerpc/powernv: Override pcibios_default_alignment() to force PCI devices to be page aligned
Override pcibios_default_alignment() to set default alignment to PAGE_SIZE
for all PCI devices on PowerNV platform.  Thus sub-page BARs would not
share a page and could be mapped into guest when VFIO passthrough them.

Signed-off-by: Yongji Xie <elohimes@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-04-19 12:51:26 -05:00
Nicholas Piggin
ca80d5d0a8 powerpc/64s: Remove SAO feature from Power9 DD1
Power9 DD1 does not implement SAO. Although it's not widely used, its presence
or absence is visible to user space via arch_validate_prot() so it's moderately
important that we get the value right.

Fixes: 7dccfbc325 ("powerpc/book3s: Add a cpu table entry for different POWER9 revs")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-19 20:48:25 +10:00
Nicholas Piggin
2384d2d7ad powerpc/64s: Remove ICSWX feature from Power9
Power9 does not implement the icswx instruction. This CPU feature is not visible
to userspace and is only used in the CONFIG_PPC_ICSWX code, which is generally
not enabled, and can only be triggered by other code using icswx, which should
not happen on Power9 systems in the first place. So impact should be minimal.

Fixes: c3ab300ea5 ("powerpc: Add POWER9 cputable entry")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-19 20:21:50 +10:00
Madhavan Srinivasan
170a315f41 powerpc/perf: Support to export MMCRA[TEC*] field to userspace
Threshold feature when used with MMCRA [Threshold Event Counter Event],
MMCRA[Threshold Start event] and MMCRA[Threshold End event] will update
MMCRA[Threashold Event Counter Exponent] and MMCRA[Threshold Event
Counter Multiplier] with the corresponding threshold event count values.
Patch to export MMCRA[TECX/TECM] to userspace in 'weight' field of
struct perf_sample_data.

Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-19 20:00:22 +10:00
Madhavan Srinivasan
79e96f8f93 powerpc/perf: Export memory hierarchy info to user space
The LDST field and DATA_SRC in SIER identifies the memory hierarchy level
(eg: L1, L2 etc), from which a data-cache miss for a marked instruction
was satisfied. Use the 'perf_mem_data_src' object to export this
hierarchy level to user space.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-19 20:00:21 +10:00
David Woodhouse
e854d8b2a8 PCI: Add arch_can_pci_mmap_io() on architectures which can mmap() I/O space
This is relatively esoteric, and knowing that we don't have it makes life
easier in some cases rather than just an eventual -EINVAL from
pci_mmap_page_range().

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-04-18 13:02:26 -05:00
David Woodhouse
11df19546f PCI: Move multiple declarations of pci_mmap_page_range() to <linux/pci.h>
We can declare it <linux/pci.h> even on platforms where it isn't going to
be defined.  There's no need to have it littered through the various
<asm/pci.h> files.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-04-18 13:02:11 -05:00
David Woodhouse
ae749c7ab4 PCI: Add arch_can_pci_mmap_wc() macro
Most of the almost-identical versions of pci_mmap_page_range() silently
ignore the 'write_combine' argument and give uncached mappings.

Yet we allow the PCIIOC_WRITE_COMBINE ioctl in /proc/bus/pci, expose the
'resourceX_wc' file in sysfs, and allow an attempted mapping to apparently
succeed.

To fix this, introduce a macro arch_can_pci_mmap_wc() which indicates
whether the platform can do a write-combining mapping.  On x86 this ends up
being pat_enabled(), while the few other platforms that support it can just
set it to a literal '1'.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-04-18 13:01:42 -05:00
Michael Ellerman
be5c5e843c powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
Prior to commit 2337d20728 ("powerpc/64: CONFIG_RELOCATABLE support for hmi
interrupts"), the branch from hmi_exception_early() to hmi_exception_realmode()
was just a bl hmi_exception_realmode, which the linker would turn into a bl to
the local entry point of hmi_exception_realmode. This was broken when
CONFIG_RELOCATABLE=y because hmi_exception_realmode() is not in the low part of
the kernel text that is copied down to 0x0.

But in fixing that, we added a new bug on little endian kernels. Because the
branch is now a bctrl when CONFIG_RELOCATABLE=y, we branch to the global entry
point of hmi_exception_realmode(). The global entry point must be called with
r12 containing the address of hmi_exception_realmode(), because it uses that
value to calculate the TOC value (r2).

This may manifest as a checkstop, because we take a junk value from r12 which
came from HSRR1, add a small constant to it and then use that as the TOC
pointer. The HSRR1 value will have 0x9 as the top nibble, which puts it above
RAM and somewhere in MMIO space.

Fix it by changing the BRANCH_LINK_TO_FAR() macro to always use r12 to load the
label we're branching to. This means r12 will be setup correctly on LE, fixing
this bug, and r12 is also volatile across function calls on BE so it's a good
choice anyway.

Fixes: 2337d20728 ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts")
Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-18 20:19:52 +10:00
Michael Ellerman
590c369e7e powerpc: Drop include of linux/io.h from asm/io.h
Currently powerpc's asm/io.h includes linux/io.h, and linux/io.h
includes asm/io.h.

This can cause problems because depending on which is included first the
order of definitions between the two files will change.

The include of linux/io.h was added back in 2008 in commit b41e5fffe8
("[POWERPC] devres: Add devm_ioremap_prot()"). It's not entirely clear
it was needed then, but devm_ioremap_prot() has since been removed
entirely as unused, in dedd24a12f ("powerpc: Remove unused
devm_ioremap_prot()").

So it seems to be unnecessary and can potentially cause problems, so
remove the include of linux/io.h from asm/io.h

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-13 23:34:35 +10:00
Nicholas Piggin
6b3edefefa powerpc/powernv: POWER9 support for msgsnd/doorbell IPI
POWER9 requires msgsync for receiver-side synchronization, and a DD1
workaround restricts IPIs to core-local.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop no longer needed asm feature macro changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-13 23:34:34 +10:00
Nicholas Piggin
a5adf28246 powerpc/64s: Avoid a branch for ppc_msgsnd
IPIs are a pretty hot path and we already have the ability to do asm feature
patching, so use it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Change log detail]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-13 23:34:34 +10:00
Nicholas Piggin
b87ac02183 powerpc: Introduce msgsnd/doorbell barrier primitives
POWER9 changes requirements and adds new instructions for
synchronization.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-13 23:34:33 +10:00
Nicholas Piggin
b866cc2199 powerpc: Change the doorbell IPI calling convention
Change the doorbell callers to know about their msgsnd addressing,
rather than have them set a per-cpu target data tag at boot that gets
sent to the cause_ipi functions. The data is only used for doorbell IPI
functions, no other IPI types, so it makes sense to keep that detail
local to doorbell.

Have the platform code understand doorbell IPIs, rather than the
interrupt controller code understand them. Platform code can look at
capabilities it has available and decide which to use.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-13 23:34:33 +10:00
Nicholas Piggin
9b7ff0c658 powerpc/64s: Add SCV FSCR bit for ISA v3.0
Add the bit definition and use it in facility_unavailable_exception() so we can
intelligently report the cause if we take a fault for SCV. This doesn't actually
enable SCV.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Drop whitespace changes to the existing entries, flush out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-13 23:34:32 +10:00
Balbir Singh
9c355917fc powerpc/tracing: Allow tracing of mmap syscalls
Currently sys_mmap() and sys_mmap2() (32-bit only), are not visible to the
syscall tracing machinery. This means users are not able to see the execution of
mmap() syscalls using the syscall tracer.

Fix that by using SYSCALL_DEFINE6 for sys_mmap() and sys_mmap2() so that the
meta-data associated with these syscalls is visible to the syscall tracer.

A side-effect of this change is that the return type has changed from unsigned
long to long. However this should have no effect, the only code in the kernel
which uses the result of these syscalls is in the syscall return path, which is
written in asm and treats the result as unsigned regardless.

Example output:
  cat-3399  [001] ....   196.542410: sys_mmap(addr: 7fff922a0000, len: 20000, prot: 3, flags: 812, fd: 3, offset: 1b0000)
  cat-3399  [001] ....   196.542443: sys_mmap -> 0x7fff922a0000
  cat-3399  [001] ....   196.542668: sys_munmap(addr: 7fff922c0000, len: 6d2c)
  cat-3399  [001] ....   196.542677: sys_munmap -> 0x0

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Massage change log, add detail on return type change]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-12 22:32:43 +10:00
Michael Ellerman
03dfee6d5f powerpc/mm: Fix swapper_pg_dir size on 64-bit hash w/64K pages
Recently in commit f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB"),
we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This
makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to
calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong.

The end result is that the PGD (Page Global Directory, ie top level page table)
of the kernel (aka. swapper_pg_dir), is too small.

This generally doesn't lead to a crash, as we don't use the full range in normal
operation. However if we try to dump the kernel pagetables we can trigger a
crash because we walk off the end of the pgd into other memory and eventually
try to dereference something bogus:

  $ cat /sys/kernel/debug/kernel_pagetables
  Unable to handle kernel paging request for data at address 0xe8fece0000000000
  Faulting instruction address: 0xc000000000072314
  cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890]
      pc: c000000000072314: ptdump_show+0x164/0x430
      lr: c000000000072550: ptdump_show+0x3a0/0x430
     dar: e802cf0000000000
  seq_read+0xf8/0x560
  full_proxy_read+0x84/0xc0
  __vfs_read+0x6c/0x1d0
  vfs_read+0xbc/0x1b0
  SyS_read+0x6c/0x110
  system_call+0x38/0xfc

The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be
the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move
the calculation into asm-offsets.c where we can do it easily using
max().

Fixes: f6eedbba7a ("powerpc/mm/hash: Increase VA range to 128TB")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-12 22:32:43 +10:00
Michael Ellerman
3c19d5ada1 Merge branch 'topic/xive' (early part) into next
This merges the arch part of the XIVE support, leaving the final commit
with the KVM specific pieces dangling on the branch for Paul to merge
via the kvm-ppc tree.
2017-04-12 22:31:37 +10:00
Gautham R. Shenoy
17ed4c8f81 powerpc/powernv: Recover correct PACA on wakeup from a stop on P9 DD1
POWER9 DD1.0 hardware has a bug where the SPRs of a thread waking up
from stop 0,1,2 with ESL=1 can endup being misplaced in the core. Thus
the HSPRG0 of a thread waking up from can contain the paca pointer of
its sibling.

This patch implements a context recovery framework within threads of a
core, by provisioning space in paca_struct for saving every sibling
threads's paca pointers. Basically, we should be able to arrive at the
right paca pointer from any of the thread's existing paca pointer.

At bootup, during powernv idle-init, we save the paca address of every
CPU in each one its siblings paca_struct in the slot corresponding to
this CPU's index in the core.

On wakeup from a stop, the thread will determine its index in the core
from the TIR register and recover its PACA pointer by indexing into
the correct slot in the provisioned space in the current PACA.

Furthermore, ensure that the NVGPRs are restored from the stack on the
way out by setting the NAPSTATELOST in paca.

[Changelog written with inputs from svaidy@linux.vnet.ibm.com]
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Call it a bug]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11 08:45:09 +10:00
Gautham R. Shenoy
a7cd88da97 powerpc/powernv: Move CPU-Offline idle state invocation from smp.c to idle.c
Move the piece of code in powernv/smp.c::pnv_smp_cpu_kill_self() which
transitions the CPU to the deepest available platform idle state to a
new function named pnv_cpu_offline() in powernv/idle.c. The rationale
behind this code movement is that the data required to determine the
deepest available platform state resides in powernv/idle.c.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11 08:45:09 +10:00
Anshuman Khandual
2c9faa7675 powerpc/hugetlb: Add ABI defines for supported HugeTLB page sizes
Add user space exported API definitions for 512KB, 1MB, 2MB, 8MB, 16MB,
1GB, 16GB non default huge page sizes to be used with mmap() system
call.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Reword the comment to emphasise that these are only needed to use
 the non-default huge page size, and updated the change log.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11 07:46:06 +10:00
Michael Ellerman
7644d5819c powerpc: Create asm/debugfs.h and move powerpc_debugfs_root there
powerpc_debugfs_root is the dentry representing the root of the
"powerpc" directory tree in debugfs.

Currently it sits in asm/debug.h, a long with some other things that
have "debug" in the name, but are otherwise unrelated.

Pull it out into a separate header, which also includes linux/debugfs.h,
and convert all the users to include debugfs.h instead of debug.h.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-11 07:46:03 +10:00
Benjamin Herrenschmidt
08a1e650cc powerpc: Fixup LPCR:PECE and HEIC setting on POWER9
We need to set LPES in order for normal external interrupts (0x500)
to be directed to the guest while running in guest state.

We also need HEIC set to prevent them to be sent to the host while
in host state.

With XIVE the host never gets one of these and wouldn't know how to
handle it. All host external interrupts come in via the new
hypervisor virtualization interrupts vector.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10 21:43:17 +10:00
Benjamin Herrenschmidt
d381d7caf8 powerpc: Consolidate variants of real-mode MMIOs
We have all sort of variants of MMIO accessors for the real mode
instructions. This creates a clean set of accessors based on
Linux normal naming conventions, replacing all occurrences of
the old ones in the tree.

I have purposefully removed the "out/in" variants in favor of
only including __raw variants. Any code using these is already
pretty much hand tuned to operate in a very specific environment.
I've fixed up the 2 users (only one of them actually needed
a barrier in the first place).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10 21:43:16 +10:00
Benjamin Herrenschmidt
f50d6bd344 powerpc/kvm: Remove obsolete kvm_vm_ioctl_xics_irq declaration
The function doesn't exist anymore

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10 21:43:16 +10:00
Benjamin Herrenschmidt
936774cd3f powerpc/kvm: Make kvmppc_xics_create_icp static
It's only used within the same file it's defined

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10 21:43:15 +10:00
Benjamin Herrenschmidt
243e25112d powerpc/xive: Native exploitation of the XIVE interrupt controller
The XIVE interrupt controller is the new interrupt controller
found in POWER9. It supports advanced virtualization capabilities
among other things.

Currently we use a set of firmware calls that simulate the old
"XICS" interrupt controller but this is fairly inefficient.

This adds the framework for using XIVE along with a native
backend which OPAL for configuration. Later, a backend allowing
the use in a KVM or PowerVM guest will also be provided.

This disables some fast path for interrupts in KVM when XIVE is
enabled as these rely on the firmware emulation code which is no
longer available when the XIVE is used natively by Linux.

A latter patch will make KVM also directly exploit the XIVE, thus
recovering the lost performance (and more).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Fixup pr_xxx("XIVE:"...), don't split pr_xxx() strings,
 tweak Kconfig so XIVE_NATIVE selects XIVE and depends on POWERNV,
 fix build errors when SMP=n, fold in fixes from Ben:
   Don't call cpu_online() on an invalid CPU number
   Fix irq target selection returning out of bounds cpu#
   Extra sanity checks on cpu numbers
 ]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-10 21:41:34 +10:00
Chenbo Feng
5daab9db7b New getsockopt option to get socket cookie
Introduce a new getsockopt operation to retrieve the socket cookie
for a specific socket based on the socket fd.  It returns a unique
non-decreasing cookie for each socket.
Tested: https://android-review.googlesource.com/#/c/358163/

Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-08 08:07:01 -07:00
Paolo Bonzini
4b4357e025 kvm: make KVM_COALESCED_MMIO_PAGE_OFFSET public
Its value has never changed; we might as well make it part of the ABI instead
of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO).

Because PPC does not always make MMIO available, the code has to be made
dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-04-07 16:49:01 +02:00
Benjamin Herrenschmidt
a978e13965 powerpc/smp: Remove migrate_irq() custom implementation
Some powerpc platforms use this to move IRQs away from a CPU being
unplugged. This function has several bugs such as not taking the right
locks or failing to NULL check pointers.

There's a new generic function doing exactly the same thing without all
the bugs, so let's use it instead.

mpe: The obvious place for the select of GENERIC_IRQ_MIGRATION is on
HOTPLUG_CPU, but that doesn't work. On some configs PM_SLEEP_SMP will
select HOTPLUG_CPU even though its dependencies are not met, which means
the select of GENERIC_IRQ_MIGRATION doesn't happen. That leads to the
build breaking. Fix it by moving the select of GENERIC_IRQ_MIGRATION to
SMP.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-07 12:01:27 +10:00
Al Viro
fccfb99508 Merge commit 'b4fb8f66f1ae2e167d06c12d018025a8d4d3ba7e' into uaccess.ia64
backmerge of mainline ia64 fix
2017-04-06 19:35:03 -04:00
Al Viro
3448890c32 powerpc: get rid of zeroing, switch to RAW_COPY_USER
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-06 15:08:42 -04:00
Al Viro
f2ed8bebee Merge commit 'a7d2475af7aedcb9b5c6343989a8bfadbf84429b' into uaccess.powerpc
backmerge of sorting the arch/powerpc/Kconfig
2017-04-06 15:08:10 -04:00
Benjamin Herrenschmidt
14d4ae5c4c powerpc: Add optional smp_ops->prepare_cpu SMP callback
Some platforms (will) need to perform allocations before bringing
a new CPU online. Doing it from smp_ops->setup_cpu is the wrong
thing to do:

 - It has no useful failure path (too late)
 - Calling any allocator will enable interrupts prematurely
   causing problems with large decrementer among others

Instead, add a new callback that is called from __cpu_up (so from
the context trying to online the new CPU) at a point where we
can safely allocate and handle failures.

This will be used by XIVE support.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-06 19:58:53 +10:00
Benjamin Herrenschmidt
22bd64a621 powerpc: Add more PPC bit conversion macros
Add 32 and 8 bit variants

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-06 19:58:53 +10:00
Benjamin Herrenschmidt
eeea1a434d powerpc/powernv: Add XIVE related definitions to opal-api.h
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-06 19:58:46 +10:00
Al Viro
054838bc01 Merge commit 'fc69910f329d' into uaccess.mips
backmerge of a build fix from mainline
2017-04-06 02:07:33 -04:00
Alistair Popple
1ab66d1fba powerpc/powernv: Introduce address translation services for Nvlink2
Nvlink2 supports address translation services (ATS) allowing devices
to request address translations from an mmu known as the nest MMU
which is setup to walk the CPU page tables.

To access this functionality certain firmware calls are required to
setup and manage hardware context tables in the nvlink processing unit
(NPU). The NPU also manages forwarding of TLB invalidates (known as
address translation shootdowns/ATSDs) to attached devices.

This patch exports several methods to allow device drivers to register
a process id (PASID/PID) in the hardware tables and to receive
notification of when a device should stop issuing address translation
requests (ATRs). It also adds a fault handler to allow device drivers
to demand fault pages in.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
[mpe: Fix up comment formatting, use flush_tlb_mm()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-04 13:27:26 +10:00
Ingo Molnar
7f75540ff2 Linux 4.11-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJY4ZYkAAoJEHm+PkMAQRiGsq4H/R4PMXDoe2XhSSk7IoT97pXV
 /A8np/scAPjzEgYUidbb54OSqWwsPRuPGWONTFeSrE2u0L4wln/REI91jg7QetLq
 IisncExlYeJ/XQ+iO0ZZh9fLbqwIlEJFdSXmyIFr3m/TBxe8a61C8j93oNgM1tHT
 yuwzlq7c3sLq2hsmUG2HyL2kJsEfRasv4Rk0yhFuti12zVsBoTW4qmZuMauq+gdf
 f7cSYgiHhPTdb2o+azg5O7uYNHaQQBxdUMlIuhhYtVOUq+pFDO23SLHSFIW2NwOm
 Zn5R6CFSrLsCw0Bx0v8Xlc151QUbaRK4h9lhUhkBr6d3uNShU1NQ9JojpSvYwBo=
 =vP6E
 -----END PGP SIGNATURE-----

Merge tag 'v4.11-rc5' into x86/mm, to refresh the branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-04-03 16:36:32 +02:00
Michael Ellerman
63f44d6514 powerpc/book3s: Print task info if we take a machine check in user mode
For an MCE (Machine Check Exception) that hits while in user mode
MSR(PR=1), print the task info to the console MCE error log. This may
help to identify an application that triggered the MCE.

After this patch the MCE console looks like:

  Severe Machine check interrupt [Recovered]
    NIP: [0000000010039778] PID: 762 Comm: ebizzy
    Initiator: CPU
    Error type: SLB [Multihit]
      Effective address: 0000000010039778

  Severe Machine check interrupt [Not recovered]
    NIP: [0000000010039778] PID: 763 Comm: ebizzy
    Initiator: CPU
    Error type: UE [Page table walk ifetch]
      Effective address: 0000000010039778
  ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-03 16:12:00 +10:00
Al Viro
bee3f412d6 Merge branch 'parisc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux into uaccess.parisc 2017-04-02 10:33:48 -04:00
Aneesh Kumar K.V
f4ea6dcb08 powerpc/mm: Enable mappings above 128TB
Not all user space application is ready to handle wide addresses. It's
known that at least some JIT compilers use higher bits in pointers to
encode their information. It collides with valid pointers with 512TB
addresses and leads to crashes.

To mitigate this, we are not going to allocate virtual address space
above 128TB by default.

But userspace can ask for allocation from full address space by
specifying hint address (with or without MAP_FIXED) above 128TB.

If hint address set above 128TB, but MAP_FIXED is not specified, we try
to look for unmapped area by specified address. If it's already
occupied, we look for unmapped area in *full* address space, rather than
from 128TB window.

This approach helps to easily make application's memory allocator aware
about large address space without manually tracking allocated virtual
address space.

This is going to be a per mmap decision. ie, we can have some mmaps with
larger addresses and other that do not.

A sample memory layout looks like:

  10000000-10010000 r-xp 00000000 fc:00 9057045          /home/max_addr_512TB
  10010000-10020000 r--p 00000000 fc:00 9057045          /home/max_addr_512TB
  10020000-10030000 rw-p 00010000 fc:00 9057045          /home/max_addr_512TB
  10029630000-10029660000 rw-p 00000000 00:00 0          [heap]
  7fff834a0000-7fff834b0000 rw-p 00000000 00:00 0
  7fff834b0000-7fff83670000 r-xp 00000000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
  7fff83670000-7fff83680000 r--p 001b0000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
  7fff83680000-7fff83690000 rw-p 001c0000 fc:00 9177190  /lib/powerpc64le-linux-gnu/libc-2.23.so
  7fff83690000-7fff836a0000 rw-p 00000000 00:00 0
  7fff836a0000-7fff836c0000 r-xp 00000000 00:00 0        [vdso]
  7fff836c0000-7fff83700000 r-xp 00000000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
  7fff83700000-7fff83710000 r--p 00030000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
  7fff83710000-7fff83720000 rw-p 00040000 fc:00 9177193  /lib/powerpc64le-linux-gnu/ld-2.23.so
  7fffdccf0000-7fffdcd20000 rw-p 00000000 00:00 0        [stack]
  1000000000000-1000000010000 rw-p 00000000 00:00 0
  1ffff83710000-1ffff83720000 rw-p 00000000 00:00 0

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-01 21:12:29 +11:00
Aneesh Kumar K.V
82228e362f powerpc/pseries: Skip using reserved virtual address range
Now that we use all the available virtual address range, we need to make
sure we don't generate VSID such that it overlaps with the reserved vsid
range. Reserved vsid range include the virtual address range used by the
adjunct partition and also the VRMA virtual segment. We find the context
value that can result in generating such a VSID and reserve it early in
boot.

We don't look at the adjunct range, because for now we disable the
adjunct usage in a Linux LPAR via CAS interface.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rewrite hash__reserve_context_id(), move the rest into pseries]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-01 21:12:27 +11:00
Aneesh Kumar K.V
bb1832217a powerpc/mm/hash: Store addr_limit in PACA
We optmize the slice page size array copy to paca by copying only the
range based on addr_limit. This will require us to not look at page size
array beyond addr_limit in PACA on slb fault. To enable that copy task
size to paca which will be used during slb fault.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Rename from task_size to addr_limit, consolidate #ifdefs]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-01 21:12:27 +11:00
Aneesh Kumar K.V
957b778a16 powerpc/mm: Add addr_limit to mm_context and use it to derive max slice index
In the followup patch, we will increase the slice array size to handle
512TB range, but will limit the max addr to 128TB. Avoid doing
unnecessary computation and avoid doing slice mask related operation
above address limit.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-01 21:12:20 +11:00
Aneesh Kumar K.V
f6eedbba7a powerpc/mm/hash: Increase VA range to 128TB
We update the hash linux page table layout such that we can support
512TB. But we limit the TASK_SIZE to 128TB. We can switch to 128TB by
default without conditional because that is the max virtual address
supported by other architectures. We will later add a mechanism to
on-demand increase the application's effective address range to 512TB.

Having the page table layout changed to accommodate 512TB makes testing
large memory configuration easier with less code changes to kernel

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:10:01 +11:00
Aneesh Kumar K.V
59248aecc4 powerpc/mm/hash: Convert mask to unsigned long
This doesn't have any functional change. But helps in avoiding mistakes
in case the shift bit changes

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:10:00 +11:00
Aneesh Kumar K.V
e6f81a9201 powerpc/mm/hash: Support 68 bit VA
Inorder to support large effective address range (512TB), we want to
increase the virtual address bits to 68. But we do have platforms like
p4 and p5 that can only do 65 bit VA. We support those platforms by
limiting context bits on them to 16.

The protovsid -> vsid conversion is verified to work with both 65 and 68
bit va values. I also documented the restrictions in a table format as
part of code comments.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:10:00 +11:00
Michael Ellerman
85beb1c486 powerpc/mm/hash: Check for non-kernel address in get_kernel_vsid()
get_kernel_vsid() has a very stern comment saying that it's only valid
for kernel addresses, but there's nothing in the code to enforce that.

Rather than hoping our callers are well behaved, add a check and return
a VSID of 0 (invalid).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:59 +11:00
Aneesh Kumar K.V
941711a3a0 powerpc/mm/hash: Use context ids 1-4 for the kernel
Currently we use the top 4 context ids (0x7fffc-0x7ffff) for the kernel.
Kernel VSIDs are built using these top context values and effective the
segement ID. In subsequent patches we want to increase the max effective
address to 512TB. We will achieve that by increasing the effective
segment IDs there by increasing virtual address range.

We will be switching to a 68bit virtual address in the following patch.
But platforms like Power4 and Power5 only support a 65 bit virtual
address. We will handle that by limiting the context bits to 16 instead
of 19 on those platforms. That means the max context id will have a
different value on different platforms.

So that we don't have to deal with the kernel context ids changing
between different platforms, move the kernel context ids down to use
context ids 1-4.

We can't use segment 0 of context-id 0, because that maps to VSID 0,
which we want to keep as invalid, so we avoid context-id 0 entirely.
Similarly we can't use the last segment of the maximum context, so we
avoid it too.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Switch from 0-3 to 1-4 so VSID=0 remains invalid]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:59 +11:00
Michael Ellerman
760573c1a9 powerpc/mm: Split radix vs hash mm context initialisation
Complete the split of the radix vs hash mm context initialisation.

This is mostly code movement, with the exception that we now limit the
context allocation to PRTB_ENTRIES - 1 on radix.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:58 +11:00
Michael Ellerman
a336f2f5b0 powerpc/mm/hash: Abstract context id allocation for KVM
KVM wants to be able to allocate an MMU context id, which it does
currently by calling __init_new_context().

We're about to rework that code, so provide a wrapper for KVM so it
can not worry about the details.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:57 +11:00
Aneesh Kumar K.V
82185222ff powerpc/mm/slice: Move slice_mask struct definition to slice.c
This structure definition need not be in a header since this is used only by
slice.c file. So move it to slice.c. This also allow us to use SLICE_NUM_HIGH
instead of 64.

I also switch the low_slices type to u64 from u16. This doesn't have an impact
on size of struct due to padding added with u16 type. This helps in using
bitmap printing function for printing slice mask.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:56 +11:00
Aneesh Kumar K.V
52b1e66587 powerpc/mm: Move copy_mm_to_paca to paca.c
We also update the function arg to struct mm_struct. Move this so that function
finds the definition of struct mm_struct. No functional change in this patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:54 +11:00
Aneesh Kumar K.V
f3207c124e powerpc/mm/slice: Convert slice_mask high slice to a bitmap
In followup patch we want to increase the va range which will result
in us requiring high_slices to have more than 64 bits. To enable this
convert high_slices to bitmap. We keep the number bits same in this patch
and later change that to higher value

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Fold in fix to use bitmap_empty()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:53 +11:00
Aneesh Kumar K.V
6aa59f5162 powerpc/mm: Move hash specific pte bits to be top bits of RPN
We don't support the full 57 bits of physical address and hence can
overload the top bits of RPN as hash specific pte bits.

Add a BUILD_BUG_ON() to enforce the relationship between H_PAGE_F_SECOND
and H_PAGE_F_GIX.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Move the BUILD_BUG_ON() into hash_utils_64.c and comment it]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:52 +11:00
Aneesh Kumar K.V
2f18d53375 powerpc/mm: Lower the max real address to 53 bits
Max value supported by hardware is 51 bits address. Radix page table define
a slot of 57 bits for future expansion. We restrict the value supported in
linux kernel 53 bits, so that we can use the bits between 57-53 for storing
hash linux page table bits. This is done in the next patch.

This will free up the software page table bits to be used for features
that are needed for both hash and radix. The current hash linux page table
format doesn't have any free software bits. Moving hash linux page table
specific bits to top of RPN field free up the software bits for other purpose.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:52 +11:00
Aneesh Kumar K.V
32789d3821 powerpc/mm: Define all PTE bits based on radix definitions.
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:51 +11:00
Aneesh Kumar K.V
54c4025efc powerpc/mm: Define _PAGE_SOFT_DIRTY unconditionally
Conditional PTE bit definition is confusing and results in coding error.

Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:51 +11:00
Aneesh Kumar K.V
ddb014b68d powerpc/mm/radix: rename _PAGE_LARGE to R_PAGE_LARGE
This bit is only used by radix and it is nice to follow the naming style of having
bit name start with H_/R_ depending on which translation mode they are used.

No functional change in this patch.

Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:49 +11:00
Aneesh Kumar K.V
f5bd0fdc0c powerpc/mm: Cleanup bits definition between hash and radix.
Define everything based on bits present in pgtable.h. This will help in easily
identifying overlapping bits between hash/radix.

No functional change with this patch.

Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:48 +11:00
Aneesh Kumar K.V
b42279f016 powerpc/mm/nohash: MM_SLICE is only used by book3s 64
BOOKE code is dead code as per the Kconfig details. So make it simpler
by enabling MM_SLICE only for book3s_64. The changes w.r.t nohash is just
removing deadcode. W.r.t ppc64, 4k without hugetlb will now enable MM_SLICE.
But that is good, because we reduce one extra variant which probably is not
getting tested much.

Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-31 23:09:47 +11:00
Alexey Kardashevskiy
e5afdf9dd5 powerpc/vfio_spapr_tce: Add reference counting to iommu_table
So far iommu_table obejcts were only used in virtual mode and had
a single owner. We are going to change this by implementing in-kernel
acceleration of DMA mapping requests. The proposed acceleration
will handle requests in real mode and KVM will keep references to tables.

This adds a kref to iommu_table and defines new helpers to update it.
This replaces iommu_free_table() with iommu_tce_table_put() and makes
iommu_free_table() static. iommu_tce_table_get() is not used in this patch
but it will be in the following patch.

Since this touches prototypes, this also removes @node_name parameter as
it has never been really useful on powernv and carrying it for
the pseries platform code to iommu_free_table() seems to be quite
useless as well.

This should cause no behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-30 21:42:11 +11:00
Alexey Kardashevskiy
a540aa56ba powerpc/powernv/iommu: Add real mode version of iommu_table_ops::exchange()
In real mode, TCE tables are invalidated using special
cache-inhibited store instructions which are not available in
virtual mode

This defines and implements exchange_rm() callback. This does not
define set_rm/clear_rm/flush_rm callbacks as there is no user for those -
exchange/exchange_rm are only to be used by KVM for VFIO.

The exchange_rm callback is defined for IODA1/IODA2 powernv platforms.

This replaces list_for_each_entry_rcu with its lockless version as
from now on pnv_pci_ioda2_tce_invalidate() can be called in
the real mode too.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-30 21:42:01 +11:00
Peter Zijlstra
19d436268d debug: Add _ONCE() logic to report_bug()
Josh suggested moving the _ONCE logic inside the trap handler, using a
bit in the bug_entry::flags field, avoiding the need for the extra
variable.

Sadly this only works for WARN_ON_ONCE(), since the others have
printk() statements prior to triggering the trap.

Still, this saves a fair amount of text and some data:

  text         data       filename
  10682460     4530992    defconfig-build/vmlinux.orig
  10665111     4530096    defconfig-build/vmlinux.patched

Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-30 09:37:20 +02:00
Al Viro
527b5baead powerpc: switch to extable.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-28 18:23:54 -04:00
Alexey Kardashevskiy
6b5c19c552 powerpc/mmu: Add real mode support for IOMMU preregistered memory
This makes mm_iommu_lookup() able to work in realmode by replacing
list_for_each_entry_rcu() (which can do debug stuff which can fail in
real mode) with list_for_each_entry_lockless().

This adds realmode version of mm_iommu_ua_to_hpa() which adds
explicit vmalloc'd-to-linear address conversion.
Unlike mm_iommu_ua_to_hpa(), mm_iommu_ua_to_hpa_rm() can fail.

This changes mm_iommu_preregistered() to receive @mm as in real mode
@current does not always have a correct pointer.

This adds realmode version of mm_iommu_lookup() which receives @mm
(for the same reason as for mm_iommu_preregistered()) and uses
lockless version of list_for_each_entry_rcu().

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-27 15:25:48 +11:00
Sridhar Samudrala
6d4339028b net: Introduce SO_INCOMING_NAPI_ID
This socket option returns the NAPI ID associated with the queue on which
the last frame is received. This information can be used by the apps to
split the incoming flows among the threads based on the Rx queue on which
they are received.

If the NAPI ID actually represents a sender_cpu then the value is ignored
and 0 is returned.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-24 20:49:31 -07:00
David S. Miller
16ae1f2236 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/genet/bcmmii.c
	drivers/net/hyperv/netvsc.c
	kernel/bpf/hashtab.c

Almost entirely overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-23 16:41:27 -07:00
Josh Hunt
a2d133b1d4 sock: introduce SO_MEMINFO getsockopt
Allows reading of SK_MEMINFO_VARS via socket option. This way an
application can get all meminfo related information in single socket
option call instead of multiple calls.

Adds helper function, sk_get_meminfo(), and uses that for both
getsockopt and sock_diag_put_meminfo().

Suggested by Eric Dumazet.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-22 11:18:58 -07:00
Nicholas Piggin
58c8d17f2e powerpc/64s: Move POWER machine check defines into mce_power.c
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-21 22:09:29 +11:00
Ben Hutchings
43a8888f0a powerpc: Fix missing CRCs, add more asm-prototypes.h declarations
Add declarations for:
  - __mfdcr, __mtdcr (if CONFIG_PPC_DCR_NATIVE=y; through <asm/dcr.h>)
  - switch_mmu_context (if CONFIG_PPC_BOOK3S_64=n; through <asm/mmu_context.h>)

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-21 22:09:26 +11:00
Tobin C. Harding
b3a7864c6f powerpc/ftrace: Add prototype for prepare_ftrace_return()
Sparse emits a warning: symbol 'prepare_ftrace_return' was not
declared. Should it be static? prepare_ftrace_return() is called from
assembler and should not be static.

Add a prototype for it to asm-prototypes.h and include that in ftrace.c.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-20 19:02:49 +11:00
Tobin C. Harding
017614a5d6 powerpc/pseries: Move struct hcall_stats to hvCall_inst.c
struct hcall_stats is only used in hvCall_inst.c, so move it there.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-20 19:02:49 +11:00
Hamish Martin
476134070c powerpc: Move THREAD_SHIFT config to Kconfig
Shift the logic for defining THREAD_SHIFT logic to Kconfig in order to
allow override by users.

Signed-off-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Reviewed-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-20 19:02:49 +11:00
Kirill A. Shutemov
9a804fecee mm/gup: Drop the arch_pte_access_permitted() MMU callback
The only arch that defines it to something meaningful is x86.
But x86 doesn't use the generic GUP_fast() implementation -- the
only place where the callback is called.

Let's drop it.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K . V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dann Frazier <dann.frazier@canonical.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170316152655.37789-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-18 09:48:01 +01:00
Chandan Rajendra
f717629c7f powerpc: Wire up statx() syscall
Test runs on a ppc64 BE guest succeeded. linux/samples/statx/test-statx
program was executed on the following file types,

1. Regular file
2. Directory
3. device file
4. symlink
5. Named pipe

The test run also included invoking test-statx with the runtime options
provided in the main() function of test-statx.c

Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-16 20:45:53 +11:00
Linus Torvalds
fb5fe0fd62 powerpc fixes for 4.11 #4
The main item is the addition of the Power9 Machine Check handler. This was
 delayed to make sure some details were correct, and is as minimal as possible.
 
 The rest is small fixes, two for the Power9 PMU, two dealing with obscure
 toolchain problems, two for the PowerNV IOMMU code (used by VFIO), and one to
 fix a crash on 32-bit machines with macio devices due to missing dma_ops.
 
 Thanks to:
   Alexey Kardashevskiy, Cyril Bur, Larry Finger, Madhavan Srinivasan, Nicholas
   Piggin.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYx0IxAAoJEFHr6jzI4aWAplIQAKtEJklDDnu/lqnR1iR+Tiqf
 fyVAdiPJ2MBcwkodcodg12PNcU2vB9nQwzfNc2BbZe81xZjjAPLNSA3IwAZGm+oB
 U+B+oltJu5eKMg7wjRp3rkZZ7h19jT5j/auUAq+kJ9EmtT0Auo0CiQXBuxm2XBpF
 77s52A64Ey1EIiSQz/GUW8/vJtGiWj5+tQj55Fsstv8vDyPCrq2AZCoU27z8keFs
 iGXSLIuBUCC/VH3U6CmxzBH+g8eYm7ccL/D0T51qgxmUFWh/5NStzIPzjRP1Kq57
 iV7hcKiSfNvzLY/rKYr+ziPDH8E3fixZUtcFBMpLKTEfLqJhRZQL8dDvxsfHNe2E
 LpWabvnuHCIEf5UEyrrfev+CYVGIrlSC+BD9Ra895KH2h2zmmziRAuQ7gB/h72+o
 FDpfcy1Pzgw3BA+CVqL73jZZSgL3GkGigozO1jpU8h+7ufBRKHqdFehvso72N18U
 NOHVrNil5qerwN3R9obaVUnXDLCVj67c8ep6cW2zYRkX3oDaXDlBf88VIc4bU9dm
 adHUdkmbWIQB096bMTfukY+lsxA3KFq2xfPjlkAwoRkrXx55Qa4ZYCnLcE1rwj8M
 18zjroq+7UQsbVGH4rK3iUgUxYbvT7seVA/U7lLchyLdn4qn1TAYXYscW0GIZDdM
 dZELElGPncH5x4uEA6Sy
 =390M
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull some more powerpc fixes from Michael Ellerman:
 "The main item is the addition of the Power9 Machine Check handler.
  This was delayed to make sure some details were correct, and is as
  minimal as possible.

  The rest is small fixes, two for the Power9 PMU, two dealing with
  obscure toolchain problems, two for the PowerNV IOMMU code (used by
  VFIO), and one to fix a crash on 32-bit machines with macio devices
  due to missing dma_ops.

  Thanks to:
    Alexey Kardashevskiy, Cyril Bur, Larry Finger, Madhavan Srinivasan,
    Nicholas Piggin"

* tag 'powerpc-4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: POWER9 machine check handler
  powerpc/64s: allow machine check handler to set severity and initiator
  powerpc/64s: fix handling of non-synchronous machine checks
  powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops
  powerpc/powernv/ioda2: Update iommu table base on ownership change
  powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested
  selftests/powerpc: Replace stxvx and lxvx with stxvd2x/lxvd2x
  powerpc/perf: Handle sdar_mode for marked event in power9
  powerpc/perf: Fix perf_get_data_addr() for power9 DD1
  powerpc/boot: Fix zImage TOC alignment
2017-03-13 19:48:22 -07:00
Linus Torvalds
baeedc7158 Merge branch 'prep-for-5level'
Merge 5-level page table prep from Kirill Shutemov:
 "Here's relatively low-risk part of 5-level paging patchset. Merging it
  now will make x86 5-level paging enabling in v4.12 easier.

  The first patch is actually x86-specific: detect 5-level paging
  support. It boils down to single define.

  The rest of patchset converts Linux MMU abstraction from 4- to 5-level
  paging.

  Enabling of new abstraction in most cases requires adding single line
  of code in arch-specific code. The rest is taken care by asm-generic/.

  Changes to mm/ code are mostly mechanical: add support for new page
  table level -- p4d_t -- where we deal with pud_t now.

  v2:
   - fix build on microblaze (Michal);
   - comment for __ARCH_HAS_5LEVEL_HACK in kasan_populate_zero_shadow();
   - acks from Michal"

* emailed patches from Kirill A Shutemov <kirill.shutemov@linux.intel.com>:
  mm: introduce __p4d_alloc()
  mm: convert generic code to 5-level paging
  asm-generic: introduce <asm-generic/pgtable-nop4d.h>
  arch, mm: convert all architectures to use 5level-fixup.h
  asm-generic: introduce __ARCH_USE_5LEVEL_HACK
  asm-generic: introduce 5level-fixup.h
  x86/cpufeature: Add 5-level paging detection
2017-03-10 08:59:07 -08:00
Nicholas Piggin
7b9f71f974 powerpc/64s: POWER9 machine check handler
Add POWER9 machine check handler. There are several new types of errors
added, so logging messages for those are also added.

This doesn't attempt to reuse any of the P7/8 defines or functions,
because that becomes too complex. The better option in future is to use
a table driven approach.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-10 16:32:08 +11:00
Nicholas Piggin
c1bbf387d6 powerpc/64s: allow machine check handler to set severity and initiator
Currently severity and initiator are always set to MCE_SEV_ERROR_SYNC and
MCE_INITIATOR_CPU in the core mce code. Allow them to be set by the
machine specific mce handlers.

No functional change for existing handlers.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-10 16:32:07 +11:00
Aneesh Kumar K.V
d19469e841 power/mm: update pte_write and pte_wrprotect to handle savedwrite
We use pte_write() to check whethwer the pte entry is writable.  This is
mostly used to later mark the pte read only if it is writable.  The other
use of pte_write() is to check whether the pte_entry is writable so that
hardware page table entry can be marked accordingly.  This is used in kvm
where we look at qemu page table entry and update hardware hash page table
for the guest with correct write enable bit.

With the above, for the first usage we should also check the savedwrite
bit so that we can correctly clear the savedwite bit.  For the later, we
add a new variant __pte_write().

With this we can revert write_protect_page part of 595cd8f256 ("mm/ksm:
handle protnone saved writes when making page write protect").  But I left
it as it is as an example code for savedwrite check.

Fixes: c137a2757b ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write")
Link: http://lkml.kernel.org/r/1488203787-17849-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09 17:01:09 -08:00
Aneesh Kumar K.V
52c50ca75c powerpc/mm: handle protnone ptes on fork
We need to mark pages of parent process read only on fork.  Numa fault
pte needs a protnone ptes variant with saved write flag set.  On fork we
need to make sure we remove the saved write bit.  Instead of adding the
protnone check in the caller update ptep_set_wrprotect variants to clear
savedwrite bit.

Without this we see random segfaults in application on fork.

Fixes: c137a2757b ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write")
Link: http://lkml.kernel.org/r/1488203787-17849-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09 17:01:09 -08:00
Kirill A. Shutemov
9849a5697d arch, mm: convert all architectures to use 5level-fixup.h
If an architecture uses 4level-fixup.h we don't need to do anything as
it includes 5level-fixup.h.

If an architecture uses pgtable-nop*d.h, define __ARCH_USE_5LEVEL_HACK
before inclusion of the header. It makes asm-generic code to use
5level-fixup.h.

If an architecture has 4-level paging or folds levels on its own,
include 5level-fixup.h directly.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-09 11:48:47 -08:00
Josh Poimboeuf
a768f78429 livepatch/powerpc: add TIF_PATCH_PENDING thread flag
Add the TIF_PATCH_PENDING thread flag to enable the new livepatch
per-task consistency model for powerpc.  The bit getting set indicates
the thread has a pending patch which needs to be applied when the thread
exits the kernel.

The bit is included in the _TIF_USER_WORK_MASK macro so that
do_notify_resume() and klp_update_patch_state() get called when the bit
is set.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-08 09:20:19 +01:00
Linus Torvalds
f7d6a7283a powerpc fixes for 4.11 #3
Five fairly small fixes for things that went in this cycle.
 
 A fairly large patch to rework the CAS logic on Power9, necessitated by a late
 change to the firmware API, and we can't boot without it.
 
 Three fixes going to stable, allowing more instructions to be emulated on LE,
 fixing a boot crash on 32-bit Freescale BookE machines, and the OPAL XICS
 workaround.
 
 And a patch from me to sort the selects under CONFIG PPC. Annoying churn, but
 worth it in the long run, and best for it to go in now to avoid conflicts.
 
 Thanks to:
   Alexey Kardashevskiy, Anton Blanchard, Balbir Singh, Gautham R. Shenoy,
   Laurentiu Tudor, Nicholas Piggin, Paul Mackerras, Ravi Bangoria, Sachin Sant,
   Shile Zhang, Suraj Jitindar Singh.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYvqSxAAoJEFHr6jzI4aWAjMQP/06OFGz3VQvO5Q8jPsqRF22y
 Wr+04OKFmKnYVObdQk15HGOagp1fSkWWHfP/eu50kx1WNCzq7tQdLjNSi7H4F3s1
 4NwlaOfSQoxctsVtfnITJkfVScjcxK7XVagswtb3wvBpBx4lwD8fGwxkSxj6NhRw
 PNxLi44wobb8mDyR6L/6tJKBI2Jt12qXZY+kBQIleun5+lF8fNXIu4qPiglMOia6
 oPhXlp4RASt8wz74H8JuMTwGv17MxG+zvbkDPwQC7PI/fohJLybgWEfByN4H5UMy
 7Xi/lWHlShAyc7ulAIN+A1mHKY9LSv45U6qrrHFUJgRftZihoZHe6ekcI+h5oFVX
 chP9oUrQNeeZ5QqUC4rYdWwsMfiXBI0y5+BCupItixXc1LANBH9Ym9IECbgPRP93
 LQVqiS4958KijHlYBOA2zPicl/FnVO16orqakyRS0B3lQ54XBvhcgG8gIXjQr8PM
 Mt2W4r6RtGJ4ddhUPpF/W4lEuR4+dmXfEqs7DkgBKRbvi8XYkiLx2byBNh/OMRUG
 T4ILXsYf50AKRAq/jFTs9A0zkjtmtBeDdn96Mcan8i3WZuTQ7b8mQlC46zEg23A8
 XmTG2xt7N1dMjjwS78CfnvQ8sIVtA9AUfK37aTc0ICMsBCqEcWLAhHKZyCw0h25C
 wq9BMn4e5Gdg2xLTHKlL
 =SxON
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Five fairly small fixes for things that went in this cycle.

  A fairly large patch to rework the CAS logic on Power9, necessitated
  by a late change to the firmware API, and we can't boot without it.

  Three fixes going to stable, allowing more instructions to be emulated
  on LE, fixing a boot crash on 32-bit Freescale BookE machines, and the
  OPAL XICS workaround.

  And a patch from me to sort the selects under CONFIG PPC. Annoying
  churn, but worth it in the long run, and best for it to go in now to
  avoid conflicts.

  Thanks to:
    Alexey Kardashevskiy, Anton Blanchard, Balbir Singh, Gautham R.
    Shenoy, Laurentiu Tudor, Nicholas Piggin, Paul Mackerras, Ravi
    Bangoria, Sachin Sant, Shile Zhang, Suraj Jitindar Singh"

* tag 'powerpc-4.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Sort the selects under CONFIG_PPC
  powerpc/64: Fix L1D cache shape vector reporting L1I values
  powerpc/64: Avoid panic during boot due to divide by zero in init_cache_info()
  powerpc: Update to new option-vector-5 format for CAS
  powerpc: Parse the command line before calling CAS
  powerpc/xics: Work around limitations of OPAL XICS priority handling
  powerpc/64: Fix checksum folding in csum_add()
  powerpc/powernv: Fix opal tracepoints with JUMP_LABEL=n
  powerpc/booke: Fix boot crash due to null hugepd
  powerpc: Fix compiling a BE kernel with a powerpc64le toolchain
  selftest/powerpc: Fix false failures for skipped tests
  powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop
  powerpc/64: Invalidate process table caching after setting process table
  powerpc: emulate_step() tests for load/store instructions
  powerpc: Emulation support for load/store instructions on LE
2017-03-07 10:46:10 -08:00
Michael Ellerman
9c7a00868c powerpc/64: Fix L1D cache shape vector reporting L1I values
It seems we didn't pay quite enough attention when testing the new cache
shape vectors, which means we didn't notice the bug where the vector for
the L1D was using the L1I values. Fix it, resulting in eg:

  L1I  cache size:     0x8000      32768B         32K
  L1I  line size:        0x80       8-way associative
  L1D  cache size:    0x10000      65536B         64K
  L1D  line size:        0x80       8-way associative

Fixes: 98a5f361b8 ("powerpc: Add new cache geometry aux vectors")
Cut-and-paste-bug-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Badly-reviewed-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-06 21:51:32 +11:00
Suraj Jitindar Singh
014d02cbf1 powerpc: Update to new option-vector-5 format for CAS
On POWER9 the ibm,client-architecture-support (CAS) negotiation process
has been updated to change how the host to guest negotiation is done for
the new hash/radix mmu as well as the nest mmu, process tables and guest
translation shootdown (GTSE).

This is documented in the unreleased PAPR ACR "CAS option vector
additions for P9".

The host tells the guest which options it supports in
ibm,arch-vec-5-platform-support. The guest then chooses a subset of these
to request in the CAS call and these are agreed to in the
ibm,architecture-vec-5 property of the chosen node.

Thus we read ibm,arch-vec-5-platform-support and make our selection before
calling CAS. We then parse the ibm,architecture-vec-5 property of the
chosen node to check whether we should run as hash or radix.

ibm,arch-vec-5-platform-support format:

index value pairs: <index, val> ... <index, val>

index: Option vector 5 byte number
val:   Some representation of supported values

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Don't print about unknown options, be consistent with OV5_FEAT]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-06 21:44:09 +11:00
Al Viro
444f02c458 uaccess: drop pointless ifdefs
None of those file is ever included from uapi stuff, so __KERNEL__
is always defined.  None of them is ever included from assembler
(they are only pulled from linux/uaccess.h, which _can't_ be
included from assembler), so __ASSEMBLY__ is never defined.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-05 21:57:58 -05:00
Al Viro
af1d5b37d6 uaccess: drop duplicate includes from asm/uaccess.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-05 21:57:49 -05:00
Al Viro
5e6039d8a3 uaccess: move VERIFY_{READ,WRITE} definitions to linux/uaccess.h
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-05 20:40:25 -05:00
Linus Torvalds
2d62e0768d Second batch of KVM changes for 4.11 merge window
PPC:
  * correct assumption about ASDR on POWER9
  * fix MMIO emulation on POWER9
 
 x86:
  * add a simple test for ioperm
  * cleanup TSS
    (going through KVM tree as the whole undertaking was caused by VMX's
     use of TSS)
  * fix nVMX interrupt delivery
  * fix some performance counters in the guest
 
 And two cleanup patches.
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJYuu5qAAoJEED/6hsPKofoRAUH/jkx/KFDcw3FggixysWVgRai
 iLSbbAZemnSLFSOkOU/t7Bz0fXCUgB0tAcMJd9ow01Dg1zObiTpuUIo6qEPaYHdX
 gqtUzlHuyECZEcgK0RXS9kDYLrvw7EFocxnDWQfV91qCZSS6nBSSLF3ST1rNV69W
 mUvcZG+MciDcZUe1lTexoswVTh1m7avvozEnQ5OHnZR9yicoXiadBQjzL6yqWoqf
 Ml/29zRk5+MvloTudxjkAKm3mh7psW88jNMh37TXbAA7i+Xwl9cU6GLR9mFWstoP
 7Ot7ecq9mNAUO3lTIQh7lqvB60LMFznS4IlYK7MbplC3kvJLkfzhTWaN1aGvh90=
 =cqHo
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Radim Krčmář:
 "Second batch of KVM changes for the 4.11 merge window:

  PPC:
   - correct assumption about ASDR on POWER9
   - fix MMIO emulation on POWER9

  x86:
   - add a simple test for ioperm
   - cleanup TSS (going through KVM tree as the whole undertaking was
     caused by VMX's use of TSS)
   - fix nVMX interrupt delivery
   - fix some performance counters in the guest

  ... and two cleanup patches"

* tag 'kvm-4.11-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nVMX: Fix pending events injection
  x86/kvm/vmx: remove unused variable in segment_base()
  selftests/x86: Add a basic selftest for ioperm
  x86/asm: Tidy up TSS limit code
  kvm: convert kvm.users_count from atomic_t to refcount_t
  KVM: x86: never specify a sample period for virtualized in_tx_cp counters
  KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9
  KVM: PPC: Book3S HV: Fix software walk of guest process page tables
2017-03-04 11:36:19 -08:00
Shile Zhang
6ad966d730 powerpc/64: Fix checksum folding in csum_add()
Paul's patch to fix checksum folding, commit b492f7e4e0 ("powerpc/64:
Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold")
missed a case in csum_add(). Fix it.

Signed-off-by: Shile Zhang <shile.zhang@nokia.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-04 23:07:17 +11:00
Laurentiu Tudor
3fb66a70a4 powerpc/booke: Fix boot crash due to null hugepd
On 32-bit book-e machines, hugepd_ok() no longer takes into account null
hugepd values, causing this crash at boot:

  Unable to handle kernel paging request for data at address 0x80000000
  ...
  NIP [c0018378] follow_huge_addr+0x38/0xf0
  LR [c001836c] follow_huge_addr+0x2c/0xf0
  Call Trace:
   follow_huge_addr+0x2c/0xf0 (unreliable)
   follow_page_mask+0x40/0x3e0
   __get_user_pages+0xc8/0x450
   get_user_pages_remote+0x8c/0x250
   copy_strings+0x110/0x390
   copy_strings_kernel+0x2c/0x50
   do_execveat_common+0x478/0x630
   do_execve+0x2c/0x40
   try_to_run_init_process+0x18/0x60
   kernel_init+0xbc/0x110
   ret_from_kernel_thread+0x5c/0x64

This impacts all nxp (ex-freescale) 32-bit booke platforms.

This was caused by the change of hugepd_t.pd from signed to unsigned,
and the update to the nohash version of hugepd_ok(). Previously
hugepd_ok() could exclude all non-huge and NULL pgds using > 0, whereas
now we need to explicitly check that the value is not zero and also that
PD_HUGE is *clear*.

This isn't protected by the pgd_none() check in __find_linux_pte_or_hugepte()
because on 32-bit we use pgtable-nopud.h, which causes the pgd_none()
check to be always false.

Fixes: 20717e1ff5 ("powerpc/mm: Fix little-endian 4K hugetlb")
Cc: stable@vger.kernel.org # v4.7+
Reported-by: Madalin-Cristian Bucur <madalin.bucur@nxp.com>
Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
[mpe: Flesh out change log details.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-03 11:24:50 +11:00
Gautham R. Shenoy
424f8acd32 powerpc/powernv: Fix bug due to labeling ambiguity in power_enter_stop
Commit 09206b600c ("powernv: Pass PSSCR value and mask to
power9_idle_stop") added additional code in power_enter_stop() to
distinguish between stop requests whose PSSCR had ESL=EC=1 from those
which did not. When ESL=EC=1, we do a forward-jump to a location
labelled by "1", which had the code to handle the ESL=EC=1 case.

Unfortunately just a couple of instructions before this label, is the
macro IDLE_STATE_ENTER_SEQ() which also has a label "1" in its
expansion.

As a result, the current code can result in directly executing stop
instruction for deep stop requests with PSSCR ESL=EC=1, without saving
the hypervisor state.

Fix this BUG by labeling the location that handles ESL=EC=1 case with
a more descriptive label ".Lhandle_esl_ec_set" (local label suggestion
a la .Lxx from Anton Blanchard).

While at it, rename the label "2" labelling the location of the code
handling entry into deep stop states with ".Lhandle_deep_stop".

For a good measure, change the label in IDLE_STATE_ENTER_SEQ() macro
to an not-so commonly used value in order to avoid similar mishaps in
the future.

Fixes: 09206b600c ("powernv: Pass PSSCR value and mask to power9_idle_stop")
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-03 11:24:50 +11:00
Ravi Bangoria
4ceae137bd powerpc: emulate_step() tests for load/store instructions
Add new selftest that test emulate_step for Normal, Floating Point,
Vector and Vector Scalar - load/store instructions. Test should run
at boot time if CONFIG_KPROBES_SANITY_TEST and CONFIG_PPC64 is set.

Sample log:

  emulate_step_test: ld             : PASS
  emulate_step_test: lwz            : PASS
  emulate_step_test: lwzx           : PASS
  emulate_step_test: std            : PASS
  emulate_step_test: ldarx / stdcx. : PASS
  emulate_step_test: lfsx           : PASS
  emulate_step_test: stfsx          : PASS
  emulate_step_test: lfdx           : PASS
  emulate_step_test: stfdx          : PASS
  emulate_step_test: lvx            : PASS
  emulate_step_test: stvx           : PASS
  emulate_step_test: lxvd2x         : PASS
  emulate_step_test: stxvd2x        : PASS

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
[mpe: Drop start/complete lines, make it all __init]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-03 11:24:50 +11:00
Linus Torvalds
b286cedd47 powerpc updates for 4.11 part 2
Highlights include:
 
  - An update of the disassembly code used by xmon to the latest versions in
    binutils. We've received permission from all the authors of the relevant
    binutils changes to relicense their changes to the relevant files from GPLv3
    to GPLv2, for inclusion in Linux. Thanks to Peter Bergner for doing the leg
    work to get permission from everyone.
 
  - Addition of the "architected" Power9 CPU table entry, allowing us to boot
    in Power9 architected mode under a hypervisor.
 
  - Updates to the Power9 PMU code.
 
  - Implementation of clear_bit_unlock_is_negative_byte() to optimise
    unlock_page().
 
  - Freescale updates from Scott: "Highlights include 8xx breakpoints and perf,
    t1042rdb display support, and board updates."
 
 Thanks to:
   Al Viro, Andrew Donnellan, Aneesh Kumar K.V, Balbir Singh, Douglas Miller,
   Frédéric Weisbecker, Gavin Shan, Madhavan Srinivasan, Michael Roth, Nathan
   Fontenot, Naveen N. Rao, Nicholas Piggin, Peter Bergner, Paul E. McKenney,
   Rashmica Gupta, Russell Currey, Sahil Mehta, Stewart Smith.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYthsKAAoJEFHr6jzI4aWAaWMQAJ7mAwX98ncoYschPgRmmIun
 f6DtE4IonrxiZ22gp1ct4+c9OFtA+B5FXMcEhOKpfh93lg38PTDjHs9e5kfauD7+
 oTQ2Bg1eXaL48FKdmC5Vs4Kt+/J8e9guGafUC1OVIpTyyRPoZeUDH0lx+kSPV5bd
 PkL+wY/k3W0Njo8WgD1P9u3W15+BxISo/k8c7ajzKTHGBZlAvj5h2gO6XUBNMLyy
 YClB/qIymjZriSB+AeWYD79k8gPbBZPsmZG0ZF1hY060894LgqLB9mPOJdffx/DY
 H7/uP6jcsRDOXTOmyueW1SEmPoQbtysiMd1lNrCXKtC/Okr5uhn2cUhi88AsgWvd
 1QFly2lobcDAKPah/yB7YQGMAcmYvGGNuqrWaosaV2T7r0KprzUYYgCOqzvC3WSJ
 QtVatBzMIqRTMYq+3U4G1aHeCXlRazVQHDuvPby8RdR5b2gIexiqMab2eS7tSMIH
 mCOIunRIvT14g/7wxUV7tahN+ifncNxzAk4DvPO+Wc4FQ4sy7wArv2YipSaWRWtE
 u7tNdBkEwlDkKhJgRU5T0Op2PyMbHwCP8pWuz7PQIhKIcgwmP9wb07BIWG/GGIqn
 07TxJYX2ItabyEMZMsYhzILZqjLyiAaCARANB7ScbQbdP8wdcGZcwismhwnfROIU
 NuxsZg63BUDMoxk7Sauu
 =rspd
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull more powerpc updates from Michael Ellerman:
 "Highlights include:

   - an update of the disassembly code used by xmon to the latest
     versions in binutils. We've received permission from all the
     authors of the relevant binutils changes to relicense their changes
     to the relevant files from GPLv3 to GPLv2, for inclusion in Linux.
     Thanks to Peter Bergner for doing the leg work to get permission
     from everyone.

   - addition of the "architected" Power9 CPU table entry, allowing us
     to boot in Power9 architected mode under a hypervisor.

   - updates to the Power9 PMU code.

   - implementation of clear_bit_unlock_is_negative_byte() to optimise
     unlock_page().

   - Freescale updates from Scott: "Highlights include 8xx breakpoints
     and perf, t1042rdb display support, and board updates."

  Thanks to:
    Al Viro, Andrew Donnellan, Aneesh Kumar K.V, Balbir Singh, Douglas
    Miller, Frédéric Weisbecker, Gavin Shan, Madhavan Srinivasan,
    Michael Roth, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Peter
    Bergner, Paul E. McKenney, Rashmica Gupta, Russell Currey, Sahil
    Mehta, Stewart Smith"

* tag 'powerpc-4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (48 commits)
  powerpc: Remove leftover cputime_to_nsecs call causing build error
  powerpc/mm/hash: Always clear UPRT and Host Radix bits when setting up CPU
  powerpc/optprobes: Fix TOC handling in optprobes trampoline
  powerpc/pseries: Advertise Hot Plug Event support to firmware
  cxl: fix nested locking hang during EEH hotplug
  powerpc/xmon: Dump memory in CPU endian format
  powerpc/pseries: Revert 'Auto-online hotplugged memory'
  powerpc/powernv: Make PCI non-optional
  powerpc/64: Implement clear_bit_unlock_is_negative_byte()
  powerpc/powernv: Remove unused variable in pnv_pci_sriov_disable()
  powerpc/kernel: Remove error message in pcibios_setup_phb_resources()
  powerpc/mm: Fix typo in set_pte_at()
  pci/hotplug/pnv-php: Disable MSI and PCI device properly
  pci/hotplug/pnv-php: Disable surprise hotplug capability on conflicts
  pci/hotplug/pnv-php: Remove WARN_ON() in pnv_php_put_slot()
  powerpc: Add POWER9 architected mode to cputable
  powerpc/perf: use is_kernel_addr macro in perf_get_misc_flags()
  powerpc/perf: Avoid FAB_*_MATCH checks for power9
  powerpc/perf: Add restrictions to PMC5 in power9 DD1
  powerpc/perf: Use Instruction Counter value
  ...
2017-03-01 10:10:16 -08:00
Paul Mackerras
70cd4c10b2 KVM: PPC: Book3S HV: Fix software walk of guest process page tables
This fixes some bugs in the code that walks the guest's page tables.
These bugs cause MMIO emulation to fail whenever the guest is in
virtial mode (MMU on), leading to the guest hanging if it tried to
access a virtio device.

The first bug was that when reading the guest's process table, we were
using the whole of arch->process_table, not just the field that contains
the process table base address.  The second bug was that the mask used
when reading the process table entry to get the radix tree base address,
RPDB_MASK, had the wrong value.

Fixes: 9e04ba69be ("KVM: PPC: Book3S HV: Add basic infrastructure for radix guests")
Fixes: e99833448c ("powerpc/mm/radix: Add partition table format & callback")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-03-01 11:53:45 +11:00
Masahiro Yamada
8ab102d60a scripts/spelling.txt: add "partiton" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  partiton||partition

Link: http://lkml.kernel.org/r/1481573103-11329-7-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Luis R. Rodriguez
7d134b2ce6 kprobes: move kprobe declarations to asm-generic/kprobes.h
Often all is needed is these small helpers, instead of compiler.h or a
full kprobes.h.  This is important for asm helpers, in fact even some
asm/kprobes.h make use of these helpers...  instead just keep a generic
asm file with helpers useful for asm code with the least amount of
clutter as possible.

Likewise we need now to also address what to do about this file for both
when architectures have CONFIG_HAVE_KPROBES, and when they do not.  Then
for when architectures have CONFIG_HAVE_KPROBES but have disabled
CONFIG_KPROBES.

Right now most asm/kprobes.h do not have guards against CONFIG_KPROBES,
this means most architecture code cannot include asm/kprobes.h safely.
Correct this and add guards for architectures missing them.
Additionally provide architectures that not have kprobes support with
the default asm-generic solution.  This lets us force asm/kprobes.h on
the header include/linux/kprobes.h always, but most importantly we can
now safely include just asm/kprobes.h on architecture code without
bringing the full kitchen sink of header files.

Two architectures already provided a guard against CONFIG_KPROBES on its
kprobes.h: sh, arch.  The rest of the architectures needed gaurds added.
We avoid including any not-needed headers on asm/kprobes.h unless
kprobes have been enabled.

In a subsequent atomic change we can try now to remove compiler.h from
include/linux/kprobes.h.

During this sweep I've also identified a few architectures defining a
common macro needed for both kprobes and ftrace, that of the definition
of the breakput instruction up.  Some refer to this as
BREAKPOINT_INSTRUCTION.  This must be kept outside of the #ifdef
CONFIG_KPROBES guard.

[mcgrof@kernel.org: fix arm64 build]
  Link: http://lkml.kernel.org/r/CAB=NE6X1WMByuARS4mZ1g9+W=LuVBnMDnh_5zyN0CLADaVh=Jw@mail.gmail.com
[sfr@canb.auug.org.au: fixup for kprobes declarations moving]
  Link: http://lkml.kernel.org/r/20170214165933.13ebd4f4@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170203233139.32682-1-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:45 -08:00
Linus Torvalds
ac1820fb28 This is a tree wide change and has been kept separate for that reason.
Bart Van Assche noted that the ib DMA mapping code was significantly
 similar enough to the core DMA mapping code that with a few changes
 it was possible to remove the IB DMA mapping code entirely and
 switch the RDMA stack to use the core DMA mapping code.  This resulted
 in a nice set of cleanups, but touched the entire tree.  This branch
 will be submitted separately to Linus at the end of the merge window
 as per normal practice for tree wide changes like this.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9
 2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV
 CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD
 Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ
 xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX
 0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy
 1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw
 ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI
 IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG
 ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC
 XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS
 3e7mgpcwC+3XfA/6vU3F
 =e0Si
 -----END PGP SIGNATURE-----

Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma DMA mapping updates from Doug Ledford:
 "Drop IB DMA mapping code and use core DMA code instead.

  Bart Van Assche noted that the ib DMA mapping code was significantly
  similar enough to the core DMA mapping code that with a few changes it
  was possible to remove the IB DMA mapping code entirely and switch the
  RDMA stack to use the core DMA mapping code.

  This resulted in a nice set of cleanups, but touched the entire tree
  and has been kept separate for that reason."

* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
  IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
  IB/core: Remove ib_device.dma_device
  nvme-rdma: Switch from dma_device to dev.parent
  RDS: net: Switch from dma_device to dev.parent
  IB/srpt: Modify a debug statement
  IB/srp: Switch from dma_device to dev.parent
  IB/iser: Switch from dma_device to dev.parent
  IB/IPoIB: Switch from dma_device to dev.parent
  IB/rxe: Switch from dma_device to dev.parent
  IB/vmw_pvrdma: Switch from dma_device to dev.parent
  IB/usnic: Switch from dma_device to dev.parent
  IB/qib: Switch from dma_device to dev.parent
  IB/qedr: Switch from dma_device to dev.parent
  IB/ocrdma: Switch from dma_device to dev.parent
  IB/nes: Remove a superfluous assignment statement
  IB/mthca: Switch from dma_device to dev.parent
  IB/mlx5: Switch from dma_device to dev.parent
  IB/mlx4: Switch from dma_device to dev.parent
  IB/i40iw: Remove a superfluous assignment statement
  IB/hns: Switch from dma_device to dev.parent
  ...
2017-02-25 13:45:43 -08:00
Aneesh Kumar K.V
c137a2757b powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write
With this our protnone becomes a present pte with READ/WRITE/EXEC bit
cleared.  By default we also set _PAGE_PRIVILEGED on such pte.  This is
now used to help us identify a protnone pte that as saved write bit.
For such pte, we will clear the _PAGE_PRIVILEGED bit.  The pte still
remain non-accessible from both user and kernel.

[aneesh.kumar@linux.vnet.ibm.com: v3]
  Link: http://lkml.kernel.org/r/1487498625-10891-4-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1487050314-3892-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michaele@au1.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:56 -08:00
Linus Torvalds
bc49a7831b Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:
 "142 patches:

   - DAX updates

   - various misc bits

   - OCFS2 updates

   - most of MM"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (142 commits)
  mm/z3fold.c: limit first_num to the actual range of possible buddy indexes
  mm: fix <linux/pagemap.h> stray kernel-doc notation
  zram: remove obsolete sysfs attrs
  mm/memblock.c: remove unnecessary log and clean up
  oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA
  mm: drop unused argument of zap_page_range()
  mm: drop zap_details::check_swap_entries
  mm: drop zap_details::ignore_dirty
  mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled
  mm: help __GFP_NOFAIL allocations which do not trigger OOM killer
  mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically
  mm: consolidate GFP_NOFAIL checks in the allocator slowpath
  lib/show_mem.c: teach show_mem to work with the given nodemask
  arch, mm: remove arch specific show_mem
  mm, page_alloc: warn_alloc print nodemask
  mm, page_alloc: do not report all nodes in show_mem
  Revert "mm: bail out in shrink_inactive_list()"
  mm, vmscan: consider eligible zones in get_scan_count
  mm, vmscan: cleanup lru size claculations
  mm, vmscan: do not count freed pages as PGDEACTIVATE
  ...
2017-02-22 19:29:24 -08:00
Linus Torvalds
fd7e9a8834 4.11 is going to be a relatively large release for KVM, with a little over
200 commits and noteworthy changes for most architectures.
 
 * ARM:
 - GICv3 save/restore
 - cache flushing fixes
 - working MSI injection for GICv3 ITS
 - physical timer emulation
 
 * MIPS:
 - various improvements under the hood
 - support for SMP guests
 - a large rewrite of MMU emulation.  KVM MIPS can now use MMU notifiers
 to support copy-on-write, KSM, idle page tracking, swapping, ballooning
 and everything else.  KVM_CAP_READONLY_MEM is also supported, so that
 writes to some memory regions can be treated as MMIO.  The new MMU also
 paves the way for hardware virtualization support.
 
 * PPC:
 - support for POWER9 using the radix-tree MMU for host and guest
 - resizable hashed page table
 - bugfixes.
 
 * s390: expose more features to the guest
 - more SIMD extensions
 - instruction execution protection
 - ESOP2
 
 * x86:
 - improved hashing in the MMU
 - faster PageLRU tracking for Intel CPUs without EPT A/D bits
 - some refactoring of nested VMX entry/exit code, preparing for live
 migration support of nested hypervisors
 - expose yet another AVX512 CPUID bit
 - host-to-guest PTP support
 - refactoring of interrupt injection, with some optimizations thrown in
 and some duct tape removed.
 - remove lazy FPU handling
 - optimizations of user-mode exits
 - optimizations of vcpu_is_preempted() for KVM guests
 
 * generic:
 - alternative signaling mechanism that doesn't pound on tsk->sighand->siglock
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJYral1AAoJEL/70l94x66DbNgH/Rx8YXuidFq2fe3RWOvld3RK
 85OM/D5g38cTLpBE0/sJpcvX34iYN8U/l5foCZwpxB+83GHEk2Cr57JyfTogdaAJ
 x8dBhHKQCA/HxSQUQLN6nFqRV+yT8WUR92Fhqx82+80BSen5Yzcfee/TDoW6T1IW
 g8CYgX9FrRaGOX066ImAuUfdAdUVjyssfs9VttDTX+HiusPeuBPx/wsRe1ZEEPlH
 vnltIJQb1ETV2GOZLUojKjzH6aZkjIl29XxjkYii9JTUornClG0DfW+5QT3uLrB5
 gJ+G+Zmpsq8ZBx9jNDtAi7sFsoPY1Mzf+JPNCGXBra2sP2GrBAuXcxmgznRYltQ=
 =8IIp
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "4.11 is going to be a relatively large release for KVM, with a little
  over 200 commits and noteworthy changes for most architectures.

  ARM:
   - GICv3 save/restore
   - cache flushing fixes
   - working MSI injection for GICv3 ITS
   - physical timer emulation

  MIPS:
   - various improvements under the hood
   - support for SMP guests
   - a large rewrite of MMU emulation. KVM MIPS can now use MMU
     notifiers to support copy-on-write, KSM, idle page tracking,
     swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is
     also supported, so that writes to some memory regions can be
     treated as MMIO. The new MMU also paves the way for hardware
     virtualization support.

  PPC:
   - support for POWER9 using the radix-tree MMU for host and guest
   - resizable hashed page table
   - bugfixes.

  s390:
   - expose more features to the guest
   - more SIMD extensions
   - instruction execution protection
   - ESOP2

  x86:
   - improved hashing in the MMU
   - faster PageLRU tracking for Intel CPUs without EPT A/D bits
   - some refactoring of nested VMX entry/exit code, preparing for live
     migration support of nested hypervisors
   - expose yet another AVX512 CPUID bit
   - host-to-guest PTP support
   - refactoring of interrupt injection, with some optimizations thrown
     in and some duct tape removed.
   - remove lazy FPU handling
   - optimizations of user-mode exits
   - optimizations of vcpu_is_preempted() for KVM guests

  generic:
   - alternative signaling mechanism that doesn't pound on
     tsk->sighand->siglock"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits)
  x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64
  x86/paravirt: Change vcp_is_preempted() arg type to long
  KVM: VMX: use correct vmcs_read/write for guest segment selector/base
  x86/kvm/vmx: Defer TR reload after VM exit
  x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss
  x86/kvm/vmx: Simplify segment_base()
  x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels
  x86/kvm/vmx: Don't fetch the TSS base from the GDT
  x86/asm: Define the kernel TSS limit in a macro
  kvm: fix page struct leak in handle_vmon
  KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now
  KVM: Return an error code only as a constant in kvm_get_dirty_log()
  KVM: Return an error code only as a constant in kvm_get_dirty_log_protect()
  KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl()
  KVM: x86: remove code for lazy FPU handling
  KVM: race-free exit from KVM_RUN without POSIX signals
  KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message
  KVM: PPC: Book3S PR: Ratelimit copy data failure error messages
  KVM: Support vCPU-based gfn->hva cache
  KVM: use separate generations for each address space
  ...
2017-02-22 18:22:53 -08:00
Denys Vlasenko
16e72e9b30 powerpc: do not make the entire heap executable
On 32-bit powerpc the ELF PLT sections of binaries (built with
--bss-plt, or with a toolchain which defaults to it) look like this:

  [17] .sbss             NOBITS          0002aff8 01aff8 000014 00  WA  0   0  4
  [18] .plt              NOBITS          0002b00c 01aff8 000084 00 WAX  0   0  4
  [19] .bss              NOBITS          0002b090 01aff8 0000a4 00  WA  0   0  4

Which results in an ELF load header:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x019c70 0x00029c70 0x00029c70 0x01388 0x014c4 RWE 0x10000

This is all correct, the load region containing the PLT is marked as
executable.  Note that the PLT starts at 0002b00c but the file mapping
ends at 0002aff8, so the PLT falls in the 0 fill section described by
the load header, and after a page boundary.

Unfortunately the generic ELF loader ignores the X bit in the load
headers when it creates the 0 filled non-file backed mappings.  It
assumes all of these mappings are RW BSS sections, which is not the case
for PPC.

gcc/ld has an option (--secure-plt) to not do this, this is said to
incur a small performance penalty.

Currently, to support 32-bit binaries with PLT in BSS kernel maps
*entire brk area* with executable rights for all binaries, even
--secure-plt ones.

Stop doing that.

Teach the ELF loader to check the X bit in the relevant load header and
create 0 filled anonymous mappings that are executable if the load
header requests that.

Test program showing the difference in /proc/$PID/maps:

int main() {
	char buf[16*1024];
	char *p = malloc(123); /* make "[heap]" mapping appear */
	int fd = open("/proc/self/maps", O_RDONLY);
	int len = read(fd, buf, sizeof(buf));
	write(1, buf, len);
	printf("%p\n", p);
	return 0;
}

Compiled using: gcc -mbss-plt -m32 -Os test.c -otest

Unpatched ppc64 kernel:
00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
10690000-106c0000 rwxp 00000000 00:00 0                                  [heap]
f7f70000-f7fa0000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
f7fa0000-f7fb0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
f7fb0000-f7fc0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
ffa90000-ffac0000 rw-p 00000000 00:00 0                                  [stack]
0x10690008

Patched ppc64 kernel:
00100000-00120000 r-xp 00000000 00:00 0                                  [vdso]
0fe10000-0ffd0000 r-xp 00000000 fd:00 67898094                           /usr/lib/libc-2.17.so
0ffd0000-0ffe0000 r--p 001b0000 fd:00 67898094                           /usr/lib/libc-2.17.so
0ffe0000-0fff0000 rw-p 001c0000 fd:00 67898094                           /usr/lib/libc-2.17.so
10000000-10010000 r-xp 00000000 fd:00 100674505                          /home/user/test
10010000-10020000 r--p 00000000 fd:00 100674505                          /home/user/test
10020000-10030000 rw-p 00010000 fd:00 100674505                          /home/user/test
10180000-101b0000 rw-p 00000000 00:00 0                                  [heap]
                  ^^^^ this has changed
f7c60000-f7c90000 r-xp 00000000 fd:00 67898089                           /usr/lib/ld-2.17.so
f7c90000-f7ca0000 r--p 00020000 fd:00 67898089                           /usr/lib/ld-2.17.so
f7ca0000-f7cb0000 rw-p 00030000 fd:00 67898089                           /usr/lib/ld-2.17.so
ff860000-ff890000 rw-p 00000000 00:00 0                                  [stack]
0x10180008

The patch was originally posted in 2012 by Jason Gunthorpe
and apparently ignored:

https://lkml.org/lkml/2012/9/30/138

Lightly run-tested.

Link: http://lkml.kernel.org/r/20161215131950.23054-1-dvlasenk@redhat.com
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:29 -08:00
Linus Torvalds
38705613b7 powerpc updates for 4.11 part 1.
Highlights include:
 
  - Support for direct mapped LPC on POWER9, giving Linux direct access to
    devices that may be on there such as a UART.
 
  - Memory hotplug support for the Power9 Radix MMU.
 
  - Add new AUX vectors describing the processor's cache geometry, to be used by
    glibc.
 
  - The ability for a guest to ask the hypervisor to resize the guest's hash
    table, and in addition support for doing so automatically when memory is
    hotplugged into/out-of the guest. This allows the hash table to be sized
    based on the current memory usage of the guest, rather than the maximum
    possible memory usage.
 
  - Implementation of optprobes (kprobe optimisation) for powerpc.
 
 In addition there's the topic branch shared with the KVM tree, which includes
 support for guests to use the Radix MMU on Power9.
 
 Thanks to:
   Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T, Anton Blanchard,
   Benjamin Herrenschmidt, Chris Packham, Daniel Axtens, Daniel Borkmann, David
   Gibson, Finn Thain, Gautham R. Shenoy, Gavin Shan, Greg Kurz, Joel Stanley,
   John Allen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael
   Neuling, Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Ravi
   Bangoria, Reza Arbab, Shailendra Singh, Vaibhav Jain, Wei Yongjun.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYrWj0AAoJEFHr6jzI4aWAsn4P/08Kz3TtOvDuuPGVNoO7fWOn
 ag5/zVt8R4FuCALqWpAZbVqMuUU4wLxG0RuWlmBNYYhrjMC6JxHpOSjQXxM2D7YT
 CdGTJxG414r6HMOeToL9i/z33o0m+KT07tscer+QMKlXVKCR2z0fEJch+zoPBHYA
 5eVBFpLLTtbNiX9UnrcM/vYz61d56kT4YJey9/8qbAkTAc1rMPa8ucU5UiKYJ7yX
 mF8cd7WE+7aqif9V8yN59G2rcbz+h3pbMw/gzImiYsYrUj4fLjU+VTKL5PPT+UFy
 WWBXD3MAMm1dksMMZi3hgoo2BZhDn3RkymeYi6Jo4kDknNMPZzkMxGyvaJ8Eq5H/
 bIYXdS1AbTtvaaEEuWqDFjpnChOEvj/8IeqitU0jjlql8BVjNKg/ESaaKucZr+pO
 pk2Mfvw0Tb/lxJT5qj27yq4aRsxJwdFOoPYCN7MquPp/wV2Tg5M6h4nVQ4T6Wl0S
 tMFQeCqXflhDWh0Xgr2UXpF66/YTj3Du5LasOTkgGeU30Z8TcNGFEmDWShKP3cEm
 e0oQE+OHhIGN4WSBXBAto/Gw8/0v3pXlMs+VEfeHqenwPss2sWtSwXWe8khmiy9e
 DJ48sTzj75/Zx1fiqRldw9YEnrL+NK0eOOpvzxeyKpfvdUytc+chFfEqUmO6kl3Z
 DW2UmlZxmW+b0SfexCHL
 =Icle
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights include:

   - Support for direct mapped LPC on POWER9, giving Linux direct access
     to devices that may be on there such as a UART.

   - Memory hotplug support for the Power9 Radix MMU.

   - Add new AUX vectors describing the processor's cache geometry, to
     be used by glibc.

   - The ability for a guest to ask the hypervisor to resize the guest's
     hash table, and in addition support for doing so automatically when
     memory is hotplugged into/out-of the guest. This allows the hash
     table to be sized based on the current memory usage of the guest,
     rather than the maximum possible memory usage.

   - Implementation of optprobes (kprobe optimisation) for powerpc.

  In addition there's the topic branch shared with the KVM tree, which
  includes support for guests to use the Radix MMU on Power9.

  Thanks to:
    Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T, Anton
    Blanchard, Benjamin Herrenschmidt, Chris Packham, Daniel Axtens,
    Daniel Borkmann, David Gibson, Finn Thain, Gautham R. Shenoy, Gavin
    Shan, Greg Kurz, Joel Stanley, John Allen, Madhavan Srinivasan,
    Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Nathan Fontenot,
    Naveen N. Rao, Nicholas Piggin, Paul Mackerras, Ravi Bangoria, Reza
    Arbab, Shailendra Singh, Vaibhav Jain, Wei Yongjun"

* tag 'powerpc-4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (129 commits)
  powerpc/mm/radix: Skip ptesync in pte update helpers
  powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm
  powerpc/mm/radix: Update pte update sequence for pte clear case
  powerpc/mm: Update PROTFAULT handling in the page fault path
  powerpc/xmon: Fix data-breakpoint
  powerpc/mm: Fix build break with BOOK3S_64=n and MEMORY_HOTPLUG=y
  powerpc/mm: Fix build break when CMA=n && SPAPR_TCE_IOMMU=y
  powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n
  powerpc/pseries: Fix typo in parameter description
  powerpc/kprobes: Remove kprobe_exceptions_notify()
  kprobes: Introduce weak variant of kprobe_exceptions_notify()
  powerpc/ftrace: Fix confusing help text for DISABLE_MPROFILE_KERNEL
  powerpc/powernv: Fix opal_exit tracepoint opcode
  powerpc: Add a prototype for mcount() so it can be versioned
  powerpc: Drop GPL from of_node_to_nid() export to match other arches
  powerpc/kprobes: Optimize kprobe in kretprobe_trampoline()
  powerpc/kprobes: Implement Optprobes
  powerpc/kprobes: Fixes for kprobe_lookup_name() on BE
  powerpc: Add helper to check if offset is within relative branch range
  powerpc/bpf: Introduce __PPC_SH64()
  ...
2017-02-22 10:30:38 -08:00
Michael Roth
3dbbaf200f powerpc/pseries: Advertise Hot Plug Event support to firmware
With the inclusion of commit 333f7b7686 ("powerpc/pseries: Implement
indexed-count hotplug memory add") and commit 753843471c
("powerpc/pseries: Implement indexed-count hotplug memory remove"), we
now have complete handling of the RTAS hotplug event format as described
by PAPR via ACR "PAPR Changes for Hotplug RTAS Events".

This capability is indicated by byte 6, bit 2 (5 in IBM numbering) of
architecture option vector 5, and allows for greater control over
cpu/memory/pci hot plug/unplug operations.

Existing pseries kernels will utilize this capability based on the
existence of the /event-sources/hot-plug-events DT property, so we
only need to advertise it via CAS and do not need a corresponding
FW_FEATURE_* value to test for.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-21 21:32:53 +11:00
Linus Torvalds
59da2a0630 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina:

 - removal of dead code (Kamalesh Babulal)

 - documentation update (Miroslav Benes)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: doc: remove the limitation for schedule() patching
  powerpc/livepatch: Remove klp_write_module_reloc() stub
2017-02-20 17:13:23 -08:00
Linus Torvalds
828cad8ea0 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this (fairly busy) cycle were:

   - There was a class of scheduler bugs related to forgetting to update
     the rq-clock timestamp which can cause weird and hard to debug
     problems, so there's a new debug facility for this: which uncovered
     a whole lot of bugs which convinced us that we want to keep the
     debug facility.

     (Peter Zijlstra, Matt Fleming)

   - Various cputime related updates: eliminate cputime and use u64
     nanoseconds directly, simplify and improve the arch interfaces,
     implement delayed accounting more widely, etc. - (Frederic
     Weisbecker)

   - Move code around for better structure plus cleanups (Ingo Molnar)

   - Move IO schedule accounting deeper into the scheduler plus related
     changes to improve the situation (Tejun Heo)

   - ... plus a round of sched/rt and sched/deadline fixes, plus other
     fixes, updats and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (85 commits)
  sched/core: Remove unlikely() annotation from sched_move_task()
  sched/autogroup: Rename auto_group.[ch] to autogroup.[ch]
  sched/topology: Split out scheduler topology code from core.c into topology.c
  sched/core: Remove unnecessary #include headers
  sched/rq_clock: Consolidate the ordering of the rq_clock methods
  delayacct: Include <uapi/linux/taskstats.h>
  sched/core: Clean up comments
  sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds
  sched/clock: Add dummy clear_sched_clock_stable() stub function
  sched/cputime: Remove generic asm headers
  sched/cputime: Remove unused nsec_to_cputime()
  s390, sched/cputime: Remove unused cputime definitions
  powerpc, sched/cputime: Remove unused cputime definitions
  s390, sched/cputime: Make arch_cpu_idle_time() to return nsecs
  ia64, sched/cputime: Remove unused cputime definitions
  ia64: Convert vtime to use nsec units directly
  ia64, sched/cputime: Move the nsecs based cputime headers to the last arch using it
  sched/cputime: Remove jiffies based cputime
  sched/cputime, vtime: Return nsecs instead of cputime_t to account
  sched/cputime: Complete nsec conversion of tick based accounting
  ...
2017-02-20 12:52:55 -08:00
Michael Ellerman
6c8f9ad566 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Freescale updates from Scott:

"Highlights include 8xx breakpoints and perf, t1042rdb display support,
and board updates."
2017-02-18 21:37:14 +11:00
Nicholas Piggin
d11914b21c powerpc/64: Implement clear_bit_unlock_is_negative_byte()
Commit b91e1302ad ("mm: optimize PageWaiters bit use for
unlock_page()") added a special bitop function to speed up
unlock_page(). Implement this for 64-bit powerpc.

This improves the unlock_page() core code from this:

	li	9,1
	lwsync
1:	ldarx	10,0,3,0
	andc	10,10,9
	stdcx.	10,0,3
	bne-	1b
	ori	2,2,0
	ld	9,0(3)
	andi.	10,9,0x80
	beqlr
	li	4,0
	b	wake_up_page_bit

To this:

	li	10,1
	lwsync
1:	ldarx	9,0,3,0
	andc	9,9,10
	stdcx.	9,0,3
	bne-	1b
	andi.	10,9,0x80
	beqlr
	li	4,0
	b	wake_up_page_bit

In a test of elapsed time for dd writing into 16GB of already-dirty
pagecache on a POWER8 with 4K pages, which has one unlock_page per 4kB
this patch reduced overhead by 1.1%:

    N           Min           Max        Median           Avg        Stddev
x  19         2.578         2.619         2.594         2.595         0.011
+  19         2.552         2.592         2.564         2.565         0.008
Difference at 95.0% confidence
	-0.030  +/- 0.006
	-1.142% +/- 0.243%

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Made 64-bit only until I can test it properly on 32-bit]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-18 14:40:01 +11:00
Sahil Mehta
333f7b7686 powerpc/pseries: Implement indexed-count hotplug memory add
Indexed-count add for memory hotplug guarantees that a contiguous block
of <count> lmbs beginning at a specified <drc index> will be assigned,
any LMBs in this range that are not already assigned will be DLPAR added.
Because of Qemu's per-DIMM memory management, the addition of a contiguous
block of memory currently requires a series of individual calls to add
each LMB in the block. Indexed-count add reduces this series of calls to
a single call for the entire block.

Signed-off-by: Sahil Mehta <sahilmehta17@gmail.com>
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-17 17:57:30 +11:00
Paolo Bonzini
ee10689117 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
This brings in two fixes for potential host crashes, from Ben
Herrenschmidt and Nick Piggin.
2017-02-15 12:30:20 +01:00
Gavin Shan
454593e54c drivers/pci/hotplug: Mask PDC interrupt if required
We're supporting surprise hotplug on PCI slots behind root port
or PCIe switch downstream ports, which don't claim the capability
in hardware register (offset: PCIe cap + PCI_EXP_SLTCAP). PEX8718
is one of the examples. For those PCI slots, the PDC (Presence
Detection Change) event isn't reliable and the underly (skiboot)
firmware has best judgement.

This masks the PDC event when skiboot requests by "ibm,slot-broken-pdc"
property in PCI slot's device-tree node.

Reported-by: Hank Chang <hankmax0000@gmail.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Tested-by: Willie Liauw <williel@supermicro.com.tw>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-15 20:02:43 +11:00
Aneesh Kumar K.V
438e69b52b powerpc/mm/radix: Skip ptesync in pte update helpers
We do them at the start of tlb flush, and we are sure a pte update will be
followed by a tlbflush. Hence we can skip the ptesync in pte update helpers.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-15 20:02:41 +11:00
Aneesh Kumar K.V
f4894b80b1 powerpc/mm/radix: Use ptep_get_and_clear_full when clearing pte for full mm
This helps us to do some optimization for application exit case, where we can
skip the DD1 style pte update sequence.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-15 20:02:40 +11:00
Aneesh Kumar K.V
ca94573b9c powerpc/mm/radix: Update pte update sequence for pte clear case
In the kernel we do follow the below sequence in different code paths.
pte = ptep_get_clear(ptep)
....
set_pte_at(ptep, pte)

We do that for mremap, autonuma protection update and softdirty clearing. This
implies our optimization to skip a tlb flush when clearing a pte update is
not valid, because for DD1 system that followup set_pte_at will be done witout
doing the required tlbflush. Fix that by always doing the dd1 style pte update
irrespective of new_pte value. In a later patch we will optimize the application
exit case.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-15 20:02:40 +11:00
Michael Ellerman
36b390fd62 powerpc/mm: Fix build break with BOOK3S_64=n and MEMORY_HOTPLUG=y
The recently merged HPT (Hash Page Table) resize support broke the build
when BOOK3S_64=n (ie. 32-bit or 64-bit Book3E) and MEMORY_HOTPLUG=y:

  arch/powerpc/mm/mem.o: In function `.arch_add_memory':
  (.text+0x4e4): undefined reference to `.resize_hpt_for_hotplug'

Fix it by adding a dummy version.

Fixes: 438cc81a41 ("powerpc/pseries: Automatically resize HPT for memory hot add/remove")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-15 19:58:39 +11:00
Michael Ellerman
da0e7e6276 Merge branch 'topic/ppc-kvm' into next
Merge the topic branch we're sharing with the kvm-ppc tree.
2017-02-14 17:18:29 +11:00
Michael Ellerman
aad71e3928 powerpc/mm: Fix build break with RADIX=y & HUGETLBFS=n
If we enable RADIX but disable HUGETLBFS, the build breaks with:

  arch/powerpc/mm/pgtable-radix.c:557:7: error: implicit declaration of function 'pmd_huge'
  arch/powerpc/mm/pgtable-radix.c:588:7: error: implicit declaration of function 'pud_huge'

Fix it by stubbing those functions when HUGETLBFS=n.

Fixes: 4b5d62ca17 ("powerpc/mm: add radix__remove_section_mapping()")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-14 16:54:21 +11:00
Linus Torvalds
3ebc703316 powerpc fixes for 4.10 #4
Four fixes from Ben:
 
  - Userspace was semi-randomly segfaulting on radix due to us incorrectly
    handling a fault triggered by autonuma, caused by a patch we merged earlier
    in v4.10 to prevent the kernel executing userspace.
  - We weren't marking host IPIs properly for KVM in the OPAL ICP backend.
  - The ERAT flushing on radix was missing an isync and was incorrectly marked
    as DD1 only.
  - The powernv CPU hotplug code was missing a wakeup type and failing to flush
    the interrupt correctly when using OPAL ICP.
 
 Thanks to:
   Benjamin Herrenschmidt.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYnXkMAAoJEFHr6jzI4aWATLoQAIIGjrJ/U5vKHcb8vjhMaJwQ
 JiwnIhTiA3+DJShodmReV6LxmUfIfo4CWuhG79NCuTibBi9LrTUkinE8t0DB6Z4H
 qmeaULNi+tjX2OFyWNd/WTLrARzuXKm/gs0q2LpwslnrIAahP8xaOJEDzhhqpa4r
 XyyVMVhyY1SFZK8d/ULwXeE19ZH+CwpBvanTyVS/zQPaVCY3v8798Z7x/1K6Vtrj
 ks8FDY8AqHxSOwaaCmATy8gzGzLqj0I6phUT6D0OVmZIkKn3ucE+XpCzhLYBGpGa
 PNc9guKuwXISiKDxQgCQCLyjeZnSOD57o4p83OdBIWEorxiHdaIKtXt8op5C6+3X
 uO6MzI91t4ER3N40W4JNSROxxz/wd4R7mr3qLnsi8p3iJdG87T113L/2/pjKsoCy
 HrEcD38K77mZTzcd3XPI8fYkaawL0kgnFrAGUJ+sVrpHbnbVgSvKF8c255FYl1YH
 K9LiRZ3txSfTA+eA9Xuyw7O8dMEZiA89kdJ1yng5Pe+CBnBDcXJDdeD8Fm4FWCkc
 OePb8IXPEbfT0jJ0TZpnu2x2mpzKxU2nEih3GNK90/NmiYd6Pp5drNJo78hQfFQA
 Abterj12dXLW7/ynxLz3SMv20lnxOgzfbJZwFRXY90yc9YwYy4QFrpuDy/V/gopP
 pGQhqPneUS2z6A8M5+59
 =yLmR
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes friom Michael Ellerman:
 "Apologies for the late pull request, but Ben has been busy finding bugs.

   - Userspace was semi-randomly segfaulting on radix due to us
     incorrectly handling a fault triggered by autonuma, caused by a
     patch we merged earlier in v4.10 to prevent the kernel executing
     userspace.

   - We weren't marking host IPIs properly for KVM in the OPAL ICP
     backend.

   - The ERAT flushing on radix was missing an isync and was incorrectly
     marked as DD1 only.

   - The powernv CPU hotplug code was missing a wakeup type and failing
     to flush the interrupt correctly when using OPAL ICP

  Thanks to Benjamin Herrenschmidt"

* tag 'powerpc-4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Properly set "host-ipi" on IPIs
  powerpc/powernv: Fix CPU hotplug to handle waking on HVI
  powerpc/mm/radix: Update ERAT flushes when invalidating TLB
  powerpc/mm: Fix spurrious segfaults on radix with autonuma
2017-02-10 14:10:35 -08:00
Michael Ellerman
99ad503287 powerpc: Add a prototype for mcount() so it can be versioned
Currently we get a warning that _mcount() can't be versioned:

  WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.

Add a prototype to asm-prototypes.h to fix it.

The prototype is not really correct, mcount() is not a normal function,
it has a special ABI. But for the purpose of versioning it doesn't
matter.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:06 +11:00
Anju T
51c9c08439 powerpc/kprobes: Implement Optprobes
Current infrastructure of kprobe uses the unconditional trap instruction
to probe a running kernel. Optprobe allows kprobe to replace the trap
with a branch instruction to a detour buffer. Detour buffer contains
instructions to create an in memory pt_regs. Detour buffer also has a
call to optimized_callback() which in turn call the pre_handler(). After
the execution of the pre-handler, a call is made for instruction
emulation. The NIP is determined in advanced through dummy instruction
emulation and a branch instruction is created to the NIP at the end of
the trampoline.

To address the limitation of branch instruction in POWER architecture,
detour buffer slot is allocated from a reserved area. For the time
being, 64KB is reserved in memory for this purpose.

Instructions which can be emulated using analyse_instr() are the
candidates for optimization. Before optimization ensure that the address
range between the detour buffer allocated and the instruction being
probed is within +/- 32MB.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:04 +11:00
Naveen N. Rao
30176466e3 powerpc/kprobes: Fixes for kprobe_lookup_name() on BE
Fix two issues with kprobes.h on BE which were exposed with the
optprobes work:
  - one, having to do with a missing include for linux/module.h for
    MODULE_NAME_LEN -- this didn't show up previously since the only
    users of kprobe_lookup_name were in kprobes.c, which included
    linux/module.h through other headers, and
  - two, with a missing const qualifier for a local variable which ends
    up referring a string literal. Again, this is unique to how
    kprobe_lookup_name is being invoked in optprobes.c

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:04 +11:00
Anju T
ebfa50df43 powerpc: Add helper to check if offset is within relative branch range
To permit the use of relative branch instruction in powerpc, the target
address has to be relatively nearby, since the address is specified in an
immediate field (24 bit filed) in the instruction opcode itself. Here
nearby refers to 32MB on either side of the current instruction.

This patch verifies whether the target address is within +/- 32MB
range or not.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:03 +11:00
Naveen N. Rao
c233f5979b powerpc/bpf: Introduce __PPC_SH64()
Introduce __PPC_SH64() as a 64-bit variant to encode shift field in some
of the shift and rotate instructions operating on double-words. Convert
some of the BPF instruction macros to use the same.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:03 +11:00
David Gibson
438cc81a41 powerpc/pseries: Automatically resize HPT for memory hot add/remove
We've now implemented code in the pseries platform to use the new PAPR
interface to allow resizing the hash page table (HPT) at runtime.

This patch uses that interface to automatically attempt to resize the HPT
when memory is hot added or removed.  This tries to always keep the HPT at
a reasonable size for our current memory size.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:02 +11:00
David Gibson
0de0fb09bb powerpc/pseries: Advertise HPT resizing support via CAS
The hypervisor needs to know a guest is capable of using the HPT resizing
PAPR extension in order to make full advantage of it for memory hotplug.

If the hypervisor knows the guest is HPT resize aware, it can size the
initial HPT based on the initial guest RAM size, relying on the guest to
resize the HPT when more memory is hot-added. Without this, the hypervisor
must size the HPT for the maximum possible guest RAM, which can lead to
a huge waste of space if the guest never actually expends to that maximum
size.

This patch advertises the guest's support for HPT resizing via the
ibm,client-architecture-support OF interface. We use bit 5 of byte 6 of
option vector 5 for this purpose, as defined in the PAPR ACR "HPT
resizing option".

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:28:01 +11:00
David Gibson
dbcf929c00 powerpc/pseries: Add support for hash table resizing
This adds support for using two hypercalls to change the size of the
main hash page table while running as a PAPR guest. For now these
hypercalls are only in experimental qemu versions.

The interface is two part: first H_RESIZE_HPT_PREPARE is used to
allocate and prepare the new hash table. This may be slow, but can be
done asynchronously. Then, H_RESIZE_HPT_COMMIT is used to switch to the
new hash table. This requires that no CPUs be concurrently updating the
HPT, and so must be run under stop_machine().

This also adds a debugfs file which can be used to manually control
HPT resizing or testing purposes.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
[mpe: Rename the debugfs file to "hpt_order"]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-10 13:27:55 +11:00
Paolo Bonzini
2e751dfb5f kvmarm updates for 4.11
- GICv3 save restore
 - Cache flushing fixes
 - MSI injection fix for GICv3 ITS
 - Physical timer emulation support
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYnICmAAoJECPQ0LrRPXpDC04P/A73ZEL6m0vUzGpuvclxwWc6
 OCJ2C9kYloK+twyGLFbPprI4eN/70dpThgFE1Zr+ol/vAOhQlGQJoarc4n4+eYyb
 8e8IxM5Cmi44HUB64xOInidLacqeyRy5+TvKXIH0aHLgpdynSEQJu88RVXUvVgvs
 IZizhTpYueDYdexNNEkL5r2yJhVZCaczyjB1vU8k5MdODLDM63ABnPOSNJNXio2x
 itoO0EU1Lb9GhuzQj0hiMvKJPyviuPHwau7AhokUSjDPaHzaQT7TgSVioKov/rl6
 bRzhPmXqesex97ZWA5Fxr8jgSNR7JyRz+bzCLEry7XFaI3chbe0YvXeRv32PNH7I
 meuycQw64gsKmfJGRNlq30qhQQfv4fTbzpZP/j1UbvKNwhK5J6e7037c1CUH4i9C
 p9UO9HF/zAMqzD3iMcDZSpaFcbhJYrfQufbhTnbHfGC5AMVJEOWheHSEmzlDWnwr
 K5fPBxnsPv58hDmp/UZUTqCEPusY+HyuOq4ZumFSsnBwjdW+z9mLuaaTJbxaqR/G
 B6dfSQNwSnw6b2lbiXPUCm6c+Z9b190pUEWdwJ4kOTxwiPUWBppVU7gE2TrjnQ8m
 aIvEBPGIf58okjEewA5Dni6qjv7CjDN5z1V0vZUTTdVw8xuhX9eJ1Cx853SM7n0U
 sJgW5nSvSLDUpizSKdRI
 =H4vX
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

kvmarm updates for 4.11

- GICv3 save restore
- Cache flushing fixes
- MSI injection fix for GICv3 ITS
- Physical timer emulation support
2017-02-09 16:01:23 +01:00
David Gibson
64b40ffbc8 powerpc/pseries: Add hypercall wrappers for hash page table resizing
This adds the hypercall numbers and wrapper functions for the hash page
table resizing hypercalls.

These hypercall numbers are defined in the PAPR ACR "HPT resizing
option".

It also adds a new firmware feature flag to track the presence of the
HPT resizing calls.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-09 21:45:48 +11:00
Benjamin Herrenschmidt
9b25671497 powerpc/powernv: Fix CPU hotplug to handle waking on HVI
The IPIs come in as HVI not EE, so we need to test the appropriate
SRR1 bits. The encoding is such that it won't have false positives
on P7 and P8 so we can just test it like that. We also need to handle
the icp-opal variant of the flush.

Fixes: d74361881f ("powerpc/xics: Add ICP OPAL backend")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-09 15:19:57 +11:00
Paul Mackerras
a4a741a048 Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in a fix which touches both PPC and KVM code,
which was therefore put into a topic branch in the powerpc
tree.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-02-08 19:35:34 +11:00
Benjamin Herrenschmidt
ab9bad0ead powerpc/powernv: Remove separate entry for OPAL real mode calls
All entry points already read the MSR so they can easily do
the right thing.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 16:40:18 +11:00
Nicholas Piggin
2337d20728 powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts
The branch from hmi_exception_early to hmi_exception_realmode must use
a "relocatable-style" branch, because it is branching from unrelocated
exception code to beyond __end_interrupts.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 16:39:44 +11:00
Aneesh Kumar K.V
a5ecdad484 powerpc/mm: Add MMU_FTR_KERNEL_RO to possible feature mask
Without this we will always find the feature disabled.

Fixes: 984d7a1ec6 ("powerpc/mm: Fixup kernel read only mapping")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 16:29:55 +11:00
Nicholas Piggin
1a6822d194 powerpc/64s: Use (start, size) rather than (start, end) for exception handlers
start,size has the benefit of being easier to search for (start,end
usually gives you the preceeding vector from the one you want, as first
result).

Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 07:26:57 +11:00
Nicholas Piggin
852e5da99d powerpc/64s: Tidy up after exception handler rework
Somewhere along the line, search/replace left some naming garbled,
and untidy alignment (aka. mpe stuffed it up). Might as well fix them
all up now while git blame history doesn't extend too far.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-07 07:26:57 +11:00
Benjamin Herrenschmidt
98a5f361b8 powerpc: Add new cache geometry aux vectors
This adds AUX vectors for the L1I,D, L2 and L3 cache levels
providing for each cache level the size of the cache in bytes
and the geometry (line size and number of ways).

We chose to not use the existing alpha/sh definition which
packs all the information in a single entry per cache level as
it is too restricted to represent some of the geometries used
on POWER.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Benjamin Herrenschmidt
65e01f386f powerpc/64: Add L2 and L3 cache shape info
Retrieved from device-tree when available

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Benjamin Herrenschmidt
e2827fe5c1 powerpc/64: Clean up ppc64_caches using a struct per cache
We have two set of identical struct members for the I and D sides
and mostly identical bunches of code to parse the device-tree to
populate them. Instead make a ppc_cache_info structure with one
copy for I and one for D

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Benjamin Herrenschmidt
5d451a87e5 powerpc/64: Retrieve number of L1 cache sets from device-tree
It will be used to calculate the associativity

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Benjamin Herrenschmidt
bd067f83b0 powerpc/64: Fix naming of cache block vs. cache line
In a number of places we called "cache line size" what is actually
the cache block size, which in the powerpc architecture, means the
effective size to use with cache management instructions (it can
be different from the actual cache line size).

We fix the naming across the board and properly retrieve both
pieces of information when available in the device-tree.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Benjamin Herrenschmidt
2a196e24b3 powerpc: Move ARCH_DLINFO out of uapi
It's an kernel private macro, it doesn't belong there

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-06 19:46:04 +11:00
Linus Torvalds
57480b98af powerpc fixes for 4.10 #3
The main change is we're reverting the initial stack protector support we
 merged this cycle. It turns out to not work on toolchains built with libc
 support, and fixing it will be need to wait for another release.
 
 And the rest are all fairly minor:
  - Some pasemi machines were not booting due to a missing error check in
    prom_find_boot_cpu().
  - In EEH we were checking a pointer rather than the bool it pointed to.
  - The clang build was broken by a BUILD_BUG_ON() we added.
  - The radix (Power9 only) version of map_kernel_page() was broken if our
    memory size was a multiple of 2MB, which it generally isn't.
 
 Thanks to:
   Darren Stevens, Gavin Shan, Reza Arbab.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJYlGHSAAoJEFHr6jzI4aWAVy4QALSpVLv0900hMtn+5MBhhgad
 GxHI3ViJUV+K3gDpagl+Ya++WiPd9ffiK9FqS4Xe/r1r2pgSoNU8SC736Jomb1wL
 cCx34E73SLMTfnhm/lVq+a4e+nRnk633C8gdsU4Py+FFOfz7FkeQX/gBGbv+ffng
 oVO6Susqo/dKirVo6+BCgVjSLdYezzxw31SVNGYUYz4MFxWUHK+bi/l0FIDQGhjD
 mTzV2cPhfjYTAHQtY43in6plQ91Pd2CR5HxL3Bw9DnFhS7zFMBnWPYYWh+kdT/vf
 wNS1cFJoGHjXCaXt/eXsk3GPC696bDWGSwpenQewTQdb8yJ5MWP7Jv9KeBN8603x
 LSE6o9/RATv7+pJJ9UPP+vLwSXtx2bi1FJlOU4Nbxi5YhfXZncCltMANCNuVix1l
 2x5Qhs8GJNt7nqxhv5GwvAmEaKZet1Ucgo+p/9D08bfKk8loN+evRx6fjT5CymE7
 s1INIbkzxppebTBjawDA7w6kWXjmgrskWvqjrPUfu91Ro8KmeBVS4t0pXY9D/i8N
 hCEMbXNL5qrWTRScOxP3k5cQsOPnQEz6F7AkecYjWfwGG9ZvdcztuhTj1qYRjVaF
 9Xk7OWu52/EamFdZ5MGuiVIEp8Vbqy8/0IJ/bdTOEMgjsaziD6LyRNZ6Qoyxg+bf
 kFEqmzuKgmz3D4aTklnh
 =XUAi
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "The main change is we're reverting the initial stack protector support
  we merged this cycle. It turns out to not work on toolchains built
  with libc support, and fixing it will be need to wait for another
  release.

  And the rest are all fairly minor:

   - Some pasemi machines were not booting due to a missing error check
     in prom_find_boot_cpu()

   - In EEH we were checking a pointer rather than the bool it pointed
     to

   - The clang build was broken by a BUILD_BUG_ON() we added.

   - The radix (Power9 only) version of map_kernel_page() was broken if
     our memory size was a multiple of 2MB, which it generally isn't

  Thanks to: Darren Stevens, Gavin Shan, Reza Arbab"

* tag 'powerpc-4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Use the correct pointer when setting a 2MB pte
  powerpc: Fix build failure with clang due to BUILD_BUG_ON()
  powerpc: Revert the initial stack protector support
  powerpc/eeh: Fix wrong flag passed to eeh_unfreeze_pe()
  powerpc: Add missing error check to prom_find_boot_cpu()
2017-02-03 11:10:06 -08:00
Ard Biesheuvel
71810db27c modversions: treat symbol CRCs as 32 bit quantities
The modversion symbol CRCs are emitted as ELF symbols, which allows us
to easily populate the kcrctab sections by relying on the linker to
associate each kcrctab slot with the correct value.

This has a couple of downsides:

 - Given that the CRCs are treated as memory addresses, we waste 4 bytes
   for each CRC on 64 bit architectures,

 - On architectures that support runtime relocation, a R_<arch>_RELATIVE
   relocation entry is emitted for each CRC value, which identifies it
   as a quantity that requires fixing up based on the actual runtime
   load offset of the kernel. This results in corrupted CRCs unless we
   explicitly undo the fixup (and this is currently being handled in the
   core module code)

 - Such runtime relocation entries take up 24 bytes of __init space
   each, resulting in a x8 overhead in [uncompressed] kernel size for
   CRCs.

Switching to explicit 32 bit values on 64 bit architectures fixes most
of these issues, given that 32 bit values are not treated as quantities
that require fixing up based on the actual runtime load offset.  Note
that on some ELF64 architectures [such as PPC64], these 32-bit values
are still emitted as [absolute] runtime relocatable quantities, even if
the value resolves to a build time constant.  Since relative relocations
are always resolved at build time, this patch enables MODULE_REL_CRCS on
powerpc when CONFIG_RELOCATABLE=y, which turns the absolute CRC
references into relative references into .rodata where the actual CRC
value is stored.

So redefine all CRC fields and variables as u32, and redefine the
__CRC_SYMBOL() macro for 64 bit builds to emit the CRC reference using
inline assembler (which is necessary since 64-bit C code cannot use
32-bit types to hold memory addresses, even if they are ultimately
resolved using values that do not exceed 0xffffffff).  To avoid
potential problems with legacy 32-bit architectures using legacy
toolchains, the equivalent C definition of the kcrctab entry is retained
for 32-bit architectures.

Note that this mostly reverts commit d4703aefdb ("module: handle ppc64
relocating kcrctabs when CONFIG_RELOCATABLE=y")

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-03 08:28:25 -08:00
John Allen
e70d59700f powerpc/pseries: Introduce memory hotplug READD operation
Currently, memory must be hot removed and subsequently re-added in order
to dynamically update the affinity of LMBs specified by a PRRN event.
Earlier implementations of the PRRN event handler ran into issues in which
the hot remove would occur successfully, but a hotplug event would be
initiated from another source and grab the hotplug lock preventing the hot
add from occurring. To prevent this situation, this patch introduces the
notion of a hot "readd" action for memory which atomizes a hot remove and
a hot add into a single, serialized operation on the hotplug queue.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02 16:57:39 +11:00
Daniel Axtens
f2ca809059 powerpc/sparse: Constify the address pointer in __get_user_nosleep()
In __get_user_nosleep, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_nosleep macro.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02 16:35:21 +11:00
Daniel Axtens
d466f6c5ca powerpc/sparse: Constify the address pointer in __get_user_nocheck()
In __get_user_nocheck, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_nocheck macro.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02 16:35:17 +11:00
Daniel Axtens
f84ed59a61 powerpc/sparse: Constify the address pointer in __get_user_check()
In __get_user_check, we create an intermediate pointer for the
user address we're about to fetch. We currently don't tag this
pointer as const. Make it const, as we are simply dereferencing
it, and it's scope is limited to the __get_user_check macro.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-02-02 16:35:11 +11:00
Frederic Weisbecker
b672592f02 sched/cputime: Remove generic asm headers
cputime_t is now only used by two architectures:

	* powerpc (when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y)
	* s390

And since the core doesn't use it anymore, we don't need any arch support
from the others. So we can remove their stub implementations.

A final cleanup would be to provide an efficient pure arch
implementation of cputime_to_nsec() for s390 and powerpc and finally
remove include/linux/cputime.h .

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-36-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 09:14:07 +01:00
Frederic Weisbecker
e7f340ca9c powerpc, sched/cputime: Remove unused cputime definitions
Since the core doesn't deal with cputime_t anymore, most of these APIs
have been left unused. Lets remove these.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-33-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 09:14:04 +01:00
Ingo Molnar
ed5c8c854f Merge branch 'linus' into sched/core, to pick up fixes and refresh the branch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-01 09:12:25 +01:00
David Gibson
5e9859699a KVM: PPC: Book3S HV: Outline of KVM-HV HPT resizing implementation
This adds a not yet working outline of the HPT resizing PAPR
extension.  Specifically it adds the necessary ioctl() functions,
their basic steps, the work function which will handle preparation for
the resize, and synchronization between these, the guest page fault
path and guest HPT update path.

The actual guts of the implementation isn't here yet, so for now the
calls will always fail.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:56 +11:00
David Gibson
f98a8bf9ee KVM: PPC: Book3S HV: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of hashed page
table (HPT) that userspace expects a guest VM to have, and is also used to
clear that HPT when necessary (e.g. guest reboot).

At present, once the ioctl() is called for the first time, the HPT size can
never be changed thereafter - it will be cleared but always sized as from
the first call.

With upcoming HPT resize implementation, we're going to need to allow
userspace to resize the HPT at reset (to change it back to the default size
if the guest changed it).

So, we need to allow this ioctl() to change the HPT size.

This patch also updates Documentation/virtual/kvm/api.txt to reflect
the new behaviour.  In fact the documentation was already slightly
incorrect since 572abd5 "KVM: PPC: Book3S HV: Don't fall back to
smaller HPT size in allocation ioctl"

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:45 +11:00
David Gibson
aae0777f1e KVM: PPC: Book3S HV: Split HPT allocation from activation
Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
and sets it up as the active page table for a VM.  For the upcoming HPT
resize implementation we're going to want to allocate HPTs separately from
activating them.

So, split the allocation itself out into kvmppc_allocate_hpt() and perform
the activation with a new kvmppc_set_hpt() function.  Likewise we split
kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt()
which unsets it as an active HPT, then frees it.

We also move the logic to fall back to smaller HPT sizes if the first try
fails into the single caller which used that behaviour,
kvmppc_hv_setup_htab_rma().  This introduces a slight semantic change, in
that previously if the initial attempt at CMA allocation failed, we would
fall back to attempting smaller sizes with the page allocator.  Now, we
try first CMA, then the page allocator at each size.  As far as I can tell
this change should be harmless.

To match, we make kvmppc_free_hpt() just free the actual HPT itself.  The
call to kvmppc_free_lpid() that was there, we move to the single caller.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:39 +11:00
David Gibson
3d089f84c6 KVM: PPC: Book3S HV: Don't store values derivable from HPT order
Currently the kvm_hpt_info structure stores the hashed page table's order,
and also the number of HPTEs it contains and a mask for its size.  The
last two can be easily derived from the order, so remove them and just
calculate them as necessary with a couple of helper inlines.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:34 +11:00
David Gibson
3f9d4f5a5f KVM: PPC: Book3S HV: Gather HPT related variables into sub-structure
Currently, the powerpc kvm_arch structure contains a number of variables
tracking the state of the guest's hashed page table (HPT) in KVM HV.  This
patch gathers them all together into a single kvm_hpt_info substructure.
This makes life more convenient for the upcoming HPT resizing
implementation.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:28 +11:00
David Gibson
db9a290d9c KVM: PPC: Book3S HV: Rename kvm_alloc_hpt() for clarity
The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
all obvious from the name.  In practice kvmppc_alloc_hpt() allocates an HPT
by whatever means, and calls kvm_alloc_hpt() which will attempt to allocate
it with CMA only.

To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 21:59:23 +11:00
Paul Mackerras
167c76e055 Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in the POWER9 radix MMU host and guest support, which
was put into a topic branch because it touches both powerpc and
KVM code.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-01-31 19:21:26 +11:00
Paul Mackerras
8cf4ecc0ca KVM: PPC: Book3S HV: Enable radix guest support
This adds a few last pieces of the support for radix guests:

* Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and
  KVM_PPC_GET_RMMU_INFO ioctls for radix guests

* On POWER9, allow secondary threads to be on/off-lined while guests
  are running.

* Set up LPCR and the partition table entry for radix guests.

* Don't allocate the rmap array in the kvm_memory_slot structure
  on radix.

* Don't try to initialize the HPT for radix guests, since they don't
  have an HPT.

* Take out the code that prevents the HV KVM module from
  initializing on radix hosts.

At this stage, we only support radix guests if the host is running
in radix mode, and only support HPT guests if the host is running in
HPT mode.  Thus a guest cannot switch from one mode to the other,
which enables some simplifications.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:52 +11:00
Paul Mackerras
a29ebeaf55 KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movement
With radix, the guest can do TLB invalidations itself using the tlbie
(global) and tlbiel (local) TLB invalidation instructions.  Linux guests
use local TLB invalidations for translations that have only ever been
accessed on one vcpu.  However, that doesn't mean that the translations
have only been accessed on one physical cpu (pcpu) since vcpus can move
around from one pcpu to another.  Thus a tlbiel might leave behind stale
TLB entries on a pcpu where the vcpu previously ran, and if that task
then moves back to that previous pcpu, it could see those stale TLB
entries and thus access memory incorrectly.  The usual symptom of this
is random segfaults in userspace programs in the guest.

To cope with this, we detect when a vcpu is about to start executing on
a thread in a core that is a different core from the last time it
executed.  If that is the case, then we mark the core as needing a
TLB flush and then send an interrupt to any thread in the core that is
currently running a vcpu from the same guest.  This will get those vcpus
out of the guest, and the first one to re-enter the guest will do the
TLB flush.  The reason for interrupting the vcpus executing on the old
core is to cope with the following scenario:

	CPU 0			CPU 1			CPU 4
	(core 0)			(core 0)			(core 1)

	VCPU 0 runs task X      VCPU 1 runs
	core 0 TLB gets
	entries from task X
	VCPU 0 moves to CPU 4
							VCPU 0 runs task X
							Unmap pages of task X
							tlbiel

				(still VCPU 1)			task X moves to VCPU 1
				task X runs
				task X sees stale TLB
				entries

That is, as soon as the VCPU starts executing on the new core, it
could unmap and tlbiel some page table entries, and then the task
could migrate to one of the VCPUs running on the old core and
potentially see stale TLB entries.

Since the TLB is shared between all the threads in a core, we only
use the bit of kvm->arch.need_tlb_flush corresponding to the first
thread in the core.  To ensure that we don't have a window where we
can miss a flush, this moves the clearing of the bit from before the
actual flush to after it.  This way, two threads might both do the
flush, but we prevent the situation where one thread can enter the
guest before the flush is finished.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:51 +11:00
Paul Mackerras
8f7b79b837 KVM: PPC: Book3S HV: Implement dirty page logging for radix guests
This adds code to keep track of dirty pages when requested (that is,
when memslot->dirty_bitmap is non-NULL) for radix guests.  We use the
dirty bits in the PTEs in the second-level (partition-scoped) page
tables, together with a bitmap of pages that were dirty when their
PTE was invalidated (e.g., when the page was paged out).  This bitmap
is stored in the first half of the memslot->dirty_bitmap area, and
kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the
bitmap that gets returned to userspace.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:50 +11:00
Paul Mackerras
01756099e0 KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests
This adapts our implementations of the MMU notifier callbacks
(unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
to call radix functions when the guest is using radix.  These
implementations are much simpler than for HPT guests because we
have only one PTE to deal with, so we don't need to traverse
rmap chains.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:49 +11:00
Paul Mackerras
5a319350a4 KVM: PPC: Book3S HV: Page table construction and page faults for radix guests
This adds the code to construct the second-level ("partition-scoped" in
architecturese) page tables for guests using the radix MMU.  Apart from
the PGD level, which is allocated when the guest is created, the rest
of the tree is all constructed in response to hypervisor page faults.

As well as hypervisor page faults for missing pages, we also get faults
for reference/change (RC) bits needing to be set, as well as various
other error conditions.  For now, we only set the R or C bit in the
guest page table if the same bit is set in the host PTE for the
backing page.

This code can take advantage of the guest being backed with either
transparent or ordinary 2MB huge pages, and insert 2MB page entries
into the guest page tables.  There is no support for 1GB huge pages
yet.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:49 +11:00
Paul Mackerras
f4c51f841d KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guests
This adds code to  branch around the parts that radix guests don't
need - clearing and loading the SLB with the guest SLB contents,
saving the guest SLB contents on exit, and restoring the host SLB
contents.

Since the host is now using radix, we need to save and restore the
host value for the PID register.

On hypervisor data/instruction storage interrupts, we don't do the
guest HPT lookup on radix, but just save the guest physical address
for the fault (from the ASDR register) in the vcpu struct.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:48 +11:00
Paul Mackerras
9e04ba69be KVM: PPC: Book3S HV: Add basic infrastructure for radix guests
This adds a field in struct kvm_arch and an inline helper to
indicate whether a guest is a radix guest or not, plus a new file
to contain the radix MMU code, which currently contains just a
translate function which knows how to traverse the guest page
tables to translate an address.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:48 +11:00
Paul Mackerras
468808bd35 KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9
This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
for HPT guests on POWER9.  With this, we can return 1 for the
KVM_CAP_PPC_MMU_HASH_V3 capability.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:47 +11:00
Paul Mackerras
c927013227 KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU
This adds two capabilities and two ioctls to allow userspace to
find out about and configure the POWER9 MMU in a guest.  The two
capabilities tell userspace whether KVM can support a guest using
the radix MMU, or using the hashed page table (HPT) MMU with a
process table and segment tables.  (Note that the MMUs in the
POWER9 processor cores do not use the process and segment tables
when in HPT mode, but the nest MMU does).

The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
whether a guest will use the radix MMU or the HPT MMU, and to
specify the size and location (in guest space) of the process
table.

The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
the radix MMU.  It returns a list of supported radix tree geometries
(base page size and number of bits indexed at each level of the
radix tree) and the encoding used to specify the various page
sizes for the TLB invalidate entry instruction.

Initially, both capabilities return 0 and the ioctls return -EINVAL,
until the necessary infrastructure for them to operate correctly
is added.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:47 +11:00
Paul Mackerras
bc3551257a powerpc/64: Allow for relocation-on interrupts from guest to host
With host and guest both using radix translation, it is feasible
for the host to take interrupts that come from the guest with
relocation on, and that is in fact what the POWER9 hardware will
do when LPCR[AIL] = 3.  All such interrupts use HSRR0/1 not SRR0/1
except for system call with LEV=1 (hcall).

Therefore this adds the KVM tests to the _HV variants of the
relocation-on interrupt handlers, and adds the KVM test to the
relocation-on system call entry point.

We also instantiate the relocation-on versions of the hypervisor
data storage and instruction interrupt handlers, since these can
occur with relocation on in radix guests.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:46 +11:00
Paul Mackerras
dbcbfee0c8 powerpc/64: More definitions for POWER9
This adds definitions for bits in the DSISR register which are used
by POWER9 for various translation-related exception conditions, and
for some more bits in the partition table entry that will be needed
by KVM.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:45 +11:00
Paul Mackerras
cc3d294013 powerpc/64: Enable use of radix MMU under hypervisor on POWER9
To use radix as a guest, we first need to tell the hypervisor via
the ibm,client-architecture call first that we support POWER9 and
architecture v3.00, and that we can do either radix or hash and
that we would like to choose later using an hcall (the
H_REGISTER_PROC_TBL hcall).

Then we need to check whether the hypervisor agreed to us using
radix.  We need to do this very early on in the kernel boot process
before any of the MMU initialization is done.  If the hypervisor
doesn't agree, we can't use radix and therefore clear the radix
MMU feature bit.

Later, when we have set up our process table, which points to the
radix tree for each process, we need to install that using the
H_REGISTER_PROC_TBL hcall.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:44 +11:00
Paul Mackerras
3f4ab2f83b powerpc/pseries: Fixes for the "ibm,architecture-vec-5" options
This fixes the byte index values for some of the option bits in
the "ibm,architectur-vec-5" property. The "platform facilities options"
bits are in byte 17 not byte 14, so the upper 8 bits of their
definitions need to be 0x11 not 0x0E. The "sub processor support" option
is in byte 21 not byte 15.

Note none of these options are actually looked up in
"ibm,architecture-vec-5" at this time, so there is no bug.

When checking whether option bits are set, we should check that
the offset of the byte being checked is less than the vector
length that we got from the hypervisor.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-31 19:11:44 +11:00