Commit Graph

4129 Commits

Author SHA1 Message Date
Paul Mackerras
31a4d4480c Revert "KVM: PPC: Book3S HV: POWER9 does not require secondary thread management"
This reverts commit 94a04bc25a.

In order to run HPT guests on a radix POWER9 host, we will have to run
the host in single-threaded mode, because POWER9 processors do not
currently support running some threads of a core in HPT mode while
others are in radix mode ("mixed mode").

That means that we will need the same mechanisms that are used on
POWER8 to make the secondary threads available to KVM, which were
disabled on POWER9 by commit 94a04bc25a.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-19 15:28:04 +11:00
Michael Bringmann
cee5405da4 powerpc/hotplug: Improve responsiveness of hotplug change
powerpc/hotplug: On Power systems with shared configurations of CPUs
and memory, there are some issues with the association of additional
CPUs and memory to nodes when hot-adding resources.  During hotplug
CPU operations, this patch resets the timer on topology update work
function to a small value to better ensure that the CPU topology is
detected and configured sooner.

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-16 23:12:04 +11:00
Balbir Singh
ba41e1e1cc powerpc/mce: Hookup derror (load/store) UE errors
Extract physical_address for UE errors by walking the page
tables for the mm and address at the NIP, to extract the
instruction. Then use the instruction to find the effective
address via analyse_instr().

We might have page table walking races, but we expect them to
be rare, the physical address extraction is best effort. The idea
is to then hook up this infrastructure to memory failure eventually.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-16 23:12:01 +11:00
Balbir Singh
73e341eb6b powerpc/mce: Remove unused function get_mce_fault_addr()
There are no users of get_mce_fault_addr() since commit
1363875bdb ("powerpc/64s: fix handling of non-synchronous machine
checks") removed the last usage.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-16 23:11:19 +11:00
Will Deacon
a4c1887d4c locking/arch: Remove dummy arch_{read,spin,write}_lock_flags() implementations
The arch_{read,spin,write}_lock_flags() macros are simply mapped to the
non-flags versions by the majority of architectures, so do this in core
code and remove the dummy implementations. Also remove the implementation
in spinlock_up.h, since all callers of do_raw_spin_lock_flags() call
local_irq_save(flags) anyway.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-4-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10 11:50:19 +02:00
Will Deacon
a8a217c221 locking/core: Remove {read,spin,write}_can_lock()
Outside of the locking code itself, {read,spin,write}_can_lock() have no
users in tree. Apparmor (the last remaining user of write_can_lock()) got
moved over to lockdep by the previous patch.

This patch removes the use of {read,spin,write}_can_lock() from the
BUILD_LOCK_OPS macro, deferring to the trylock operation for testing the
lock status, and subsequently removes the unused macros altogether. They
aren't guaranteed to work in a concurrent environment and can give
incorrect results in the case of qrwlock.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1507055129-12300-2-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-10 11:50:18 +02:00
Seth Forshee
186b8f1587 powerpc: Always initialize input array when calling epapr_hypercall()
Several callers to epapr_hypercall() pass an uninitialized stack
allocated array for the input arguments, presumably because they
have no input arguments. However this can produce errors like
this one

 arch/powerpc/include/asm/epapr_hcalls.h:470:42: error: 'in' may be used uninitialized in this function [-Werror=maybe-uninitialized]
  unsigned long register r3 asm("r3") = in[0];
                                        ~~^~~

Fix callers to this function to always zero-initialize the input
arguments array to prevent this.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-06 20:50:58 +11:00
Naveen N. Rao
bf3a912517 powerpc/kprobes: Clean up jprobe detection in livepatch handler
In commit c05b8c4474 ("powerpc/kprobes: Skip livepatch_handler() for
jprobes"), we added a helper is_current_kprobe_addr() to help detect if
the modified regs->nip was due to a jprobe or livepatch. Masami felt
that the function name was not quite clear. To that end, this patch
renames is_current_kprobe_addr() to __is_active_jprobe() and adds a
comment to (hopefully) better clarify the purpose of this helper. The
helper has also now been moved to kprobes-ftrace.c so that it is only
available for KPROBES_ON_FTRACE.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 23:42:17 +11:00
Nicholas Piggin
e36d0a2ed5 powerpc/powernv: Implement NMI IPI with OPAL_SIGNAL_SYSTEM_RESET
This allows MSR[EE]=0 lockups to be detected on an OPAL (bare metal)
system similarly to the hcall NMI IPI on pseries guests, when the
platform/firmware supports it.

This is an example of CPU10 spinning with interrupts hard disabled:

  Watchdog CPU:32 detected Hard LOCKUP other CPUS:10
  Watchdog CPU:10 Hard LOCKUP
  CPU: 10 PID: 4410 Comm: bash Not tainted 4.13.0-rc7-00074-ge89ce1f89f62-dirty #34
  task: c0000003a82b4400 task.stack: c0000003af55c000
  NIP: c0000000000a7b38 LR: c000000000659044 CTR: c0000000000a7b00
  REGS: c00000000fd23d80 TRAP: 0100   Not tainted  (4.13.0-rc7-00074-ge89ce1f89f62-dirty)
  MSR: 90000000000c1033 <SF,HV,ME,IR,DR,RI,LE>
  CR: 28422222  XER: 20000000
  CFAR: c0000000000a7b38 SOFTE: 0
  GPR00: c000000000659044 c0000003af55fbb0 c000000001072a00 0000000000000078
  GPR04: c0000003c81b5c80 c0000003c81cc7e8 9000000000009033 0000000000000000
  GPR08: 0000000000000000 c0000000000a7b00 0000000000000001 9000000000001003
  GPR12: c0000000000a7b00 c00000000fd83200 0000000010180df8 0000000010189e60
  GPR16: 0000000010189ed8 0000000010151270 000000001018bd88 000000001018de78
  GPR20: 00000000370a0668 0000000000000001 00000000101645e0 0000000010163c10
  GPR24: 00007fffd14d6294 00007fffd14d6290 c000000000fba6f0 0000000000000004
  GPR28: c000000000f351d8 0000000000000078 c000000000f4095c 0000000000000000
  NIP [c0000000000a7b38] sysrq_handle_xmon+0x38/0x40
  LR [c000000000659044] __handle_sysrq+0xe4/0x270
  Call Trace:
  [c0000003af55fbd0] [c000000000659044] __handle_sysrq+0xe4/0x270
  [c0000003af55fc70] [c000000000659810] write_sysrq_trigger+0x70/0xa0
  [c0000003af55fca0] [c0000000003da650] proc_reg_write+0xb0/0x110
  [c0000003af55fcf0] [c0000000003423bc] __vfs_write+0x6c/0x1b0
  [c0000003af55fd90] [c000000000344398] vfs_write+0xd8/0x240
  [c0000003af55fde0] [c00000000034632c] SyS_write+0x6c/0x110
  [c0000003af55fe30] [c00000000000b220] system_call+0x58/0x6c

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Use kernel types for opal_signal_system_reset()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-10-04 11:27:27 +11:00
Frederic Barrat
03b8abedf4 cxl: Enable global TLBIs for cxl contexts
The PSL and nMMU need to see all TLB invalidations for the memory
contexts used on the adapter. For the hash memory model, it is done by
making all TLBIs global as soon as the cxl driver is in use. For
radix, we need something similar, but we can refine and only convert
to global the invalidations for contexts actually used by the device.

The new mm_context_add_copro() API increments the 'active_cpus' count
for the contexts attached to the cxl adapter. As soon as there's more
than 1 active cpu, the TLBIs for the context become global. Active cpu
count must be decremented when detaching to restore locality if
possible and to avoid overflowing the counter.

The hash memory model support is somewhat limited, as we can't
decrement the active cpus count when mm_context_remove_copro() is
called, because we can't flush the TLB for a mm on hash. So TLBIs
remain global on hash.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Fixes: f24be42aab ("cxl: Add psl9 specific code")
Tested-by: Alistair Popple <alistair@popple.id.au>
[mpe: Fold in updated comment on the barrier from Fred]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-28 17:09:16 +10:00
Frederic Barrat
6110236b9b powerpc/mm: Export flush_all_mm()
With the optimizations introduced by commit a46cc7a90f
("powerpc/mm/radix: Improve TLB/PWC flushes"), flush_tlb_mm() no
longer flushes the page walk cache (PWC) with radix. This patch
introduces flush_all_mm(), which flushes everything, TLB and PWC, for
a given mm.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-By: Alistair Popple <alistair@popple.id.au>
[mpe: Add a WARN_ON_ONCE() in the empty hash routines]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-28 16:28:22 +10:00
Michael Neuling
5080332c2c powerpc/64s: Add workaround for P9 vector CI load issue
POWER9 DD2.1 and earlier has an issue where some cache inhibited
vector load will return bad data. The workaround is two part, one
firmware/microcode part triggers HMI interrupts when hitting such
loads, the other part is this patch which then emulates the
instructions in Linux.

The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and
lxvh8x.

When an instruction triggers the HMI, all threads in the core will be
sent to the HMI handler, not just the one running the vector load.

In general, these spurious HMIs are detected by the emulation code and
we just return back to the running process. Unfortunately, if a
spurious interrupt occurs on a vector load that's to normal memory we
have no way to detect that it's spurious (unless we walk the page
tables, which is very expensive). In this case we emulate the load but
we need do so using a vector load itself to ensure 128bit atomicity is
preserved.

Some additional debugfs emulated instruction counters are added also.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-27 08:23:22 +10:00
Benjamin Herrenschmidt
b9fde58db7 powerpc/powernv: Rework EEH initialization on powernv
Remove the post_init callback which is only used
by powernv, we can just call it explicitly from
the powernv code.

This partially kills the ability to "disable" eeh at
runtime via debugfs as this was calling that same
callback again, but this is both unused and broken
in several ways. If we want to revive it, we need
to create a dedicated enable/disable callback on the
backend that does the right thing.

Let the bulk of eeh initialize normally at
core_initcall() like it does on pseries by removing
the hack in eeh_init() that delays it.

Instead we make sure our eeh->probe cleanly bails
out of the PEs haven't been created yet and we force
a re-probe where we used to call eeh_init() again.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-26 11:19:07 +10:00
Linus Torvalds
fbf4432ff7 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - most of the rest of MM

 - a small number of misc things

 - lib/ updates

 - checkpatch

 - autofs updates

 - ipc/ updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (126 commits)
  ipc: optimize semget/shmget/msgget for lots of keys
  ipc/sem: play nicer with large nsops allocations
  ipc/sem: drop sem_checkid helper
  ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
  ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
  ipc: convert ipc_namespace.count from atomic_t to refcount_t
  kcov: support compat processes
  sh: defconfig: cleanup from old Kconfig options
  mn10300: defconfig: cleanup from old Kconfig options
  m32r: defconfig: cleanup from old Kconfig options
  drivers/pps: use surrounding "if PPS" to remove numerous dependency checks
  drivers/pps: aesthetic tweaks to PPS-related content
  cpumask: make cpumask_next() out-of-line
  kmod: move #ifdef CONFIG_MODULES wrapper to Makefile
  kmod: split off umh headers into its own file
  MAINTAINERS: clarify kmod is just a kernel module loader
  kmod: split out umh code into its own file
  test_kmod: flip INT checks to be consistent
  test_kmod: remove paranoid UINT_MAX check on uint range processing
  vfat: deduplicate hex2bin()
  ...
2017-09-09 10:30:07 -07:00
Matthew Wilcox
ac036f9570 vga: optimise console scrolling
Where possible, call memset16(), memmove() or memcpy() instead of using
open-coded loops.  I don't like the calling convention that uses a byte
count instead of a count of u16s, but it's a little late to change that.
Reduces code size of fbcon.o by almost 400 bytes on my laptop build.

[akpm@linux-foundation.org: fix build]
Link: http://lkml.kernel.org/r/20170720184539.31609-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-08 18:26:48 -07:00
Linus Torvalds
0756b7fbb6 First batch of KVM changes for 4.14
Common:
  - improve heuristic for boosting preempted spinlocks by ignoring VCPUs
    in user mode
 
 ARM:
  - fix for decoding external abort types from guests
 
  - added support for migrating the active priority of interrupts when
    running a GICv2 guest on a GICv3 host
 
  - minor cleanup
 
 PPC:
  - expose storage keys to userspace
 
  - merge powerpc/topic/ppc-kvm branch that contains
    find_linux_pte_or_hugepte and POWER9 thread management cleanup
 
  - merge kvm-ppc-fixes with a fix that missed 4.13 because of vacations
 
  - fixes
 
 s390:
  - merge of topic branch tlb-flushing from the s390 tree to get the
    no-dat base features
 
  - merge of kvm/master to avoid conflicts with additional sthyi fixes
 
  - wire up the no-dat enhancements in KVM
 
  - multiple epoch facility (z14 feature)
 
  - Configuration z/Architecture Mode
 
  - more sthyi fixes
 
  - gdb server range checking fix
 
  - small code cleanups
 
 x86:
  - emulate Hyper-V TSC frequency MSRs
 
  - add nested INVPCID
 
  - emulate EPTP switching VMFUNC
 
  - support Virtual GIF
 
  - support 5 level page tables
 
  - speedup nested VM exits by packing byte operations
 
  - speedup MMIO by using hardware provided physical address
 
  - a lot of fixes and cleanups, especially nested
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABCAAGBQJZspE1AAoJEED/6hsPKofoDcMIALT11n+LKV50QGwQdg2W1GOt
 aChbgnj/Kegit3hQlDhVNb8kmdZEOZzSL81Lh0VPEr7zXU8QiWn2snbizDPv8sde
 MpHhcZYZZ0YrpoiZKjl8yiwcu88OWGn2qtJ7OpuTS5hvEGAfxMncp0AMZho6fnz/
 ySTwJ9GK2MTgBw39OAzCeDOeoYn4NKYMwjJGqBXRhNX8PG/1wmfqv0vPrd6wfg31
 KJ58BumavwJjr8YbQ1xELm9rpQrAmaayIsG0R1dEUqCbt5a1+t2gt4h2uY7tWcIv
 ACt2bIze7eF3xA+OpRs+eT+yemiH3t9btIVmhCfzUpnQ+V5Z55VMSwASLtTuJRQ=
 =R8Ry
 -----END PGP SIGNATURE-----

Merge tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
 "First batch of KVM changes for 4.14

  Common:
   - improve heuristic for boosting preempted spinlocks by ignoring
     VCPUs in user mode

  ARM:
   - fix for decoding external abort types from guests

   - added support for migrating the active priority of interrupts when
     running a GICv2 guest on a GICv3 host

   - minor cleanup

  PPC:
   - expose storage keys to userspace

   - merge kvm-ppc-fixes with a fix that missed 4.13 because of
     vacations

   - fixes

  s390:
   - merge of kvm/master to avoid conflicts with additional sthyi fixes

   - wire up the no-dat enhancements in KVM

   - multiple epoch facility (z14 feature)

   - Configuration z/Architecture Mode

   - more sthyi fixes

   - gdb server range checking fix

   - small code cleanups

  x86:
   - emulate Hyper-V TSC frequency MSRs

   - add nested INVPCID

   - emulate EPTP switching VMFUNC

   - support Virtual GIF

   - support 5 level page tables

   - speedup nested VM exits by packing byte operations

   - speedup MMIO by using hardware provided physical address

   - a lot of fixes and cleanups, especially nested"

* tag 'kvm-4.14-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (67 commits)
  KVM: arm/arm64: Support uaccess of GICC_APRn
  KVM: arm/arm64: Extract GICv3 max APRn index calculation
  KVM: arm/arm64: vITS: Drop its_ite->lpi field
  KVM: arm/arm64: vgic: constify seq_operations and file_operations
  KVM: arm/arm64: Fix guest external abort matching
  KVM: PPC: Book3S HV: Fix memory leak in kvm_vm_ioctl_get_htab_fd
  KVM: s390: vsie: cleanup mcck reinjection
  KVM: s390: use WARN_ON_ONCE only for checking
  KVM: s390: guestdbg: fix range check
  KVM: PPC: Book3S HV: Report storage key support to userspace
  KVM: PPC: Book3S HV: Fix case where HDEC is treated as 32-bit on POWER9
  KVM: PPC: Book3S HV: Fix invalid use of register expression
  KVM: PPC: Book3S HV: Fix H_REGISTER_VPA VPA size validation
  KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTER
  KVM: PPC: e500mc: Fix a NULL dereference
  KVM: PPC: e500: Fix some NULL dereferences on error
  KVM: PPC: Book3S HV: Protect updates to spapr_tce_tables list
  KVM: s390: we are always in czam mode
  KVM: s390: expose no-DAT to guest and migration support
  KVM: s390: sthyi: remove invalid guest write access
  ...
2017-09-08 15:18:36 -07:00
Linus Torvalds
bac65d9d87 powerpc updates for 4.14
Nothing really major this release, despite quite a lot of activity. Just lots of
 things all over the place.
 
 Some things of note include:
 
  - Access via perf to a new type of PMU (IMC) on Power9, which can count both
    core events as well as nest unit events (Memory controller etc).
 
  - Optimisations to the radix MMU TLB flushing, mostly to avoid unnecessary Page
    Walk Cache (PWC) flushes when the structure of the tree is not changing.
 
  - Reworks/cleanups of do_page_fault() to modernise it and bring it closer to
    other architectures where possible.
 
  - Rework of our page table walking so that THP updates only need to send IPIs
    to CPUs where the affected mm has run, rather than all CPUs.
 
  - The size of our vmalloc area is increased to 56T on 64-bit hash MMU systems.
    This avoids problems with the percpu allocator on systems with very sparse
    NUMA layouts.
 
  - STRICT_KERNEL_RWX support on PPC32.
 
  - A new sched domain topology for Power9, to capture the fact that pairs of
    cores may share an L2 cache.
 
  - Power9 support for VAS, which is a new mechanism for accessing coprocessors,
    and initial support for using it with the NX compression accelerator.
 
  - Major work on the instruction emulation support, adding support for many new
    instructions, and reworking it so it can be used to implement the emulation
    needed to fixup alignment faults.
 
  - Support for guests under PowerVM to use the Power9 XIVE interrupt controller.
 
 And probably that many things again that are almost as interesting, but I had to
 keep the list short. Plus the usual fixes and cleanups as always.
 
 Thanks to:
   Alexey Kardashevskiy, Alistair Popple, Andreas Schwab, Aneesh Kumar K.V, Anju
   T Sudhakar, Arvind Yadav, Balbir Singh, Benjamin Herrenschmidt, Bhumika Goyal,
   Breno Leitao, Bryant G. Ly, Christophe Leroy, Cédric Le Goater, Dan Carpenter,
   Dou Liyang, Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand,
   Hannes Reinecke, Haren Myneni, Ivan Mikhaylov, John Allen, Julia Lawall, LABBE
   Corentin, Laurentiu Tudor, Madhavan Srinivasan, Markus Elfring, Masahiro
   Yamada, Matt Brown, Michael Neuling, Murilo Opsfelder Araujo, Nathan Fontenot,
   Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Rashmica
   Gupta, Rob Herring, Rui Teng, Sam Bobroff, Santosh Sivaraj, Scott Wood,
   Shilpasri G Bhat, Sukadev Bhattiprolu, Suraj Jitindar Singh, Tobin C. Harding,
   Victor Aoqui.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZr83SAAoJEFHr6jzI4aWA6pUP/3CEaj2bSxNzWIwidqyYjuoS
 O1moEsP0oYH7eBEWVHalYxvo0QPIIAhbFPaFyrOrgtfDH01Szwu9LcCALGb8orC5
 Hg3IY8mpNG3Q1T8wEtTa56Ik4b5ZFty35S5+X9qLNSFoDUqSvGlSsLzhPNN7f2tl
 XFm2hWqd8wXCwDsuVSFBCF61M3SAm+g6NMVNJ+VL2KIDCwBrOZLhKDPRoxLTAuMa
 jjSdjVIozWyXjUrBFi8HVcoOWLxcT1HsNF0tRs51LwY/+Mlj2jAtFtsx+a06HZa6
 f2p/Kcp/MEispSTk064Ap9cC1seXWI18zwZKpCUFqu0Ec2yTAiGdjOWDyYQldIp+
 ttVPSHQ01YrVKwDFTtM9CiA0EET6fVPhWgAPkPfvH5TvtKwGkNdy0b+nQLuWrYip
 BUmOXmjdIG3nujCzA9sv6/uNNhjhj2y+HWwuV7Qo002VFkhgZFL67u2SSUQLpYPj
 PxdkY8pPVq+O+in94oDV3c36dYFF6+g6A6505Vn6eKUm/TLpszRFGkS3bKKA5vtn
 74FR+guV/5RwYJcdZbfm04DgAocl7AfUDxpwRxibt6KtAK2VZKQuw4ugUTgYEd7W
 mL2+AMmPKuajWXAMTHjCZPbUp9gFNyYyBQTFfGVX/XLiM8erKBnGfoa1/KzUJkhr
 fVZLYIO/gzl34PiTIfgD
 =UJtt
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Nothing really major this release, despite quite a lot of activity.
  Just lots of things all over the place.

  Some things of note include:

   - Access via perf to a new type of PMU (IMC) on Power9, which can
     count both core events as well as nest unit events (Memory
     controller etc).

   - Optimisations to the radix MMU TLB flushing, mostly to avoid
     unnecessary Page Walk Cache (PWC) flushes when the structure of the
     tree is not changing.

   - Reworks/cleanups of do_page_fault() to modernise it and bring it
     closer to other architectures where possible.

   - Rework of our page table walking so that THP updates only need to
     send IPIs to CPUs where the affected mm has run, rather than all
     CPUs.

   - The size of our vmalloc area is increased to 56T on 64-bit hash MMU
     systems. This avoids problems with the percpu allocator on systems
     with very sparse NUMA layouts.

   - STRICT_KERNEL_RWX support on PPC32.

   - A new sched domain topology for Power9, to capture the fact that
     pairs of cores may share an L2 cache.

   - Power9 support for VAS, which is a new mechanism for accessing
     coprocessors, and initial support for using it with the NX
     compression accelerator.

   - Major work on the instruction emulation support, adding support for
     many new instructions, and reworking it so it can be used to
     implement the emulation needed to fixup alignment faults.

   - Support for guests under PowerVM to use the Power9 XIVE interrupt
     controller.

  And probably that many things again that are almost as interesting,
  but I had to keep the list short. Plus the usual fixes and cleanups as
  always.

  Thanks to: Alexey Kardashevskiy, Alistair Popple, Andreas Schwab,
  Aneesh Kumar K.V, Anju T Sudhakar, Arvind Yadav, Balbir Singh,
  Benjamin Herrenschmidt, Bhumika Goyal, Breno Leitao, Bryant G. Ly,
  Christophe Leroy, Cédric Le Goater, Dan Carpenter, Dou Liyang,
  Frederic Barrat, Gautham R. Shenoy, Geliang Tang, Geoff Levand, Hannes
  Reinecke, Haren Myneni, Ivan Mikhaylov, John Allen, Julia Lawall,
  LABBE Corentin, Laurentiu Tudor, Madhavan Srinivasan, Markus Elfring,
  Masahiro Yamada, Matt Brown, Michael Neuling, Murilo Opsfelder Araujo,
  Nathan Fontenot, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran,
  Paul Mackerras, Rashmica Gupta, Rob Herring, Rui Teng, Sam Bobroff,
  Santosh Sivaraj, Scott Wood, Shilpasri G Bhat, Sukadev Bhattiprolu,
  Suraj Jitindar Singh, Tobin C. Harding, Victor Aoqui"

* tag 'powerpc-4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (321 commits)
  powerpc/xive: Fix section __init warning
  powerpc: Fix kernel crash in emulation of vector loads and stores
  powerpc/xive: improve debugging macros
  powerpc/xive: add XIVE Exploitation Mode to CAS
  powerpc/xive: introduce H_INT_ESB hcall
  powerpc/xive: add the HW IRQ number under xive_irq_data
  powerpc/xive: introduce xive_esb_write()
  powerpc/xive: rename xive_poke_esb() in xive_esb_read()
  powerpc/xive: guest exploitation of the XIVE interrupt controller
  powerpc/xive: introduce a common routine xive_queue_page_alloc()
  powerpc/sstep: Avoid used uninitialized error
  axonram: Return directly after a failed kzalloc() in axon_ram_probe()
  axonram: Improve a size determination in axon_ram_probe()
  axonram: Delete an error message for a failed memory allocation in axon_ram_probe()
  powerpc/powernv/npu: Move tlb flush before launching ATSD
  powerpc/macintosh: constify wf_sensor_ops structures
  powerpc/iommu: Use permission-specific DEVICE_ATTR variants
  powerpc/eeh: Delete an error out of memory message at init time
  powerpc/mm: Use seq_putc() in two functions
  macintosh: Convert to using %pOF instead of full_name
  ...
2017-09-07 10:15:40 -07:00
Mike Kravetz
aafd4562df mm: arch: consolidate mmap hugetlb size encodings
A non-default huge page size can be encoded in the flags argument of the
mmap system call.  The definitions for these encodings are in arch
specific header files.  However, all architectures use the same values.

Consolidate all the definitions in the primary user header file
(uapi/linux/mman.h).  Include definitions for all known huge page sizes.
Use the generic encoding definitions in hugetlb_encode.h as the basis
for these definitions.

Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-09-06 17:27:28 -07:00
Linus Torvalds
5f82e71a00 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:

 - Add 'cross-release' support to lockdep, which allows APIs like
   completions, where it's not the 'owner' who releases the lock, to be
   tracked. It's all activated automatically under
   CONFIG_PROVE_LOCKING=y.

 - Clean up (restructure) the x86 atomics op implementation to be more
   readable, in preparation of KASAN annotations. (Dmitry Vyukov)

 - Fix static keys (Paolo Bonzini)

 - Add killable versions of down_read() et al (Kirill Tkhai)

 - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)

 - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)

 - Remove smp_mb__before_spinlock() and convert its usages, introduce
   smp_mb__after_spinlock() (Peter Zijlstra)

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  locking/lockdep/selftests: Fix mixed read-write ABBA tests
  sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
  acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
  locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
  smp: Avoid using two cache lines for struct call_single_data
  locking/lockdep: Untangle xhlock history save/restore from task independence
  locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
  futex: Remove duplicated code and fix undefined behaviour
  Documentation/locking/atomic: Finish the document...
  locking/lockdep: Fix workqueue crossrelease annotation
  workqueue/lockdep: 'Fix' flush_work() annotation
  locking/lockdep/selftests: Add mixed read-write ABBA tests
  mm, locking/barriers: Clarify tlb_flush_pending() barriers
  locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
  locking/lockdep: Explicitly initialize wq_barrier::done::map
  locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
  locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
  locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
  locking/refcounts, x86/asm: Implement fast refcount overflow protection
  locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
  ...
2017-09-04 11:52:29 -07:00
Linus Torvalds
0081a0ce80 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnad:
 "The main RCU related changes in this cycle were:

   - Removal of spin_unlock_wait()
   - SRCU updates
   - RCU torture-test updates
   - RCU Documentation updates
   - Extend the sys_membarrier() ABI with the MEMBARRIER_CMD_PRIVATE_EXPEDITED variant
   - Miscellaneous RCU fixes
   - CPU-hotplug fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
  arch: Remove spin_unlock_wait() arch-specific definitions
  locking: Remove spin_unlock_wait() generic definitions
  drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
  ipc: Replace spin_unlock_wait() with lock/unlock pair
  exit: Replace spin_unlock_wait() with lock/unlock pair
  completion: Replace spin_unlock_wait() with lock/unlock pair
  doc: Set down RCU's scheduling-clock-interrupt needs
  doc: No longer allowed to use rcu_dereference on non-pointers
  doc: Add RCU files to docbook-generation files
  doc: Update memory-barriers.txt for read-to-write dependencies
  doc: Update RCU documentation
  membarrier: Provide expedited private command
  rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
  rcu: Add warning to rcu_idle_enter() for irqs enabled
  rcu: Make rcu_idle_enter() rely on callers disabling irqs
  rcu: Add assertions verifying blocked-tasks list
  rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
  rcu: Add TPS() protection for _rcu_barrier_trace strings
  rcu: Use idle versions of swait to make idle-hack clear
  swait: Add idle variants which don't contribute to load average
  ...
2017-09-04 08:13:52 -07:00
Ingo Molnar
edc2988c54 Merge branch 'linus' into locking/core, to fix up conflicts
Conflicts:
	mm/page_alloc.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-04 11:01:18 +02:00
Cédric Le Goater
ac5e5a5402 powerpc/xive: add XIVE Exploitation Mode to CAS
On POWER9, the Client Architecture Support (CAS) negotiation process
determines whether the guest operates in XIVE Legacy compatibility or
in XIVE exploitation mode. Now that we have initial guest support for
the XIVE interrupt controller, let's inform the hypervisor what we can
do.

The platform advertises the XIVE Exploitation Mode support using the
property "ibm,arch-vec-5-platform-support-vec-5", byte 23 bits 0-1 :

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only
 - 0b10 XIVE legacy or exploitation mode

The OS asks for XIVE Exploitation Mode support using the property
"ibm,architecture-vec-5", byte 23 bits 0-1:

 - 0b00 XIVE legacy mode Only
 - 0b01 XIVE exploitation mode Only

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:38 +10:00
Cédric Le Goater
bed81ee181 powerpc/xive: introduce H_INT_ESB hcall
The H_INT_ESB hcall() is used to issue a load or store to the ESB page
instead of using the MMIO pages. This can be used as a workaround on
some HW issues. The OS knows that this hcall should be used on an
interrupt source when the ESB hcall flag is set to 1 in the hcall
H_INT_GET_SOURCE_INFO.

To maintain the frontier between the xive frontend and backend, we
introduce a new xive operation 'esb_rw' to be used in the routines
doing memory accesses on the ESBs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:37 +10:00
Cédric Le Goater
c58a14a9cc powerpc/xive: add the HW IRQ number under xive_irq_data
It will be required later by the H_INT_ESB hcall.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:37 +10:00
Cédric Le Goater
eac1e731b5 powerpc/xive: guest exploitation of the XIVE interrupt controller
This is the framework for using XIVE in a PowerVM guest. The support
is very similar to the native one in a much simpler form.

Each source is associated with an Event State Buffer (ESB). This is a
two bit state machine which is used to trigger events. The bits are
named "P" (pending) and "Q" (queued) and can be controlled by MMIO.
The Guest OS registers event (or notifications) queues on which the HW
will post event data for a target to notify.

Instead of OPAL calls, a set of Hypervisors call are used to configure
the interrupt sources and the event/notification queues of the guest:

 - H_INT_GET_SOURCE_INFO

   used to obtain the address of the MMIO page of the Event State
   Buffer (PQ bits) entry associated with the source.

 - H_INT_SET_SOURCE_CONFIG

   assigns a source to a "target".

 - H_INT_GET_SOURCE_CONFIG

   determines to which "target" and "priority" is assigned to a source

 - H_INT_GET_QUEUE_INFO

   returns the address of the notification management page associated
   with the specified "target" and "priority".

 - H_INT_SET_QUEUE_CONFIG

   sets or resets the event queue for a given "target" and "priority".
   It is also used to set the notification config associated with the
   queue, only unconditional notification for the moment.  Reset is
   performed with a queue size of 0 and queueing is disabled in that
   case.

 - H_INT_GET_QUEUE_CONFIG

   returns the queue settings for a given "target" and "priority".

 - H_INT_RESET

   resets all of the partition's interrupt exploitation structures to
   their initial state, losing all configuration set via the hcalls
   H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.

 - H_INT_SYNC

   issue a synchronisation on a source to make sure sure all
   notifications have reached their queue.

As for XICS, the XIVE interface for the guest is described in the
device tree under the "interrupt-controller" node. A couple of new
properties are specific to XIVE :

 - "reg"

   contains the base address and size of the thread interrupt
   managnement areas (TIMA), also called rings, for the User level and
   for the Guest OS level. Only the Guest OS level is taken into
   account today.

 - "ibm,xive-eq-sizes"

   the size of the event queues. One cell per size supported, contains
   log2 of size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the interrupt numbers ranges assigned to the guest. These are
   allocated using a simple bitmap.

and also :

 - "/ibm,plat-res-int-priorities"

   contains a list of priorities that the hypervisor has reserved for
   its own use.

Tested with a QEMU XIVE model for pseries and with the Power hypervisor.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-02 21:02:35 +10:00
Haren Myneni
146e9f1b65 crypto/nx: Add P9 NX specific error codes for 842 engine
This patch adds changes for checking P9 specific 842 engine
error codes. These errros are reported in coprocessor status
block (CSB) for failures.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:50 +10:00
Christophe Leroy
da74f65920 powerpc/32: add memset16()
Commit 694fc88ce2 ("powerpc/string: Implement optimized
memset variants") added memset16(), memset32() and memset64()
for the 64 bits PPC.

On 32 bits, memset64() is not relevant, and as shown below,
the generic version of memset32() gives a good code, so only
memset16() is candidate for an optimised version.

000009c0 <memset32>:
 9c0:   2c 05 00 00     cmpwi   r5,0
 9c4:   39 23 ff fc     addi    r9,r3,-4
 9c8:   4d 82 00 20     beqlr
 9cc:   7c a9 03 a6     mtctr   r5
 9d0:   94 89 00 04     stwu    r4,4(r9)
 9d4:   42 00 ff fc     bdnz    9d0 <memset32+0x10>
 9d8:   4e 80 00 20     blr

The last part of memset() handling the not 4-bytes multiples
operates on bytes, making it unsuitable for handling word without
modification. As it would increase memset() complexity, it is
better to implement memset16() from scratch. In addition it
has the advantage of allowing a more optimised memset16() than what
we would have by using the memset() function.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:45 +10:00
Paul Mackerras
d2b65ac652 powerpc: Emulate load/store floating point as integer word instructions
This adds emulation for the lfiwax, lfiwzx and stfiwx instructions.
This necessitated adding a new flag to indicate whether a floating
point or an integer conversion was needed for LOAD_FP and STORE_FP,
so this moves the size field in op->type up 4 bits.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:44 +10:00
Paul Mackerras
a53d5182e2 powerpc: Separate out load/store emulation into its own function
This moves the parts of emulate_step() that deal with emulating
load and store instructions into a new function called
emulate_loadstore().  This is to make it possible to reuse this
code in the alignment handler.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:42:41 +10:00
Paul Mackerras
d955189ae4 powerpc: Handle opposite-endian processes in emulation code
This adds code to the load and store emulation code to byte-swap
the data appropriately when the process being emulated is set to
the opposite endianness to that of the kernel.

This also enables the emulation for the multiple-register loads
and stores (lmw, stmw, lswi, stswi, lswx, stswx) to work for
little-endian.  In little-endian mode, the partial word at the
end of a transfer for lsw*/stsw* (when the byte count is not a
multiple of 4) is loaded/stored at the least-significant end of
the register.  Additionally, this fixes a bug in the previous
code in that it could call read_mem/write_mem with a byte count
that was not 1, 2, 4 or 8.

Note that this only works correctly on processors with "true"
little-endian mode, such as IBM POWER processors from POWER6 on, not
the so-called "PowerPC" little-endian mode that uses address swizzling
as implemented on the old 32-bit 603, 604, 740/750, 74xx CPUs.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:39:55 +10:00
Paul Mackerras
b2543f7b20 powerpc: Emulate the dcbz instruction
This adds code to analyse_instr() and emulate_step() to understand the
dcbz (data cache block zero) instruction.  The emulate_dcbz() function
is made public so it can be used by the alignment handler in future.
(The apparently unnecessary cropping of the address to 32 bits is
there because it will be needed in that situation.)

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:39:54 +10:00
Paul Mackerras
c22435a5f3 powerpc: Emulate FP/vector/VSX loads/stores correctly when regs not live
At present, the analyse_instr/emulate_step code checks for the
relevant MSR_FP/VEC/VSX bit being set when a FP/VMX/VSX load
or store is decoded, but doesn't recheck the bit before reading or
writing the relevant FP/VMX/VSX register in emulate_step().

Since we don't have preemption disabled, it is possible that we get
preempted between checking the MSR bit and doing the register access.
If that happened, then the registers would have been saved to the
thread_struct for the current process.  Accesses to the CPU registers
would then potentially read stale values, or write values that would
never be seen by the user process.

Another way that the registers can become non-live is if a page
fault occurs when accessing user memory, and the page fault code
calls a copy routine that wants to use the VMX or VSX registers.

To fix this, the code for all the FP/VMX/VSX loads gets restructured
so that it forms an image in a local variable of the desired register
contents, then disables preemption, checks the MSR bit and either
sets the CPU register or writes the value to the thread struct.
Similarly, the code for stores checks the MSR bit, copies either the
CPU register or the thread struct to a local variable, then reenables
preemption and then copies the register image to memory.

If the instruction being emulated is in the kernel, then we must not
use the register values in the thread_struct.  In this case, if the
relevant MSR enable bit is not set, then emulate_step refuses to
emulate the instruction.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:39:52 +10:00
Paul Mackerras
d120cdbce6 powerpc/64: Fix update forms of loads and stores to write 64-bit EA
When a 64-bit processor is executing in 32-bit mode, the update forms
of load and store instructions are required by the architecture to
write the full 64-bit effective address into the RA register, though
only the bottom 32 bits are used to address memory.  Currently,
the instruction emulation code writes the truncated address to the
RA register.  This fixes it by keeping the full 64-bit EA in the
instruction_op structure, truncating the address in emulate_step()
where it is used to address memory, rather than in the address
computations in analyse_instr().

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:39:49 +10:00
Paul Mackerras
350779a29f powerpc: Handle most loads and stores in instruction emulation code
This extends the instruction emulation infrastructure in sstep.c to
handle all the load and store instructions defined in the Power ISA
v3.0, except for the atomic memory operations, ldmx (which was never
implemented), lfdp/stfdp, and the vector element load/stores.

The instructions added are:

Integer loads and stores: lbarx, lharx, lqarx, stbcx., sthcx., stqcx.,
lq, stq.

VSX loads and stores: lxsiwzx, lxsiwax, stxsiwx, lxvx, lxvl, lxvll,
lxvdsx, lxvwsx, stxvx, stxvl, stxvll, lxsspx, lxsdx, stxsspx, stxsdx,
lxvw4x, lxsibzx, lxvh8x, lxsihzx, lxvb16x, stxvw4x, stxsibx, stxvh8x,
stxsihx, stxvb16x, lxsd, lxssp, lxv, stxsd, stxssp, stxv.

These instructions are handled both in the analyse_instr phase and in
the emulate_step phase.

The code for lxvd2ux and stxvd2ux has been taken out, as those
instructions were never implemented in any processor and have been
taken out of the architecture, and their opcodes have been reused for
other instructions in POWER9 (lxvb16x and stxvb16x).

The emulation for the VSX loads and stores uses helper functions
which don't access registers or memory directly, which can hopefully
be reused by KVM later.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:39:48 +10:00
Paul Mackerras
3cdfcbfd32 powerpc: Change analyse_instr so it doesn't modify *regs
The analyse_instr function currently doesn't just work out what an
instruction does, it also executes those instructions whose effect
is only to update CPU registers that are stored in struct pt_regs.
This is undesirable because optprobes uses analyse_instr to work out
if an instruction could be successfully emulated in future.

This changes analyse_instr so it doesn't modify *regs; instead it
stores information in the instruction_op structure to indicate what
registers (GPRs, CR, XER, LR) would be set and what value they would
be set to.  A companion function called emulate_update_regs() can
then use that information to update a pt_regs struct appropriately.

As a minor cleanup, this replaces inline asm using the cntlzw and
cntlzd instructions with calls to __builtin_clz() and __builtin_clzl().

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-09-01 16:39:27 +10:00
Jérôme Glisse
fb1522e099 KVM: update to new mmu_notifier semantic v2
Calls to mmu_notifier_invalidate_page() were replaced by calls to
mmu_notifier_invalidate_range() and are now bracketed by calls to
mmu_notifier_invalidate_range_start()/end()

Remove now useless invalidate_page callback.

Changed since v1 (Linus Torvalds)
    - remove now useless kvm_arch_mmu_notifier_invalidate_page()

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Tested-by: Mike Galbraith <efault@gmx.de>
Tested-by: Adam Borowski <kilobyte@angband.pl>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-31 16:13:00 -07:00
Paul Mackerras
93b2d3cf37 powerpc: Correct instruction code for xxlor instruction
The instruction code for xxlor that commit 0016a4cf55 ("powerpc:
Emulate most Book I instructions in emulate_step()", 2010-06-15)
added is actually the code for xxlnor.  It is used in get_vsr()
and put_vsr() and the effect of the error is that if emulate_step
is used to emulate a VSX load or store from any register other
than vsr0, the bitwise complement of the correct value will be
loaded or stored.  This corrects the error.

Fixes: 0016a4cf55 ("powerpc: Emulate most Book I instructions in emulate_step()")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 22:07:19 +10:00
Oliver O'Halloran
2a636a56d2 powerpc/smp: Add cpu_l2_cache_map
We want to add an extra level to the CPU scheduler topology to account
for cores which share a cache. To do this we need to build a cpumask
for each CPU that indicates which CPUs share this cache to use as an
input to the scheduler.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:56 +10:00
Tobin C. Harding
eb039161da powerpc/asm: Convert .llong directives to .8byte
.llong is an undocumented PPC specific directive. The generic
equivalent is .quad, but even better (because it's self describing) is
.8byte.

Convert all .llong directives to .8byte.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:47 +10:00
Balbir Singh
d1e1b351f5 powerpc/xmon: Add ISA v3.0 SPRs to SPR dump
Add support for printing the PIDR/TIDR for ISA 300 and PSSCR and PTCR
in ISA 3.0 hypervisor mode.

SPRN_PSSCR_PR is the privileged mode access and is used when we are
not in hypervisor mode.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Split out of larger patch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:45 +10:00
Sukadev Bhattiprolu
2392c8c8c0 powerpc/powernv/vas: Define copy/paste interfaces
Define interfaces (wrappers) to the 'copy' and 'paste'
instructions (which are new in PowerISA 3.0). These are intended to be
used to by NX driver(s) to submit Coprocessor Request Blocks (CRBs) to
the NX hardware engines.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:38 +10:00
Sukadev Bhattiprolu
5239af679a powerpc/powernv/vas: Define vas_tx_win_open()
Define an interface to open a VAS send window. This interface is
intended to be used the Nest Accelerator (NX) driver(s) to open
a send window and use it to submit compression/encryption requests
to a VAS receive window.

The receive window, identified by the [vasid, cop] parameters, must
already be open in VAS (i.e connected to an NX engine).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:37 +10:00
Sukadev Bhattiprolu
98271d4198 powerpc/powernv/vas: Define vas_win_close() interface
Define the vas_win_close() interface which should be used to close a
send or receive windows.

While the hardware configurations required to open send and receive
windows differ, the configuration to close a window is the same for
both. So we use a single interface to close the window.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:37 +10:00
Sukadev Bhattiprolu
62c4eda4fa powerpc/powernv/vas: Define vas_rx_win_open() interface
Define the vas_rx_win_open() interface. This interface is intended to
be used by the Nest Accelerator (NX) driver(s) to setup receive
windows for one or more NX engines (which implement compression &
encryption algorithms in the hardware).

Follow-on patches will provide an interface to close the window and to
open a send window that kernel subsystems can use to access the NX
engines.

The interface to open a receive window is expected to be invoked for
each instance of VAS in the system.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:36 +10:00
Sukadev Bhattiprolu
b6622a339e powerpc/powernv: Move GET_FIELD/SET_FIELD to vas.h
Move the GET_FIELD and SET_FIELD macros to vas.h as VAS and other
users of VAS, including NX-842 can use those macros.

There is a lot of related code between the VAS/NX kernel drivers
and skiboot. For consistency, switch the order of parameters in
SET_FIELD to match the order in skiboot.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:20 +10:00
Sukadev Bhattiprolu
967689141e powerpc/powernv/vas: Define macros, register fields and structures
Define macros for the VAS hardware registers and bit-fields as well
as couple of data structures needed by the VAS driver.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[mpe: Fixup include guard to use _ASM_POWERPC_VAS_H]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:17 +10:00
Alexey Kardashevskiy
f1e08232ed powerpc/pci: Remove OF node back pointer from pci_dn
The check_req() helper uses pci_get_pdn() to get an OF node pointer.
pci_get_pdn() returns a pci_dn pointer which either:
1) from the OF node returned by pci_device_to_OF_node();
2) from the parent child_list where entries don't have OF node pointers.
Since check_req() does not care about 2), it can call
pci_device_to_OF_node() directly, hence the change.

The find_pe_dn() helper uses embedded pci_dn to get an OF node which is
also stored in edev->pdev so let's take a shortcut and call
pci_device_to_OF_node() directly.

With these 2 changes, we can finally get rid of the OF node back pointer.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:12 +10:00
Alexey Kardashevskiy
405b33a76d powerpc/eeh: Remove unnecessary config_addr from eeh_dev
The eeh_dev struct hold a config space address of an associated node
and the very same address is also stored in the pci_dn struct which
is always present during the eeh_dev lifetime.

This uses bus:devfn directly from pci_dn instead of cached and packed
config_addr.

Since config_addr is made from device's bus:dev.fn, there is no point
in keeping it in the debugfs either so remove that too.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:09 +10:00
Alexey Kardashevskiy
69672bd748 powerpc/eeh: Remove unnecessary pointer to phb from eeh_dev
The eeh_dev struct already holds a pointer to pci_dn which it does not
exist without and pci_dn itself holds the very same pointer so just
use it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:09 +10:00
Alexey Kardashevskiy
8bae6a2319 powerpc/eeh: Reduce to one the number of places where edev is allocated
arch/powerpc/kernel/eeh_dev.c:57 is the only legit place where edev
is allocated; other 2 places allocate it on stack and in the heap for
a very short period of time to use eeh_pe_get() as takes edev.

This changes eeh_pe_get() to receive required parameters explicitly.

This removes unnecessary temporary allocation of edev.

This uses the "pe_no" name instead of the "pe_config_addr" name as
it actually is a PE number and not a config space address as it seemed.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:08 +10:00
Nicholas Piggin
6fcd6baa90 powerpc/powernv: Use kernel crash path for machine checks
There are quite a few machine check exceptions that can be caused by
kernel bugs. To make debugging easier, use the kernel crash path in
cases of synchronous machine checks that occur in kernel mode, if that
would not result in the machine going straight to panic or crash dump.

There is a downside here that die()ing the process in kernel mode can
still leave the system unstable. panic_on_oops will always force the
system to fail-stop, so systems where that behaviour is important will
still do the right thing.

As a test, when triggering an i-side 0111b error (ifetch from foreign
address) in kernel mode process context on POWER9, the kernel currently
dies quickly like this:

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
  [  127.426651616,0] OPAL: Reboot requested due to Platform error.
      Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
  opal: Reboot type 1 not supported
  Kernel panic - not syncing: PowerNV Unrecovered Machine Check
  CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #35
  Call Trace:
  [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
    buggy/mising code in OPAL for this BMC
    Rebooting in 10 seconds..
  Trying to free IRQ 496 from IRQ context!

After this patch, the process is killed and the kernel continues with
this message, which gives enough information to identify the offending
branch (i.e., with CFAR):

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
      Effective address: ffff000000000000
  Oops: Machine check, sig: 7 [#1]
  SMP NR_CPUS=2048
  NUMA
  PowerNV
  Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
  CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #36
  task: c000000932300000 task.stack: c000000932380000
  NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
  REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a261072-dirty)
  MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
    CR: 24000484  XER: 20000000
  CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
  GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
  GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
  GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
  GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
  NIP [ffff000000000000] 0xffff000000000000
  LR [00000000217706a4] 0x217706a4
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:04 +10:00
Nicholas Piggin
b746e3e01e powerpc/powernv: Flush console before platform error reboot
Unrecovered MCE and HMI errors are sent through a special restart OPAL
call to log the platform error. The downside is that they don't go
through normal Linux crash paths, so they don't give much information
to the Linux console.

Change this by providing a special crash function which does some of
the console flushing from the panic() path before calling firmware to
reboot.

The downside of this is a little more code to execute before reaching
the firmware reboot. However in practice, it's critical to get the
Linux console messages output in order to debug a problem. So this is
a desirable tradeoff.

Note on the implementation: It is difficult to plumb a custom reboot
handler into the panic path, because panic does a little bit too much
work. For example, it will try to delay with the timebase, but that
may be corrupted in some cases resulting in a hang without reaching
the platform reboot. Another problem is that panic can invoke the
crash dump code which is not what we want in the case of a hardware
platform error. Long-term the best solution will be to rework the
panic path so it can be suitable for this kind of panic, but for now
we just duplicate a bit of the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:03 +10:00
Nicholas Piggin
a3b2cb30f2 powerpc: Do not call ppc_md.panic in fadump panic notifier
If fadump is not registered, and no other crash or debug handlers are
registered, the powerpc panic handler stops the guest before the
generic panic code can push out debug information to the console.

Currently, system reset injection causes the guest to silently stop.

Stop calling ppc_md.panic in the panic notifier. crash_fadump already
does rtas_os_term() to terminate the guest if fadump is registered.

Remove ppc_md.panic. Move fadump panic notifier into fadump code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:01 +10:00
Nicholas Piggin
70412c55d4 powerpc/64: Fix watchdog configuration regressions
This fixes a couple more bits of fallout from the new hard lockup watchdog
patch.

It restores the required hw_nmi_get_sample_period() function for the
perf watchdog, and removes some function declarations on 64e that are only
defined for 64s. This fixes the 64e build when the hardlockup detector is
enabled.

It restores the default behaviour of disabling the perf watchdog, and also
fixes disabling the 64s watchdog when running as a guest.

Fixes: 2104180a53 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:00 +10:00
Paul Mackerras
4dafecde44 Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-next
This merges in the 'ppc-kvm' topic branch from the powerpc tree in
order to bring in some fixes which touch both powerpc and KVM code.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-08-31 12:37:03 +10:00
Ram Pai
d182b8fd60 KVM: PPC: Book3S HV: Fix setting of storage key in H_ENTER
In handling a H_ENTER hypercall, the code in kvmppc_do_h_enter
clobbers the high-order two bits of the storage key, which is stored
in a split field in the second doubleword of the HPTE.  Any storage
key number above 7 hence fails to operate correctly.

This makes sure we preserve all the bits of the storage key.

Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2017-08-31 12:36:44 +10:00
Nicholas Piggin
aafc8a8300 powerpc/64s: Move IDLE_STATE_ENTER_SEQ[_NORET] into idle_book3s.S
This macro is only used in idle_book3s.S, move it in there and add a
more descriptive comment.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Split out of larger patch and write change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29 21:38:47 +10:00
Michael Ellerman
82b7fcc005 Merge branch 'topic/ppc-kvm' into next
Merge Nicks commit to rework the KVM thread management, shared with the
KVM tree via the ppc-kvm topic branch.
2017-08-29 21:26:30 +10:00
Nicholas Piggin
94a04bc25a KVM: PPC: Book3S HV: POWER9 does not require secondary thread management
POWER9 CPUs have independent MMU contexts per thread, so KVM does not
need to quiesce secondary threads, so the hwthread_req/hwthread_state
protocol does not have to be used. So patch it away on POWER9, and patch
away the branch from the Linux idle wakeup to kvm_start_guest that is
never used.

Add a warning and error out of kvmppc_grab_hwthread in case it is ever
called on POWER9.

This avoids a hwsync in the idle wakeup path on POWER9.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Use WARN(...) instead of WARN_ON()/pr_err(...)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-29 14:48:59 +10:00
Linus Torvalds
42e6d5e5ee powerpc fixes for 4.13 #8
Just one fix, to add a barrier in the switch_mm() code to make sure the mm
 cpumask update is ordered vs the MMU starting to load translations. As far as we
 know no one's actually hit the bug, but that's just luck.
 
 Thanks to:
   Benjamin Herrenschmidt, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZoAZDAAoJEFHr6jzI4aWAi3AQAJq4boEBqdmL042oNK4PWW0M
 uGfehNmtzCw9Hp8bPfzOf8NypJ51Kw7eDQELaeSaazKW+gffUCBeEsKGS7kmHvc+
 x1tHxkXxI7PXuNIRojJg9y7rlKXdRym5SecvPSo1cm/c46RRWOlNGZaIwiHyrXSh
 eBjyP5EHu1HXpRxkcUh+//PQp2b+7SmgUYzSf0hA9UCtzSZSJr19DuY8uhetI9Ws
 AfjkO1uvb2KETqBVegGBpAruZzQtxqdtffd2HToSaCHUnAKma2iqUZqkqBNjL6OQ
 gSXWpXVInng/7ktrrfEgSiwlHns7pgHkxYHS8thDZqQpIt3GNsUg2UwpHGf6oL7V
 L+GtRp36LM91Ueq6KdlU7bJkmoiJ798Hnp3FOjpkqo+j/MGuCQDDDK4Ge1popehJ
 a17K7lE/FKGqNaFINc1Q6hnXg4MPyawAOLDlV839Ap5+ISPS6WcHaa1AgKjdQNkH
 fIkZZsYT531FIf853AjUGFw8frSlVfrHmIx9/HJOhEa1KHQhBqGRV1sWYEjuN6IB
 av+tQDlleG5aT641qhHlA/hN5DGrGZXLp8e6cFRufF+CSsRayL27u0Qw9pP9VZ3S
 bgfdnmZZyP23+bzaq/m/bjhRiOf0snSQPxIKe56KmNCJ8buTrGWDw4IuiPKB7Y6V
 06vBFn7ZUP5aeHIZkS62
 =IClj
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "Just one fix, to add a barrier in the switch_mm() code to make sure
  the mm cpumask update is ordered vs the MMU starting to load
  translations. As far as we know no one's actually hit the bug, but
  that's just luck.

  Thanks to Benjamin Herrenschmidt, Nicholas Piggin"

* tag 'powerpc-4.13-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Ensure cpumask update is ordered
2017-08-25 17:32:35 -07:00
Jiri Slaby
30d6e0a419 futex: Remove duplicated code and fix undefined behaviour
There is code duplicated over all architecture's headers for
futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
and comparison of the result.

Remove this duplication and leave up to the arches only the needed
assembly which is now in arch_futex_atomic_op_inuser.

This effectively distributes the Will Deacon's arm64 fix for undefined
behaviour reported by UBSAN to all architectures. The fix was done in
commit 5f16a046f8 (arm64: futex: Fix undefined behaviour with
FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

And as suggested by Thomas, check for negative oparg too, because it was
also reported to cause undefined behaviour report.

Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
remove pointless access_ok() checks") as access_ok there returns true.
We introduce it back to the helper for the sake of simplicity (it gets
optimized away anyway).

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64]
Cc: linux-mips@linux-mips.org
Cc: Rich Felker <dalias@libc.org>
Cc: linux-ia64@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: peterz@infradead.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: sparclinux@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: linux-s390@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: linux-hexagon@vger.kernel.org
Cc: Helge Deller <deller@gmx.de>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-xtensa@linux-xtensa.org
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: openrisc@lists.librecores.org
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Stafford Horne <shorne@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Richard Henderson <rth@twiddle.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-parisc@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-alpha@vger.kernel.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
2017-08-25 22:49:59 +02:00
Benjamin Herrenschmidt
3a2df3798d powerpc/mm: Make switch_mm_irqs_off() out of line
It's too big to be inline, there is no reason to keep it
that way.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Rework to incorporate the comment changes via fixes branch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:48:51 +10:00
Benjamin Herrenschmidt
a619e59c07 powerpc/mm: Optimize detection of thread local mm's
Instead of comparing the whole CPU mask every time, let's
keep a counter of how many bits are set in the mask. Thus
testing for a local mm only requires testing if that counter
is 1 and the current CPU bit is set in the mask.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:28:38 +10:00
Benjamin Herrenschmidt
058ccc3465 powerpc/mm: Avoid double irq save/restore in activate_mm
It calls switch_mm() which already does the irq save/restore
these days.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:27:44 +10:00
Benjamin Herrenschmidt
43ed84a891 powerpc/mm: Move pgdir setting into a helper
Makes switch_mm_irqs_off() a bit more readable

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-23 22:27:42 +10:00
Michael Ellerman
15c659ff9d Merge branch 'fixes' into next
There's a non-trivial dependency between some commits we want to put in
next and the KVM prefetch work around that went into fixes. So merge
fixes into next.
2017-08-23 22:20:10 +10:00
Ingo Molnar
94edf6f3c2 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

 - Removal of spin_unlock_wait()
 - SRCU updates
 - Torture-test updates
 - Documentation updates
 - Miscellaneous fixes
 - CPU-hotplug fixes
 - Miscellaneous non-RCU fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-21 09:45:19 +02:00
Benjamin Herrenschmidt
1a92a80ad3 powerpc/mm: Ensure cpumask update is ordered
There is no guarantee that the various isync's involved with
the context switch will order the update of the CPU mask with
the first TLB entry for the new context being loaded by the HW.

Be safe here and add a memory barrier to order any subsequent
load/store which may bring entries into the TLB.

The corresponding barrier on the other side already exists as
pte updates use pte_xchg() which uses __cmpxchg_u64 which has
a sync after the atomic operation.

Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add comments in the code]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-18 13:07:16 +10:00
Paul E. McKenney
952111d7db arch: Remove spin_unlock_wait() arch-specific definitions
There is no agreed-upon definition of spin_unlock_wait()'s semantics,
and it appears that all callers could do just as well with a lock/unlock
pair.  This commit therefore removes the underlying arch-specific
arch_spin_unlock_wait() for all architectures providing them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
2017-08-17 08:08:59 -07:00
Aneesh Kumar K.V
fa4531f753 powerpc/mm: Don't send IPI to all cpus on THP updates
Now that we made sure that lockless walk of linux page table is mostly
limitted to current task(current->mm->pgdir) we can update the THP
update sequence to only send IPI to CPUs on which this task has run.
This helps in reducing the IPI overload on systems with large number
of CPUs.

WRT kvm even though kvm is walking page table with vpc->arch.pgdir,
it is done only on secondary CPUs and in that case we have primary CPU
added to task's mm cpumask. Sending an IPI to primary will force the
secondary to do a vm exit and hence this mm cpumask usage is safe
here.

WRT CAPI, we still end up walking linux page table with capi context
MM. For now the pte lookup serialization sends an IPI to all CPUs in
CPI is in use. We can further improve this by adding the CAPI
interrupt handling CPU to task mm cpumask. That will be done in a
later patch.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17 23:31:13 +10:00
Michael Ellerman
8434f0892e Merge branch 'topic/ppc-kvm' into next
Bring in the commit to rename find_linux_pte_or_hugepte() which touches
arch and KVM code, and might need to be merged with the kvmppc tree to
avoid conflicts.
2017-08-17 23:14:17 +10:00
Aneesh Kumar K.V
94171b19c3 powerpc/mm: Rename find_linux_pte_or_hugepte()
Add newer helpers to make the function usage simpler. It is always
recommended to use find_current_mm_pte() for walking the page table.
If we cannot use find_current_mm_pte(), it should be documented why
the said usage of __find_linux_pte() is safe against a parallel THP
split.

For now we have KVM code using __find_linux_pte(). This is because kvm
code ends up calling __find_linux_pte() in real mode with MSR_EE=0 but
with PACA soft_enabled = 1. We may want to fix that later and make
sure we keep the MSR_EE and PACA soft_enabled in sync. When we do that
we can switch kvm to use find_linux_pte().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17 23:13:46 +10:00
Naveen N. Rao
694fc88ce2 powerpc/string: Implement optimized memset variants
Based on Matthew Wilcox's patches for other architectures.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-17 23:04:35 +10:00
Dou Liyang
7baebe54a6 powerpc/topology: Remove the unused parent_node() macro
Commit a7be6e5a7f ("mm: drop useless local parameters of
__register_one_node()") removes the last user of parent_node().

The parent_node() macro in POWERPC platform is unnecessary.

Remove it for cleanup.

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 21:12:41 +10:00
Aneesh Kumar K.V
4ae279c2c9 powerpc/mm/hugetlb: Allow runtime allocation of 16G.
Now that we have GIGANTIC_PAGE enabled on powerpc, use this for 16G hugepages
with hash translation mode. Depending on the total system memory we have, we may
be able to allocate 16G hugepages runtime. This also remove the hugetlb setup
difference between hash/radix translation mode.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 14:56:13 +10:00
Aneesh Kumar K.V
79cc38ded1 powerpc/mm/hugetlb: Add support for reserving gigantic huge pages via kernel command line
With commit aa888a7497 ("hugetlb: support larger than MAX_ORDER") we added
support for allocating gigantic hugepages via kernel command line. Switch
ppc64 arch specific code to use that.

W.r.t FSL support, we now limit our allocation range using BOOTMEM_ALLOC_ACCESSIBLE.

We use the kernel command line to do reservation of hugetlb pages on powernv
platforms. On pseries hash mmu mode the supported gigantic huge page size is
16GB and that can only be allocated with hypervisor assist. For pseries the
command line option doesn't do the allocation. Instead pseries does gigantic
hugepage allocation based on hypervisor hint that is specified via
"ibm,expected#pages" property of the memory node.

Cc: Scott Wood <oss@buserror.net>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-16 14:56:12 +10:00
Christophe Leroy
ca8afd4046 powerpc/hugetlb: fix page rights verification in gup_hugepte()
gup_hugepte() checks if pages are present and readable, and
when  'write' is set, also checks if the pages are writable.

Initially this was done by checking if _PAGE_PRESENT and
_PAGE_READ were set. In addition, _PAGE_WRITE was verified for write
accesses.

The problem is that we have to handle the three following cases:
1/ The target defines __PAGE_READ and __PAGE_WRITE
2/ The target defines __PAGE_RW
3/ The target defines __PAGE_RO

In case 1/, this is obvious
In case 2/, __PAGE_READ is defined as 0 and __PAGE_WRITE as __PAGE_RW
so it works as well.
But in case 3, __PAGE_RW is defined as 0, which means __PAGE_WRITE is 0
and then the test returns true (page writable) in all cases.

A first correction was attempted in commit 6b8cb66a6a ("powerpc: Fix
usage of _PAGE_RO in hugepage"), but that fix is wrong:
instead of checking that the page is writable when write is requested,
it checks that the page is NOT writable when write is NOT requested.

This patch adds a new pte_read() helper to check whether a page is
readable or not. This avoids handling all possible cases in
gup_hugepte().

Then gup_hugepte() is modified to use pte_present(), pte_read()
and pte_write() instead of the raw flags.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:58 +10:00
Christophe Leroy
4cfac2f9c7 powerpc/mm: Simplify __set_fixmap()
__set_fixmap() uses __fix_to_virt() then does the boundary checks
by it self. Instead, we can use fix_to_virt() which does the
verification at build time. For this, we need to use it inline
so that GCC can see the real value of idx at buildtime.

In the meantime, we remove the 'fixmaps' variable.
This variable is set but has never been used from the beginning
(commit 2c419bdeca ("[POWERPC] Port fixmap from x86 and use
for kmap_atomic"))

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:58 +10:00
Christophe Leroy
86b19520e7 powerpc/mm: declare some local functions static
get_pteptr() and __mapin_ram_chunk() are only used locally,
so define them static

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:57 +10:00
Christophe Leroy
3184cc4b6f powerpc/mm: Fix kernel RAM protection after freeing unused memory on PPC32
As seen below, allthough the init sections have been freed, the
associated memory area is still marked as executable in the
page tables.

~ dmesg
[    5.860093] Freeing unused kernel memory: 592K (c0570000 - c0604000)

~ cat /sys/kernel/debug/kernel_page_tables
---[ Start of kernel VM ]---
0xc0000000-0xc0497fff        4704K  rw  X  present dirty accessed shared
0xc0498000-0xc056ffff         864K  rw     present dirty accessed shared
0xc0570000-0xc059ffff         192K  rw  X  present dirty accessed shared
0xc05a0000-0xc7ffffff      125312K  rw     present dirty accessed shared
---[ vmalloc() Area ]---

This patch fixes that.

The implementation is done by reusing the change_page_attr()
function implemented for CONFIG_DEBUG_PAGEALLOC

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 22:55:56 +10:00
Michael Ellerman
5b6c133e08 powerpc/mm/nohash: Move definition of PGALLOC_GFP to fix build errors
In some obscure Book3E configs (randconfig) we can end up missing a
definition for PGALLOC_GFP in pgtable_64.c.

Fix it by moving the definition to asm/pgalloc.h.

Fixes: de3b87611d ("powerpc/mm/book(e)(3s)/64: Add page table accounting")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-15 20:02:56 +10:00
Christophe Leroy
3ee87674e0 powerpc/8xx: Use symbolic PVR value
For the 8xx, PVR values defined in arch/powerpc/include/asm/reg.h
are nowhere used.

Remove all defines and add PVR_8xx

Use it in arch/powerpc/kernel/cputable.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:18 +10:00
Christophe Leroy
968159c003 powerpc/8xx: Getting rid of remaining use of CONFIG_8xx
Two config options exist to define powerpc MPC8xx:
* CONFIG_PPC_8xx
* CONFIG_8xx

arch/powerpc/platforms/Kconfig.cputype has contained the following
comment about CONFIG_8xx item for some years:
"# this is temp to handle compat with arch=ppc"

arch/powerpc is now the only place with remaining use of
CONFIG_8xx: get rid of them.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:12 +10:00
Christophe Leroy
72e4b2cdf0 powerpc/time: refactor MFTB() to limit number of ifdefs
The 8xx cannot access the TBL and TBU registers using mfspr/mtspr
It must be accessed using mftb/mftbu

Due to this, there is a number of places with #ifdef CONFIG_8xx

This patch defines new macros MFTBL(x) and MFTBU(x) on the same model
as MFTB(x) and tries to make use of them as much as possible.

In arch/powerpc/include/asm/timex.h, we also remove the ifdef
for the asm() operands as the compiler doesn't mind unused operands

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:32:09 +10:00
Michael Ellerman
d30a5a5262 powerpc/traps: Use SRR1 defines for program check reasons
Currently we open code the reason codes for program checks. Instead use
the existing SRR1 defines.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 23:31:32 +10:00
Shilpasri G Bhat
bf9571550f powerpc/powernv: Add support to clear sensor groups data
Adds support for clearing different sensor groups. OCC inband sensor
groups like CSM, Profiler, Job Scheduler can be cleared using this
driver. The min/max of all sensors belonging to these sensor groups
will be cleared.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:40:05 +10:00
Shilpasri G Bhat
8e84b2d1f0 powerpc/powernv: Add support to set power-shifting-ratio
This patch adds support to set power-shifting-ratio which hints the
firmware how to distribute/throttle power between different entities
in a system (e.g CPU v/s GPU). This ratio is used by OCC for power
capping algorithm.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:40:01 +10:00
Shilpasri G Bhat
cb8b340de2 powerpc/powernv: Add support for powercap framework
Adds a generic powercap framework to change the system powercap
inband through OPAL-OCC command/response interface.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:39:53 +10:00
Nicholas Piggin
04019bf8eb powerpc: Add irq accounting for watchdog interrupts
This adds an irq counter for the watchdog soft-NMI. This interrupt
only fires when interrupts are soft-disabled, so it will not
increment much even when the watchdog is running. However it's
useful for debugging and sanity checking.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:30:02 +10:00
Nicholas Piggin
ca41ad4377 powerpc: Add irq accounting for system reset interrupts
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:30:01 +10:00
Andreas Schwab
8a583c0a8d powerpc: Fix invalid use of register expressions
binutils >= 2.26 now warns about misuse of register expressions in
assembler operands that are actually literals, for example:

  arch/powerpc/kernel/entry_64.S:535: Warning: invalid register expression

In practice these are almost all uses of r0 that should just be a
literal 0.

Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
[mpe: Mention r0 is almost always the culprit, fold in purgatory change]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:29:41 +10:00
Peter Zijlstra
a9668cd6ee locking: Remove smp_mb__before_spinlock()
Now that there are no users of smp_mb__before_spinlock() left, remove
it entirely.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:03 +02:00
Peter Zijlstra
d89e588ca4 locking: Introduce smp_mb__after_spinlock()
Since its inception, our understanding of ACQUIRE, esp. as applied to
spinlocks, has changed somewhat. Also, I wonder if, with a simple
change, we cannot make it provide more.

The problem with the comment is that the STORE done by spin_lock isn't
itself ordered by the ACQUIRE, and therefore a later LOAD can pass over
it and cross with any prior STORE, rendering the default WMB
insufficient (pointed out by Alan).

Now, this is only really a problem on PowerPC and ARM64, both of
which already defined smp_mb__before_spinlock() as a smp_mb().

At the same time, we can get a much stronger construct if we place
that same barrier _inside_ the spin_lock(). In that case we upgrade
the RCpc spinlock to an RCsc.  That would make all schedule() calls
fully transitive against one another.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:02 +02:00
Michael Ellerman
21a0e8c14b powerpc/mm/hash64: Make vmalloc 56T on hash
On 64-bit book3s, with the hash MMU, we currently define the kernel
virtual space (vmalloc, ioremap etc.), to be 16T in size. This is a
leftover from pre v3.7 when our user VM was also 16T.

Of that 16T we split it 50/50, with half used for PCI IO and ioremap
and the other 8T for vmalloc.

We never bothered to make it any bigger because 8T of vmalloc ought to
be enough for anybody. But it turns out that's not true, the per cpu
allocator wants large amounts of vmalloc space, not to make large
allocations, but to allow a large stride between allocations, because
we use pcpu_embed_first_chunk().

With a bit of juggling we can increase the entire kernel virtual space
to 64T. The only real complication is the check of the address in the
SLB miss handler, see the comment in the code.

Although we could continue to split virtual space 50/50 as we do now,
no one seems to be running out of PCI IO or ioremap space. So instead
keep that as 8T, and use the remaining 56T for vmalloc.

In future we should be able to increase the kernel virtual space to
512T, the code already supports that, it just needs testing on older
hardware.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2017-08-08 19:37:05 +10:00
Michael Ellerman
63ee9b2ff9 powerpc/mm/book3s64: Make KERN_IO_START a variable
Currently KERN_IO_START is defined as:

 #define KERN_IO_START  (KERN_VIRT_START + (KERN_VIRT_SIZE >> 1))

Although it looks like a constant, both the components are actually
variables, to allow us to have a different value between Radix and
Hash with a single kernel.

However that still requires both Radix and Hash to place the kernel IO
region at the same location relative to the start and end of the
kernel virtual region (namely 1/2 way through it), and we'd like to
change that.

So split KERN_IO_START out into its own variable, and initialise it
for Radix and Hash. In the medium term we should be able to
reconsolidate this, by doing a more involved rearrangement of the
location of the regions.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08 19:37:04 +10:00
Matt Brown
e66ca3db59 powerpc/powernv: Use darn instruction for get_random_seed() on Power9
This adds powernv_get_random_darn() which utilises the darn instruction,
introduced in ISA v3.0/POWER9.

The darn instruction can potentially return an error, which is supported
by the get_random_seed() API, in normal usage if we see an error we just
return that to the caller.

However when detecting whether darn is functional at boot we try up to
10 times, before deciding that darn doesn't work and failing the
registration of get_random_seed(). That way an intermittent failure
at boot doesn't deprive the system of randomness until the next reboot.

Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[mpe: Move init into a function, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08 19:37:03 +10:00
Frederic Barrat
2552910084 powerpc/powernv: Enable PCI peer-to-peer
P9 has support for PCI peer-to-peer, enabling a device to write in the
MMIO space of another device directly, without interrupting the CPU.

This patch adds support for it on powernv, by adding a new API to be
called by drivers. The pnv_pci_set_p2p(...) call configures an
'initiator', i.e the device which will issue the MMIO operation, and a
'target', i.e. the device on the receiving side.

P9 really only supports MMIO stores for the time being but that's
expected to change in the future, so the API allows to define both
load and store operations.

  /* PCI p2p descriptor */
  #define OPAL_PCI_P2P_ENABLE           0x1
  #define OPAL_PCI_P2P_LOAD             0x2
  #define OPAL_PCI_P2P_STORE            0x4

  int pnv_pci_set_p2p(struct pci_dev *initiator, struct pci_dev *target,
                      u64 desc)

It uses a new OPAL call, as the configuration magic is done on the
PHBs by skiboot.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Russell Currey <ruscur@russell.cc>
[mpe: Drop unrelated OPAL calls, s/uint64_t/u64/, minor formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-08 11:27:30 +10:00
Linus Torvalds
841fe95303 powerpc fixes for 4.13 #5
Fixes for recently merged code:
  - a fix for the _PAGE_DEVMAP support, which was breaking KVM on Power9 radix
  - avoid a (harmless) lockdep warning in the early SMP code
  - return failure for some uses of dma_set_mask() rather than falling back to 32-bits
  - fix stack setup in watchdog soft_nmi_common() to use emergency stack
  - fix of_irq_to_resource() error check in of_fsl_spi_probe()
 
 Two fixes going to stable:
  - fix saving of Transactional Memory SPRs in core dump
  - fix __check_irq_replay missing decrementer interrupt
 
 And two misc:
  - fix 64-bit boot wrapper build with non-biarch compiler
  - work around a POWER9 PMU hang after state-loss idle
 
 Thanks to:
   Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo Romero, Jose Ricardo
   Ziviani, Laurent Vivier, Nicholas Piggin, Oliver O'Halloran, Sergei Shtylyov,
   Suraj Jitindar Singh, Thomas Gleixner.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZhEH/AAoJEFHr6jzI4aWA8GoP+wdtaBYZIElFMnPay/BY8lSQ
 MODF6FlaexdNuVFo5In2BnMAs12bIXDO69wMziqc+uZIA+vI3K7lqsi+8p1R38Ts
 FG5OFLzX/gx1ykoeDe0mC8GPUHGVBdMr6+rHzt/9M7PrQWApCoBJLakYIElOrN5B
 EiAxunwJQISKNQYYmyVhjMVFq8ydMHSgtR7JVDHTh1AoStaO9M5hoJegD4Dnytaf
 F0/0Y5U+NGGWwPIyDrjB4wE9L8NCk0JfvpBhjORMkFSpAOe4VhxZrAGDyaCzVKVh
 YkxSirVKuVwEmvV2prmq4SAJi61EMUEPx2WL7hRN3ol/dF+2lU9nl2lyL3yRnGPY
 9mtNHur07qoMqyXI8750ayTzbMG75aakJF2be9G+zwNc3Sw8dqJvs2PwpSVdnima
 jWmWnRikws4qmEOEBV6nxtWf8qYACHevyK7YzIHDU8SVXLjAkX/u6LZGHAfTQ8Xo
 jYcISpiA2ko8YnMR7yy9FZqs4KGqeOotMamgJRGUcSY/9Xn8WDIrtQ4tIuJtGjzW
 XFOFafZOzyt6s1dNz01C4hzP60J+/CxPNRxx6zWnbijY6puADu/Mv1g+ai3nYdKA
 e/CYt6O8tzooFHCHtVLUPgWA82heDwNtjmlnSikWTdT07KLRhP+0w9Qq8gBzztKT
 7B/vPMjCMPQaJrrnj6GX
 =Rdy8
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fixes for recently merged code:
   - a fix for the _PAGE_DEVMAP support, which was breaking KVM on
     Power9 radix
   - avoid a (harmless) lockdep warning in the early SMP code
   - return failure for some uses of dma_set_mask() rather than falling
     back to 32-bits
   - fix stack setup in watchdog soft_nmi_common() to use emergency
     stack
   - fix of_irq_to_resource() error check in of_fsl_spi_probe()

  Two fixes going to stable:
   - fix saving of Transactional Memory SPRs in core dump
   - fix __check_irq_replay missing decrementer interrupt

  And two misc:
   - fix 64-bit boot wrapper build with non-biarch compiler
   - work around a POWER9 PMU hang after state-loss idle

  Thanks to: Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo
  Romero, Jose Ricardo Ziviani, Laurent Vivier, Nicholas Piggin, Oliver
  O'Halloran, Sergei Shtylyov, Suraj Jitindar Singh, Thomas Gleixner"

* tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64: Fix __check_irq_replay missing decrementer interrupt
  powerpc/perf: POWER9 PMU stops after idle workaround
  powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check
  powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
  powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
  powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
  powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
  powerpc/tm: Fix saving of TM SPRs in core dump
  powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries
2017-08-04 09:56:54 -07:00
Benjamin Herrenschmidt
6ff4d3e966 powerpc: Remove old unused icswx based coprocessor support
We have a whole pile of unused code to maintain the ACOP register,
allocate coprocessor PIDs and handle ACOP faults. This mechanism
was used for the HFI adapter on POWER7 which is dead and gone and
whose driver never went upstream. It was used on some A2 core based
stuff that also never saw the light of day.

Take out all that code.

There is still some POWER8 coprocessor code that uses icswx but it's
kernel only and thus doesn't use any of that infrastructure.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03 16:06:52 +10:00
Benjamin Herrenschmidt
870cfe77a9 powerpc/mm: Update definitions of DSISR bits
This updates the definitions for the various DSISR bits to
match both some historical stuff and to match new bits on
POWER9.

In addition, we define some masks corresponding to the "bad"
faults on Book3S, and some masks corresponding to the bits
that match between DSISR and SRR1 for a DSI and an ISI.

This comes with a small code update to change the definition
of DSISR_PGDIRFAULT which becomes DSISR_PRTABLE_FAULT to
match architecture 3.0B

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-03 16:06:43 +10:00