Commit Graph

122 Commits

Author SHA1 Message Date
Nicholas Piggin
b74d2807ae powerpc/powernv: Remove OPALv1 support from opal console driver
opal_put_chars deals with partial writes because in OPALv1,
opal_console_write_buffer_space did not work correctly. That firmware
is not supported.

This reworks the opal_put_chars code to no longer deal with partial
writes by turning them into full writes. Partial write handling is still
supported in terms of what gets returned to the caller, but it may not
go to the console atomically. A warning message is printed in this
case.

This allows console flushing to be moved out of the opal_write_lock
spinlock. That could cause the lock to be held for long periods if the
console is busy (especially if it was being spammed by firmware),
which is dangerous because the lock is taken by xmon to debug the
system. Flushing outside the lock improves the situation a bit.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:56 +10:00
Nicholas Piggin
d2a2262e68 powerpc/powernv: Implement and use opal_flush_console
A new console flushing firmware API was introduced to replace event
polling loops, and implemented in opal-kmsg with affddff69c
("powerpc/powernv: Add a kmsg_dumper that flushes console output on
panic"), to flush the console in the panic path.

The OPAL console driver has other situations where interrupts are off
and it needs to flush the console synchronously. These still use a
polling loop.

So move the opal-kmsg flush code to opal_flush_console, and use the
new function in opal-kmsg and opal_put_chars.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:56 +10:00
Nicholas Piggin
36d2dabc87 powerpc/powernv: Fix OPAL console driver OPAL_BUSY loops
The OPAL console driver does not delay in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware.

It can't yet be made to sleep because it is called under spinlock,
but it can be changed to the standard OPAL_BUSY loop form, and a
delay added to keep it from hitting the firmware too frequently.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:55 +10:00
Nicholas Piggin
bd90284cc6 powerpc/powernv: opal_put_chars partial write fix
The intention here is to consume and discard the remaining buffer
upon error. This works if there has not been a previous partial write.
If there has been, then total_len is no longer total number of bytes
to copy. total_len is always "bytes left to copy", so it should be
added to written bytes.

This code may not be exercised any more if partial writes will not be
hit, but this is a small bugfix before a larger change.

Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-07-24 22:09:54 +10:00
Nicholas Piggin
56c0b48b1e powerpc/powernv: process all OPAL event interrupts with kopald
Using irq_work for processing OPAL event interrupts is not necessary.
irq_work is typically used to schedule work from NMI context, a
softirq may be more appropriate. However OPAL events are not
particularly performance or latency critical, so they can all be
invoked by kopald.

This patch removes the irq_work queueing, and instead wakes up
kopald when there is an event to be processed. kopald processes
interrupts individually, enabling irqs and calling cond_resched
between each one to minimise latencies.

Event handlers themselves should still use threaded handlers,
workqueues, etc. as necessary to avoid high interrupts-off latencies
within any single interrupt.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:30 +10:00
Linus Torvalds
9f3a0941fb libnvdimm for 4.17
* A rework of the filesytem-dax implementation provides for detection of
   unmap operations (truncate / hole punch) colliding with in-progress
   device-DMA. A fix for these collisions remains a work-in-progress
   pending resolution of truncate latency and starvation regressions.
 
 * The of_pmem driver expands the users of libnvdimm outside of x86 and
   ACPI to describe an implementation of persistent memory on PowerPC with
   Open Firmware / Device tree.
 
 * Address Range Scrub (ARS) handling is completely rewritten to account for
   the fact that ARS may run for 100s of seconds and there is no platform
   defined way to cancel it. ARS will now no longer block namespace
   initialization.
 
 * The NVDIMM Namespace Label implementation is updated to handle label
   areas as small as 1K, down from 128K.
 
 * Miscellaneous cleanups and updates to unit test infrastructure.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJazDt5AAoJEB7SkWpmfYgCqGMQALLwdPeY87cUK7AvQ2IXj46B
 lJgeVuHPzyQDbC03AS5uUYnnU3I5lFd7i4y7ZrywNpFs4lsb/bNmbUpQE5xp+Yvc
 1MJ/JYDIP5X4misWYm3VJo85N49+VqSRgAQk52PBigwnZ7M6/u4cSptXM9//c9JL
 /NYbat6IjjY6Tx49Tec6+F3GMZjsFLcuTVkQcREoOyOqVJE4YpP0vhNjEe0vq6vr
 EsSWiqEI5VFH4PfJwKdKj/64IKB4FGKj2A5cEgjQBxW2vw7tTJnkRkdE3jDUjqtg
 xYAqGp/Dqs4+bgdYlT817YhiOVrcr5mOHj7TKWQrBPgzKCbcG5eKDmfT8t+3NEga
 9kBlgisqIcG72lwZNA7QkEHxq1Omy9yc1hUv9qz2YA0G+J1WE8l1T15k1DOFwV57
 qIrLLUypklNZLxvrzNjclempboKc4JCUlj+TdN5E5Y6pRs55UWTXaP7Xf5O7z0vf
 l/uiiHkc3MPH73YD2PSEGFJ8m8EU0N8xhrcz3M9E2sHgYCnbty1Lw3FH0/GhThVA
 ya1mMeDdb8A2P7gWCBk1Lqeig+rJKXSey4hKM6D0njOEtMQO1H4tFqGjyfDX1xlJ
 3plUR9WBVEYzN5+9xWbwGag/ezGZ+NfcVO2gmy6yXiEph796BxRAZx/18zKRJr0m
 9eGJG1H+JspcbtLF9iHn
 =acZQ
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This cycle was was not something I ever want to repeat as there were
  several late changes that have only now just settled.

  Half of the branch up to commit d2c997c0f1 ("fs, dax: use
  page->mapping to warn...") have been in -next for several releases.
  The of_pmem driver and the address range scrub rework were late
  arrivals, and the dax work was scaled back at the last moment.

  The of_pmem driver missed a previous merge window due to an oversight.
  A sense of obligation to rectify that miss is why it is included for
  4.17. It has acks from PowerPC folks. Stephen reported a build failure
  that only occurs when merging it with your latest tree, for now I have
  fixed that up by disabling modular builds of of_pmem. A test merge
  with your tree has received a build success report from the 0day robot
  over 156 configs.

  An initial version of the ARS rework was submitted before the merge
  window. It is self contained to libnvdimm, a net code reduction, and
  passing all unit tests.

  The filesystem-dax changes are based on the wait_var_event()
  functionality from tip/sched/core. However, late review feedback
  showed that those changes regressed truncate performance to a large
  degree. The branch was rewound to drop the truncate behavior change
  and now only includes preparation patches and cleanups (with full acks
  and reviews). The finalization of this dax-dma-vs-trnucate work will
  need to wait for 4.18.

  Summary:

   - A rework of the filesytem-dax implementation provides for detection
     of unmap operations (truncate / hole punch) colliding with
     in-progress device-DMA. A fix for these collisions remains a
     work-in-progress pending resolution of truncate latency and
     starvation regressions.

   - The of_pmem driver expands the users of libnvdimm outside of x86
     and ACPI to describe an implementation of persistent memory on
     PowerPC with Open Firmware / Device tree.

   - Address Range Scrub (ARS) handling is completely rewritten to
     account for the fact that ARS may run for 100s of seconds and there
     is no platform defined way to cancel it. ARS will now no longer
     block namespace initialization.

   - The NVDIMM Namespace Label implementation is updated to handle
     label areas as small as 1K, down from 128K.

   - Miscellaneous cleanups and updates to unit test infrastructure"

* tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits)
  libnvdimm, of_pmem: workaround OF_NUMA=n build error
  nfit, address-range-scrub: add module option to skip initial ars
  nfit, address-range-scrub: rework and simplify ARS state machine
  nfit, address-range-scrub: determine one platform max_ars value
  powerpc/powernv: Create platform devs for nvdimm buses
  doc/devicetree: Persistent memory region bindings
  libnvdimm: Add device-tree based driver
  libnvdimm: Add of_node to region and bus descriptors
  libnvdimm, region: quiet region probe
  libnvdimm, namespace: use a safe lookup for dimm device name
  libnvdimm, dimm: fix dpa reservation vs uninitialized label area
  libnvdimm, testing: update the default smart ctrl_temperature
  libnvdimm, testing: Add emulation for smart injection commands
  nfit, address-range-scrub: introduce nfit_spa->ars_state
  libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device'
  nfit, address-range-scrub: fix scrub in-progress reporting
  dax, dm: allow device-mapper to operate without dax support
  dax: introduce CONFIG_DAX_DRIVER
  fs, dax: use page->mapping to warn if truncate collides with a busy page
  ext2, dax: introduce ext2_dax_aops
  ...
2018-04-10 10:25:57 -07:00
Oliver O'Halloran
3013e17381 powerpc/powernv: Create platform devs for nvdimm buses
Scan the devicetree for an nvdimm-bus compatible and create
a platform device for them.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-04-07 07:53:23 -07:00
Balbir Singh
5ee573e8ef powerpc/powernv/mce: Don't silently restart the machine
On MCE the current code will restart the machine with
ppc_md.restart(). This case was extremely unlikely since
prior to that a skiboot call is made and that resulted in
a checkstop for analysis.

With newer skiboots, on P9 we don't checkstop the box by
default, instead we return back to the kernel to extract
useful information at the time of the MCE. While we still
get this information, this patch converts the restart to
a panic(), so that if configured a dump can be taken and
we can track and probably debug the potential issue causing
the MCE.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-03-13 15:50:31 +11:00
Nicholas Piggin
35adacd6fc powerpc/pseries, ps3: panic flush kernel messages before halting system
Platforms with a panic handler that halts the system can have problems
getting kernel messages out, because the panic notifiers are called
before kernel/panic.c does its flushing of printk buffers an console
etc.

This was attempted to be solved with commit a3b2cb30f2 ("powerpc: Do
not call ppc_md.panic in fadump panic notifier"), but that wasn't the
right approach and caused other problems, and was reverted by commit
ab9dbf771f.

Instead, the powernv shutdown paths have already had a similar
problem, fixed by taking the message flushing sequence from
kernel/panic.c. That's a little bit ugly, but while we have the code
duplicated, it will work for this case as well. So have ppc panic
handlers do the same flushing before they terminate.

Without this patch, a qemu pseries_le_defconfig guest stops silently
when issued the nmi command when xmon is off and no crash dumpers
enabled. Afterwards, an oops is printed by each CPU as expected.

Fixes: ab9dbf771f ("Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-01-22 11:44:24 +11:00
Benjamin Herrenschmidt
5138b31422 powerpc: Reduce log level of "OPAL detected !" message
This message isn't terribly useful.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-11 13:03:28 +11:00
Joe Perches
f2c2cbcc35 powerpc: Use pr_warn instead of pr_warning
At some point, pr_warning will be removed so all logging messages use
a consistent <prefix>_warn style.

Update arch/powerpc/

Miscellanea:

o Coalesce formats
o Realign arguments
o Use %s, __func__ instead of embedded function names
o Remove unnecessary line continuations

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Geoff Levand <geoff@infradead.org>
[mpe: Rebase due to some %pOF changes.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-12-04 11:54:34 +11:00
Cyril Bur
77adbd2207 powerpc/powernv: Add OPAL_BUSY to opal_error_code()
Also export opal_error_code() so that it can be used in modules

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-06 20:39:31 +11:00
Nicholas Piggin
6fcd6baa90 powerpc/powernv: Use kernel crash path for machine checks
There are quite a few machine check exceptions that can be caused by
kernel bugs. To make debugging easier, use the kernel crash path in
cases of synchronous machine checks that occur in kernel mode, if that
would not result in the machine going straight to panic or crash dump.

There is a downside here that die()ing the process in kernel mode can
still leave the system unstable. panic_on_oops will always force the
system to fail-stop, so systems where that behaviour is important will
still do the right thing.

As a test, when triggering an i-side 0111b error (ifetch from foreign
address) in kernel mode process context on POWER9, the kernel currently
dies quickly like this:

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
  [  127.426651616,0] OPAL: Reboot requested due to Platform error.
      Effective[  127.426693712,3] OPAL: Reboot requested due to Platform error. address: ffff000000000000
  opal: Reboot type 1 not supported
  Kernel panic - not syncing: PowerNV Unrecovered Machine Check
  CPU: 56 PID: 4425 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #35
  Call Trace:
  [  128.017988928,4] IPMI: BUG: Dropping ESEL on the floor due to
    buggy/mising code in OPAL for this BMC
    Rebooting in 10 seconds..
  Trying to free IRQ 496 from IRQ context!

After this patch, the process is killed and the kernel continues with
this message, which gives enough information to identify the offending
branch (i.e., with CFAR):

  Severe Machine check interrupt [Not recovered]
    NIP [ffff000000000000]: 0xffff000000000000
    Initiator: CPU
    Error type: Real address [Instruction fetch (foreign)]
      Effective address: ffff000000000000
  Oops: Machine check, sig: 7 [#1]
  SMP NR_CPUS=2048
  NUMA
  PowerNV
  Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 ...
  CPU: 22 PID: 4436 Comm: syscall Tainted: G   M            4.12.0-rc1-13857-ga4700a261072-dirty #36
  task: c000000932300000 task.stack: c000000932380000
  NIP: ffff000000000000 LR: 00000000217706a4 CTR: ffff000000000000
  REGS: c00000000fc8fd80 TRAP: 0200   Tainted: G   M             (4.12.0-rc1-13857-ga4700a261072-dirty)
  MSR: 90000000001c1003 <SF,HV,ME,RI,LE>
    CR: 24000484  XER: 20000000
  CFAR: c000000000004c80 DAR: 0000000021770a90 DSISR: 0a000000 SOFTE: 1
  GPR00: 0000000000001ebe 00007fffce4818b0 0000000021797f00 0000000000000000
  GPR04: 00007fff8007ac24 0000000044000484 0000000000004000 00007fff801405e8
  GPR08: 900000000280f033 0000000024000484 0000000000000000 0000000000000030
  GPR12: 9000000000001003 00007fff801bc370 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR28: 00007fff801b0000 0000000000000000 00000000217707a0 00007fffce481918
  NIP [ffff000000000000] 0xffff000000000000
  LR [00000000217706a4] 0x217706a4
  Call Trace:
  Instruction dump:
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
  XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:04 +10:00
Nicholas Piggin
b746e3e01e powerpc/powernv: Flush console before platform error reboot
Unrecovered MCE and HMI errors are sent through a special restart OPAL
call to log the platform error. The downside is that they don't go
through normal Linux crash paths, so they don't give much information
to the Linux console.

Change this by providing a special crash function which does some of
the console flushing from the panic() path before calling firmware to
reboot.

The downside of this is a little more code to execute before reaching
the firmware reboot. However in practice, it's critical to get the
Linux console messages output in order to debug a problem. So this is
a desirable tradeoff.

Note on the implementation: It is difficult to plumb a custom reboot
handler into the panic path, because panic does a little bit too much
work. For example, it will try to delay with the timebase, but that
may be corrupted in some cases resulting in a hang without reaching
the platform reboot. Another problem is that panic can invoke the
crash dump code which is not what we want in the case of a hardware
platform error. Long-term the best solution will be to rework the
panic path so it can be suitable for this kind of panic, but for now
we just duplicate a bit of the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:26:03 +10:00
Nicholas Piggin
76b42e28be powerpc/powernv: powernv platform is not constrained by RMA
Remove incorrect comment about real mode address restrictions on
powernv (bare metal), and unnecessary clamping to ppc64_rma_size.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-31 14:25:58 +10:00
Shilpasri G Bhat
bf9571550f powerpc/powernv: Add support to clear sensor groups data
Adds support for clearing different sensor groups. OCC inband sensor
groups like CSM, Profiler, Job Scheduler can be cleared using this
driver. The min/max of all sensors belonging to these sensor groups
will be cleared.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:40:05 +10:00
Shilpasri G Bhat
8e84b2d1f0 powerpc/powernv: Add support to set power-shifting-ratio
This patch adds support to set power-shifting-ratio which hints the
firmware how to distribute/throttle power between different entities
in a system (e.g CPU v/s GPU). This ratio is used by OCC for power
capping algorithm.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:40:01 +10:00
Shilpasri G Bhat
cb8b340de2 powerpc/powernv: Add support for powercap framework
Adds a generic powercap framework to change the system powercap
inband through OPAL-OCC command/response interface.

Signed-off-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-10 22:39:53 +10:00
Madhavan Srinivasan
8f95faaac5 powerpc/powernv: Detect and create IMC device
Code to create platform device for the In-Memory Collection (IMC)
counters. Platform devices are created based on the IMC compatibility.
New header file created to contain the data structures and macros
needed for In-Memory Collection (IMC) counter pmu devices.

The device tree for IMC counters starts at the node "imc-counters".
This node contains all the IMC PMU nodes and event nodes for these IMC
PMUs. Device probe() parses the device to locate three possible IMC
device types (Nest/Core/Thread). Function then branch to parse each
unit nodes to populate vital information such as device memory sizes,
event nodes information, base address for reserve memory access (if
any) and so on. Simple bare-minimum shutdown function added which only
"stops" the engines.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Fix build with CONFIG_PERF_EVENTS=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-25 22:55:27 +10:00
Michael Ellerman
a70b487b07 powerpc/powernv: Fix boot on Power8 bare metal due to opal_configure_cores()
In commit 1c0eaf0f56 ("powerpc/powernv: Tell OPAL about our MMU mode
on POWER9"), we added additional flags to the OPAL call to configure
CPUs at boot.

These flags only work on Power9 firmwares, and worse can cause boot
failures on Power8 machines, so we check for CPU_FTR_ARCH_300 (aka POWER9)
before adding the extra flags.

Unfortunately we forgot that opal_configure_cores() is called before
the CPU feature checks are dynamically patched, meaning the check
always returns true.

We definitely need to do something to make the CPU feature checks less
prone to bugs like this, but for now the minimal fix is to use
early_cpu_has_feature().

Reported-and-tested-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Fixes: 1c0eaf0f56 ("powerpc/powernv: Tell OPAL about our MMU mode on POWER9")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-17 21:30:32 +10:00
Benjamin Herrenschmidt
1c0eaf0f56 powerpc/powernv: Tell OPAL about our MMU mode on POWER9
That will allow OPAL to configure the CPU in an optimal way.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-10 21:12:27 +10:00
Paolo Bonzini
4415b33528 Merge branch 'kvm-ppc-next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
The main thing here is a new implementation of the in-kernel
XICS interrupt controller emulation for POWER9 machines, from Ben
Herrenschmidt.

POWER9 has a new interrupt controller called XIVE (eXternal Interrupt
Virtualization Engine) which is able to deliver interrupts directly
to guest virtual CPUs in hardware without hypervisor intervention.
With this new code, the guest still sees the old XICS interface but
performance is better because the XICS emulation in the host uses the
XIVE directly rather than going through a XICS emulation in firmware.

Conflicts:
	arch/powerpc/kernel/cpu_setup_power.S [cherry-picked fix]
	arch/powerpc/kvm/book3s_xive.c [include asm/debugfs.h]
2017-05-09 11:50:01 +02:00
Benjamin Herrenschmidt
5af5099385 KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller
This patch makes KVM capable of using the XIVE interrupt controller
to provide the standard PAPR "XICS" style hypercalls. It is necessary
for proper operations when the host uses XIVE natively.

This has been lightly tested on an actual system, including PCI
pass-through with a TG3 device.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Cleanup pr_xxx(), unsplit pr_xxx() strings, etc., fix build
 failures by adding KVM_XIVE which depends on KVM_XICS and XIVE, and
 adding empty stubs for the kvm_xive_xxx() routines, fixup subject,
 integrate fixes from Paul for building PR=y HV=n]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-27 21:37:29 +10:00
Michael Ellerman
83c4919058 powerpc/powernv: Fix missing attr initialisation in opal_export_attrs()
In opal_export_attrs() we dynamically allocate some bin_attributes. They're
allocated with kmalloc() and although we initialise most of the fields, we don't
initialise write() or mmap(), and in particular we don't initialise the lockdep
related fields in the embedded struct attribute.

This leads to a lockdep warning at boot:

  BUG: key c0000000f11906d8 not in .data!
  WARNING: CPU: 0 PID: 1 at ../kernel/locking/lockdep.c:3136 lockdep_init_map+0x28c/0x2a0
  ...
  Call Trace:
    lockdep_init_map+0x288/0x2a0 (unreliable)
    __kernfs_create_file+0x8c/0x170
    sysfs_add_file_mode_ns+0xc8/0x240
    __machine_initcall_powernv_opal_init+0x60c/0x684
    do_one_initcall+0x60/0x1c0
    kernel_init_freeable+0x2f4/0x3d4
    kernel_init+0x24/0x160
    ret_from_kernel_thread+0x5c/0xb0

Fix it by kzalloc'ing the attr, which fixes the uninitialised write() and
mmap(), and calling sysfs_bin_attr_init() on it to initialise the lockdep
fields.

Fixes: 11fe909d23 ("powerpc/powernv: Add OPAL exports attributes to sysfs")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-26 15:57:19 +10:00
Matt Brown
11fe909d23 powerpc/powernv: Add OPAL exports attributes to sysfs
New versions of OPAL have a device node /ibm,opal/firmware/exports, each
property of which describes a range of memory in OPAL that Linux might
want to export to userspace for debugging.

This patch adds a sysfs file under 'opal/exports' for each property
found there, and makes it read-only by root.

Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
[mpe: Drop counting of props, rename to attr, free on sysfs error, c'log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-04 14:11:22 +10:00
Michael Ellerman
63f44d6514 powerpc/book3s: Print task info if we take a machine check in user mode
For an MCE (Machine Check Exception) that hits while in user mode
MSR(PR=1), print the task info to the console MCE error log. This may
help to identify an application that triggered the MCE.

After this patch the MCE console looks like:

  Severe Machine check interrupt [Recovered]
    NIP: [0000000010039778] PID: 762 Comm: ebizzy
    Initiator: CPU
    Error type: SLB [Multihit]
      Effective address: 0000000010039778

  Severe Machine check interrupt [Not recovered]
    NIP: [0000000010039778] PID: 763 Comm: ebizzy
    Initiator: CPU
    Error type: UE [Page table walk ifetch]
      Effective address: 0000000010039778
  ebizzy[763]: unhandled signal 7 at 0000000010039778 nip 0000000010039778 lr 0000000010001b44 code 30004

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-04-03 16:12:00 +10:00
Nicholas Piggin
1363875bdb powerpc/64s: fix handling of non-synchronous machine checks
A synchronous machine check is an exception raised by the attempt to
execute the current instruction. If the error can't be corrected, it
can make sense to SIGBUS the currently running process.

In other cases, the error condition is not related to the current
instruction, so killing the current process is not the right thing to
do.

Today, all machine checks are MCE_SEV_ERROR_SYNC, so this has no
practical change. It will be used to handle POWER9 asynchronous
machine checks.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-03-10 16:32:06 +11:00
Alistair Popple
1d0761d255 powerpc/powernv: Initialise nest mmu
POWER9 contains an off core mmu called the nest mmu (NMMU). This is
used by other hardware units on the chip to translate virtual
addresses into real addresses. The unit attempting an address
translation provides the majority of the context required for the
translation request except for the base address of the partition table
(ie. the PTCR) which needs to be programmed into the NMMU.

This patch adds a call to OPAL to set the PTCR for the nest mmu in
opal_init().

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-01-30 20:24:33 +11:00
Michael Ellerman
ddbefe7e77 Merge branch 'topic/ppc-kvm' into next
Merge the topic branch we're sharing with the kvm-ppc tree.
2016-11-24 22:14:52 +11:00
Paul Mackerras
ffe6d810fe powerpc/powernv: Define real-mode versions of OPAL XICS accessors
This defines real-mode versions of opal_int_get_xirr(), opal_int_eoi()
and opal_int_set_mfrr(), for use by KVM real-mode code.

It also exports opal_int_set_mfrr() so that the modular part of KVM
can use it to send IPIs.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-23 10:32:11 +11:00
Jack Miller
9e4f51bdaf powerpc/powernv: Simplify searching for compatible device nodes
This condenses the opal node searching into a single function that finds
all compatible nodes, instead of just searching the ibm,opal children,
for ipmi, flash, and prd similar to how opal-i2c nodes are found.

Signed-off-by: Jack Miller <jack@codezen.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-11-14 20:05:57 +11:00
Mahesh Salgaonkar
c74dd88e77 powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
When machine check occurs with MSR(RI=0), it means MC interrupt is
unrecoverable and kernel goes down to panic path. But the console
message still shows it as recovered. This patch fixes the MCE console
messages.

Fixes: 36df96f8ac ("powerpc/book3s: Decode and save machine check event.")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-08-09 19:46:54 +10:00
Benjamin Herrenschmidt
d3cbff1b5a powerpc: Put exception configuration in a common place
The various calls to establish exception endianness and AIL are
now done from a single point using already established CPU and FW
feature bits to decide what to do.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-21 18:56:31 +10:00
Benjamin Herrenschmidt
a203658b5e powerpc/opal: Wake up kopald polling thread before waiting for events
On some environments (prototype machines, some simulators, etc...)
there is no functional interrupt source to signal completion, so
we rely on the fairly slow OPAL heartbeat.

In a number of cases, the calls complete very quickly or even
immediately. We've observed that it helps a lot to wakeup the OPAL
heartbeat thread before waiting for event in those cases, it will
call OPAL immediately to collect completions for anything that
finished fast enough.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-By: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-07-08 19:53:26 +10:00
Suraj Jitindar Singh
43a1dd9b5f powerpc/powernv: Add driver for operator panel on FSP machines
Implement new character device driver to allow access from user space
to the operator panel display present on IBM Power Systems machines
with FSPs.

This will allow status information to be presented on the display which
is visible to a user.

The driver implements a character buffer which a user can read/write
by accessing the device (/dev/op_panel). This buffer is then displayed on
the operator panel display. Any attempt to write past the last character
position will have no effect and attempts to write more characters than
the size of the display will be truncated. The device may only be accessed
by a single process at a time.

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-06-29 17:33:46 +10:00
Andrew Donnellan
9b4fffa149 powerpc/powernv: new function to access OPAL msglog
Currently, the OPAL msglog/console buffer is exposed as a sysfs file, with
the sysfs read handler responsible for retrieving the log from the OPAL
buffer. We'd like to be able to use it in xmon as well.

Refactor the OPAL msglog code to create a new function, opal_msglog_copy(),
that copies to an arbitrary buffer. Separate the initialisation code into
generic memcons init and sysfs file creation.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2016-02-09 20:27:04 +11:00
Linus Torvalds
f689b742f2 powerpc updates for 4.5
- Ground work for the new Power9 MMU from Aneesh Kumar K.V
  - Optimise FP/VMX/VSX context switching from Anton Blanchard
 
  - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica Gupta,
    Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling, Andrew Donnellan
  - Allow wrapper to work on non-english system from Laurent Vivier
  - Add rN aliases to the pt_regs_offset table from Rashmica Gupta
  - Fix module autoload for rackmeter & axonram drivers from Luis de Bethencourt
  - Include KVM guest test in all interrupt vectors from Paul Mackerras
  - Fix DSCR inheritance over fork() from Anton Blanchard
  - Make value-returning atomics & {cmp}xchg* & their atomic_ versions fully ordered from Boqun Feng
  - Print MSR TM bits in oops messages from Michael Neuling
  - Add TM signal return & invalid stack selftests from Michael Neuling
  - Limit EPOW reset event warnings from Vipin K Parashar
  - Remove the Cell QPACE code from Rashmica Gupta
  - Append linux_banner to exception information in xmon from Rashmica Gupta
  - Add selftest to check if VSRs are corrupted from Rashmica Gupta
  - Remove broken GregorianDay() from Daniel Axtens
  - Import Anton's context_switch2 benchmark into selftests from Michael Ellerman
  - Add selftest script to test HMI functionality from Daniel Axtens
  - Remove obsolete OPAL v2 support from Stewart Smith
  - Make enter_rtas() private from Michael Ellerman
  - PPR exception cleanups from Michael Ellerman
  - Add page soft dirty tracking from Laurent Dufour
  - Add support for Nvlink NPUs from Alistair Popple
  - Add support for kexec on 476fpe from Alistair Popple
  - Enable kernel CPU dlpar from sysfs from Nathan Fontenot
  - Copy only required pieces of the mm_context_t to the paca from Michael Neuling
  - Add a kmsg_dumper that flushes OPAL console output on panic from Russell Currey
  - Implement save_stack_trace_regs() to enable kprobe stack tracing from Steven Rostedt
  - Add HWCAP bits for Power9 from Michael Ellerman
  - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V
  - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins
  - scripts/recordmcount.pl: support data in text section on powerpc from Ulrich Weigand
  - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand
 
  - cxl: Fix possible idr warning when contexts are released from Vaibhav Jain
  - cxl: use correct operator when writing pcie config space values from Andrew Donnellan
  - cxl: Fix DSI misses when the context owning task exits from Vaibhav Jain
  - cxl: fix build for GCC 4.6.x from Brian Norris
  - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris
  - cxl: Enable PCI device ID for future IBM CXL adapter from Uma Krishnan
 
  - Freescale updates from Scott: Highlights include moving QE code out of
    arch/powerpc (to be shared with arm), device tree updates, and minor fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWmIxeAAoJEFHr6jzI4aWAA+cQAIXAw4WfVWJ2V4ZK+1eKfB57
 fdXG71PuXG+WYIWy71ly8keLHdzzD1NQ2OUB64bUVRq202nRgVc15ZYKRJ/FE/sP
 SkxaQ2AG/2kI2EflWshOi0Lu9qaZ+LMHJnszIqE/9lnGSB2kUI/cwsSXgziiMKXR
 XNci9v14SdDd40YV/6BSZXoxApwyq9cUbZ7rnzFLmz4hrFuKmB/L3LABDF8QcpH7
 sGt/YaHGOtqP0UX7h5KQTFLGe1OPvK6NWixSXeZKQ71ED6cho1iKUEOtBA9EZeIN
 QM5JdHFWgX8MMRA0OHAgidkSiqO38BXjmjkVYWoIbYz7Zax3ThmrDHB4IpFwWnk3
 l7WBykEXY7KEqpZzbh0GFGehZWzVZvLnNgDdvpmpk/GkPzeYKomBj7ZZfm3H1yGD
 BTHPwuWCTX+/K75yEVNO8aJO12wBg7DRl4IEwBgqhwU8ga4FvUOCJkm+SCxA1Dnn
 qlpS7qPwTXNIEfKMJcxp5X0KiwDY1EoOotd4glTN0jbeY5GEYcxe+7RQ302GrYxP
 zcc8EGLn8h6BtQvV3ypNHF5l6QeTW/0ZlO9c236tIuUQ5gQU39SQci7jQKsYjSzv
 BB1XdLHkbtIvYDkmbnr1elbeJCDbrWL9rAXRUTRyfuCzaFWTfZmfVNe8c8qwDMLk
 TUxMR/38aI7bLcIQjwj9
 =R5bX
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Core:
   - Ground work for the new Power9 MMU from Aneesh Kumar K.V
   - Optimise FP/VMX/VSX context switching from Anton Blanchard

  Misc:
   - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica
     Gupta, Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling,
     Andrew Donnellan
   - Allow wrapper to work on non-english system from Laurent Vivier
   - Add rN aliases to the pt_regs_offset table from Rashmica Gupta
   - Fix module autoload for rackmeter & axonram drivers from Luis de
     Bethencourt
   - Include KVM guest test in all interrupt vectors from Paul Mackerras
   - Fix DSCR inheritance over fork() from Anton Blanchard
   - Make value-returning atomics & {cmp}xchg* & their atomic_ versions
     fully ordered from Boqun Feng
   - Print MSR TM bits in oops messages from Michael Neuling
   - Add TM signal return & invalid stack selftests from Michael Neuling
   - Limit EPOW reset event warnings from Vipin K Parashar
   - Remove the Cell QPACE code from Rashmica Gupta
   - Append linux_banner to exception information in xmon from Rashmica
     Gupta
   - Add selftest to check if VSRs are corrupted from Rashmica Gupta
   - Remove broken GregorianDay() from Daniel Axtens
   - Import Anton's context_switch2 benchmark into selftests from
     Michael Ellerman
   - Add selftest script to test HMI functionality from Daniel Axtens
   - Remove obsolete OPAL v2 support from Stewart Smith
   - Make enter_rtas() private from Michael Ellerman
   - PPR exception cleanups from Michael Ellerman
   - Add page soft dirty tracking from Laurent Dufour
   - Add support for Nvlink NPUs from Alistair Popple
   - Add support for kexec on 476fpe from Alistair Popple
   - Enable kernel CPU dlpar from sysfs from Nathan Fontenot
   - Copy only required pieces of the mm_context_t to the paca from
     Michael Neuling
   - Add a kmsg_dumper that flushes OPAL console output on panic from
     Russell Currey
   - Implement save_stack_trace_regs() to enable kprobe stack tracing
     from Steven Rostedt
   - Add HWCAP bits for Power9 from Michael Ellerman
   - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V
   - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins
   - scripts/recordmcount.pl: support data in text section on powerpc
     from Ulrich Weigand
   - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand

  cxl:
   - cxl: Fix possible idr warning when contexts are released from
     Vaibhav Jain
   - cxl: use correct operator when writing pcie config space values
     from Andrew Donnellan
   - cxl: Fix DSI misses when the context owning task exits from Vaibhav
     Jain
   - cxl: fix build for GCC 4.6.x from Brian Norris
   - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris
   - cxl: Enable PCI device ID for future IBM CXL adapter from Uma
     Krishnan

  Freescale:
   - Freescale updates from Scott: Highlights include moving QE code out
     of arch/powerpc (to be shared with arm), device tree updates, and
     minor fixes"

* tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (149 commits)
  powerpc/module: Handle R_PPC64_ENTRY relocations
  scripts/recordmcount.pl: support data in text section on powerpc
  powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages
  powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff
  powerpc/mm: Fix _PAGE_PTE breaking swapoff
  cxl: Enable PCI device ID for future IBM CXL adapter
  cxl: use -Werror only with CONFIG_PPC_WERROR
  cxl: fix build for GCC 4.6.x
  powerpc: Add HWCAP bits for Power9
  powerpc/powernv: Reserve PE#0 on NPU
  powerpc/powernv: Change NPU PE# assignment
  powerpc/powernv: Fix update of NVLink DMA mask
  powerpc/powernv: Remove misleading comment in pci.c
  powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing
  powerpc: Fix build break due to paca mm_context_t changes
  cxl: Fix DSI misses when the context owning task exits
  MAINTAINERS: Update Scott Wood's e-mail address
  powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()
  powerpc: Fix style of self-test config prompts
  powerpc/powernv: Only delay opal_rtc_read() retry when necessary
  ...
2016-01-15 13:18:47 -08:00
Andrew Donnellan
dc3799bb9a powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery()
Fix off-by-one error in opal_mce_check_early_recovery() when checking
whether the NIP falls within OPAL space.

Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-27 19:12:41 +11:00
Russell Currey
affddff69c powerpc/powernv: Add a kmsg_dumper that flushes console output on panic
On BMC machines, console output is controlled by the OPAL firmware and is
only flushed when its pollers are called.  When the kernel is in a panic
state, it no longer calls these pollers and thus console output does not
completely flush, causing some output from the panic to be lost.

Output is only actually lost when the kernel is configured to not power off
or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
flushes the console buffer as part of its power down routines.  Before this
patch, however, only partial output would be printed during the timeout wait.

This patch adds a new kmsg_dumper which gets called at panic time to ensure
panic output is not lost.  It accomplishes this by calling OPAL_CONSOLE_FLUSH
in the OPAL API, and if that is not available, the pollers are called enough
times to (hopefully) completely flush the buffer.

The flushing mechanism will only affect output printed at and before the
kmsg_dump call in kernel/panic.c:panic().  As such, the "end Kernel panic"
message may still be truncated as follows:

>Call Trace:
>[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable)
>[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4
>[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c
>[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254
>[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350
>[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130
>[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
>---[ end Kernel panic - not

This functionality is implemented as a kmsg_dumper as it seems to be the
most sensible way to introduce platform-specific functionality to the
panic function.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-27 19:12:40 +11:00
Stewart Smith
e4d54f71d2 powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPAL
Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is
just OPALv3, with nobody ever expecting anything on pre-OPALv3 to
be cared about or supported by mainline kernels.

So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL
exclusively.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:54 +11:00
Stewart Smith
7261aafc09 powerpc/powernv: Remove OPALv2 firmware define and references
OPALv2 only ever existed in the lab and didn't escape to the world.
All OPAL systems in the wild are OPALv3.

The probability of there being an OPALv2 system still powered on
anywhere inside IBM is approximately zero, let alone anyone
expecting to run mainline kernels.

So, start to remove references to OPALv2.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:54 +11:00
Stewart Smith
786842b62f powerpc/powernv: panic() on OPAL < V3
The OpenPower Abstraction Layer firmware went through a couple
of iterations in the lab before being released. What we now know
as OPAL advertises itself as OPALv3.

OPALv2 and OPALv1 never made it outside the lab, and the possibility
of anyone at all ever building a mainline kernel today and expecting
it to boot on such hardware is zero.

Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 22:40:53 +11:00
Stewart Smith
98da62b716 powerpc/powernv: pr_warn_once on unsupported OPAL_MSG type
When running on newer OPAL firmware that supports sending extra
OPAL_MSG types, we would print a warning on *every* message received.

This could be a problem for kernels that don't support OPAL_MSG_OCC
on machines that are running real close to thermal limits and the
OCC is throttling the chip. For a kernel that is paying attention to
the message queue, we could get these notifications quite often.

Conceivably, future message types could also come fairly often,
and printing that we didn't understand them 10,000 times provides
no further information than printing them once.

Cc: stable@vger.kernel.org
Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-12-17 20:42:13 +11:00
Daniel Axtens
f2dd80ecca powerpc/powernv: Panic on unhandled Machine Check
All unrecovered machine check errors on PowerNV should cause an
immediate panic. There are 2 reasons that this is the right policy:
it's not safe to continue, and we're already trying to reboot.

Firstly, if we go through the recovery process and do not successfully
recover, we can't be sure about the state of the machine, and it is
not safe to recover and proceed.

Linux knows about the following sources of Machine Check Errors:
- Uncorrectable Errors (UE)
- Effective - Real Address Translation (ERAT)
- Segment Lookaside Buffer (SLB)
- Translation Lookaside Buffer (TLB)
- Unknown/Unrecognised

In the SLB, TLB and ERAT cases, we can further categorise these as
parity errors, multihit errors or unknown/unrecognised.

We can handle SLB errors by flushing and reloading the SLB. We can
handle TLB and ERAT multihit errors by flushing the TLB. (It appears
we may not handle TLB and ERAT parity errors: I will investigate
further and send a followup patch if appropriate.)

This leaves us with uncorrectable errors. Uncorrectable errors are
usually the result of ECC memory detecting an error that it cannot
correct, but they also crop up in the context of PCI cards failing
during DMA writes, and during CAPI error events.

There are several types of UE, and there are 3 places a UE can occur:
Skiboot, the kernel, and userspace. For Skiboot errors, we have the
facility to make some recoverable. For userspace, we can simply kill
(SIGBUS) the affected process. We have no meaningful way to deal with
UEs in kernel space or in unrecoverable sections of Skiboot.

Currently, these unrecovered UEs fall through to
machine_check_expection() in traps.c, which calls die(), which OOPSes
and sends SIGBUS to the process. This sometimes allows us to stumble
onwards. For example we've seen UEs kill the kernel eehd and
khugepaged. However, the process killed could have held a lock, or it
could have been a more important process, etc: we can no longer make
any assertions about the state of the machine. Similarly if we see a
UE in skiboot (and again we've seen this happen), we're not in a
position where we can make any assertions about the state of the
machine.

Likewise, for unknown or unrecognised errors, we're not able to say
anything about the state of the machine.

Therefore, if we have an unrecovered MCE, the most appropriate thing
to do is to panic.

The second reason is that since e784b6499d ("powerpc/powernv: Invoke
opal_cec_reboot2() on unrecoverable machine check errors."), we
attempt a special OPAL reboot on an unhandled MCE. This is so the
hardware can record error data for later debugging.

The comments in that commit assert that we are heading down the panic
path anyway. At the moment this is not always true. With UEs in kernel
space, for instance, they are marked as recoverable by the hardware,
so if the attempt to reboot failed (e.g. old Skiboot), we wouldn't
panic() but would simply die() and OOPS. It doesn't make sense to be
staggering on if we've just tried to reboot: we should panic().

Explicitly panic() on unrecovered MCEs on PowerNV.
Update the comments appropriately.

This fixes some hangs following EEH events on cxlflash setups.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Reviewed-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-09 08:07:19 +11:00
Vasant Hegde
c159b5968e powerpc/powernv: Create LED platform device
This patch adds platform devices for leds. Also export LED related
OPAL API's so that led driver can use these APIs.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-20 18:19:07 +10:00
Mahesh Salgaonkar
e784b6499d powerpc/powernv: Invoke opal_cec_reboot2() on unrecoverable machine check errors.
On non-recoverable MCE errors in kernel space, Linux kernel panics
and system reboots. On BMC based system opal-prd runs as a daemon
in the host. Hence, kernel crash may prevent opal-prd to detect and
analyze this MCE error. This may land us in a situation where the faulty
memory never gets de-configured and Linux would keep hitting same MCE error
again and again. If this happens in early stage of kernel initialization,
then Linux will keep crashing and rebooting in a loop.

This patch fixes this issue by invoking new opal_cec_reboot2() call with
reboot type OPAL_REBOOT_PLATFORM_ERROR to inform BMC/OCC about this
error, so that BMC can collect relevant data for error analysis and
decide what component to de-configure before rebooting.

This patch is dependent on OPAL patchset posted on skiboot mailing list
at https://lists.ozlabs.org/pipermail/skiboot/2015-July/001771.html that
introduces opal_cec_reboot2() opal call.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-08-06 15:10:18 +10:00
Alistair Popple
02b6505c8f powerpc/powernv: Increase opal-irqchip initcall priority
The eeh subsystem for powernv requires the opal event irqchip to be
initialised prior to initialisation or the following errors are
produced (and eeh doesn't work as expected):

irq: XICS didn't like hwirq-0x9 to VIRQ17 mapping (rc=-22)
pnv_eeh_post_init: Can't request OPAL event interrupt (0)

On powernv eeh is initialised from a subsys_initcall due to a check
for machine_is(powernv) in eeh_init(). This patch increases the
initcall priority of opal_event_init() to an arch_initcall to ensure
the opal event interface is initialised prior to any users of it.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-18 07:16:01 +10:00
Jeremy Kerr
0d7cd8550d powerpc/powernv: Add opal-prd channel
This change adds a char device to access the "PRD" (processor runtime
diagnostics) channel to OPAL firmware.

Includes contributions from Vaidyanathan Srinivasan, Neelesh Gupta &
Vishal Kulkarni.

Signed-off-by: Neelesh Gupta <neelegup@linux.vnet.ibm.com>
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-05 08:32:21 +10:00
Jeremy Kerr
594fcb9ec9 powerpc/powernv: Expose OPAL APIs required by PRD interface
The (upcoming) opal-prd driver needs to access the message notifier and
xscom code, so add EXPORT_SYMBOL_GPL macros for these.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-05 08:32:20 +10:00
Jeremy Kerr
48c0615495 powerpc/powernv: Merge common platform device initialisation
opal_ipmi_init and opal_flash_init are equivalent, except for the
compatbile string. Merge these two into a common opal_pdev_init
function.

Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-06-05 08:32:20 +10:00