
37 Commits

Author SHA1 Message Date
Alistair Popple
b8d65e9662 powerpc/eeh-powernv: Fix unbalanced IRQ warning
pnv_eeh_next_error() re-enables the eeh opal event interrupt but it
gets called from a loop if there are more outstanding events to
process, resulting in a warning due to enabling an already enabled
interrupt. Instead the interrupt should only be re-enabled once the
last outstanding event has been processed.

Tested-by: Daniel Axtens <dja@axtens.net>
Reported-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-07-30 19:01:32 +10:00
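The unbalanced enable described in the commit above can be modelled in a few lines of user-space C. This is only a sketch: `struct fake_irq`, `fake_enable_irq()` and the two `drain_events_*()` helpers are hypothetical stand-ins, not the kernel's IRQ API or the actual pnv_eeh_next_error() code. It assumes the interrupt was disabled once before the event loop starts.

```c
/* Hypothetical model of an interrupt line with a disable depth. */
struct fake_irq {
    int disable_depth;        /* 0 means the interrupt is enabled */
    int unbalanced_warnings;  /* counts enables of an already enabled IRQ */
};

static void fake_enable_irq(struct fake_irq *irq)
{
    if (irq->disable_depth == 0) {
        /* Enabling an already enabled interrupt: the "unbalanced" case. */
        irq->unbalanced_warnings++;
        return;
    }
    irq->disable_depth--;
}

/* Problematic pattern: re-enable after every processed event. */
static int drain_events_buggy(struct fake_irq *irq, int pending)
{
    while (pending > 0) {
        pending--;
        fake_enable_irq(irq);
    }
    return irq->unbalanced_warnings;
}

/* Pattern the commit message describes: re-enable only after the last event. */
static int drain_events_fixed(struct fake_irq *irq, int pending)
{
    while (pending > 0) {
        pending--;
        if (pending == 0)
            fake_enable_irq(irq);
    }
    return irq->unbalanced_warnings;
}
```

With three outstanding events and a disable depth of one, the first variant enables the interrupt three times and the second enables it exactly once.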
Alistair Popple
79231448c9 powernv/eeh: Update the EEH code to use the opal irq domain
The eeh code currently uses the old notifier method to get eeh events
from OPAL. It also contains some logic to filter opal events which has
been moved into the virtual irqchip. This patch converts the eeh code
to the new event interface which simplifies event handling.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-22 15:14:38 +10:00
Wei Yang
e17866d559 powerpc/eeh: fix powernv_eeh_wait_state delay logic
As the comment indicates, powernv_eeh_get_state() will inform EEH core to
delay 1 second. This means the delay doesn't happen when
powernv_eeh_get_state() returns.

This patch moves the delay subtraction just before msleep(), which is the
same logic in pseries_eeh_wait_state().

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-13 14:00:07 +10:00
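A rough user-space sketch of the ordering described in the commit above: the remaining budget is checked first, and the suggested delay is subtracted only right before sleeping. `fake_get_state()`, `fake_msleep()`, the `STATE_*` values and `wait_state_sketch()` are all hypothetical; the real functions are powernv_eeh_wait_state() and pseries_eeh_wait_state(), whose exact code is not reproduced here.

```c
#define STATE_UNAVAILABLE (-1)  /* hypothetical: state not ready yet */
#define STATE_TIMED_OUT   (-2)  /* hypothetical: wait budget exhausted */
#define STATE_READY       1     /* hypothetical: state retrieved */

static int polls_until_ready;   /* how many polls report "unavailable" */
static int total_slept_ms;      /* time accumulated by fake_msleep() */

/* Stand-in for get_state(): asks the caller to delay 1 second. */
static int fake_get_state(int *delay_ms)
{
    if (polls_until_ready > 0) {
        polls_until_ready--;
        *delay_ms = 1000;
        return STATE_UNAVAILABLE;
    }
    *delay_ms = 0;
    return STATE_READY;
}

static void fake_msleep(int ms)
{
    total_slept_ms += ms;
}

static int wait_state_sketch(int max_wait_ms)
{
    int delay_ms;

    for (;;) {
        int state = fake_get_state(&delay_ms);

        if (state != STATE_UNAVAILABLE)
            return state;
        if (max_wait_ms <= 0)
            return STATE_TIMED_OUT;

        /* Account for the delay only when it is about to happen. */
        max_wait_ms -= delay_ms;
        fake_msleep(delay_ms);
    }
}
```

If the subtraction came before the budget check, a 1000 ms budget would already read as exhausted after the first poll, before any sleep had taken place.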
Wei Yang
2ac3990cc3 powerpc/eeh: fix comment for wait_state()
To retrieve the PCI slot state, the EEH driver sets a timeout. However,
the current comments do not match what the code does.

This patch fixes those comments according to the code.

Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-05-13 14:00:07 +10:00
Gavin Shan
0bd785873c powerpc/eeh: Replace device_node with pci_dn in eeh_ops
There are 3 EEH operations whose arguments contain device_node:
read_config(), write_config() and restore_config(). The patch
replaces device_node with pci_dn.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:52 +11:00
Gavin Shan
ff57b454dd powerpc/eeh: Do probe on pci_dn
Originally, EEH core probes on device_node or pci_dev to populate
EEH devices and PEs, which conflicts with the fact: SRIOV VFs are
usually enabled and created by PF's driver and they don't have the
corresponding device_nodes. Instead, SRIOV VFs have dynamically
created pci_dn, which can be used for EEH probe.

The patch reworks EEH probe for PowerNV and pSeries platforms to
do probing based on pci_dn, instead of pci_dev or device_node any
more.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:52 +11:00
Gavin Shan
3532a741f8 powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
The PCI config accessors previously relied on device_node.  Unfortunately,
VFs don't have a corresponding device_node, so change the accessors to use
pci_dn instead.

[bhelgaas: changelog]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-24 13:15:50 +11:00
Gavin Shan
cadf364d14 powerpc/powernv: Drop PHB operation reset()
The patch drops PHB EEH operation reset() and merges its logic to
eeh_ops::reset().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan
2a485ad7c8 powerpc/powernv: Drop PHB operation next_error()
The patch drops PHB EEH operation next_error() and merges its
logic to eeh_ops::next_error().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan
40ae5f693f powerpc/powernv: Drop PHB operation get_state()
The patch drops PHB EEH operation get_state() and merges its logic
to eeh_ops::get_state().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan
7e3e4f8d5e powerpc/powernv: Drop PHB operation set_option()
The patch drops PHB EEH operation set_option() and merges its
logic to eeh_ops::set_option().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan
bbe170ede1 powerpc/powernv: Drop PHB operation configure_bridge()
The patch drops PHB EEH operation configure_bridge() and merges
its logic to eeh_ops::configure_bridge().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan
95edcdeadf powerpc/powernv: Drop PHB operation get_log()
The patch drops PHB operation get_log() and merges its logic to
eeh_ops::get_log().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:19 +11:00
Gavin Shan
4cf1744558 powerpc/powernv: Drop PHB operation post_init()
The patch drops PHB EEH operation post_init() and merges its logic
to eeh_ops::post_init().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:18 +11:00
Gavin Shan
fa646c3cab powerpc/powernv: Drop PHB operation err_inject()
The patch drops PHB EEH operation err_inject() and merges its logic
to eeh_ops::err_inject().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:18 +11:00
Gavin Shan
01f3bfb780 powerpc/powernv: Shorten EEH function names
The patch shortens the names of EEH functions in powernv-eeh.c. No
logic change is introduced by this patch.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2015-03-17 10:31:18 +11:00
Gavin Shan
2aa5cf9e48 powerpc/eeh: Fix missed PE#0 on P7IOC
PE#0 should be regarded as valid for P7IOC, while it's invalid for
PHB3. The patch adds flag EEH_VALID_PE_ZERO to differentiate those
two cases. Without the patch, the frozen PE#0 state may be cleared
without EEH recovery being performed on P7IOC, as the following kernel
log indicates:

[root@ltcfbl8eb ~]# dmesg
       :
pci 0000:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0000:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0001:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0001:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0002:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0002:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:00     : [PE# 000] Secondary bus 0 associated with PE#0
pci 0003:01     : [PE# 001] Secondary bus 1 associated with PE#1
pci 0003:20     : [PE# 002] Secondary bus 32..63 associated with PE#2
       :
EEH: Clear non-existing PHB#3-PE#0
EEH: PHB location: U78AE.001.WZS00M9-P1-002

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:52 +11:00
Gavin Shan
179ea48bc7 powerpc/eeh: Block CFG upon frozen Shiner adapter
The Broadcom Shiner 2-port 10G Ethernet adapter has the same problem
that commit 6f20bda0 ("powerpc/eeh: Block PCI config access upon frozen
PE") fixes. Add it to the blacklist as well.

   # lspci -s 0004:01:00.0
   0004:01:00.0 Ethernet controller: Broadcom Corporation \
                NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
   # lspci -n -s 0004:01:00.0
   0004:01:00.0 0200: 14e4:168e (rev 10)

Reported-by: John Walthour <jwalthour@us.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-15 11:27:34 +11:00
Gavin Shan
b6541db139 powerpc/eeh: Block PCI config access upon frozen PE
The problem was found when I tried to inject a PCI config error through
the PHB3 PAPR error injection registers into a Broadcom Austin 4-port
NIC adapter. The frozen PE was reported successfully and EEH core
started to recover it. However, I ran into a fenced PHB when dumping
PCI config space for the EEH logs. I was told that PCI config requests
should not be propagated to the adapter until the PE reset has completed
successfully. Otherwise, we would run out of PHB internal credits
and trigger a PCT (PCIe Completion Timeout), which leads to the
fenced PHB.

The patch introduces another PE flag, EEH_PE_CFG_RESTRICTED, which
is set at PE initialization time if the PE includes specific PCI
devices that need PCI config access blocked until PE reset is done.
When the PE becomes frozen for the first time, EEH_PE_CFG_BLOCKED is
set if the PE has flag EEH_PE_CFG_RESTRICTED. Then the PCI config
access to the PE will be dropped by platform PCI accessors until
PE reset is done successfully. The mechanism is shared by PowerNV
platform owned PE or userland owned ones. It's not used on pSeries
platform yet.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-15 11:27:20 +11:00
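The two-flag mechanism in the commit above can be sketched as a small state machine. The flag names mirror the commit message, but the bit values, `struct pe_sketch`, the helper functions, and the choice of returning -1 with an all-ones value for a dropped read are assumptions made for illustration, not the kernel's definitions.

```c
#include <stdint.h>

/* Names follow the commit message; the bit values are made up. */
#define SKETCH_PE_CFG_RESTRICTED (1u << 0)
#define SKETCH_PE_CFG_BLOCKED    (1u << 1)

struct pe_sketch {
    unsigned int state;
};

/* When a PE freezes, block config access only if it is restricted. */
static void pe_sketch_freeze(struct pe_sketch *pe)
{
    if (pe->state & SKETCH_PE_CFG_RESTRICTED)
        pe->state |= SKETCH_PE_CFG_BLOCKED;
}

/* A successful PE reset lifts the block again. */
static void pe_sketch_reset_done(struct pe_sketch *pe)
{
    pe->state &= ~SKETCH_PE_CFG_BLOCKED;
}

/* Config read accessor: dropped while blocked, passed through otherwise. */
static int pe_sketch_cfg_read(const struct pe_sketch *pe,
                              uint32_t hw_value, uint32_t *val)
{
    if (pe->state & SKETCH_PE_CFG_BLOCKED) {
        *val = 0xFFFFFFFFu;  /* assumed "no data" value */
        return -1;           /* assumed error code */
    }
    *val = hw_value;
    return 0;
}
```

A PE without the restricted flag is unaffected by a freeze, which matches the message's point that only specific adapters need the block.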
Gavin Shan
d2cfbcd7c8 powerpc/powernv: Drop config requests in EEH accessors
It's a bad idea to access the PCI config registers of an adapter
that is being reset, as that leads to recursive EEH errors without
exception. The patch drops PCI config requests in the EEH accessors if
the PE has been marked as not accepting PCI config requests, for example
during PE reset.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-10-15 11:27:19 +11:00
Gavin Shan
131c123abe powerpc/eeh: Introduce eeh_ops::err_inject
The patch introduces eeh_ops::err_inject(), which allows specified
errors to be injected into the indicated PE for testing purposes. The
functionality isn't supported on the pSeries platform. On PowerNV, it
relies on the OPAL API opal_pci_err_inject().

Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2014-09-30 17:15:10 +10:00
Gavin Shan
bb593c0049 powerpc/eeh: Aux PE data for error log
The patch allows a PE (struct eeh_pe) instance to have auxiliary data,
whose size is configurable per platform. For PowerNV, the
auxiliary data will be used to cache PHB diag-data for that PE
(frozen PE or fenced PHB). In turn, we can retrieve the diag-data
at any later point.

It's useful for VFIO PCI devices, where the error log
should be cached and then retrieved by the guest at a later point.
It also avoids overwriting the PHB diag-data if another frozen PE is
reported before the previous diag-data has been fetched by the guest.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:43 +10:00
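One way to picture a PE carrying platform-sized auxiliary data is a structure with a trailing buffer whose length is chosen at allocation time. The sketch below uses a C99 flexible array member. `struct pe_aux_sketch` and both helpers are hypothetical, and the layout of the real struct eeh_pe may well differ.

```c
#include <stdlib.h>
#include <string.h>

struct pe_aux_sketch {
    size_t aux_size;      /* platform-chosen capacity of aux[] */
    size_t aux_used;      /* bytes of cached diag-data */
    unsigned char aux[];  /* auxiliary data, e.g. cached PHB diag-data */
};

/* Allocate a PE with room for aux_size bytes of auxiliary data. */
static struct pe_aux_sketch *pe_aux_sketch_alloc(size_t aux_size)
{
    struct pe_aux_sketch *pe = calloc(1, sizeof(*pe) + aux_size);

    if (pe)
        pe->aux_size = aux_size;
    return pe;
}

/* Cache diag-data in the PE, truncating to the available space. */
static size_t pe_aux_sketch_cache(struct pe_aux_sketch *pe,
                                  const void *diag, size_t len)
{
    if (len > pe->aux_size)
        len = pe->aux_size;
    memcpy(pe->aux, diag, len);
    pe->aux_used = len;
    return len;
}
```

Because each PE owns its own buffer, a later error on a different PE does not overwrite data that has not been fetched yet.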
Gavin Shan
0dae27439a powerpc/eeh: Replace pr_warning() with pr_warn()
pr_warn() is equal to pr_warning(), but the former is a bit more
formal according to commit fc62f2f ("kernel.h: add pr_warn for
symmetry to dev_warn, netdev_warn").

The patch replaces pr_warning() with pr_warn().

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:34 +10:00
Gavin Shan
dc561fb9e7 powerpc/eeh: Selectively enable IO for error log
According to the experiment I did, PCI config access is blocked
by hardware on a frozen PE on P7IOC, but PHB3 doesn't do that. That
means we always get 0xFF's while dumping the PCI config space of a
frozen PE on P7IOC. We don't have the problem on PHB3. So we have
to enable I/O prior to collecting the error log. Otherwise, meaningless
0xFF's are always returned.

The patch fixes it by EEH flag (EEH_ENABLE_IO_FOR_LOG), which is
selectively set to indicate the case for: P7IOC on PowerNV platform,
pSeries platform.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:25 +10:00
Gavin Shan
05b1721d9f powerpc/eeh: Refactor EEH flag accessors
There are multiple global EEH flags. Almost each flag has its own
accessor, which doesn't make sense. The patch refactors EEH flag
accessors so that they look unified:

  eeh_add_flag():   Add EEH flag
  eeh_clear_flag(): Clear EEH flag
  eeh_has_flag():   Check if one specific flag has been set
  eeh_enabled():    Check if EEH functionality has been enabled

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:41:21 +10:00
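The unified accessors listed in the commit above amount to bit operations on a single flags word. The following user-space sketch keeps the four function names from the commit message; the flag constants, their values, and the exact rule used by eeh_enabled() are assumptions for illustration only.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's names and values may differ. */
#define SKETCH_EEH_ENABLED        0x01
#define SKETCH_EEH_FORCE_DISABLED 0x02

static int eeh_flags_sketch;  /* stand-in for the global EEH flags word */

static void eeh_add_flag(int flag)
{
    eeh_flags_sketch |= flag;
}

static void eeh_clear_flag(int flag)
{
    eeh_flags_sketch &= ~flag;
}

static bool eeh_has_flag(int flag)
{
    return (eeh_flags_sketch & flag) != 0;
}

/* Assumed rule: enabled unless a force-disable flag overrides it. */
static bool eeh_enabled(void)
{
    return eeh_has_flag(SKETCH_EEH_ENABLED) &&
           !eeh_has_flag(SKETCH_EEH_FORCE_DISABLED);
}
```

Callers then test or change any flag through the same three helpers instead of one bespoke accessor per flag.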
Mike Qiu
dadcd6d6e7 powerpc/eeh: sysfs entries lost
The sysfs entries are lost because of commit 2213fb1 ("powerpc/eeh:
Skip eeh sysfs when eeh is disabled"). That commit added condition
to create sysfs entries with EEH_ENABLED, which isn't populated
when trying to create sysfs entries on PowerNV platform during system
boot time. The patch fixes the issue by:

   * Reorder EEH initialization functions so that they're the same on
     PowerNV/pSeries.
   * Cache PE's primary bus by PowerNV platform instead of EEH core
     to avoid kernel crash caused by the function reorder. Another
     benefit with this is to avoid one eeh_probe_mode_dev() in EEH
     core.

Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-05 15:28:49 +10:00
Michael Ellerman
b14726c51c powerpc/powernv: Switch powernv drivers to use machine_xxx_initcall()
A lot of the code in platforms/powernv is using non-machine initcalls.
That means if a kernel built with powernv support runs on another
platform, for example pseries, the initcalls will still run.

That is usually OK, because the initcalls will check for something in
the device tree or elsewhere before doing anything, so on other
platforms they will usually just return.

But it's fishy for powernv code to be running on other platforms, so
switch them all to be machine initcalls. If we want any of them to run
on other platforms in future they should move to sysdev.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-07-28 14:11:26 +10:00
Gavin Shan
2a18dfc6ee powerpc/eeh: Use cached capability for log dump
When calling into eeh_gather_pci_data() on the pSeries platform, we
possibly don't have a pci_dev instance yet, but eeh_dev is always
ready. So we use the cached capability from eeh_dev instead of pci_dev
for the log dump there. In order to keep things unified, we also cache
PCI capability positions in eeh_dev for PowerNV.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-04-28 17:34:19 +10:00
Gavin Shan
2ec5a0adf6 powerpc/eeh: Cleanup on eeh_subsystem_enabled
The patch cleans up the variable eeh_subsystem_enabled so that it
needn't be referenced directly from outside. Instead, the functions
eeh_enabled() and eeh_set_enable() are used to operate on the variable.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-17 11:19:39 +11:00
Gavin Shan
9be3becc2f powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space
The patch implements the EEH operation backend restore_config()
for PowerNV platform. That relies on OPAL API opal_pci_reinit()
where we reinitialize the error reporting properly after PE or
PHB reset.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:57:43 +11:00
Gavin Shan
1d350544d5 powerpc/eeh: Add restore_config operation
After a reset of a specific PE or PHB, AER is never configured
correctly on the PowerNV platform. This is not a concern on the pSeries
platform. The patch introduces an additional EEH operation,
eeh_ops::restore_config(), so that we have a chance to configure AER
correctly on the PowerNV platform.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-01-15 13:46:46 +11:00
Gavin Shan
20bb842b9b powerpc/powernv: Enable EEH for PHB3
The EEH isn't enabled for PHB3 and the patch intends to enable it.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-10-11 16:53:56 +11:00
Gavin Shan
ab55d2187d powerpc/eeh: Introduce flag to protect sysfs
The patch introduces the flag EEH_DEV_SYSFS to keep track of whether the
sysfs entries for the corresponding EEH device (and thus PCI device) have
been added or removed, in order to avoid a race condition.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24 14:18:49 +10:00
Gavin Shan
4b83bd452f powerpc/eeh: Don't use pci_dev during BAR restore
While restoring BARs for one specific PCI device, the pci_dev
instance might already have been released, so it's not reliable to use
the pci_dev instance when restoring BARs. However, we still need
some information (e.g. PCIe capability position, header type) from
the pci_dev instance. So we have to store that information in the
EEH device in advance.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24 14:18:49 +10:00
Gavin Shan
f5c57710dd powerpc/eeh: Use partial hotplug for EEH unaware drivers
When an EEH error happens to one specific PE, devices with drivers
supporting EEH won't expect hotplug on the device. However, there
might be other devices without a driver, or with a driver without EEH
support. In that case, we need to do partial hotplug in order to make
sure that the PE becomes absolutely quiet during reset. Otherwise,
the PE reset might fail and lead to failure of error recovery.

The current code doesn't handle that 'mixed' case properly, it either
uses the error callbacks to the drivers, or tries hotplug, but doesn't
handle a PE (EEH domain) composed of a combination of the two.

The patch intends to support so-called "partial" hotplug for EEH:
Before we do reset, we stop and remove those PCI devices without
EEH sensitive driver. The corresponding EEH devices are not detached
from their PE, but are marked with a special flag. After the reset is
done, those EEH devices with the special flag will be scanned one by one.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-24 14:18:48 +10:00
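The "partial" hotplug flow in the commit above can be outlined as two passes over the devices of a PE: one before the reset that removes devices lacking an EEH-aware driver and flags them, and one after the reset that rescans only the flagged devices. Everything in this sketch (`struct dev_sketch` and both helpers) is hypothetical and leaves out the real PCI removal and rescan work.

```c
#include <stdbool.h>
#include <stddef.h>

struct dev_sketch {
    bool eeh_aware_driver;   /* driver implements EEH error callbacks */
    bool present;            /* PCI device currently exists */
    bool removed_for_reset;  /* the "special flag" from the message */
};

/* Before reset: remove devices without an EEH-aware driver and flag them. */
static int remove_unaware_before_reset(struct dev_sketch *devs, size_t n)
{
    int removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].present && !devs[i].eeh_aware_driver) {
            devs[i].present = false;
            devs[i].removed_for_reset = true;
            removed++;
        }
    }
    return removed;
}

/* After reset: rescan only the devices that carry the flag. */
static int rescan_flagged_after_reset(struct dev_sketch *devs, size_t n)
{
    int restored = 0;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].removed_for_reset) {
            devs[i].present = true;
            devs[i].removed_for_reset = false;
            restored++;
        }
    }
    return restored;
}
```

Devices with EEH-aware drivers stay in place throughout and would be handled through their error callbacks instead.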
Gavin Shan
9bf41be673 powerpc/powernv: Use dev-node in PCI config accessors
Currently, we're using the combo (PCI bus + devfn) in the PCI
config accessors, and the PCI config accessors in EEH depend on them.
However, it's not safe to refer to a PCI bus which might have been
removed during hotplug. So we use the device node in the PCI
config accessors, and the corresponding backends just reuse them.

The patch also fixes one potential risk: we possibly have a frozen
PE during early PCI probe time, when we haven't set up the PE
mapping yet. In that case the errors should be counted against PE#0.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-07-01 11:10:33 +10:00
Gavin Shan
29310e5e86 powerpc/eeh: PowerNV EEH backends
The patch adds EEH backends for the PowerNV platform. Note that
some of those EEH backends call into I/O chip dependent backends.

[Removed pointless change to eeh_pseries.c -- BenH]

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-20 17:06:43 +10:00