linux_dsm_epyc7002/drivers/pci
Gabriele Paoloni 86acc79071 PCI/AER: Report non-fatal errors only to the affected endpoint
Previously, if a non-fatal error was reported by an endpoint, we
called report_error_detected() for the endpoint, every sibling on the
bus, and their descendants.  If any of them did not implement the
.error_detected() method, do_recovery() failed, leaving all these
devices unrecovered.

For example, the system described in the bugzilla below has two devices:

  0000:74:02.0 [19e5:a230] SAS controller, driver has .error_detected()
  0000:74:03.0 [19e5:a235] SATA controller, driver lacks .error_detected()

When a device such as 74:02.0 reported a non-fatal error, do_recovery()
failed because 74:03.0 lacked an .error_detected() method.  But per PCIe
r3.1, sec 6.2.2.2.2, such an error does not compromise the Link and
does not affect 74:03.0:

  Non-fatal errors are uncorrectable errors which cause a particular
  transaction to be unreliable but the Link is otherwise fully functional.
  Isolating Non-fatal from Fatal errors provides Requester/Receiver logic
  in a device or system management software the opportunity to recover from
  the error without resetting the components on the Link and disturbing
  other transactions in progress.  Devices not associated with the
  transaction in error are not impacted by the error.

Report non-fatal errors only to the endpoint that reported them.  We really
want to check for AER_NONFATAL here, but the current code structure doesn't
allow that.  Looking for pci_channel_io_normal is the best we can do now.
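
The behavior change described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel patch itself: struct model_dev, struct model_bus, report_one(), recover_walk_bus(), recover() and the CHANNEL_IO_* constants are hypothetical stand-ins for the kernel's pci_dev, pci_bus, report_error_detected(), the bus walk, do_recovery() and pci_channel_io_normal. The model assumes, as the changelog states, that a device whose driver lacks .error_detected() makes recovery fail when it is included in the notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; NOT the real definitions. */
enum channel_state { CHANNEL_IO_NORMAL, CHANNEL_IO_FROZEN };

struct model_dev {
	const char *name;
	bool has_error_detected;	/* driver implements .error_detected()? */
	int notified;			/* number of times the callback ran */
};

struct model_bus {
	struct model_dev *devs;
	size_t ndevs;
};

/* Notify one device; returns false if it cannot take part in recovery. */
static bool report_one(struct model_dev *dev)
{
	assert(dev != NULL);
	if (!dev->has_error_detected)
		return false;
	dev->notified++;
	return true;
}

/* Old behavior: notify every device on the bus; any failure fails all. */
static bool recover_walk_bus(struct model_bus *bus)
{
	bool ok = true;

	for (size_t i = 0; i < bus->ndevs; i++)
		if (!report_one(&bus->devs[i]))
			ok = false;
	return ok;
}

/*
 * New behavior: when the channel state is "normal" (used here as the
 * proxy for a non-fatal error), notify only the reporting endpoint.
 * Any other state still walks the whole bus.
 */
static bool recover(struct model_bus *bus, struct model_dev *reporter,
		    enum channel_state state)
{
	if (state == CHANNEL_IO_NORMAL)
		return report_one(reporter);
	return recover_walk_bus(bus);
}
```

With the two devices from the example (74:02.0 with a handler, 74:03.0 without), recover_walk_bus() fails, while recover() with CHANNEL_IO_NORMAL and 74:02.0 as the reporter succeeds and never touches 74:03.0.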

Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055
Fixes: 6c2b374d74 ("PCI-Express AER implemetation: AER core and aerdriver")
Signed-off-by: Gabriele Paoloni <gabriele.paoloni@huawei.com>
Signed-off-by: Dongdong Liu <liudongdong3@huawei.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2017-10-05 15:49:31 -05:00
dwc Merge branch 'pci/trivial' into next 2017-09-07 13:24:20 -05:00
endpoint PCI: endpoint: Use correct "end of test" interrupt 2017-09-20 13:56:06 -05:00
host pci-v4.14-changes 2017-09-08 15:47:43 -07:00
hotplug Merge branch 'pci/misc' into next 2017-09-07 13:24:16 -05:00
pcie PCI/AER: Report non-fatal errors only to the affected endpoint 2017-10-05 15:49:31 -05:00
switch Merge branch 'pci/switchtec' into next 2017-07-02 18:51:10 -05:00
access.c PCI: Provide Kconfig option for lockless config space accessors 2017-06-28 22:32:56 +02:00
ats.c PCI: Restore PRI and PASID state after Function-Level Reset 2017-05-30 15:40:50 -05:00
bus.c
ecam.c
host-bridge.c
hotplug-pci.c
htirq.c
iov.c PCI: Disable VF decoding before pcibios_sriov_disable() updates resources 2017-08-29 17:24:02 -05:00
irq.c
Kconfig PCI: Provide Kconfig option for lockless config space accessors 2017-06-28 22:32:56 +02:00
Makefile PCI: Build setup-irq.o on all arches 2017-07-02 16:14:27 -05:00
mmap.c
msi.c pci-v4.14-changes 2017-09-08 15:47:43 -07:00
of.c
pci-acpi.c ACPI / PCI / PM: Rework acpi_pci_propagate_wakeup() 2017-08-01 14:05:03 +02:00
pci-driver.c ACPI updates for v4.14-rc1 2017-09-05 12:45:03 -07:00
pci-label.c PCI: Constify label attribute_group structures 2017-08-10 15:21:41 -05:00
pci-mid.c PCI / PM: Simplify device wakeup settings code 2017-06-28 01:52:45 +02:00
pci-stub.c
pci-sysfs.c PCI: Fix race condition with driver_override 2017-09-25 18:34:54 -05:00
pci.c Revert "PCI: Avoid race while enabling upstream bridges" 2017-09-15 01:33:51 -05:00
pci.h PCI: Mark Broadcom HT2100 Root Port Extended Tags as broken 2017-07-31 14:31:22 -05:00
probe.c pci-v4.14-changes 2017-09-08 15:47:43 -07:00
proc.c
quirks.c dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
remove.c
rom.c
search.c
setup-bus.c
setup-irq.c PCI: Inline and remove pcibios_update_irq() 2017-08-10 12:49:57 -05:00
setup-res.c PCI: Add a generic weak pcibios_align_resource() 2017-08-02 14:53:16 -05:00
slot.c
syscall.c
vc.c
vpd.c
xen-pcifront.c