mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-21 23:01:04 +07:00
12833017e5
Add sysfs attributes for rootport statistics (that are cumulative of all the ERR_* messages seen on this PCI hierarchy). Signed-off-by: Rajat Jain <rajatja@google.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
123 lines
4.6 KiB
Plaintext
123 lines
4.6 KiB
Plaintext
==========================
|
|
PCIe Device AER statistics
|
|
==========================
|
|
These attributes show up under all the devices that are AER capable. These
|
|
statistical counters indicate the errors "as seen/reported by the device".
|
|
Note that this may mean that if an endpoint is causing problems, the AER
|
|
counters may increment at its link partner (e.g. root port) because the
|
|
errors may be "seen" / reported by the link partner and not the
|
|
problematic endpoint itself (which may report all counters as 0 as it never
|
|
saw any problems).
|
|
|
|
Where: /sys/bus/pci/devices/<dev>/aer_dev_correctable
|
|
Date: July 2018
|
|
Kernel Version: 4.19.0
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
Description: List of correctable errors seen and reported by this
|
|
PCI device using ERR_COR. Note that since multiple errors may
|
|
be reported using a single ERR_COR message, thus
|
|
TOTAL_ERR_COR at the end of the file may not match the actual
|
|
total of all the errors in the file. Sample output:
|
|
-------------------------------------------------------------------------
|
|
localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_correctable
|
|
Receiver Error 2
|
|
Bad TLP 0
|
|
Bad DLLP 0
|
|
RELAY_NUM Rollover 0
|
|
Replay Timer Timeout 0
|
|
Advisory Non-Fatal 0
|
|
Corrected Internal Error 0
|
|
Header Log Overflow 0
|
|
TOTAL_ERR_COR 2
|
|
-------------------------------------------------------------------------
|
|
|
|
Where: /sys/bus/pci/devices/<dev>/aer_dev_fatal
|
|
Date: July 2018
|
|
Kernel Version: 4.19.0
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
Description: List of uncorrectable fatal errors seen and reported by this
|
|
PCI device using ERR_FATAL. Note that since multiple errors may
|
|
be reported using a single ERR_FATAL message, thus
|
|
TOTAL_ERR_FATAL at the end of the file may not match the actual
|
|
total of all the errors in the file. Sample output:
|
|
-------------------------------------------------------------------------
|
|
localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_fatal
|
|
Undefined 0
|
|
Data Link Protocol 0
|
|
Surprise Down Error 0
|
|
Poisoned TLP 0
|
|
Flow Control Protocol 0
|
|
Completion Timeout 0
|
|
Completer Abort 0
|
|
Unexpected Completion 0
|
|
Receiver Overflow 0
|
|
Malformed TLP 0
|
|
ECRC 0
|
|
Unsupported Request 0
|
|
ACS Violation 0
|
|
Uncorrectable Internal Error 0
|
|
MC Blocked TLP 0
|
|
AtomicOp Egress Blocked 0
|
|
TLP Prefix Blocked Error 0
|
|
TOTAL_ERR_FATAL 0
|
|
-------------------------------------------------------------------------
|
|
|
|
Where: /sys/bus/pci/devices/<dev>/aer_dev_nonfatal
|
|
Date: July 2018
|
|
Kernel Version: 4.19.0
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
Description: List of uncorrectable nonfatal errors seen and reported by this
|
|
PCI device using ERR_NONFATAL. Note that since multiple errors
|
|
may be reported using a single ERR_FATAL message, thus
|
|
TOTAL_ERR_NONFATAL at the end of the file may not match the
|
|
actual total of all the errors in the file. Sample output:
|
|
-------------------------------------------------------------------------
|
|
localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_nonfatal
|
|
Undefined 0
|
|
Data Link Protocol 0
|
|
Surprise Down Error 0
|
|
Poisoned TLP 0
|
|
Flow Control Protocol 0
|
|
Completion Timeout 0
|
|
Completer Abort 0
|
|
Unexpected Completion 0
|
|
Receiver Overflow 0
|
|
Malformed TLP 0
|
|
ECRC 0
|
|
Unsupported Request 0
|
|
ACS Violation 0
|
|
Uncorrectable Internal Error 0
|
|
MC Blocked TLP 0
|
|
AtomicOp Egress Blocked 0
|
|
TLP Prefix Blocked Error 0
|
|
TOTAL_ERR_NONFATAL 0
|
|
-------------------------------------------------------------------------
|
|
|
|
============================
|
|
PCIe Rootport AER statistics
|
|
============================
|
|
These attributes show up under only the rootports (or root complex event
|
|
collectors) that are AER capable. These indicate the number of error messages as
|
|
"reported to" the rootport. Please note that the rootports also transmit
|
|
(internally) the ERR_* messages for errors seen by the internal rootport PCI
|
|
device, so these counters include them and are thus cumulative of all the error
|
|
messages on the PCI hierarchy originating at that root port.
|
|
|
|
Where: /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_cor
|
|
Date: July 2018
|
|
Kernel Version: 4.19.0
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
Description: Total number of ERR_COR messages reported to rootport.
|
|
|
|
Where: /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_fatal
|
|
Date: July 2018
|
|
Kernel Version: 4.19.0
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
Description: Total number of ERR_FATAL messages reported to rootport.
|
|
|
|
Where: /sys/bus/pci/devices/<dev>/aer_stats/aer_rootport_total_err_nonfatal
|
|
Date: July 2018
|
|
Kernel Version: 4.19.0
|
|
Contact: linux-pci@vger.kernel.org, rajatja@google.com
|
|
Description: Total number of ERR_NONFATAL messages reported to rootport.
|