mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git (synced 2024-11-24 06:50:58 +07:00)
Merge tag 'edac_for_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:
 "A lot of changes this time around, details below.

  From the next cycle onwards, we'll switch the EDAC tree to topic
  branches (instead of a single edac-for-next branch) which should make
  the changes handling more flexible, hopefully. We'll see.

  Summary:

   - Rework error logging functions to accept a count of errors
     parameter (Hanna Hawa)

   - Part one of substantial EDAC core + ghes_edac driver cleanup
     (Robert Richter)

   - Print additional useful logging information in skx_* (Tony Luck)

   - Improve amd64_edac hw detection + cleanups (Yazen Ghannam)

   - Misc cleanups, fixes and code improvements"

* tag 'edac_for_5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (35 commits)
  EDAC/altera: Use the Altera System Manager driver
  EDAC/altera: Cleanup the ECC Manager
  EDAC/altera: Use fast register IO for S10 IRQs
  EDAC/ghes: Do not warn when incrementing refcount on 0
  EDAC/Documentation: Describe CPER module definition and DIMM ranks
  EDAC: Unify the mc_event tracepoint call
  EDAC/ghes: Remove intermediate buffer pvt->detail_location
  EDAC/ghes: Fix grain calculation
  EDAC/ghes: Use standard kernel macros for page calculations
  EDAC: Remove misleading comment in struct edac_raw_error_desc
  EDAC/mc: Reduce indentation level in edac_mc_handle_error()
  EDAC/mc: Remove needless zero string termination
  EDAC/mc: Do not BUG_ON() in edac_mc_alloc()
  EDAC: Introduce an mci_for_each_dimm() iterator
  EDAC: Remove EDAC_DIMM_OFF() macro
  EDAC: Replace EDAC_DIMM_PTR() macro with edac_get_dimm() function
  EDAC/amd64: Get rid of the ECC disabled long message
  EDAC/ghes: Fix locking and memory barrier issues
  EDAC/amd64: Check for memory before fully initializing an instance
  EDAC/amd64: Use cached data when checking for ECC
  ...
commit 9c91e6a5be
@@ -330,9 +330,12 @@ There can be multiple csrows and multiple channels.
 
 .. [#f4] Nowadays, the term DIMM (Dual In-line Memory Module) is widely
    used to refer to a memory module, although there are other memory
-   packaging alternatives, like SO-DIMM, SIMM, etc. Along this document,
-   and inside the EDAC system, the term "dimm" is used for all memory
-   modules, even when they use a different kind of packaging.
+   packaging alternatives, like SO-DIMM, SIMM, etc. The UEFI
+   specification (Version 2.7) defines a memory module in the Common
+   Platform Error Record (CPER) section to be an SMBIOS Memory Device
+   (Type 17). Along this document, and inside the EDAC subsystem, the term
+   "dimm" is used for all memory modules, even when they use a
+   different kind of packaging.
 
 Memory controllers allow for several csrows, with 8 csrows being a
 typical value. Yet, the actual number of csrows depends on the layout of
@@ -349,12 +352,14 @@ controllers. The following example will assume 2 channels:
 |            |  ``ch0``  |  ``ch1``  |
 +============+===========+===========+
 | ``csrow0`` |  DIMM_A0  |  DIMM_B0  |
-+------------+           |           |
-| ``csrow1`` |           |           |
+|            |   rank0   |   rank0   |
++------------+     -     |     -     |
+| ``csrow1`` |   rank1   |   rank1   |
 +------------+-----------+-----------+
 | ``csrow2`` |  DIMM_A1  |  DIMM_B1  |
-+------------+           |           |
-| ``csrow3`` |           |           |
+|            |   rank0   |   rank0   |
++------------+     -     |     -     |
+| ``csrow3`` |   rank1   |   rank1   |
 +------------+-----------+-----------+
 
 In the above example, there are 4 physical slots on the motherboard
@@ -374,11 +379,13 @@ which the memory DIMM is placed. Thus, when 1 DIMM is placed in each
 Channel, the csrows cross both DIMMs.
 
 Memory DIMMs come single or dual "ranked". A rank is a populated csrow.
-Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above
-will have just one csrow (csrow0). csrow1 will be empty. On the other
-hand, when 2 dual ranked DIMMs are similarly placed, then both csrow0
-and csrow1 will be populated. The pattern repeats itself for csrow2 and
-csrow3.
+In the example above 2 dual ranked DIMMs are similarly placed. Thus,
+both csrow0 and csrow1 are populated. On the other hand, when 2 single
+ranked DIMMs are placed in slots DIMM_A0 and DIMM_B0, then they will
+have just one csrow (csrow0) and csrow1 will be empty. The pattern
+repeats itself for csrow2 and csrow3. Also note that some memory
+controllers don't have any logic to identify the memory module, see
+``rankX`` directories below.
 
 The representation of the above is reflected in the directory
 tree in EDAC's sysfs interface. Starting in directory
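The csrow/rank layout described in the documentation hunks above can be sketched in plain C. This is only an illustration of the 2-channel, dual-ranked example table, not code from the EDAC core; `csrow_to_dimm()`, `format_dimm_label()` and `RANKS_PER_DIMM` are hypothetical names, and the mapping assumes every slot holds a dual-ranked module.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: every module is dual ranked, as in the example table. */
#define RANKS_PER_DIMM 2

struct dimm_location {
	char channel_letter;	/* 'A' for ch0, 'B' for ch1 */
	int slot;		/* slot index within the channel */
	int rank;		/* rank within the module */
};

/* Map a (csrow, channel) pair onto the slot and rank it belongs to. */
static struct dimm_location csrow_to_dimm(int csrow, int channel)
{
	struct dimm_location loc;

	assert(csrow >= 0 && channel >= 0 && channel < 26);
	loc.channel_letter = (char)('A' + channel);
	loc.slot = csrow / RANKS_PER_DIMM;	/* csrow0/1 -> slot 0, csrow2/3 -> slot 1 */
	loc.rank = csrow % RANKS_PER_DIMM;	/* even csrow -> rank0, odd csrow -> rank1 */
	return loc;
}

/* Render the location using the labels from the table, e.g. "DIMM_B1 rank1". */
static void format_dimm_label(char *buf, size_t len, int csrow, int channel)
{
	struct dimm_location loc = csrow_to_dimm(csrow, channel);

	snprintf(buf, len, "DIMM_%c%d rank%d",
		 loc.channel_letter, loc.slot, loc.rank);
}
```

With single-ranked modules the odd csrows would simply stay unpopulated, which is the case the reworded paragraph calls out.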
|
@ -14,6 +14,7 @@
|
||||
#include <linux/interrupt.h>
|
||||
#include <linux/irqchip/chained_irq.h>
|
||||
#include <linux/kernel.h>
|
||||
#include <linux/mfd/altera-sysmgr.h>
|
||||
#include <linux/mfd/syscon.h>
|
||||
#include <linux/notifier.h>
|
||||
#include <linux/of_address.h>
|
||||
@ -275,7 +276,6 @@ static int a10_unmask_irq(struct platform_device *pdev, u32 mask)
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int socfpga_is_a10(void);
|
||||
static int altr_sdram_probe(struct platform_device *pdev)
|
||||
{
|
||||
const struct of_device_id *id;
|
||||
@ -399,7 +399,7 @@ static int altr_sdram_probe(struct platform_device *pdev)
|
||||
goto err;
|
||||
|
||||
/* Only the Arria10 has separate IRQs */
|
||||
if (socfpga_is_a10()) {
|
||||
if (of_machine_is_compatible("altr,socfpga-arria10")) {
|
||||
/* Arria10 specific initialization */
|
||||
res = a10_init(mc_vbase);
|
||||
if (res < 0)
|
||||
@ -502,68 +502,6 @@ module_platform_driver(altr_sdram_edac_driver);
|
||||
|
||||
#endif /* CONFIG_EDAC_ALTERA_SDRAM */
|
||||
|
||||
/**************** Stratix 10 EDAC Memory Controller Functions ************/
|
||||
|
||||
/**
|
||||
* s10_protected_reg_write
|
||||
* Write to a protected SMC register.
|
||||
* @context: Not used.
|
||||
* @reg: Address of register
|
||||
* @value: Value to write
|
||||
* Return: INTEL_SIP_SMC_STATUS_OK (0) on success
|
||||
* INTEL_SIP_SMC_REG_ERROR on error
|
||||
* INTEL_SIP_SMC_RETURN_UNKNOWN_FUNCTION if not supported
|
||||
*/
|
||||
static int s10_protected_reg_write(void *context, unsigned int reg,
|
||||
unsigned int val)
|
||||
{
|
||||
struct arm_smccc_res result;
|
||||
unsigned long offset = (unsigned long)context;
|
||||
|
||||
arm_smccc_smc(INTEL_SIP_SMC_REG_WRITE, offset + reg, val, 0, 0,
|
||||
0, 0, 0, &result);
|
||||
|
||||
return (int)result.a0;
|
||||
}
|
||||
|
||||
/**
|
||||
* s10_protected_reg_read
|
||||
* Read the status of a protected SMC register
|
||||
* @context: Not used.
|
||||
* @reg: Address of register
|
||||
* @value: Value read.
|
||||
* Return: INTEL_SIP_SMC_STATUS_OK (0) on success
|
||||
* INTEL_SIP_SMC_REG_ERROR on error
|
||||
* INTEL_SIP_SMC_RETURN_UNKNOWN_FUNCTION if not supported
|
||||
*/
|
||||
static int s10_protected_reg_read(void *context, unsigned int reg,
|
||||
unsigned int *val)
|
||||
{
|
||||
struct arm_smccc_res result;
|
||||
unsigned long offset = (unsigned long)context;
|
||||
|
||||
arm_smccc_smc(INTEL_SIP_SMC_REG_READ, offset + reg, 0, 0, 0,
|
||||
0, 0, 0, &result);
|
||||
|
||||
*val = (unsigned int)result.a1;
|
||||
|
||||
return (int)result.a0;
|
||||
}
|
||||
|
||||
static const struct regmap_config s10_sdram_regmap_cfg = {
|
||||
.name = "s10_ddr",
|
||||
.reg_bits = 32,
|
||||
.reg_stride = 4,
|
||||
.val_bits = 32,
|
||||
.max_register = 0xffd12228,
|
||||
.reg_read = s10_protected_reg_read,
|
||||
.reg_write = s10_protected_reg_write,
|
||||
.use_single_read = true,
|
||||
.use_single_write = true,
|
||||
};
|
||||
|
||||
/************** </Stratix10 EDAC Memory Controller Functions> ***********/
|
||||
|
||||
/************************* EDAC Parent Probe *************************/
|
||||
|
||||
static const struct of_device_id altr_edac_device_of_match[];
|
||||
@ -1008,16 +946,6 @@ static int __maybe_unused altr_init_memory_port(void __iomem *ioaddr, int port)
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int socfpga_is_a10(void)
|
||||
{
|
||||
return of_machine_is_compatible("altr,socfpga-arria10");
|
||||
}
|
||||
|
||||
static int socfpga_is_s10(void)
|
||||
{
|
||||
return of_machine_is_compatible("altr,socfpga-stratix10");
|
||||
}
|
||||
|
||||
static __init int __maybe_unused
|
||||
altr_init_a10_ecc_block(struct device_node *np, u32 irq_mask,
|
||||
u32 ecc_ctrl_en_mask, bool dual_port)
|
||||
@ -1033,34 +961,10 @@ altr_init_a10_ecc_block(struct device_node *np, u32 irq_mask,
|
||||
/* Get the ECC Manager - parent of the device EDACs */
|
||||
np_eccmgr = of_get_parent(np);
|
||||
|
||||
if (socfpga_is_a10()) {
|
||||
ecc_mgr_map = syscon_regmap_lookup_by_phandle(np_eccmgr,
|
||||
"altr,sysmgr-syscon");
|
||||
} else {
|
||||
struct device_node *sysmgr_np;
|
||||
struct resource res;
|
||||
uintptr_t base;
|
||||
ecc_mgr_map =
|
||||
altr_sysmgr_regmap_lookup_by_phandle(np_eccmgr,
|
||||
"altr,sysmgr-syscon");
|
||||
|
||||
sysmgr_np = of_parse_phandle(np_eccmgr,
|
||||
"altr,sysmgr-syscon", 0);
|
||||
if (!sysmgr_np) {
|
||||
edac_printk(KERN_ERR, EDAC_DEVICE,
|
||||
"Unable to find altr,sysmgr-syscon\n");
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
if (of_address_to_resource(sysmgr_np, 0, &res)) {
|
||||
of_node_put(sysmgr_np);
|
||||
return -ENOMEM;
|
||||
}
|
||||
|
||||
/* Need physical address for SMCC call */
|
||||
base = res.start;
|
||||
|
||||
ecc_mgr_map = regmap_init(NULL, NULL, (void *)base,
|
||||
&s10_sdram_regmap_cfg);
|
||||
of_node_put(sysmgr_np);
|
||||
}
|
||||
of_node_put(np_eccmgr);
|
||||
if (IS_ERR(ecc_mgr_map)) {
|
||||
edac_printk(KERN_ERR, EDAC_DEVICE,
|
||||
@ -1125,9 +1029,6 @@ static int __init __maybe_unused altr_init_a10_ecc_device_type(char *compat)
|
||||
int irq;
|
||||
struct device_node *child, *np;
|
||||
|
||||
if (!socfpga_is_a10() && !socfpga_is_s10())
|
||||
return -ENODEV;
|
||||
|
||||
np = of_find_compatible_node(NULL, NULL,
|
||||
"altr,socfpga-a10-ecc-manager");
|
||||
if (!np) {
|
||||
@ -2178,33 +2079,9 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
|
||||
platform_set_drvdata(pdev, edac);
|
||||
INIT_LIST_HEAD(&edac->a10_ecc_devices);
|
||||
|
||||
if (socfpga_is_a10()) {
|
||||
edac->ecc_mgr_map =
|
||||
syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
|
||||
"altr,sysmgr-syscon");
|
||||
} else {
|
||||
struct device_node *sysmgr_np;
|
||||
struct resource res;
|
||||
uintptr_t base;
|
||||
|
||||
sysmgr_np = of_parse_phandle(pdev->dev.of_node,
|
||||
"altr,sysmgr-syscon", 0);
|
||||
if (!sysmgr_np) {
|
||||
edac_printk(KERN_ERR, EDAC_DEVICE,
|
||||
"Unable to find altr,sysmgr-syscon\n");
|
||||
return -ENODEV;
|
||||
}
|
||||
|
||||
if (of_address_to_resource(sysmgr_np, 0, &res))
|
||||
return -ENOMEM;
|
||||
|
||||
/* Need physical address for SMCC call */
|
||||
base = res.start;
|
||||
|
||||
edac->ecc_mgr_map = devm_regmap_init(&pdev->dev, NULL,
|
||||
(void *)base,
|
||||
&s10_sdram_regmap_cfg);
|
||||
}
|
||||
edac->ecc_mgr_map =
|
||||
altr_sysmgr_regmap_lookup_by_phandle(pdev->dev.of_node,
|
||||
"altr,sysmgr-syscon");
|
||||
|
||||
if (IS_ERR(edac->ecc_mgr_map)) {
|
||||
edac_printk(KERN_ERR, EDAC_DEVICE,
|
||||
@ -2270,18 +2147,7 @@ static int altr_edac_a10_probe(struct platform_device *pdev)
|
||||
if (!of_device_is_available(child))
|
||||
continue;
|
||||
|
||||
if (of_device_is_compatible(child, "altr,socfpga-a10-l2-ecc") ||
|
||||
of_device_is_compatible(child, "altr,socfpga-a10-ocram-ecc") ||
|
||||
of_device_is_compatible(child, "altr,socfpga-eth-mac-ecc") ||
|
||||
of_device_is_compatible(child, "altr,socfpga-nand-ecc") ||
|
||||
of_device_is_compatible(child, "altr,socfpga-dma-ecc") ||
|
||||
of_device_is_compatible(child, "altr,socfpga-usb-ecc") ||
|
||||
of_device_is_compatible(child, "altr,socfpga-qspi-ecc") ||
|
||||
#ifdef CONFIG_EDAC_ALTERA_SDRAM
|
||||
of_device_is_compatible(child, "altr,sdram-edac-s10") ||
|
||||
#endif
|
||||
of_device_is_compatible(child, "altr,socfpga-sdmmc-ecc"))
|
||||
|
||||
if (of_match_node(altr_edac_a10_device_of_match, child))
|
||||
altr_edac_a10_device_add(edac, child);
|
||||
|
||||
#ifdef CONFIG_EDAC_ALTERA_SDRAM
|
||||
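The last altera hunk above swaps a long chain of `of_device_is_compatible()` calls for a single `of_match_node()` lookup against a match table. A minimal userspace sketch of that table-driven idea follows. It is not the kernel's matcher (which works on device-tree nodes and ranks candidates); `compat_entry`, `ecc_device_table` and `match_compatible()` are hypothetical names, and the table lists only a few of the strings from the removed chain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a device-tree match table entry. */
struct compat_entry {
	const char *compatible;
};

/* A few compatible strings taken from the removed if-chain; NULL terminates. */
static const struct compat_entry ecc_device_table[] = {
	{ "altr,socfpga-a10-l2-ecc" },
	{ "altr,socfpga-a10-ocram-ecc" },
	{ "altr,socfpga-eth-mac-ecc" },
	{ "altr,socfpga-nand-ecc" },
	{ "altr,socfpga-sdmmc-ecc" },
	{ NULL }
};

/* Return the matching entry, or NULL if the string is not in the table. */
static const struct compat_entry *
match_compatible(const struct compat_entry *table, const char *compatible)
{
	assert(table != NULL && compatible != NULL);

	for (; table->compatible; table++) {
		if (strcmp(table->compatible, compatible) == 0)
			return table;
	}
	return NULL;
}
```

Keeping the supported devices in one table means a new ECC block only needs a new table entry rather than another branch in the probe loop.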
|
@ -16,12 +16,11 @@ module_param(ecc_enable_override, int, 0644);
|
||||
|
||||
static struct msr __percpu *msrs;
|
||||
|
||||
static struct amd64_family_type *fam_type;
|
||||
|
||||
/* Per-node stuff */
|
||||
static struct ecc_settings **ecc_stngs;
|
||||
|
||||
/* Number of Unified Memory Controllers */
|
||||
static u8 num_umcs;
|
||||
|
||||
/*
|
||||
* Valid scrub rates for the K8 hardware memory scrubber. We map the scrubbing
|
||||
* bandwidth to a valid bit pattern. The 'set' operation finds the 'matching-
|
||||
@ -454,7 +453,7 @@ static void get_cs_base_and_mask(struct amd64_pvt *pvt, int csrow, u8 dct,
|
||||
for (i = 0; i < pvt->csels[dct].m_cnt; i++)
|
||||
|
||||
#define for_each_umc(i) \
|
||||
for (i = 0; i < num_umcs; i++)
|
||||
for (i = 0; i < fam_type->max_mcs; i++)
|
||||
|
||||
/*
|
||||
* @input_addr is an InputAddr associated with the node given by mci. Return the
|
||||
@ -2224,6 +2223,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "K8",
|
||||
.f1_id = PCI_DEVICE_ID_AMD_K8_NB_ADDRMAP,
|
||||
.f2_id = PCI_DEVICE_ID_AMD_K8_NB_MEMCTL,
|
||||
.max_mcs = 2,
|
||||
.ops = {
|
||||
.early_channel_count = k8_early_channel_count,
|
||||
.map_sysaddr_to_csrow = k8_map_sysaddr_to_csrow,
|
||||
@ -2234,6 +2234,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "F10h",
|
||||
.f1_id = PCI_DEVICE_ID_AMD_10H_NB_MAP,
|
||||
.f2_id = PCI_DEVICE_ID_AMD_10H_NB_DRAM,
|
||||
.max_mcs = 2,
|
||||
.ops = {
|
||||
.early_channel_count = f1x_early_channel_count,
|
||||
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
|
||||
@ -2244,6 +2245,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "F15h",
|
||||
.f1_id = PCI_DEVICE_ID_AMD_15H_NB_F1,
|
||||
.f2_id = PCI_DEVICE_ID_AMD_15H_NB_F2,
|
||||
.max_mcs = 2,
|
||||
.ops = {
|
||||
.early_channel_count = f1x_early_channel_count,
|
||||
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
|
||||
@ -2254,6 +2256,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "F15h_M30h",
|
||||
.f1_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F1,
|
||||
.f2_id = PCI_DEVICE_ID_AMD_15H_M30H_NB_F2,
|
||||
.max_mcs = 2,
|
||||
.ops = {
|
||||
.early_channel_count = f1x_early_channel_count,
|
||||
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
|
||||
@ -2264,6 +2267,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "F15h_M60h",
|
||||
.f1_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F1,
|
||||
.f2_id = PCI_DEVICE_ID_AMD_15H_M60H_NB_F2,
|
||||
.max_mcs = 2,
|
||||
.ops = {
|
||||
.early_channel_count = f1x_early_channel_count,
|
||||
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
|
||||
@ -2274,6 +2278,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "F16h",
|
||||
.f1_id = PCI_DEVICE_ID_AMD_16H_NB_F1,
|
||||
.f2_id = PCI_DEVICE_ID_AMD_16H_NB_F2,
|
||||
.max_mcs = 2,
|
||||
.ops = {
|
||||
.early_channel_count = f1x_early_channel_count,
|
||||
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
|
||||
@ -2284,6 +2289,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "F16h_M30h",
|
||||
.f1_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F1,
|
||||
.f2_id = PCI_DEVICE_ID_AMD_16H_M30H_NB_F2,
|
||||
.max_mcs = 2,
|
||||
.ops = {
|
||||
.early_channel_count = f1x_early_channel_count,
|
||||
.map_sysaddr_to_csrow = f1x_map_sysaddr_to_csrow,
|
||||
@ -2294,6 +2300,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "F17h",
|
||||
.f0_id = PCI_DEVICE_ID_AMD_17H_DF_F0,
|
||||
.f6_id = PCI_DEVICE_ID_AMD_17H_DF_F6,
|
||||
.max_mcs = 2,
|
||||
.ops = {
|
||||
.early_channel_count = f17_early_channel_count,
|
||||
.dbam_to_cs = f17_addr_mask_to_cs_size,
|
||||
@ -2303,6 +2310,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "F17h_M10h",
|
||||
.f0_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F0,
|
||||
.f6_id = PCI_DEVICE_ID_AMD_17H_M10H_DF_F6,
|
||||
.max_mcs = 2,
|
||||
.ops = {
|
||||
.early_channel_count = f17_early_channel_count,
|
||||
.dbam_to_cs = f17_addr_mask_to_cs_size,
|
||||
@ -2312,6 +2320,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "F17h_M30h",
|
||||
.f0_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F0,
|
||||
.f6_id = PCI_DEVICE_ID_AMD_17H_M30H_DF_F6,
|
||||
.max_mcs = 8,
|
||||
.ops = {
|
||||
.early_channel_count = f17_early_channel_count,
|
||||
.dbam_to_cs = f17_addr_mask_to_cs_size,
|
||||
@ -2321,6 +2330,7 @@ static struct amd64_family_type family_types[] = {
|
||||
.ctl_name = "F17h_M70h",
|
||||
.f0_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F0,
|
||||
.f6_id = PCI_DEVICE_ID_AMD_17H_M70H_DF_F6,
|
||||
.max_mcs = 2,
|
||||
.ops = {
|
||||
.early_channel_count = f17_early_channel_count,
|
||||
.dbam_to_cs = f17_addr_mask_to_cs_size,
|
||||
@ -2838,8 +2848,6 @@ static void read_mc_regs(struct amd64_pvt *pvt)
|
||||
edac_dbg(1, " DIMM type: %s\n", edac_mem_types[pvt->dram_type]);
|
||||
|
||||
determine_ecc_sym_sz(pvt);
|
||||
|
||||
dump_misc_regs(pvt);
|
||||
}
|
||||
|
||||
/*
|
||||
@ -2936,6 +2944,7 @@ static int init_csrows_df(struct mem_ctl_info *mci)
|
||||
dimm->mtype = pvt->dram_type;
|
||||
dimm->edac_mode = edac_mode;
|
||||
dimm->dtype = dev_type;
|
||||
dimm->grain = 64;
|
||||
}
|
||||
}
|
||||
|
||||
@ -3012,6 +3021,7 @@ static int init_csrows(struct mem_ctl_info *mci)
|
||||
dimm = csrow->channels[j]->dimm;
|
||||
dimm->mtype = pvt->dram_type;
|
||||
dimm->edac_mode = edac_mode;
|
||||
dimm->grain = 64;
|
||||
}
|
||||
}
|
||||
|
||||
@ -3178,43 +3188,27 @@ static void restore_ecc_error_reporting(struct ecc_settings *s, u16 nid,
|
||||
amd64_warn("Error restoring NB MCGCTL settings!\n");
|
||||
}
|
||||
|
||||
/*
|
||||
* EDAC requires that the BIOS have ECC enabled before
|
||||
* taking over the processing of ECC errors. A command line
|
||||
* option allows to force-enable hardware ECC later in
|
||||
* enable_ecc_error_reporting().
|
||||
*/
|
||||
static const char *ecc_msg =
|
||||
"ECC disabled in the BIOS or no ECC capability, module will not load.\n"
|
||||
" Either enable ECC checking or force module loading by setting "
|
||||
"'ecc_enable_override'.\n"
|
||||
" (Note that use of the override may cause unknown side effects.)\n";
|
||||
|
||||
static bool ecc_enabled(struct pci_dev *F3, u16 nid)
|
||||
static bool ecc_enabled(struct amd64_pvt *pvt)
|
||||
{
|
||||
u16 nid = pvt->mc_node_id;
|
||||
bool nb_mce_en = false;
|
||||
u8 ecc_en = 0, i;
|
||||
u32 value;
|
||||
|
||||
if (boot_cpu_data.x86 >= 0x17) {
|
||||
u8 umc_en_mask = 0, ecc_en_mask = 0;
|
||||
struct amd64_umc *umc;
|
||||
|
||||
for_each_umc(i) {
|
||||
u32 base = get_umc_base(i);
|
||||
umc = &pvt->umc[i];
|
||||
|
||||
/* Only check enabled UMCs. */
|
||||
if (amd_smn_read(nid, base + UMCCH_SDP_CTRL, &value))
|
||||
continue;
|
||||
|
||||
if (!(value & UMC_SDP_INIT))
|
||||
if (!(umc->sdp_ctrl & UMC_SDP_INIT))
|
||||
continue;
|
||||
|
||||
umc_en_mask |= BIT(i);
|
||||
|
||||
if (amd_smn_read(nid, base + UMCCH_UMC_CAP_HI, &value))
|
||||
continue;
|
||||
|
||||
if (value & UMC_ECC_ENABLED)
|
||||
if (umc->umc_cap_hi & UMC_ECC_ENABLED)
|
||||
ecc_en_mask |= BIT(i);
|
||||
}
|
||||
|
||||
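The `ecc_enabled()` hunk above stops re-reading SMN registers and tests the cached `sdp_ctrl` and `umc_cap_hi` values instead. The sketch below models that mask bookkeeping in userspace. The bit positions, the struct layout and the final "every initialized controller has ECC" comparison are assumptions made for illustration, since the hunk does not show them; all names ending in `_sketch` or starting with `SKETCH_` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions; the real definitions live in the driver header. */
#define SKETCH_SDP_INIT		(UINT32_C(1) << 31)
#define SKETCH_ECC_ENABLED	(UINT32_C(1) << 30)

/* Cached per-controller register values, filled in once at probe time. */
struct umc_sketch {
	uint32_t sdp_ctrl;
	uint32_t umc_cap_hi;
};

/*
 * Build one mask of initialized controllers and one of controllers with
 * ECC enabled. Assumption: ECC counts as enabled only when at least one
 * controller is initialized and every initialized one has ECC on.
 */
static bool all_active_umcs_have_ecc(const struct umc_sketch *umc,
				     unsigned int max_mcs)
{
	uint8_t umc_en_mask = 0, ecc_en_mask = 0;
	unsigned int i;

	assert(umc != NULL && max_mcs <= 8);

	for (i = 0; i < max_mcs; i++) {
		/* Only check enabled UMCs. */
		if (!(umc[i].sdp_ctrl & SKETCH_SDP_INIT))
			continue;

		umc_en_mask |= (uint8_t)(1u << i);

		if (umc[i].umc_cap_hi & SKETCH_ECC_ENABLED)
			ecc_en_mask |= (uint8_t)(1u << i);
	}

	return umc_en_mask && umc_en_mask == ecc_en_mask;
}
```

Working from cached values keeps the check free of hardware accesses, so it can run before the instance is fully initialized.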
@ -3227,7 +3221,7 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
|
||||
/* Assume UMC MCA banks are enabled. */
|
||||
nb_mce_en = true;
|
||||
} else {
|
||||
amd64_read_pci_cfg(F3, NBCFG, &value);
|
||||
amd64_read_pci_cfg(pvt->F3, NBCFG, &value);
|
||||
|
||||
ecc_en = !!(value & NBCFG_ECC_ENABLE);
|
||||
|
||||
@ -3240,11 +3234,10 @@ static bool ecc_enabled(struct pci_dev *F3, u16 nid)
|
||||
amd64_info("Node %d: DRAM ECC %s.\n",
|
||||
nid, (ecc_en ? "enabled" : "disabled"));
|
||||
|
||||
if (!ecc_en || !nb_mce_en) {
|
||||
amd64_info("%s", ecc_msg);
|
||||
if (!ecc_en || !nb_mce_en)
|
||||
return false;
|
||||
}
|
||||
return true;
|
||||
else
|
||||
return true;
|
||||
}
|
||||
|
||||
static inline void
|
||||
@ -3278,8 +3271,7 @@ f17h_determine_edac_ctl_cap(struct mem_ctl_info *mci, struct amd64_pvt *pvt)
|
||||
}
|
||||
}
|
||||
|
||||
static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
|
||||
struct amd64_family_type *fam)
|
||||
static void setup_mci_misc_attrs(struct mem_ctl_info *mci)
|
||||
{
|
||||
struct amd64_pvt *pvt = mci->pvt_info;
|
||||
|
||||
@ -3298,7 +3290,7 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
|
||||
|
||||
mci->edac_cap = determine_edac_cap(pvt);
|
||||
mci->mod_name = EDAC_MOD_STR;
|
||||
mci->ctl_name = fam->ctl_name;
|
||||
mci->ctl_name = fam_type->ctl_name;
|
||||
mci->dev_name = pci_name(pvt->F3);
|
||||
mci->ctl_page_to_phys = NULL;
|
||||
|
||||
@ -3312,8 +3304,6 @@ static void setup_mci_misc_attrs(struct mem_ctl_info *mci,
|
||||
*/
|
||||
static struct amd64_family_type *per_family_init(struct amd64_pvt *pvt)
|
||||
{
|
||||
struct amd64_family_type *fam_type = NULL;
|
||||
|
||||
pvt->ext_model = boot_cpu_data.x86_model >> 4;
|
||||
pvt->stepping = boot_cpu_data.x86_stepping;
|
||||
pvt->model = boot_cpu_data.x86_model;
|
||||
@ -3401,51 +3391,15 @@ static const struct attribute_group *amd64_edac_attr_groups[] = {
|
||||
NULL
|
||||
};
|
||||
|
||||
/* Set the number of Unified Memory Controllers in the system. */
|
||||
static void compute_num_umcs(void)
|
||||
static int hw_info_get(struct amd64_pvt *pvt)
|
||||
{
|
||||
u8 model = boot_cpu_data.x86_model;
|
||||
|
||||
if (boot_cpu_data.x86 < 0x17)
|
||||
return;
|
||||
|
||||
if (model >= 0x30 && model <= 0x3f)
|
||||
num_umcs = 8;
|
||||
else
|
||||
num_umcs = 2;
|
||||
|
||||
edac_dbg(1, "Number of UMCs: %x", num_umcs);
|
||||
}
|
||||
|
||||
static int init_one_instance(unsigned int nid)
|
||||
{
|
||||
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
|
||||
struct amd64_family_type *fam_type = NULL;
|
||||
struct mem_ctl_info *mci = NULL;
|
||||
struct edac_mc_layer layers[2];
|
||||
struct amd64_pvt *pvt = NULL;
|
||||
u16 pci_id1, pci_id2;
|
||||
int err = 0, ret;
|
||||
|
||||
ret = -ENOMEM;
|
||||
pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
|
||||
if (!pvt)
|
||||
goto err_ret;
|
||||
|
||||
pvt->mc_node_id = nid;
|
||||
pvt->F3 = F3;
|
||||
|
||||
ret = -EINVAL;
|
||||
fam_type = per_family_init(pvt);
|
||||
if (!fam_type)
|
||||
goto err_free;
|
||||
int ret = -EINVAL;
|
||||
|
||||
if (pvt->fam >= 0x17) {
|
||||
pvt->umc = kcalloc(num_umcs, sizeof(struct amd64_umc), GFP_KERNEL);
|
||||
if (!pvt->umc) {
|
||||
ret = -ENOMEM;
|
||||
goto err_free;
|
||||
}
|
||||
pvt->umc = kcalloc(fam_type->max_mcs, sizeof(struct amd64_umc), GFP_KERNEL);
|
||||
if (!pvt->umc)
|
||||
return -ENOMEM;
|
||||
|
||||
pci_id1 = fam_type->f0_id;
|
||||
pci_id2 = fam_type->f6_id;
|
||||
@ -3454,21 +3408,37 @@ static int init_one_instance(unsigned int nid)
|
||||
pci_id2 = fam_type->f2_id;
|
||||
}
|
||||
|
||||
err = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
|
||||
if (err)
|
||||
goto err_post_init;
|
||||
ret = reserve_mc_sibling_devs(pvt, pci_id1, pci_id2);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
read_mc_regs(pvt);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
static void hw_info_put(struct amd64_pvt *pvt)
|
||||
{
|
||||
if (pvt->F0 || pvt->F1)
|
||||
free_mc_sibling_devs(pvt);
|
||||
|
||||
kfree(pvt->umc);
|
||||
}
|
||||
|
||||
static int init_one_instance(struct amd64_pvt *pvt)
|
||||
{
|
||||
struct mem_ctl_info *mci = NULL;
|
||||
struct edac_mc_layer layers[2];
|
||||
int ret = -EINVAL;
|
||||
|
||||
/*
|
||||
* We need to determine how many memory channels there are. Then use
|
||||
* that information for calculating the size of the dynamic instance
|
||||
* tables in the 'mci' structure.
|
||||
*/
|
||||
ret = -EINVAL;
|
||||
pvt->channel_count = pvt->ops->early_channel_count(pvt);
|
||||
if (pvt->channel_count < 0)
|
||||
goto err_siblings;
|
||||
return ret;
|
||||
|
||||
ret = -ENOMEM;
|
||||
layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
|
||||
@ -3480,24 +3450,18 @@ static int init_one_instance(unsigned int nid)
|
||||
* Always allocate two channels since we can have setups with DIMMs on
|
||||
* only one channel. Also, this simplifies handling later for the price
|
||||
* of a couple of KBs tops.
|
||||
*
|
||||
* On Fam17h+, the number of controllers may be greater than two. So set
|
||||
* the size equal to the maximum number of UMCs.
|
||||
*/
|
||||
if (pvt->fam >= 0x17)
|
||||
layers[1].size = num_umcs;
|
||||
else
|
||||
layers[1].size = 2;
|
||||
layers[1].size = fam_type->max_mcs;
|
||||
layers[1].is_virt_csrow = false;
|
||||
|
||||
mci = edac_mc_alloc(nid, ARRAY_SIZE(layers), layers, 0);
|
||||
mci = edac_mc_alloc(pvt->mc_node_id, ARRAY_SIZE(layers), layers, 0);
|
||||
if (!mci)
|
||||
goto err_siblings;
|
||||
return ret;
|
||||
|
||||
mci->pvt_info = pvt;
|
||||
mci->pdev = &pvt->F3->dev;
|
||||
|
||||
setup_mci_misc_attrs(mci, fam_type);
|
||||
setup_mci_misc_attrs(mci);
|
||||
|
||||
if (init_csrows(mci))
|
||||
mci->edac_cap = EDAC_FLAG_NONE;
|
||||
@ -3505,31 +3469,30 @@ static int init_one_instance(unsigned int nid)
|
||||
ret = -ENODEV;
|
||||
if (edac_mc_add_mc_with_groups(mci, amd64_edac_attr_groups)) {
|
||||
edac_dbg(1, "failed edac_mc_add_mc()\n");
|
||||
goto err_add_mc;
|
||||
edac_mc_free(mci);
|
||||
return ret;
|
||||
}
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
err_add_mc:
|
||||
edac_mc_free(mci);
|
||||
static bool instance_has_memory(struct amd64_pvt *pvt)
|
||||
{
|
||||
bool cs_enabled = false;
|
||||
int cs = 0, dct = 0;
|
||||
|
||||
err_siblings:
|
||||
free_mc_sibling_devs(pvt);
|
||||
for (dct = 0; dct < fam_type->max_mcs; dct++) {
|
||||
for_each_chip_select(cs, dct, pvt)
|
||||
cs_enabled |= csrow_enabled(cs, dct, pvt);
|
||||
}
|
||||
|
||||
err_post_init:
|
||||
if (pvt->fam >= 0x17)
|
||||
kfree(pvt->umc);
|
||||
|
||||
err_free:
|
||||
kfree(pvt);
|
||||
|
||||
err_ret:
|
||||
return ret;
|
||||
return cs_enabled;
|
||||
}
|
||||
|
||||
static int probe_one_instance(unsigned int nid)
|
||||
{
|
||||
struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
|
||||
struct amd64_pvt *pvt = NULL;
|
||||
struct ecc_settings *s;
|
||||
int ret;
|
||||
|
||||
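The new `instance_has_memory()` helper in the hunk above ORs together the enable state of every chip select on every controller. A minimal userspace sketch of that scan is shown here. The fixed-size `cs_enabled` array stands in for the driver's cached chip-select registers, and `node_sketch`, `node_has_memory()` and the `SKETCH_` limits are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_MCS	8	/* assumed upper bound on controllers */
#define SKETCH_MAX_CS	8	/* assumed chip selects per controller */

struct node_sketch {
	unsigned int max_mcs;				/* controllers to scan */
	bool cs_enabled[SKETCH_MAX_MCS][SKETCH_MAX_CS];	/* cached enable bits */
};

/* A node has memory if any chip select on any controller is enabled. */
static bool node_has_memory(const struct node_sketch *node)
{
	bool any = false;
	unsigned int mc, cs;

	assert(node != NULL && node->max_mcs <= SKETCH_MAX_MCS);

	for (mc = 0; mc < node->max_mcs; mc++) {
		for (cs = 0; cs < SKETCH_MAX_CS; cs++)
			any |= node->cs_enabled[mc][cs];
	}

	return any;
}
```

A probe routine can use such a check to skip an unpopulated node with an informational message instead of treating it as an ECC failure.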
@ -3540,8 +3503,29 @@ static int probe_one_instance(unsigned int nid)
|
||||
|
||||
ecc_stngs[nid] = s;
|
||||
|
||||
if (!ecc_enabled(F3, nid)) {
|
||||
ret = 0;
|
||||
pvt = kzalloc(sizeof(struct amd64_pvt), GFP_KERNEL);
|
||||
if (!pvt)
|
||||
goto err_settings;
|
||||
|
||||
pvt->mc_node_id = nid;
|
||||
pvt->F3 = F3;
|
||||
|
||||
fam_type = per_family_init(pvt);
|
||||
if (!fam_type)
|
||||
goto err_enable;
|
||||
|
||||
ret = hw_info_get(pvt);
|
||||
if (ret < 0)
|
||||
goto err_enable;
|
||||
|
||||
ret = 0;
|
||||
if (!instance_has_memory(pvt)) {
|
||||
amd64_info("Node %d: No DIMMs detected.\n", nid);
|
||||
goto err_enable;
|
||||
}
|
||||
|
||||
if (!ecc_enabled(pvt)) {
|
||||
ret = -ENODEV;
|
||||
|
||||
if (!ecc_enable_override)
|
||||
goto err_enable;
|
||||
@ -3556,7 +3540,7 @@ static int probe_one_instance(unsigned int nid)
|
||||
goto err_enable;
|
||||
}
|
||||
|
||||
ret = init_one_instance(nid);
|
||||
ret = init_one_instance(pvt);
|
||||
if (ret < 0) {
|
||||
amd64_err("Error probing instance: %d\n", nid);
|
||||
|
||||
@ -3566,9 +3550,15 @@ static int probe_one_instance(unsigned int nid)
|
||||
goto err_enable;
|
||||
}
|
||||
|
||||
dump_misc_regs(pvt);
|
||||
|
||||
return ret;
|
||||
|
||||
err_enable:
|
||||
hw_info_put(pvt);
|
||||
kfree(pvt);
|
||||
|
||||
err_settings:
|
||||
kfree(s);
|
||||
ecc_stngs[nid] = NULL;
|
||||
|
||||
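The reworked `probe_one_instance()` above pairs `hw_info_get()` with `hw_info_put()` and unwinds through `err_enable` and `err_settings` labels. The sketch below reproduces that control flow in userspace with an allocation counter so the unwinding can be checked. It is a model under stated assumptions, not the driver: every `*_sketch` name, `tracked_alloc()` and `tracked_free()` are hypothetical, and on success ownership is handed to the caller through `out`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_allocations;	/* counts blocks not yet freed */

struct settings_sketch { int flags; };
struct pvt_sketch { int node_id; int *umc; };
struct instance_sketch { struct settings_sketch *s; struct pvt_sketch *pvt; };

static void *tracked_alloc(size_t n)
{
	void *p = calloc(1, n);

	if (p)
		live_allocations++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		live_allocations--;
		free(p);
	}
}

/* Acquire per-node hardware info; paired with hw_info_put_sketch(). */
static int hw_info_get_sketch(struct pvt_sketch *pvt, size_t max_mcs)
{
	pvt->umc = tracked_alloc(max_mcs * sizeof(*pvt->umc));
	return pvt->umc ? 0 : -ENOMEM;
}

static void hw_info_put_sketch(struct pvt_sketch *pvt)
{
	tracked_free(pvt->umc);
	pvt->umc = NULL;
}

/* Returns 0 with out->pvt == NULL when the node is skipped (no memory). */
static int probe_sketch(int nid, bool has_memory, struct instance_sketch *out)
{
	struct settings_sketch *s;
	struct pvt_sketch *pvt;
	int ret = -ENOMEM;

	out->s = NULL;
	out->pvt = NULL;

	s = tracked_alloc(sizeof(*s));
	if (!s)
		return ret;

	pvt = tracked_alloc(sizeof(*pvt));
	if (!pvt)
		goto err_settings;
	pvt->node_id = nid;

	ret = hw_info_get_sketch(pvt, 2);
	if (ret < 0)
		goto err_enable;

	ret = 0;
	if (!has_memory)
		goto err_enable;	/* not an error, just nothing to manage */

	out->s = s;
	out->pvt = pvt;
	return 0;

err_enable:
	hw_info_put_sketch(pvt);
	tracked_free(pvt);
err_settings:
	tracked_free(s);
	return ret;
}

/* Mirror of the removal path: release in the reverse order of acquisition. */
static void remove_sketch(struct instance_sketch *inst)
{
	if (!inst->pvt)
		return;
	hw_info_put_sketch(inst->pvt);
	tracked_free(inst->pvt);
	tracked_free(inst->s);
	inst->pvt = NULL;
	inst->s = NULL;
}
```

Because the put helper tolerates a partially initialized object, a single label can serve every failure after the private data has been allocated.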
@ -3595,14 +3585,13 @@ static void remove_one_instance(unsigned int nid)
|
||||
|
||||
restore_ecc_error_reporting(s, nid, F3);
|
||||
|
||||
free_mc_sibling_devs(pvt);
|
||||
|
||||
kfree(ecc_stngs[nid]);
|
||||
ecc_stngs[nid] = NULL;
|
||||
|
||||
/* Free the EDAC CORE resources */
|
||||
mci->pvt_info = NULL;
|
||||
|
||||
hw_info_put(pvt);
|
||||
kfree(pvt);
|
||||
edac_mc_free(mci);
|
||||
}
|
||||
@ -3668,8 +3657,6 @@ static int __init amd64_edac_init(void)
|
||||
if (!msrs)
|
||||
goto err_free;
|
||||
|
||||
compute_num_umcs();
|
||||
|
||||
for (i = 0; i < amd_nb_num(); i++) {
|
||||
err = probe_one_instance(i);
|
||||
if (err) {
|
||||
|
@@ -479,6 +479,8 @@ struct low_ops {
 struct amd64_family_type {
 	const char *ctl_name;
 	u16 f0_id, f1_id, f2_id, f6_id;
+	/* Maximum number of memory controllers per die/node. */
+	u8 max_mcs;
 	struct low_ops ops;
 };
 
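The new `max_mcs` field replaces the removed `compute_num_umcs()` helper, which derived the controller count from the CPU model at init time. The following userspace sketch contrasts the two approaches for the Fam17h entries visible in the diff. `family_sketch`, `family_table`, `find_family()` and `legacy_num_umcs()` are hypothetical names, and the table is only a subset of the driver's family list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced stand-in for the per-family descriptor. */
struct family_sketch {
	const char *ctl_name;
	unsigned char max_mcs;	/* maximum memory controllers per die/node */
};

/* Fam17h values as they appear in the family_types[] hunks. */
static const struct family_sketch family_table[] = {
	{ "F17h",      2 },
	{ "F17h_M10h", 2 },
	{ "F17h_M30h", 8 },
	{ "F17h_M70h", 2 },
};

/* The removed logic: a model-range check, yielding 0 before family 17h. */
static unsigned char legacy_num_umcs(unsigned int family, unsigned int model)
{
	if (family < 0x17)
		return 0;
	return (model >= 0x30 && model <= 0x3f) ? 8 : 2;
}

/* The table-driven replacement: look the value up by family name. */
static const struct family_sketch *find_family(const char *ctl_name)
{
	size_t i;

	assert(ctl_name != NULL);

	for (i = 0; i < sizeof(family_table) / sizeof(family_table[0]); i++) {
		if (strcmp(family_table[i].ctl_name, ctl_name) == 0)
			return &family_table[i];
	}
	return NULL;
}
```

One visible difference in the diff is that pre-Fam17h entries now carry `max_mcs = 2`, where the old global stayed at zero for those families.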
|
@@ -281,16 +281,11 @@ static int aspeed_probe(struct platform_device *pdev)
 	struct device *dev = &pdev->dev;
 	struct edac_mc_layer layers[2];
 	struct mem_ctl_info *mci;
-	struct resource *res;
 	void __iomem *regs;
 	u32 reg04;
 	int rc;
 
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	if (!res)
-		return -ENOENT;
-
-	regs = devm_ioremap_resource(dev, res);
+	regs = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(regs))
 		return PTR_ERR(regs);
 
|
@ -555,12 +555,16 @@ static inline int edac_device_get_panic_on_ue(struct edac_device_ctl_info
|
||||
return edac_dev->panic_on_ue;
|
||||
}
|
||||
|
||||
void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
|
||||
int inst_nr, int block_nr, const char *msg)
|
||||
void edac_device_handle_ce_count(struct edac_device_ctl_info *edac_dev,
|
||||
unsigned int count, int inst_nr, int block_nr,
|
||||
const char *msg)
|
||||
{
|
||||
struct edac_device_instance *instance;
|
||||
struct edac_device_block *block = NULL;
|
||||
|
||||
if (!count)
|
||||
return;
|
||||
|
||||
if ((inst_nr >= edac_dev->nr_instances) || (inst_nr < 0)) {
|
||||
edac_device_printk(edac_dev, KERN_ERR,
|
||||
"INTERNAL ERROR: 'instance' out of range "
|
||||
@@ -582,27 +586,31 @@ void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
 
 	if (instance->nr_blocks > 0) {
 		block = instance->blocks + block_nr;
-		block->counters.ce_count++;
+		block->counters.ce_count += count;
 	}
 
 	/* Propagate the count up the 'totals' tree */
-	instance->counters.ce_count++;
-	edac_dev->counters.ce_count++;
+	instance->counters.ce_count += count;
+	edac_dev->counters.ce_count += count;
 
 	if (edac_device_get_log_ce(edac_dev))
 		edac_device_printk(edac_dev, KERN_WARNING,
-				"CE: %s instance: %s block: %s '%s'\n",
-				edac_dev->ctl_name, instance->name,
-				block ? block->name : "N/A", msg);
+				   "CE: %s instance: %s block: %s count: %d '%s'\n",
+				   edac_dev->ctl_name, instance->name,
+				   block ? block->name : "N/A", count, msg);
 }
-EXPORT_SYMBOL_GPL(edac_device_handle_ce);
+EXPORT_SYMBOL_GPL(edac_device_handle_ce_count);
 
-void edac_device_handle_ue(struct edac_device_ctl_info *edac_dev,
-			int inst_nr, int block_nr, const char *msg)
+void edac_device_handle_ue_count(struct edac_device_ctl_info *edac_dev,
+				 unsigned int count, int inst_nr, int block_nr,
+				 const char *msg)
 {
 	struct edac_device_instance *instance;
 	struct edac_device_block *block = NULL;
 
+	if (!count)
+		return;
+
 	if ((inst_nr >= edac_dev->nr_instances) || (inst_nr < 0)) {
 		edac_device_printk(edac_dev, KERN_ERR,
 				"INTERNAL ERROR: 'instance' out of range "
@@ -624,22 +632,22 @@ void edac_device_handle_ue(struct edac_device_ctl_info *edac_dev,
 
 	if (instance->nr_blocks > 0) {
 		block = instance->blocks + block_nr;
-		block->counters.ue_count++;
+		block->counters.ue_count += count;
 	}
 
 	/* Propagate the count up the 'totals' tree */
-	instance->counters.ue_count++;
-	edac_dev->counters.ue_count++;
+	instance->counters.ue_count += count;
+	edac_dev->counters.ue_count += count;
 
 	if (edac_device_get_log_ue(edac_dev))
 		edac_device_printk(edac_dev, KERN_EMERG,
-				"UE: %s instance: %s block: %s '%s'\n",
-				edac_dev->ctl_name, instance->name,
-				block ? block->name : "N/A", msg);
+				   "UE: %s instance: %s block: %s count: %d '%s'\n",
+				   edac_dev->ctl_name, instance->name,
+				   block ? block->name : "N/A", count, msg);
 
 	if (edac_device_get_panic_on_ue(edac_dev))
-		panic("EDAC %s: UE instance: %s block %s '%s'\n",
-			edac_dev->ctl_name, instance->name,
-			block ? block->name : "N/A", msg);
+		panic("EDAC %s: UE instance: %s block %s count: %d '%s'\n",
+		      edac_dev->ctl_name, instance->name,
+		      block ? block->name : "N/A", count, msg);
 }
-EXPORT_SYMBOL_GPL(edac_device_handle_ue);
+EXPORT_SYMBOL_GPL(edac_device_handle_ue_count);
@@ -286,27 +286,60 @@ extern int edac_device_add_device(struct edac_device_ctl_info *edac_dev);
 extern struct edac_device_ctl_info *edac_device_del_device(struct device *dev);
 
 /**
- * edac_device_handle_ue():
- *	perform a common output and handling of an 'edac_dev' UE event
+ * Log correctable errors.
  *
  * @edac_dev: pointer to struct &edac_device_ctl_info
- * @inst_nr: number of the instance where the UE error happened
- * @block_nr: number of the block where the UE error happened
+ * @inst_nr: number of the instance where the CE error happened
+ * @count: Number of errors to log.
+ * @block_nr: number of the block where the CE error happened
  * @msg: message to be printed
  */
-extern void edac_device_handle_ue(struct edac_device_ctl_info *edac_dev,
-				int inst_nr, int block_nr, const char *msg);
+void edac_device_handle_ce_count(struct edac_device_ctl_info *edac_dev,
+				 unsigned int count, int inst_nr, int block_nr,
+				 const char *msg);
+
 /**
- * edac_device_handle_ce():
- *	perform a common output and handling of an 'edac_dev' CE event
+ * Log uncorrectable errors.
  *
  * @edac_dev: pointer to struct &edac_device_ctl_info
  * @inst_nr: number of the instance where the CE error happened
+ * @count: Number of errors to log.
  * @block_nr: number of the block where the CE error happened
  * @msg: message to be printed
  */
+void edac_device_handle_ue_count(struct edac_device_ctl_info *edac_dev,
+				 unsigned int count, int inst_nr, int block_nr,
+				 const char *msg);
+
+/**
+ * edac_device_handle_ce(): Log a single correctable error
+ *
+ * @edac_dev: pointer to struct &edac_device_ctl_info
+ * @inst_nr: number of the instance where the CE error happened
+ * @block_nr: number of the block where the CE error happened
+ * @msg: message to be printed
+ */
-extern void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
-				int inst_nr, int block_nr, const char *msg);
+static inline void
+edac_device_handle_ce(struct edac_device_ctl_info *edac_dev, int inst_nr,
+		      int block_nr, const char *msg)
+{
+	edac_device_handle_ce_count(edac_dev, 1, inst_nr, block_nr, msg);
+}
+
+/**
+ * edac_device_handle_ue(): Log a single uncorrectable error
+ *
+ * @edac_dev: pointer to struct &edac_device_ctl_info
+ * @inst_nr: number of the instance where the UE error happened
+ * @block_nr: number of the block where the UE error happened
+ * @msg: message to be printed
+ */
+static inline void
+edac_device_handle_ue(struct edac_device_ctl_info *edac_dev, int inst_nr,
+		      int block_nr, const char *msg)
+{
+	edac_device_handle_ue_count(edac_dev, 1, inst_nr, block_nr, msg);
+}
 
 /**
  * edac_device_alloc_index: Allocate a unique device index number
@@ -316,5 +349,4 @@ extern void edac_device_handle_ce(struct edac_device_ctl_info *edac_dev,
  */
 extern int edac_device_alloc_index(void);
-extern const char *edac_layer_name[];
 
 #endif
|
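The header change above keeps the old single-error entry points as thin inline wrappers around the new counted functions. A minimal userspace sketch of that pattern follows. The struct and function names (`sketch_dev`, `sketch_block`, `sketch_handle_ce_count`, `sketch_handle_ce`) are hypothetical stand-ins, not kernel API, and the instance level of the counter tree is left out for brevity.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's counter structs. */
struct sketch_counters {
	unsigned int ce_count;
};

struct sketch_block {
	struct sketch_counters counters;
};

struct sketch_dev {
	struct sketch_counters counters;   /* device-wide total */
	struct sketch_block *blocks;
	int nr_blocks;
};

/* Log 'count' correctable errors at once; a zero count is a no-op. */
static void sketch_handle_ce_count(struct sketch_dev *dev, unsigned int count,
				   int block_nr)
{
	if (!count)
		return;

	/* Assumption of this sketch: silently ignore an out-of-range block. */
	if (block_nr < 0 || block_nr >= dev->nr_blocks)
		return;

	dev->blocks[block_nr].counters.ce_count += count;
	/* Propagate the count up to the device total. */
	dev->counters.ce_count += count;
}

/* Single-error entry point kept as a wrapper, so old callers still compile. */
static inline void sketch_handle_ce(struct sketch_dev *dev, int block_nr)
{
	sketch_handle_ce_count(dev, 1, block_nr);
}
```

Because the wrapper passes a count of 1, callers that report one error at a time see the same counter behaviour as before, while a caller that reads a hardware error counter can pass that value in a single call.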
@@ -145,15 +145,18 @@ static void edac_mc_dump_channel(struct rank_info *chan)
 	edac_dbg(4, " channel->dimm = %p\n", chan->dimm);
 }
 
-static void edac_mc_dump_dimm(struct dimm_info *dimm, int number)
+static void edac_mc_dump_dimm(struct dimm_info *dimm)
 {
 	char location[80];
 
+	if (!dimm->nr_pages)
+		return;
+
 	edac_dimm_info_location(dimm, location, sizeof(location));
 
 	edac_dbg(4, "%s%i: %smapped as virtual row %d, chan %d\n",
 		 dimm->mci->csbased ? "rank" : "dimm",
-		 number, location, dimm->csrow, dimm->cschannel);
+		 dimm->idx, location, dimm->csrow, dimm->cschannel);
 	edac_dbg(4, " dimm = %p\n", dimm);
 	edac_dbg(4, " dimm->label = '%s'\n", dimm->label);
 	edac_dbg(4, " dimm->nr_pages = 0x%x\n", dimm->nr_pages);
@@ -314,25 +317,28 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
 	struct dimm_info *dimm;
 	u32 *ce_per_layer[EDAC_MAX_LAYERS], *ue_per_layer[EDAC_MAX_LAYERS];
 	unsigned int pos[EDAC_MAX_LAYERS];
-	unsigned int size, tot_dimms = 1, count = 1;
+	unsigned int idx, size, tot_dimms = 1, count = 1;
 	unsigned int tot_csrows = 1, tot_channels = 1, tot_errcount = 0;
 	void *pvt, *p, *ptr = NULL;
-	int i, j, row, chn, n, len, off;
+	int i, j, row, chn, n, len;
 	bool per_rank = false;
 
-	BUG_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0);
+	if (WARN_ON(n_layers > EDAC_MAX_LAYERS || n_layers == 0))
+		return NULL;
+
 	/*
 	 * Calculate the total amount of dimms and csrows/cschannels while
 	 * in the old API emulation mode
 	 */
-	for (i = 0; i < n_layers; i++) {
-		tot_dimms *= layers[i].size;
-		if (layers[i].is_virt_csrow)
-			tot_csrows *= layers[i].size;
-		else
-			tot_channels *= layers[i].size;
+	for (idx = 0; idx < n_layers; idx++) {
+		tot_dimms *= layers[idx].size;
 
-		if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
+		if (layers[idx].is_virt_csrow)
+			tot_csrows *= layers[idx].size;
+		else
+			tot_channels *= layers[idx].size;
+
+		if (layers[idx].type == EDAC_MC_LAYER_CHIP_SELECT)
 			per_rank = true;
 	}
 
@@ -425,19 +431,15 @@ struct mem_ctl_info *edac_mc_alloc(unsigned int mc_num,
 	memset(&pos, 0, sizeof(pos));
 	row = 0;
 	chn = 0;
-	for (i = 0; i < tot_dimms; i++) {
+	for (idx = 0; idx < tot_dimms; idx++) {
 		chan = mci->csrows[row]->channels[chn];
-		off = EDAC_DIMM_OFF(layer, n_layers, pos[0], pos[1], pos[2]);
-		if (off < 0 || off >= tot_dimms) {
-			edac_mc_printk(mci, KERN_ERR, "EDAC core bug: EDAC_DIMM_OFF is trying to do an illegal data access\n");
-			goto error;
-		}
 
 		dimm = kzalloc(sizeof(**mci->dimms), GFP_KERNEL);
 		if (!dimm)
 			goto error;
-		mci->dimms[off] = dimm;
+		mci->dimms[idx] = dimm;
 		dimm->mci = mci;
+		dimm->idx = idx;
 
 		/*
 		 * Copy DIMM location and initialize it.
@@ -714,6 +716,7 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
 	edac_mc_dump_mci(mci);
 
 	if (edac_debug_level >= 4) {
+		struct dimm_info *dimm;
 		int i;
 
 		for (i = 0; i < mci->nr_csrows; i++) {
@@ -730,9 +733,9 @@ int edac_mc_add_mc_with_groups(struct mem_ctl_info *mci,
 				if (csrow->channels[j]->dimm->nr_pages)
 					edac_mc_dump_channel(csrow->channels[j]);
 		}
-		for (i = 0; i < mci->tot_dimms; i++)
-			if (mci->dimms[i]->nr_pages)
-				edac_mc_dump_dimm(mci->dimms[i], i);
+
+		mci_for_each_dimm(mci, dimm)
+			edac_mc_dump_dimm(dimm);
 	}
 #endif
 	mutex_lock(&mem_ctls_mutex);
@@ -1055,6 +1058,21 @@ void edac_raw_mc_handle_error(const enum hw_event_mc_err_type type,
 {
 	char detail[80];
 	int pos[EDAC_MAX_LAYERS] = { e->top_layer, e->mid_layer, e->low_layer };
+	u8 grain_bits;
+
+	/* Sanity-check driver-supplied grain value. */
+	if (WARN_ON_ONCE(!e->grain))
+		e->grain = 1;
+
+	grain_bits = fls_long(e->grain - 1);
+
+	/* Report the error via the trace interface */
+	if (IS_ENABLED(CONFIG_RAS))
+		trace_mc_event(type, e->msg, e->label, e->error_count,
+			       mci->mc_idx, e->top_layer, e->mid_layer,
+			       e->low_layer,
+			       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
+			       grain_bits, e->syndrome, e->other_detail);
 
 	/* Memory type dependent details about the error */
 	if (type == HW_EVENT_ERR_CORRECTED) {
@@ -1090,11 +1108,11 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 			  const char *msg,
 			  const char *other_detail)
 {
+	struct dimm_info *dimm;
 	char *p;
 	int row = -1, chan = -1;
 	int pos[EDAC_MAX_LAYERS] = { top_layer, mid_layer, low_layer };
 	int i, n_labels = 0;
-	u8 grain_bits;
 	struct edac_raw_error_desc *e = &mci->error_desc;
 
 	edac_dbg(3, "MC%d\n", mci->mc_idx);
@@ -1150,9 +1168,7 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	p = e->label;
 	*p = '\0';
 
-	for (i = 0; i < mci->tot_dimms; i++) {
-		struct dimm_info *dimm = mci->dimms[i];
-
+	mci_for_each_dimm(mci, dimm) {
 		if (top_layer >= 0 && top_layer != dimm->location[0])
 			continue;
 		if (mid_layer >= 0 && mid_layer != dimm->location[1])
@@ -1170,37 +1186,37 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 		 * channel/memory controller/... may be affected.
 		 * Also, don't show errors for empty DIMM slots.
 		 */
-		if (e->enable_per_layer_report && dimm->nr_pages) {
-			if (n_labels >= EDAC_MAX_LABELS) {
-				e->enable_per_layer_report = false;
-				break;
-			}
-			n_labels++;
-			if (p != e->label) {
-				strcpy(p, OTHER_LABEL);
-				p += strlen(OTHER_LABEL);
-			}
-			strcpy(p, dimm->label);
-			p += strlen(p);
-			*p = '\0';
+		if (!e->enable_per_layer_report || !dimm->nr_pages)
+			continue;
 
-			/*
-			 * get csrow/channel of the DIMM, in order to allow
-			 * incrementing the compat API counters
-			 */
-			edac_dbg(4, "%s csrows map: (%d,%d)\n",
-				 mci->csbased ? "rank" : "dimm",
-				 dimm->csrow, dimm->cschannel);
-			if (row == -1)
-				row = dimm->csrow;
-			else if (row >= 0 && row != dimm->csrow)
-				row = -2;
-
-			if (chan == -1)
-				chan = dimm->cschannel;
-			else if (chan >= 0 && chan != dimm->cschannel)
-				chan = -2;
+		if (n_labels >= EDAC_MAX_LABELS) {
+			e->enable_per_layer_report = false;
+			break;
 		}
+		n_labels++;
+		if (p != e->label) {
+			strcpy(p, OTHER_LABEL);
+			p += strlen(OTHER_LABEL);
+		}
+		strcpy(p, dimm->label);
+		p += strlen(p);
+
+		/*
+		 * get csrow/channel of the DIMM, in order to allow
+		 * incrementing the compat API counters
+		 */
+		edac_dbg(4, "%s csrows map: (%d,%d)\n",
+			 mci->csbased ? "rank" : "dimm",
+			 dimm->csrow, dimm->cschannel);
+		if (row == -1)
+			row = dimm->csrow;
+		else if (row >= 0 && row != dimm->csrow)
+			row = -2;
+
+		if (chan == -1)
+			chan = dimm->cschannel;
+		else if (chan >= 0 && chan != dimm->cschannel)
+			chan = -2;
 	}
 
 	if (!e->enable_per_layer_report) {
@@ -1234,20 +1250,6 @@ void edac_mc_handle_error(const enum hw_event_mc_err_type type,
 	if (p > e->location)
 		*(p - 1) = '\0';
 
-	/* Sanity-check driver-supplied grain value. */
-	if (WARN_ON_ONCE(!e->grain))
-		e->grain = 1;
-
-	grain_bits = fls_long(e->grain - 1);
-
-	/* Report the error via the trace interface */
-	if (IS_ENABLED(CONFIG_RAS))
-		trace_mc_event(type, e->msg, e->label, e->error_count,
-			       mci->mc_idx, e->top_layer, e->mid_layer,
-			       e->low_layer,
-			       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
-			       grain_bits, e->syndrome, e->other_detail);
-
 	edac_raw_mc_handle_error(type, mci, e);
 }
 EXPORT_SYMBOL_GPL(edac_mc_handle_error);
|
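Several hunks above replace open-coded `for (i = 0; i < mci->tot_dimms; i++)` loops with `mci_for_each_dimm()`, which relies on the `idx` field that `edac_mc_alloc()` now stores in each `dimm_info`. The macro's definition is not part of this excerpt, so the following is only a sketch of how such an iterator could be built. The names `sketch_mc`, `sketch_dimm` and `sketch_for_each_dimm` are hypothetical.

```c
#include <stddef.h>

struct sketch_mc;

struct sketch_dimm {
	struct sketch_mc *mci;
	unsigned int idx;        /* position of this entry in mci->dimms[] */
	unsigned int nr_pages;   /* 0 means the slot is empty */
};

struct sketch_mc {
	unsigned int tot_dimms;
	struct sketch_dimm **dimms;
};

/*
 * Walk every entry of mci->dimms[]. The stored index is enough to find
 * the successor, so callers need no separate loop counter.
 */
#define sketch_for_each_dimm(mci, dimm)					\
	for ((dimm) = (mci)->tot_dimms ? (mci)->dimms[0] : NULL;	\
	     (dimm);							\
	     (dimm) = (dimm)->idx + 1 < (mci)->tot_dimms		\
			? (mci)->dimms[(dimm)->idx + 1] : NULL)

/* Example caller: count populated slots, skipping empty ones. */
static unsigned int sketch_count_populated(struct sketch_mc *mci)
{
	struct sketch_dimm *dimm;
	unsigned int n = 0;

	sketch_for_each_dimm(mci, dimm) {
		if (!dimm->nr_pages)
			continue;
		n++;
	}
	return n;
}
```

With the index stored on the object, helpers such as the sysfs counter readers can use `dimm->idx` directly instead of recomputing an offset from the layer coordinates.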
@@ -557,14 +557,8 @@ static ssize_t dimmdev_ce_count_show(struct device *dev,
 {
 	struct dimm_info *dimm = to_dimm(dev);
 	u32 count;
-	int off;
 
-	off = EDAC_DIMM_OFF(dimm->mci->layers,
-			    dimm->mci->n_layers,
-			    dimm->location[0],
-			    dimm->location[1],
-			    dimm->location[2]);
-	count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][off];
+	count = dimm->mci->ce_per_layer[dimm->mci->n_layers-1][dimm->idx];
 	return sprintf(data, "%u\n", count);
 }
 
@@ -574,14 +568,8 @@ static ssize_t dimmdev_ue_count_show(struct device *dev,
 {
 	struct dimm_info *dimm = to_dimm(dev);
 	u32 count;
-	int off;
 
-	off = EDAC_DIMM_OFF(dimm->mci->layers,
-			    dimm->mci->n_layers,
-			    dimm->location[0],
-			    dimm->location[1],
-			    dimm->location[2]);
-	count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][off];
+	count = dimm->mci->ue_per_layer[dimm->mci->n_layers-1][dimm->idx];
 	return sprintf(data, "%u\n", count);
 }
 
@@ -633,8 +621,7 @@ static const struct device_type dimm_attr_type = {
 
 /* Create a DIMM object under specifed memory controller device */
 static int edac_create_dimm_object(struct mem_ctl_info *mci,
-				   struct dimm_info *dimm,
-				   int index)
+				   struct dimm_info *dimm)
 {
 	int err;
 	dimm->mci = mci;
@@ -644,9 +631,9 @@ static int edac_create_dimm_object(struct mem_ctl_info *mci,
 
 	dimm->dev.parent = &mci->dev;
 	if (mci->csbased)
-		dev_set_name(&dimm->dev, "rank%d", index);
+		dev_set_name(&dimm->dev, "rank%d", dimm->idx);
 	else
-		dev_set_name(&dimm->dev, "dimm%d", index);
+		dev_set_name(&dimm->dev, "dimm%d", dimm->idx);
 	dev_set_drvdata(&dimm->dev, dimm);
 	pm_runtime_forbid(&mci->dev);
 
@@ -928,7 +915,8 @@ static const struct device_type mci_attr_type = {
 int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
 				 const struct attribute_group **groups)
 {
-	int i, err;
+	struct dimm_info *dimm;
+	int err;
 
 	/* get the /sys/devices/system/edac subsys reference */
 	mci->dev.type = &mci_attr_type;
@@ -952,13 +940,12 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
 	/*
 	 * Create the dimm/rank devices
 	 */
-	for (i = 0; i < mci->tot_dimms; i++) {
-		struct dimm_info *dimm = mci->dimms[i];
+	mci_for_each_dimm(mci, dimm) {
 		/* Only expose populated DIMMs */
 		if (!dimm->nr_pages)
 			continue;
 
-		err = edac_create_dimm_object(mci, dimm, i);
+		err = edac_create_dimm_object(mci, dimm);
 		if (err)
 			goto fail_unregister_dimm;
 	}
@@ -973,12 +960,9 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
 	return 0;
 
 fail_unregister_dimm:
-	for (i--; i >= 0; i--) {
-		struct dimm_info *dimm = mci->dimms[i];
-		if (!dimm->nr_pages)
-			continue;
-
-		device_unregister(&dimm->dev);
+	mci_for_each_dimm(mci, dimm) {
+		if (device_is_registered(&dimm->dev))
+			device_unregister(&dimm->dev);
 	}
 	device_unregister(&mci->dev);
 
@@ -990,7 +974,7 @@ int edac_create_sysfs_mci_device(struct mem_ctl_info *mci,
  */
 void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
 {
-	int i;
+	struct dimm_info *dimm;
 
 	edac_dbg(0, "\n");
 
@@ -1001,8 +985,7 @@ void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
 	edac_delete_csrow_objects(mci);
 #endif
 
-	for (i = 0; i < mci->tot_dimms; i++) {
-		struct dimm_info *dimm = mci->dimms[i];
+	mci_for_each_dimm(mci, dimm) {
 		if (dimm->nr_pages == 0)
 			continue;
 		edac_dbg(1, "unregistering device %s\n", dev_name(&dimm->dev));
@@ -21,14 +21,22 @@ struct ghes_edac_pvt {
 	struct mem_ctl_info *mci;
 
 	/* Buffers for the error handling routine */
-	char detail_location[240];
-	char other_detail[160];
+	char other_detail[400];
 	char msg[80];
 };
 
-static atomic_t ghes_init = ATOMIC_INIT(0);
+static refcount_t ghes_refcount = REFCOUNT_INIT(0);
+
+/*
+ * Access to ghes_pvt must be protected by ghes_lock. The spinlock
+ * also provides the necessary (implicit) memory barrier for the SMP
+ * case to make the pointer visible on another CPU.
+ */
 static struct ghes_edac_pvt *ghes_pvt;
 
+/* GHES registration mutex */
+static DEFINE_MUTEX(ghes_reg_mutex);
+
 /*
  * Sync with other, potentially concurrent callers of
  * ghes_edac_report_mem_error(). We don't know what the
@@ -79,15 +87,15 @@ static void ghes_edac_count_dimms(const struct dmi_header *dh, void *arg)
 		(*num_dimm)++;
 }
 
-static int get_dimm_smbios_index(u16 handle)
+static int get_dimm_smbios_index(struct mem_ctl_info *mci, u16 handle)
 {
-	struct mem_ctl_info *mci = ghes_pvt->mci;
-	int i;
+	struct dimm_info *dimm;
 
-	for (i = 0; i < mci->tot_dimms; i++) {
-		if (mci->dimms[i]->smbios_handle == handle)
-			return i;
+	mci_for_each_dimm(mci, dimm) {
+		if (dimm->smbios_handle == handle)
+			return dimm->idx;
 	}
+
 	return -1;
 }
 
@@ -98,9 +106,7 @@ static void ghes_edac_dmidecode(const struct dmi_header *dh, void *arg)
 
 	if (dh->type == DMI_ENTRY_MEM_DEVICE) {
 		struct memdev_dmi_entry *entry = (struct memdev_dmi_entry *)dh;
-		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-						       mci->n_layers,
-						       dimm_fill->count, 0, 0);
+		struct dimm_info *dimm = edac_get_dimm(mci, dimm_fill->count, 0, 0);
 		u16 rdr_mask = BIT(7) | BIT(13);
 
 		if (entry->size == 0xffff) {
@@ -198,13 +204,9 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	enum hw_event_mc_err_type type;
 	struct edac_raw_error_desc *e;
 	struct mem_ctl_info *mci;
-	struct ghes_edac_pvt *pvt = ghes_pvt;
+	struct ghes_edac_pvt *pvt;
 	unsigned long flags;
 	char *p;
-	u8 grain_bits;
-
-	if (!pvt)
-		return;
 
 	/*
 	 * We can do the locking below because GHES defers error processing
@@ -216,12 +218,17 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	spin_lock_irqsave(&ghes_lock, flags);
 
+	pvt = ghes_pvt;
+	if (!pvt)
+		goto unlock;
+
 	mci = pvt->mci;
 	e = &mci->error_desc;
 
 	/* Cleans the error report buffer */
 	memset(e, 0, sizeof (*e));
 	e->error_count = 1;
+	e->grain = 1;
 	strcpy(e->label, "unknown label");
 	e->msg = pvt->msg;
 	e->other_detail = pvt->other_detail;
@@ -311,13 +318,13 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	/* Error address */
 	if (mem_err->validation_bits & CPER_MEM_VALID_PA) {
-		e->page_frame_number = mem_err->physical_addr >> PAGE_SHIFT;
-		e->offset_in_page = mem_err->physical_addr & ~PAGE_MASK;
+		e->page_frame_number = PHYS_PFN(mem_err->physical_addr);
+		e->offset_in_page = offset_in_page(mem_err->physical_addr);
 	}
 
 	/* Error grain */
 	if (mem_err->validation_bits & CPER_MEM_VALID_PA_MASK)
-		e->grain = ~(mem_err->physical_addr_mask & ~PAGE_MASK);
+		e->grain = ~mem_err->physical_addr_mask + 1;
 
 	/* Memory error location, mapped on e->location */
 	p = e->location;
@@ -348,7 +355,7 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 		p += sprintf(p, "DIMM DMI handle: 0x%.4x ",
 			     mem_err->mem_dev_handle);
 
-		index = get_dimm_smbios_index(mem_err->mem_dev_handle);
+		index = get_dimm_smbios_index(mci, mem_err->mem_dev_handle);
 		if (index >= 0) {
 			e->top_layer = index;
 			e->enable_per_layer_report = true;
@@ -360,6 +367,8 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 
 	/* All other fields are mapped on e->other_detail */
 	p = pvt->other_detail;
+	p += snprintf(p, sizeof(pvt->other_detail),
+		      "APEI location: %s ", e->location);
 	if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_STATUS) {
 		u64 status = mem_err->error_status;
 
@@ -433,16 +442,9 @@ void ghes_edac_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)
 	if (p > pvt->other_detail)
 		*(p - 1) = '\0';
 
-	/* Generate the trace event */
-	grain_bits = fls_long(e->grain);
-	snprintf(pvt->detail_location, sizeof(pvt->detail_location),
-		 "APEI location: %s %s", e->location, e->other_detail);
-	trace_mc_event(type, e->msg, e->label, e->error_count,
-		       mci->mc_idx, e->top_layer, e->mid_layer, e->low_layer,
-		       (e->page_frame_number << PAGE_SHIFT) | e->offset_in_page,
-		       grain_bits, e->syndrome, pvt->detail_location);
-
 	edac_raw_mc_handle_error(type, mci, e);
+
+unlock:
 	spin_unlock_irqrestore(&ghes_lock, flags);
 }
 
@@ -457,10 +459,12 @@ static struct acpi_platform_list plat_list[] = {
 int ghes_edac_register(struct ghes *ghes, struct device *dev)
 {
 	bool fake = false;
-	int rc, num_dimm = 0;
+	int rc = 0, num_dimm = 0;
 	struct mem_ctl_info *mci;
+	struct ghes_edac_pvt *pvt;
 	struct edac_mc_layer layers[1];
 	struct ghes_edac_dimm_fill dimm_fill;
+	unsigned long flags;
 	int idx = -1;
 
 	if (IS_ENABLED(CONFIG_X86)) {
@@ -472,11 +476,14 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		idx = 0;
 	}
 
+	/* finish another registration/unregistration instance first */
+	mutex_lock(&ghes_reg_mutex);
+
 	/*
 	 * We have only one logical memory controller to which all DIMMs belong.
 	 */
-	if (atomic_inc_return(&ghes_init) > 1)
-		return 0;
+	if (refcount_inc_not_zero(&ghes_refcount))
+		goto unlock;
 
 	/* Get the number of DIMMs */
 	dmi_walk(ghes_edac_count_dimms, &num_dimm);
@@ -494,12 +501,13 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	mci = edac_mc_alloc(0, ARRAY_SIZE(layers), layers, sizeof(struct ghes_edac_pvt));
 	if (!mci) {
 		pr_info("Can't allocate memory for EDAC data\n");
-		return -ENOMEM;
+		rc = -ENOMEM;
+		goto unlock;
 	}
 
-	ghes_pvt = mci->pvt_info;
-	ghes_pvt->ghes = ghes;
-	ghes_pvt->mci = mci;
+	pvt = mci->pvt_info;
+	pvt->ghes = ghes;
+	pvt->mci = mci;
 
 	mci->pdev = dev;
 	mci->mtype_cap = MEM_FLAG_EMPTY;
@@ -527,8 +535,7 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 		dimm_fill.mci = mci;
 		dmi_walk(ghes_edac_dmidecode, &dimm_fill);
 	} else {
-		struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-						       mci->n_layers, 0, 0, 0);
+		struct dimm_info *dimm = edac_get_dimm(mci, 0, 0, 0);
 
 		dimm->nr_pages = 1;
 		dimm->grain = 128;
@@ -541,23 +548,48 @@ int ghes_edac_register(struct ghes *ghes, struct device *dev)
 	if (rc < 0) {
 		pr_info("Can't register at EDAC core\n");
 		edac_mc_free(mci);
-		return -ENODEV;
+		rc = -ENODEV;
+		goto unlock;
 	}
-	return 0;
+
+	spin_lock_irqsave(&ghes_lock, flags);
+	ghes_pvt = pvt;
+	spin_unlock_irqrestore(&ghes_lock, flags);
+
+	/* only set on success */
+	refcount_set(&ghes_refcount, 1);
+
+unlock:
+	mutex_unlock(&ghes_reg_mutex);
+
+	return rc;
 }
 
 void ghes_edac_unregister(struct ghes *ghes)
 {
 	struct mem_ctl_info *mci;
+	unsigned long flags;
 
-	if (!ghes_pvt)
-		return;
+	mutex_lock(&ghes_reg_mutex);
 
-	if (atomic_dec_return(&ghes_init))
-		return;
+	if (!refcount_dec_and_test(&ghes_refcount))
+		goto unlock;
 
-	mci = ghes_pvt->mci;
+	/*
+	 * Wait for the irq handler being finished.
+	 */
+	spin_lock_irqsave(&ghes_lock, flags);
+	mci = ghes_pvt ? ghes_pvt->mci : NULL;
 	ghes_pvt = NULL;
-	edac_mc_del_mc(mci->pdev);
-	edac_mc_free(mci);
+	spin_unlock_irqrestore(&ghes_lock, flags);
+
+	if (!mci)
+		goto unlock;
+
+	mci = edac_mc_del_mc(mci->pdev);
+	if (mci)
+		edac_mc_free(mci);
+
+unlock:
+	mutex_unlock(&ghes_reg_mutex);
 }
|
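The grain changes above fit together as follows: ghes_edac now derives the grain from the CPER physical address mask as `~mask + 1`, and the EDAC core converts it to a bit count with `fls_long(e->grain - 1)` instead of the old `fls_long(e->grain)` that ghes_edac used. A small userspace sketch of that arithmetic is given below. `sketch_fls64()` is a hypothetical stand-in for the kernel's `fls_long()`, and 64-bit values are assumed throughout.

```c
#include <stdint.h>

/*
 * Stand-in for fls_long(): 1-based position of the highest set bit,
 * or 0 when no bit is set.
 */
static unsigned int sketch_fls64(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/* Grain in bytes from an address mask whose low (ignored) bits are clear. */
static uint64_t sketch_grain_from_mask(uint64_t physical_addr_mask)
{
	return ~physical_addr_mask + 1;
}

/* Number of low address bits covered by the grain, rounded up. */
static unsigned int sketch_grain_bits(uint64_t grain)
{
	if (!grain)		/* same fallback the core applies */
		grain = 1;
	return sketch_fls64(grain - 1);
}
```

For a mask that clears the low 12 bits, the grain comes out as 4096 bytes and the bit count as 12. Applying the highest-set-bit function to 4096 directly would give 13, which is the off-by-one that subtracting 1 first avoids for power-of-two grains.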
@@ -154,8 +154,7 @@ static int i10nm_get_dimm_config(struct mem_ctl_info *mci)
 
 		ndimms = 0;
 		for (j = 0; j < I10NM_NUM_DIMMS; j++) {
-			dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-					     mci->n_layers, i, j, 0);
+			dimm = edac_get_dimm(mci, i, j, 0);
 			mtr = I10NM_GET_DIMMMTR(imc, i, j);
 			mcddrtcfg = I10NM_GET_MCDDRTCFG(imc, i, j);
 			edac_dbg(1, "dimmmtr 0x%x mcddrtcfg 0x%x (mc%d ch%d dimm%d)\n",
@@ -392,8 +392,7 @@ static int i3200_probe1(struct pci_dev *pdev, int dev_idx)
 		unsigned long nr_pages;
 
 		for (j = 0; j < nr_channels; j++) {
-			struct dimm_info *dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-							       mci->n_layers, i, j, 0);
+			struct dimm_info *dimm = edac_get_dimm(mci, i, j, 0);
 
 			nr_pages = drb_to_nr_pages(drbs, stacked, j, i);
 			if (nr_pages == 0)
@@ -1275,9 +1275,8 @@ static int i5000_init_csrows(struct mem_ctl_info *mci)
 			if (!MTR_DIMMS_PRESENT(mtr))
 				continue;
 
-			dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
-				       channel / MAX_BRANCHES,
-				       channel % MAX_BRANCHES, slot);
+			dimm = edac_get_dimm(mci, channel / MAX_BRANCHES,
+					     channel % MAX_BRANCHES, slot);
 
 			csrow_megs = pvt->dimm_info[slot][channel].megabytes;
 			dimm->grain = 8;
@@ -713,7 +713,6 @@ static int i5100_read_spd_byte(const struct mem_ctl_info *mci,
 {
 	struct i5100_priv *priv = mci->pvt_info;
 	u16 w;
-	unsigned long et;
 
 	pci_read_config_word(priv->mc, I5100_SPDDATA, &w);
 	if (i5100_spddata_busy(w))
@@ -724,7 +723,6 @@ static int i5100_read_spd_byte(const struct mem_ctl_info *mci,
 						   0, 0));
 
 	/* wait up to 100ms */
-	et = jiffies + HZ / 10;
 	udelay(100);
 	while (1) {
 		pci_read_config_word(priv->mc, I5100_SPDDATA, &w);
@@ -848,21 +846,17 @@ static void i5100_init_interleaving(struct pci_dev *pdev,
 
 static void i5100_init_csrows(struct mem_ctl_info *mci)
 {
-	int i;
 	struct i5100_priv *priv = mci->pvt_info;
+	struct dimm_info *dimm;
 
-	for (i = 0; i < mci->tot_dimms; i++) {
-		struct dimm_info *dimm;
-		const unsigned long npages = i5100_npages(mci, i);
-		const unsigned int chan = i5100_csrow_to_chan(mci, i);
-		const unsigned int rank = i5100_csrow_to_rank(mci, i);
+	mci_for_each_dimm(mci, dimm) {
+		const unsigned long npages = i5100_npages(mci, dimm->idx);
+		const unsigned int chan = i5100_csrow_to_chan(mci, dimm->idx);
+		const unsigned int rank = i5100_csrow_to_rank(mci, dimm->idx);
 
 		if (!npages)
 			continue;
 
-		dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
-				     chan, rank, 0);
-
 		dimm->nr_pages = npages;
 		dimm->grain = 32;
 		dimm->dtype = (priv->mtr[chan][rank].width == 4) ?
@@ -548,8 +548,8 @@ static void i5400_proccess_non_recoverable_info(struct mem_ctl_info *mci,
 	ras = nrec_ras(info);
 	cas = nrec_cas(info);
 
-	edac_dbg(0, "\t\tDIMM= %d Channels= %d,%d (Branch= %d DRAM Bank= %d Buffer ID = %d rdwr= %s ras= %d cas= %d)\n",
-		 rank, channel, channel + 1, branch >> 1, bank,
+	edac_dbg(0, "\t\t%s DIMM= %d Channels= %d,%d (Branch= %d DRAM Bank= %d Buffer ID = %d rdwr= %s ras= %d cas= %d)\n",
+		 type, rank, channel, channel + 1, branch >> 1, bank,
 		 buf_id, rdwr_str(rdwr), ras, cas);
 
 	/* Only 1 bit will be on */
@@ -1054,8 +1054,6 @@ static void i5400_get_mc_regs(struct mem_ctl_info *mci)
 	u32 actual_tolm;
 	u16 limit;
 	int slot_row;
-	int maxch;
-	int maxdimmperch;
 	int way0, way1;
 
 	pvt = mci->pvt_info;
@@ -1065,9 +1063,6 @@ static void i5400_get_mc_regs(struct mem_ctl_info *mci)
 	pci_read_config_dword(pvt->system_address, AMBASE + sizeof(u32),
 			&pvt->u.ambase_top);
 
-	maxdimmperch = pvt->maxdimmperch;
-	maxch = pvt->maxch;
-
 	edac_dbg(2, "AMBASE= 0x%lx MAXCH= %d MAX-DIMM-Per-CH= %d\n",
 		 (long unsigned int)pvt->ambase, pvt->maxch, pvt->maxdimmperch);
 
@@ -1170,17 +1165,13 @@ static int i5400_init_dimms(struct mem_ctl_info *mci)
 {
 	struct i5400_pvt *pvt;
 	struct dimm_info *dimm;
-	int ndimms, channel_count;
-	int max_dimms;
+	int ndimms;
 	int mtr;
 	int size_mb;
 	int channel, slot;
 
 	pvt = mci->pvt_info;
 
-	channel_count = pvt->maxch;
-	max_dimms = pvt->maxdimmperch;
-
 	ndimms = 0;
 
 	/*
@@ -1196,8 +1187,7 @@ static int i5400_init_dimms(struct mem_ctl_info *mci)
 			if (!MTR_DIMMS_PRESENT(mtr))
 				continue;
 
-			dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
-				       channel / 2, channel % 2, slot);
+			dimm = edac_get_dimm(mci, channel / 2, channel % 2, slot);
 
 			size_mb = pvt->dimm_info[slot][channel].megabytes;
 
@@ -580,7 +580,7 @@ static void i7300_enable_error_reporting(struct mem_ctl_info *mci)
  * @ch: Channel number within the branch (0 or 1)
  * @branch: Branch number (0 or 1)
  * @dinfo: Pointer to DIMM info where dimm size is stored
- * @p_csrow: Pointer to the struct csrow_info that corresponds to that element
+ * @dimm: Pointer to the struct dimm_info that corresponds to that element
  */
 static int decode_mtr(struct i7300_pvt *pvt,
 		      int slot, int ch, int branch,
@@ -794,8 +794,7 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)
 			for (ch = 0; ch < max_channel; ch++) {
 				int channel = to_channel(ch, branch);

-				dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-					       mci->n_layers, branch, ch, slot);
+				dimm = edac_get_dimm(mci, branch, ch, slot);

 				dinfo = &pvt->dimm_info[slot][channel];

@@ -817,7 +816,7 @@ static int i7300_init_csrows(struct mem_ctl_info *mci)

 /**
  * decode_mir() - Decodes Memory Interleave Register (MIR) info
- * @int mir_no: number of the MIR register to decode
+ * @mir_no: number of the MIR register to decode
  * @mir: array with the MIR data cached on the driver
  */
 static void decode_mir(int mir_no, u16 mir[MAX_MIR])

@@ -585,8 +585,7 @@ static int get_dimm_config(struct mem_ctl_info *mci)
 			if (!DIMM_PRESENT(dimm_dod[j]))
 				continue;

-			dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers,
-				       i, j, 0);
+			dimm = edac_get_dimm(mci, i, j, 0);
 			banks = numbank(MC_DOD_NUMBANK(dimm_dod[j]));
 			ranks = numrank(MC_DOD_NUMRANK(dimm_dod[j]));
 			rows = numrow(MC_DOD_NUMROW(dimm_dod[j]));

@@ -490,9 +490,7 @@ static int ie31200_probe1(struct pci_dev *pdev, int dev_idx)

 			if (dimm_info[j][i].dual_rank) {
 				nr_pages = nr_pages / 2;
-				dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-						     mci->n_layers, (i * 2) + 1,
-						     j, 0);
+				dimm = edac_get_dimm(mci, (i * 2) + 1, j, 0);
 				dimm->nr_pages = nr_pages;
 				edac_dbg(0, "set nr pages: 0x%lx\n", nr_pages);
 				dimm->grain = 8; /* just a guess */
@@ -503,8 +501,7 @@ static int ie31200_probe1(struct pci_dev *pdev, int dev_idx)
 				dimm->dtype = DEV_UNKNOWN;
 				dimm->edac_mode = EDAC_UNKNOWN;
 			}
-			dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-					     mci->n_layers, i * 2, j, 0);
+			dimm = edac_get_dimm(mci, i * 2, j, 0);
 			dimm->nr_pages = nr_pages;
 			edac_dbg(0, "set nr pages: 0x%lx\n", nr_pages);
 			dimm->grain = 8; /* same guess */

@@ -1231,7 +1231,7 @@ static void apl_get_dimm_config(struct mem_ctl_info *mci)
 		if (!(chan_mask & BIT(i)))
 			continue;

-		dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, i, 0, 0);
+		dimm = edac_get_dimm(mci, i, 0, 0);
 		if (!dimm) {
 			edac_dbg(0, "No allocated DIMM for channel %d\n", i);
 			continue;
@@ -1311,7 +1311,7 @@ static void dnv_get_dimm_config(struct mem_ctl_info *mci)
 			if (!ranks_of_dimm[j])
 				continue;

-			dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, i, j, 0);
+			dimm = edac_get_dimm(mci, i, j, 0);
 			if (!dimm) {
 				edac_dbg(0, "No allocated DIMM for channel %d DIMM %d\n", i, j);
 				continue;

@@ -254,18 +254,20 @@ static const u32 rir_offset[MAX_RIR_RANGES][MAX_RIR_WAY] = {
  * FIXME: Implement the error count reads directly
  */

-static const u32 correrrcnt[] = {
-	0x104, 0x108, 0x10c, 0x110,
-};
-
 #define RANK_ODD_OV(reg) GET_BITFIELD(reg, 31, 31)
 #define RANK_ODD_ERR_CNT(reg) GET_BITFIELD(reg, 16, 30)
 #define RANK_EVEN_OV(reg) GET_BITFIELD(reg, 15, 15)
 #define RANK_EVEN_ERR_CNT(reg) GET_BITFIELD(reg, 0, 14)

+#if 0 /* Currently unused*/
+static const u32 correrrcnt[] = {
+	0x104, 0x108, 0x10c, 0x110,
+};
+
 static const u32 correrrthrsld[] = {
 	0x11c, 0x120, 0x124, 0x128,
 };
+#endif

 #define RANK_ODD_ERR_THRSLD(reg) GET_BITFIELD(reg, 16, 30)
 #define RANK_EVEN_ERR_THRSLD(reg) GET_BITFIELD(reg, 0, 14)
@@ -1340,7 +1342,7 @@ static void knl_show_mc_route(u32 reg, char *s)
  */
 static int knl_get_dimm_capacity(struct sbridge_pvt *pvt, u64 *mc_sizes)
 {
-	u64 sad_base, sad_size, sad_limit = 0;
+	u64 sad_base, sad_limit = 0;
 	u64 tad_base, tad_size, tad_limit, tad_deadspace, tad_livespace;
 	int sad_rule = 0;
 	int tad_rule = 0;
@@ -1427,7 +1429,6 @@ static int knl_get_dimm_capacity(struct sbridge_pvt *pvt, u64 *mc_sizes)
 		edram_only = KNL_EDRAM_ONLY(dram_rule);

 		sad_limit = pvt->info.sad_limit(dram_rule)+1;
-		sad_size = sad_limit - sad_base;

 		pci_read_config_dword(pvt->pci_sad0,
 			pvt->info.interleave_list[sad_rule], &interleave_reg);
@@ -1620,7 +1621,7 @@ static int __populate_dimms(struct mem_ctl_info *mci,
 		}

 		for (j = 0; j < max_dimms_per_channel; j++) {
-			dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, i, j, 0);
+			dimm = edac_get_dimm(mci, i, j, 0);
 			if (pvt->info.type == KNIGHTS_LANDING) {
 				pci_read_config_dword(pvt->knl.pci_channel[i],
 					knl_mtr_reg, &mtr);
@@ -2952,7 +2953,7 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	struct mem_ctl_info *new_mci;
 	struct sbridge_pvt *pvt = mci->pvt_info;
 	enum hw_event_mc_err_type tp_event;
-	char *type, *optype, msg[256];
+	char *optype, msg[256];
 	bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0);
 	bool overflow = GET_BITFIELD(m->status, 62, 62);
 	bool uncorrected_error = GET_BITFIELD(m->status, 61, 61);
@@ -2981,14 +2982,11 @@ static void sbridge_mce_output_error(struct mem_ctl_info *mci,
 	if (uncorrected_error) {
 		core_err_cnt = 1;
 		if (ripv) {
-			type = "FATAL";
 			tp_event = HW_EVENT_ERR_FATAL;
 		} else {
-			type = "NON_FATAL";
 			tp_event = HW_EVENT_ERR_UNCORRECTED;
 		}
 	} else {
-		type = "CORRECTED";
 		tp_event = HW_EVENT_ERR_CORRECTED;
 	}

@@ -3200,7 +3198,6 @@ static struct notifier_block sbridge_mce_dec = {
 static void sbridge_unregister_mci(struct sbridge_dev *sbridge_dev)
 {
 	struct mem_ctl_info *mci = sbridge_dev->mci;
-	struct sbridge_pvt *pvt;

 	if (unlikely(!mci || !mci->pvt_info)) {
 		edac_dbg(0, "MC: dev = %p\n", &sbridge_dev->pdev[0]->dev);
@@ -3209,8 +3206,6 @@ static void sbridge_unregister_mci(struct sbridge_dev *sbridge_dev)
 		return;
 	}

-	pvt = mci->pvt_info;
-
 	edac_dbg(0, "MC: mci = %p, dev = %p\n",
 		 mci, &sbridge_dev->pdev[0]->dev);


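Note on the RANK_* macros kept as context in the sb_edac hunk at -254: each 32-bit correctable-error register carries two 15-bit counters and two overflow flags. The following is a standalone sketch only. It assumes a mask-and-shift definition of GET_BITFIELD (the macro body is not part of this diff), and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed definition: extract bits lo..hi (inclusive) from v. */
#define GET_BITFIELD(v, lo, hi) \
	(((uint64_t)(v) >> (lo)) & ((1ULL << ((hi) - (lo) + 1)) - 1))

/* Field layout copied from the macros shown in the diff. */
#define RANK_ODD_OV(reg)	GET_BITFIELD(reg, 31, 31)
#define RANK_ODD_ERR_CNT(reg)	GET_BITFIELD(reg, 16, 30)
#define RANK_EVEN_OV(reg)	GET_BITFIELD(reg, 15, 15)
#define RANK_EVEN_ERR_CNT(reg)	GET_BITFIELD(reg, 0, 14)

/* Hypothetical container for one decoded register. */
struct rank_pair_count {
	unsigned int odd_cnt;
	unsigned int even_cnt;
	int odd_ov;
	int even_ov;
};

/* Split one raw register value into its four fields. */
static struct rank_pair_count decode_correrrcnt(uint32_t reg)
{
	struct rank_pair_count c = {
		.odd_cnt  = (unsigned int)RANK_ODD_ERR_CNT(reg),
		.even_cnt = (unsigned int)RANK_EVEN_ERR_CNT(reg),
		.odd_ov   = (int)RANK_ODD_OV(reg),
		.even_ov  = (int)RANK_EVEN_OV(reg),
	};

	return c;
}
```

With an invented register value of 0x80030005, this yields an odd-rank count of 3 with the overflow flag set and an even-rank count of 5 without overflow.
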
@@ -46,7 +46,8 @@ static struct skx_dev *get_skx_dev(struct pci_bus *bus, u8 idx)
 }

 enum munittype {
-	CHAN0, CHAN1, CHAN2, SAD_ALL, UTIL_ALL, SAD
+	CHAN0, CHAN1, CHAN2, SAD_ALL, UTIL_ALL, SAD,
+	ERRCHAN0, ERRCHAN1, ERRCHAN2,
 };

 struct munit {
@@ -68,6 +69,9 @@ static const struct munit skx_all_munits[] = {
 	{ 0x2040, { PCI_DEVFN(10, 0), PCI_DEVFN(12, 0) }, 2, 2, CHAN0 },
 	{ 0x2044, { PCI_DEVFN(10, 4), PCI_DEVFN(12, 4) }, 2, 2, CHAN1 },
 	{ 0x2048, { PCI_DEVFN(11, 0), PCI_DEVFN(13, 0) }, 2, 2, CHAN2 },
+	{ 0x2043, { PCI_DEVFN(10, 3), PCI_DEVFN(12, 3) }, 2, 2, ERRCHAN0 },
+	{ 0x2047, { PCI_DEVFN(10, 7), PCI_DEVFN(12, 7) }, 2, 2, ERRCHAN1 },
+	{ 0x204b, { PCI_DEVFN(11, 3), PCI_DEVFN(13, 3) }, 2, 2, ERRCHAN2 },
 	{ 0x208e, { }, 1, 0, SAD },
 	{ }
 };
@@ -104,10 +108,18 @@ static int get_all_munits(const struct munit *m)
 		}

 		switch (m->mtype) {
-		case CHAN0: case CHAN1: case CHAN2:
+		case CHAN0:
+		case CHAN1:
+		case CHAN2:
 			pci_dev_get(pdev);
 			d->imc[i].chan[m->mtype].cdev = pdev;
 			break;
+		case ERRCHAN0:
+		case ERRCHAN1:
+		case ERRCHAN2:
+			pci_dev_get(pdev);
+			d->imc[i].chan[m->mtype - ERRCHAN0].edev = pdev;
+			break;
 		case SAD_ALL:
 			pci_dev_get(pdev);
 			d->sad_all = pdev;
@@ -177,8 +189,7 @@ static int skx_get_dimm_config(struct mem_ctl_info *mci)
 		pci_read_config_dword(imc->chan[i].cdev, 0x8C, &amap);
 		pci_read_config_dword(imc->chan[i].cdev, 0x400, &mcddrtcfg);
 		for (j = 0; j < SKX_NUM_DIMMS; j++) {
-			dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms,
-					     mci->n_layers, i, j, 0);
+			dimm = edac_get_dimm(mci, i, j, 0);
 			pci_read_config_dword(imc->chan[i].cdev,
 					      0x80 + 4 * j, &mtr);
 			if (IS_DIMM_PRESENT(mtr)) {
@@ -216,6 +227,39 @@ static int skx_get_dimm_config(struct mem_ctl_info *mci)
 #define SKX_ILV_REMOTE(tgt) (((tgt) & 8) == 0)
 #define SKX_ILV_TARGET(tgt) ((tgt) & 7)

+static void skx_show_retry_rd_err_log(struct decoded_addr *res,
+				      char *msg, int len)
+{
+	u32 log0, log1, log2, log3, log4;
+	u32 corr0, corr1, corr2, corr3;
+	struct pci_dev *edev;
+	int n;
+
+	edev = res->dev->imc[res->imc].chan[res->channel].edev;
+
+	pci_read_config_dword(edev, 0x154, &log0);
+	pci_read_config_dword(edev, 0x148, &log1);
+	pci_read_config_dword(edev, 0x150, &log2);
+	pci_read_config_dword(edev, 0x15c, &log3);
+	pci_read_config_dword(edev, 0x114, &log4);
+
+	n = snprintf(msg, len, " retry_rd_err_log[%.8x %.8x %.8x %.8x %.8x]",
+		     log0, log1, log2, log3, log4);
+
+	pci_read_config_dword(edev, 0x104, &corr0);
+	pci_read_config_dword(edev, 0x108, &corr1);
+	pci_read_config_dword(edev, 0x10c, &corr2);
+	pci_read_config_dword(edev, 0x110, &corr3);
+
+	if (len - n > 0)
+		snprintf(msg + n, len - n,
+			 " correrrcnt[%.4x %.4x %.4x %.4x %.4x %.4x %.4x %.4x]",
+			 corr0 & 0xffff, corr0 >> 16,
+			 corr1 & 0xffff, corr1 >> 16,
+			 corr2 & 0xffff, corr2 >> 16,
+			 corr3 & 0xffff, corr3 >> 16);
+}
+
 static bool skx_sad_decode(struct decoded_addr *res)
 {
 	struct skx_dev *d = list_first_entry(skx_edac_list, typeof(*d), list);
@@ -659,7 +703,7 @@ static int __init skx_init(void)
 		}
 	}

-	skx_set_decode(skx_decode);
+	skx_set_decode(skx_decode, skx_show_retry_rd_err_log);

 	if (nvdimm_count && skx_adxl_get() == -ENODEV)
 		skx_printk(KERN_NOTICE, "Only decoding DDR4 address!\n");

@@ -37,6 +37,7 @@ static char *adxl_msg;

 static char skx_msg[MSG_SIZE];
 static skx_decode_f skx_decode;
+static skx_show_retry_log_f skx_show_retry_rd_err_log;
 static u64 skx_tolm, skx_tohm;
 static LIST_HEAD(dev_edac_list);

@@ -100,6 +101,7 @@ void __exit skx_adxl_put(void)

 static bool skx_adxl_decode(struct decoded_addr *res)
 {
+	struct skx_dev *d;
 	int i, len = 0;

 	if (res->addr >= skx_tohm || (res->addr >= skx_tolm &&
@@ -118,6 +120,24 @@ static bool skx_adxl_decode(struct decoded_addr *res)
 	res->channel = (int)adxl_values[component_indices[INDEX_CHANNEL]];
 	res->dimm = (int)adxl_values[component_indices[INDEX_DIMM]];

+	if (res->imc > NUM_IMC - 1) {
+		skx_printk(KERN_ERR, "Bad imc %d\n", res->imc);
+		return false;
+	}
+
+	list_for_each_entry(d, &dev_edac_list, list) {
+		if (d->imc[0].src_id == res->socket) {
+			res->dev = d;
+			break;
+		}
+	}
+
+	if (!res->dev) {
+		skx_printk(KERN_ERR, "No device for src_id %d imc %d\n",
+			   res->socket, res->imc);
+		return false;
+	}
+
 	for (i = 0; i < adxl_component_count; i++) {
 		if (adxl_values[i] == ~0x0ull)
 			continue;
@@ -131,9 +151,10 @@ static bool skx_adxl_decode(struct decoded_addr *res)
 	return true;
 }

-void skx_set_decode(skx_decode_f decode)
+void skx_set_decode(skx_decode_f decode, skx_show_retry_log_f show_retry_log)
 {
 	skx_decode = decode;
+	skx_show_retry_rd_err_log = show_retry_log;
 }

 int skx_get_src_id(struct skx_dev *d, int off, u8 *id)
@@ -452,34 +473,17 @@ static void skx_unregister_mci(struct skx_imc *imc)
 	edac_mc_free(mci);
 }

-static struct mem_ctl_info *get_mci(int src_id, int lmc)
-{
-	struct skx_dev *d;
-
-	if (lmc > NUM_IMC - 1) {
-		skx_printk(KERN_ERR, "Bad lmc %d\n", lmc);
-		return NULL;
-	}
-
-	list_for_each_entry(d, &dev_edac_list, list) {
-		if (d->imc[0].src_id == src_id)
-			return d->imc[lmc].mci;
-	}
-
-	skx_printk(KERN_ERR, "No mci for src_id %d lmc %d\n", src_id, lmc);
-	return NULL;
-}
-
 static void skx_mce_output_error(struct mem_ctl_info *mci,
 				 const struct mce *m,
 				 struct decoded_addr *res)
 {
 	enum hw_event_mc_err_type tp_event;
-	char *type, *optype;
+	char *optype;
 	bool ripv = GET_BITFIELD(m->mcgstatus, 0, 0);
 	bool overflow = GET_BITFIELD(m->status, 62, 62);
 	bool uncorrected_error = GET_BITFIELD(m->status, 61, 61);
 	bool recoverable;
+	int len;
 	u32 core_err_cnt = GET_BITFIELD(m->status, 38, 52);
 	u32 mscod = GET_BITFIELD(m->status, 16, 31);
 	u32 errcode = GET_BITFIELD(m->status, 0, 15);
@@ -490,14 +494,11 @@ static void skx_mce_output_error(struct mem_ctl_info *mci,
 	if (uncorrected_error) {
 		core_err_cnt = 1;
 		if (ripv) {
-			type = "FATAL";
 			tp_event = HW_EVENT_ERR_FATAL;
 		} else {
-			type = "NON_FATAL";
 			tp_event = HW_EVENT_ERR_UNCORRECTED;
 		}
 	} else {
-		type = "CORRECTED";
 		tp_event = HW_EVENT_ERR_CORRECTED;
 	}

@@ -539,12 +540,12 @@ static void skx_mce_output_error(struct mem_ctl_info *mci,
 		}
 	}
 	if (adxl_component_count) {
-		snprintf(skx_msg, MSG_SIZE, "%s%s err_code:0x%04x:0x%04x %s",
+		len = snprintf(skx_msg, MSG_SIZE, "%s%s err_code:0x%04x:0x%04x %s",
 			 overflow ? " OVERFLOW" : "",
 			 (uncorrected_error && recoverable) ? " recoverable" : "",
 			 mscod, errcode, adxl_msg);
 	} else {
-		snprintf(skx_msg, MSG_SIZE,
+		len = snprintf(skx_msg, MSG_SIZE,
 			 "%s%s err_code:0x%04x:0x%04x socket:%d imc:%d rank:%d bg:%d ba:%d row:0x%x col:0x%x",
 			 overflow ? " OVERFLOW" : "",
 			 (uncorrected_error && recoverable) ? " recoverable" : "",
@@ -553,6 +554,9 @@ static void skx_mce_output_error(struct mem_ctl_info *mci,
 			 res->bank_group, res->bank_address, res->row, res->column);
 	}

+	if (skx_show_retry_rd_err_log)
+		skx_show_retry_rd_err_log(res, skx_msg + len, MSG_SIZE - len);
+
 	edac_dbg(0, "%s\n", skx_msg);

 	/* Call the helper to output message */
@@ -583,15 +587,12 @@ int skx_mce_check_error(struct notifier_block *nb, unsigned long val,
 	if (adxl_component_count) {
 		if (!skx_adxl_decode(&res))
 			return NOTIFY_DONE;
-
-		mci = get_mci(res.socket, res.imc);
-	} else {
-		if (!skx_decode || !skx_decode(&res))
-			return NOTIFY_DONE;
-
-		mci = res.dev->imc[res.imc].mci;
+	} else if (!skx_decode || !skx_decode(&res)) {
+		return NOTIFY_DONE;
 	}

+	mci = res.dev->imc[res.imc].mci;
+
 	if (!mci)
 		return NOTIFY_DONE;


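Note on the `len = snprintf(...)` changes in skx_mce_output_error(): capturing the return value lets a second writer continue at `skx_msg + len` with `MSG_SIZE - len` bytes left. The following is a userspace sketch of that append pattern, not the driver code. The names `append_demo` and `BUF_SIZE` are hypothetical, and the clamp on the return value is an addition that the diff itself does not contain.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 64	/* hypothetical; the driver uses MSG_SIZE */

/*
 * Write a prefix, then let a second step continue at buf + len with the
 * space that remains. Returns the length of the string left in buf.
 */
static size_t append_demo(char *buf, size_t size, unsigned int code,
			  unsigned int log0)
{
	int len;

	assert(buf != NULL && size > 0);

	len = snprintf(buf, size, "err_code:0x%04x", code);

	/* snprintf() reports the length it wanted to write, so clamp it. */
	if (len < 0)
		len = 0;
	if ((size_t)len >= size)
		len = (int)size - 1;

	/* Append only if there is room for at least one more character. */
	if (size - (size_t)len > 1)
		snprintf(buf + len, size - (size_t)len,
			 " retry_rd_err_log[%.8x]", log0);

	return strlen(buf);
}
```

Calling `append_demo(buf, sizeof(buf), 0x80, 0x1234)` on a 64-byte buffer leaves `err_code:0x0080 retry_rd_err_log[00001234]` in `buf`. With an 8-byte buffer the prefix is truncated and the append step is skipped.
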
@@ -64,6 +64,7 @@ struct skx_dev {
 		u8 src_id, node_id;
 		struct skx_channel {
 			struct pci_dev *cdev;
+			struct pci_dev *edev;
 			struct skx_dimm {
 				u8 close_pg;
 				u8 bank_xor_enable;
@@ -113,10 +114,11 @@ struct decoded_addr {

 typedef int (*get_dimm_config_f)(struct mem_ctl_info *mci);
 typedef bool (*skx_decode_f)(struct decoded_addr *res);
+typedef void (*skx_show_retry_log_f)(struct decoded_addr *res, char *msg, int len);

 int __init skx_adxl_get(void);
 void __exit skx_adxl_put(void);
-void skx_set_decode(skx_decode_f decode);
+void skx_set_decode(skx_decode_f decode, skx_show_retry_log_f show_retry_log);

 int skx_get_src_id(struct skx_dev *d, int off, u8 *id);
 int skx_get_node_id(struct skx_dev *d, u8 *id);

@@ -135,7 +135,7 @@ static void ti_edac_setup_dimm(struct mem_ctl_info *mci, u32 type)
 	u32 val;
 	u32 memsize;

-	dimm = EDAC_DIMM_PTR(mci->layers, mci->dimms, mci->n_layers, 0, 0, 0);
+	dimm = edac_get_dimm(mci, 0, 0, 0);

 	val = ti_edac_readl(edac, EMIF_SDRAM_CONFIG);


@@ -362,78 +362,6 @@ struct edac_mc_layer {
  */
 #define EDAC_MAX_LAYERS 3

-/**
- * EDAC_DIMM_OFF - Macro responsible to get a pointer offset inside a pointer
- * array for the element given by [layer0,layer1,layer2]
- * position
- *
- * @layers: a struct edac_mc_layer array, describing how many elements
- * were allocated for each layer
- * @nlayers: Number of layers at the @layers array
- * @layer0: layer0 position
- * @layer1: layer1 position. Unused if n_layers < 2
- * @layer2: layer2 position. Unused if n_layers < 3
- *
- * For 1 layer, this macro returns "var[layer0] - var";
- *
- * For 2 layers, this macro is similar to allocate a bi-dimensional array
- * and to return "var[layer0][layer1] - var";
- *
- * For 3 layers, this macro is similar to allocate a tri-dimensional array
- * and to return "var[layer0][layer1][layer2] - var".
- *
- * A loop could be used here to make it more generic, but, as we only have
- * 3 layers, this is a little faster.
- *
- * By design, layers can never be 0 or more than 3. If that ever happens,
- * a NULL is returned, causing an OOPS during the memory allocation routine,
- * with would point to the developer that he's doing something wrong.
- */
-#define EDAC_DIMM_OFF(layers, nlayers, layer0, layer1, layer2) ({ \
-	int __i; \
-	if ((nlayers) == 1) \
-		__i = layer0; \
-	else if ((nlayers) == 2) \
-		__i = (layer1) + ((layers[1]).size * (layer0)); \
-	else if ((nlayers) == 3) \
-		__i = (layer2) + ((layers[2]).size * ((layer1) + \
-			((layers[1]).size * (layer0)))); \
-	else \
-		__i = -EINVAL; \
-	__i; \
-})
-
-/**
- * EDAC_DIMM_PTR - Macro responsible to get a pointer inside a pointer array
- * for the element given by [layer0,layer1,layer2] position
- *
- * @layers: a struct edac_mc_layer array, describing how many elements
- * were allocated for each layer
- * @var: name of the var where we want to get the pointer
- * (like mci->dimms)
- * @nlayers: Number of layers at the @layers array
- * @layer0: layer0 position
- * @layer1: layer1 position. Unused if n_layers < 2
- * @layer2: layer2 position. Unused if n_layers < 3
- *
- * For 1 layer, this macro returns "var[layer0]";
- *
- * For 2 layers, this macro is similar to allocate a bi-dimensional array
- * and to return "var[layer0][layer1]";
- *
- * For 3 layers, this macro is similar to allocate a tri-dimensional array
- * and to return "var[layer0][layer1][layer2]";
- */
-#define EDAC_DIMM_PTR(layers, var, nlayers, layer0, layer1, layer2) ({ \
-	typeof(*var) __p; \
-	int ___i = EDAC_DIMM_OFF(layers, nlayers, layer0, layer1, layer2); \
-	if (___i < 0) \
-		__p = NULL; \
-	else \
-		__p = (var)[___i]; \
-	__p; \
-})
-
 struct dimm_info {
 	struct device dev;

@@ -443,6 +371,7 @@ struct dimm_info {
 	unsigned int location[EDAC_MAX_LAYERS];

 	struct mem_ctl_info *mci; /* the parent */
+	unsigned int idx; /* index within the parent dimm array */

 	u32 grain; /* granularity of reported error in bytes */
 	enum dev_type dtype; /* memory device type */
@@ -528,15 +457,10 @@ struct errcount_attribute_data {
  * (typically, a memory controller error)
  */
 struct edac_raw_error_desc {
-	/*
-	 * NOTE: everything before grain won't be cleaned by
-	 * edac_raw_error_desc_clean()
-	 */
 	char location[LOCATION_SIZE];
 	char label[(EDAC_MC_LABEL_LEN + 1 + sizeof(OTHER_LABEL)) * EDAC_MAX_LABELS];
 	long grain;

-	/* the vars below and grain will be cleaned on every new error report */
 	u16 error_count;
 	int top_layer;
 	int mid_layer;
@@ -669,4 +593,70 @@ struct mem_ctl_info {
 	bool fake_inject_ue;
 	u16 fake_inject_count;
 };
-#endif
+
+#define mci_for_each_dimm(mci, dimm) \
+	for ((dimm) = (mci)->dimms[0]; \
+	     (dimm); \
+	     (dimm) = (dimm)->idx + 1 < (mci)->tot_dimms \
+		     ? (mci)->dimms[(dimm)->idx + 1] \
+		     : NULL)
+
+/**
+ * edac_get_dimm_by_index - Get DIMM info at @index from a memory
+ * controller
+ *
+ * @mci: MC descriptor struct mem_ctl_info
+ * @index: index in the memory controller's DIMM array
+ *
+ * Returns a struct dimm_info * or NULL on failure.
+ */
+static inline struct dimm_info *
+edac_get_dimm_by_index(struct mem_ctl_info *mci, int index)
+{
+	if (index < 0 || index >= mci->tot_dimms)
+		return NULL;
+
+	if (WARN_ON_ONCE(mci->dimms[index]->idx != index))
+		return NULL;
+
+	return mci->dimms[index];
+}
+
+/**
+ * edac_get_dimm - Get DIMM info from a memory controller given by
+ * [layer0,layer1,layer2] position
+ *
+ * @mci: MC descriptor struct mem_ctl_info
+ * @layer0: layer0 position
+ * @layer1: layer1 position. Unused if n_layers < 2
+ * @layer2: layer2 position. Unused if n_layers < 3
+ *
+ * For 1 layer, this function returns "dimms[layer0]";
+ *
+ * For 2 layers, this function is similar to allocating a two-dimensional
+ * array and returning "dimms[layer0][layer1]";
+ *
+ * For 3 layers, this function is similar to allocating a tri-dimensional
+ * array and returning "dimms[layer0][layer1][layer2]";
+ */
+static inline struct dimm_info *edac_get_dimm(struct mem_ctl_info *mci,
+	int layer0, int layer1, int layer2)
+{
+	int index;
+
+	if (layer0 < 0
+	    || (mci->n_layers > 1 && layer1 < 0)
+	    || (mci->n_layers > 2 && layer2 < 0))
+		return NULL;
+
+	index = layer0;
+
+	if (mci->n_layers > 1)
+		index = index * mci->layers[1].size + layer1;
+
+	if (mci->n_layers > 2)
+		index = index * mci->layers[2].size + layer2;
+
+	return edac_get_dimm_by_index(mci, index);
+}
+#endif /* _LINUX_EDAC_H_ */
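
Note on edac_get_dimm(): the layer arithmetic is a row-major flattening of up to three layer positions into one index into the DIMM array. The following is a minimal standalone sketch of that arithmetic only. `struct layout` and `flat_index` are hypothetical stand-ins for `struct mem_ctl_info` and the kernel helper, and the bounds check done by edac_get_dimm_by_index() is left out.

```c
#include <assert.h>

/* Hypothetical stand-in for the layer bookkeeping in struct mem_ctl_info. */
struct layout {
	int n_layers;	/* 1..3 */
	int size[3];	/* number of elements in each layer */
};

/*
 * Row-major flattening of [layer0, layer1, layer2], following the same
 * steps as edac_get_dimm(). Returns -1 if a used position is negative.
 */
static int flat_index(const struct layout *l, int layer0, int layer1,
		      int layer2)
{
	int index;

	assert(l->n_layers >= 1 && l->n_layers <= 3);

	if (layer0 < 0 ||
	    (l->n_layers > 1 && layer1 < 0) ||
	    (l->n_layers > 2 && layer2 < 0))
		return -1;

	index = layer0;

	if (l->n_layers > 1)
		index = index * l->size[1] + layer1;

	if (l->n_layers > 2)
		index = index * l->size[2] + layer2;

	return index;
}
```

For a three-layer layout with sizes 2 x 2 x 4, position [1, 1, 3] maps to (1 * 2 + 1) * 4 + 3 = 15, the last of the 16 slots. Only the sizes of layers 1 and 2 enter the calculation, which matches the removed EDAC_DIMM_OFF() macro.
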