Add an EDAC driver for SiFive SoCs. The initial version supports ECC
event monitoring and reporting through the EDAC framework for the SiFive
L2 cache controller. It registers for notifier events from the L2 cache
controller driver (arch/riscv/mm/sifive_l2_cache.c) for L2 ECC events.
[ bp: Massage commit message. ]
Signed-off-by: Yash Shah <yash.shah@sifive.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux-riscv@lists.infradead.org
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: sachin.ghadi@sifive.com
Link: https://lkml.kernel.org/r/1557142026-15949-2-git-send-email-yash.shah@sifive.com
The variable tad_base is being set to a value that is never read and is
being over-written on the next iteration of a for-loop. This assignment
is therefore redundant and can be removed.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: kernel-janitors@vger.kernel.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Link: https://lkml.kernel.org/r/20190508224201.27120-1-colin.king@canonical.com
Do put_device() if device_add() fails.
[ bp: do device_del() for the successfully created devices in
edac_create_csrow_objects(), on the unwind path. ]
Signed-off-by: Greg KH <gregkh@linuxfoundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20190427214925.GE16338@kroah.com
In edac_create_csrow_object(), the reference to the object is not
released when adding the device to the device hierarchy fails
(device_add()). This may result in a memory leak.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1555554438-103953-1-git-send-email-bianpan2016@163.com
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation version 2 of the license
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 315 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation this program is
distributed in the hope that it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not write to the free
software foundation inc 59 temple place suite 330 boston ma 02111
1307 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 136 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000436.384967451@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 263 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141901.208660670@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 228 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528171438.107155473@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this file may be distributed under the terms of the gnu general
public license version 2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 9 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070034.395589349@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not see http www gnu org licenses
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details [based]
[from] [clk] [highbank] [c] you should have received a copy of the
gnu general public license along with this program if not see http
www gnu org licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 355 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154041.837383322@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have MODULE_LICENCE("GPL*") inside which was used in the initial
scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Correct edac_mc_find()'s return value on error (Robert Richter)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlzdOjIACgkQEsHwGGHe
VUrkxQ/+J+mSL23ush4HNq43OqW7Wgpkdoljo+5hr1XWH790Kc/HNI+B/Qdsgl6I
UoFkJPwMTfxqV/JZFuNC22ADnrXw6nRDZSE7+K+fHmpBgj2rK4hQ6tOYON9dTg/X
9nPF1ylBQcdzwNt4eTTnS5U1eKBSWqmQm5NtYmq2s83DQ1v3r2Pckqx00SC5Stdy
9vQLCPUR5SjsGpeQZnrdesxO89t3pWWzfZrAsCxPiY36VbUVmsHQVbHC9yfpLSIE
6pzc0esQx6zqyb4e0Rx+aKJPJ6fS43NsxifkI81h7S5F/xPEuZy39rhAmnEOYCNE
0ceQwl+ktzzGzmurgXf9Y5LLy8vOcCUJX4HBCkaweHSDRiP1xZtIl7Ao3aAaP4no
AUQqX/O0BydkO+awarhCduM7vIGdr0B9xn0xCtUODitoTIzaoVZ8F2iF7xSOi/3F
QkP4dSsf72ZK/EdA/8k1bxyBnxQ/TKgwLIQEJ1n9Bz7biPxZmqCyHQMDg7HaA1Uc
crGkJJr0cGHb9TEAzy0ygtRNrKE+hpt0FGFGE0zlE4uX7YP1Y/Bw3R38zbcl2x9h
nIqEmHwswVfBdm28RfxRH3LnS9SHwsXTtUk0y89Je1pK0qZ2Kv6nRiEFzWIAqPyY
I/7eorSkNXU+vVnZko8cvuHEuJh1VUj2bGJeS6NUo9X5+IOE0LE=
=yjXI
-----END PGP SIGNATURE-----
Merge tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC fixes from Borislav Petkov:
- Do not build mpc85_edac as a module (Michael Ellerman)
- Correct edac_mc_find()'s return value on error (Robert Richter)
* tag 'edac_fixes_for_5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC/mc: Fix edac_mc_find() in case no device is found
EDAC/mpc85xx: Prevent building as a module
The function should return NULL in case no device is found, but it
always returns the last checked mc device from the list even if the
index did not match. Fix that.
I did some analysis why this did not raise any issues for about 3 years
and the reason is that edac_mc_find() is mostly used to search for
existing devices. Thus, the bug is not triggered.
[ bp: Drop the if (mci->mc_idx > idx) test in favor of readability. ]
Fixes: c73e8833be ("EDAC, mc: Fix locking around mc_devices list")
Signed-off-by: Robert Richter <rrichter@marvell.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lkml.kernel.org/r/20190514104838.15065-1-rrichter@marvell.com
The mpc85xx EDAC driver can be configured as a module but then fails to
build because it uses two unexported symbols:
ERROR: ".pci_find_hose_for_OF_device" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
ERROR: ".early_find_capability" [drivers/edac/mpc85xx_edac_mod.ko] undefined!
We don't want to export those symbols just for this driver, so make the
driver only configurable as a built-in.
This seems to have been broken since at least
c92132f598 ("edac/85xx: Add PCIe error interrupt edac support")
(Nov 2013).
[ bp: make it depend on EDAC=y so that the EDAC core doesn't get built
as a module. ]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Johannes Thumshirn <jth@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linuxppc-dev@ozlabs.org
Cc: morbidrsa@gmail.com
Link: https://lkml.kernel.org/r/20190502141941.12927-1-mpe@ellerman.id.au
Pull RAS updates from Borislav Petkov:
- Support for varying MCA bank numbers per CPU: this is in preparation
for future CPU enablement (Yazen Ghannam)
- MCA banks read race fix (Tony Luck)
- Facility to filter MCEs which should not be logged (Yazen Ghannam)
- The usual round of cleanups and fixes
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/MCE/AMD: Don't report L1 BTB MCA errors on some family 17h models
x86/MCE: Add an MCE-record filtering function
RAS/CEC: Increment cec_entered under the mutex lock
x86/mce: Fix debugfs_simple_attr.cocci warnings
x86/mce: Remove mce_report_event()
x86/mce: Handle varying MCA bank counts
x86/mce: Fix machine_check_poll() tests for error types
MAINTAINERS: Fix file pattern for X86 MCE INFRASTRUCTURE
x86/MCE: Group AMD function prototypes in <asm/mce.h>
AMD family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
errors under certain conditions. The errors are benign and can safely be
ignored. However, the high error rate may cause the MCA threshold
counter to overflow causing a high rate of thresholding interrupts.
In addition, users may see the errors reported through the AMD MCE
decoder module, even with the interrupt disabled, due to MCA polling.
Clear the "Counter Present" bit in the Instruction Fetch bank's
MCA_MISC0 register. This will prevent enabling MCA thresholding on this
bank which will prevent the high interrupt rate due to this error.
Define an AMD-specific function to filter these errors from the MCE
event pool so that they don't get reported during early boot.
Rename filter function in EDAC/mce_amd to avoid a naming conflict, while
at it.
[ bp: Move function prototype to the internal header and
massage/cleanup, fix typos. ]
Reported-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "clemej@gmail.com" <clemej@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: <stable@vger.kernel.org> # 5.0.x: c95b323dcd: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
Cc: <stable@vger.kernel.org> # 5.0.x: 30aa3d26ed: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
Cc: <stable@vger.kernel.org> # 5.0.x: 9308fd4074: x86/MCE: Group AMD function prototypes in <asm/mce.h>
Cc: <stable@vger.kernel.org> # 5.0.x
Link: https://lkml.kernel.org/r/20190325163410.171021-2-Yazen.Ghannam@amd.com
Reserve ECC Double Bit Error SMC call to alert U-Boot that a DBE has
occurred. Move the call from local EDAC header file to a common header.
[ bp: Merge the two patches. ]
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Richard Gong <richard.gong@intel.com>
Reviewed-by: Alan Tull <atull@kernel.org> # firmware
Cc: Greg KH <greg@kroah.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: mchehab@kernel.org
Link: https://lkml.kernel.org/r/1553870639-23895-1-git-send-email-thor.thayer@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
The FIFO memory and ECC initialization doesn't need to be
done as a separate operation early in the startup.
Improve the Arria10 and Stratix10 peripheral FIFO init
by initializing memory and enabling ECC as part of the
device driver initialization.
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1553635771-32693-2-git-send-email-thor.thayer@linux.intel.com
Improve the Arria10 and Stratix10 error injection routine
by reading the data and changing just 1 bit before writing
back out. Previous routine would overwrite the first bytes
to 0 then change 1 bit but this method is less intrusive.
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1553635771-32693-1-git-send-email-thor.thayer@linux.intel.com
AMD systems may support chip select interleaving. However, on family
17h+ this was not taken into account when printing the chip select
sizes.
Add support to detect if chip selects are interleaved on family 17h+,
and adjust the sizes accordingly.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-6-Yazen.Ghannam@amd.com
The struct chip_select array that's used for saving chip select bases
and masks is fixed at length of two. There should be one struct
chip_select for each controller, so this array should be increased to
support systems that may have more than two controllers.
Increase the size of the struct chip_select array to eight, which is the
largest number of controllers per die currently supported on AMD
systems.
Also, carve out the Family 17h+ reading of the bases/masks into a
separate function. This effectively reverts the original bases/masks
reading code to before Family 17h support was added.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-5-Yazen.Ghannam@amd.com
Future AMD systems may support x16 symbol sizes.
Recognize if a system is using x16 symbol size. Also, simplify the print
statement.
Note that a x16 syndrome vector table is not necessary like with x4 or
x8 syndromes. This is because systems that support x16 symbol sizes are
SMCA systems and in that case, the syndrome can be directly extracted
from the MCA_SYND[Syndrome] field.
[ bp: massage. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-4-Yazen.Ghannam@amd.com
The AMD64 EDAC module currently hardcodes the EDAC channel layer size
count to two. Future AMD systems may have more channels than this.
Set the EDAC channel layer size equal to the maximum number of channels
possible for the system. On Family 17h and later, this is set in the
num_umcs variable. Older systems will continue to use two as the
default.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190325203319.7603-1-Yazen.Ghannam@amd.com
The first few models of Family 17h all had 2 Unified Memory Controllers
per Die, so this was treated as a fixed value. However, future systems
may have more Unified Memory Controllers per Die.
Related to this, the channel number and base address of a Unified Memory
Controller were found by matching on fixed, known values. However,
current and future systems follow this pattern for the channel number
and base address of a Unified Memory Controller: 0xYXXXXX, where Y is
the channel number. So matching on hardcoded values is not necessary.
Set the number of Unified Memory Controllers at driver init time based
on the family/model. Also, update the functions that find the channel
number and base address of a Unified Memory Controller to support more
than two.
[ bp: Move num_umcs into the .c file and simplify comment. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-3-Yazen.Ghannam@amd.com
Define and use a macro for looping over the number of Unified Memory
Controllers.
No functional change.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-2-Yazen.Ghannam@amd.com
Add the new Family 17h Model 30h PCI IDs to the AMD64 EDAC module.
This also fixes a probe failure that appeared when some other PCI IDs
for Family 17h Model 30h were added to the AMD NB code.
Fixes: be3518a16e (x86/amd_nb: Add PCI device IDs for family 17h, model 30h)
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190228153558.127292-1-Yazen.Ghannam@amd.com
Stratix10 Double Bit Error Address was always read from SDRAM Address
register instead of each device's Address register.
To determine which device had the DBE, cycle through the EDAC devices
comparing the DBE value to the db_irq value. Once found, report the DBE
Address from the device registers as well as the device name.
Finally, notify the system via an SMC call and indicate the panic should
result in a system reboot. Change a run-time check to a Stratix10
compile-time check for a clean SMC notification.
Fixes: d5fc912556 ("EDAC, altera: Combine Stratix10 and Arria10 probe functions")
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/1552490842-25440-1-git-send-email-thor.thayer@linux.intel.com
The following Kconfig constellations fail randconfig builds:
CONFIG_ACPI_NFIT=y
CONFIG_EDAC_DEBUG=y
CONFIG_EDAC_SKX=m
CONFIG_EDAC_I10NM=y
or
CONFIG_ACPI_NFIT=y
CONFIG_EDAC_DEBUG=y
CONFIG_EDAC_SKX=y
CONFIG_EDAC_I10NM=m
with:
...
CC [M] drivers/edac/skx_common.o
...
.../skx_common.o:.../skx_common.c:672: undefined reference to `__this_module'
That is because if one of the two drivers - skx_edac or i10nm_edac - is
built-in and the other one is a module, the shared file skx_common.c
gets linked into a module object by kbuild. Therefore, when linking that
same file into vmlinux, the '__this_module' symbol used in debugfs isn't
defined, leading to the above error.
Fix it by moving all debugfs code from skx_common.c to both skx_base.c
and i10nm_base.c respectively. Thus, skx_common.c doesn't refer to the
'__this_module' symbol anymore.
Clarify skx_common.c's purpose at the top of the file for future
reference, while at it.
[ bp: Make text more readable. ]
Fixes: d4dc89d069 ("EDAC, i10nm: Add a driver for Intel 10nm server processors")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190321221339.GA32323@agluck-desk
Pull RAS updates from Borislav Petkov:
"This time around we have in store:
- Disable MC4_MISC thresholding banks on all AMD family 0x15 models
(Shirish S)
- AMD MCE error descriptions update and error decode improvements
(Yazen Ghannam)
- The usual smaller conversions and fixes"
* 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Improve error message when kernel cannot recover, p2
EDAC/mce_amd: Decode MCA_STATUS in bit definition order
EDAC/mce_amd: Decode MCA_STATUS[Scrub] bit
EDAC, mce_amd: Print ExtErrorCode and description on a single line
EDAC, mce_amd: Match error descriptions to latest documentation
x86/MCE/AMD, EDAC/mce_amd: Add new error descriptions for some SMCA bank types
x86/MCE/AMD, EDAC/mce_amd: Add new McaTypes for CS, PSP, and SMU units
x86/MCE/AMD, EDAC/mce_amd: Add new MP5, NBIO, and PCIE SMCA bank types
RAS: Add a MAINTAINERS entry
RAS: Use consistent types for UUIDs
x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
x86/MCE: Switch to use the new generic UUID API
* New i10nm EDAC driver for Intel 10nm CPUs (Qiuxu Zhuo and Tony Luck)
* Altera SDRAM functionality carveout for separate enablement of RAS and
SDRAM capabilities on some Altera chips. (Thor Thayer)
* The usual round of cleanups and fixes
Last but not least:
* Recruit James Morse as a reviewer for the ARM side
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlx+SqYACgkQEsHwGGHe
VUpwxg//fDdlIcnNjPUKWcBQxfy7meFd5xlDwbbIbkdE1mfHLBP6n2gRVM9NguSm
shYPXcdqIrFTn4D7nOxVLS2Gqa7cF/j9M+YaqTNfe9/OVI0oSeM84D2+kEUi2tHQ
LkCbBL9W+SAk4wjcFUPrEuwPaABfPdt0g9wuEf3Yg+PQsZ4FojwF7p91plBiKo/X
GewLIM4+QT/mIkyn5u+2UJWayUvtdc1nchBGg3klYaDTRsUqH9pn284bInj7/Woj
r34288yXuksIhDnUd2h4F9RCdZegBLIZf/k7Rqdg+Acot64c3PprE+/SI9nFcYfn
fcF/48Sv6vMfP5kDKeJhsDjWu85VdpP+Cp4bxebXx4NURWn30kyYGDdpvbpgWxzc
XDOXiEDxfh43/dNEyqCRr86dcZS8ro1pQNlnQvxOJyMljdEGjbB4JizG2ZvVluBP
hSu3ifgpTiBGJMRQQijha41SMuWE7Z1ZgZt/XnyPAKwEEFtQVrm7IfnDohag3VYw
6kWMVeyenmx/yF1JmA0fTxAdeeZPMnbUx0JxHRo1wJXF+1b19b0P+1nYUjgKlXQN
Wq78DGPkQ9InfISFegS/A2AMWk+ZgLZ5d4pVwRVWdyeOMQVUoXO4R3KQur1tV7gu
vm5BpWRZUszhcVvuhly8fOTyOsudYsNe7EeMd2V0Q2FZBy81MH8=
=A1kA
-----END PGP SIGNATURE-----
Merge tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC updates from Borislav Petkov:
- A new EDAC AST 2500 SoC driver (Stefan M Schaeckeler)
- New i10nm EDAC driver for Intel 10nm CPUs (Qiuxu Zhuo and Tony Luck)
- Altera SDRAM functionality carveout for separate enablement of RAS
and SDRAM capabilities on some Altera chips. (Thor Thayer)
- The usual round of cleanups and fixes
And last but not least: recruit James Morse as a reviewer for the ARM
side.
* tag 'edac_for_5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC/altera: Add separate SDRAM EDAC config
EDAC, altera: Add missing of_node_put()
EDAC, skx_common: Add code to recognise new compound error code
EDAC, i10nm: Fix randconfig builds
EDAC, i10nm: Add a driver for Intel 10nm server processors
EDAC, skx_edac: Delete duplicated code
EDAC, skx_common: Separate common code out from skx_edac
EDAC: Do not check return value of debugfs_create() functions
EDAC: Add James Morse as a reviewer
dt-bindings, EDAC: Add Aspeed AST2500
EDAC, aspeed: Add an Aspeed AST2500 EDAC driver
The CONFIG_ALTERA_EDAC Kconfig symbol always enables the SDRAM EDAC
functionality. On the newer architectures, however, there are cases
where the peripheral EDAC functionality is enabled but SDRAM needs to be
disabled.
Move SDRAM functions so they can be contained inside the conditional
CONFIG. Create new CONFIG option just for SDRAM.
[ bp: Massage commit message. ]
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: dinguyen@kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: linux@armlinux.org.uk
Link: https://lkml.kernel.org/r/1551121006-4657-2-git-send-email-thor.thayer@linux.intel.com
Previous AMD systems have had a bit in MCA_STATUS to indicate that an
error was detected on a scrub operation. However, this bit was defined
differently within different banks and families/models.
Starting with Family 17h, MCA_STATUS[40] is either Reserved/Read-as-Zero
or defined as "Scrub", for all MCA banks and CPU models. Therefore, this
bit can be defined as the "Scrub" bit.
Define MCA_STATUS[40] as "Scrub" and decode it in the AMD MCE decoding
module for Family 17h and newer systems.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190212212417.107049-1-Yazen.Ghannam@amd.com
The call to of_parse_phandle() returns a node pointer with refcount
incremented thus it must be explicitly decremented here after the last
usage.
Signed-off-by: Huang Zijiang <huang.zijiang@zte.com.cn>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thor Thayer <thor.thayer@linux.intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: wang.yi59@zte.com.cn
Link: https://lkml.kernel.org/r/1550126347-27984-1-git-send-email-huang.zijiang@zte.com.cn
A new error code for systems that use DRAM as an extra level of cache
looks like:
000F 0010 1MMM CCCC
where the MMM and CCCC bits are used for the same purpose as the
original code. For this new class of errors the ADXL translation will
provide details of both the DIMM used as cache for the error location
and the component that is being cached.
Note: This new error code is first supported in Skylake. Older EDAC
drivers do not need to be updated.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190205182109.27828-1-tony.luck@intel.com
Save a log line by printing the extended error code and the description
on a single line. This is similar to how errors are printed in other
subsystems, e.g. "#, description". If we don't have a valid description
then only the number/code is printed.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190201225534.8177-6-Yazen.Ghannam@amd.com
Update the error descriptions to match the latest documentation for
easier searching. In some cases the changes are small and in other cases
the changes may be total rewording of the description.
No functional changes.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20190201225534.8177-5-Yazen.Ghannam@amd.com
Some SMCA bank types on future systems will report new error types even
though the bank type is not treated as a new version. These new error
types will reported by bits that are reserved in past systems.
Add the new error descriptions to the lists in edac_mce_amd.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-4-Yazen.Ghannam@amd.com
The existing CS, PSP, and SMU SMCA bank types will see new versions (as
indicated by their McaTypes) in future SMCA systems.
Add the new (HWID, MCATYPE) tuples for these new versions. Reuse the
same names as the older versions, since they are logically the same to
the user. SMCA systems won't mix and match IP blocks with different
McaType versions in the same system, so there isn't a need to
distinguish them. The MCA_IPID register is saved when logging an MCA
error, and that can be used to triage the error.
Also, add the new error descriptions to edac_mce_amd. Some error types
(positions in the list) are overloaded compared to the previous
McaTypes. Therefore, just create new lists of the error descriptions to
keep things simple even if some of the error descriptions are the same
between versions.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Pu Wen <puwen@hygon.cn>
Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Shirish S <Shirish.S@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190201225534.8177-3-Yazen.Ghannam@amd.com
This driver supports the Intel 10nm series server integrated memory
controller. It gets the memory capacity and topology information by
reading the registers in PCI configuration space and memory-mapped I/O.
It decodes the memory error address to the platform specific address
by using the ACPI Address Translation (ADXL) Device Specific Method
(DSM).
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-5-tony.luck@intel.com
Delete the duplicated code from skx_edac.c and rename skx_edac.c to
skx_base.c. Update the Makefile to build the skx_edac driver from
skx_base.c and skx_common.c.
Add SPDX to skx_base.c and clean out unnecessary #include lines.
[ bp: Drop the license boilerplate - there's an SPDX identifier now. ]
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-4-tony.luck@intel.com
Parts of skx_edac can be shared with the Intel 10nm server EDAC driver.
Carve out the common parts from skx_edac in preparation to support both
skx_edac driver and i10nm_edac drivers.
Co-developed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190130191519.15393-3-tony.luck@intel.com
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
[ bp: Make edac_debugfs_init() return void too, while at it. ]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: James Morse <james.morse@arm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: https://lkml.kernel.org/r/20190122152151.16139-17-gregkh@linuxfoundation.org