linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-15 14:36:46 +07:00

Author	SHA1	Message	Date
Qiuxu Zhuo	133e4455c9	EDAC, sb_edac: Avoid creating SOCK memory controller Xiaolong Ye reported the following failure on Broadwell D server: EDAC sbridge: Some needed devices are missing EDAC MC: Removed device 0 for sbridge_edac.c Broadwell SrcID#0_Ha#0: DEV 0000:ff:12.0 EDAC sbridge: Couldn't find mci handler EDAC sbridge: Failed to register device with error -19. Broadwell D (only IMC0 per socket) and Broadwell X (IMC0 and IMC1 per socket) use the same PCI device IDs for IMC0 per socket, then they share pci_dev_descr_broadwell_table (n_imcs_per_sock=2). In this case, Broadwell D wrongly creates the nonexistent SOCK EDAC memory controller and reports above error messages, since it has no IMC1 per socket. Avoid creating the nonexistent SOCK memory controller. Reported-and-tested-by: Xiaolong Ye <xiaolong.ye@intel.com> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170608113351.25323-1-qiuxu.zhuo@intel.com [ Massage. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-06-14 11:53:39 +02:00
Qiuxu Zhuo	d14e3a201f	EDAC, sb_edac: Bump driver version and do some cleanups Collapse 'case:' in *_mci_bind_devs() and update driver version from 1.1.1 to 1.1.2. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000934.87971-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 15:00:36 +02:00
Qiuxu Zhuo	4d475dde79	EDAC, sb_edac: Check if ECC enabled when at least one DIMM is present This is based on previous work by Patrick Geary, see Link. Additional cleanups ontop: - Remove the code to read MCMTR from pci_ha1_ta and CHN_TO_HA macro, now that TA0 and TA1 are unified. - Remove get_pdev_same_bus(), since in get_dimm_config() the variable "pvt->pci_ta" for KNL is also ready, we can simply use pci_read_config_dword(pvt->pci_ta, KNL_MCMTR, &pvt->info.mcmtr) to read MCMTR. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: https://lkml.kernel.org/r/57884350.1030401@supermicro.com Link: http://lkml.kernel.org/r/20170523000910.87925-1-qiuxu.zhuo@intel.com [ Make __populate_dimms() return int. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:57:52 +02:00
Qiuxu Zhuo	3286d3eb90	EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4 We don't need this quirk anymore now that the EDAC memory controller representation matches the hardware. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000834.87881-1-qiuxu.zhuo@intel.com [ Commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:40:40 +02:00
Borislav Petkov	6696522957	EDAC, sb_edac: Carve out dimm-populating loop ... to slim down get_dimm_config(). No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:37:34 +02:00
Borislav Petkov	199389acd9	EDAC, sb_edac: Fix mod_name It is called "sb_edac.c" now. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:37:33 +02:00
Qiuxu Zhuo	e2f747b1f4	EDAC, sb_edac: Assign EDAC memory controller per h/w controller Tony pointed out: "currently the driver pretends there is one big 8-channel memory controller per socket instead of 2 4-channel controllers. This is fine with all memory controller populated with symmetrical DIMM configurations, but runs into difficulties on asymmetrical setups". Restructure the driver to assign an EDAC memory controller to each real h/w memory controller to resolve the issue. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000731.87793-1-qiuxu.zhuo@intel.com [ Break some lines at convenient points. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 14:37:21 +02:00
Tony Luck	7fd562b75d	EDAC, sb_edac: Don't use "Socket#" in the memory controller name EDAC assigns logical memory controller numbers in the order that we find memory controllers, which depends on which PCI bus they are on. Some systems end up with MC0 on socket0, others (e.g Haswell) have MC0 on socket3. All this is made more confusing for users because we use the string "Socket" while generating names for memory controllers, but the number that we attach there is the memory controller number. E.g. EDAC MC0: Giving out device to module sbridge_edac.c controller Haswell Socket#0: DEV 0000:ff:12.0 (INTERRUPT) Change the names to say "SrcID#%d" (where the number we use is read from the h/w associated with the memory controller instead of some logical number internal to the EDAC driver). New message: EDAC MC0: Giving out device to module sbridge_edac.c controller Haswell SrcID#3: DEV 0000:ff:12.0 (INTERRUPT) Reported-by: Andrey Korolyov <andrey@xdel.ru> Reported-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000603.87748-1-qiuxu.zhuo@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 11:47:11 +02:00
Qiuxu Zhuo	00cf50d90a	EDAC, sb_edac: Classify PCI-IDs by topology Each of the PCI device IDs belongs to a CPU socket, or to one of the integrated memory controllers. Provide an enum to specify the domain of each, and distinguish the resource number in each domain: the number of the PCI device IDs per integrated memory controller/socket, and the number of integrated memory controllers per socket. Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170523000533.87704-1-qiuxu.zhuo@intel.com [ Realign pci_dev_descr_knl members. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-05-25 11:19:25 +02:00
Borislav Petkov	bffc7dece9	EDAC: Rename report status accessors Change them to have the edac_ prefix. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2017-04-10 17:15:02 +02:00
Linus Torvalds	60c906bab1	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "The main changes in this cycle were: - Assign notifier chain priorities for all RAS related handlers to make the ordering explicit (Borislav Petkov) - Improve the AMD MCA banks sysfs output (Yazen Ghannam) - Various cleanups and restructuring of the x86 RAS code (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority x86/ras: Get rid of mce_process_work() EDAC/mce/amd: Dump TSC value EDAC/mce/amd: Unexport amd_decode_mce() x86/ras/amd/inj: Change dependency x86/ras: Flip the TSC-adding logic x86/ras/amd: Make sysfs names of banks more user-friendly x86/ras/therm_throt: Do not log a fake MCE for thermal events x86/ras/inject: Make it depend on X86_LOCAL_APIC=y	2017-02-20 12:47:44 -08:00
Borislav Petkov	9026cc82b6	x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority Assign all notifiers on the MCE decode chain a priority so that they get called in the correct order. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:57 +01:00
Nicolas Iooss	127c1225bf	EDAC, sb_edac: Get rid of ->show_interleave_mode() Function sbridge_register_mci() sets pvt->info.show_interleave_mode to knl_show_interleave_mode() on Knight's Landing and show_interleave_mode() anywhere else. Merge show_interleave_mode() and knl_show_interleave_mode() in a single implementation and use it without an indirect function pointer. Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170122172806.10412-1-nicolas.iooss_linux@m4x.org [ Call it get_intlv_mode_str(). ] Signed-off-by: Borislav Petkov <bp@suse.de>	2017-01-23 11:39:48 +01:00
Mauro Carvalho Chehab	78d88e8a3d	edac: rename edac_core.h to edac_mc.h Now, all left at edac_core.h are at drivers/edac/edac_mc.c, so rename it to edac_mc.h. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>	2016-12-15 08:54:51 -02:00
Piotr Luc	9a9260ca92	EDAC, sb_edac: Add Knights Mill support Add Knights Mill (KNM) to the list of CPU models supported by sb_edac. Signed-off-by: Piotr Luc <piotr.luc@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20161013153105.2517-6-piotr.luc@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-10-19 12:37:44 +02:00
Dave Hansen	20f4d69243	EDAC, {sb,skx}_edac: Use Intel model macros instead of open-coding them We now have symbolic names for a bunch of Intel CPU models via asm/intel-family.h. The original conversion missed the EDAC drivers. Convert them. Signed-off-by: Dave Hansen <dave.hansen@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160929204321.9FAE5F84@viggo.jf.intel.com [ Remove comment, macro name is descriptive enough. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-10-19 12:32:40 +02:00
Linus Torvalds	19fe416532	* Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO buffers (Thor Thayer) * Split the memory controller part out of mpc85xx and share it with a * new Freescale ARM Layerscape driver (York Sun) * amd64_edac fixes (Yazen Ghannam) * Misc cleanups, refactoring and fixes all over the place -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJX80EwAAoJEBLB8Bhh3lVKWwYP/1bbODQ7o+XhO8IaCDffYk30 8y4WdSnI0/QcP8JbSvFA7y6Zn4L0BbrbYhKLRDAg9c34V2bMaqonCnkDtT6YatUb 6l0H/hQ/Cah9AOm5PJLYg6O9s+ZBT8zA5b+F2Z9kUsuB6LSnVhp9skNrH6KPlm0U 4pFaLnHQenQPZbuRCfRxPU49ZuKtBZtQDkJLJlHXwn7e1qZy2Q4tMnnEtsY6U2ea t3Hj+F8g+cdoiTQXOceCcOTR8GqDI6szgzn7vpXAGYvljBndszauAkxO7by79jg1 I8AQfgwoBF5CYL2Q0pzT1maHmmG2sydeRAHIvhmGxiEfFz1abWhriXbS33c32q8a iFiVMAUIaSKpB/sB+5w5ymuBctI1mX5EQVW+8Xl2Gxt+olnhdJMocHnvQdYkfsYm Ka8LcbaiK6ZQTbs/cIMOc2paE0AFPu5uXKHCPeZlhQAxOBvSPuDAv0+qUB/of5Uq 1SPidtsTmCI7X2hrdHAH9hLEkSjq68v3kqL5YnZL3H4gA3WohQEmX9ybjk097Kus WWEhdi/PSFX0qQKotMUUDuxfNcKI6PZH9p+i2dN6tNCkiTDdb0Eo5lCXN7RVVhvq qfE0Fcc4uDzh5MUS5jT58MWpA1cfdu9jbAf2BwFIU/poJcaeqy/SMyzCL+1D2/u6 dmDAtQbKUUwiltB8QzQd =pcI8 -----END PGP SIGNATURE----- Merge tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "A lot of movement in the EDAC tree this time around, coarse summary below: - Altera Arria10 enablement of NAND, DMA, USB, QSPI and SD-MMC FIFO buffers (Thor Thayer) - split the memory controller part out of mpc85xx and share it with a new Freescale ARM Layerscape driver (York Sun) - amd64_edac fixes (Yazen Ghannam) - misc cleanups, refactoring and fixes all over the place" * tag 'edac_for_4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (37 commits) EDAC, altera: Add IRQ Flags to disable IRQ while handling EDAC, altera: Correct EDAC IRQ error message EDAC, amd64: Autoload module using x86_cpu_id EDAC, sb_edac: Remove NULL pointer check on array pci_tad EDAC: Remove NO_IRQ from powerpc-only drivers EDAC, fsl_ddr: Fix error return code in fsl_mc_err_probe() EDAC, fsl_ddr: Add entry to MAINTAINERS EDAC: Move Doug Thompson to CREDITS EDAC, I3000: Orphan driver EDAC, fsl_ddr: Replace simple_strtoul() with kstrtoul() EDAC, layerscape: Add Layerscape EDAC support EDAC, fsl_ddr: Fix IRQ dispose warning when module is removed EDAC, fsl_ddr: Add support for little endian EDAC, fsl_ddr: Add missing DDR DRAM types EDAC, fsl_ddr: Rename macros and names EDAC, fsl-ddr: Separate FSL DDR driver from MPC85xx EDAC, mpc85xx: Replace printk() with pr_* format EDAC, mpc85xx: Drop setting/clearing RFXE bit in HID1 EDAC, altera: Rename MC trigger to common name EDAC, altera: Rename device trigger to common name ...	2016-10-04 12:06:26 -07:00
Colin Ian King	c7c35407cd	EDAC, sb_edac: Remove NULL pointer check on array pci_tad pvt->pci_tad is a NUM_CHANNELS array of struct pci_dev pointers and hence cannot be NULL, so the NULL pointer check on pci_tad is redundant. Remove it. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20160908083801.14766-1-colin.king@canonical.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-09-12 20:15:43 +02:00
Lukasz Odzioba	c5b48fa7e2	EDAC, sb_edac: Fix channel reporting on Knights Landing On Intel Xeon Phi Knights Landing processor family the channels of the memory controller have untypical arrangement - MC0 is mapped to CH3,4,5 and MC1 is mapped to CH0,1,2. This causes the EDAC driver to report the channel name incorrectly. We missed this change earlier, so the code already contains similar comment, but the translation function is incorrect. Without this patch: errors in DIMM_A and DIMM_D were reported in DIMM_D errors in DIMM_B and DIMM_E were reported in DIMM_E errors in DIMM_C and DIMM_F were reported in DIMM_F Correct this. Hubert Chrzaniuk: - rebased to 4.8 - comments and code cleanup Fixes: `d0cdf90031` ("sb_edac: Add Knights Landing (Xeon Phi gen 2) support") Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Cc: lukasz.odzioba@intel.com Cc: mchehab@kernel.org Cc: <stable@vger.kernel.org> # v4.5.. Link: http://lkml.kernel.org/r/1469231089-22837-1-git-send-email-lukasz.odzioba@intel.com Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com> [ Boris: Simplify a bit by removing char mc. ] Signed-off-by: Borislav Petkov <bp@suse.de>	2016-08-08 05:52:08 +02:00
Tony Luck	0ba169ac36	EDAC, sb_edac: Fix Knights Landing In commit `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") I broke Knights Landing because I failed to notice that it called a wrapper macro "sbridge_get_all_devices_knl" instead of "sbridge_get_all_devices" like all the other types. Now that we include the processor type in the pci_id_table structure we can skip the wrappers and just have the sbridge_get_all_devices() check the type to decide whether to allow duplicate devices and controllers to have registers spread across buses. Fixes: `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") Tested-by: Lukasz Odzioba <lukasz.odzioba@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-07-16 06:11:59 +09:00
Tony Luck	665f05e0b8	EDAC, sb_edac: Readd accidentally dropped Broadwell-D support In commit `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") we switched from using PCI ids to determine which platform we are running on to using CPU model instead. I forgot that Broadwell-DE has its own distinct model number different from Broadwell-EP or -EX. Fixing this isn't just adding a line to the array of cpuids - the exising code assumed a 1:1 mapping between entries in that array and the "enum type" values. Added the type to pci_id_table structure to remove this dependency and allows two Broadwell cpu models. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: `2c1ea4c700` ("EDAC, sb_edac: Use cpu family/model in driver detection") Link: http://lkml.kernel.org/r/b3cffe40dec6dfe0235a5d52a504f0ba86a07ce7.1464902605.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-03 17:28:21 +02:00
Tony Luck	c7103f650a	EDAC, sb_edac: Fix rank lookup on Broadwell Broadwell made a small change to the rank target register moving the target rank ID field up from bits 16:19 to bits 20:23. Also found that the offset field grew by one bit in the IVY_BRIDGE to HASWELL transition, so fix the RIR_OFFSET() macro too. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org # v3.19+ Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/2943fb819b1f7e396681165db9c12bb3df0e0b16.1464735623.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-06-03 10:05:49 +02:00
Linus Torvalds	1cc3880a3c	* Altera Arria10 L2 cache and On-Chip RAM ECC handling. (Thor Thayer) * Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac. (Tony Luck) * Do not register sb_edac with pci_register_driver(). (Tony Luck) * Add support for Skylake to ie31200_edac. (Jason Baron) * Do not register amd64_edac with pci_register_driver(). (Borislav Petkov) + the usual round of cleanups and fixes all over the place. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXOaHaAAoJEBLB8Bhh3lVKRdoP/jDewaX4GAxgcKaK9Kx7VjMe i42E/7syh5iLoyFgZ1hUjvqiQHN4RyWloUYyNHbNxGS9uXCMXIUoJQd2FjOY442s oeEaoMCfZsJLzKQti3fUPK9GugZ57BSZxfn6NlJPpyG4FxSirOcZFCxGaNuyqqrh 0M+gFvbWfZcVDLE0FI5CUYZWRxsk//+pLM4ZkQZFA8Eo1tGVc3r0zEEAlL/uEMDW P0ATvRHNUeib9YQGTdSD7cNUpRX4SX+T8lDUiaVm+tHL5ES3b4rayMBdbVMrahfc Gnxu9GtO5gWXEY6QDDpOx+VkbfFmDFodx833psJ3MVD8evEFHHdinkgDaptLrrV6 92ZDKR5s3W6tKXkcmGuExrtc17UgjLcRZCebXbv+5FJlVrslzvQ9ESOTRyiZBrCD ZpFi2TZhpYU1uEWuoBZCbWNXW2pcSt7/bQ9bYUvfrvNfgPzPnblubJuVKRcfY0WB x2vj0PNnckpoPRvskV7GEe0Y/JISzAxBQUK6XO+GJgMgz5M+23SEaSVU5yeyf26e x/yD5yImQGN84AxfRMMvbR2JvpVLN3vdFWtWigneht160erVA/Qgw5Gcrpw53tZr zPD3fGaetrNgS8BvJAy/c6xHV1j6A1RGCGThy8Ivqep9WD90a5JhR1NRjZp6m8sf aB0A1zTP7e+hRFkZGahO =ud2P -----END PGP SIGNATURE----- Merge tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "It was pretty busy in EDAC land this time: - Altera Arria10 L2 cache and On-Chip RAM ECC handling (Thor Thayer) - Remove ad-hoc buffering of MCE records in sb_edac and i7core_edac (Tony Luck) - Do not register sb_edac with pci_register_driver() (Tony Luck) - Add support for Skylake to ie31200_edac (Jason Baron) - Do not register amd64_edac with pci_register_driver() (Borislav Petkov) ... plus the usual round of cleanups and fixes all over the place" * tag 'edac_for_4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (25 commits) EDAC, amd64_edac: Drop pci_register_driver() use EDAC, ie31200_edac: Add Skylake support EDAC, sb_edac: Use cpu family/model in driver detection EDAC, i7core: Remove double buffering of error records EDAC, amd64_edac: Issue driver banner only on success ARM: socfpga: Initialize Arria10 OCRAM ECC on startup EDAC: Increment correct counter in edac_inc_ue_error() EDAC, sb_edac: Remove double buffering of error records EDAC: Fix used after kfree() error in edac_unregister_sysfs() EDAC, altera: Avoid unused function warnings EDAC, altera: Remove useless casts ARM: socfpga: Enable Arria10 OCRAM ECC on startup EDAC, altera: Add Arria10 OCRAM ECC support Documentation: dt: socfpga: Add Altera Arria10 OCRAM binding EDAC, altera: Make OCRAM ECC dependency check generic EDAC, altera: Add register offset for ECC Enable EDAC, altera: Extract error inject operations to a struct fops ARM: socfpga: Enable Arria10 L2 cache ECC on startup EDAC, altera: Add Arria10 L2 Cache ECC handling Documentation, dt, socfpga: Add Altera Arria10 L2 cache binding ...	2016-05-16 18:44:39 -07:00
Tony Luck	2c1ea4c700	EDAC, sb_edac: Use cpu family/model in driver detection Instead of picking a random PCI ID from the dozen or so we need to access, just use x86_match_cpu() to pick based on CPU model number. The choosing of PCI devices has been problematic in the past, see `11249e7399` ("sb_edac: Fix detection on SNB machines") which fixed problems introduced by `d0585cd815` ("sb_edac: Claim a different PCI device"). This is especially ugly if future hardware might not even have EDAC-relevant registers in PCI config space and we would still be required to choose some "random" PCI devices to scan for just so our driver loads. Is this cleaner/clearer? It deletes much more code than it adds. Only tested on Broadwell. The driver loads/unloads and loads again. Still decodes errors too. Signed-off-by: Tony Luck <tony.luck@intel.com> Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Borislav Petkov <bp@suse.de>	2016-05-02 19:44:43 +02:00
Tony Luck	c4fc1956fa	EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback Both of these drivers can return NOTIFY_BAD, but this terminates processing other callbacks that were registered later on the chain. Since the driver did nothing to log the error it seems wrong to prevent other interested parties from seeing it. E.g. neither of them had even bothered to check the type of the error to see if it was a memory error before the return NOTIFY_BAD. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/72937355dd92318d2630979666063f8a2853495b.1461864507.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-29 15:43:10 +02:00
Tony Luck	ad08c4e974	EDAC, sb_edac: Remove double buffering of error records In the bad old days the functions from x86_mce_decoder_chain could be called in machine check context. So we used to carefully copy them and defer processing until later. But in `f29a7aff4b` ("x86/mce: Avoid potential deadlock due to printk() in MCE context") we switched the logging code to save the record in a genpool, and call the functions that registered to be notified later from a work queue. So drop all the double buffering and do all the work we want to do as soon as sbridge_mce_check_error() is called. Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: patrickg@supermicro.com Link: http://lkml.kernel.org/r/100025611cd780d9bca72792b2b2146760da53e0.1460756761.git.tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-04-23 14:02:02 +02:00
Tony Luck	ea5dfb5fae	x86 EDAC, sb_edac.c: Take account of channel hashing when needed Haswell and Broadwell can be configured to hash the channel interleave function using bits [27:12] of the physical address. On those processor models we must check to see if hashing is enabled (bit21 of the HASWELL_HASYSDEFEATURE2 register) and act accordingly. Based on a patch by patrickg <patrickg@supermicro.com> Tested-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-04-22 10:10:01 +02:00
Tony Luck	ff15e95c82	x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address In commit: `eb1af3b71f` ("Fix computation of channel address") I switched the "sck_way" variable from holding the log2 value read from the h/w to instead be the actual number. Unfortunately it is needed in log2 form when used to shift the address. Tested-by: Patrick Geary <patrickg@supermicro.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Fixes: `eb1af3b71f` ("Fix computation of channel address") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-04-22 10:10:01 +02:00
Linus Torvalds	d88bfe1d68	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "Various RAS updates: - AMD MCE support updates for future CPUs, fixes and 'SMCA' (Scalable MCA) error decoding support (Aravind Gopalakrishnan) - x86 memcpy_mcsafe() support, to enable smart(er) hardware error recovery in NVDIMM drivers, based on an extension of the x86 exception handling code. (Tony Luck)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: EDAC/sb_edac: Fix computation of channel address x86/mm, x86/mce: Add memcpy_mcsafe() x86/mce/AMD: Document some functionality x86/mce: Clarify comments regarding deferred error x86/mce/AMD: Fix logic to obtain block address x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors x86/mce: Move MCx_CONFIG MSR definitions x86/mce: Check for faults tagged in EXTABLE_CLASS_FAULT exception table entries x86/mm: Expand the exception table logic to allow new handling options x86/mce/AMD: Set MCAX Enable bit x86/mce/AMD: Carve out threshold block preparation x86/mce/AMD: Fix LVT offset configuration for thresholding x86/mce/AMD: Reduce number of blocks scanned per bank x86/mce/AMD: Do not perform shared bank check for future processors x86/mce: Fix order of AMD MCE init function call	2016-03-14 18:43:51 -07:00
Luck, Tony	eb1af3b71f	EDAC/sb_edac: Fix computation of channel address Large memory Haswell-EX systems with multiple DIMMs per channel were sometimes reporting the wrong DIMM. Found three problems: 1) Debug printouts for socket and channel interleave were not interpreting the register fields correctly. The socket interleave field is a 2^X value (0=1, 1=2, 2=4, 3=8). The channel interleave is X+1 (0=1, 1=2, 2=3. 3=4). 2) Actual use of the socket interleave value didn't interpret as 2^X 3) Conversion of address to channel address was complicated, and wrong. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-10 18:31:55 +01:00
Hubert Chrzaniuk	83bdaad4d9	EDAC, sb_edac: Fix logic when computing DIMM sizes on Xeon Phi Correct a typo introduced by `d0cdf90031` ("EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support") As a result under some configurations DIMMs were not correctly recognized. Problem affects only Xeon Phi architecture. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1457361045-26221-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2016-03-07 19:07:40 +01:00
Hubert Chrzaniuk	45f4d3ab3e	EDAC, sb_edac: Set fixed DIMM width on Xeon Knights Landing Knights Landing does not come with register that could be used to fetch DIMM width. However the value is fixed for this architecture so it can be hardcoded. Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449840082-18673-1-git-send-email-hubert.chrzaniuk@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-11 16:58:32 +01:00
Jim Snow	d0cdf90031	EDAC, sb_edac: Add Knights Landing (Xeon Phi gen 2) support Knights Landing is the next generation architecture for HPC market. KNL introduces concept of a tile and CHA - Cache/Home Agent for memory accesses. Some things are fixed in KNL: () There's single DIMM slot per channel () There's 2 memory controllers with 3 channels each, however, from EDAC standpoint, it is presented as single memory controller with 6 channels. In order to represent 2 MCs w/ 3 CH, it would require major redesign of EDAC core driver. Basically, two functionalities are added/extended: () during driver initialization KNL topology is being recognized, i.e. which channels are populated with what DIMM sizes (knl_get_dimm_capacity function) () handle MCE errors - channel swizzling Reviewed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Jim Snow <jim.m.snow@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-5-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-05 19:00:52 +01:00
Jim Snow	c1979ba254	EDAC, sb_edac: Add support for duplicate device IDs Add options to sbridge_get_all_devices() to allow for duplicate device IDs and devices that are scattered across mulitple PCI buses. Signed-off-by: Jim Snow <jim.m.snow@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-4-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-05 18:57:41 +01:00
Jim Snow	c59f9c06bd	EDAC, sb_edac: Virtualize several hard-coded functions SAD limit, interleave mode and DRAM related functionalities are now virtualized, so that overriding them is easier. Signed-off-by: Jim Snow <jim.m.snow@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: lukasz.anaczkowski@intel.com Link: http://lkml.kernel.org/r/1449136134-23706-3-git-send-email-hubert.chrzaniuk@intel.com [ Rebase to 4.4-rc3. ] Signed-off-by: Hubert Chrzaniuk <hubert.chrzaniuk@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-12-05 18:54:45 +01:00
Seth Jennings	2900ea6096	EDAC, sb_edac: Fix TAD presence check for sbridge_mci_bind_devs() In commit `7d375bffa5` ("sb_edac: Fix support for systems with two home agents per socket") NUM_CHANNELS was changed to 8 and the channel space was renumerated to handle EN, EP, and EX configurations. The *_mci_bind_devs() functions - except for sbridge_mci_bind_devs() - got a new device presence check in the form of saw_chan_mask. However, sbridge_mci_bind_devs() still uses the NUM_CHANNELS for loop. With the increase in NUM_CHANNELS, this loop fails at index 4 since SB only has 4 TADs. This results in the following error on SB machines: EDAC sbridge: Some needed devices are missing EDAC sbridge: Couldn't find mci handler EDAC sbridge: Couldn't find mci handle This patch adapts the saw_chan_mask logic for sbridge_mci_bind_devs() as well. After this patch: EDAC MC0: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#0: DEV 0000:3f:0e.0 (POLLED) EDAC MC1: Giving out device to module sbridge_edac.c controller Sandy Bridge Socket#1: DEV 0000:7f:0e.0 (POLLED) Signed-off-by: Seth Jennings <sjenning@redhat.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Tony Luck <tony.luck@intel.com> Tested-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # v4.2 Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1438798561-10180-1-git-send-email-sjenning@redhat.com Signed-off-by: Borislav Petkov <bp@suse.de>	2015-09-24 20:40:50 +02:00
Aristeu Rozanski	12f0721c5a	sb_edac: correctly fetch DIMM width on Ivy Bridge and Haswell dimm_dev_type has been incorrectly determined in sb_edac. This patch fixes it for Ivy Bridge and Haswell only since nothing like exists for Sandy Bridge. We tested this patch in multiple systems matching the results with the installed memory modules. Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2015-09-08 20:33:48 -03:00
Aristeu Rozanski	7179385afe	sb_edac: look harder for DDRIO on Haswell systems In case the memory banks are populated so the first channel isn't used, the DDRIO PCI device won't be visible and it won't be possible to determine the memory type. Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2015-09-08 20:32:13 -03:00
Tony Luck	fa2ce64f85	sb_edac: support for Broadwell -EP and -EX Basic support for the single socket Broadwell-DE processor was added back in commit `1f39581a9a` sb_edac: Add support for Broadwell-DE processor This patch extends Broadwell support to cover the two socket "-EP" and four socket "-EX" versions of Broadwell. Only tested on the 2 socket - but this code is largely cloned from the Haswell path. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2015-06-03 10:10:59 -03:00
Tony Luck	7d375bffa5	sb_edac: Fix support for systems with two home agents per socket First noticed a problem on a 4 socket machine where EDAC only reported half the DIMMS. Tracked this down to the code that assumes that systems with two home agents only have two memory channels on each agent. This is true on 2 sockect ("-EP") machines. But four socket ("-EX") machines have four memory channels on each home agent. The old code would have had problems on two socket systems as it did a shuffling trick to make the internals of the code think that the channels from the first agent were '0' and '1', with the second agent providing '2' and '3'. But the code didn't uniformly convert from {ha,channel} tuples to this internal representation. New code always considers up to eight channels. On a machine with a single home agent these map easily to edac channels 0, 1, 2, 3. On machines with two home agents we map using: edac_channel = 4*ha# + channel So on a -EP machine where each home agent supports only two channels we'll fill in channels 0, 1, 4, 5, and on a -EX machine we use all of 0, 1, 2, 3, 4, 5, 6, 7. [mchehab@osg.samsung.com: fold a fixup patch as per Tony's request and fixed a few CodingStyle issues] Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2015-06-03 10:10:52 -03:00
Tony Luck	bb89e7141a	sb_edac: Fix a typo and a thinko in address handling for Haswell typo: "a7mode" chooses whether to use bits {8, 7, 9} or {8, 7, 6} in the algorithm to spread access between memory resources. But the non-a7mode path was incorrectly using GET_BITFIELD(addr, 7, 9) and so picking bits {9, 8, 7} thinko: BIT(1) of the dram_rule registers chooses whether to just use the {8, 7, 6} (or {8, 7, 9}) bits mentioned above as they are, or to XOR them with bits {18, 17, 16} but the code inverted the test. We need the additional XOR when dram_rule{1} == 0. Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2015-06-03 10:10:47 -03:00
Borislav Petkov	11249e7399	sb_edac: Fix detection on SNB machines `d0585cd815` ("sb_edac: Claim a different PCI device") changed the probing of sb_edac to look for PCI device 0x3ca0: 3f:0e.0 System peripheral: Intel Corporation Xeon E5/Core i7 Processor Home Agent (rev 07) 00: 86 80 a0 3c 00 00 00 00 07 00 80 08 00 00 80 00 ... but we're matching for 0x3ca8, i.e. PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_TA in sbridge_probe() therefore the probing fails. Changing it to probe for 0x3ca0 (PCI_DEVICE_ID_INTEL_SBRIDGE_IMC_HA0), .i.e., the 14.0 device, fixes the issue and driver loads successfully again: [ 2449.013120] EDAC DEBUG: sbridge_init: [ 2449.017029] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.022368] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca0 [ 2449.028498] EDAC sbridge: Seeking for: PCI ID 8086:3ca0 [ 2449.033768] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 [ 2449.039028] EDAC DEBUG: sbridge_get_onedevice: Detected 8086:3ca8 [ 2449.045155] EDAC sbridge: Seeking for: PCI ID 8086:3ca8 ... Add a debug printk while at it to be able to catch the failure in the future and dump driver version on successful load. Fixes: `d0585cd815` ("sb_edac: Claim a different PCI device") Cc: stable@vger.kernel.org # 3.18 Acked-by: Aristeu Rozanski <aris@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2015-02-09 16:55:26 +01:00
Tony Luck	fec53af531	sb_edac: Fix typo computing number of banks Code will always think there are 16 banks because of a typo Reported-by: Misha Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2014-12-02 16:46:33 -02:00
Tony Luck	1f39581a9a	sb_edac: Add support for Broadwell-DE processor Broadwell-DE is the microserver version of next generation Xeon processors. A whole bunch of new PCIe device ids, but otherwise pretty much the same as Haswell. Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2014-12-02 16:46:08 -02:00
Tony Luck	f7cf2a22a2	sb_edac: Fix discovery of top-of-low-memory for Haswell Haswell moved the TOLM/TOHM registers to a different device and offset. The sb_edac driver accounted for the change of device, but not for the new offset. There was also a typo in the constant to fill in the low 26 bits (was 0x1ffffff, should be 0x3ffffff). This resulted in a bogus value for the top of low memory: EDAC DEBUG: get_memory_layout: TOLM: 0.032 GB (0x0000000001ffffff) which would result in EDAC refusing to translate addresses for errors above the bogus value and below 4GB: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523eac86 sbridge MC3: PROCESSOR 0:306f3 TIME 1414600951 SOCKET 0 APIC 0 MC3: 1 CE Error at TOLM area, on addr 0x02000000 on any memory ( page:0x0 offset:0x0 grain:32 syndrome:0x0) With the fix we see the correct TOLM value: DEBUG: get_memory_layout: TOLM: 2.048 GB (0x000000007fffffff) and we decode address 2000000 correctly: sbridge MC3: HANDLING MCE MEMORY ERROR sbridge MC3: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010090 sbridge MC3: TSC 0 sbridge MC3: ADDR 2000000 sbridge MC3: MISC 523e1086 sbridge MC3: PROCESSOR 0:306f3 TIME 1414601319 SOCKET 0 APIC 0 DEBUG: get_memory_error_data: SAD interleave package: 0 = CPU socket 0, HA 0, shiftup: 0 DEBUG: get_memory_error_data: TAD#0: address 0x0000000002000000 < 0x000000007fffffff, socket interleave 1, channel interleave 4 (offset 0x00000000), index 0, base ch: 0, ch mask: 0x01 DEBUG: get_memory_error_data: RIR#0, limit: 4.095 GB (0x00000000ffffffff), way: 1 DEBUG: get_memory_error_data: RIR#0: channel address 0x00200000 < 0xffffffff, RIR interleave 0, index 0 DEBUG: sbridge_mce_output_error: area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0 MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x2000 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0) Signed-off-by: Tony Luck <tony.luck@intel.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2014-12-02 12:06:52 -02:00
Jim Snow	8c00910029	sb_edac: Fix erroneous bytes->gigabytes conversion Signed-off-by: Jim Snow <jim.snow@intel.com> Signed-off-by: Lukasz Anaczkowski <lukasz.anaczkowski@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2014-12-02 12:06:51 -02:00
Andy Lutomirski	d0585cd815	sb_edac: Claim a different PCI device sb_edac controls a large number of different PCI functions. Rather than registering as a normal PCI driver for all of them, it registers for just one so that it gets probed and, at probe time, it looks for all the others. Coincidentally, the device it registers for also contains the SMBUS registers, so the PCI core will refuse to probe both sb_edac and a future iMC SMBUS driver. The drivers don't actually conflict, so just change sb_edac's device table to probe a different device. An alternative fix would be to merge the two drivers, but sb_edac will also refuse to load on non-ECC systems, whereas i2c_imc would still be useful without ECC. The only user-visible change should be that sb_edac appears to bind a different device. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Rui Wang <ruiv.wang@gmail.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2014-10-08 17:04:16 -03:00
Andy Lutomirski	68939df1d7	Move Intel SNB device ids from sb_edac to pci_ids.h The i2c_imc driver will use two of them, and moving only part of the list seems messier. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2014-10-08 17:04:16 -03:00
Seth Jennings	351fc4a99d	sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channel Intel IA32 SDM Table 15-14 defines channel 0xf as 'not specified', but EDAC doesn't know about this and returns and INTERNAL ERROR when the channel is greater than NUM_CHANNELS: kernel: [ 1538.886456] CPU 0: Machine Check Exception: 0 Bank 1: 940000000000009f kernel: [ 1538.886669] TSC 2bc68b22e7e812 ADDR 46dae7000 MISC 0 PROCESSOR 0:306e4 TIME 1390414572 SOCKET 0 APIC 0 kernel: [ 1538.971948] EDAC MC1: INTERNAL ERROR: channel value is out of range (15 >= 4) kernel: [ 1538.972203] EDAC MC1: 0 CE memory read error on unknown memory (slot:0 page:0x46dae7 offset:0x0 grain:0 syndrome:0x0 - area:DRAM err_code:0000:009f socket:1 channel_mask:1 rank:0) This commit changes sb_edac to forward a channel of -1 to EDAC if the channel is not specified. edac_mc_handle_error() sets the channel to -1 internally after the error message anyway, so this commit should have no effect other than avoiding the INTERNAL ERROR message when the channel is not specified. Signed-off-by: Seth Jennings <sjenning@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>	2014-10-08 17:04:16 -03:00
Aristeu Rozanski	50d1bb9367	sb_edac: add support for Haswell based systems Haswell memory controllers are very similar to Ivy Bridge and Sandy Bridge ones. This patch adds support to Haswell based systems. [m.chehab@samsung.com: Fix CodingStyle issues] Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Aristeu Rozanski <aris@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2014-06-26 15:47:01 -03:00

1 2 3

118 Commits