linux_dsm_epyc7002/drivers
Qiuxu Zhuo 15cc3ae001 EDAC, sb_edac: Don't create a second memory controller if HA1 is not present
Yi Zhang reported the following failure on a 2-socket Haswell (E5-2603v3)
server (DELL PowerEdge 730xd):

  EDAC sbridge: Some needed devices are missing
  EDAC MC: Removed device 0 for sb_edac.c Haswell SrcID#0_Ha#0: DEV 0000:7f:12.0
  EDAC MC: Removed device 1 for sb_edac.c Haswell SrcID#1_Ha#0: DEV 0000:ff:12.0
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Couldn't find mci handler
  EDAC sbridge: Failed to register device with error -19.

The refactored sb_edac driver creates the IMC1 (the 2nd memory
controller) if any IMC1 device is present. In this case only
HA1_TA of IMC1 was present, but the driver expected to find
HA1/HA1_TM/HA1_TAD[0-3] devices too, leading to the above failure.

According to [1], the 'E5-2603 v3' CPU has at most 4 memory channels. Yi
Zhang inserted one DIMM per channel for each CPU and ran a random error
address injection test with this patch applied:

      4024  addresses fell in TOLM hole area
     12715  addresses fell in CPU_SrcID#0_Ha#0_Chan#0_DIMM#0
     12774  addresses fell in CPU_SrcID#0_Ha#0_Chan#1_DIMM#0
     12798  addresses fell in CPU_SrcID#0_Ha#0_Chan#2_DIMM#0
     12913  addresses fell in CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
     12674  addresses fell in CPU_SrcID#1_Ha#0_Chan#0_DIMM#0
     12686  addresses fell in CPU_SrcID#1_Ha#0_Chan#1_DIMM#0
     12882  addresses fell in CPU_SrcID#1_Ha#0_Chan#2_DIMM#0
     12934  addresses fell in CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
    106400  addresses were injected totally.

The test results show that all 4 channels of each CPU belong to IMC0, so
the server really has only one IMC per CPU.
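
The tally above can be cross-checked with a small standalone sketch. It is
not part of the patch or the driver; the table simply restates the counts
reported above, and the names results[] and sum_matching() are hypothetical
helpers introduced for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One row of the injection test tally, copied from the report above. */
struct tally {
	const char *label;
	long count;
};

static const struct tally results[] = {
	{ "TOLM hole area",                  4024 },
	{ "CPU_SrcID#0_Ha#0_Chan#0_DIMM#0", 12715 },
	{ "CPU_SrcID#0_Ha#0_Chan#1_DIMM#0", 12774 },
	{ "CPU_SrcID#0_Ha#0_Chan#2_DIMM#0", 12798 },
	{ "CPU_SrcID#0_Ha#0_Chan#3_DIMM#0", 12913 },
	{ "CPU_SrcID#1_Ha#0_Chan#0_DIMM#0", 12674 },
	{ "CPU_SrcID#1_Ha#0_Chan#1_DIMM#0", 12686 },
	{ "CPU_SrcID#1_Ha#0_Chan#2_DIMM#0", 12882 },
	{ "CPU_SrcID#1_Ha#0_Chan#3_DIMM#0", 12934 },
};

/*
 * Sum the counts of all rows whose label contains the given substring.
 * An empty substring matches every row, which yields the grand total.
 */
static long sum_matching(const char *needle)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		if (strstr(results[i].label, needle))
			sum += results[i].count;
	return sum;
}
```

With this table, sum_matching("_Ha#1_") is 0 and sum_matching("") is 106400,
which matches the reported total: every decoded address landed on Ha#0 or
in the TOLM hole.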

The first page of chapter 2 of the datasheet [2] also says that the
'E5-2600 v3' implements either one or two IMCs. For CPUs with one IMC,
IMC1 is not used and should be ignored.

Thus, do not create a second memory controller if the key HA1 is absent.
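
The difference between the two policies can be illustrated with a minimal
standalone sketch. This is not the actual sb_edac code: the enum, the
struct and both helper functions are hypothetical names introduced only to
contrast "create IMC1 if any IMC1 device is present" with "create IMC1 only
if the key HA1 device is present".

```c
#include <stdbool.h>

/* Hypothetical device kinds belonging to one memory controller (IMC). */
enum imc_dev {
	DEV_HA,		/* the key home agent device */
	DEV_TA,
	DEV_TM,
	DEV_TAD0,
	DEV_TAD1,
	DEV_TAD2,
	DEV_TAD3,
	DEV_COUNT
};

/* Which devices of an IMC were found during probing (illustrative). */
struct imc_devs {
	bool present[DEV_COUNT];
};

/* Policy described as failing: any present device triggers creation. */
static bool create_if_any_present(const struct imc_devs *d)
{
	int i;

	for (i = 0; i < DEV_COUNT; i++)
		if (d->present[i])
			return true;
	return false;
}

/* Policy described by this patch: create only if the key HA exists. */
static bool create_if_ha_present(const struct imc_devs *d)
{
	return d->present[DEV_HA];
}
```

For the reported machine, where only the TA device of IMC1 was found, the
first helper returns true and the second returns false, so under the second
policy no second memory controller would be created.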

[1] http://ark.intel.com/products/83349/Intel-Xeon-Processor-E5-2603-v3-15M-Cache-1_60-GHz
[2] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v3-datasheet-vol-2.pdf

Reported-and-tested-by: Yi Zhang <yizhan@redhat.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Fixes: e2f747b1f4 ("EDAC, sb_edac: Assign EDAC memory controller per h/w controller")
Link: http://lkml.kernel.org/r/20170913104214.7325-1-qiuxu.zhuo@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
2017-09-27 12:15:43 +02:00
..
accessibility
acpi dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
amba
android ANDROID: binder: don't queue async transactions to thread. 2017-09-01 09:22:50 +02:00
ata Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata 2017-09-06 22:41:21 -07:00
atm
auxdisplay
base firmware: Restore support for built-in firmware 2017-09-16 10:58:48 -07:00
bcma
block The highlights include: 2017-09-12 20:03:53 -07:00
bluetooth
bus ARM: SoC driver updates for v4.14 2017-09-10 20:40:00 -07:00
cdrom
char dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
clk The diff is dominated by the Allwinner A10/A20 SoCs getting converted to 2017-09-13 11:04:14 -07:00
clocksource Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
connector
cpufreq dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
cpuidle Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
crypto dmaengine updates for 4.14-rc1 2017-09-07 14:03:05 -07:00
dax - Some request-based DM core and DM multipath fixes and cleanups 2017-09-14 13:43:16 -07:00
dca
devfreq
dio
dma dmaengine updates for 4.14-rc1 2017-09-07 14:03:05 -07:00
dma-buf
edac EDAC, sb_edac: Don't create a second memory controller if HA1 is not present 2017-09-27 12:15:43 +02:00
eisa
extcon
firewire
firmware dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
fmc
fpga
fsi
gpio - New Drivers 2017-09-07 13:51:13 -07:00
gpu amd fixes pull 2017-09-15 17:52:52 -07:00
hid media updates for v4.14-rc1 2017-09-07 12:53:14 -07:00
hsi
hv Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-09-07 09:25:15 -07:00
hwmon dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
hwspinlock
hwtracing
i2c i2c: i2c-stm32f7: add driver 2017-09-14 17:34:43 +02:00
ide Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block 2017-09-07 11:59:42 -07:00
idle Power management updates for v4.14-rc1 2017-09-05 12:19:08 -07:00
iio - New Drivers 2017-09-07 13:51:13 -07:00
infiniband IB/mlx4: fix sprintf format warning 2017-09-13 18:53:15 -07:00
input Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2017-09-16 11:24:26 -07:00
iommu IOMMU Updates for Linux v4.14 2017-09-09 15:03:24 -07:00
ipack
irqchip Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
isdn isdn: isdnloop: fix logic error in isdnloop_sendbuf 2017-09-07 20:03:54 -07:00
leds dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
lightnvm
macintosh powerpc/macintosh: constify wf_sensor_ops structures 2017-09-01 16:42:54 +10:00
mailbox Just behavioral changes to a controller driver: 2017-09-07 13:23:37 -07:00
mcb Char/Misc drivers for 4.14-rc1 2017-09-05 11:08:17 -07:00
md - Some request-based DM core and DM multipath fixes and cleanups 2017-09-14 13:43:16 -07:00
media Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
memory ARM: SoC driver updates for v4.14 2017-09-10 20:40:00 -07:00
memstick
message
mfd dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
misc mm: treewide: remove GFP_TEMPORARY allocation flag 2017-09-13 18:53:16 -07:00
mmc mmc: renesas_sdhi: Add r8a7743/5 support 2017-09-01 15:31:01 +02:00
mtd This pull request contains updates for UBI: 2017-09-16 12:08:10 -07:00
mux
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-16 11:28:59 -07:00
nfc
ntb
nubus
nvdimm - Some request-based DM core and DM multipath fixes and cleanups 2017-09-14 13:43:16 -07:00
nvme nvme-pci: implement the HMB entry number and size limitations 2017-09-11 12:29:40 -04:00
nvmem
of dma-mapping updates for 4.14: 2017-09-12 13:30:06 -07:00
oprofile
parisc
parport Char/Misc drivers for 4.14-rc1 2017-09-05 11:08:17 -07:00
pci Revert "PCI: Avoid race while enabling upstream bridges" 2017-09-15 01:33:51 -05:00
pcmcia
perf
phy Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
pinctrl pinctrl/amd: save pin registers over suspend/resume 2017-09-12 15:58:45 +02:00
platform dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
pnp dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
power power supply and reset changes for the v4.14 series 2017-09-09 14:44:39 -07:00
powercap
pps drivers/pps: use surrounding "if PPS" to remove numerous dependency checks 2017-09-08 18:26:51 -07:00
ps3
ptp
pwm pwm: Changes for v4.14-rc1 2017-09-11 13:04:32 -07:00
rapidio
ras
regulator - New Drivers 2017-09-07 13:51:13 -07:00
remoteproc rpmsg updates for v4.14 2017-09-09 14:34:38 -07:00
reset Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
rpmsg rpmsg: glink: initialize ret to zero to ensure error status check is correct 2017-09-04 10:52:30 -07:00
rtc RTC for 4.14 2017-09-13 10:56:00 -07:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2017-09-12 06:01:59 -07:00
sbus
scsi SCSI misc on 20170913 2017-09-13 10:47:14 -07:00
sfi
sh
sn
soc Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
spi ACPI updates for v4.14-rc1 2017-09-05 12:45:03 -07:00
spmi
ssb
staging Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:54:01 -07:00
target Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
tc
tee
thermal Merge branches 'thermal-core', 'thermal-soc', 'thermal-intel' and 'const-thermal-zone-structure' into next 2017-09-08 11:20:04 +08:00
thunderbolt ACPI updates for v4.14-rc1 2017-09-05 12:45:03 -07:00
tty dmi: Mark all struct dmi_system_id instances const 2017-09-14 11:59:30 +02:00
uio
usb Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-09-14 18:13:32 -07:00
uwb
vfio
vhost lib/interval_tree: fast overlap detection 2017-09-08 18:26:49 -07:00
video fbdev changes for v4.14: 2017-09-14 13:33:33 -07:00
virt
virtio SCSI misc on 20170907 2017-09-07 21:11:05 -07:00
vlynq
vme
w1 power supply and reset changes for the v4.14 series 2017-09-09 14:44:39 -07:00
watchdog Merge branch '4.14-features' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus 2017-09-15 20:43:33 -07:00
xen mm: treewide: remove GFP_TEMPORARY allocation flag 2017-09-13 18:53:16 -07:00
zorro
Kconfig
Makefile