linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-11-25 22:10:49 +07:00

Author	SHA1	Message	Date
Mauro Carvalho Chehab	a7d7d2e1a0	edac: Create a dimm struct and move the labels into it The way a DIMM is currently represented implies that they're linked into a per-csrow struct. However, some drivers don't see csrows, as they're ridden behind some chip like the AMB's on FBDIMM's, for example. This forced drivers to fake^Wvirtualize a csrow struct, and to create a mess under csrow/channel original's concept. Move the DIMM labels into a per-DIMM struct, and add there the real location of the socket, in terms of csrow/channel. Latter patches will modify the location to properly represent the memory architecture. All other drivers will use a per-csrow type of location. Some of those drivers will require a latter conversion, as they also fake the csrows internally. TODO: While this patch doesn't change the existing behavior, on csrows-based memory controllers, a csrow/channel pair points to a memory rank. There's a known bug at the EDAC core that allows having different labels for the same DIMM, if it has more than one rank. A latter patch is need to merge the several ranks for a DIMM into the same dimm_info struct, in order to avoid having different labels for the same DIMM. The edac_mc_alloc() will now contain a per-dimm initialization loop that will be changed by latter patches in order to match other types of memory architectures. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:57 -03:00
Linus Torvalds	f0f3680e50	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC fixes from Mauro Carvalho Chehab: "A series of EDAC driver fixes. It also has one core fix at the documentation, and a rename patch, fixing the name of the struct that contains the rank information." * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: edac: rename channel_info to rank_info i5400_edac: Avoid calling pci_put_device() twice edac: i5100 ack error detection register after each read edac: i5100 fix erroneous define for M1Err edac: sb_edac: Fix a wrong value setting for the previous value edac: sb_edac: Fix a INTERLEAVE_MODE() misuse edac: sb_edac: Let the driver depend on PCI_MMCONFIG edac: Improve the comments to better describe the memory concepts edac/ppc4xx_edac: Fix compilation Fix sb_edac compilation with 32 bits kernels	2012-03-28 14:24:40 -07:00
Niklas Söderlund	df95e42e1f	edac: i5100 ack error detection register after each read If I only ack the detection register after a error have been detected I'm unable to reliably detect errors. I have verified this behavior using both an error injection DIMM and software to inject errors. I can't find any documentation supporting this behavior in Intel 5100 Memory Controller Hub Chipset, see 1. So this is all based on experimentation. [1] Intel® 5100 Memory Controller Hub Chipset http://www.intel.com/content/dam/doc/datasheet/5100- memory-controller-hub-chipset-datasheet.pdf Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:22:49 -03:00
Niklas Söderlund	b6378cb3e5	edac: i5100 fix erroneous define for M1Err According to [1] the define for M1Err in the FERR_NF_MEM register is wrong. It should be at position 1 not 0. [1] Intel 5100 Memory Controller Hub Chipset Doc.Nr: 318378 http://www.intel.com/content/dam/doc/datasheet/5100- memory-controller-hub-chipset-datasheet.pdf Reported-by: Ba Thang Nguyen <thang.b.nguyen@dektech.com.au> Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:20:55 -03:00
Lionel Debroux	36c46f31df	EDAC: Make pci_device_id tables __devinitconst. These const tables are currently marked __devinitdata, but Documentation/PCI/pci.txt says: "o The ID table array should be marked __devinitconst; this is done automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()." So use DEFINE_PCI_DEVICE_TABLE(x). Based on PaX and earlier work by Andi Kleen. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-03-19 12:04:54 +01:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Borislav Petkov	390944439f	EDAC: Fixup scrubrate manipulation Make the ->{get\|set}_sdram_scrub_rate return the actual scrub rate bandwidth it succeeded setting and remove superfluous arg pointer used for that. A negative value returned still means that an error occurred while setting the scrubrate. Document this for future reference. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2011-01-07 11:38:31 +01:00
Borislav Petkov	eba042a81e	edac, mc: Improve scrub rate handling Fortify the interface to not accept negative values, remove memctrl_int_store() as a result. Also, sanitize bandwidth setting by making the argument a simple u32 instead of strange u32 pointer being passed around for no obvious reason. Then, fix error handling and teach it to return proper error values. Finally, make code more readable, simplify debug messages. Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Arthur Jones <ajones@riverbed.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Doug Thompson <dougthompson@xmission.com>	2010-08-03 16:14:06 +02:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Nils Carlson	bbead2104e	edac: i5100 add 6 ranks per channel Add support for 6 ranks per channel to the i5100 chipset. I have tested the patch as far as possible with correctible errors and things appear good. The DIMM mapping is correct for our board, but boards may differ. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Acked-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:20:12 -08:00
Nils Carlson	295439f2a3	edac: i5100 add scrubbing Addscrubbing to the i5100 chipset. The i5100 chipset only supports one scrubbing rate, which is not constant but dependent on memory load. The rate returned by this driver is an estimate based on some experimentation, but is substantially closer to the truth than the speed supplied in the documentation. Also, scrubbing is done once, and then a done-bit is set. This means that to accomplish continuous scrubbing a re-enabling mechanism must be used. I have created the simplest possible such mechanism in the form of a work-queue which will check every five minutes. This interval is quite arbitrary but should be sufficient for all sizes of system memory. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:20:12 -08:00
Nils Carlson	b18dfd05f9	edac: i5100 clean controller to channel terms The i5100 driver uses the word controller instead of channel in a lot of places, this is simply a cleanup of the patch. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-12-16 07:20:12 -08:00
Arthur Jones	b238e57723	edac: i5100: cleanup Some code cleanliness issues found by Andrew Morton (thanks!) which should not affect functionality, but which should help make the code more maintainable. In particular, we now: * convert all #define's w/ a parameter to static inlines * use 1UL rather than 1ULL when calculating an unsigned long * use pci_disable_device The resulting code is tested and seems to work fine... Signed-off-by: Arthur Jones <ajones@riverbed.com> Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Arthur Jones	178d5a7422	edac: i5100 fix unmask ecc bits Explicitly unmask ECC errors we are interested in reporting. Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Arthur Jones	43920a598f	edac: i5100 fix enable ecc hardware It is possible that the BIOS did not enable ECC at boot time. We check for that case and fail to load if it is true. Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Arthur Jones	f7952ffcff	edac: i5100 fix missing bits The error mask we use to trigger ECC notifications is missing many bits of interest. We add these bits here so that all possible ECC errors can be reported. Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00
Arthur Jones	8f421c595a	edac: i5100 new intel chipset driver Preliminary support for the Intel 5100 MCH. CE and UE errors are reported along with the current DIMM label information and other memory parameters. Reasons why this is preliminary: 1) This chip has 2 independent memory controllers which, for best perforance, use interleaved accesses to the DDR2 memory. This architecture does not map very well to the current edac data structures which depend on symmetric channel access to the interleaved data. Without core changes, the best I could do for now is to map both memory controllers to different csrows (first all ranks of controller 0, then all ranks of controller 1). Someone much more familiar with the edac core than I will probably need to come up with a more general data structure to handle the interleaving and de-interleaving of the two memory controllers. 2) I have not yet tackled the de-interleaving of the rank/controller address space into the physical address space of the CPU. There is nothing fundamentally missing, it is just ending up to be a lot of code, and I'd rather keep it separate for now, esp since it doesn't work yet... 3) The code depends on a particular i5100 chip select to DIMM mainboard chip select mapping. This mapping seems obvious to me in order to support dual and single ranked memory, but it is not unique and DIMM labels could be wrong on other mainboards. There is no way to query this mapping that I know of. 4) The code requires that the i5100 is in 32GB mode. Only 4 ranks per controller, 2 ranks per DIMM are supported. I do not have hardware (nor do I expect to have hardware anytime soon) for the 48GB (6 ranks per controller) mode. 5) The serial presence detect code should be broken out into a "real" i2c driver so that decode-dimms.pl can work. Signed-off-by: Arthur Jones <ajones@riverbed.com> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-07-25 10:53:48 -07:00

17 Commits