linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-05 09:26:44 +07:00

Author	SHA1	Message	Date
Aristeu Rozanski	be3036d220	sb_edac: avoid decoding the same error multiple times Whenever the extended error reporting is active, multiple MCEs will be generated for the same event, which will lead to multiple repeated errors to be reported. So check ADDRV and only decode the error if the MCE address is valid. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:50:03 -02:00
Aristeu Rozanski	ea779b5a09	sb_edac: rename mci_bind_devs() This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:49:47 -02:00
Aristeu Rozanski	5153a0f94c	sb_edac: enable multiple PCI id tables to be used This is needed to allow separated PCI id tables for Sandy Bridge and Ivy Bridge. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:49:27 -02:00
Aristeu Rozanski	cc311991a7	sb_edac: rework sad_pkg This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:49:04 -02:00
Aristeu Rozanski	ef1ce51e7b	sb_edac: allow different interleave lists This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:48:42 -02:00
Aristeu Rozanski	464f1d829a	sb_edac: allow different dram_rule arrays This is in preparation for Ivy Bridge support Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:48:16 -02:00
Aristeu Rozanski	8fd6a43ac9	sb_edac: isolate TOHM retrieval This is preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:43:17 -02:00
Aristeu Rozanski	5f8a1b8a7b	sb_edac: rename pci_br Ivy Bridge has more than one, so rename pci_br to pci_br0 Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:43:06 -02:00
Aristeu Rozanski	fb79a50926	sb_edac: isolate TOLM retrieval This is in preparation for the Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:25:27 -02:00
Aristeu Rozanski	ef1e8d03b1	sb_edac: make RANK_CFG_A value part of sbridge_info This is in preparation of Ivy Bridge support. Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>	2013-11-14 16:20:37 -02:00
Linus Torvalds	10d0c9705e	DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQEcBAABAgAGBQJSgPQ4AAoJEMhvYp4jgsXif28H/1WkrXq5+lCFQZF8nbYdE2h0 R8PsfiJJmAl6/wFgQTsRel+ScMk2hiP08uTyqf2RLnB1v87gCF7MKVaLOdONfUDi huXbcQGWCmZv0tbBIklxJe3+X3FIJch4gnyUvPudD1m8a0R0LxWXH/NhdTSFyB20 PNjhN/IzoN40X1PSAhfB5ndWnoxXBoehV/IVHVDU42vkPVbVTyGAw5qJzHW8CLyN 2oGTOalOO4ffQ7dIkBEQfj0mrgGcODToPdDvUQyyGZjYK2FY2sGrjyquir6SDcNa Q4gwatHTu0ygXpyphjtQf5tc3ZCejJ/F0s3olOAS1ahKGfe01fehtwPRROQnCK8= =GCbY -----END PGP SIGNATURE----- Merge tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "DeviceTree updates for 3.13. This is a bit larger pull request than usual for this cycle with lots of clean-up. - Cross arch clean-up and consolidation of early DT scanning code. - Clean-up and removal of arch prom.h headers. Makes arch specific prom.h optional on all but Sparc. - Addition of interrupts-extended property for devices connected to multiple interrupt controllers. - Refactoring of DT interrupt parsing code in preparation for deferred probe of interrupts. - ARM cpu and cpu topology bindings documentation. - Various DT vendor binding documentation updates" * tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits) powerpc: add missing explicit OF includes for ppc dt/irq: add empty of_irq_count for !OF_IRQ dt: disable self-tests for !OF_IRQ of: irq: Fix interrupt-map entry matching MIPS: Netlogic: replace early_init_devtree() call of: Add Panasonic Corporation vendor prefix of: Add Chunghwa Picture Tubes Ltd. vendor prefix of: Add AU Optronics Corporation vendor prefix of/irq: Fix potential buffer overflow of/irq: Fix bug in interrupt parsing refactor. of: set dma_mask to point to coherent_dma_mask of: add vendor prefix for PHYTEC Messtechnik GmbH DT: sort vendor-prefixes.txt of: Add vendor prefix for Cadence of: Add empty for_each_available_child_of_node() macro definition arm/versatile: Fix versatile irq specifications. of/irq: create interrupts-extended property microblaze/pci: Drop PowerPC-ism from irq parsing of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code. of/irq: Use irq_of_parse_and_map() ...	2013-11-12 16:52:17 +09:00
Robert Richter	78cfbf0bbf	edac, highbank: Moving error injection to sysfs for edac Always have the error injection i/f available, even if there is no debugfs or EDAC_DEBUG enabled. We need this for testing production kernels and environments. Thus, the entry moves from: /sys/kernel/debug/edac/mc0/inject_ctrl to: /sys/devices/system/edac/mc/mc0/inject_ctrl No other changes of the interface. Signed-off-by: Robert Richter <robert.richter@linaro.org> Signed-off-by: Robert Richter <rric@kernel.org>	2013-11-04 17:01:11 -06:00
Robert Richter	7270a6085a	edac: Unify reporting of device info for device, mc and pci Log messages slightly differ between edac subsystems. Unifying it. Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Robert Richter <rric@kernel.org>	2013-11-04 17:01:09 -06:00
Robert Richter	41ec0e8da0	edac, highbank: Improve and unify naming Assinging correct names of the 'hb_mc_edac' and 'hb_l2_edac' edac modules for module, controller and device. Reported values for Highbank in dmesg are now: EDAC MC0: Giving out device to module hb_mc_edac controller calxeda,hb-ddr-ctrl: DEV fff00000.memory-controller (INTERRUPT) EDAC DEVICE0: Giving out device to module hb_l2_edac controller calxeda,hb-sregs-l2-ecc: DEV fff3c200.sregs (INTERRUPT) Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Robert Richter <rric@kernel.org>	2013-11-04 17:01:07 -06:00
Robert Richter	0ec8579e16	edac, highbank: Add Calxeda ECX-2000 support Implement edac support for Calxeda ECX-2000. The ECX-2000 memory controller is similar to Highbank but has different register bases for error and interrupt registers. There is an own device tree name "calxeda,ecx-2000-ddr-ctrl" for identification and initialization of the ECX-2000 and its base addresses. Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Robert Richter <rric@kernel.org>	2013-11-04 17:01:06 -06:00
Robert Richter	a72b8859fd	edac, highbank: Fix interrupt setup of mem and l2 controller Register and enable interrupts after the edac registration. Otherwise incomming ecc error interrupts lead to crashes during device setup. Fixing this in drivers for mc and l2. Signed-off-by: Robert Richter <robert.richter@linaro.org> Acked-by: Rob Herring <rob.herring@calxeda.com> Cc: stable <stable@vger.kernel.org> # 3.6+ Signed-off-by: Robert Richter <rric@kernel.org>	2013-11-04 15:43:18 -06:00
Chen, Gong	56507694de	EDAC, GHES: Update ghes error record info In latest UEFI spec(by now it's 2.4) there are some new fields for memory error reporting. Add these new fields for ghes_edac interface. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-10-23 10:11:00 -07:00
Chen, Gong	147de14772	ACPI, APEI, CPER: Add UEFI 2.4 support for memory error In latest UEFI spec(by now it is 2.4) memory error definition for CPER (UEFI 2.4 Appendix N Common Platform Error Record) adds some new fields. These fields help people to locate memory error to an actual DIMM location. Original-author: Tony Luck <tony.luck@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-10-23 10:10:20 -07:00
Chen, Gong	10ef6b0dff	bitops: Introduce a more generic BITMASK macro GENMASK is used to create a contiguous bitmask([hi:lo]). It is implemented twice in current kernel. One is in EDAC driver, the other is in SiS/XGI FB driver. Move it to a more generic place for other usage. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Winischhofer <thomas@winischhofer.net> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-10-21 15:12:01 -07:00
Rob Herring	5af5073004	drivers: clean-up prom.h implicit includes Powerpc is a mess of implicit includes by prom.h. Add the necessary explicit includes to drivers in preparation of prom.h cleanup. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Acked-by: Grant Likely <grant.likely@linaro.org>	2013-10-09 20:04:04 -05:00
Linus Torvalds	4de9ad9bc0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull Tile arch updates from Chris Metcalf: "These changes bring in a bunch of new functionality that has been maintained internally at Tilera over the last year, plus other stray bits of work that I've taken into the tile tree from other folks. The changes include some PCI root complex work, interrupt-driven console support, support for performing fast-path unaligned data fixups by kernel-based JIT code generation, CONFIG_PREEMPT support, vDSO support for gettimeofday(), a serial driver for the tilegx on-chip UART, KGDB support, more optimized string routines, support for ftrace and kprobes, improved ASLR, and many bug fixes. We also remove support for the old TILE64 chip, which is no longer buildable" * git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (85 commits) tile: refresh tile defconfig files tile: rework <asm/cmpxchg.h> tile PCI RC: make default consistent DMA mask 32-bit tile: add null check for kzalloc in tile/kernel/setup.c tile: make __write_once a synonym for __read_mostly tile: remove support for TILE64 tile: use asm-generic/bitops/builtin-*.h tile: eliminate no-op "noatomichash" boot argument tile: use standard tile_bundle_bits type in traps.c tile: simplify code referencing hypervisor API addresses tile: change <asm/system.h> to <asm/switch_to.h> in comments tile: mark pcibios_init() as __init tile: check for correct compiler earlier in asm-offsets.c tile: use standard 'generic-y' model for <asm/hw_irq.h> tile: use asm-generic version of <asm/local64.h> tile PCI RC: add comment about "PCI hole" problem tile: remove DEBUG_EXTRA_FLAGS kernel config option tile: add virt_to_kpte() API and clean up and document behavior tile: support FRAME_POINTER tile: support reporting Tilera hypervisor statistics ...	2013-09-06 11:14:33 -07:00
Aravind Gopalakrishnan	4fc06b3171	amd64_edac: Fix incorrect wraparounds dct_base and dct_limit obtain 32 bit register values when they read their respective pci config space registers. A left shift beyond 32 bits will cause them to wrap around. Similar case for chan_addr as can be seen from the bug report (link below). In the patch, we rectify this by casting chan_addr to u64 and by comparing dct_base and dct_limit against properly shifted sys_addr in order to compare the correct bits. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20130819132302.GA12171@elgon.mountain Signed-off-by: Borislav Petkov <bp@suse.de>	2013-08-27 15:00:22 +02:00
Borislav Petkov	3f0aba4fc0	amd64_edac: Correct erratum 505 range Basically we want to cover all 0x0-0xf models, i.e. Orochi and later. Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Link: http://lkml.kernel.org/r/20130819192321.GF4165@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de>	2013-08-27 14:25:08 +02:00
Ingo Molnar	c874b6ba55	An amd64_edac fix for single channel configurations + trivial cleanups courtesy of Jingoo Han. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iQIcBAABAgAGBQJSC6mSAAoJEBLB8Bhh3lVKhKYQALt3lklah1BpZIqyTiLqtizN YVF/lSubpcHB2BfnZUzaSInE+17uo21DJ44tQ56O0lWDsxLqadJzTDiJ2BG5Ol3u JOIWti/Yl8YoCQQcir6QksBAUkR26iCpelO8kO6J/JWGEekVgl3Oik4NOmYrdN55 rUoHIyVdK5Z5y4j79Pmth7//+c6OFli1cAeUmBlIvxS9T4T2ZCz30jBim76VTS8H AjgaX/aBlE3SxAYoMWLZh1VglukxAVCG2qZ9lm7iNLCpkGwP59jT/DE9Gok2IXBg id9SMTkrpitijCyM3oKYox14Tl+QP/brElnWCVyVeRIpkVH8s2WUkU+qHeZztBg9 8i/aU9x4bOCkDjhvBjIM4jbYJAvvaVKlIXPvANO8xl/A0D7nCsmCs7DpritRHZEr 3y4N8SQsaamVD083+UaVciAo1XCpl5cNq9gH/Y7+U8h2bIThZn8HTbZ1uWgXaCY1 OwfElbJDKInsSmDVBEklMI8CF42YsjGQc2JC+A/3M3CapTfepKPg6SvptNQueZb+ SUw0mgGBumpdDQSHo5tPf5JL1y57ERkdOBVryrqYtr6qdjw3ox9o/+B8eTy3gkg1 b71LEhzG2UbdSVaHiYhRE/IR3W3yKzNX8Oh3BTHfp04jqnNijx4hHu9oX7j3W59M P/bd6GHHI5ZY66aLqCue =r2kL -----END PGP SIGNATURE----- Merge tag 'edac_for_3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras Pull RAS/EDAC updates from Boris Petkov: "An amd64_edac fix for single channel configurations + trivial cleanups courtesy of Jingoo Han." Signed-off-by: Ingo Molnar <mingo@kernel.org>	2013-08-15 10:07:20 +02:00
Jingoo Han	75a9551f2b	cpc925_edac: Use proper array termination The struct should be terminated by using empty braces in order to fix the following sparse warning. drivers/edac/cpc925_edac.c:792:10: warning: Using plain integer as NULL pointer Signed-off-by: Jingoo Han <jg1.han@samsung.com> [ drop obvious comment ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-08-14 12:46:46 +02:00
Borislav Petkov	a4b4bedce8	amd64_edac: Get rid of boot_cpu_data accesses Now that we cache (family, model, stepping) locally, use them instead of boot_cpu_data. No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de>	2013-08-12 16:01:56 +02:00
Aravind Gopalakrishnan	18b94f66f9	amd64_edac: Add ECC decoding support for newer F15h models On newer models, support has been included for upto 4 DCT's, however, only DCT0 and DCT3 are currently configured (cf BKDG Section 2.10). Also, the routing DRAM Requests algorithm is different for F15h M30h. Thus it is cleaner to use a brand new function rather than adding quirks to the more generic f1x_match_to_this_node(). Refer to "2.10.5 DRAM Routing Requests" in the BKDG for further info. Tested on Fam15h M30h with ECC turned on using mce_amd_inj facility and verified to be functionally correct. While at it, verify if erratum workarounds for E505 and E637 still hold. From email conversations within AMD, the current status of the errata is: * Erratum 505: fixed in model 0x1, stepping 0x1 and later. * Erratum 637: not fixed. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Cleanups, corrections ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-08-12 16:00:10 +02:00
Jingoo Han	e0d391ab04	x38_edac: Make a local function static Make a local function static in order to fix the following sparse warning: drivers/edac/x38_edac.c:252:14: warning: symbol 'x38_map_mchbar' was not declared. Should it be static? Signed-off-by: Jingoo Han <jg1.han@samsung.com> [ Boris: Correct commit message ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-08-09 15:29:10 +02:00
Jingoo Han	166e9334e9	i3200_edac: Make a local function static This local symbol is used only in this file. Fix the following sparse warnings: drivers/edac/i3200_edac.c:264:14: warning: symbol 'i3200_map_mchbar' was not declared. Should it be static? Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-08-09 15:23:02 +02:00
Borislav Petkov	f0a56c4801	amd64_edac: Fix single-channel setups It can happen that configurations are running in a single-channel mode even with a dual-channel memory controller, by, say, putting the DIMMs only on the one channel and leaving the other empty. This causes a problem in init_csrows which implicitly assumes that when the second channel is enabled, i.e. channel 1, the struct dimm hierarchy will be present. Which is not. So always allocate two channels unconditionally. This provides for the nice side effect that the data structures are initialized so some day, when memory hotplug is supported, it should just work out of the box when all of a sudden a second channel appears. Reported-and-tested-by: Roger Leigh <rleigh@debian.org> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-07-29 17:22:41 +02:00
Jingoo Han	c542b53da9	EDAC: Replace strict_strtol() with kstrtol() The usage of strict_strtol() is not preferred, because strict_strtol() is obsolete. Thus, kstrtol() should be used. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-07-24 09:30:03 +02:00
Borislav Petkov	88d84ac973	EDAC: Fix lockdep splat Fix the following: BUG: key ffff88043bdd0330 not in .data! ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0() DEBUG_LOCKS_WARN_ON(1) Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1 Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013 0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958 ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000 ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc Call Trace: dump_stack warn_slowpath_common warn_slowpath_fmt lockdep_init_map ? trace_hardirqs_on_caller ? trace_hardirqs_on debug_mutex_init __mutex_init bus_register edac_create_sysfs_mci_device edac_mc_add_mc sbridge_probe pci_device_probe driver_probe_device __driver_attach ? driver_probe_device bus_for_each_dev driver_attach bus_add_driver driver_register __pci_register_driver ? 0xffffffffa0010fff sbridge_init ? 0xffffffffa0010fff do_one_initcall load_module ? unset_module_init_ro_nx SyS_init_module tracesys ---[ end trace d24a70b0d3ddf733 ]--- EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0 EDAC sbridge: Driver loaded. What happens is that bus_register needs a statically allocated lock_key because the last is handed in to lockdep. However, struct mem_ctl_info embeds struct bus_type (the whole struct, not a pointer to it) and the whole thing gets dynamically allocated. Fix this by using a statically allocated struct bus_type for the MC bus. Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: stable@kernel.org # v3.10 Signed-off-by: Tony Luck <tony.luck@intel.com>	2013-07-23 16:01:28 -07:00
Sachin Kamat	8e42e211e4	edac: Remove redundant platform_set_drvdata() Commit `0998d06310` (device-core: Ensure drvdata = NULL when no driver is bound) removes the need to set driver data field to NULL. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2013-07-17 12:49:55 -04:00
Linus Torvalds	d144746478	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus Pull MIPS updates from Ralf Baechle: "MIPS updates: - All the things that didn't make 3.10. - Removes the Windriver PPMC platform. Nobody will miss it. - Remove a workaround from kernel/irq/irqdomain.c which was there exclusivly for MIPS. Patch by Grant Likely. - More small improvments for the SEAD 3 platform - Improvments on the BMIPS / SMP support for the BCM63xx series. - Various cleanups of dead leftovers. - Platform support for the Cavium Octeon-based EdgeRouter Lite. Two large KVM patchsets didn't make it for this pull request because their respective authors are vacationing" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (124 commits) MIPS: Kconfig: Add missing MODULES dependency to VPE_LOADER MIPS: BCM63xx: CLK: Add dummy clk_{set,round}_rate() functions MIPS: SEAD3: Disable L2 cache on SEAD-3. MIPS: BCM63xx: Enable second core SMP on BCM6328 if available MIPS: BCM63xx: Add SMP support to prom.c MIPS: define write{b,w,l,q}_relaxed MIPS: Expose missing pci_io{map,unmap} declarations MIPS: Malta: Update GCMP detection. Revert "MIPS: make CAC_ADDR and UNCAC_ADDR account for PHYS_OFFSET" MIPS: APSP: Remove <asm/kspd.h> SSB: Kconfig: Amend SSB_EMBEDDED dependencies MIPS: microMIPS: Fix improper definition of ISA exception bit. MIPS: Don't try to decode microMIPS branch instructions where they cannot exist. MIPS: Declare emulate_load_store_microMIPS as a static function. MIPS: Fix typos and cleanup comment MIPS: Cleanup indentation and whitespace MIPS: BMIPS: support booting from physical CPU other than 0 MIPS: Only set cpu_has_mmips if SYS_SUPPORTS_MICROMIPS MIPS: GIC: Fix gic_set_affinity infinite loop MIPS: Don't save/restore OCTEON wide multiplier state on syscalls. ...	2013-07-13 14:52:21 -07:00
Linus Torvalds	f3acb96f38	Add MCE signatures for family 0x15, models 30-3f. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJR0UeRAAoJEBLB8Bhh3lVKp/oP/1Bn5wFlVNVQymyqjUv8PCui W7gro9vf67KSZrEvbvWYuBl3/Z7JI9e7MQMzWOtj1mx/li6G/mSHPWYsoK/g4MRD nuxr5EWmXoXvoJvaIPDf/ne9qeMZ88sTv1zQavCWEO+fCY95rUY5DzhJuw8kgjMX g++C1AZJFzRGHBF0/VswcBKU1cBaEIJ2h0UI6zJQDPWtTUL5VJcfZ+7CduHf07IA rMKqRLKLzBuuQ/aiE52bPQAHirbYbJ9d2jfNqpDRsdBFrOoOeqJbco7ac/itk9Lo 4Qlh52HuF0rpi9Ub+QIchaQm4SZCFnBuw5liDgT9kp8cOO9KWS6JBNmYa2cEw54F LALbdp0XeHCYWoQskzVlTaZlAGADmr4f0C0o1e+XXJYfp8TFIR3ZoconyBKHulsh JbQDp1NNC1JzoyZyZW73Gzi4a1yLmf5tB7Y3+O6NMTDy+6RFck3oCwctcr3dRaP2 pArQA0OCApDixcGfB0h1uX+H8zhVrUen+JcF3KfuXhTepS5koIRbMBglBiJYtSjN 4RRaV1DWR3ii+gvSfWEaApzX2XxnLYA/6ZJSvicwoRHi6AcZPn5mcLDCXIyYU0p/ jDGLPY1RU+EmIQlgXUr6CbtlpoCaQOZuyUgSAqQL0pPfCvaeUvX1M7Ly0GqXy5b5 dKMw7cf7T37S2S9YUBJ/ =KQCd -----END PGP SIGNATURE----- Merge tag 'edac_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull AMD EDAC update from Borislav Petkov: "Add MCE signatures for family 0x15, models 30-3f" * tag 'edac_for_3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, MCE, AMD: Add an MCE signature for new Fam15h models EDAC: Replace strict_strtoul() with kstrtoul()	2013-07-03 13:11:18 -07:00
David Daney	9ddebc46e7	MIPS: OCTEON: Rename Kconfig CAVIUM_OCTEON_REFERENCE_BOARD to CAVIUM_OCTEON_SOC CAVIUM_OCTEON_SOC most place we used to use CPU_CAVIUM_OCTEON. This allows us to CPU_CAVIUM_OCTEON in places where we have no OCTEON SOC. Remove CAVIUM_OCTEON_SIMULATOR as it doesn't really do anything, we can get the same configuration with CAVIUM_OCTEON_SOC. Signed-off-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Cc: linux-ide@vger.kernel.org Cc: linux-edac@vger.kernel.org Cc: linux-i2c@vger.kernel.org Cc: netdev@vger.kernel.org Cc: spi-devel-general@lists.sourceforge.net Cc: devel@driverdev.osuosl.org Cc: linux-usb@vger.kernel.org Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Patchwork: https://patchwork.linux-mips.org/patch/5295/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>	2013-06-10 18:01:25 +02:00
Aravind Gopalakrishnan	aad19e5176	EDAC, MCE, AMD: Add an MCE signature for new Fam15h models Add a new error signature for Family 15h, models 30h-3fh. Patch has been tested on Fam15h using mce_amd_inj facility and has been verified to work correctly. Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> [ cleanup commit message and error string ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-06-08 10:17:03 +02:00
Jingoo Han	c7f62fc87b	EDAC: Replace strict_strtoul() with kstrtoul() The usage of strict_strtoul() is not preferred, because strict_strtoul() is obsolete. Thus, kstrtoul() should be used. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-06-08 10:16:33 +02:00
Stephen Rothwell	40b313608a	Finally eradicate CONFIG_HOTPLUG Ever since commit `45f035ab9b` ("CONFIG_HOTPLUG should be always on"), it has been basically impossible to build a kernel with CONFIG_HOTPLUG turned off. Remove all the remaining references to it. Cc: Russell King <linux@arm.linux.org.uk> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-06-03 14:20:18 -07:00
Borislav Petkov	bbb013b920	amd64_edac: Fix bogus sysfs file permissions Fix yet another issue caught by `8f46baaa7e` ("base: core: WARN() about bogus permissions on device attributes"). Signed-off-by: Borislav Petkov <bp@suse.de>	2013-05-21 09:13:11 +02:00
Linus Torvalds	7462543abb	Two small EDAC fixes. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRi31iAAoJEBLB8Bhh3lVKiaQQAIZ4R6/9QSt7KbP5VLN1ksv2 qV2VOGPuxkhGoO/vfUJ0RKFj0NgnQFYizKl+vjZL/ahQy9yqelWInoc/TO7tHGol rxlyCoxmS5kuXwbs8CYhC5OQBvuFSTfk/8Ecu4lZa/PfPKs4IXaq+jzjqfk6/QvC jjfJ/4TIbyHLR48tbjoiJtjMQlsZOZ+R9o6Ic0Qw49GlqbIdZ15KzZVnuLRxJWby 4S2hQnBWkMfc/RYXfliUs6TsXd55qyd88La6PHeY/BJRxQoaBvsPWLEd6Pk6RNWd RfpoEHCdUfnFBQMvO5C50/Dp6iKXJ4xqh9qrPg0LVlTtGbPL8gdDfK5QEElhEiuw vM9dzfNXFvE4Pavx32WSm7ql2LD9Qf8ZdLPxMlvjNGW14+oQexHmDI6sdU5iVYxb toct5jF7MwEy6GQQcwmp84V8y5FU+MyEtT+w9OKLpay/9Bqcq0I/Mv6LJWH10IDC bpfkaUm10C1aF2/vP6BB48NGUUElZIxXg9VapzX+AuRs6kN7LLOmM38G371HPEbV wcsRCU1znxo1Yjehen6oI9I2AQ4NuMcHplK2FiD0I1AzlRQ/BM6TeHejy84SJMgf QEQkjwh8DAClzcKJFlt9uoIglCLLjY/WDVSLgvvhv+/kQIXrLV7zCAhGR2CE27ci XQsmruJYlPt0A0xkOh// =uyan -----END PGP SIGNATURE----- Merge tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull two small EDAC fixes from Borislav Petkov. * tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC: Don't give write permission to read-only files EDAC, mc_sysfs.c: Fix string array pointer types	2013-05-09 10:11:08 -07:00
Srivatsa S. Bhat	c8c64d165c	EDAC: Don't give write permission to read-only files I get the following warning on boot: ------------[ cut here ]------------ WARNING: at drivers/base/core.c:575 device_create_file+0x9a/0xa0() Hardware name: -[8737R2A]- Write permission without 'store' ... </snip> Drilling down, this is related to dynamic channel ce_count attribute files sporting a S_IWUSR mode without a ->store() function. Looking around, it appears that they aren't supposed to have a ->store() function. So remove the bogus write permission to get rid of the warning. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: <stable@vger.kernel.org> # 3.[89] [ shorten commit message ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-05-09 12:40:45 +02:00
Linus Torvalds	e2823299cd	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull edac fixes from Mauro Carvalho Chehab: "Two edac fixes: - i7300_edac currently reports a wrong number of DIMMs when the memory controller is in single channel mode - on some Sandy Bridge machines, the EDAC driver bails out as one of the PCI IDs used by the driver is hidden by BIOS. As the driver uses it only to detect the type of memory, make it optional at the driver" * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: edac: sb_edac.c should not require prescence of IMC_DDRIO device i7300_edac: Fix memory detection in single mode	2013-04-30 10:00:49 -07:00
Luck, Tony	de4772c621	edac: sb_edac.c should not require prescence of IMC_DDRIO device The Sandy Bridge EDAC driver uses a register in the IMC_DDRIO CSR space to determine the type of DIMMs (registered or unregistered). But this device does not exist on some single socket Sandy Bridge servers. While the type of DIMMs is nice to know, it is not essential for this driver's other functions. So it seems harsh to have it refuse to load at all when it cannot find this device. Make the check for this device be optional. If it isn't present just report the memory type as "MEM_UNKNOWN". Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-04-29 10:32:40 -03:00
Mauro Carvalho Chehab	33ad41263d	i7300_edac: Fix memory detection in single mode When the machine is on single mode, only branch 0 channel 0 is valid. However, the code is not honouring it: [ 1952.639341] EDAC DEBUG: i7300_get_mc_regs: Memory controller operating on single mode ... [ 1952.639351] EDAC DEBUG: i7300_init_csrows: AMB-present CH0 = 0x1: [ 1952.639353] EDAC DEBUG: i7300_init_csrows: AMB-present CH1 = 0x0: [ 1952.639355] EDAC DEBUG: i7300_init_csrows: AMB-present CH2 = 0x0: [ 1952.639358] EDAC DEBUG: i7300_init_csrows: AMB-present CH3 = 0x0: ... [ 1952.639360] EDAC DEBUG: decode_mtr: MTR0 CH0: DIMMs are Present (mtr) [ 1952.639362] EDAC DEBUG: decode_mtr: WIDTH: x8 [ 1952.639363] EDAC DEBUG: decode_mtr: ELECTRICAL THROTTLING is enabled [ 1952.639364] EDAC DEBUG: decode_mtr: NUMBANK: 4 bank(s) [ 1952.639366] EDAC DEBUG: decode_mtr: NUMRANK: single [ 1952.639367] EDAC DEBUG: decode_mtr: NUMROW: 16,384 - 14 rows [ 1952.639368] EDAC DEBUG: decode_mtr: NUMCOL: 1,024 - 10 columns [ 1952.639370] EDAC DEBUG: decode_mtr: SIZE: 512 MB [ 1952.639371] EDAC DEBUG: decode_mtr: ECC code is 8-byte-over-32-byte SECDED+ code [ 1952.639373] EDAC DEBUG: decode_mtr: Scrub algorithm for x8 is on enhanced mode [ 1952.639374] EDAC DEBUG: decode_mtr: MTR0 CH1: DIMMs are Present (mtr) [ 1952.639376] EDAC DEBUG: decode_mtr: WIDTH: x8 [ 1952.639377] EDAC DEBUG: decode_mtr: ELECTRICAL THROTTLING is enabled [ 1952.639379] EDAC DEBUG: decode_mtr: NUMBANK: 4 bank(s) [ 1952.639380] EDAC DEBUG: decode_mtr: NUMRANK: single [ 1952.639381] EDAC DEBUG: decode_mtr: NUMROW: 16,384 - 14 rows [ 1952.639383] EDAC DEBUG: decode_mtr: NUMCOL: 1,024 - 10 columns [ 1952.639384] EDAC DEBUG: decode_mtr: SIZE: 512 MB [ 1952.639385] EDAC DEBUG: decode_mtr: ECC code is 8-byte-over-32-byte SECDED+ code [ 1952.639387] EDAC DEBUG: decode_mtr: Scrub algorithm for x8 is on enhanced mode ... [ 1952.639449] EDAC DEBUG: print_dimm_size: channel 0 \| channel 1 \| channel 2 \| channel 3 \| [ 1952.639451] EDAC DEBUG: print_dimm_size: ------------------------------------------------------------- [ 1952.639453] EDAC DEBUG: print_dimm_size: csrow/SLOT 0 512 MB \| 512 MB \| 0 MB \| 0 MB \| [ 1952.639456] EDAC DEBUG: print_dimm_size: csrow/SLOT 1 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639458] EDAC DEBUG: print_dimm_size: csrow/SLOT 2 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639460] EDAC DEBUG: print_dimm_size: csrow/SLOT 3 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639462] EDAC DEBUG: print_dimm_size: csrow/SLOT 4 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639464] EDAC DEBUG: print_dimm_size: csrow/SLOT 5 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639466] EDAC DEBUG: print_dimm_size: csrow/SLOT 6 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639468] EDAC DEBUG: print_dimm_size: csrow/SLOT 7 0 MB \| 0 MB \| 0 MB \| 0 MB \| [ 1952.639470] EDAC DEBUG: print_dimm_size: ------------------------------------------------------------- Instead of detecting a single memory at channel 0, it is showing twice the memory. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-04-29 10:32:39 -03:00
Aravind Gopalakrishnan	94c1acf2c8	amd64_edac: Add Family 16h support Add code to handle DRAM ECC errors decoding for Fam16h. Tested on Fam16h with ECC turned on using the mce_amd_inj facility and works fine. Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> [ Boris: cleanups and clarifications ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-04-19 12:46:50 +02:00
Borislav Petkov	8b7719e08a	EDAC, mc_sysfs.c: Fix string array pointer types Those should be const ptr to a const string, fix them. Signed-off-by: Borislav Petkov <bp@suse.de>	2013-03-25 15:44:25 +01:00
Mauro Carvalho Chehab	9713faecff	EDAC: Merge mci.mem_is_per_rank with mci.csbased Both mci.mem_is_per_rank and mci.csbased denote the same thing: the memory controller is csrows based. Merge both fields into one. There's no need for the driver to actually fill it, as the core detects it by checking if one of the layers has the csrows type as part of the memory hierarchy: if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT) per_rank = true; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-03-16 06:32:30 +01:00
Mauro Carvalho Chehab	1eef128254	amd64_edac: Correct DIMM sizes We were filling the csrow size with a wrong value. `16a528ee39` ("EDAC: Fix csrow size reported in sysfs") tried to address the issue. It fixed the report with the old API but not with the new one. Correct it for the new API too. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> [ make it a per-csrow accounting regardless of ->channel_count ] Signed-off-by: Borislav Petkov <bp@suse.de>	2013-03-16 06:32:02 +01:00
Stephen Hemminger	fbe2d3616c	EDAC: Make sysfs functions static Fixes lots of sparse warnings here. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-03-05 11:33:57 +01:00
Linus Torvalds	ad6c2c2eb3	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC fixes and ghes-edac from Mauro Carvalho Chehab: "For: - Some fixes at edac drivers (i7core_edac, sb_edac, i3200_edac); - error injection support for i5100, when EDAC debug is enabled; - fix edac when it is loaded builtin (early init for the subsystem); - a "Firmware First" EDAC driver, allowing ghes to report errors via EDAC (ghes-edac). With regards to ghes-edac, this fixes a longstanding BZ at Red Hat that happens with Nehalem and Sandy Bridge CPUs: when both GHES and i7core_edac or sb_edac are running, the error reports are unpredictable, as both BIOS and OS race to access the registers. With ghes-edac, the EDAC core will refuse to register any other concurrent memory error driver. This patchset moves the ghes struct definitions to a separate header file (include/acpi/ghes.h) and adds 3 hooks at apei/ghes.c to register/unregister and to report errors via ghes-edac. Those changes were acked by ghes driver maintainer (Huang)." * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (30 commits) i5100_edac: convert to use simple_open() ghes_edac: fix to use list_for_each_entry_safe() when delete list items ghes_edac: Fix RAS tracing ghes_edac: Make it compliant with UEFI spec 2.3.1 ghes_edac: Improve driver's printk messages ghes_edac: Don't credit the same memory dimm twice ghes_edac: do a better job of filling EDAC DIMM info ghes_edac: add support for reporting errors via EDAC ghes_edac: Register at EDAC core the BIOS report ghes: add the needed hooks for EDAC error report ghes: move structures/enum to a header file edac: add support for error type "Info" edac: add support for raw error reports edac: reduce stack pressure by using a pre-allocated buffer edac: lock module owner to avoid error report conflicts edac: remove proc_name from mci structure edac: add a new memory layer type edac: initialize the core earlier edac: better report error conditions in debug mode i5100_edac: Remove two checkpatch warnings ...	2013-02-28 20:42:33 -08:00
Wei Yongjun	b0769891ba	i5100_edac: convert to use simple_open() This removes an open coded simple_open() function and replaces file operations references to the function with simple_open() instead. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-26 10:06:18 -03:00
Wei Yongjun	5dae92a718	ghes_edac: fix to use list_for_each_entry_safe() when delete list items Since we will remove items off the list using list_del() we need to use a safe version of the list_for_each_entry() macro aptly named list_for_each_entry_safe(). Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-26 10:05:35 -03:00
Mauro Carvalho Chehab	8ae8f50ad8	ghes_edac: Fix RAS tracing With the current version of CPER, there's no way to associate an error with the memory error. So, the error location in EDAC layers is unused. As CPER has its own idea about memory architectural layers, just output whatever is there inside the driver's detail at the RAS tracepoint. The EDAC location keeps untouched, in the case that, in some future, we could actually map the error into the dimm labels. Now, the error message: [ 72.396625] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 72.396627] {1}[Hardware Error]: APEI generic hardware error status [ 72.396628] {1}[Hardware Error]: severity: 2, corrected [ 72.396630] {1}[Hardware Error]: section: 0, severity: 2, corrected [ 72.396632] {1}[Hardware Error]: flags: 0x01 [ 72.396634] {1}[Hardware Error]: primary [ 72.396635] {1}[Hardware Error]: section_type: memory error [ 72.396637] {1}[Hardware Error]: error_status: 0x0000000000000400 [ 72.396638] {1}[Hardware Error]: node: 3 [ 72.396639] {1}[Hardware Error]: card: 0 [ 72.396640] {1}[Hardware Error]: module: 0 [ 72.396641] {1}[Hardware Error]: device: 0 [ 72.396643] {1}[Hardware Error]: error_type: 18, unknown [ 72.396666] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:0 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Is properly represented on the trace event: kworker/0:2-584 [000] .... 72.396657: mc_event: 1 Corrected error: reserved error (18) on unknown label (mc:0 location👎-1:-1 address:0x00000000 grain:1 syndrome:0x00000000 APEI location: node:3 card:0 module:0 status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 sockets E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:17 -03:00
Mauro Carvalho Chehab	689c9cd812	ghes_edac: Make it compliant with UEFI spec 2.3.1 The UEFI spec defines the memory error types ans the bits that validate each field on the memory error record, at Appendix N om items N.2.5 (Memory Error Section) and N.2.11 (Error Status). Make the error description compliant with it, only showing the valid fields. The EDAC error log is now properly reporting the error: [ 281.556854] mce: [Hardware Error]: Machine check events logged [ 281.557042] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 281.557044] {2}[Hardware Error]: APEI generic hardware error status [ 281.557046] {2}[Hardware Error]: severity: 2, corrected [ 281.557048] {2}[Hardware Error]: section: 0, severity: 2, corrected [ 281.557050] {2}[Hardware Error]: flags: 0x01 [ 281.557052] {2}[Hardware Error]: primary [ 281.557053] {2}[Hardware Error]: section_type: memory error [ 281.557055] {2}[Hardware Error]: error_status: 0x0000000000000400 [ 281.557056] {2}[Hardware Error]: node: 3 [ 281.557057] {2}[Hardware Error]: card: 0 [ 281.557058] {2}[Hardware Error]: module: 1 [ 281.557059] {2}[Hardware Error]: device: 0 [ 281.557061] {2}[Hardware Error]: error_type: 18, unknown [ 281.557067] EDAC DEBUG: ghes_edac_report_mem_error: error validation_bits: 0x000040b9 [ 281.557084] EDAC MC0: 1 CE reserved error (18) on unknown label (node:3 card:0 module:1 page:0x0 offset:0x0 grain:0 syndrome:0x0 - status(0x0000000000000400): Storage error in DRAM memory) Tested on a 4 CPUs E5-4650 Sandy Bridge machine. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:16 -03:00
Mauro Carvalho Chehab	d2a6856614	ghes_edac: Improve driver's printk messages Provide a better infrastructure for printk's inside the driver: - use edac_dbg() for debug messages; - standardize the usage of pr_info(); - provide warning about the risk of relying on this driver. While here, changes the size of a fake memory to 1 page. This is as good or as bad as 1000 pages, but it is easier for userspace to detect, as I don't expect that any machine implementing GHES would provide just 1 page available ;) Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Conflicts: drivers/edac/ghes_edac.c	2013-02-25 19:42:15 -03:00
Mauro Carvalho Chehab	5ee726db52	ghes_edac: Don't credit the same memory dimm twice On my tests on a 4xE5-4650 CPU's system, the GHES EDAC driver is called twice. As the SMBIOS DMI enumeration call will seek for the entire DIMM sockets in the system, on this machine, equipped with 128 GB of RAM, the memory is displayed twice: +-----------------------+ \| mc0 \| mc1 \| ----------+-----------------------+ memory45: \| 8192 MB \| 8192 MB \| memory44: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory43: \| 0 MB \| 0 MB \| memory42: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory41: \| 0 MB \| 0 MB \| memory40: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory39: \| 8192 MB \| 8192 MB \| memory38: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory37: \| 0 MB \| 0 MB \| memory36: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory35: \| 0 MB \| 0 MB \| memory34: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory33: \| 8192 MB \| 8192 MB \| memory32: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory31: \| 0 MB \| 0 MB \| memory30: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory29: \| 0 MB \| 0 MB \| memory28: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory27: \| 8192 MB \| 8192 MB \| memory26: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory25: \| 0 MB \| 0 MB \| memory24: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory23: \| 0 MB \| 0 MB \| memory22: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory21: \| 8192 MB \| 8192 MB \| memory20: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory19: \| 0 MB \| 0 MB \| memory18: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory17: \| 0 MB \| 0 MB \| memory16: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory15: \| 8192 MB \| 8192 MB \| memory14: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory13: \| 0 MB \| 0 MB \| memory12: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory11: \| 0 MB \| 0 MB \| memory10: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory9: \| 8192 MB \| 8192 MB \| memory8: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory7: \| 0 MB \| 0 MB \| memory6: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ memory5: \| 0 MB \| 0 MB \| memory4: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory3: \| 8192 MB \| 8192 MB \| memory2: \| 0 MB \| 0 MB \| ----------+-----------------------+ memory1: \| 0 MB \| 0 MB \| memory0: \| 8192 MB \| 8192 MB \| ----------+-----------------------+ Total sum of 256 GB. As there's no reliable way to credit DIMMS to the right memory controller, just put everything on memory controller 0 (with should always exist). Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:14 -03:00
Mauro Carvalho Chehab	32fa1f53c2	ghes_edac: do a better job of filling EDAC DIMM info Instead of just faking a random value for the DIMM data, get the information that it is available via DMI table. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:13 -03:00
Mauro Carvalho Chehab	f04c62a703	ghes_edac: add support for reporting errors via EDAC Now that the EDAC core is capable of just forward the errors via the userspace API, add a report mechanism for the GHES errors. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:13 -03:00
Mauro Carvalho Chehab	77c5f5d2f2	ghes_edac: Register at EDAC core the BIOS report Register GHES at EDAC MC core, in order to avoid other drivers to also handle errors and mangle with error data. The edac core will warrant that just one driver will be used, so the first one to register (BIOS first) will be the one that will be reporting the hardware errors. For now, the EDAC driver does nothing but to register at the EDAC core, preventing the hardware-driven mechanism to interfere with GHES. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-25 19:42:12 -03:00
Linus Torvalds	06991c28f3	Driver core patches for 3.9-rc1 Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL If you need me to provide a merged tree to handle these resolutions, please let me know. Other than those patches, there's not much here, some minor fixes and updates. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iEYEABECAAYFAlEmV0cACgkQMUfUDdst+yncCQCfbmnQZju7kzWXk6PjdFuKspT9 weAAoMCzcAtEzzc4LXuUxxG/sXBVBCjW =yWAQ -----END PGP SIGNATURE----- Merge tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core patches from Greg Kroah-Hartman: "Here is the big driver core merge for 3.9-rc1 There are two major series here, both of which touch lots of drivers all over the kernel, and will cause you some merge conflicts: - add a new function called devm_ioremap_resource() to properly be able to check return values. - remove CONFIG_EXPERIMENTAL Other than those patches, there's not much here, some minor fixes and updates" Fix up trivial conflicts * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits) base: memory: fix soft/hard_offline_page permissions drivercore: Fix ordering between deferred_probe and exiting initcalls backlight: fix class_find_device() arguments TTY: mark tty_get_device call with the proper const values driver-core: constify data for class_find_device() firmware: Ignore abort check when no user-helper is used firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER firmware: Make user-mode helper optional firmware: Refactoring for splitting user-mode helper code Driver core: treat unregistered bus_types as having no devices watchdog: Convert to devm_ioremap_resource() thermal: Convert to devm_ioremap_resource() spi: Convert to devm_ioremap_resource() power: Convert to devm_ioremap_resource() mtd: Convert to devm_ioremap_resource() mmc: Convert to devm_ioremap_resource() mfd: Convert to devm_ioremap_resource() media: Convert to devm_ioremap_resource() iommu: Convert to devm_ioremap_resource() drm: Convert to devm_ioremap_resource() ...	2013-02-21 12:05:51 -08:00
Mauro Carvalho Chehab	e7e248304c	edac: add support for raw error reports That allows APEI GHES driver to report errors directly, using the EDAC error report API. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 14:16:03 -03:00
Mauro Carvalho Chehab	c7ef764554	edac: reduce stack pressure by using a pre-allocated buffer The number of variables at the stack is too big. Reduces the stack usage by using a pre-allocated error buffer. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 13:48:45 -03:00
Mauro Carvalho Chehab	80cc7d87d5	edac: lock module owner to avoid error report conflicts APEI GHES and i7core_edac/sb_edac currently can be loaded at the same time, but those are Highlander modules: "There can be only one". There are two reasons for that: 1) Each driver assumes that it is the only one registering at the EDAC core, as it is driver's responsibility to number the memory controllers, and all of them start from 0; 2) If BIOS is handling the memory errors, the OS can't also be doing it, as one will mangle with the other. So, we need to add an module owner's lock at the EDAC core, in order to avoid having two different modules handling memory errors at the same time. The best way for doing this lock seems to use the driver's name, as this is unique, and won't require changes on every driver. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:38 -03:00
Mauro Carvalho Chehab	c66b5a79a9	edac: add a new memory layer type There are some cases where the memory controller layout is completely hidden. This is the case of firmware-driven error code, like the one provided by GHES. Add a new layer to be used on such memory error report mechanisms. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:37 -03:00
Mauro Carvalho Chehab	4ab19b06ac	edac: initialize the core earlier In order for it to work with it builtin, the EDAC core should be initialized earlier, otherwise the ghes_edac driver initializes before edac_mc_sysfs_init() being called: ... [ 4.998373] EDAC MC0: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes ... [ 4.998373] EDAC MC1: Giving out device to 'ghes_edac.c' 'ghes_edac': DEV ghes [ 6.519495] EDAC MC: Ver: 3.0.0 [ 6.523749] EDAC DEBUG: edac_mc_sysfs_init: device mc created The net result is that no EDAC sysfs nodes will appear. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:36 -03:00
Mauro Carvalho Chehab	3d958823e2	edac: better report error conditions in debug mode It is hard to find what's wrong without a proper error report. Improve it, in debug mode. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:35 -03:00
Mauro Carvalho Chehab	59b9796d1e	i5100_edac: Remove two checkpatch warnings The last changeset introduced a few checkpatch warnings: WARNING: debugfs_remove_recursive(NULL) is safe this check is probably not required 261: FILE: drivers/edac/i5100_edac.c:1207: + if (priv->debugfs) + debugfs_remove_recursive(priv->debugfs); WARNING: debugfs_remove(NULL) is safe this check is probably not required 290: FILE: drivers/edac/i5100_edac.c:1250: + if (i5100_debugfs) + debugfs_remove(i5100_debugfs); Get rid of them. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:34 -03:00
Niklas Söderlund	9cbc6d38f2	i5100_edac: connect fault injection to debugfs node Create a debugfs direcotry i5100_edac/mcX for each memory controller and add nodes to control how fault injection is preformed. After configuring an injection using inject_channel, inject_deviceptr1, inject_deviceptr2, inject_eccmask1, inject_eccmask2 and inject_hlinesel trigger the injection by writing anything to inject_enable. Example of a CE injection: echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel echo 61440 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1 echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable Example of UE injection: echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_channel echo 2 > /sys/kernel/debug/i5100_edac/mc0/inject_hlinesel echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask1 echo 65535 > /sys/kernel/debug/i5100_edac/mc0/inject_eccmask2 echo 17 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr1 echo 0 > /sys/kernel/debug/i5100_edac/mc0/inject_deviceptr2 echo 1 > /sys/kernel/debug/i5100_edac/mc0/inject_enable Sometimes it is needed to enable the injection more then once (echo to the inject_enable node) for the injection to happen, I am not sure why. Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:34 -03:00
Niklas Söderlund	53ceafd6a2	i5100_edac: add fault injection code Add fault injection based on information datasheet for i5100, see 1. In addition to the i5100 datasheet some missing information on injection functions where found through experimentation and the i7300 datasheet, see 2. [1] Intel 5100 Memory Controller Hub Chipset Doc.Nr: 318378 http://www.intel.com/content/dam/doc/datasheet/5100- memory-controller-hub-chipset-datasheet.pdf [2] Intel 7300 Chipset MemoryController Hub (MCH) Doc.Nr: 318082 http://www.intel.com/assets/pdf/datasheet/318082.pdf Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:33 -03:00
Niklas Söderlund	52608ba205	i5100_edac: probe for device 19 function 0 Probe and store the device handle for the device 19 function 0 during driver initialization. The device is used during fault injection. Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:32 -03:00
Mauro Carvalho Chehab	e7100478fa	edac: only create sdram_scrub_rate where supported Currently, sdram_scrub_rate sysfs node is created even if the device doesn't support get/set the scub rate. Change the logic to only create this device node when the operation is supported. Reported-by: Felipe Balbi <balbi@ti.com> Acked-by: Borislav Petkov <bp@suse.de> Reviewed-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:31 -03:00
Mauro Carvalho Chehab	61734e1835	i3200_edac: Fix the logic that detects filled memories After running a series of tests on an HP DL320, filled with different memory sizes, it was noticed that, when filled with just one DIMM on such hardware, the driver wrongly detects twice the memory, and thinks that both channels 0 and 1 are filled. It seems to be partially caused by the BIOS and partially by the driver. The i3200_edac current logic would be working fine if the BIOS were disabling the unused second channel when just one DIMM is connected, in order to do power-saving, as recommended on this chipset's datasheet. However, the BIOS on this particular machine doesn't do it: [ 16.741421] EDAC DEBUG: how_many_channels: In dual channel mode [ 16.741424] EDAC DEBUG: how_many_channels: 2 DIMMS per channel enabled So, the driver were assuming that 2 channels are enabled (well, they are, but the second is unused). Combined with that, I found two issues at the logic that creates the EDAC data, that were failing when the two channels are not equally filled (AFAICT, that happens only when just 1 DIMM is plugged). The first one is that a 0 at DRB means that nothing is filled. The driver's logic, however, do some calculation with that. The second one is that the logic that fills the DIMM data currently assumes that both channels are equally filled. I tested the system already with the current configuration and my patch and it is now working fine. So, for a 2R single DIMM 2Gb memory at dimm slot 01 (channel 0), it is now displaying: [ 16.741406] EDAC DEBUG: i3200_get_drbs: drb[0][0] = 16, drb[1][0] = 0 [ 16.741410] EDAC DEBUG: i3200_get_drbs: drb[0][1] = 32, drb[1][1] = 0 [ 16.741413] EDAC DEBUG: i3200_get_drbs: drb[0][2] = 32, drb[1][2] = 0 [ 16.741416] EDAC DEBUG: i3200_get_drbs: drb[0][3] = 32, drb[1][3] = 0 ... [ 16.741896] EDAC DEBUG: i3200_probe1: csrow 0, channel 0, size = 1024 Mb [ 16.741899] EDAC DEBUG: i3200_probe1: csrow 1, channel 0, size = 1024 Mb and the corresponding sysfs nodes are now properly filled. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:31 -03:00
Mauro Carvalho Chehab	5f466cb025	i3200_edac: Add more debug to the driver Currently, it is not possible to know, when debug is enabled, if the driver is using 2 DIMMS per channel mode or not. It is not possible to know the values of the drbs registers, used to identify the memory rank sizes. Add debug for both, as it helps to track issues on the driver. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2013-02-21 11:06:22 -03:00
Linus Torvalds	55529fa576	EDAC updates for 3.9 Only one: AMD F16h MCE decoding enablement from Jacob Shin. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJRI2t7AAoJEBLB8Bhh3lVKmLAP/RbYNVVvLZZxXqB6heCkX7D3 Avhi4GtuHqjY+79nqBoJEH9HjgzBzL6ffsJGPF3gtXrWybi6nzZkqog5QGRQFfGS gIoldbfP9MMF/RrMHbUF5x9Ha9EcdudDgYE3eqCq6KUfOMlUrBRECaU5YRrB+FKR qsuQNmF6J3uMPYwO9oGhTpA/vPwwapFaKgm+KND8q5h5JcpJrRpooKNQc1DheW+x nfYFpmSBW8eamVwV5DTjqKVhKeG3gB/3i6uSTpLvh4i0ZmV2vQ+drwC3aU0Eh0Q4 wnslEpQENjODsvbZH6b0j2WTDgBGKEpALRkMUDTnnqvYrR96cV02MWUfp6cbKYCv +d/iPt3hEBCyDTDzfP66N+1b+Wqls4Dx8lhhqLdPhsIpY2Qu8OQ8/mjyIIs12qK5 g9w3lsfy3shWmdsK/Ehd0IhNt1bzNV+63CINA2isC2QAp0Eqecj3jKrXor+MTPu1 uRY3wM+8CrdeFNNhhhsMQYIb4+DFGLgMwY5mV6z8ceFJAjxvyUxjObnSMopjBKog 9+JcyEHCADbpHsRup/zRIoGtSs+44sQ6l8l7pq6d4K4UfkG7Biq9lcxThID7rImn Rl6MOJVmJuM5n9JVIU4/bmhjfuTcI+gdshexDEH2HnqaXxd7ceIbnrFWFaBZEO5t t3WasBuIb30g1XCQmNT6 =LbcU -----END PGP SIGNATURE----- Merge tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "Mostly AMD's side of EDAC. It is basically a new family enablement stuff: AMD F16h MCE decoding enablement from Jacob Shin. The rest is trivial cleanups." * tag 'edac_3.9' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: mpc85xx_edac: Fix typo EDAC, MCE, AMD: Remove unneeded exports EDAC, MCE, AMD: Add MCE decoding support for Family 16h EDAC, MCE, AMD: Make MC2 decoding per-family amd64_edac: Remove dead code	2013-02-20 11:28:50 -08:00
Mauro Carvalho Chehab	1339730e73	Linux 3.8-rc7 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.13 (GNU/Linux) iQEcBAABAgAGBQJRFWw4AAoJEHm+PkMAQRiGnTAH/jBHA2umNc3lc7VkUpusve4q GGIlNzYh6iuvIGwKQVj9YPsl37qtQnkDUmY8f8WxZjfxiIDw3TkRKDgNLJaM3Jy5 E426/FGlRx/Iia5+4tuBeoVYMoIPnndgW5lEAMRK1SvhTByhIYAXsaM0UwPBetb+ Z5NMdH1f1HVF7RCCmHAkzEk9z78UpdeCzI0t0XuasP2hp2ARAcE1okdO7fNaLiyo EmenGhRvy9bAsbRwV0rCdl0rQiZXEYM353DWS7n6q4fyVm8MXFwloUxnWCJTzOIL ZLJaz18adFj7Ip/X6ksnMQiQU2Q3B7aThs5ycv0QGxxL2rDFveYRRQ5ICrXOy3M= =jjBc -----END PGP SIGNATURE----- Merge tag 'v3.8-rc7' into next Linux 3.8-rc7 * tag 'v3.8-rc7': (12052 commits) Linux 3.8-rc7 net: sctp: sctp_endpoint_free: zero out secret key data net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree atm/iphase: rename fregt_t -> ffreg_t ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned ARM: DMA mapping: fix bad atomic test ARM: realview: ensure that we have sufficient IRQs available ARM: GIC: fix GIC cpumask initialization net: usb: fix regression from FLAG_NOARP code l2tp: dont play with skb->truesize net: sctp: sctp_auth_key_put: use kzfree instead of kfree netback: correct netbk_tx_err to handle wrap around. xen/netback: free already allocated memory on failure in xen_netbk_get_requests xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop. xen/netback: shutdown the ring if it contains garbage. drm/ttm: fix fence locking in ttm_buffer_object_transfer, 2nd try virtio_console: Don't access uninitialized data. net: qmi_wwan: add more Huawei devices, including E320 net: cdc_ncm: add another Huawei vendor specific device ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit() ...	2013-02-20 15:45:52 -03:00
Linus Torvalds	f98982ce80	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform changes from Ingo Molnar: - Support for the Technologic Systems TS-5500 platform, by Vivien Didelot - Improved NUMA support on AMD systems: Add support for federated systems where multiple memory controllers can exist and see each other over multiple PCI domains. This basically means that AMD node ids can be more than 8 now and the code handling this is taught to incorporate PCI domain into those IDs. - Support for the Goldfish virtual Android emulator, by Jun Nakajima, Intel, Google, et al. - Misc fixlets. * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add TS-5500 platform support x86/srat: Simplify memory affinity init error handling x86/apb/timer: Remove unnecessary "if" goldfish: platform device for x86 amd64_edac: Fix type usage in NB IDs and memory ranges amd64_edac: Fix PCI function lookup x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id x86, AMD, NB: Add multi-domain support	2013-02-19 20:11:07 -08:00
Baruch Siach	e7d2c215e5	mpc85xx_edac: Fix typo Correct typos. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Cc: Dave Jiang <djiang@mvista.com> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-02-10 13:28:41 +01:00
Joe Perches	d3d09e1820	EDAC: Fix kcalloc argument order First number, then size. Signed-off-by: Joe Perches <joe@perches.com> Cc: <stable@vger.kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>	2013-01-30 11:38:25 +01:00
Dan Carpenter	8024c4c0b1	EDAC: Test correct variable in ->store function We're testing for ->show but calling ->store(). Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Borislav Petkov <bp@suse.de>	2013-01-30 11:38:11 +01:00
Borislav Petkov	0f08669e86	EDAC, MCE, AMD: Remove unneeded exports Initially, those strings describing different parts of an MCE message were shared with amd64_edac and were therefore exported to modules. However, all except pp_msgs are used only in one place right now so hide them and make them static. No functionality change. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-22 22:40:03 +01:00
Jacob Shin	980eec8b20	EDAC, MCE, AMD: Add MCE decoding support for Family 16h Add MCE decoding logic for AMD Family 16h processors. Boris: - drop unneeded uu_msgs export - exit early in cat_mc1_mce and save us an indentation level Signed-off-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-22 22:39:58 +01:00
Jacob Shin	4a73d3de63	EDAC, MCE, AMD: Make MC2 decoding per-family Currently only AMD Family 15h processors have special handling for MC2 errors. Since upcoming Family 16h will also need unique handling, let's make MC2 handling part of amd_decoder_ops. Signed-off-by: Jacob Shin <jacob.shin@amd.com> Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-22 22:39:54 +01:00
Borislav Petkov	acc7fcb400	amd64_edac: Remove dead code `5e2af0c09e` ("edac: Don't initialize csrow's first_page & friends when not needed") removed useless initialization of variables but left in the functions which did that. They're unused now so drop them. Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-22 22:39:41 +01:00
Kees Cook	053417a53e	drivers/edac: remove depends on CONFIG_EXPERIMENTAL The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-17 12:11:26 -08:00
Daniel J Blueman	c7e5301a1b	amd64_edac: Fix type usage in NB IDs and memory ranges Use appropriate types for northbridge IDs and memory ranges. Mark immutable data const and keep within compilation unit on related structures. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1354265060-22956-2-git-send-email-daniel@numascale-asia.com [Boris: Drop arg change to node_to_amd_nb] Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-10 16:18:00 +01:00
Daniel J Blueman	e2c0bffea2	amd64_edac: Fix PCI function lookup Fix locating sibling memory controller PCI functions by using the correct PCI domain and use a northbridge descriptor only if found. We need to at least warn if it wasn't found so that it gets fixed and we don't go off with wrong results. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1354265060-22956-1-git-send-email-daniel@numascale-asia.com [Boris: remove wrong comment, sanitize code and warn if NB desc lookup fails] Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-10 16:17:59 +01:00
Daniel J Blueman	8b84c8df38	x86, AMD, NB: Use u16 for northbridge IDs in amd_get_nb_id Change amd_get_nb_id to return u16 to support >255 memory controllers, and related consistency fixes. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1353997932-8475-2-git-send-email-daniel@numascale-asia.com Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-10 16:17:58 +01:00
Daniel J Blueman	772c3ff385	x86, AMD, NB: Add multi-domain support Fix get_node_id to match northbridge IDs from the array of detected ones, allowing multi-server support such as with Numascale's NumaConnect, renaming to 'amd_get_node_id' for consistency. Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com> Link: http://lkml.kernel.org/r/1353997932-8475-1-git-send-email-daniel@numascale-asia.com [Boris: shorten lines to fit 80 cols] Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-10 16:17:58 +01:00
Linus Torvalds	57a0c1e2d6	Two error path fixes causing a crash and a Kconfig fix for an issue which spilled all EDAC suboptions into the 'Device Drivers' menu. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQ7D0jAAoJEBLB8Bhh3lVKtcgP/3uzBjETQ7KzKG9R+zHxkBom 7pl6UYqgquqpoqP3tGsn0+oaVyEvY3D11NMp/vhGxUjqJjDnIE3T/TvBMIwaRaq1 /bPrZ+hE5d2mHWP/TULEVVYMziHhR9UO1SpB+E7vX7FqKcLwzeiS7CMlm4qjq0qC duXCG/6gyEKIecT3KPuXFzpojV6pSpSIYvLu03hdS3e8w7Wb6jvomWV282TpaSim aFyL909M/f4kexWD0827VnBxogwBWPl1hIGfFVprYGrabgUTtfb/OFhnZut8U1FV giDtDgIC7nATp1LzTK+fPE5kx9yZvrE1SV8ZTNF6r0dYH14fnR3YgbQqbVgbLUYF icQqCHpyZKD+s/ajymq8lg+ltVtROqCLWdT6YFZlaBlObP0sYltpSj1JRcoC5O6c Lx252LQQmxMQZKGnNlBooI6QN73GXlsngTgHD9z3DpFgvRilhA8EmbK2ZFFesdkk R4s25yH8fno6ClhxlwyA3+0vwU9B19Ul00cx4/5ZXPdK56Fq7slM/ORTE5SQFrHH 6QG/pYDz09WXhKvOtX4ZRh2bBudMsY7mgsrxFelGYtsqHikLtYVUzkkyFkccjZMj UWcHUHsdSx+gr4DK/jclW1tDgwzh2G17d3FJobujGSgJJIqM2qOo/izON/g4+ojA RlKqYTCkAh3fX6DZX1vX =Qh4Y -----END PGP SIGNATURE----- Merge tag 'edac_fixes_for_3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC fixes from Borislav Petkov: "Two error path fixes causing a crash and a Kconfig fix for an issue which spilled all EDAC suboptions into the 'Device Drivers' menu." * tag 'edac_fixes_for_3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC: Cleanup device deregistering path EDAC: Fix EDAC Kconfig menu EDAC: Fix kernel panic on module unloading	2013-01-09 08:43:56 -08:00
Lans Zhang	44d22e2404	EDAC: Cleanup device deregistering path Use device_unregister to replace put_device + device_del for cleanup, and fix the potential use after free. Signed-off-by: Lans Zhang <jia.zhang@windriver.com> Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-07 17:43:00 +01:00
Borislav Petkov	5445166384	EDAC: Fix EDAC Kconfig menu After f65aad41772f("MIPS: Cavium: Add EDAC support."), when entering the "Device Drivers" toplevel menu in menuconfig, the suboptions behind EDAC appeared merged with the rest of the device drivers types. This was because the menuconfig option EDAC is querying an EDAC_SUPPORT Kconfig bool which was defined after the menu definition. When pushing EDAC_SUPPORT up, before the menu definition, the variable is defined earlier and the above menuconfig artifact doesn't happen. Drop a useless menuconfig comment while at it. Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-07 17:42:59 +01:00
Konstantin Khlebnikov	311bd84247	EDAC: Fix kernel panic on module unloading This patch fixes use-after-free and double-free bugs in edac_mc_sysfs_exit(). mci_pdev has single reference and put_device() calls mc_attr_release() which calls kfree(). The following device_del() works with already released memory. An another kfree() in edac_mc_sysfs_exit() releses the same memory again. Great. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: stable@vger.kernel.org # 3.[67] Cc: Denis Kirjanov <kirjanov@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Link: http://lkml.kernel.org/r/20121214110310.11019.21098.stgit@zurg Signed-off-by: Borislav Petkov <bp@alien8.de>	2013-01-07 17:42:58 +01:00
Greg Kroah-Hartman	9b3c6e85c2	Drivers: edac: remove __dev* attributes. CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Doug Thompson <dougthompson@xmission.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Daney <david.daney@cavium.com> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Olof Johansson <olof@lixom.net> Cc: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-01-03 15:57:03 -08:00
Lans Zhang	1c069100c1	i7core_edac: fix kernel crash on unloading i7core_edac. It is easy to trigger this crash on 3.7.0: root@intel_westmere_ep-3:~# modprobe -r i7core_edac EDAC PCI: Removed device 0 for i7core_edac EDAC PCI controller: DEV 0000:fe:03.0 EDAC MC: Removed device 1 for i7core_edac.c i7 core #1: DEV 0000:fe:03.0 EDAC PCI: Removed device 1 for i7core_edac EDAC PCI controller: DEV 0000:ff:03.0 EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:ff:03.0 BUG: unable to handle kernel NULL pointer dereference at 0000000000000110 IP: [<ffffffff82069ee9>] __blocking_notifier_call_chain+0x29/0x80 PGD 1eaae7067 PUD 1e96e4067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: minix acpi_cpufreq freq_table mperf ioatdma processor edac_core(-) iTCO_wdt coretemp evdev hwmon lpc_ich dca mfd_core crc32c_intel ioapic [last unloaded: i7core_edac] CPU 3 Pid: 1268, comm: modprobe Not tainted 3.7.0-WR5.0.1.0_standard+ #30 Intel Corporation S5520HC/S5520HC RIP: 0010:[<ffffffff82069ee9>] [<ffffffff82069ee9>] __blocking_notifier_call_chain+0x29/0x80 RSP: 0018:ffff8801eb12de28 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000000000f0 RCX: 00000000ffffffff RDX: ffff88012b452800 RSI: 0000000000000002 RDI: 00000000000000f0 RBP: ffff8801eb12de68 R08: 0000000000000000 R09: ffffea0004ad1118 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8801eb12dee8 R14: ffff88012b452800 R15: 000000000060e518 FS: 00007f9ea95a9700(0000) GS:ffff8801efc20000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000110 CR3: 00000001262f1000 CR4: 00000000000007e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process modprobe (pid: 1268, threadinfo ffff8801eb12c000, task ffff8801e8421690) Stack: ffff88012c802a00 ffff88012b445ec0 ffff88012c802300 ffff88012b452800 0000000000000000 ffff8801eb12dee8 000000000060e080 000000000060e518 ffff8801eb12de78 ffffffff82069f56 ffff8801eb12dea8 ffffffff824ead7c Call Trace: [<ffffffff82069f56>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff824ead7c>] device_del+0x3c/0x1d0 [<ffffffffa00095a8>] edac_mc_sysfs_exit+0x1c/0x2f [edac_core] [<ffffffffa000961c>] edac_exit+0x4f/0x56 [edac_core] [<ffffffff820a3d2a>] sys_delete_module+0x17a/0x240 [<ffffffff8212da7c>] ? vm_munmap+0x5c/0x80 [<ffffffff82877682>] system_call_fastpath+0x16/0x1b Code: 90 90 55 48 89 e5 48 83 ec 40 48 89 5d d8 4c 89 65 e0 4c 89 6d e8 4c 89 75 f0 4c 89 7d f8 66 66 66 66 90 31 c0 49 89 d6 48 89 fb <48> 8b 57 20 49 89 f5 41 89 cf 4c 8d 67 20 48 85 d2 74 2c 4c 89 RIP [<ffffffff82069ee9>] __blocking_notifier_call_chain+0x29/0x80 RSP <ffff8801eb12de28> CR2: 0000000000000110 ---[ end trace b69acf12ccad1c0d ]--- Usually, edac_subsys is grabbed one time by pci at initialization. But edac_subsys may be released several times if multiple pci MCs exist. The fix just makes the operations balanced. Signed-off-by: Lans Zhang <jia.zhang@windriver.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-12-21 08:01:23 -02:00
Niklas Söderlund	c31d34fe92	i7core_edac: fix erroneous size of static array Remove size from lookup arrays and mark them as const. Reviewed-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Niklas Söderlund <niso@kth.se> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-12-21 08:01:20 -02:00
Mauro Carvalho Chehab	da14d93d95	sb_edac: add a missing /n on a debug message [ 17.024963] EDAC DEBUG: get_memory_layout: TOHM: 132.160 GB (0x0000002043ffffff)<7>[ 17.024971] EDAC DEBUG: get_memory_layout: SAD#0 DRAM up to 33.792 GB (0x0000000840000000) Interleave: 8:6 reg=0x000083c3 Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-12-21 08:01:18 -02:00
Shaun Ruffell	80f5ab097b	edac: edac_mc no longer deals with kobjects directly There are no more embedded kobjects in struct mem_ctl_info. Remove a header and a comment that does not reflect the code anymore. Signed-off-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-12-21 08:00:02 -02:00
Linus Torvalds	cebfa85eb8	Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus Pull MIPS updates from Ralf Baechle: "The MIPS bits for 3.8. This also includes a bunch fixes that were sitting in the linux-mips.org git tree for a long time. This pull request contains updates to several OCTEON drivers and the board support code for BCM47XX, BCM63XX, XLP, XLR, XLS, lantiq, Loongson1B, updates to the SSB bus support, MIPS kexec code and adds support for kdump. When pulling this, there are two expected merge conflicts in include/linux/bcma/bcma_driver_chipcommon.h which are trivial to resolve, just remove the conflict markers and keep both alternatives." * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (90 commits) MIPS: PMC-Sierra Yosemite: Remove support. VIDEO: Newport Fix console crashes MIPS: wrppmc: Fix build of PCI code. MIPS: IP22/IP28: Fix build of EISA code. MIPS: RB532: Fix build of prom code. MIPS: PowerTV: Fix build. MIPS: IP27: Correct fucked grammar in ops-bridge.c MIPS: Highmem: Fix build error if CONFIG_DEBUG_HIGHMEM is disabled MIPS: Fix potencial corruption MIPS: Fix for warning from FPU emulation code MIPS: Handle COP3 Unusable exception as COP1X for FP emulation MIPS: Fix poweroff failure when HOTPLUG_CPU configured. MIPS: MT: Fix build with CONFIG_UIDGID_STRICT_TYPE_CHECKS=y MIPS: Remove unused smvp.h MIPS/EDAC: Improve OCTEON EDAC support. MIPS: OCTEON: Add definitions for OCTEON memory contoller registers. MIPS: OCTEON: Add OCTEON family definitions to octeon-model.h ata: pata_octeon_cf: Use correct byte order for DMA in when built little-endian. MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree. MIPS: Remove usage of CEVT_R4K_LIB config option. ...	2012-12-14 14:27:45 -08:00
David Daney	e1ced09797	MIPS/EDAC: Improve OCTEON EDAC support. Some initialization errors are reported with the existing OCTEON EDAC support patch. Also some parts have more than one memory controller. Fix the errors and add multiple controllers if present. Signed-off-by: David Daney <david.daney@cavium.com>	2012-12-13 18:15:26 +01:00
Ralf Baechle	f65aad4177	MIPS: Cavium: Add EDAC support. Drivers for EDAC on Cavium. Supported subsystems are: o CPU primary caches. These are parity protected only, so only error reporting. o Second level cache - ECC protected, provides SECDED. o Memory: ECC / SECDEC if used with suitable DRAM modules. The driver will will only initialize if ECC is enabled on a system so is safe to run on non-ECC memory. o PCI: Parity error reporting Since it is very hard to test this sort of code the implementation is very conservative and uses polling where possible for now. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>	2012-12-12 16:48:49 +01:00
Linus Torvalds	9ada9fd5df	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC fixes from Borislav Petkov: - EDAC core error path fix, from Denis Kirjanov. - Generalization of AMD MCE bank names and some minor error reporting improvements. - EDAC core cleanups and simplifications, from Wei Yongjun. - amd64_edac fixes for sysfs-reported values, from Josh Hunt. - some heavy amd64_edac error reporting path shaving, leading to removing a bunch of code. - amd64_edac error injection method improvements. - EDAC core cleanups and fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (24 commits) EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code EDAC: Handle error path in edac_mc_sysfs_init() properly MCE, AMD: Dump error status MCE, AMD: Report decoded error type first MCE, AMD: Dump CPU f/m/s triple with the error MCE, AMD: Remove functional unit references EDAC: Convert to use simple_open() EDAC, Calxeda highbank: Convert to use simple_open() EDAC: Fix mc size reported in sysfs EDAC: Fix csrow size reported in sysfs EDAC: Pass mci parent EDAC: Add memory controller flags amd64_edac: Fix csrows size and pages computation amd64_edac: Use DBAM_DIMM macro amd64_edac: Fix K8 chip select reporting amd64_edac: Reorganize error reporting path amd64_edac: Do not check whether error address is valid amd64_edac: Improve error injection amd64_edac: Cleanup error injection code amd64_edac: Small fixlets and cleanups ...	2012-12-11 11:28:43 -08:00
Wei Yongjun	3bfe5aae8e	EDAC, pci_sysfs: Use for_each_pci_dev to simplify the code Use for_each_pci_dev to simplify the code. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> [Boris: cleanup comments and drop loop brackets] Signed-off-by: Borislav Petkov <bp@alien8.de>	2012-12-04 08:27:39 +01:00
Linus Torvalds	b52c6402b5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC fixes from Mauro Carvalho Chehab: "One EDAC core fix, and a few driver fixes (i7300, i9275x, i7core)." * git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: i7core_edac: fix panic when accessing sysfs files i7300_edac: Fix error flag testing edac: Fix the dimm filling for csrows-based layouts i82975x_edac: Fix dimm label initialization	2012-12-03 11:16:37 -08:00
Denis Kirjanov	2d56b109e3	EDAC: Handle error path in edac_mc_sysfs_init() properly Make sure proper deregistration happens on all error paths in edac_mc_sysfs_init. Signed-off-by: Denis Kirjanov <kirjanov@gmail.com> [ Boris: cleanup and concretize commit message ] Signed-off-by: Borislav Petkov <bp@alien8.de>	2012-11-28 11:56:52 +01:00
Borislav Petkov	d5c6770d4c	MCE, AMD: Dump error status Dump error status after decoding the error which describes the error disposition. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:56:30 +01:00
Borislav Petkov	d824c7718b	MCE, AMD: Report decoded error type first Instead of starting with the error details, report the decoded, readable error type first. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:56:17 +01:00
Borislav Petkov	f89f8388cd	MCE, AMD: Dump CPU f/m/s triple with the error It is very useful to have the family/model/stepping with the reported error so dump it. This saves us asking the bug reporter about it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:55:57 +01:00
Borislav Petkov	f05c41a9c6	MCE, AMD: Remove functional unit references Having the functional unit names in each bank decode is only misleading as this code supports multiple families and there's no guarantee the mapping between FUs and MCE banks will stay the same. And also, knowing the functional unit name doesn't help much since you end up looking at the respective BKDG anyway. So drop all FU references and use the MC bank numbers instead. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:55:44 +01:00
Wei Yongjun	db7312a295	EDAC: Convert to use simple_open() This removes an open coded simple_open() function and replaces file operations references to the function with simple_open() instead. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:55:22 +01:00
Wei Yongjun	f35d852e80	EDAC, Calxeda highbank: Convert to use simple_open() This removes an open coded simple_open() function and replaces file operations references to the function with simple_open() instead. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Cc: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:55:07 +01:00
Josh Hunt	3c0622760a	EDAC: Fix mc size reported in sysfs This is the complement to previous commit "EDAC: Fix csrow size reported in sysfs". This fixes the memory controller size reporting on csrow-based memory controllers. The csrow size is already combined for both channels. Without this patch memory size is reported doubled. Signed-off-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:54:50 +01:00
Borislav Petkov	16a528ee39	EDAC: Fix csrow size reported in sysfs On csrow-based memory controllers, we combine the csrow size from both channels and there's no need to do that again in csrow_size_show which leads to double the size of a csrow. Fix it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:54:40 +01:00
Borislav Petkov	921a689965	EDAC: Pass mci parent Initialize the mem_ctl_info descriptor of a csrow properly. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:54:23 +01:00
Borislav Petkov	1165276917	EDAC: Add memory controller flags The first flag is ->csbased and will be used in common EDAC code later. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:48:04 +01:00
Borislav Petkov	10de6497a5	amd64_edac: Fix csrows size and pages computation Make sure code pays attention to K8 having only one DCT, reformat and cleanup code, correct debug messages, remove unused code. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:47:36 +01:00
Borislav Petkov	0a5dfc3140	amd64_edac: Use DBAM_DIMM macro Instead of open-coding it, use the DBAM_DIMM macro in amd64_csrow_nr_pages() which we have already. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:46:19 +01:00
Borislav Petkov	bb89f5a054	amd64_edac: Fix K8 chip select reporting This basically reverts `603adaf6b3` ("amd64_edac: fix K8 chip select reporting") because it was a clumsy workaround for DIMM sizes reporting on K8 which got superceded by a much more correct one with `41d8bfaba7` ("amd64_edac: Improve DRAM address mapping") without removing the prior one. Remove it now finally. Reported-by: Josh Hunt <johunt@akamai.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:45:46 +01:00
Borislav Petkov	33ca0643c9	amd64_edac: Reorganize error reporting path Rewrite CE/UE paths so that they use the same code and drop additional code duplication in handle_ue. Add a struct err_info which collects required info for the error reporting. This, in turn, helps slimming all edac_mc_handle_error() calls down to one. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:45:34 +01:00
Borislav Petkov	c8d1adf092	amd64_edac: Do not check whether error address is valid All families report a valid error address when encountering a DRAM ECC error so no need to check it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:45:11 +01:00
Borislav Petkov	66fed2d464	amd64_edac: Improve error injection When injecting DRAM ECC errors over the F3xB[8,C] interface, the machine does this by injecting the error in the next non-cached access. This takes relatively long time on a normal system so that in order for us to expedite it, we disable the caches around the injection. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:45:01 +01:00
Borislav Petkov	6e71a870b8	amd64_edac: Cleanup error injection code Invert kstrtoul return value testing and win one indentation level. Also, shorten up macro names so that the lines can fit into 80 cols. No functional change. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:44:35 +01:00
Borislav Petkov	1f31677e0d	amd64_edac: Small fixlets and cleanups amd64_get_dram_hole_info: remove local variable 'base'. sys_addr_to_dram_addr: do not clear local variable 'ret'. Also, sanitize constants formatting. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:44:12 +01:00
Borislav Petkov	f430d5707a	EDAC: Handle empty msg strings when reporting errors A reported error could look like this [ 226.178315] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x427c0d offset:0xde0 grain:0 syndrome:0x1c6) with two spaces back-to-back due to the msg argument of edac_mc_handle_error being passed on empty by the specific drivers. Handle that. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:24:12 +01:00
Borislav Petkov	4da1b7bfe7	EDAC: Remove useless assignment of error type The tracepoint decodes the error type later anyway so remove a useless assignment to the temporary p which gets overwritten later anyway. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:23:50 +01:00
Borislav Petkov	37929874d4	EDAC: Boundary-check edac_debug_level Only levels [0:4] are allowed so enforce that. Also, while at it, massage Kconfig text and add valid debug levels range to the module parameter description. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:23:32 +01:00
Borislav Petkov	876bb331e2	EDAC: Respect operational state in edac_pci.c Currently, we unconditionally enable PCI polling and we don't look at the edac_op_state module parameter. Make this dependent on the parameter setting supplied on the command line. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-11-28 11:22:47 +01:00
Prarit Bhargava	42709efb3a	i7core_edac: fix panic when accessing sysfs files The i7core_edac addrmatch_dev and chancounts_dev have sysfs files associated with them. The sysfs files, however, are coded so that the parent device is is the mci device. This is incorrect and the mci struct should be obtained through the addrmatch_dev and chancounts_dev device's private data field which is populated in i7core_create_sysfs_devices(). Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-11-28 06:56:00 -02:00
Borislav Petkov	43aff26ce1	EDAC: Change Boris' email address My @amd.com address will be invalid soon so move to private email address. Signed-off-by: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/1351532410-4887-2-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-10-30 10:05:51 +01:00
Jean Delvare	7e06b7a333	i7300_edac: Fix error flag testing * Right-shift the values in GET_FBD_FAT_IDX and GET_FBD_NF_IDX, so that the callers get the result they expect. * Fix definition of FERR_FAT_FBD_ERR_MASK. * Call GET_FBD_NF_IDX, not GET_FBD_FAT_IDX, when operating on register FERR_NF_FBD. We were lucky they have the same definition. This fixes kernel bug #44131: https://bugzilla.kernel.org/show_bug.cgi?id=44131 Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Doug Thompson <dougthompson@xmission.com> Cc: stable@vger.kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-25 07:43:00 -02:00
Mauro Carvalho Chehab	24bef66e74	edac: Fix the dimm filling for csrows-based layouts The driver is currently filling data in a wrong way, on drivers for csrows-based memory controller, when the first layer is a csrow. This is not easily to notice, as, in general, memories are filed in dual, interleaved, symetric mode, as very few memory controllers support asymetric modes. While digging into a bug for i82795_edac driver, the asymetric mode there is now working, allowing us to fill the machine with 4x1GB ranks at channel 0, and 2x512GB at channel 1: Channel 0 ranks: EDAC DEBUG: i82975x_init_csrows: DIMM A0: from page 0x00000000 to 0x0003ffff (size: 0x00040000 pages) EDAC DEBUG: i82975x_init_csrows: DIMM A1: from page 0x00040000 to 0x0007ffff (size: 0x00040000 pages) EDAC DEBUG: i82975x_init_csrows: DIMM A2: from page 0x00080000 to 0x000bffff (size: 0x00040000 pages) EDAC DEBUG: i82975x_init_csrows: DIMM A3: from page 0x000c0000 to 0x000fffff (size: 0x00040000 pages) Channel 1 ranks: EDAC DEBUG: i82975x_init_csrows: DIMM B0: from page 0x00100000 to 0x0011ffff (size: 0x00020000 pages) EDAC DEBUG: i82975x_init_csrows: DIMM B1: from page 0x00120000 to 0x0013ffff (size: 0x00020000 pages) Instead of properly showing the memories as such, before this patch, it shows the memory layout as: +-----------------------------------+ \| mc0 \| \| csrow0 \| csrow1 \| csrow2 \| ----------+-----------------------------------+ channel1: \| 1024 MB \| 1024 MB \| 512 MB \| channel0: \| 1024 MB \| 1024 MB \| 512 MB \| ----------+-----------------------------------+ as if both channels were symetric, grouping the DIMMs on a wrong layout. After this patch, the memory is correctly represented. So, for csrows at layers[0], it shows: +-----------------------------------------------+ \| mc0 \| \| csrow0 \| csrow1 \| csrow2 \| csrow3 \| ----------+-----------------------------------------------+ channel1: \| 512 MB \| 512 MB \| 0 MB \| 0 MB \| channel0: \| 1024 MB \| 1024 MB \| 1024 MB \| 1024 MB \| ----------+-----------------------------------------------+ For csrows at layers[1], it shows: +-----------------------+ \| mc0 \| \| channel0 \| channel1 \| --------+-----------------------+ csrow3: \| 1024 MB \| 0 MB \| csrow2: \| 1024 MB \| 0 MB \| --------+-----------------------+ csrow1: \| 1024 MB \| 512 MB \| csrow0: \| 1024 MB \| 512 MB \| --------+-----------------------+ So, no matter of what comes first, the information between channel and csrow will be properly represented. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-25 07:17:18 -02:00
Mauro Carvalho Chehab	4796968402	i82975x_edac: Fix dimm label initialization The driver has only 4 hardcoded labels, but allows much more memory. Fix it by removing the hardcoded logic, using snprintf() instead. [ 19.833972] general protection fault: 0000 [#1] SMP [ 19.837733] Modules linked in: i82975x_edac(+) edac_core firewire_ohci firewire_core crc_itu_t nouveau mxm_wmi wmi video i2c_algo_bit drm_kms_helper ttm drm i2c_core [ 19.837733] CPU 0 [ 19.837733] Pid: 390, comm: udevd Not tainted 3.6.1-1.fc17.x86_64.debug #1 Dell Inc. Precision WorkStation 390 /0MY510 [ 19.837733] RIP: 0010:[<ffffffff813463a8>] [<ffffffff813463a8>] strncpy+0x18/0x30 [ 19.837733] RSP: 0018:ffff880078535b68 EFLAGS: 00010202 [ 19.837733] RAX: ffff880069fa9708 RBX: ffff880078588000 RCX: ffff880069fa9708 [ 19.837733] RDX: 000000000000001f RSI: 5f706f5f63616465 RDI: ffff880069fa9708 [ 19.837733] RBP: ffff880078535b68 R08: ffff880069fa9727 R09: 000000000000fffe [ 19.837733] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003 [ 19.837733] R13: 0000000000000000 R14: ffff880069fa9290 R15: ffff880079624a80 [ 19.837733] FS: 00007f3de01ee840(0000) GS:ffff88007c400000(0000) knlGS:0000000000000000 [ 19.837733] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 19.837733] CR2: 00007f3de00b9000 CR3: 0000000078dbc000 CR4: 00000000000007f0 [ 19.837733] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 19.837733] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 19.837733] Process udevd (pid: 390, threadinfo ffff880078534000, task ffff880079642450) [ 19.837733] Stack: [ 19.837733] ffff880078535c18 ffffffffa017c6b8 00040000816d627f ffff880079624a88 [ 19.837733] ffffc90004cd6000 ffff880079624520 ffff88007ac21148 0000000000000000 [ 19.837733] 0000000000000000 0004000000000000 feda000078535bc8 ffffffff810d696d [ 19.837733] Call Trace: [ 19.837733] [<ffffffffa017c6b8>] i82975x_init_one+0x2e6/0x3e6 [i82975x_edac] ... Fix bug reported at: https://bugzilla.redhat.com/show_bug.cgi?id=848149 And, very likely: https://bbs.archlinux.org/viewtopic.php?id=148033 https://bugzilla.kernel.org/show_bug.cgi?id=47171 Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-10-25 07:17:01 -02:00
Andrew Morton	168bfeef7b	amd64_edac:__amd64_set_scrub_rate(): avoid overindexing scrubrates[] If none of the elements in scrubrates[] matches, this loop will cause __amd64_set_scrub_rate() to incorrectly use the n+1th element. As the function is designed to use the final scrubrates[] element in the case of no match, we can fix this bug by simply terminating the array search at the n-1th element. Boris: this code is fragile anyway, see here why: http://marc.info/?l=linux-kernel&m=135102834131236&w=2 It will be rewritten more robustly soonish. Reported-by: Denis Kirjanov <kirjanov@gmail.com> Cc: stable@vger.kernel.org Cc: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-10-24 16:13:27 +02:00
Linus Torvalds	5f3d2f2e1a	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Benjamin Herrenschmidt: "Some highlights in addition to the usual batch of fixes: - 64TB address space support for 64-bit processes by Aneesh Kumar - Gavin Shan did a major cleanup & re-organization of our EEH support code (IBM fancy PCI error handling & recovery infrastructure) which paves the way for supporting different platform backends, along with some rework of the PCIe code for the PowerNV platform in order to remove home made resource allocations and instead use the generic code (which is possible after some small improvements to it done by Gavin). - Uprobes support by Ananth N Mavinakayanahalli - A pile of embedded updates from Freescale folks, including new SoC and board supports, more KVM stuff including preparing for 64-bit BookE KVM support, ePAPR 1.1 updates, etc..." Fixup trivial conflicts in drivers/scsi/ipr.c * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits) powerpc/iommu: Fix multiple issues with IOMMU pools code powerpc: Fix VMX fix for memcpy case driver/mtd:IFC NAND:Initialise internal SRAM before any write powerpc/fsl-pci: use 'Header Type' to identify PCIE mode powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get powerpc: Remove tlb batching hack for nighthawk powerpc: Set paca->data_offset = 0 for boot cpu powerpc/perf: Sample only if SIAR-Valid bit is set in P7+ powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled powerpc/mpc85xx: Update interrupt handling for IFC controller powerpc/85xx: Enable USB support in p1023rds_defconfig powerpc/smp: Do not disable IPI interrupts during suspend powerpc/eeh: Fix crash on converting OF node to edev powerpc/eeh: Lock module while handling EEH event powerpc/kprobe: Don't emulate store when kprobe stwu r1 powerpc/kprobe: Complete kprobe and migrate exception frame powerpc/kprobe: Introduce a new thread flag powerpc: Remove unused __get_user64() and __put_user64() powerpc/eeh: Global mutex to protect PE tree powerpc/eeh: Remove EEH PE for normal PCI hotplug ...	2012-10-06 03:16:12 +09:00
Linus Torvalds	033d9959ed	Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue changes from Tejun Heo: "This is workqueue updates for v3.7-rc1. A lot of activities this round including considerable API and behavior cleanups. * delayed_work combines a timer and a work item. The handling of the timer part has always been a bit clunky leading to confusing cancelation API with weird corner-case behaviors. delayed_work is updated to use new IRQ safe timer and cancelation now works as expected. * Another deficiency of delayed_work was lack of the counterpart of mod_timer() which led to cancel+queue combinations or open-coded timer+work usages. mod_delayed_work[_on]() are added. These two delayed_work changes make delayed_work provide interface and behave like timer which is executed with process context. * A work item could be executed concurrently on multiple CPUs, which is rather unintuitive and made flush_work() behavior confusing and half-broken under certain circumstances. This problem doesn't exist for non-reentrant workqueues. While non-reentrancy check isn't free, the overhead is incurred only when a work item bounces across different CPUs and even in simulated pathological scenario the overhead isn't too high. All workqueues are made non-reentrant. This removes the distinction between flush_[delayed_]work() and flush_[delayed_]_work_sync(). The former is now as strong as the latter and the specified work item is guaranteed to have finished execution of any previous queueing on return. * In addition to the various bug fixes, Lai redid and simplified CPU hotplug handling significantly. * Joonsoo introduced system_highpri_wq and used it during CPU hotplug. There are two merge commits - one to pull in IRQ safe timer from tip/timers/core and the other to pull in CPU hotplug fixes from wq/for-3.6-fixes as Lai's hotplug restructuring depended on them." Fixed a number of trivial conflicts, but the more interesting conflicts were silent ones where the deprecated interfaces had been used by new code in the merge window, and thus didn't cause any real data conflicts. Tejun pointed out a few of them, I fixed a couple more. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits) workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending() workqueue: use cwq_set_max_active() helper for workqueue_set_max_active() workqueue: introduce cwq_set_max_active() helper for thaw_workqueues() workqueue: remove @delayed from cwq_dec_nr_in_flight() workqueue: fix possible stall on try_to_grab_pending() of a delayed work item workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() workqueue: use __cpuinit instead of __devinit for cpu callbacks workqueue: rename manager_mutex to assoc_mutex workqueue: WORKER_REBIND is no longer necessary for idle rebinding workqueue: WORKER_REBIND is no longer necessary for busy rebinding workqueue: reimplement idle worker rebinding workqueue: deprecate __cancel_delayed_work() workqueue: reimplement cancel_delayed_work() using try_to_grab_pending() workqueue: use mod_delayed_work() instead of __cancel + queue workqueue: use irqsafe timer for delayed_work workqueue: clean up delayed_work initializers and add missing one workqueue: make deferrable delayed_work initializer names consistent workqueue: cosmetic whitespace updates for macro definitions workqueue: deprecate system_nrt[_freezable]_wq workqueue: deprecate flush[_delayed]_work_sync() ...	2012-10-02 09:54:49 -07:00
Mauro Carvalho Chehab	deb09ddaff	sb_edac: Avoid overflow errors at memory size calculation Sandy bridge EDAC is calculating the memory size with overflow. Basically, the size field and the integer calculation is using 32 bits. More bits are needed, when the DIMM memories have high density. The net result is that memories are improperly reported there, when high-density DIMMs are used: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 0, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 591: mc#0: channel 1, dimm 0, -16384 Mb (-4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 As the number of pages value is handled at the EDAC core as unsigned ints, the driver shows the 16 GB memories at sysfs interface as 16760832 MB! The fix is simple: calculate the number of pages as unsigned 64-bits integer. After the patch, the memory size (16 GB) is properly detected: EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 0, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 EDAC DEBUG: in drivers/edac/sb_edac.c, line at 592: mc#0: channel 1, dimm 0, 16384 Mb (4194304 pages) bank: 8, rank: 2, row: 0x10000, col: 0x800 Cc: stable@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-09-25 07:38:20 -03:00
Mauro Carvalho Chehab	b70f833377	i5000: Fix the memory size calculation with 2R memories When 2R memories are found, the memory size should be multiplied by two, otherwise, it will report half of the memory size: +-----------------------------------------------+ \| mc0 \| \| branch0 \| branch1 \| \| channel0 \| channel1 \| channel0 \| channel1 \| -------+-----------------------------------------------+ slot3: \| 0 MB \| 0 MB \| 0 MB \| 0 MB \| slot2: \| 0 MB \| 0 MB \| 0 MB \| 0 MB \| -------+-----------------------------------------------+ slot1: \| 0 MB \| 0 MB \| 0 MB \| 0 MB \| slot0: \| 1024 MB \| 1024 MB \| 1024 MB \| 1024 MB \| -------+-----------------------------------------------+ (the above machine have 4 x 2GB 2R memories) Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-09-25 07:38:19 -03:00
Mauro Carvalho Chehab	582a899622	i3200_edac: Fix memory rank size commit `a895bf8b1e` incorrectly changed the logic that fills the memory bank size. Fix it. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-09-25 07:32:33 -03:00
Shaun Ruffell	faa2ad09c0	edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs. Fix potential NULL pointer dereference in edac_unregister_sysfs() on system boot introduced in 3.6-rc1. Since commit `7a623c039` ("edac: rewrite the sysfs code to use struct device") edac_mc_alloc() no longer initializes embedded kobjects in struct mem_ctl_info. Therefore edac_mc_free() can no longer simply decrement a kobject reference count to free the allocated memory unless the memory controller driver module had also called edac_mc_add_mc(). Now edac_mc_free() will check if the newly embedded struct device has been registered with sysfs before using either the standard device release functions or freeing the data structures itself with logic pulled out of the error path of edac_mc_alloc(). The BUG this patch resolves for me: BUG: unable to handle kernel NULL pointer dereference at (null) EIP is at __wake_up_common+0x1a/0x6a Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000) Call Trace: complete_all+0x3f/0x50 device_pm_remove+0x23/0xa2 device_del+0x34/0x142 edac_unregister_sysfs+0x3b/0x5c [edac_core] edac_mc_free+0x29/0x2f [edac_core] e7xxx_probe1+0x268/0x311 [e7xxx_edac] e7xxx_init_one+0x56/0x61 [e7xxx_edac] local_pci_probe+0x13/0x15 ... Cc: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-09-23 14:46:40 -07:00
Fengguang Wu	ef6e7816b4	edac_mc: fix messy kfree calls in the error path coccinelle warns about: + drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429 421 if (mci->csrows) { > 422 for (chn = 0; chn < tot_channels; chn++) { 423 csr = mci->csrows[chn]; 424 if (csr) { > 425 for (chn = 0; chn < tot_channels; chn++) 426 kfree(csr->channels[chn]); 427 kfree(csr); 428 } > 429 kfree(mci->csrows[i]); 430 } 431 kfree(mci->csrows); 432 } and that code block seem to mess things up in several ways (double free, memory leak, out-of-bound reads etc.): L422: The iterator "chn" and bound "tot_channels" are totally wrong. Should be "row" and "tot_csrows" respectively. Which means either memory leak, or out-of-bound reads (which if does not trigger an immediate page fault error, will further lead to kfree() on random addresses). L425: The inner loop is reusing the same iterator "chn" as the outer loop, which could lead to premature end of the outer loop, and hence memory leak. L429: The array index 'i' in mci->csrows[i] is a temporary value used in previous loops, and won't change at all in the current loop. Which means either out-of-bound read and possibly kfree(random number), or the same mci->csrows[i] get freed once and again, and possibly double free for the kfree(csr) in L427. L426/L427: a kfree(csr->channels) is needed in between to avoid leaking the memory. The buggy code was introduced by commit `de3910eb` ("edac: change the mem allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1 merge window. Fix it by freeing up resources in this order: free csrows[i]->channels[j] free csrows[i]->channels free csrows[i] free csrows CC: Mauro Carvalho Chehab <mchehab@redhat.com> CC: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-09-23 14:45:26 -07:00
Jia Hongtao	905e75c46d	powerpc/fsl-pci: Unify pci/pcie initialization code We unified the Freescale pci/pcie initialization by changing the fsl_pci to a platform driver. In previous PCI code architecture the initialization routine is called at board_setup_arch stage. Now the initialization is done in probe function which is architectural better. Also It's convenient for adding PM support for PCI controller in later patch. Now we registered pci controllers as platform devices. So we combine two initialization code as one platform driver. Signed-off-by: Jia Hongtao <B38951@freescale.com> Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Chunhe Lan <Chunhe.Lan@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>	2012-09-12 14:57:12 -05:00
Tejun Heo	41f63c5359	workqueue: use mod_delayed_work() instead of cancel + queue Convert delayed_work users doing cancel_delayed_work() followed by queue_delayed_work() to mod_delayed_work(). Most conversions are straight-forward. Ones worth mentioning are, * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always use mod_delayed_work() and cancel loop in edac_mc_reset_delay_period() is dropped. * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether watchdog is active or not. @fan_watchdog_active and related code dropped. * drivers/power/charger-manager.c: Seemingly a lot of delayed_work_pending() abuse going on here. [delayed_]work_pending() are unsynchronized and racy when used like this. I converted one instance in fullbatt_handler(). Please conver the rest so that it invokes workqueue APIs for the intended target state rather than trying to game work item pending state transitions. e.g. if timer should be modified - call mod_delayed_work(), canceled - call cancel_delayed_work[_sync](). * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling() simplified. Note that round_jiffies() calls in this function are meaningless. round_jiffies() work on absolute jiffies not delta delay used by delayed_work. v2: Tomi pointed out that __cancel_delayed_work() users can't be safely converted to mod_delayed_work(). They could be calling it from irq context and if that happens while delayed_work_timer_fn() is running, it could deadlock. __cancel_delayed_work() users are dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Doug Thompson <dougthompson@xmission.com> Cc: David Airlie <airlied@linux.ie> Cc: Roland Dreier <roland@kernel.org> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Len Brown <len.brown@intel.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Johannes Berg <johannes@sipsolutions.net>	2012-08-13 16:27:37 -07:00
Mauro Carvalho Chehab	c2078e4c91	Merge branch 'devel' * devel: (33 commits) edac i5000, i5400: fix pointer math in i5000_get_mc_regs() edac: allow specifying the error count with fake_inject edac: add support for Calxeda highbank L2 cache ecc edac: add support for Calxeda highbank memory controller edac: create top-level debugfs directory sb_edac: properly handle error count i7core_edac: properly handle error count edac: edac_mc_handle_error(): add an error_count parameter edac: remove arch-specific parameter for the error handler amd64_edac: Don't pass driver name as an error parameter edac_mc: check for allocation failure in edac_mc_alloc() edac: Increase version to 3.0.0 edac_mc: Cleanup per-dimm_info debug messages edac: Convert debugfX to edac_dbg(X, edac: Use more normal debugging macro style edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs Edac: Add ABI Documentation for the new device nodes edac: move documentation ABI to ABI/testing/sysfs-devices-edac i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy edac: change the mem allocation scheme to make Documentation/kobject.txt happy ...	2012-07-29 21:11:05 -03:00
Dan Carpenter	f58d0dee07	edac i5000, i5400: fix pointer math in i5000_get_mc_regs() "pvt->ambase" is a u64 datatype. The intent here is to fill the first half in the first call to pci_read_config_dword() and the other half in the second. Unfortunately the pointer math is wrong so we set the wrong data. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-27 09:08:40 -03:00
Mauro Carvalho Chehab	38ced28b21	edac: allow specifying the error count with fake_inject In order to test if the error counters are properly incremented, add a way to specify how many errors were generated by a trace. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-27 09:01:30 -03:00
Rob Herring	69154d0698	edac: add support for Calxeda highbank L2 cache ecc Add support for L2 ECC on Calxeda highbank platform. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-27 09:01:29 -03:00
Rob Herring	a1b01edb27	edac: add support for Calxeda highbank memory controller Add support for memory controller on Calxeda Highbank platforms. Highbank platforms support a single 4GB mini-DIMM with 1-bit correction and 2-bit detection. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-27 09:00:57 -03:00
Rob Herring	e7930ba49e	edac: create top-level debugfs directory Create a single, top-level "edac" directory for debugfs. An "mc[0-N]" directory is then created for each memory controller. Individual drivers can create additional entries such as h/w error injection control. Signed-off-by: Rob Herring <rob.herring@calxeda.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-12 12:15:49 -03:00
Mauro Carvalho Chehab	c10538396b	sb_edac: properly handle error count Instead of reporting the error count via driver-specific details, use the new way provided by edac_mc_handle_error. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-12 12:15:49 -03:00
Mauro Carvalho Chehab	00d1833927	i7core_edac: properly handle error count Instead of generating a burst of errors or reporting the error count via driver-specific details, use the new way provided by edac_mc_handle_error. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-12 12:15:48 -03:00
Mauro Carvalho Chehab	9eb07a7fb8	edac: edac_mc_handle_error(): add an error_count parameter In order to avoid loosing error events, it is desirable to group error events together and generate a single trace for several identical errors. The trace API already allows reporting multiple errors. Change the handle_error function to also allow that. The changes at the drivers were made by this small script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g; print $file; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-12 12:15:47 -03:00
Mauro Carvalho Chehab	03f7eae80f	edac: remove arch-specific parameter for the error handler Remove the arch-dependent parameter, as it were not used, as the MCE tracepoint weren't implemented. It probably doesn't make sense to have an MCE-specific tracepoint, as this will cost more bytes at the tracepoint, and tracepoint is not free. The changes at the EDAC drivers were done by this small perl script: $file .=$_ while (<>); $file =~ s/(edac_mc_handle_error)\s$([^\;]+)\,([^\,$]+)\s\)/$1($2)/g; print $file; Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:52 -03:00
Mauro Carvalho Chehab	075f30901e	amd64_edac: Don't pass driver name as an error parameter The EDAC driver name doesn't help to handle EDAC errors. So, remove it from the EDAC error messages, preserving only the error_message. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:51 -03:00
Dan Carpenter	08a4a13690	edac_mc: check for allocation failure in edac_mc_alloc() Add a check here for if kzalloc() failed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:51 -03:00
Mauro Carvalho Chehab	5156a5f4e0	edac: Increase version to 3.0.0 There were lots of changes introduced to justify renaming it to 3.0.0: - EDAC core were redesigned to represent all types of memory controllers; - EDAC API were redesigned to properly represent the memory controller hierarchy; - a tracepoint-based API were added to report memory errors. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:50 -03:00
Mauro Carvalho Chehab	6e84d359b2	edac_mc: Cleanup per-dimm_info debug messages The edac_mc_alloc() routine allocates one dimm_info device for all possible memories, including the non-filled ones. The debug messages there are somewhat confusing. So, cleans them, by moving the code that prints the memory location to edac_mc, and using it on both edac_mc_sysfs and edac_mc. Also, only dumps information when DIMM/ranks are actually filled. After this patch, a dimm-based memory controller will print the debug info as: [ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0 [ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow: csrow = ffff8801169be000 [ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow: csrow->first_page = 0x0 [ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow: csrow->last_page = 0x0 [ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow: csrow->page_mask = 0x0 [ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow: csrow->nr_channels = 3 [ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow: csrow->channels = ffff8801149c2840 [ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow: csrow->mci = ffff880117426000 [ 1011.380041] EDAC DEBUG: edac_mc_dump_channel: channel->chan_idx = 0 [ 1011.380042] EDAC DEBUG: edac_mc_dump_channel: channel = ffff8801149c2860 [ 1011.380044] EDAC DEBUG: edac_mc_dump_channel: channel->csrow = ffff8801169be000 [ 1011.380046] EDAC DEBUG: edac_mc_dump_channel: channel->dimm = ffff88010fe90400 ... [ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0 [ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm: dimm = ffff88010fe90400 [ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm: dimm->label = 'CPU#0Channel#0_DIMM#0' [ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000 [ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8 [ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000 ... (a rank-based memory controller would print, instead of "dimm?", "rank?" on the above debug info) Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:49 -03:00
Joe Perches	956b9ba156	edac: Convert debugfX to edac_dbg(X, Use a more common debugging style. Remove __FILE__ uses, add missing newlines, coalesce formats and align arguments. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:49 -03:00
Joe Perches	7e881856ee	edac: Use more normal debugging macro style Convert macros to a simpler style and enforce appropriate format checking when not CONFIG_EDAC_DEBUG. Use fmt and __VA_ARGS__, neaten macros. Move some string arrays to the debugfx uses and remove the now unnecessary CONFIG_EDAC_DEBUG variable block definitions. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:48 -03:00
Mauro Carvalho Chehab	dd23cd6eb1	edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs The debug macro already adds that. Most of the work here was made by this small script: $f .=$_ while (<>); $f =~ s/(debugf[0-9]\s$\s)__FILE__\s": /\1"/g; $f =~ s/(debugf[0-9]\s\(\s)__FILE__\s/\1/g; $f =~ s/(debugf[0-9]\s\(\s)__FILE__\s"MC: /\1"/g; $f =~ s/(debugf[0-9]\s\(\")\%s[\:\,\($]\s([^\"]\s[^\)]+)__func__\s\,\s/\1\2/g; $f =~ s/(debugf[0-9]\s$\")\%s[\:\,\($]\s([^\"]\s[^\)]+),\s__func__\s\)/\1\2)/g; $f =~ s/(debugf[0-9]\s$\"MC\:\s)\%s[\:\,\($]\s([^\"]\s[^\)]+)__func__\s\,\s/\1\2/g; $f =~ s/(debugf[0-9]\s$\"MC\:\s)\%s[\:\,\($]\s([^\"]\s[^\)]+),\s__func__\s*\)/\1\2)/g; $f =~ s/\"MC\: \\n\"/"MC:\\n"/g; print $f; After running the script, manual cleanups were done to fix it the remaining places. While here, removed the __LINE__ on most places, as it doesn't actually give useful info on most places. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:47 -03:00
Mauro Carvalho Chehab	356f0a3086	i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy Kernel kobjects have rigid rules: each container object should be dynamically allocated, and can't be allocated into a single kmalloc. EDAC never obeyed this rule: it has a single malloc function that allocates all needed data into a single kzalloc. As this is not accepted anymore, change the allocation schema of the EDAC *_info structs to enforce this Kernel standard. Cc: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:46 -03:00
Mauro Carvalho Chehab	de3910eb79	edac: change the mem allocation scheme to make Documentation/kobject.txt happy Kernel kobjects have rigid rules: each container object should be dynamically allocated, and can't be allocated into a single kmalloc. EDAC never obeyed this rule: it has a single malloc function that allocates all needed data into a single kzalloc. As this is not accepted anymore, change the allocation schema of the EDAC *_info structs to enforce this Kernel standard. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Greg K H <gregkh@linuxfoundation.org> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:45 -03:00
Mauro Carvalho Chehab	e39f4ea9b0	edac: Only expose csrows/channels on legacy API if they're populated This patch actually fixes a bug with the legacy API, where, at the same csrow, some channels may have different DIMMs. This can happen on FB-DIMM/RAMBUS and modern Intel controllers. This is the case, for example, of Nehalem machines: $ ./edac-ctl --layout +-----------------------------------+ \| mc0 \| \| channel0 \| channel1 \| channel2 \| -------+-----------------------------------+ slot2: \| 0 MB \| 0 MB \| 0 MB \| slot1: \| 1024 MB \| 0 MB \| 0 MB \| slot0: \| 1024 MB \| 1024 MB \| 1024 MB \| -------+-----------------------------------+ Before this patch, non-filled memories were shown. Now, only what's filled is there: grep . /sys/devices/system/edac/mc/mc0/csrow/ch? /sys/devices/system/edac/mc/mc0/csrow0/ch0_ce_count:0 /sys/devices/system/edac/mc/mc0/csrow0/ch0_dimm_label:CPU#0Channel#0_DIMM#0 /sys/devices/system/edac/mc/mc0/csrow0/ch1_ce_count:0 /sys/devices/system/edac/mc/mc0/csrow0/ch1_dimm_label:CPU#0Channel#0_DIMM#1 /sys/devices/system/edac/mc/mc0/csrow1/ch0_ce_count:0 /sys/devices/system/edac/mc/mc0/csrow1/ch0_dimm_label:CPU#0Channel#1_DIMM#0 /sys/devices/system/edac/mc/mc0/csrow2/ch0_ce_count:0 /sys/devices/system/edac/mc/mc0/csrow2/ch0_dimm_label:CPU#0Channel#2_DIMM#0 Thanks-to: Aristeu Rozanski Filho <arozansk@redhat.com> Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:44 -03:00
Mauro Carvalho Chehab	fd63312dfe	edac: Move grain/dtype/edac_type calculus to be out of channel loop The 3e7bddc changeset (edac: move dimm properties to struct memset_info) moved the calculus inside a loop. However, at those stuff are common to all channels, on several drivers, it is better to put the calculus outside the loop, to optimize the code. Reported-by: Aristeu Rozanski Filho <arozansk@redhat.com> Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:44 -03:00
Mauro Carvalho Chehab	452a6bf955	edac: Add debufs nodes to allow doing fake error inject Sometimes, it is useful to have a mechanism that generates fake errors, in order to test the EDAC core code, and the userspace tools. Provide such mechanism by adding a few debugfs nodes. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:43 -03:00
Mauro Carvalho Chehab	8ad6c78a69	edac: add a sysfs node to report the maximum location for the system The userspace tools need to know what's the maximum location on each system, as it helps to create nice maps showing how the memory was filled at the system. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:43 -03:00
Mauro Carvalho Chehab	1997471069	edac: add a new per-dimm API and make the old per-virtual-rank API obsolete The old EDAC API is broken. It only works fine for systems manufatured before 2005 and for AMD 64. The reason is that it forces all memory controller drivers to discover rank info. Also, it doesn't allow grouping the several ranks into a DIMM. So, what almost all modern drivers do is to create a fake virtual-rank information, and use it to cheat the EDAC core to accept the driver. While this works if the user has enough time to discover what DIMM slot corresponds to each "virtual-rank" information, it prevents EDAC usage for users with less available time. It also makes life hard for vendors that may want to provide a table with their motherboards to the userspace tool (edac-utils) as each driver has its own logic for the virtual mapping. So, the old API should be removed, in favor of a more flexible API that allows newer drivers to not lie to the EDAC core. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:42 -03:00
Mauro Carvalho Chehab	d90c008963	edac: Get rid of the old kobj's from the edac mc code Now that al users for the old kobj raw access are gone, we can get rid of the legacy kobj-based structures and data. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:41 -03:00
Mauro Carvalho Chehab	5c4cdb5ae7	i7core_edac: convert it to use struct device Instead of relying on a complex logic inside the edac core to create a "device tree-like" sysfs struct, just use device_add. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:41 -03:00
Mauro Carvalho Chehab	c56087595f	amd64_edac: convert sysfs logic to use struct device Now that the EDAC core supports struct device, there's no sense on having any logic at the EDAC core to simulate it. So, instead of adding such logic there, change the logic at amd64_edac to use it. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:40 -03:00
Mauro Carvalho Chehab	ba004239e0	mpc85xx_edac: convert sysfs logic to use struct device Now that the EDAC core supports struct device, there's no sense on having any logic at the EDAC core to simulate it. So, instead of adding such logic there, change the logic at mpc85xx_edac to use it compile-tested only. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:39 -03:00
Mauro Carvalho Chehab	7a623c0390	edac: rewrite the sysfs code to use struct device The EDAC subsystem uses the old struct sysdev approach, creating all nodes using the raw sysfs API. This is bad, as the API is deprecated. As we'll be changing the EDAC API, let's first port the existing code to struct device. There's one drawback on this patch: driver-specific sysfs nodes, used by mpc85xx_edac, amd64_edac and i7core_edac won't be created anymore. While it would be possible to also port the device-specific code, that would mix kobj with struct device, with is not recommended. Also, it is easier and nicer to move the code to the drivers, instead, as the core can get rid of some complex logic that just emulates what the device_add() and device_create_file() already does. The next patches will convert the driver-specific code to use the device-specific calls. Then, the remaining bits of the old sysfs API will be removed. NOTE: a per-MC bus is required, otherwise devices with more than one memory controller will hit a bug like the one below: [ 819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev() [ 819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1 [ 819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1 [ 819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0 [ 819.094984] ------------[ cut here ]------------ [ 819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0() [ 819.107282] Hardware name: S2600CP [ 819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0' [ 819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan] [ 819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1 [ 819.184113] Call Trace: [ 819.186868] [<ffffffff8105adaf>] warn_slowpath_common+0x7f/0xc0 [ 819.193573] [<ffffffff8105aea6>] warn_slowpath_fmt+0x46/0x50 [ 819.200000] [<ffffffff811f53d1>] sysfs_add_one+0xc1/0xf0 [ 819.206025] [<ffffffff811f5cf5>] sysfs_do_create_link+0x135/0x220 [ 819.212944] [<ffffffff811f7023>] ? sysfs_create_group+0x13/0x20 [ 819.219656] [<ffffffff811f5df3>] sysfs_create_link+0x13/0x20 [ 819.226109] [<ffffffff813b04f6>] bus_add_device+0xe6/0x1b0 [ 819.232350] [<ffffffff813ae7cb>] device_add+0x2db/0x460 [ 819.238300] [<ffffffffa0325634>] edac_create_dimm_object+0x84/0xf0 [edac_core] [ 819.246460] [<ffffffffa0325e18>] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core] [ 819.255215] [<ffffffffa0322e2a>] edac_mc_add_mc+0x5a/0x2c0 [edac_core] [ 819.262611] [<ffffffffa03412df>] sbridge_register_mci+0x1bc/0x279 [sb_edac] [ 819.270493] [<ffffffffa03417a3>] sbridge_probe+0xef/0x175 [sb_edac] [ 819.277630] [<ffffffff813ba4e8>] ? pm_runtime_enable+0x58/0x90 [ 819.284268] [<ffffffff812f430c>] local_pci_probe+0x5c/0xd0 [ 819.290508] [<ffffffff812f5ba1>] __pci_device_probe+0xf1/0x100 [ 819.297117] [<ffffffff812f5bea>] pci_device_probe+0x3a/0x60 [ 819.303457] [<ffffffff813b1003>] really_probe+0x73/0x270 [ 819.309496] [<ffffffff813b138e>] driver_probe_device+0x4e/0xb0 [ 819.316104] [<ffffffff813b149b>] __driver_attach+0xab/0xb0 [ 819.322337] [<ffffffff813b13f0>] ? driver_probe_device+0xb0/0xb0 [ 819.329151] [<ffffffff813af5d6>] bus_for_each_dev+0x56/0x90 [ 819.335489] [<ffffffff813b0d7e>] driver_attach+0x1e/0x20 [ 819.341534] [<ffffffff813b0980>] bus_add_driver+0x1b0/0x2a0 [ 819.347884] [<ffffffffa0347000>] ? 0xffffffffa0346fff [ 819.353641] [<ffffffff813b19f6>] driver_register+0x76/0x140 [ 819.359980] [<ffffffff8159f18b>] ? printk+0x51/0x53 [ 819.365524] [<ffffffffa0347000>] ? 0xffffffffa0346fff [ 819.371291] [<ffffffff812f5896>] __pci_register_driver+0x56/0xd0 [ 819.378096] [<ffffffffa0347054>] sbridge_init+0x54/0x1000 [sb_edac] [ 819.385231] [<ffffffff8100203f>] do_one_initcall+0x3f/0x170 [ 819.391577] [<ffffffff810bcd2e>] sys_init_module+0xbe/0x230 [ 819.397926] [<ffffffff815bb529>] system_call_fastpath+0x16/0x1b [ 819.404633] ---[ end trace 1654fdd39556689f ]--- This happens because the bus is not being properly initialized. Instead of putting the memory sub-devices inside the memory controller, it is putting everything under the same directory: $ tree /sys/bus/edac/ /sys/bus/edac/ ├── devices │ ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts │ ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0 │ ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1 │ ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2 │ ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0 │ ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1 │ ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3 │ ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6 │ ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch │ ├── mc -> ../../../devices/system/edac/mc │ └── mc0 -> ../../../devices/system/edac/mc/mc0 ├── drivers ├── drivers_autoprobe ├── drivers_probe └── uevent On a multi-memory controller system, the names "csrow%d" and "dimm%d" should be under "mc%d", and not at the main hierarchy level. So, we need to create a per-MC bus, in order to have its own namespace. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Greg K H <gregkh@linuxfoundation.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 13:23:30 -03:00
Chris Metcalf	8447c4d15e	edac: Do alignment logic properly in edac_align_ptr() The logic was checking the sizeof the structure being allocated to determine whether an alignment fixup was required. This isn't right; what we actually care about is the alignment of the actual pointer that's about to be returned. This became an issue recently because struct edac_mc_layer has a size that is not zero modulo eight, so we were taking the correctly-aligned pointer and forcing it to be misaligned. On Tile this caused an alignment exception. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 12:43:16 -03:00
Mauro Carvalho Chehab	fd687502dc	edac: Rename the parent dev to pdev As EDAC doesn't use struct device itself, it created a parent dev pointer called as "pdev". Now that we'll be converting it to use struct device, instead of struct devsys, this needs to be fixed. No functional changes. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:56:06 -03:00
Mauro Carvalho Chehab	53f2d02898	RAS: Add a tracepoint for reporting memory controller events Add a new tracepoint-based hardware events report method for reporting Memory Controller events. Part of the description bellow is shamelessly copied from Tony Luck's notes about the Hardware Error BoF during LPC 2010 [1]. Tony, thanks for your notes and discussions to generate the h/w error reporting requirements. [1] http://lwn.net/Articles/416669/ We have several subsystems & methods for reporting hardware errors: 1) EDAC ("Error Detection and Correction"). In its original form this consisted of a platform specific driver that read topology information and error counts from chipset registers and reported the results via a sysfs interface. 2) mcelog - x86 specific decoding of machine check bank registers reporting in binary form via /dev/mcelog. Recent additions make use of the APEI extensions that were documented in version 4.0a of the ACPI specification to acquire more information about errors without having to rely reading chipset registers directly. A user level programs decodes into somewhat human readable format. 3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and decodes errors reported via machine check bank registers in AMD processors to the console log using printk(); Each of these mechanisms has a band of followers ... and none of them appear to meet all the needs of all users. As part of a RAS subsystem, let's encapsulate the memory error hardware events into a trace facility. The tracepoint printk will be displayed like: mc_event: [quant] (Corrected\|Uncorrected\|Fatal) error:[error msg] on [label] ([location] [edac_mc detail] [driver_detail] Where: [quant] is the quantity of errors [error msg] is the driver-specific error message (e. g. "memory read", "bus error", ...); [location] is the location in terms of memory controller and branch/channel/slot, channel/slot or csrow/channel; [label] is the memory stick label; [edac_mc detail] describes the address location of the error and the syndrome; [driver detail] is driver-specifig error message details, when needed/provided (e. g. "area:DMA", ...) For example: mc_event: 1 Corrected error:memory read on memory stick DIMM_1A (mc:0 location:0:0:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA) Of course, any userspace tools meant to handle errors should not parse the above data. They should, instead, use the binary fields provided by the tracepoint, mapping them directly into their Management Information Base. NOTE: The original patch was providing an additional mechanism for MCA-based trace events that also contained MCA error register data. However, as no agreement was reached so far for the MCA-based trace events, for now, let's add events only for memory errors. A latter patch is planned to change the tracepoint, for those types of event. Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:55:52 -03:00
Kim Phillips	b9bc5ddb1b	mpc85xx_edac: fix error: too few arguments to function 'edac_mc_alloc' commit `ca0907b` "edac: Remove the legacy EDAC ABI" broke mpc85xx_edac in the following manner: mpc85xx_edac.c:983:35: error: too few arguments to function 'edac_mc_alloc' this patch puts back the missing 'layers' argument. [mchehab@redhat.com: As Ben sent a similar fix, I added his SOB on this patch] Signed-off-by: Kim Phillips <kim.phillips@freescale.com> Signed-off-by: Ben Collins <bcollins@ubuntu.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:49:51 -03:00
Chen Gong	2cbb587d3b	edac: fix the error about memory type detection on SandyBridge On SandyBridge, DDRIOA(Dev: 17 Func: 0 Offset: 328) is used to detect whether DIMM is RDIMM/LRDIMM, not TA(Dev: 15 Func: 0). Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:49:51 -03:00
Chen Gong	e35fca4791	edac: avoid mce decoding crash after edac driver unloaded Some edac drivers register themselves as mce decoders via notifier_chain. But in current notifier_chain implementation logic, it doesn't accept same notifier registered twice. If so, it will be wrong when adding/removing the element from the list. For example, on one SandyBridge platform, remove module sb_edac and then trigger one error, it will hit oops because it has no mce decoder registered but related notifier_chain still points to an invalid callback function. Here is an example: Call Trace: [<ffffffff8150ef6a>] atomic_notifier_call_chain+0x1a/0x20 [<ffffffff8102b936>] mce_log+0x46/0x180 [<ffffffff8102eaea>] apei_mce_report_mem_error+0x4a/0x60 [<ffffffff812e19d2>] ghes_do_proc+0x192/0x210 [<ffffffff812e2066>] ghes_proc+0x46/0x70 [<ffffffff812e20d8>] ghes_notify_sci+0x48/0x80 [<ffffffff8150ef05>] notifier_call_chain+0x55/0x80 [<ffffffff81076f1a>] __blocking_notifier_call_chain+0x5a/0x80 [<ffffffff812aea11>] ? acpi_os_wait_events_complete+0x23/0x23 [<ffffffff81076f56>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff812ddc4d>] acpi_hed_notify+0x19/0x1b [<ffffffff812b16bd>] acpi_device_notify+0x19/0x1b [<ffffffff812beb38>] acpi_ev_notify_dispatch+0x67/0x7f [<ffffffff812aea3a>] acpi_os_execute_deferred+0x29/0x36 [<ffffffff81069dc2>] process_one_work+0x132/0x450 [<ffffffff8106bbcb>] worker_thread+0x17b/0x3c0 [<ffffffff8106ba50>] ? manage_workers+0x120/0x120 [<ffffffff81070aee>] kthread+0x9e/0xb0 [<ffffffff81514724>] kernel_thread_helper+0x4/0x10 [<ffffffff81070a50>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff81514720>] ? gs_change+0x13/0x13 Code: f3 49 89 d4 45 85 ed 4d 89 c6 48 8b 0f 74 48 48 85 c9 75 17 eb 41 0f 1f 80 00 00 00 00 41 83 ed 01 4c 89 f9 74 22 4d 85 ff 74 1d <4c> 8b 79 08 4c 89 e2 48 89 de 48 89 cf ff 11 4d 85 f6 74 04 41 RIP [<ffffffff8150eef6>] notifier_call_chain+0x46/0x80 RSP <ffff88042868fb20> CR2: ffffffffa01af838 ---[ end trace 0100930068e73e6f ]--- BUG: unable to handle kernel paging request at fffffffffffffff8 IP: [<ffffffff810705b0>] kthread_data+0x10/0x20 PGD `1a0d067` PUD 1a0e067 PMD 0 Oops: 0000 [#2] SMP Only i7core_edac and sb_edac have such issues because they have more than one memory controller which means they have to register mce decoder many times. Cc: <stable@vger.kernel.org> # 3.2 and upper Signed-off-by: Chen Gong <gong.chen@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-06-11 11:49:51 -03:00
H. Peter Anvin	bbd771474e	Merge branch 'x86/trampoline' into x86/urgent x86/trampoline contains an urgent commit which is necessarily on a newer baseline. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-05-30 12:11:32 -07:00
Ingo Molnar	403e1c5b74	Merge branch 'x86/mce' into x86/urgent Merge in these fixlets. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-05-30 14:12:06 +02:00
Linus Torvalds	87a5af24e5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC internal API changes from Mauro Carvalho Chehab: "This changeset is the first part of a series of patches that fixes the EDAC sybsystem. On this set, it changes the Kernel EDAC API in order to properly represent the Intel i3/i5/i7, Xeon 3xxx/5xxx/7xxx, and Intel E5-xxxx memory controllers. The EDAC core used to assume that: - the DRAM chip select pin is directly accessed by the memory controller - when multiple channels are used, they're all filled with the same type of memory. None of the above premises is true on Intel memory controllers since 2002, when RAMBUS and FB-DIMMs were introduced, and Advanced Memory Buffer or by some similar technologies hides the direct access to the DRAM pins. So, the existing drivers for those chipsets had to lie to the EDAC core, in general telling that just one channel is filled. That produces some hard to understand error messages like: EDAC MC0: CE row 3, channel 0, label "DIMM1": 1 Unknown error(s): memory read error on FATAL area : cpu=0 Err=0008:00c2 (ch=2), addr = 0xad1f73480 => socket=0, Channel=0(mask=2), rank=1 The location information there (row3 channel 0) is completely bogus: it has no physical meaning, and are just some random values that the driver uses to talk with the EDAC core. The error actually happened at CPU socket 0, channel 0, slot 1, but this is not reported anywhere, as the EDAC core doesn't know anything about the memory layout. So, only advanced users that know how the EDAC driver works and that tests their systems to see how DIMMs are mapped can actually benefit for such error logs. This patch series fixes the error report logic, in order to allow the EDAC to expose the memory architecture used by them to the EDAC core. So, as the EDAC core now understands how the memory is organized, it can provide an useful report: EDAC MC0: CE memory read error on DIMM1 (channel:0 slot:1 page:0x364b1b offset:0x600 grain:32 syndrome:0x0 - count:1 area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:4) The location of the DIMM where the error happened is reported by "MC0" (cpu socket #0), at "channel:0 slot:1" location, and matches the physical location of the DIMM. There are two remaining issues not covered by this patch series: - The EDAC sysfs API will still report bogus values. So, userspace tools like edac-utils will still use the bogus data; - Add a new tracepoint-based way to get the binary information about the errors. Those are on a second series of patches (also at -next), but will probably miss the train for 3.5, due to the slow review process." Fix up trivial conflict (due to spelling correction of removed code) in drivers/edac/edac_device.c * git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (42 commits) i7core: fix ranks information at the per-channel struct i5000: Fix the fatal error handling i5100_edac: Fix a warning when compiled with 32 bits i82975x_edac: Test nr_pages earlier to save a few CPU cycles e752x_edac: provide more info about how DIMMS/ranks are mapped i5000_edac: Fix the logic that retrieves memory information i5400_edac: improve debug messages to better represent the filled memory edac: Cleanup the logs for i7core and sb edac drivers edac: Initialize the dimm label with the known information edac: Remove the legacy EDAC ABI x38_edac: convert driver to use the new edac ABI tile_edac: convert driver to use the new edac ABI sb_edac: convert driver to use the new edac ABI r82600_edac: convert driver to use the new edac ABI ppc4xx_edac: convert driver to use the new edac ABI pasemi_edac: convert driver to use the new edac ABI mv64x60_edac: convert driver to use the new edac ABI mpc85xx_edac: convert driver to use the new edac ABI i82975x_edac: convert driver to use the new edac ABI i82875p_edac: convert driver to use the new edac ABI ...	2012-05-29 18:32:37 -07:00
Mauro Carvalho Chehab	0bf09e829d	i7core: fix ranks information at the per-channel struct There is a flag at the per-channel struct that indicates if there are any 4R dimm on it. The way the presence of this flag were reported is not ok, as it might give the false idea that the channel were filled with 2R memories: [ 580.588701] EDAC DEBUG: get_dimm_config: Ch1 phy rd1, wr1 (0x063f7431): 2 ranks, UDIMMs [ 580.588704] EDAC DEBUG: get_dimm_config: dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400 (in this case, just one 1R memory is filled on channel 1) So, use a better way to represent the per-channel ranks information. After the patch, it will show: [ 2002.233978] EDAC DEBUG: get_dimm_config: Ch0 phy rd0, wr0 (0x063f7431): UDIMMs [ 2002.233982] EDAC DEBUG: get_dimm_config: dimm 0 1024 Mb offset: 0, bank: 8, rank: 1, row: 0x4000, col: 0x400 [ 2002.233988] EDAC DEBUG: get_dimm_config: dimm 1 1024 Mb offset: 4, bank: 8, rank: 1, row: 0x4000, col: 0x400 (in this case, there isn't any 4R memories) Reported-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:55 -03:00
Mauro Carvalho Chehab	486dfb1638	i5000: Fix the fatal error handling The fatal error channel bits point to a single channel, and not to a range of channels. Fix the code to properly report it, instead of printing messages like: kernel: EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4) Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:54 -03:00
Mauro Carvalho Chehab	9f70d08a4c	i5100_edac: Fix a warning when compiled with 32 bits drivers/edac/i5100_edac.c: In function ‘i5100_init_csrows’: drivers/edac/i5100_edac.c:862:3: warning: format ‘%zd’ expects argument of type ‘signed size_t’, but argument 5 has type ‘long unsigned int’ [-Wformat] Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:54 -03:00
Mauro Carvalho Chehab	36683aab90	i82975x_edac: Test nr_pages earlier to save a few CPU cycles Avoid test nr_pages twice, and initializing some data that won't be used. Cleanup patch only. Reported-by: Aristeu Rozanski Filho <arozansk@redhat.com> Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:53 -03:00
Mauro Carvalho Chehab	805afb6997	e752x_edac: provide more info about how DIMMS/ranks are mapped No funtional changes here. Only the comments got updated. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:53 -03:00
Mauro Carvalho Chehab	64e1fdaf55	i5000_edac: Fix the logic that retrieves memory information The logic there is broken: it basically creates two csrows for each DIMM and assumes that all DIMM's are dual rank. Only one of the csrows will contain the entire DIMM size. If single rank memories are found, they'll be marked with 0 bytes. The check if the AMB is present were also wrong. Yet, as the error reports don't use the memory size in order to credit an error to the right DIMM, that part of the driver seems to work. That's why probably nobody detected the issue yet. After this patch, the memory layout is now properly reported, when debug mode is enabled, and the number of ranks per dimm is now shown: calculate_dimm_size: ---------------------------------------------------------- calculate_dimm_size: slot 3 0 MB \| 0 MB \| 0 MB \| 0 MB \| calculate_dimm_size: slot 2 0 MB \| 0 MB \| 0 MB \| 0 MB \| calculate_dimm_size: ---------------------------------------------------------- calculate_dimm_size: slot 1 0 MB \| 0 MB \| 0 MB \| 0 MB \| calculate_dimm_size: slot 0 512 MB 1R\| 512 MB 1R\| 512 MB 1R\| 512 MB 1R\| calculate_dimm_size: ---------------------------------------------------------- calculate_dimm_size: channel 0 \| channel 1 \| channel 2 \| channel 3 \| calculate_dimm_size: branch 0 \| branch 1 \| (1R above means that all memories on my test machine are single-ranked) Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:52 -03:00
Mauro Carvalho Chehab	68d086f89b	i5400_edac: improve debug messages to better represent the filled memory Improves the debug output message, in order to better represent the memory controller hierarchy, when outputing the debug messages. No functional changes when debug is disabled. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:51 -03:00
Mauro Carvalho Chehab	e17a2f42a4	edac: Cleanup the logs for i7core and sb edac drivers Remove some information that it is duplicated at the MCE log, and don't have much usage for the error. Those data will be added again, when creating a trace function that outputs both memory errors and MCE fields. Cc: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:51 -03:00
Mauro Carvalho Chehab	5926ff502f	edac: Initialize the dimm label with the known information While userspace doesn't fill the dimm labels, add there the dimm location, as described by the used memory model. This could eventually match what is described at the dmidecode, making easier for people to identify the memory. For example, on an Intel motherboard where the DMI table is reliable, the first memory stick is described as: Memory Device Array Handle: 0x0029 Error Information Handle: Not Provided Total Width: 64 bits Data Width: 64 bits Size: 2048 MB Form Factor: DIMM Set: 1 Locator: A1_DIMM0 Bank Locator: A1_Node0_Channel0_Dimm0 Type: <OUT OF SPEC> Type Detail: Synchronous Speed: 800 MHz Manufacturer: A1_Manufacturer0 Serial Number: A1_SerNum0 Asset Tag: A1_AssetTagNum0 Part Number: A1_PartNum0 The memory named as "A1_DIMM0" is physically located at the first memory controller (node 0), at channel 0, dimm slot 0. After this patch, the memory label will be filled with: /sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0 And (after the new EDAC API patches) as: /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0 So, even if the memory label is not initialized on userspace, an useful information with the error location is filled there, expecially since several systems/motherboards are provided with enough info to map from channel/slot (or branch/channel/slot) into the DIMM label. So, letting the EDAC core fill it by default is a good thing. It should noticed that, as the label filling happens at the edac_mc_alloc(), drivers can override it to better describe the memories (and some actually do it). Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:50 -03:00
Mauro Carvalho Chehab	ca0907b9e4	edac: Remove the legacy EDAC ABI Now that all drivers got converted to use the new ABI, we can drop the old one. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:50 -03:00
Mauro Carvalho Chehab	e2acc357ee	x38_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:49 -03:00
Mauro Carvalho Chehab	40467db770	tile_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Acked-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:48 -03:00
Mauro Carvalho Chehab	c36e3e7768	sb_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:48 -03:00
Mauro Carvalho Chehab	63b5d1d9aa	r82600_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Tim Small <tim@buttersideup.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:47 -03:00
Mauro Carvalho Chehab	94d9337459	ppc4xx_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Josh Boyer <jwboyer@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:46 -03:00
Mauro Carvalho Chehab	f34575aca9	pasemi_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:46 -03:00
Mauro Carvalho Chehab	a583ac6ca8	mv64x60_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:45 -03:00
Mauro Carvalho Chehab	ad4d6e2311	mpc85xx_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:45 -03:00
Mauro Carvalho Chehab	705213580b	i82975x_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:44 -03:00
Mauro Carvalho Chehab	0a8a9ac9ca	i82875p_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:43 -03:00
Mauro Carvalho Chehab	84c3a68408	i82860_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:43 -03:00
Mauro Carvalho Chehab	40f562b191	i82443bxgx_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Tim Small <tim@buttersideup.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:42 -03:00
Mauro Carvalho Chehab	0975c16f4f	i7core_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:42 -03:00
Mauro Carvalho Chehab	70e2a8379b	i7300_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:13:41 -03:00
Mauro Carvalho Chehab	296da591ea	i5400_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:11:02 -03:00
Mauro Carvalho Chehab	d1afaa0a6e	i5100_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:11:01 -03:00
Mauro Carvalho Chehab	702df64053	i5000_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:11:01 -03:00
Mauro Carvalho Chehab	95b93287c6	i3200_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:11:01 -03:00
Mauro Carvalho Chehab	884906f197	i3000_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Jason Uhlenkott <juhlenko@akamai.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:11:01 -03:00
Mauro Carvalho Chehab	30ac440681	e7xxx_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:11:01 -03:00
Mauro Carvalho Chehab	ce11ce1710	e752x_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Mark Gross <mark.gross@intel.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:11:00 -03:00
Mauro Carvalho Chehab	df62b1e663	cpc925_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:11:00 -03:00
Mauro Carvalho Chehab	6458fc08b6	cell_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:11:00 -03:00
Mauro Carvalho Chehab	d8c34af4d0	amd76x_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:11:00 -03:00
Mauro Carvalho Chehab	ab5a503cb5	amd64_edac: convert driver to use the new edac ABI The legacy edac ABI is going to be removed. Port the driver to use and benefit from the new API functionality. Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:59 -03:00
Mauro Carvalho Chehab	4275be6355	edac: Change internal representation to work with layers Change the EDAC internal representation to work with non-csrow based memory controllers. There are lots of those memory controllers nowadays, and more are coming. So, the EDAC internal representation needs to be changed, in order to work with those memory controllers, while preserving backward compatibility with the old ones. The edac core was written with the idea that memory controllers are able to directly access csrows. This is not true for FB-DIMM and RAMBUS memory controllers. Also, some recent advanced memory controllers don't present a per-csrows view. Instead, they view memories as DIMMs, instead of ranks. So, change the allocation and error report routines to allow them to work with all types of architectures. This will allow the removal of several hacks with FB-DIMM and RAMBUS memory controllers. Also, several tests were done on different platforms using different x86 drivers. TODO: a multi-rank DIMMs are currently represented by multiple DIMM entries in struct dimm_info. That means that changing a label for one rank won't change the same label for the other ranks at the same DIMM. This bug is present since the beginning of the EDAC, so it is not a big deal. However, on several drivers, it is possible to fix this issue, but it should be a per-driver fix, as the csrow => DIMM arrangement may not be equal for all. So, don't try to fix it here yet. I tried to make this patch as short as possible, preceding it with several other patches that simplified the logic here. Yet, as the internal API changes, all drivers need changes. The changes are generally bigger in the drivers for FB-DIMMs. Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:59 -03:00
Mauro Carvalho Chehab	93e4fe64ec	edac: rewrite edac_align_ptr() The edac_align_ptr() function is used to prepare data for a single memory allocation kzalloc() call. It counts how many bytes are needed by some data structure. Using it as-is is not that trivial, as the quantity of memory elements reserved is not there, but, instead, it is on a next call. In order to avoid mistakes when using it, move the number of allocated elements into it, making easier to use it. Reviewed-by: Borislav Petkov <bp@amd64.org> Cc: Aristeu Rozanski <arozansk@redhat.com> Cc: Doug Thompson <norsk5@yahoo.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:59 -03:00
Mauro Carvalho Chehab	a895bf8b1e	edac: move nr_pages to dimm struct The number of pages is a dimm property. Move it to the dimm struct. After this change, it is possible to add sysfs nodes for the DIMM's that will properly represent the DIMM stick properties, including its size. A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when the memory controller represents the memory via chip select rows. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	5e2af0c09e	edac: Don't initialize csrow's first_page & friends when not needed Almost all edac drivers initialize csrow_info->first_page, csrow_info->last_page and csrow_info->page_mask. Those vars are used inside the EDAC core, in order to calculate the csrow affected by an error, by using the routine edac_mc_find_csrow_by_page(). However, very few drivers actually use it: e752x_edac.c e7xxx_edac.c i3000_edac.c i82443bxgx_edac.c i82860_edac.c i82875p_edac.c i82975x_edac.c r82600_edac.c There also a few other drivers that have their own calculus formula internally using those vars. All the others are just wasting time by initializing those data. While initializing data without using them won't cause any troubles, as those information is stored at the wrong place (at csrows structure), it is better to remove what is unused, in order to simplify the next patch. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	084a4fccef	edac: move dimm properties to struct dimm_info On systems based on chip select rows, all channels need to use memories with the same properties, otherwise the memories on channels A and B won't be recognized. However, such assumption is not true for all types of memory controllers. Controllers for FB-DIMM's don't have such requirements. Also, modern Intel controllers seem to be capable of handling such differences. So, we need to get rid of storing the DIMM information into a per-csrow data, storing it, instead at the right place. The first step is to move grain, mtype, dtype and edac_mode to the per-dimm struct. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Mark Gross <mark.gross@intel.com> Cc: Jason Uhlenkott <juhlenko@akamai.com> Cc: Tim Small <tim@buttersideup.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: Olof Johansson <olof@lixom.net> Cc: Egor Martovetsky <egor@pasemi.com> Cc: Michal Marek <mmarek@suse.cz> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Joe Perches <joe@perches.com> Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hitoshi Mitake <h.mitake@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: James Bottomley <James.Bottomley@parallels.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Cc: Shaohui Xie <Shaohui.Xie@freescale.com> Cc: Josh Boyer <jwboyer@gmail.com> Cc: Mike Williams <mike@mikebwilliams.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:58 -03:00
Mauro Carvalho Chehab	a7d7d2e1a0	edac: Create a dimm struct and move the labels into it The way a DIMM is currently represented implies that they're linked into a per-csrow struct. However, some drivers don't see csrows, as they're ridden behind some chip like the AMB's on FBDIMM's, for example. This forced drivers to fake^Wvirtualize a csrow struct, and to create a mess under csrow/channel original's concept. Move the DIMM labels into a per-DIMM struct, and add there the real location of the socket, in terms of csrow/channel. Latter patches will modify the location to properly represent the memory architecture. All other drivers will use a per-csrow type of location. Some of those drivers will require a latter conversion, as they also fake the csrows internally. TODO: While this patch doesn't change the existing behavior, on csrows-based memory controllers, a csrow/channel pair points to a memory rank. There's a known bug at the EDAC core that allows having different labels for the same DIMM, if it has more than one rank. A latter patch is need to merge the several ranks for a DIMM into the same dimm_info struct, in order to avoid having different labels for the same DIMM. The edac_mc_alloc() will now contain a per-dimm initialization loop that will be changed by latter patches in order to match other types of memory architectures. Reviewed-by: Aristeu Rozanski <arozansk@redhat.com> Reviewed-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Doug Thompson <norsk5@yahoo.com> Cc: Ranganathan Desikan <ravi@jetztechnologies.com> Cc: "Arvind R." <arvino55@gmail.com> Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-05-28 19:10:57 -03:00
Borislav Petkov	e8f380e008	x86/bitops: Move BIT_64() for a wider use Needed for shifting 64-bit values on 32-bit, like MSR values, for example. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frank Arnold <frank.arnold@amd.com> Link: http://lkml.kernel.org/r/1337684026-19740-1-git-send-email-bp@amd64.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-05-23 17:16:42 +02:00
Jiri Kosina	f70d4a95ed	edac, mips: don't change code that has been removed in edac/mips tree This is a partial revert of `15ed103a98` ("edac: Fix spelling errors") `6997991ab0` ("mips: Fix printk typos in arc/mips") which change code that doesn't exist any more in edac/mips trees. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-05-22 11:00:09 +02:00
David Mackey	15ed103a98	edac: Fix spelling errors. Signed-off-by: David Mackey <tdmackey@twitter.com> Signed-off-by: Vinson Lee <vlee@twitter.com> Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2012-04-30 13:28:41 +02:00
Linus Torvalds	4157368edb	Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile Pull arch/tile bug fixes from Chris Metcalf: "This includes Paul Gortmaker's change to fix the <asm/system.h> disintegration issues on tile, a fix to unbreak the tilepro ethernet driver, and a backlog of bugfix-only changes from internal Tilera development over the last few months. They have all been to LKML and on linux-next for the last few days. The EDAC change to MAINTAINERS is an oddity but discussion on the linux-edac list suggested I ask you to pull that change through my tree since they don't have a tree to pull edac changes from at the moment." * 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (39 commits) drivers/net/ethernet/tile: fix netdev_alloc_skb() bombing MAINTAINERS: update EDAC information tilepro ethernet driver: fix a few minor issues tile-srom.c driver: minor code cleanup edac: say "TILEGx" not "TILEPro" for the tilegx edac driver arch/tile: avoid accidentally unmasking NMI-type interrupt accidentally arch/tile: remove bogus performance optimization arch/tile: return SIGBUS for addresses that are unaligned AND invalid arch/tile: fix finv_buffer_remote() for tilegx arch/tile: use atomic exchange in arch_write_unlock() arch/tile: stop mentioning the "kvm" subdirectory arch/tile: export the page_home() function. arch/tile: fix pointer cast in cacheflush.c arch/tile: fix single-stepping over swint1 instructions on tilegx arch/tile: implement panic_smp_self_stop() arch/tile: add "nop" after "nap" to help GX idle power draw arch/tile: use proper memparse() for "maxmem" options arch/tile: fix up locking in pgtable.c slightly arch/tile: don't leak kernel memory when we unload modules arch/tile: fix bug in delay_backoff() ...	2012-04-06 17:56:20 -07:00
Borislav Petkov	ec3e82d6dc	MCE, AMD: Drop too granulary family model checks MCA details seldom change inbetween the models of a family so don't be too conservative and enable decoding on everything starting from K8 onwards. Minor adjustments can come in later but most importantly, we have some decoding infrastructure in place for upcoming models by default. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-04-04 15:50:11 +02:00
Chris Metcalf	e2e110d759	edac: say "TILEGx" not "TILEPro" for the tilegx edac driver This is just an aesthetic change but it was silly to say TILEPro when booting up on the tilegx architecture. Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>	2012-04-02 12:14:06 -04:00
Linus Torvalds	f0f3680e50	Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac Pull EDAC fixes from Mauro Carvalho Chehab: "A series of EDAC driver fixes. It also has one core fix at the documentation, and a rename patch, fixing the name of the struct that contains the rank information." * 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: edac: rename channel_info to rank_info i5400_edac: Avoid calling pci_put_device() twice edac: i5100 ack error detection register after each read edac: i5100 fix erroneous define for M1Err edac: sb_edac: Fix a wrong value setting for the previous value edac: sb_edac: Fix a INTERLEAVE_MODE() misuse edac: sb_edac: Let the driver depend on PCI_MMCONFIG edac: Improve the comments to better describe the memory concepts edac/ppc4xx_edac: Fix compilation Fix sb_edac compilation with 32 bits kernels	2012-03-28 14:24:40 -07:00
Linus Torvalds	250f6715a4	The following text was taken from the original review request: "[RFC PATCH 0/2] audit of linux/device.h users in include/" https://lkml.org/lkml/2012/3/4/159 -- Nearly every subsystem has some kind of header with a proto like: void foo(struct device dev); and yet there is no reason for most of these guys to care about the sub fields within the device struct. This allows us to significantly reduce the scope of headers including headers. For this instance, a reduction of about 40% is achieved by replacing the include with the simple fact that the device is some kind of a struct. Unlike the much larger module.h cleanup, this one is simply two commits. One to fix the implicit <linux/device.h> users, and then one to delete the device.h includes from the linux/include/ dir wherever possible. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPbNxLAAoJEOvOhAQsB9HWR6QQAMRUZ94O2069/nW9h4TO/xTr Hq/80lo/TBBiRmob3iWBP76lzgeeMPPVEX1I6N7YYlhL3IL7HsaJH1DvpIPPHXQP GFKcBsZ5ZLV8c4CBDSr+/HFNdhXc0bw0awBjBvR7gAsWuZpNFn4WbhizJi4vWAoE 4ydhPu55G1G8TkBtYLJQ8xavxsmiNBSDhd2i+0vn6EVpgmXynjOMG8qXyaS97Jvg pZLwnN5Wu21coj6+xH3QUKCl1mJ+KGyamWX5gFBVIfsDB3k5H4neijVm7t1en4b0 cWxmXeR/JE3VLEl/17yN2dodD8qw1QzmTWzz1vmwJl2zK+rRRAByBrL0DP7QCwCZ ppeJbdhkMBwqjtknwrmMwsuAzUdJd79GXA+6Vm+xSEkr6FEPK1M0kGbvaqV9Usgd ohMewewbO6ddgR9eF7Kw2FAwo0hwkPNEplXIym9rZzFG1h+T0STGSHvkn7LV765E ul1FapSV3GCxEVRwWTwD28FLU2+0zlkOZ5sxXwNPTT96cNmW+R7TGuslZKNaMNjX q7eBZxo8DtVt/jqJTntR8bs8052c8g1Ac1IKmlW8VSmFwT1M6VBGRn1/JWAhuUgv dBK/FF+I1GJTAJWIhaFcKXLHvmV9uhS6JaIhLMDOetoOkpqSptJ42hDG+89WkFRk o55GQ5TFdoOpqxVzGbvE =3j4+ -----END PGP SIGNATURE----- Merge tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull <linux/device.h> avoidance patches from Paul Gortmaker: "Nearly every subsystem has some kind of header with a proto like: void foo(struct device dev); and yet there is no reason for most of these guys to care about the sub fields within the device struct. This allows us to significantly reduce the scope of headers including headers. For this instance, a reduction of about 40% is achieved by replacing the include with the simple fact that the device is some kind of a struct. Unlike the much larger module.h cleanup, this one is simply two commits. One to fix the implicit <linux/device.h> users, and then one to delete the device.h includes from the linux/include/ dir wherever possible." tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: device.h: audit and cleanup users in main include dir device.h: cleanup users outside of linux/include (C files)	2012-03-24 10:41:37 -07:00
Linus Torvalds	dae430c6f6	A bunch of fixes/updates for the AMD side of EDAC including * MCE decoding updates * tree-wide EDAC sweep making pci_device_ids __devinitconst * Scrub rate API correction * two amd64_edac corrections for K8 boxes and sysfs csrow nodes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPZ0GkAAoJEBLB8Bhh3lVKjr4P/2q+7iZY8skYJqjrGel1vpKG aMwn1DTv9Mhy2omE+UDSuMVWrZ8CI/G+5aW5j6uulj6uko+3Q1aQtkN3zQhoa1RG 7Q2X4t0skRnr7xOJKIYour+ufX4MLoVdEZqu4jvIFoniAQEXbDfbgT7KnA/hRwFp i04GaFRsPb97cvi7OfMpczTVan3lDUhT81kvdlrVZGhu3dVIkZw//hmsrt98SF1G qeBSvY/ji45D/mro6QMDZcN5+AWgLuI4ivM3Bk1tFgxs6KVaHcCOfPC8szQAuupl fQl/N4sHyj7BuTHlCWEog+V67ifpZcb5peya9Vs5nArhj/GzFmVhE0JBgFG7aCXQ hnAEhmgUI5NdHhLmrmGWSPOEQbG+V53/mBEk45sdH9zzl7yBH20gILdYFejmwEuX HfTTugIby4ncsw9Gkmi2rx33wNuzdOsXNRILk56bSf9X3e1m7LDtZV6EMeQ9I3D9 ONjrrmNrkz5AmI765IsGs6vJT6ZkpZk6DNbykEOTPtoOhwiCSsbapowh9NpdhL1k EwUuZfyWoulVeYJw+m6a6Aw9bp2stj694p0tfdXh4C9g09Qlvx2RANBAgEJQE/oD 5rWE/p/HiDFuMd7IrRuLaC1vV7YnWNX/EHLvtV7Ei2qslaK9XxFIb9p7jqyUIpoj nwUDJYIB4Dbk+NSV34N3 =Q+wu -----END PGP SIGNATURE----- Merge tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull AMD64 EDAC fixes from Borislav Petkov: "A bunch of fixes/updates for the AMD side of EDAC including * MCE decoding updates * tree-wide EDAC sweep making pci_device_ids __devinitconst * Scrub rate API correction * two amd64_edac corrections for K8 boxes and sysfs csrow nodes" * tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: MCE, AMD: Constify error tables MCE, AMD: Correct bank 5 error signatures MCE, AMD: Rework NB MCE signatures MCE, AMD: Correct VB data error description MCE, AMD: Correct ucode patch buffer description MCE, AMD: Correct some MC0 error types EDAC: Make pci_device_id tables __devinitconst. EDAC: Correct scrub rate API amd64_edac: Fix K8 revD and later chip select sizes amd64_edac: Fix missing csrows sysfs nodes	2012-03-23 17:59:47 -07:00
Mauro Carvalho Chehab	a4b4be3fd7	edac: rename channel_info to rank_info What it is pointed by a csrow/channel vector is a rank information, and not a channel information. On a traditional architecture, the memory controller directly access the memory ranks, via chip select rows. Different ranks at the same DIMM is selected via different chip select rows. So, typically, one csrow/channel pair means one different DIMM. On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced Memory Buffer (AMB) that serves as the interface between the memory controller and the memory chips. The AMB selection is via the DIMM slot, and not via a csrow. It is up to the AMB to talk with the csrows of the DRAM chips. So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM rank. RAMBUS is similar. Newer memory controllers, like the ones found on Intel Sandy Bridge and Nehalem, even working with normal DDR3 DIMM's, don't use the usual channel A/channel B interleaving schema to provide 128 bits data access. Instead, they have more channels (3 or 4 channels), and they can use several interleaving schemas. Such memory controllers see the DIMMs directly on their registers, instead of the ranks, which is better for the driver, as its main usageis to point to a broken DIMM stick (the Field Repleceable Unit), and not to point to a broken DRAM chip. The drivers that support such such newer memory architecture models currently need to fake information and to abuse on EDAC structures, as the subsystem was conceived with the idea that the csrow would always be visible by the CPU. To make things a little worse, those drivers don't currently fake csrows/channels on a consistent way, as the concepts there don't apply to the memory controllers they're talking with. So, each driver author interpreted the concepts using a different logic. In order to fix it, let's rename the data structure that points into a DIMM rank to "rank_info", in order to be clearer about what's stored there. Latter patches will provide a better way to represent the memory hierarchy for the other types of memory controller. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:22:50 -03:00
Mauro Carvalho Chehab	0142877aa4	i5400_edac: Avoid calling pci_put_device() twice When i5400_edac driver is removed and re-loaded a few times, it causes an OOPS, as it is currently decrementing some PCI device usage two times. When called inside a loop, pci_get_device() will call pci_put_device(). That mangles the error count. In this specific case, it seems easier to just duplicate the call. Also fixes the error logic when pci_get_device fails. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:22:49 -03:00
Niklas Söderlund	df95e42e1f	edac: i5100 ack error detection register after each read If I only ack the detection register after a error have been detected I'm unable to reliably detect errors. I have verified this behavior using both an error injection DIMM and software to inject errors. I can't find any documentation supporting this behavior in Intel 5100 Memory Controller Hub Chipset, see 1. So this is all based on experimentation. [1] Intel® 5100 Memory Controller Hub Chipset http://www.intel.com/content/dam/doc/datasheet/5100- memory-controller-hub-chipset-datasheet.pdf Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:22:49 -03:00
Niklas Söderlund	b6378cb3e5	edac: i5100 fix erroneous define for M1Err According to [1] the define for M1Err in the FERR_NF_MEM register is wrong. It should be at position 1 not 0. [1] Intel 5100 Memory Controller Hub Chipset Doc.Nr: 318378 http://www.intel.com/content/dam/doc/datasheet/5100- memory-controller-hub-chipset-datasheet.pdf Reported-by: Ba Thang Nguyen <thang.b.nguyen@dektech.com.au> Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:20:55 -03:00
Hui Wang	7fae0db439	edac: sb_edac: Fix a wrong value setting for the previous value >From the driver design, the variable limit wants to compare with its previous value, we should set the value of limit instead of the value of tmp_mb to the variable prev. Signed-off-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:20:11 -03:00
Hui Wang	ad9c40b7dd	edac: sb_edac: Fix a INTERLEAVE_MODE() misuse We can identify dram interleave mode from the Dram Rule register rather than Dram Interleave list register. In this context, the reg of INTERLEAVE_MODE(reg) contains the Dram Interleave list register, we can't get interleave mode from the reg, while the variable interleave_mode saves the the mode got from the Dram Rule register, so we use the variable to replace INTERLEAVE_MDDE(reg) here. Signed-off-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:20:02 -03:00
Hui Wang	22a5c27bec	edac: sb_edac: Let the driver depend on PCI_MMCONFIG This driver needs to access PCIe Extended Configuration Space Registers (0x100~0xfff), to correctly access those registers, we need to enable PCI_MMCONFIG option. Since this option is not enabled for X86_64 by default, we let the driver depend on it to prevent users forgetting to enable this option. Signed-off-by: Hui Wang <jason77.wang@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:19:56 -03:00
Mauro Carvalho Chehab	b877763ea0	edac/ppc4xx_edac: Fix compilation It seems that nobody is cross-compiling for this arch anymore... drivers/edac/ppc4xx_edac.c: In function 'ppc4xx_edac_probe': drivers/edac/ppc4xx_edac.c:188:12: error: storage class specified for parameter 'ppc4xx_edac_remove' ... drivers/edac/ppc4xx_edac.c:1068:19: error: 'match' undeclared (first use in this function) drivers/edac/ppc4xx_edac.c:1068:19: note: each undeclared identifier is reported only once for each function it appears in drivers/edac/ppc4xx_edac.c:1068:36: warning: left-hand operand of comma expression has no effect [-Wunused-value] Acked-by: Josh Boyer <jwboyer@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:19:44 -03:00
Mauro Carvalho Chehab	5b889e379f	Fix sb_edac compilation with 32 bits kernels As reported by Josh Boyer <jwboyer@redhat.com>: > drivers/edac/sb_edac.c: In function 'get_memory_error_data': > drivers/edac/sb_edac.c:861:2: warning: left shift count >= width of type > [enabled by default] > <snip> > ERROR: "__udivdi3" [drivers/edac/sb_edac.ko] undefined! > make[1]: * [__modpost] Error 1 > make: * [modules] Error 2 PS.: compile-tested only Reported-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>	2012-03-21 15:19:38 -03:00
Cong Wang	4e5df7ca30	edac: remove the second argument of k[un]map_atomic() Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:17 +08:00
Borislav Petkov	ebe2aea868	MCE, AMD: Constify error tables ... so that checkpatch can chill out. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>	2012-03-19 12:06:26 +01:00
Borislav Petkov	ae615b4b5f	MCE, AMD: Correct bank 5 error signatures ... and remove superfluous ErrorCodeExt check. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>	2012-03-19 12:06:26 +01:00
Borislav Petkov	68782673e6	MCE, AMD: Rework NB MCE signatures Correct their formulation, replace per-family functions with a single, unified lookup table. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>	2012-03-19 12:06:25 +01:00
Borislav Petkov	b64a99c175	MCE, AMD: Correct VB data error description Sync with latest BKDG error types. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>	2012-03-19 12:06:25 +01:00
Borislav Petkov	6c1173a61e	MCE, AMD: Correct ucode patch buffer description This MC1 error signature is called differently now, fix it. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>	2012-03-19 12:06:24 +01:00
Borislav Petkov	344f0a0631	MCE, AMD: Correct some MC0 error types Use "System Read Data Error" as a more general name for MC0 bus errors on F15h and update some error definitions. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>	2012-03-19 12:06:23 +01:00
Lionel Debroux	36c46f31df	EDAC: Make pci_device_id tables __devinitconst. These const tables are currently marked __devinitdata, but Documentation/PCI/pci.txt says: "o The ID table array should be marked __devinitconst; this is done automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()." So use DEFINE_PCI_DEVICE_TABLE(x). Based on PaX and earlier work by Andi Kleen. Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-03-19 12:04:54 +01:00
Borislav Petkov	5e8e19bf6c	EDAC: Correct scrub rate API The original scrub rate API definition states that if scrub rate accessors are not implemented, a negative value (-1) should be written to the sysfs file (/sys/devices/system/edac/mc/mc<N>/sdram_scrub_rate, where N is the memory controller number on the system). This is counter-intuitive and awkward at the very least because, when setting the scrub rate, userspace has to write to sysfs and then read it back to check error status of the operation. As Tony notes, best it would be to not have the sdram_scrub_rate in sysfs if scrub rate support is not implemented. It is too late about that and a bunch of drivers on a bunch of arches would need to be changed and tested which is not a trivial task ATM. Instead, settle for the next best thing of returning -ENODEV when implementation is missing and -EINVAL when there was an error encountered while setting the scrub rate. Reported-by: Han Pingtian <phan@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20110916105856.GA13253@hpt.nay.redhat.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-03-19 12:03:58 +01:00
Borislav Petkov	11b0a31473	amd64_edac: Fix K8 revD and later chip select sizes Fix DRAM chip select sizes calculation for K8, revisions D and E. Reported-by: Niklas Söderlund <niklas.soderlund@ericsson.com Link: http://lkml.kernel.org/r/1320849178-23340-1-git-send-email-niklas.soderlund@ericsson.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-03-19 12:02:46 +01:00
Ashish Shenoy	f92cae4526	amd64_edac: Fix missing csrows sysfs nodes While initializing the array of csrow attribute instances, a few csrows were uninitialized. This happened because the module only performed a check for DRAM base ctl register0's and not DRAM base ctl register1's chip select enable bit. There could be systems with DIMMs populated on only single memory channel whereas the module also assumed that a dual channel dimm had double the memory size of a single memory channel instead of checking the memory on each channel. This patch fixes these above issues. Signed-off-by: Ashish Shenoy <ashenoy@riverbed.com> Signed-off-by: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com> Link: http://lkml.kernel.org/r/4F459CFA.5090604@riverbed.com Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2012-03-19 11:57:28 +01:00

... 3 4 5 6 7 ...

1139 Commits