Commit Graph

1065 Commits

Author SHA1 Message Date
Borislav Petkov
b487c33e55 amd64_edac: Fix node id signedness
A node id can never be negative since we use it as an index into
the DRAM ranges array. This also makes one of the BUG_ON conditions
redundant.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:28 +01:00
Borislav Petkov
d88977a9c4 amd64_edac: Drop redundant declarations
Those were moved to the mce_amd.h header.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:27 +01:00
Borislav Petkov
df71a05324 amd64_edac: Enable driver on F15h
Add the PCI device ids required for driver registration. Remove
pvt->ctl_name and use the family descriptor directly, instead. Then,
bump driver version and fixup its format. Finally, enable DRAM ECC
decoding on F15h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:26 +01:00
Borislav Petkov
a3b7db09a6 amd64_edac: Adjust ECC symbol size to F15h
F15h has the same ECC symbol size options as F10h revD and later so
adjust checks to that. Simplify code a bit.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:26 +01:00
Borislav Petkov
87b3e0e6e4 amd64_edac: Simplify scrubrate setting
Drop per-instance variable and compute min scrubrate dynamically.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:25 +01:00
Borislav Petkov
41d8bfaba7 amd64_edac: Improve DRAM address mapping
Drop static tables which map the bits in F2x80 to a chip select size in
favor of functions doing the mapping with some bit fiddling. Also, add
F15 support.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:24 +01:00
Borislav Petkov
5a5d237169 amd64_edac: Sanitize ->read_dram_ctl_register
This function is relevant for F10h and higher, and it has only one
callsite so drop its function pointer from the low_ops struct.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:23 +01:00
Borislav Petkov
b15f0fcab1 amd64_edac: Adjust sys_addr to chip select conversion routine to F15h
F15h sys_addr to chip select mapping is almost identical to F10h's so
reuse that. Rename functions on that path accordingly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:22 +01:00
Borislav Petkov
355fba6005 amd64_edac: Beef up early exit reporting
Add paranoid checks for the sys address before going off and decoding
it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:22 +01:00
Borislav Petkov
614ec9d853 amd64_edac: Revamp online spare handling
Replace per-DCT macros with smarter ones, drop hack and look for the
spare rank on all chip selects on a channel.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:21 +01:00
Borislav Petkov
5d4b58e84a amd64_edac: Fix channel interleave removal
Remove the channel interleave select bit properly. See
F2x110[DctSelIntLvAddr] for details.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:21 +01:00
Borislav Petkov
e2f79dbdfb amd64_edac: Correct node interleaving removal
When node interleaving is enabled, a subset of the addr[14:12] bits has
to be removed in order to get the normalized DCT address of the DRAM
channel. The actual number of bits to remove is determined by F1x[1,
0][7C:40][IntlvEn]. Do this correctly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:20 +01:00
Borislav Petkov
95b0ef55cd amd64_edac: Add support for interleaved region swapping
On revC3 and revE Fam10h machines and later, non-interleaved graphics
framebuffer memory under the 16G mark can be swapped with a region
located at the bottom of memory so that the GPU can use the interleaved
region and thus two channels. Add support for that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:20 +01:00
Borislav Petkov
700466249f amd64_edac: Unify get_error_address
The address bits from MC4_STATUS differ only between K8 and the rest so
no need for a per-family method.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov
f192c7b16c amd64_edac: Simplify decoding path
Use the struct mce directly instead of copying from it into a custom
struct err_regs.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov
7d20d14da1 amd64_edac: Adjust channel counting to F15h
The only difference is that F10h used to sport ganged DCTs and F15h
doesn't so adjust the F10h routine and reuse it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov
5980bb9cd8 amd64_edac: Cleanup old defines cruft
Remove unused defines, drop family names from define names.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:18 +01:00
Borislav Petkov
bcd781f46a amd64_edac: Cleanup NBSH cruft
Remove reporting of errors with UC bit set - this is done by the MCE
decoding code anyway and this driver deals with DRAM ECC errors only. UC
(NB uncorrectable error) doesn't necessarily mean it is a DRAM error.
Remove unused macros while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:17 +01:00
Borislav Petkov
a97fa68ec4 amd64_edac: Cleanup NBCFG handling
The fact whether we are chipkill capable or not does not have any
bearing when computing the channel index on a ganged DCT configuration
so remove that. Also, simplify debug statements. Finally, remove old
error injection leftovers, while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:16 +01:00
Borislav Petkov
c9f4f26eae amd64_edac: Cleanup NBCTL code
Remove family names from macro names, drop single bit defines and
comment their meaning instead.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:15 +01:00
Borislav Petkov
78da121e15 amd64_edac: Cleanup DCT Select Low/High code
Shorten macro names, remove family name from macros, fix macro
arguments, shorten debug strings.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:15 +01:00
Borislav Petkov
cb32850744 amd64_edac: Cleanup Dram Configuration registers handling
* Restrict DCT ganged mode check since only Fam10h supports it
* Adjust DRAM type detection for BD since it only supports DDR3
* Remove second and thus unneeded DCLR read in k8_early_channel_count() - we do
  that in read_mc_regs()
* Cleanup comments and remove family names from register macros
* Remove unused defines

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov
525a1b20a6 amd64_edac: Cleanup DBAM handling
Do not read DBAM regs twice and simplify code around them.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov
f678b8ccce amd64_edac: Replace huge bitmasks with a macro
Replace hard to read hex constants with a continuous masks macro.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov
c8e518d567 amd64_edac: Sanitize f10_get_base_addr_offset
This function maps the system address to the normalized DCT address.
Document what the code does for more clarity and wrap insane bitmasks in
a more understandable macro which generates them. Also, reduce number of
arguments passed to the function. Finally, rename this function to what
it actually does.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:13 +01:00
Borislav Petkov
229a7a11ac amd64_edac: Sanitize channel extraction
Cleanup and simplify f10_determine_channel(); make it more readable.
Also drop f10_map_intlv_en_to_shift() in favor of simply counting the
bits in F1x124[DramIntlvEn] which is equivalent.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:13 +01:00
Borislav Petkov
11c75eadaf amd64_edac: Cleanup chipselect handling
Add a struct representing the DRAM chip select base/limit register
pairs. Concentrate all CS handling in a single function. Also, add CS
looping macros for cleaner, more readable code. While at it, adjust code
to F15h. Finally, do smaller macro names cleanups (remove family names
from register macros) and debug messages clarification.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:12 +01:00
Borislav Petkov
bc21fa5787 amd64_edac: Cleanup DHAR handling
Adjust to F15h, simplify code, fixup macros.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:12 +01:00
Borislav Petkov
7f19bf755c amd64_edac: Remove DRAM base/limit subfields caching
Add a struct representing the DRAM base/limit range pairs and remove all
cached subfields. Replace them with accessor functions, which actually
saves us some space:

   text    data     bss     dec     hex filename
  14712    1577     336   16625    40f1 drivers/edac/amd64_edac_mod.o.after
  14831    1609     336   16776    4188 drivers/edac/amd64_edac_mod.o.before

Also, it simplifies the code a lot allowing to merge the K8 and F10h
routines.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:11 +01:00
Borislav Petkov
b2b0c60543 amd64_edac: Add support for F15h DCT PCI config accesses
F15h "multiplexes" between the configuration space of the two DRAM
controllers by toggling D18F1x10C[DctCfgSel] while F10h has a different
set of registers for DCT0, and DCT1 in extended PCI config space.

Add DCT configuration space accessors per family thus wrapping all the
different access prerequisites. Clean up code while at it, shorten
names.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:11 +01:00
Borislav Petkov
b6a280bb96 EDAC: Shut up sysfs registration debug code
Raise the debug level of these routines so that their output get issued
out only when the highest debug level is selected. Otherwise, don't
pollute driver debug output.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:10 +01:00
Chris Metcalf
5c77075548 drivers/edac: provide support for tile architecture
Add tile support for the EDAC driver, which provides unified system
error (memory, PCI, etc.) reporting. For now, the TILEPro port
reports memory correctable error (CE) only.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-03-10 13:30:14 -05:00
Grant Likely
000061245a dt/powerpc: Eliminate users of of_platform_{,un}register_driver
Get rid of old users of of_platform_driver in arch/powerpc.  Most
of_platform_driver users can be converted to use the platform_bus
directly.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-02-28 01:36:39 -07:00
Arvind R
cb60a42269 edac: correct i82975x error-info reported
to edac-core

fix the totally wrong info w.r.t page,row,dimm-label previously reported to
edac-core by i82975x driver

Signed-off-by: Arvind R. <arvino55@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-02-17 16:47:04 +01:00
Arvind R
da95b3d21f edac: correct i82975x mci initialisation
corrected mtype, and added dev_name,scrubmode initialisers
in i82975x struct mem_ctl initialisation

Signed-off-by: Arvind R. <arvino55@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-02-17 16:46:22 +01:00
Arvind R
7ba9957581 edac: correct commented info
wrong comments in i82975x driver corrected

Signed-off-by: Arvind R. <arvino55@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-02-17 16:44:45 +01:00
Jiri Kosina
0a9d59a246 Merge branch 'master' into for-next 2011-02-15 10:24:31 +01:00
Borislav Petkov
4d7963648f amd64_edac: Fix DIMMs per DCTs output
amd64_debug_display_dimm_sizes() reports the distribution of the DIMMs
on each DRAM controller and its chip select sizes. Thus, the last don't
have anything to do with whether we're running in ganged DCT mode or not
- their sizes don't change all of a sudden. Fix that by removing the
ganged-check and dump DCT0's config for DCT1 when in ganged mode since
they're identical.

Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-02-10 14:41:49 +01:00
Arvind R
25527885e3 edac: i82975x author/maintainer email address change
edac-i82975x author/maintainer email address change

Signed-off-by: Arvind R. <arvino55@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-01-24 16:17:51 +01:00
Jesper Juhl
42b16b3fbb Kill off warning: ‘inline’ is not at beginning of declaration
Fix a bunch of
	warning: ‘inline’ is not at beginning of declaration
messages when building a 'make allyesconfig' kernel with -Wextra.

These warnings are trivial to kill, yet rather annoying when building with
-Wextra.
The more we can cut down on pointless crap like this the better (IMHO).

A previous patch to do this for a 'allnoconfig' build has already been
merged. This just takes the cleanup a little further.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-01-19 15:43:08 +01:00
Linus Torvalds
008d23e485 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
  writeback: fix global_dirty_limits comment runtime -> real-time
  ppc: fix comment typo singal -> signal
  drivers: fix comment typo diable -> disable.
  m68k: fix comment typo diable -> disable.
  wireless: comment typo fix diable -> disable.
  media: comment typo fix diable -> disable.
  remove doc for obsolete dynamic-printk kernel-parameter
  remove extraneous 'is' from Documentation/iostats.txt
  Fix spelling milisec -> ms in snd_ps3 module parameter description
  Fix spelling mistakes in comments
  Revert conflicting V4L changes
  i7core_edac: fix typos in comments
  mm/rmap.c: fix comment
  sound, ca0106: Fix assignment to 'channel'.
  hrtimer: fix a typo in comment
  init/Kconfig: fix typo
  anon_inodes: fix wrong function name in comment
  fix comment typos concerning "consistent"
  poll: fix a typo in comment
  ...

Fix up trivial conflicts in:
 - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
 - fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13 10:05:56 -08:00
Linus Torvalds
128283a47e Merge branch 'mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, MCE: Fix NB error formatting
  EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit
  EDAC, MCE: Enable MCE decoding on F15h
  EDAC, MCE: Allow F15h bank 6 MCE injection
  EDAC, MCE: Shorten error report formatting
  EDAC, MCE: Overhaul error fields extraction macros
  EDAC, MCE: Add F15h FP MCE decoder
  EDAC, MCE: Add F15 EX MCE decoder
  EDAC, MCE: Add an F15h NB MCE decoder
  EDAC, MCE: No F15h LS MCE decoder
  EDAC, MCE: Add F15h CU MCE decoder
  EDAC, MCE: Add F15h IC MCE decoder
  EDAC, MCE: Add F15h DC MCE decoder
  EDAC, MCE: Select extended error code mask
2011-01-07 14:54:03 -08:00
Borislav Petkov
6d5db46687 EDAC, MCE: Fix NB error formatting
Minor formatting fixup since the information which core was associated
with the MCE is not always valid.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:26 +01:00
Randy Dunlap
50adbbd8a8 EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit
Building for X86_32 produces shift count warnings, so use BIT_64() to
eliminate the warnings.

drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
drivers/edac/mce_amd.c:778: warning: left shift count >= width of type

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: bluesmoke-devel@lists.sourceforge.net
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:25 +01:00
Borislav Petkov
bad11e0318 EDAC, MCE: Enable MCE decoding on F15h
Now that everything is inplace, enable MCE decoding on F15h. Make
initcall routine a bit more readable.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:24 +01:00
Borislav Petkov
1b07ca47ff EDAC, MCE: Allow F15h bank 6 MCE injection
F15h adds a sixth MCE bank: adjust bank number check in the injection
code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:23 +01:00
Borislav Petkov
fa7ae8cc8c EDAC, MCE: Shorten error report formatting
Shorten up MCi_STATUS flags and add BD's new deferred and poison types.
Also, simplify formatting.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:22 +01:00
Borislav Petkov
6245288232 EDAC, MCE: Overhaul error fields extraction macros
Make macro names shorter thus making code shorter and more clear.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:21 +01:00
Borislav Petkov
b8f85c477b EDAC, MCE: Add F15h FP MCE decoder
Add decoder for FP MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:20 +01:00
Borislav Petkov
8259a7e572 EDAC, MCE: Add F15 EX MCE decoder
Integrate the single FIROB signature into an expanded table along with
the new BD MCE types.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:19 +01:00
Borislav Petkov
05cd667d66 EDAC, MCE: Add an F15h NB MCE decoder
by (almost) reusing the F10h one since the signatures are the same.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:18 +01:00
Borislav Petkov
b18434cad1 EDAC, MCE: No F15h LS MCE decoder
F15h BD doesn't generate LS MCEs so warn about it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:17 +01:00
Borislav Petkov
70fdb494aa EDAC, MCE: Add F15h CU MCE decoder
MCE bank 2 is redefined from a BU to a CU (Combined Unit) bank on F15h.
Add a decoder function for CU MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:16 +01:00
Borislav Petkov
86039cd401 EDAC, MCE: Add F15h IC MCE decoder
Add support for decoding F15h IC MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:15 +01:00
Borislav Petkov
25a4f8b059 EDAC, MCE: Add F15h DC MCE decoder
Add a decoder for F15h DC MCEs to support the new types of DC MCEs
introduced by the BD microarchitecture.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:14 +01:00
Borislav Petkov
2be64bfac7 EDAC, MCE: Select extended error code mask
F15h enlarges the extended error code of an MCE to a 5-bit field
(MCi_STATUS[20:16]). Add a mask variable which default 0xf is overridden
on F15h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:12 +01:00
Borislav Petkov
a135cef79a amd64_edac: Disable DRAM ECC injection on K8
K8 does not allow for an atomic RMW to a cacheline as F10h does so
disable the error injection interface for it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:38:46 +01:00
Borislav Petkov
390944439f EDAC: Fixup scrubrate manipulation
Make the ->{get|set}_sdram_scrub_rate return the actual scrub rate
bandwidth it succeeded setting and remove superfluous arg pointer used
for that. A negative value returned still means that an error occurred
while setting the scrubrate. Document this for future reference.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:38:31 +01:00
Borislav Petkov
360b7f3c60 amd64_edac: Remove two-stage initialization
Now that all prerequisites are in place, drop the two-stage driver
instances initialization in favor of the following simple init sequence:

1. Probe PCI device: we only test ECC capabilities here and if none exit
early.

2. If the hw supports ECC and it is/can be enabled, we init the per-node
instance.

Remove "amd64_" prefix from static functions touched, while at it.

There actually should be no visible functional change resulting from
this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:03 +01:00
Borislav Petkov
2299ef7114 amd64_edac: Check ECC capabilities initially
Rework the code to check the hardware ECC capabilities at PCI probing
time. We do all further initialization only if we actually can/have ECC
enabled.

While at it:
0. Fix function naming.
1. Simplify/clarify debug output.
2. Remove amd64_ prefix from the static functions
3. Reorganize code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:02 +01:00
Borislav Petkov
ae7bb7c679 amd64_edac: Carve out ECC-related hw settings
This is in preparation for the init path reorganization where we want
only to

1) test whether a particular node supports ECC
2) can it be enabled

and only then do the necessary allocation/initialization. For that,
we need to decouple the ECC settings of the node from the instance's
descriptor.

The should be no functional change introduced by this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:00 +01:00
Borislav Petkov
f1db274e1b amd64_edac: Remove PCI ECS enabling functions
PCI ECS is being enabled by default since 2.6.26 on AMD so this code is
just superfluous now, remove it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:59 +01:00
Borislav Petkov
027dbd6f5d amd64_edac: Remove explicit Kconfig PCI dependency
AMD_NB pulls in the dependency on PCI. Clarify/fix help text while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:58 +01:00
Borislav Petkov
cc4d8860fc amd64_edac: Allocate driver instances dynamically
Remove static allocation in favor of dynamically allocating space for as
many driver instances as northbridges present on the system.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:57 +01:00
Borislav Petkov
24f9a7fe3f amd64_edac: Rework printk macros
Add a macro per printk level, shorten up error messages. Add relevant
information to KERN_INFO level. No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:56 +01:00
Borislav Petkov
8d5b5d9c7b amd64_edac: Rename CPU PCI devices
Rename variables representing PCI devices to their BKDG names for faster
search and shorter, clearer code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:54 +01:00
Borislav Petkov
b8cfa02f83 amd64_edac: Concentrate per-family init even more
Move the remaining per-family init code into the proper place and
simplify the rest of the initialization. Reorganize error handling in
amd64_init_one_instance().

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:53 +01:00
Borislav Petkov
bbd0c1f675 amd64_edac: Cleanup the CPU PCI device reservation
Shorten code and clarify comments, return proper -E* values on error.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:52 +01:00
Borislav Petkov
0092b20d4c amd64_edac: Simplify CPU family detection
Concentrate CPU family detection in the per-family init function.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:51 +01:00
Borislav Petkov
395ae783b3 amd64_edac: Add per-family init function
Run a per-family init function which does all the settings based on
the family this driver instance is running on. Move the scrubrate
calculation in it and simplify code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:50 +01:00
Borislav Petkov
9f56da0e3c amd64_edac: Use cached extended CPU model
... instead of computing it needlessly again.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:49 +01:00
Borislav Petkov
3ab0e7dc2e amd64_edac: Remove F11h support
F11h doesn't support DRAM ECC so whack it away.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:47 +01:00
Linus Torvalds
42cbd8efb0 Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cacheinfo: Cleanup L3 cache index disable support
  x86, amd-nb: Cleanup AMD northbridge caching code
  x86, amd-nb: Complete the rename of AMD NB and related code
2011-01-06 10:50:28 -08:00
David Sterba
e7bf068aa3 i7core_edac: fix typos in comments
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-12-28 01:20:51 +01:00
Jiri Kosina
4b7bd36470 Merge branch 'master' into for-next
Conflicts:
	MAINTAINERS
	arch/arm/mach-omap2/pm24xx.c
	drivers/scsi/bfa/bfa_fcpim.c

Needed to update to apply fixes for which the old branch was too
outdated.
2010-12-22 18:57:02 +01:00
Borislav Petkov
e726f3c368 amd64_edac: Fix interleaving check
When matching error address to the range contained by one memory node,
we're in valid range when node interleaving

1. is disabled, or
2. enabled and when the address bits we interleave on match the
interleave selector on this node (see the "Node Interleaving" section in
the BKDG for an enlightening example).

Thus, when we early-exit, we need to reverse the compound logic
statement properly.

Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-12-08 19:52:54 +01:00
Andrei Konovalov
76f04f2591 EDAC: Correct MiB_TO_PAGES() macro
This corrects the misprint introduced when moving '#if
PAGE_SHIFT' from i7core_edac.c to edac_core.h (commit
e9144601d3)

Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Andrei Konovalov <akonovalov@mvista.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-12-08 19:52:53 +01:00
Borislav Petkov
bb31b3122c EDAC: Fix workqueue-related crashes
00740c5854 changed edac_core to
un-/register a workqueue item only if a lowlevel driver supplies a
polling routine. Normally, when we remove a polling low-level driver, we
go and cancel all the queued work. However, the workqueue unreg happens
based on the ->op_state setting, and edac_mc_del_mc() sets this to
OP_OFFLINE _before_ we cancel the work item, leading to NULL ptr oops on
the workqueue list.

Fix it by putting the unreg stuff in proper order.

Cc: <stable@kernel.org> #36.x
Reported-and-tested-by: Tobias Karnat <tobias.karnat@googlemail.com>
LKML-Reference: <1291201307.3029.21.camel@Tobias-Karnat>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-12-08 19:52:27 +01:00
Axel Lin
df4b2a30e0 EDAC, MCE: Fix edac_init_mce_inject error handling
Otherwise, variable i will be -1 inside the latest iteration of the
while loop.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-22 15:35:32 +01:00
Tracey Dent
f570e1dd84 EDAC: Remove deprecated kbuild goal definitions
Change EDAC's Makefile to use <modules>-y instead of
<modules>-objs because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

 [bp: Fixup commit message]
 [bp: Fixup indentation]

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-22 15:35:31 +01:00
Hans Rosenfeld
9653a5c76c x86, amd-nb: Cleanup AMD northbridge caching code
Support more than just the "Misc Control" part of the northbridges.
Support more flags by turning "gart_supported" into a single bit flag
that is stored in a flags member. Clean up related code by using a set
of functions (amd_nb_num(), amd_nb_has_feature() and node_to_amd_nb())
instead of accessing the NB data structures directly. Reorder the
initialization code and put the GART flush words caching in a separate
function.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:05 +01:00
Hans Rosenfeld
eec1d4fa00 x86, amd-nb: Complete the rename of AMD NB and related code
Not only the naming of the files was confusing, it was even more so for
the function and variable names.

Renamed the K8 NB and NUMA stuff that is also used on other AMD
platforms. This also renames the CONFIG_K8_NUMA option to
CONFIG_AMD_NUMA and the related file k8topology_64.c to
amdtopology_64.c. No functional changes intended.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:04 +01:00
Uwe Kleine-König
b595076a18 tree-wide: fix comment/printk typos
"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-11-01 15:38:34 -04:00
Linus Torvalds
da62aa69c1 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (34 commits)
  i7core_edac: return -ENODEV when devices were already probed
  i7core_edac: properly terminate pci_dev_table
  i7core_edac: Avoid PCI refcount to reach zero on successive load/reload
  i7core_edac: Fix refcount error at PCI devices
  i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL
  i7core_edac: Fix an oops at i7core probe
  i7core_edac: Remove unused member channels in i7core_pvt
  i7core_edac: Remove unused arg csrow from get_dimm_config
  i7core_edac: Reduce args of i7core_register_mci
  i7core_edac: Introduce i7core_unregister_mci
  i7core_edac: Use saved pointers
  i7core_edac: Check probe counter in i7core_remove
  i7core_edac: Call pci_dev_put() when alloc_i7core_dev()  failed
  i7core_edac: Fix error path of i7core_register_mci
  i7core_edac: Fix order of lines in i7core_register_mci
  i7core_edac: Always do get/put for all devices
  i7core_edac: Introduce i7core_pci_ctl_create/release
  i7core_edac: Introduce free_i7core_dev
  i7core_edac: Introduce alloc_i7core_dev
  i7core_edac: Reduce args of i7core_get_onedevice
  ...
2010-10-26 10:13:48 -07:00
Linus Torvalds
229aebb873 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  Update broken web addresses in arch directory.
  Update broken web addresses in the kernel.
  Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
  Revert "Fix typo: configuation => configuration" partially
  ida: document IDA_BITMAP_LONGS calculation
  ext2: fix a typo on comment in ext2/inode.c
  drivers/scsi: Remove unnecessary casts of private_data
  drivers/s390: Remove unnecessary casts of private_data
  net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
  drivers/infiniband: Remove unnecessary casts of private_data
  drivers/gpu/drm: Remove unnecessary casts of private_data
  kernel/pm_qos_params.c: Remove unnecessary casts of private_data
  fs/ecryptfs: Remove unnecessary casts of private_data
  fs/seq_file.c: Remove unnecessary casts of private_data
  arm: uengine.c: remove C99 comments
  arm: scoop.c: remove C99 comments
  Fix typo configue => configure in comments
  Fix typo: configuation => configuration
  Fix typo interrest[ing|ed] => interest[ing|ed]
  Fix various typos of valid in comments
  ...

Fix up trivial conflicts in:
	drivers/char/ipmi/ipmi_si_intf.c
	drivers/usb/gadget/rndis.c
	net/irda/irnet/irnet_ppp.c
2010-10-24 13:41:39 -07:00
Linus Torvalds
8de547e182 Merge branch 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/edac
* 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/edac: (25 commits)
  i7300_edac: Properly initialize per-csrow memory size
  V4L/DVB: i7300_edac: better initialize page counts
  MAINTAINERS: Add maintainer for i7300-edac driver
  i7300-edac: CodingStyle cleanup
  i7300_edac: Improve comments
  i7300_edac: Cleanup: reorganize the file contents
  i7300_edac: Properly detect channel on CE errors
  i7300_edac: enrich FBD error info for corrected errors
  i7300_edac: enrich FBD error info for fatal errors
  i7300_edac: pre-allocate a buffer used to prepare err messages
  i7300_edac: Fix MTR x4/x8 detection logic
  i7300_edac: Make the debug messages coherent with the others
  i7300_edac: Cleanup: remove get_error_info logic
  i7300_edac: Add a code to cleanup error registers
  i7300_edac: Add support for reporting FBD errors
  i7300_edac: Properly detect the type of error correction
  i7300_edac: Detect if the device is on single mode
  i7300_edac: Adds detection for enhanced scrub mode on x8
  i7300_edac: Clear the error bit after reading
  i7300_edac: Add error detection code for global errors
  ...
2010-10-24 13:06:57 -07:00
Mauro Carvalho Chehab
76a7bd8113 i7core_edac: return -ENODEV when devices were already probed
Due to the nature of i7core, we need to probe and attach all PCI
devices used by this driver during the first time probe is called.
However, PCI core will call the probe routine one time for each CPU
socket. If we return -EINVAL to those calls, it would seem that the
driver fails, when, in fact, there's no more devices left to initialize.

Changing the return code to -ENODEV solves this issue.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:36:19 -02:00
Mauro Carvalho Chehab
3c52cc57cc i7core_edac: properly terminate pci_dev_table
At pci_xeon_fixup(), it waits for a null-terminated table, while at
i7core_get_all_devices, it just do a for 0..ARRAY_SIZE. As other tables
are zero-terminated, change it to be terminate with 0 as well, and fixes
a bug where it may be running out of the table elements.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:31:50 -02:00
Mauro Carvalho Chehab
a3e1541637 i7core_edac: Avoid PCI refcount to reach zero on successive load/reload
That's a nasty bug that took me a lot of time to track, and whose
solution took just one line to solve. The best fragrances and the worse
poisons are shipped on the smalest bottles.

The drivers/pci/quick.c implements the pci_get_device function. The normal
behavior is that you call it, the function returns you a pdev pointer
and increment pdev->kobj.kref.refcount of the pci device. However,
if you want to keep searching an object, you need to pass the previous
pdev function to the search.

When you use a not null pointer to pdev "from" field, pci_get_device
will decrement pdev->kobj.kref.refcount, assuming that the driver won't
be using the previous pdev.

The solution is simple: we just need to call pci_dev_get() manually,
for the pdev's that the driver will actually use.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Mauro Carvalho Chehab
79daef2099 i7core_edac: Fix refcount error at PCI devices
Probably due to a bug or some testing logic at PCI level, device
refcount for <bus>:00.0 device is decremented at the end of the
pci_get_device, made by i7core_get_all_devices(). The fact is that
the first versions of the driver relied on those devices to probe
for Nehalem, but the current versions don't use it at all.

So, let's just remove those devices from the driver, making it simpler
and fixing the bug.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Mauro Carvalho Chehab
88ef5ea976 i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL
i7core_unregister_mci() checks internally when mci=NULL. There's no
need to test it outside.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Mauro Carvalho Chehab
6d37d240f2 i7core_edac: Fix an oops at i7core probe
changeset c91d57ba9ce5b5c93a7077e2f72510eb1f9131c4 moved the init
of the priv pointer to the end of the probe routine. However, we need
them before that, otherwise, we hit an OOPS:

[   67.743453] EDAC DEBUG: mci_bind_devs: Associated fn 0.0, dev = ffff88011b46e000, socket 0
[   67.751861] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[   67.759685] IP: [<ffffffffa017e484>] i7core_probe+0x979/0x130c [i7core_edac]
[   67.766721] PGD 10bd38067 PUD 10bd37067 PMD 0
[   67.771178] Oops: 0000 [#1] SMP
[   67.774414] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[   67.782213] CPU 1
[   67.784042] Modules linked in: i7core_edac(+) edac_core cpufreq_ondemand binfmt_misc dm_multipath video output pci_slot snd_hda_codd

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Hidetoshi Seto
21b6806a8c i7core_edac: Remove unused member channels in i7core_pvt
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Hidetoshi Seto
2e5185f7ff i7core_edac: Remove unused arg csrow from get_dimm_config
A local is enough.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Hidetoshi Seto
aace42831a i7core_edac: Reduce args of i7core_register_mci
We can check the number of channels in i7core_register_mci.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
1c6edbbe25 i7core_edac: Introduce i7core_unregister_mci
In i7core_probe, when setup of mci for 2nd or later socket failed,
we should cleanup prepared mci for 1st socket or so before "put" of
all devices.

So let have i7core_unregister_mci that can be shared between here
and i7core_remove.

While here fix a typo "hanler".

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
73589c80cd i7core_edac: Use saved pointers
We already have saved pointers.  Use shorter ones.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
71fe01706d i7core_edac: Check probe counter in i7core_remove
Prevent i7core_remove from running multiple times.
Otherwise value proved will be negative and something will be wrong.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
2896637b86 i7core_edac: Call pci_dev_put() when alloc_i7core_dev() failed
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
628c5ddfb0 i7core_edac: Fix error path of i7core_register_mci
Release resources properly.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
5939813b9c i7core_edac: Fix order of lines in i7core_register_mci
The flag is_registered is not initialized until mci_bind_devs()
is called.  Refer it properly.

The mci->dev and mci->edac_check is required in edac_mc_add_mc(),
so prepare them just before the call.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
64c10f6e0e i7core_edac: Always do get/put for all devices
We already do 'get' for all sockets at once. So do 'put' in the
same way.

And let args of the 'get' function to void since it handles
only the single, static and known size table pci_dev_table[].

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
a3aa0a4ab5 i7core_edac: Introduce i7core_pci_ctl_create/release
Have a couple of method.
while here sort out lines in the i7core_register_mci() a bit.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
2aa9be448d i7core_edac: Introduce free_i7core_dev
Have a method to make a couple with alloc_i7core_dev() previously
introduced.  Using in pair will help proper resource handling.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
848b2f7ed6 i7core_edac: Introduce alloc_i7core_dev
It's nice to have a method for a single purpose.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Hidetoshi Seto
b197cba071 i7core_edac: Reduce args of i7core_get_onedevice
Since we need to pass the index of the entry, pass the table itself
instead of passing individual members of the table.

While here make it static.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Hidetoshi Seto
45b7c981ae i7core_edac: Fix the logic in i7core_remove()
commit 47251b4d960bdfa648b0d06dbc6d445f41cb3906 have changed
the logic for unexplained reasons.  It looks strange that it
can release i7core_dev without calling i7core_put_devices()
that releases i7core_dev->pdev.

Fix the part.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
54a08ab153 i7core_edac: Don't do the legacy PCI probe by default
The legacy PCI probe sometimes cause hangs. Better to have it
disabled by default, and have a parameter to enable it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
accf74fff3 i7core_edac: don't use a freed mci struct
This is a nasty bug. Since kobject count will be reduced by zero by
edac_mc_del_mc(), and this triggers the kobj release method, the
mci memory will be freed automatically. So, all we have left is ctl_name,
as shown by enabling debug:

[   80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device()  remove_link
[   80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device()  remove_mci_instance
[   80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[   80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
[   80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
[   80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
[   80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
[   80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()

Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
when edac_remove_sysfs_mci_device() is called, but the logic is:
	edac_remove_sysfs_mci_device(mci);
	edac_printk(KERN_INFO, EDAC_MC,
		"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
		mci->mod_name, mci->ctl_name, edac_dev_name(mci));
So, as the edac_printk() needs the mci struct, this generates an OOPS.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
bbc560ae67 edac_core: Print debug messages at release calls
This is important to track a nasty bug at the free logic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
ac99768c53 edac_core: Don't let free(mci) happen while using it
A very nasty bug were happening on edac core, due to the way mci objects are
freed. mci memory is freed when kobject count reaches zero, by
edac_mci_control_release(). However, from the logs, this is clearly happening
before the final usage of mci struct:

[15799.607454] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[15799.618773] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 769: edac_inst_grp_release()
[15799.627326] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 894: edac_remove_mci_instance_attributes() end of seeking for group all_channel_counts
[15799.640887] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 877: edac_remove_mci_instance_attributes() sysfs_attrib = ffffffffa01d7240
[15799.653412] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device()  remove_link
[15799.663753] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device()  remove_mci_instance

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
6fe1108f14 edac_core: Do a better job with node removal
Make sure we remove groups at the right order

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
39300e7143 i7core_edac: explicitly remove PCI devices from the devices list
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
41ba6c1058 i7core_edac: MCE NMI handling should stop first
Otherwise, a NMI may happen causing a race condition and a panic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
6ee7dd5044 i7core_edac: Initialize all priv vars before start polling
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
3cfd01468b i7core_edac: Improve debug to seek for register/remove errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
e9144601d3 i7core_edac: move #if PAGE_SHIFT to edac_core.h
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:36 -02:00
Mauro Carvalho Chehab
1288c18f48 i7core_edac: Properly mark const static vars as such
There are two groups of sysfs attributes: one for rdimm and another
for udimm. Instead of changing dynamically the unique static struct
for handling udimm's, declare two vars and make them constant.

This avoids the risk of having two or more memory controllers, each
needing a different set of attributes.

While here, use const on all places where it is applicable.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac_core: use const for constant sysfs arguments

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:14 -02:00
Mauro Carvalho Chehab
18c29002f9 i7core_edac: move static vars to the beginning of the file
While here, don't initialize probed with 0.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:12 -02:00
Mauro Carvalho Chehab
939747bd68 i7core_edac: Be sure that the edac pci handler will be properly released
With multi-sockets, more than one edac pci handler is enabled. Be sure to
un-register all instances.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:12 -02:00
Linus Torvalds
c029e405bd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (21 commits)
  EDAC, MCE: Fix shift warning on 32-bit
  EDAC, MCE: Add a BIT_64() macro
  EDAC, MCE: Enable MCE decoding on F12h
  EDAC, MCE: Add F12h NB MCE decoder
  EDAC, MCE: Add F12h IC MCE decoder
  EDAC, MCE: Add F12h DC MCE decoder
  EDAC, MCE: Add support for F11h MCEs
  EDAC, MCE: Enable MCE decoding on F14h
  EDAC, MCE: Fix FR MCEs decoding
  EDAC, MCE: Complete NB MCE decoders
  EDAC, MCE: Warn about LS MCEs on F14h
  EDAC, MCE: Adjust IC decoders to F14h
  EDAC, MCE: Adjust DC decoders to F14h
  EDAC, MCE: Rename files
  EDAC, MCE: Rework MCE injection
  EDAC: Export edac sysfs class to users.
  EDAC, MCE: Pass complete MCE info to decoders
  EDAC, MCE: Sanitize error codes
  EDAC, MCE: Remove unused function parameter
  EDAC, MCE: Add HW_ERR prefix
  ...
2010-10-21 14:04:58 -07:00
Linus Torvalds
2f0384e5fc Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, amd_nb: Enable GART support for AMD family 0x15 CPUs
  x86, amd: Use compute unit information to determine thread siblings
  x86, amd: Extract compute unit information for AMD CPUs
  x86, amd: Add support for CPUID topology extension of AMD CPUs
  x86, nmi: Support NMI watchdog on newer AMD CPU families
  x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs
  x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB
  x86, k8-gart: Decouple handling of garts and northbridges
  x86, cacheinfo: Fix dependency of AMD L3 CID
  x86, kvm: add new AMD SVM feature bits
  x86, cpu: Fix allowed CPUID bits for KVM guests
  x86, cpu: Update AMD CPUID feature bits
  x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit
  x86, AMD: Remove needless CPU family check (for L3 cache info)
  x86, tsc: Remove CPU frequency calibration on AMD
2010-10-21 13:01:08 -07:00
Borislav Petkov
525906bc89 EDAC, MCE: Fix shift warning on 32-bit
Fix

drivers/edac/mce_amd.c:262: warning: left shift count >= width of type

on 32-bit builds.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:07 +02:00
Borislav Petkov
cf1d2200db EDAC, MCE: Add a BIT_64() macro
Add a macro for 64-bit vectors to use when accessing MSR contents.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:06 +02:00
Borislav Petkov
fda7561f43 EDAC, MCE: Enable MCE decoding on F12h
Turn on MCE decoding on F12h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:06 +02:00
Borislav Petkov
cb9d5ecdff EDAC, MCE: Add F12h NB MCE decoder
F12h is completely covered by the generic path.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:05 +02:00
Borislav Petkov
e7281eb37d EDAC, MCE: Add F12h IC MCE decoder
... which is the same as for K8 and F10h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:05 +02:00
Borislav Petkov
9be0bb1072 EDAC, MCE: Add F12h DC MCE decoder
F12h DC MCE signatures are a subset of F10h's so reuse them.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:04 +02:00
Borislav Petkov
f0157b3afd EDAC, MCE: Add support for F11h MCEs
F11h has almost the same MCE signatures as K8 except DRAM ECC and MC5
bank errors. Reuse functionality from the other families.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:04 +02:00
Borislav Petkov
9530d608ef EDAC, MCE: Enable MCE decoding on F14h
Now that all decoders have been taught about F14h, models < 0x10
MCEs, enable decoding on this family of CPUs. Also, issue a short
informational message upon boot that MCE decoding gets enabled.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:03 +02:00
Borislav Petkov
fe4ea2623b EDAC, MCE: Fix FR MCEs decoding
Those are N/A on K8, so don't decode them there.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:03 +02:00
Borislav Petkov
5ce88f6ea6 EDAC, MCE: Complete NB MCE decoders
Add support for decoding F14h BU MCEs and improve decoding of the
remaining families.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:02 +02:00
Borislav Petkov
ded5062328 EDAC, MCE: Warn about LS MCEs on F14h
F14h CPUs do not generate LS MCEs so exit early and warn the user in
case this path is ever hit that something else might be going haywire.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:02 +02:00
Borislav Petkov
dd53bce4e8 EDAC, MCE: Adjust IC decoders to F14h
Add support for IC MCEs for F14h CPUs. K8 and F10h are almost identical
so use one function for both.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:01 +02:00
Borislav Petkov
888ab8e6eb EDAC, MCE: Adjust DC decoders to F14h
Add a per-family data cache decoders. Since there is a certain overlap
between the different DC MCE signatures, reuse functionality between the
families as far as possible.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:00 +02:00
Borislav Petkov
47ca08a40b EDAC, MCE: Rename files
Drop "edac_" string from the filenames since they're prefixed with edac/
in their pathname anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:00 +02:00
Borislav Petkov
9cdeb404a1 EDAC, MCE: Rework MCE injection
Add sysfs injection facilities for testing of the MCE decoding code.
Remove large parts of amd64_edac_dbg.c, as a result, which did only
NB MCE injection anyway and the new injection code supports that
functionality already.

Add an injection module so that MCE decoding code in production kernels
like those in RHEL and SLES can be tested.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:59 +02:00
Borislav Petkov
30e1f7a812 EDAC: Export edac sysfs class to users.
Move toplevel sysfs class to the stub and make it available to
non-modularized code too. Add proper refcounting of its users and move
the registration functionality into the reference counting routines.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:59 +02:00
Borislav Petkov
7cfd4a8744 EDAC, MCE: Pass complete MCE info to decoders
... instead of the MCi_STATUS info only for improved handling of certain
types of errors later.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:58 +02:00
Borislav Petkov
6337583d7d EDAC, MCE: Sanitize error codes
Clean up error codes names, shorten to mnemonics, add RRRR boundary
checking.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:58 +02:00
Borislav Petkov
0ee8efa8f4 EDAC, MCE: Remove unused function parameter
Remove remains from previous functionality.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:57 +02:00
Borislav Petkov
c9f281fd96 EDAC, MCE: Add HW_ERR prefix
.. so that the user knows what she's looking at there in dmesg. Also,
fix a minor cosmetic output inconsistency.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:57 +02:00
Borislav Petkov
ca755e0a49 EDAC: Fix error return
We should return a negative value when we cannot get the toplevel edac
sysfs class.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:56 +02:00
Justin P. Mattock
631dd1a885 Update broken web addresses in the kernel.
The patch below updates broken web addresses in the kernel

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Ben Pfaff <blp@cs.stanford.edu>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-18 11:03:14 +02:00
Marcin Slusarz
64aab720bd i7core_edac: fix panic in udimm sysfs attributes registration
Array of udimm sysfs attributes was not ended with NULL marker, leading to
dereference of random memory.

  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm0
  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm1
  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm2
  BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4
  IP: [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  Pid: 1, comm: swapper Not tainted 2.6.36-rc3-nv+ #483 P6T SE/System Product Name
  RIP: 0010:[<ffffffff81330b36>]  [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  (...)
  Call Trace:
   [<ffffffff81330b86>] edac_create_mci_instance_attributes+0x198/0x1f1
   [<ffffffff81330c9a>] edac_create_sysfs_mci_device+0xbb/0x2b2
   [<ffffffff8132f533>] edac_mc_add_mc+0x46b/0x557
   [<ffffffff81428901>] i7core_probe+0xccf/0xec0
  RIP  [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  ---[ end trace 20de320855b81d78 ]---
  Kernel panic - not syncing: Attempted to kill init!

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01 10:50:58 -07:00
Borislav Petkov
00740c5854 amd64_edac: Fix driver module removal
f4347553b3 removed the edac polling
mechanism in favor of using a notifier chain for conveying MCE
information to edac. However, the module removal path didn't test
whether the driver had setup the polling function workqueue at all and
the rmmod process was hanging in the kernel at try_to_del_timer_sync()
in the cancel_delayed_work() path, trying to cancel an uninitialized
work struct.

Fix that by adding a balancing check to the workqueue removal path.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-09-27 12:52:58 +02:00
Mauro Carvalho Chehab
e6649cc629 i7300_edac: Properly initialize per-csrow memory size
Due to the current edac-core limits, we cannot represent a per-channel
memory size, for FB-DIMM drivers. So, we need to sum-up all values
for each slot, in order to properly represent the total amount of
memory found by the i7300 driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-09-24 14:16:12 -03:00
Mauro Carvalho Chehab
1aa4a7b6b0 V4L/DVB: i7300_edac: better initialize page counts
It is still somewhat fake, as the pages may not be on this exact order,
and may even be used in mirror mode, but this is a best guess than the
other random fake values.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-09-24 14:16:12 -03:00
Andreas Herrmann
23ac4ae827 x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB
The file names are somehow misleading as the code is not specific to
AMD K8 CPUs anymore. The files accomodate code for other AMD CPU
northbridges as well.

Same is true for the config option which is valid for AMD CPU
northbridges in general and not specific to K8.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160343.GD4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-20 14:22:58 -07:00
Andreas Herrmann
900f9ac9f1 x86, k8-gart: Decouple handling of garts and northbridges
So far we only provide num_k8_northbridges. This is required in
different areas (e.g. L3 cache index disable, GART). But not all AMD
CPUs provide a GART. Thus it is useful to split off the GART handling
from the generic caching of AMD northbridge misc devices.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100917160254.GC4958@loge.amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-17 13:26:21 -07:00
Mauro Carvalho Chehab
9c6f6b65d2 i7300-edac: CodingStyle cleanup
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:06 -03:00
Mauro Carvalho Chehab
d091a6eb17 i7300_edac: Improve comments
This is basically a cleanup patch, improving the comments for each
function.

While here, do a few cleanups.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:05 -03:00
Mauro Carvalho Chehab
b4552aceb3 i7300_edac: Cleanup: reorganize the file contents
This change should do no functional change. It just rearranges the
contents of the c file, in order to make easier to understand and
maintain it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:04 -03:00
Mauro Carvalho Chehab
37b69cf91c i7300_edac: Properly detect channel on CE errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:03 -03:00
Mauro Carvalho Chehab
32f9472613 i7300_edac: enrich FBD error info for corrected errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:02 -03:00
Mauro Carvalho Chehab
8199d8cc65 i7300_edac: enrich FBD error info for fatal errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:57:01 -03:00
Mauro Carvalho Chehab
85580ea4f7 i7300_edac: pre-allocate a buffer used to prepare err messages
Instead of dynamically allocating a buffer for it where needed,
just allocate it once. As we'll use the same buffer also during
fatal and non-fatal errors, is is very risky to dynamically allocate
it during an error.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:59 -03:00
Mauro Carvalho Chehab
28c2ce7c8b i7300_edac: Fix MTR x4/x8 detection logic
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:58 -03:00
Mauro Carvalho Chehab
3b330f6758 i7300_edac: Make the debug messages coherent with the others
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:57 -03:00
Mauro Carvalho Chehab
f427742248 i7300_edac: Cleanup: remove get_error_info logic
As the error logic in this driver came from i5400 driver, it
were using one function to get errors, and another to display.
Let's make it simpler and avoid doing it into two steps.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:56 -03:00
Mauro Carvalho Chehab
e432760509 i7300_edac: Add a code to cleanup error registers
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:55 -03:00
Mauro Carvalho Chehab
57021918aa i7300_edac: Add support for reporting FBD errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:54 -03:00
Mauro Carvalho Chehab
15154c57c6 i7300_edac: Properly detect the type of error correction
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:52 -03:00
Mauro Carvalho Chehab
bb81a21637 i7300_edac: Detect if the device is on single mode
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:51 -03:00
Mauro Carvalho Chehab
d7de2bdb0e i7300_edac: Adds detection for enhanced scrub mode on x8
While here, do some cleanup by adding some macros to check
for device features.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:50 -03:00
Mauro Carvalho Chehab
86002324cf i7300_edac: Clear the error bit after reading
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:49 -03:00
Mauro Carvalho Chehab
5de6e07ed7 i7300_edac: Add error detection code for global errors
There's no mention at the datasheet about how to enable global error
reporting. So, I'm assuming that those errors are always enabled.
Maybe I'm plain wrong about that ;)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:48 -03:00
Mauro Carvalho Chehab
3e57eef64c i7300_edac: Better name PCI devices
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:47 -03:00
Mauro Carvalho Chehab
116389ed21 i7300_edac: Add a FIXME note about the error correction type
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:45 -03:00
Mauro Carvalho Chehab
c3af2eaf7a i7300_edac: add global error registers
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:44 -03:00
Mauro Carvalho Chehab
af3d8831e7 i7300_edac: display info if ECC is enabled or not
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:43 -03:00
Mauro Carvalho Chehab
fcaf780b2a i7300_edac: start a driver for i7300 chipset (Clarksboro)
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-08-30 14:56:42 -03:00
Borislav Petkov
37b7370a8d amd64_edac: Do not report error overflow as a separate error
When the Overflow MCi_STATUS bit is set, EDAC reports the lost error
with a "no information available" message which often puzzles users
parsing the dmesg. This doesn't make much sense since this error has
been lost anyway so no need for reporting it separately. Thus, report
the overflow bit setting in the MCE dump instead. While at it, remove
reporting of MiscV and ErrorEnable (en) which are superfluous.

Now it looks like this:

[ 1501.650024] MC4_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
[ 1501.666887] Northbridge Error, node 2

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-26 12:46:03 +02:00
Borislav Petkov
e045c29126 MCE, AMD: Limit MCE decoding to current families for now
Limit MCE error decoding to current and older families only (K8-F11h).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-24 18:06:54 +02:00
Linus Torvalds
58d4ea65b9 Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6
* 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6:
  mmc_spi: Fix unterminated of_match_table
  of/sparc: fix build regression from of_device changes
  of/device: Replace struct of_device with struct platform_device
2010-08-12 09:11:31 -07:00
Anton Vorontsov
cd1542c819 edac: mpc85xx: add support for new MPCxxx/Pxxxx EDAC controllers
Simply add proper IDs into the device table.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Cc: Scott Wood <scottwood@freescale.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Dave Jiang <djiang@mvista.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:21 -07:00
Kulikov Vasiliy
b425d5c82d edac: i5400: improve handling of pci_enable_device() return value
-EIO is not the only error code that pci_enable_device() may return, also
the set of errors can be enhanced in future.  We should compare return
code with zero, not with concrete error value.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Jeff Roberson <jroberson@jroberson.net>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:21 -07:00
Kulikov Vasiliy
44aa80f005 edac: i5000: improve handling of pci_enable_device() return value
-EIO is not the only error code that pci_enable_device() may return, also
the set of errors can be enhanced in future.  We should compare return
code with zero, not with concrete error value.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Jeff Roberson <jroberson@jroberson.net>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:21 -07:00
Christoph Egger
bd1688dcdf edac: add wissing pieces from MPC85xx -> FSL_SOC_BOOKE
In 5753c082f6 ("powerpc/85xx: Kconfig
cleanup") menuconfig MPC85xx was replaced by FSL_SOC_BOOKE but some
references insider the code were not adjusted accordingly.  This patch
adresses these missing pieces.

Signed-off-by: Christoph Egger <siccegge@cs.fau.de>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:20 -07:00
Grant Likely
2dc1158137 of/device: Replace struct of_device with struct platform_device
of_device is just an alias for platform_device, so remove it entirely.  Also
replace to_of_device() with to_platform_device() and update comment blocks.

This patch was initially generated from the following semantic patch, and then
edited by hand to pick up the bits that coccinelle didn't catch.

@@
@@
-struct of_device
+struct platform_device

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Reviewed-by: David S. Miller <davem@davemloft.net>
2010-08-06 09:25:50 -06:00
Borislav Petkov
c4799c7570 amd64_edac: Minor formatting fix
EDAC MC3: CE page 0xc32281, offset 0x8a0, grain 0, syndrome 0x1, row 2, channel 1, label "": amd64_edac
EDAC MC3: CE - no information available: amd64_edacError Overflow

Add the missing space before "Error Overflow" on the second line.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-04 11:16:01 +02:00
Borislav Petkov
962b70a1eb amd64_edac: Fix operator precendence error
The bitwise AND is of higher precedence, make that explicit.

Cc: <stable@kernel.org> # 34.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-04 11:15:09 +02:00
Borislav Petkov
eba042a81e edac, mc: Improve scrub rate handling
Fortify the interface to not accept negative values, remove
memctrl_int_store() as a result. Also, sanitize bandwidth setting by
making the argument a simple u32 instead of strange u32 pointer being
passed around for no obvious reason. Then, fix error handling and teach
it to return proper error values. Finally, make code more readable,
simplify debug messages.

Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-08-03 16:14:06 +02:00
Borislav Petkov
bc57117856 amd64_edac: Correct scrub rate setting
Exit early when setting scrub rate on unknown/unsupported families.

Cc: <stable@kernel.org> # 32.x 33.x 34.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-08-03 16:14:05 +02:00
Borislav Petkov
9975a5f22a amd64_edac: Fix DCT base address selector
The correct check is to verify whether in high range we're below 4GB
and not to extract the DctSelBaseAddr again. See "2.8.5 Routing DRAM
Requests" in the F10h BKDG.

Cc: <stable@kernel.org> # .32.x .33.x .34.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-08-03 16:14:04 +02:00
Borislav Petkov
f4347553b3 amd64_edac: Remove polling mechanism
Switch to reusing the mcheck core's machine check polling mechanism
instead of duplicating functionality by using the EDAC polling routine.

Correct formatting while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-08-03 16:14:03 +02:00
Borislav Petkov
695426506e amd64_edac: Remove unneeded defines
All F2x110-related bit defines are used at only one place so replace
them with simple BIT() macros.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-08-03 16:14:01 +02:00
Borislav Petkov
935ab88e34 edac: Remove EDAC_DEBUG_VERBOSE
This option differs from EDAC_DEBUG only by printing the file and
line of where the debug statement is placed, which contains unneeded
information. So remove it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-08-03 16:14:00 +02:00
Borislav Petkov
ad6a32e969 amd64_edac: Sanitize syndrome extraction
Remove the two syndrome extraction macros and add a single function
which does the same thing but with proper typechecking. While at it,
make sure to cache ECC syndrome size and dump it in debug output.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-08-03 16:13:31 +02:00
Anton Vorontsov
952e1c6632 edac: mpc85xx: fix coldplug/hotplug module autoloading
The MPC85xx EDAC driver is missing module device aliases, so the driver
won't load automatically on boot.  This patch fixes the issue by adding
proper MODULE_DEVICE_TABLE() macros.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Dave Jiang <djiang@mvista.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-27 14:32:06 -07:00
Daniel J Blueman
ab08937400 quiesce EDAC initialisation on desktop/mobile i7
Don't print failure to detect Core i7 EDAC facilities to the console at
boot time, most often occurring on Core i7 desktops and laptops.

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-26 08:17:44 -07:00
Anton Vorontsov
5528e229f0 edac: mpc85xx: add support for MPC8569 EDAC controllers
Simply add a proper ID into the device table.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Dave Jiang <djiang@mvista.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-20 16:25:40 -07:00
Anton Vorontsov
1cd8521e7d edac: mpc85xx: fix MPC85xx dependency
Since commit 5753c082f6 ("powerpc/85xx:
Kconfig cleanup"), there is no MPC85xx Kconfig symbol anymore, so the
driver became non-selectable.

This patch fixes the issue by switching to PPC_85xx symbol.

Signed-off-by: Anton Vorontsov <avorontsov@mvista.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Dave Jiang <djiang@mvista.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-07-20 16:25:40 -07:00
Linus Torvalds
62fd985717 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core:
  MAINTAINERS: Add an entry for i7core_edac
  i7core_edac: Avoid doing multiple probes for the same card
  i7core_edac: Properly discover the first QPI device
2010-07-04 20:12:06 -07:00
Mauro Carvalho Chehab
2d95d8158b i7core_edac: Avoid doing multiple probes for the same card
As Nehalem/Nehalem-EP/Westmere devices uses several devices for the same
functionality (memory controller), the default way of proping devices doesn't
work. So, instead of a per-device probe, all devices should be probed at once.

This means that we should block any new attempt of probe, otherwise, it will
try to register the same device several times.

Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-07-02 18:04:29 -03:00
Mauro Carvalho Chehab
bda142890e i7core_edac: Properly discover the first QPI device
On Nehalem/Nehalem-EP/Westmere, the first QPI device is the last PCI bus.
The last bus is generally at 0x3f or 0xff, but there are also other systems
using different setups. For example, HP Z800 has 0x7f as the last bus.

This patch adds a logic to discover the last bus, dynamically detecting it
at runtime.

Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-07-02 18:04:05 -03:00
Borislav Petkov
41c310447f amd64_edac: Fix syndrome calculation on K8
When calculating the DCT channel from the syndrome we need to know the
syndrome type (x4 vs x8). On F10h, this is read out from extended PCI
cfg space register F3x180 while on K8 we only support x4 syndromes and
don't have extended PCI config space anyway.

Make the code accessing F3x180 F10h only and fall back to x4 syndromes
on everything else.

Cc: <stable@kernel.org> # .33.x .34.x
Reported-by: Jeffrey Merkey <jeffmerkey@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-07-02 17:32:34 +02:00
Linus Torvalds
9a9620db07 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (83 commits)
  i7core_edac: Better describe the supported devices
  Add support for Westmere to i7core_edac driver
  i7core_edac: don't free on success
  i7core_edac: Add support for X5670
  Always call i7core_[ur]dimm_check_mc_ecc_err
  i7core_edac: fix memory leak of i7core_dev
  EDAC: add __init to i7core_xeon_pci_fixup
  i7core_edac: Fix wrong device id for channel 1 devices
  i7core: add support for Lynnfield alternate address
  i7core_edac: Add initial support for Lynnfield
  i7core_edac: do not export static functions
  edac: fix i7core build
  edac: i7core_edac produces undefined behaviour on 32bit
  i7core_edac: Use a more generic approach for probing PCI devices
  i7core_edac: PCI device is called NONCORE, instead of NOCORE
  i7core_edac: Fix ringbuffer maxsize
  i7core_edac: First store, then increment
  i7core_edac: Better parse "any" addrmask
  i7core_edac: Use a lockless ringbuffer
  edac: Create an unique instance for each kobj
  ...
2010-06-04 15:39:54 -07:00
Anatolij Gustschin
a26f95fed3 of/edac: fix build breakage in drivers
Fixes build errors in EDAC drivers caused by the OF
device_node pointer being moved into struct device

Signed-off-by: Anatolij Gustschin <agust@denx.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-06-02 21:02:41 -06:00
Joe Perches
63ae96be98 drivers/edac: convert logging messages direct uses of __FILE__ to %s, __FILE
Reduces text by eliminating multiple __FILE__ uses.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Tim Small <tim@buttersideup.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:52 -07:00
Grant Likely
cf9b59e9d3 Merge remote branch 'origin' into secretlab/next-devicetree
Merging in current state of Linus' tree to deal with merge conflicts and
build failures in vio.c after merge.

Conflicts:
	drivers/i2c/busses/i2c-cpm.c
	drivers/i2c/busses/i2c-mpc.c
	drivers/net/gianfar.c

Also fixed up one line in arch/powerpc/kernel/vio.c to use the
correct node pointer.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2010-05-22 00:36:56 -06:00
Grant Likely
4018294b53 of: Remove duplicate fields from of_platform_driver
.name, .match_table and .owner are duplicated in both of_platform_driver
and device_driver.  This patch is a removes the extra copies from struct
of_platform_driver and converts all users to the device_driver members.

This patch is a pretty mechanical change.  The usage model doesn't change
and if any drivers have been missed, or if anything has been fixed up
incorrectly, then it will fail with a compile time error, and the fixup
will be trivial.  This patch looks big and scary because it touches so
many files, but it should be pretty safe.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Sean MacLennan <smaclennan@pikatech.com>
2010-05-22 00:10:40 -06:00
Mauro Carvalho Chehab
52707f918c i7core_edac: Better describe the supported devices
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 20:43:52 -03:00
Vernon Mauery
bd9e19ca46 Add support for Westmere to i7core_edac driver
This adds new PCI IDs for the Westmere's memory controller
devices and modifies the i7core_edac driver to be able to
probe both Nehalem and Westmere processors.

Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 20:23:56 -03:00
Roman Fietze
ee6583f6e8 PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments
This fixes all occurrences of pci_enable_device and pci_disable_device
in all comments. There are no code changes involved.

Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-05-18 14:59:08 -07:00
Tony Luck
d4d1ef4515 i7core_edac: don't free on success
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 14:47:31 -03:00
Mauro Carvalho Chehab
ac1ececea9 i7core_edac: Add support for X5670
As reported by Vernon Mauery <vernux@us.ibm.com>, X5670 (Westmere-EP) uses a
different register for one of the uncore PCI devices. Add support for
it.

Those are the PCI ID's on this new chipset:

fe:00.0 0600: 8086:2c70 (rev 02)
fe:00.1 0600: 8086:2d81 (rev 02)
fe:02.0 0600: 8086:2d90 (rev 02)
fe:02.1 0600: 8086:2d91 (rev 02)
fe:02.2 0600: 8086:2d92 (rev 02)
fe:02.3 0600: 8086:2d93 (rev 02)
fe:02.4 0600: 8086:2d94 (rev 02)
fe:02.5 0600: 8086:2d95 (rev 02)
fe:03.0 0600: 8086:2d98 (rev 02)
fe:03.1 0600: 8086:2d99 (rev 02)
fe:03.2 0600: 8086:2d9a (rev 02)
fe:03.4 0600: 8086:2d9c (rev 02)
fe:04.0 0600: 8086:2da0 (rev 02)
fe:04.1 0600: 8086:2da1 (rev 02)
fe:04.2 0600: 8086:2da2 (rev 02)
fe:04.3 0600: 8086:2da3 (rev 02)
fe:05.0 0600: 8086:2da8 (rev 02)
fe:05.1 0600: 8086:2da9 (rev 02)
fe:05.2 0600: 8086:2daa (rev 02)
fe:05.3 0600: 8086:2dab (rev 02)
fe:06.0 0600: 8086:2db0 (rev 02)
fe:06.1 0600: 8086:2db1 (rev 02)
fe:06.2 0600: 8086:2db2 (rev 02)
fe:06.3 0600: 8086:2db3 (rev 02)
(as usual, the same PCI devices repeat at ff: bus)

The PCI device 8086:2c70 is shown as:

fe:00.0 Host bridge: Intel Corporation QuickPath Architecture Generic
Non-core Registers (rev 02)

So, for this device to be recognized, it is only a matter of adding this
new PCI ID to the driver.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 13:15:42 -03:00
Vernon Mauery
8a311e179e Always call i7core_[ur]dimm_check_mc_ecc_err
This fixes an error in function i7core_check_error

In commit ca9c90ba09 which converts the
driver to use double buffering, there is a change in the logic.  Before,
if mce_count was zero, it skipped over a couple of statements and
finished out with a call to the *check_mc_ecc_err function.  The current
code checks to see if mce_count is 0 and then exits.

This change reverts the behavior back to the original where if there are
no errors to report, we skip to the end and call the *check_mc_ecc_err
function.

This fix allows the driver to work again on my Nehalem based blades
again.

Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 12:43:23 -03:00
Alexander Beregalov
2a6fae3267 i7core_edac: fix memory leak of i7core_dev
Free already allocated i7core_dev.

Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 11:45:20 -03:00
Jiri Slaby
71753e0141 EDAC: add __init to i7core_xeon_pci_fixup
It's called only from an __init function and is the only user
of pcibios_scan_specific_bus which will be marked as __devinit in
the next patch.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-18 11:45:19 -03:00
Mauro Carvalho Chehab
508fa179f8 i7core_edac: Fix wrong device id for channel 1 devices
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 12:18:31 -03:00
Mauro Carvalho Chehab
f05da2f785 i7core: add support for Lynnfield alternate address
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 12:18:29 -03:00
Mauro Carvalho Chehab
52a2e4fc37 i7core_edac: Add initial support for Lynnfield
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 12:18:28 -03:00
Randy Dunlap
3b918c12df edac: fix i7core build
Fix build warning (missing header file) and
build error when CONFIG_SMP=n.

drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep'
drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:32 -03:00
Alan Cox
486dd09f12 edac: i7core_edac produces undefined behaviour on 32bit
Fix the shifts up

Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:32 -03:00
Mauro Carvalho Chehab
de06eeef58 i7core_edac: Use a more generic approach for probing PCI devices
Currently, only one PCI set of tables is allowed. This prevents using
the driver for other devices like Lynnfield, with have a different
set of PCI ID's.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab
fd3826549d i7core_edac: PCI device is called NONCORE, instead of NOCORE
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab
321ece4dda i7core_edac: Fix ringbuffer maxsize
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab
6e103be1c7 i7core_edac: First store, then increment
Fix ringbuffer store logic.

While here, add a few comments to the code and remove the undesired
printk that could otherwise be called during NMI time.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:31 -03:00
Mauro Carvalho Chehab
4f87fad1d3 i7core_edac: Better parse "any" addrmask
Instead of accepting just "any", accept also "any\n"

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab
ca9c90ba09 i7core_edac: Use a lockless ringbuffer
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab
b968759ee7 edac: Create an unique instance for each kobj
Current code only works when there's just one memory
controller, since we need one kobj for each instance.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:49:30 -03:00
Mauro Carvalho Chehab
f338d73691 i7core_edac: Convert UDIMM error counters into a proper sysfs group
Instead of displaying 3 values at the same var, break it into 3
different sysfs nodes:

/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
/sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2

For registered dimms, however, the error counters are already being
displayed at:
	/sys/devices/system/edac/mc/mc0/csrow*/ce_count

So, there's no need to add any extra sysfs nodes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab
c419d921e6 edac: Don't create csrow entries on instance groups
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab
cc301b3ae3 edac: store/show methods for device groups weren't working
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:02 -03:00
Mauro Carvalho Chehab
a5538e531f i7core_edac: Add support for sysfs addrmatch group
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:01 -03:00
Mauro Carvalho Chehab
9fa2fc2e2d edac_core: Allow the creation of sysfs groups
Currently, all sysfs nodes are stored at /sys/.*/mc. (regex)
However, sometimes it is needed to create attribute groups.

This patch extends edac_core to allow groups creation.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:01 -03:00
Mauro Carvalho Chehab
4af91889e0 i7core_edac: Avoid printing a warning when debug is disabled
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab
4253868034 i7core_edac: We need to use list_for_each_entry_safe to avoid errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab
22e6bcbdcf i7core_edac: change remove module strategy
The old remove module stragegy didn't work on devices with multiple
cores, since only one PCI device is used to open all mc's, due to
Nehalem nature.

Also, it were based at pdev value. However, this doesn't point to the
pci device used at mci->dev.

So, instead, it unregisters all devices at once, deleting them from the
device list.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab
0f062792b4 i7core_edac: remove static counter for max sockets
The number of sockets is now fully dynamic. Get rid of this obsolete
var.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:45:00 -03:00
Mauro Carvalho Chehab
13d6e9b653 i7core_edac: at remove, don't remove all pci devices at once
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab
d88b85072f i7core_edac: Fix a bug when printing error counts with RDIMMs
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab
d4c277957f i7core_edac: a few fixes for multiple mc's
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:59 -03:00
Mauro Carvalho Chehab
6c6aa3afdb i7core_edac: sanity check: print a warning if a mcelog is ignored
In thesis, the other mc controller should handle it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab
f47429494f i7core_edac: create one mc per socket/QPI
Instead of creating just one memory controller, create one per socket
(e. g. per Quick Link Path Interconnect).

This better reflects the Nehalem architecture.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab
66607706ce Dynamically allocate memory for PCI devices
Instead of using a static table assuming always 2 CPU sockets, allocate
space dynamically for Nehalem PCI devs.

This patch is part of a series of patches that changes i7core_edac to
allow more than 2 sockets and to properly report one memory controller
per socket.
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab
a55456f344 i7core: temporary workaround to allow it to compile against 2.6.30
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Mauro Carvalho Chehab
3a3bb4a647 i7core_edac: Improve corrected_error_counts output for RDIMM
Just cosmetics. instead of showing something like:

socket 0, channel 2dimm0: 1
dimm1: 0
dimm2: 0
socket 1, channel 2dimm0: 0
dimm1: 0
dimm2: 0

Show:

socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0
socket 0, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0

This is more synthetic and easier to parse.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:58 -03:00
Keith Mannthey
bc2d7245ff i7core_edac: Probe on Xeons eariler
On the Xeon 55XX series cpus the pci deives are not exposed via acpi so
we much explicitly probe them to make the usable as a Linux PCI device.

This moves the detection of this state to before pci_register_driver is
called.  Its present position was not working on my systems, the driver
would complain about not finding a specific device.

This patch allows the driver to load on my systems.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:57 -03:00
Mauro Carvalho Chehab
14d2c08343 i7core: Use registered memories per processor
Instead of assuming that the entire machine has either registered or
unregistered memories, do it at CPU socket based.

While here, fix a bug at i7core_mce_output_error(), where the we're
using m->cpu directly as if it would represent a socket. Instead, the
proper socket_id is given by cpu_data[m->cpu].phys_proc_id.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
---
2010-05-10 11:44:57 -03:00
Mauro Carvalho Chehab
b4e8f0b6ea i7core_edac: Use Device 3 function 2 to report errors with RDIMM's
Nehalem and upper chipsets provide an special device that has corrected memory
error counters detected with registered dimms. This device is only seen if
there are registered memories plugged.

After this patch, on a machine fully equiped with RDIMM's, it will use the
Device 3 function 2 to count corrected errors instead on relying at mcelog.

For unregistered DIMMs, it will keep the old behavior, counting errors
via mcelog.

This patch were developed together with Keith Mannthey <kmannth@us.ibm.com>

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Keith Mannthey
61053fdedb i7core_edac: Fix ecc enable shift
From: Keith Mannthey <kmannth@us.ibm.com>

Simple correction to a shift value.
ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c

This correctly identifies the state of the ECC at the machine.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab
3ef288a983 i7core_edac: Print an error message if pci register fails
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab
b990538a78 i7core_edac: CodingSyle fixes/cleanups
No functional changes.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:56 -03:00
Mauro Carvalho Chehab
4157d9f554 i7core_edac: fix error injection
There were two stupid error injection bugs introduced by wrong
cut-and-paste: one at socket store, and another at the error inject
register. The last one were causing the code to not work at all.

While here, adds debug messages to allow seeing what registers are being
set while sending error injection.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:55 -03:00
Mauro Carvalho Chehab
2068def56c i7core_edac: fix error codes for sysfs error injection interface
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:55 -03:00
Mauro Carvalho Chehab
276b824c30 i7core_edac: some fixes at error injection code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab
17cb7b0cf7 i7core_edac: Some cleanups at displayed info
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab
086271a037 i7core: remove some uneeded noisy debug messages
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:54 -03:00
Mauro Carvalho Chehab
3a7dde7fcd i7core: add socket info at the debug msg
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab
ec6df24c15 i7core: better document i7core_get_active_channels()
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab
c77720b954 i7core: fix get_devices routine for Xeon55xx
i7core_get_devices() were preparet to get just the first found device of each type.
Due to that, on Xeon 55xx, only socket 1 were retrived.

Rework i7core_get_devices() to clean it and to properly support Xeon 55xx.

While here, fix a small typo.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab
a639539fa2 i7core: enrich error information based on memory transaction type
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab
c5d3452869 i7core: check if the memory error is fatal or non-fatal
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:53 -03:00
Mauro Carvalho Chehab
310cbb7284 i7core: fix probing on Xeon55xx
Xeon55xx fails to probe with this error message:

EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1660: MC: drivers/edac/i7core_edac.c: i7core_init()
EDAC i7core: Device not found: dev 00:00.0 PCI ID 8086:2c41
i7core_edac: probe of 0000:00:14.0 failed with error -22

This is due to the fact that, on Xeon35xx (and i7core), device 00.0 has
PCI ID 8086:2c40.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab
f237fcf2b7 i7core_edac: some fixes at memory error parser
m->bank is not related to the memory bank but, instead, to the MCA Error
register bank. Fix it accordingly. While here, improves the comments for
Nehalem bank.

A later fix is needed, in order to get bank/rank information from MCA
error log.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab
8a2f118e3a i7core_edac: decode mcelog error and send it via edac interface
Enriches mcelog error by using the encoded information at MCE status and
misc registers (IA32_MCx_STATUS, IA32_MCx_MISC).

Some fixes are still needed here, in order to properly fill the EDAC
fields.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab
ba6c5c62ee i7core_edac: maps all sockets as if ther are one MC controller
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:52 -03:00
Mauro Carvalho Chehab
67166af4ab i7core_edac: add support for more than one MC socket
Some Nehalem architectures have more than one MC socket. Socket 0 is
located at bus 255.

Currently, it is using up to 2 sockets, but increasing it to a larger
number is just a matter of increasing MAX_SOCKETS definition.

This seems to be required for properly support of Xeon 55xx.

Still needs testing with Xeon 55xx.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:51 -03:00
Mauro Carvalho Chehab
d1fd4fb69e i7core_edac: Add a code to probe Xeon 55xx bus
This code changes the detection procedure of i7core_edac. Instead of
directly probing for MC registers, it probes for another register found
on Nehalem. If found, it tries to pick the first MC PCI BUS. This should
work fine with Xeon 35xx, but, on Xeon 55xx, this is at bus 254 and 255
that are not properly detected by the non-legacy PCI methods.

The new detection code scans specifically at buses 254 and 255 for the
Xeon 55xx devices.

This code has not tested yet. After working, a change at the code will
be needed, since the i7core is not yet ready for working with 2 sets of
MC.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:51 -03:00
Mauro Carvalho Chehab
e9bd2e7379 i7core_edac: Adds write unlock to MC registers
The public Intel Xeon 5500 volume 2 datasheet describes, on page 53,
session 2.6.7 a register that can lock/unlock Memory Controller the
configuration register, called MC_CFG_CONTROL.

Adds support for it in the hope that software error injection would
work. With my tests with Xeon 35xx, there's still something missing.
With a program that does sequencial bit writes at dev 0.0, sometimes, it
produces error injection, after unblocking the MC_CFG_CONTROL (and,
sometimes, it just locks my testing machine).

I'll try later to discover by trial and error what's the register that
solves this issue on Xeon 35xx.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:50 -03:00
Mauro Carvalho Chehab
d5381642ab i7core_edac: Add edac_mce glue
Adds a glue code to allow i7core to work with mcelog. With the glue,
i7core registers itself on edac_mce. At mce, when an error is detected,
it calls all registered drivers (in this case, i7core), for EDAC error
handling.

TODO: It currently just prints the MCE error log using about the same
      format as mce panic messages. The error message should be enhanced
      with mcelog userspace info and converted into the proper EDAC format,
      to feed the EDAC error counts.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:50 -03:00
Mauro Carvalho Chehab
963c5ba359 edac/Kconfig: edac_mce can't be module
Since mcelog is bool, edac_mce glue should also be bool, or otherwise
will not work.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:50 -03:00
Mauro Carvalho Chehab
696e409dbd edac_mce: Add an interface driver to report mce errors via edac
edac_mce module is an interface module that gets mcelog data and
forwards to any registered edac module that expects to receive data via
mce.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:49 -03:00
Mauro Carvalho Chehab
41fcb7feed i7core_edac: CodingStyle fixes
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:48 -03:00
Mauro Carvalho Chehab
eb94fc402f i7core_edac: fill csrows edac sysfs info
csrows is still fake, since we can't identify its representation with
Nehalem registers.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:48 -03:00
Mauro Carvalho Chehab
5566cb7c91 i7core_edac: Memory info fixes and preparation for properly filling cswrow data
Now, memory size is properly displayed:

    EDAC i7core: DOD Max limits: DIMMS: 2, 1-ranked, 8-banked
    EDAC i7core: DOD Max rows x colums = 0x4000 x 0x400
    EDAC i7core: Memory channel configuration:
    EDAC i7core: Ch0 phy rd0, wr0 (0x063f7c31): 2 ranks, UDIMMs
    EDAC i7core:    dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
                    numrank: 1, numrow: 0x4000, numcol: 0x400
    EDAC i7core:    dimm 1 (0x00001288) 1024 Mb offset: 4, numbank: 8,
                    numrank: 1, numrow: 0x4000, numcol: 0x400
    EDAC i7core: Ch1 phy rd1, wr1 (0x063f7c31): 2 ranks, UDIMMs
    EDAC i7core:    dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
                    numrank: 1, numrow: 0x4000, numcol: 0x400
    EDAC i7core: Ch2 phy rd3, wr3 (0x063f7c31): 2 ranks, UDIMMs
    EDAC i7core:    dimm 0 (0x00000288) 1024 Mb offset: 0, numbank: 8,
                    numrank: 1, numrow: 0x4000, numcol: 0x400

Still, as the way to retrieve csrows info is not known, it does a
mapping of what's available to csrows basic unit at edac core.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:48 -03:00
Mauro Carvalho Chehab
854d334997 i7core_edac: Get more info about the memory DIMMs
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:48 -03:00
Mauro Carvalho Chehab
7dd6953c5f i7core_edac: Add more information about each active dimm
Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:47 -03:00
Mauro Carvalho Chehab
b7c761512c i7core_edac: Improve error handling
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:47 -03:00
Mauro Carvalho Chehab
1c6fed808f i7core_edac: Properly fill struct csrow_info
Thanks-to: Aristeu Rozanski <aris@redhat.com> for part of the code

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:47 -03:00
Mauro Carvalho Chehab
ef708b53b9 i7core_edac: Add additional tests for error detection
Properly check the number of channels and improve probing error detection

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:47 -03:00
Mauro Carvalho Chehab
442305b152 i7core_edac: Add a memory check routine, based on device 3 function 4
This function appears only on Xeon 5500 datasheet. Yet, testing with a
Xeon 3503 showed that this is also implemented on other Nehalem
processors.

At the first read, MC_TEST_ERR_RCV1 and MC_TEST_ERR_RCV0 can contain any
value. Modify CE error logic to update the error count only after the
second read.

An alternative approach would be to do a write at rcv0 and rcv1
registers, but it seemed better to keep they untouched, since BIOS might
eventually assume that they are exclusive for their usage.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:46 -03:00
Mauro Carvalho Chehab
87d1d272ba i7core_edac: need mci->edac_check, otherwise module removal doesn't work
There are some locking troubles with edac_core: if you don't declare an
edac_check, module may suffer from soft lock.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:46 -03:00
Mauro Carvalho Chehab
7b029d03c3 i7core_edac: A few fixes at error injection code
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:46 -03:00
Mauro Carvalho Chehab
f122a89222 i7core_edac: Show read/write virtual/physical channel association
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:46 -03:00
Mauro Carvalho Chehab
8f33190757 i7core_edac: Registers all supported MC functions
Now, it will try to register on all supported Memory Controller
functions.

It should be noticed that dev3, function 2 is present only on chips with
Registered DIMM's, according to the datasheet. So, the driver doesn't
return -ENODEV is all functions but this one were successfully
registered and enabled:

    EDAC i7core: Registered device 8086:2c18 fn=3 0
    EDAC i7core: Registered device 8086:2c19 fn=3 1
    EDAC i7core: Device not found: PCI ID 8086:2c1a (dev 3, func 2)
    EDAC i7core: Registered device 8086:2c1c fn=3 4
    EDAC i7core: Registered device 8086:2c20 fn=4 0
    EDAC i7core: Registered device 8086:2c21 fn=4 1
    EDAC i7core: Registered device 8086:2c22 fn=4 2
    EDAC i7core: Registered device 8086:2c23 fn=4 3
    EDAC i7core: Registered device 8086:2c28 fn=5 0
    EDAC i7core: Registered device 8086:2c29 fn=5 1
    EDAC i7core: Registered device 8086:2c2a fn=5 2
    EDAC i7core: Registered device 8086:2c2b fn=5 3
    EDAC i7core: Registered device 8086:2c30 fn=6 0
    EDAC i7core: Registered device 8086:2c31 fn=6 1
    EDAC i7core: Registered device 8086:2c32 fn=6 2
    EDAC i7core: Registered device 8086:2c33 fn=6 3
    EDAC i7core: Driver loaded.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:45 -03:00
Mauro Carvalho Chehab
0b2b7b7ec0 i7core_edac: Add more status functions to EDAC driver
This patch were co-authored with Aristeu Rozanski.

Signed-off-by: Aristeu Sergio <arozansk@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:45 -03:00
Mauro Carvalho Chehab
194a40feab i7core_edac: Add error insertion code for Nehalem
Implements set_inject_error() with the low-level code needed to inject
memory errors at Nehalem, and adds some sysfs nodes to allow error injection

The next patch will add an API for error injection.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:45 -03:00
Mauro Carvalho Chehab
a0c36a1f0f i7core_edac: Add an EDAC memory controller driver for Nehalem chipsets
This driver is meant to support i7 core/i7core extreme desktop
processors and Xeon 35xx/55xx series with integrated memory controller.
It is likely that it can be expanded in the future to work with other
processor series based at the same Memory Controller design.

For now, it has just a few MCH status reads.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-05-10 11:44:45 -03:00
Borislav Petkov
35d824b28f edac, mce: Fix wrong mask and macro usage
Correct two mishaps which prevented reporting error type (CECC vs UECC)
and extended error description.

Cc: <stable@kernel.org> # 32.x,  33.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-30 10:15:39 -07:00
Tejun Heo
5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Borislav Petkov
5b89d2f9ac edac, mce: Filter out invalid values
Print the CPU associated with the error only when the field is valid.

Cc: <stable@kernel.org> # .32.x .33.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-03-22 16:33:31 +01:00
Peter Tyser
8004fd2ad6 edac: e752x: add dram scrubbing support
Add support to scrub DRAM using the e752x integrated memory scrubbing
engine.  The e7320/7520/e7525 chipsets support scrubbing at one rate while
the i3100 chipset supports a normal and fast rate.

A similar patch was originally sent back in 2008:
http://sourceforge.net/mailarchive/forum.php?thread_name=1204835866.25206.70.camel@localhost.localdomain&forum_name=bluesmoke-devel

This version has the following updates:
- Use 16-bit PCI config cycles to access MCHSCRB register
    e7320/7520/e7525 docs say register is 16bits wide, i3100 says 8.  I
    tested 16bits on the i3100 to be safe.
- Recalcuate and round actual scrub rates

The changes have been tested on an i3100-based board.

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12 15:52:40 -08:00
Konstantin Olifer
8de5c1a165 edac: e752x fsb ecc
FSB parity is only supported on the Xeon processor.  Previously it was
incorrectly enabled for the Celeron as well.

Signed-off-by: Konstantin Olifer <kolifer@gmail.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12 15:52:40 -08:00
H Hartley Sweeten
66ed3f7516 edac: mpc85xx use resource_size instead of raw math
Use resource_size() instead of arithmetic.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Dave Jiang <djiang@mvista.com>
Cc: Peter Tyser <ptyser@xes-inc.com>
Cc: Kumar Gala <galak@gate.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12 15:52:40 -08:00
Peter Tyser
dcca7c3d00 edac: mpc85xx improve SDRAM error reporting
Add the ability to detect the specific data line or ECC line which failed
when printing out SDRAM single-bit errors.  An example of a single-bit
SDRAM ECC error is below:

  EDAC MPC85xx MC1: Err Detect Register: 0x80000004
  EDAC MPC85xx MC1: Faulty data bit: 59
  EDAC MPC85xx MC1: Expected Data / ECC:  0x7f80d000_409effa0 / 0x6d
  EDAC MPC85xx MC1: Captured Data / ECC:  0x7780d000_409effa0 / 0x6d
  EDAC MPC85xx MC1: Err addr: 0x00031ca0
  EDAC MPC85xx MC1: PFN: 0x00000031

Knowning which specific data or ECC line caused an error can be useful in
tracking down hardware issues such as improperly terminated signals, loose
pins, etc.

Note that this feature is only currently enabled for 64-bit wide data
buses, 32-bit wide bus support should be added.

I don't have any 32-bit wide systems to test on.  If someone has one and
is willing to give this patch a shot with the check for a 64-bit data bus
removed it would be much appreciated and I can re-submit with both 32 and
64 bit buses supported.

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12 15:52:40 -08:00
Peter Tyser
21768639be edac: mpc85xx mask ecc syndrome correctly
With a 64-bit wide data bus only the lowest 8-bits of the ECC syndrome are
relevant.  With a 32-bit wide data bus only the lowest 16-bits are
relevant on most architectures.

Without this change, the ECC syndrome displayed can be mildly confusing,
eg:

  EDAC MPC85xx MC1: syndrome: 0x25252525

When in reality the ECC syndrome is 0x25.

A variety of Freescale manuals say a variety of different things about how
to decode the CAPTURE_ECC (syndrome) register.  I don't have a system with
a 32-bit bus to test on, but I believe the change is correct.  It'd be
good to get an ACK from someone at Freescale about this change though.

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12 15:52:40 -08:00
Emese Revfy
52cf25d0ab Driver core: Constify struct sysfs_ops in struct kobj_type
Constify struct sysfs_ops.

This is part of the ops structure constification
effort started by Arjan van de Ven et al.

Benefits of this constification:

 * prevents modification of data that is shared
   (referenced) by many other structure instances
   at runtime

 * detects/prevents accidental (but not intentional)
   modification attempts on archs that enforce
   read-only kernel data at runtime

 * potentially better optimized code as the compiler
   can assume that the const data cannot be changed

 * the compiler/linker move const data into .rodata
   and therefore exclude them from false sharing

Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-03-07 17:04:49 -08:00
Linus Torvalds
eaa5eec739 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: Simplify ECC override handling
2010-03-03 09:25:37 -08:00
Linus Torvalds
0a135ba14d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: add __percpu sparse annotations to what's left
  percpu: add __percpu sparse annotations to fs
  percpu: add __percpu sparse annotations to core kernel subsystems
  local_t: Remove leftover local.h
  this_cpu: Remove pageset_notifier
  this_cpu: Page allocator conversion
  percpu, x86: Generic inc / dec percpu instructions
  local_t: Move local.h include to ringbuffer.c and ring_buffer_benchmark.c
  module: Use this_cpu_xx to dynamically allocate counters
  local_t: Remove cpu_local_xx macros
  percpu: refactor the code in pcpu_[de]populate_chunk()
  percpu: remove compile warnings caused by __verify_pcpu_ptr()
  percpu: make accessors check for percpu pointer in sparse
  percpu: add __percpu for sparse.
  percpu: make access macros universal
  percpu: remove per_cpu__ prefix.
2010-03-03 07:34:18 -08:00
Borislav Petkov
d95cf4de6a amd64_edac: Simplify ECC override handling
No need for clearing ecc_enable_override and checking it in two places.
Instead, simply check it during probing and act accordingly. Also,
rename the flag bitfields according to the functionality they actually
represent. What is more, make sure original BIOS ECC settings are
restored when the module is unloaded.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-03-01 19:25:12 +01:00
Tejun Heo
a29d8b8e2d percpu: add __percpu sparse annotations to what's left
Add __percpu sparse annotations to places which didn't make it in one
of the previous patches.  All converions are trivial.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors.  This patch doesn't affect normal builds.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Neil Brown <neilb@suse.de>
2010-02-17 11:17:38 +09:00
Linus Torvalds
676ad58553 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: Do not falsely trigger kerneloops
2010-02-11 14:07:13 -08:00
Peter Tyser
f8c63345b4 edac: mpc85xx fix build regression by removing unused debug code
Some unused, unsupported debug code existed in the mpc85xx EDAC driver
that resulted in a build failure when CONFIG_EDAC_DEBUG was defined:

  drivers/edac/mpc85xx_edac.c: In function 'mpc85xx_mc_err_probe':
  drivers/edac/mpc85xx_edac.c:1031: error: implicit declaration of function 'edac_mc_register_mcidev_debug'
  drivers/edac/mpc85xx_edac.c:1031: error: 'debug_attr' undeclared (first use in this function)
  drivers/edac/mpc85xx_edac.c:1031: error: (Each undeclared identifier is reported only once
  drivers/edac/mpc85xx_edac.c:1031: error: for each function it appears in.)

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-11 13:59:42 -08:00
Peter Tyser
cff9279e4e edac: mpc85xx fix bad page calculation
Commit b484625172 ("edac: mpc85xx add
mpc83xx support") accidentally broke how a chip select's first and last
page addresses are calculated.  The page addresses are being shifted too
far right by PAGE_SHIFT.  This results in errors such as:

  EDAC MPC85xx MC1: Err addr: 0x003075c0
  EDAC MPC85xx MC1: PFN: 0x00000307
  EDAC MPC85xx MC1: PFN out of range!
  EDAC MC1: INTERNAL ERROR: row out of range (4 >= 4)
  EDAC MC1: CE - no information available: INTERNAL ERROR

The vaule of PAGE_SHIFT is already being taken into consideration during
the calculation of the 'start' and 'end' variables, thus it is not
necessary to account for it again when setting a chip select's first and
last page address.

Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Ira W. Snyder <iws@ovro.caltech.edu>
Cc: Kumar Gala <galak@gate.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-11 13:59:42 -08:00
Borislav Petkov
cab4d27764 amd64_edac: Do not falsely trigger kerneloops
An unfortunate "WARNING" in the message amd64_edac dumps when the system
doesn't support DRAM ECC or ECC checking is not enabled in the BIOS
used to trigger kerneloops which qualified the message as an OOPS thus
misleading the users. See, e.g.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/422536
http://bugzilla.kernel.org/show_bug.cgi?id=15238

Downgrade the message level to KERN_NOTICE and fix the formulation.

Cc: stable@kernel.org # .32.x
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2010-02-11 20:32:14 +01:00
Tamas Vincze
118f3e1afd edac: i5000_edac critical fix panic out of bounds
EDAC MC0: INTERNAL ERROR: channel-b out of range (4 >= 4)
Kernel panic - not syncing: EDAC MC0: Uncorrected Error  (XEN) Domain 0 crashed: 'noreboot' set - not rebooting.

This happens because FERR_NF_FBD bit 28 is not updated on i5000.  Due to
that, both bits 28 and 29 may be equal to one, returning channel = 3.  As
this value is invalid, EDAC core generates the panic.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14568

Signed-off-by: Tamas Vincze <tom@vincze.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-16 12:15:38 -08:00
Roel Kluin
926311fd7d amd64_edac: Ensure index stays within bounds in amd64_get_scrub_rate
Add a missing iterator variable thus fixing the conditional of the
for-loop in amd64_get_scrub_rate().

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-01-15 10:45:58 +01:00
Borislav Petkov
5213c32f9d edac, pci: remove pesky debug printk
Do not spam the logs needlessly with the sole info that
edac_pci_dev_parity_clear is being called.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:09 +01:00
Borislav Petkov
92389102b6 amd64_edac: restrict PCI config space access
Do not access F2x19[0,4] on K8 since they're undefined there.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:08 +01:00
Borislav Petkov
43f5e68733 amd64_edac: fix forcing module load/unload
Clear the override flag after force-loading the module.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:08 +01:00
Borislav Petkov
56b34b91e2 amd64_edac: make driver loading more robust
Currently, the module does not initialize fully when the DIMMs aren't
ECC but remains still loaded. Propagate the error when no instance of
the driver is properly initialized and prevent further loading.

Reorganize and polish error handling in amd64_edac_init() while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:07 +01:00
Borislav Petkov
8f68ed9728 amd64_edac: fix driver instance freeing
Fix use-after-free errors by pushing all memory-freeing calls to the end
of amd64_remove_one_instance().

Reported-by: Darren Jenkins <darrenrjenkins@gmail.com>
LKML-Reference: <1261370306.11354.52.camel@ICE-BOX>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:07 +01:00
Borislav Petkov
603adaf6b3 amd64_edac: fix K8 chip select reporting
Fix the case when amd64_debug_display_dimm_sizes() reports only half the
amount of DRAM on it because it doesn't account for when the single DCT
operates in 128-bit mode and merges chip selects from different DIMMs.

Reported-by: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
LKML-Reference: <200912112202.48173.johannes.hirte@fem.tu-ilmenau.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-24 11:07:07 +01:00
Linus Torvalds
661e338f72 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  edac, mce, amd: silence GART TLB errors
  edac, mce: correct corenum reporting
2009-12-16 10:09:43 -08:00
Borislav Petkov
256f7276af edac, mce, amd: silence GART TLB errors
Although reporting of benign GART TLB errors is disabled in
__mcheck_cpu_apply_quirks, those are still being logged, and, as a
result, trip up amd64_edac. Pull up reporting check so that machines
with loaded edac module bail out early and don't spit fragments into
dmesg.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-16 17:48:39 +01:00
Nils Carlson
bbead2104e edac: i5100 add 6 ranks per channel
Add support for 6 ranks per channel to the i5100 chipset.  I have tested
the patch as far as possible with correctible errors and things appear
good.  The DIMM mapping is correct for our board, but boards may differ.

Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Acked-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:12 -08:00
Nils Carlson
295439f2a3 edac: i5100 add scrubbing
Addscrubbing to the i5100 chipset.  The i5100 chipset only supports one
scrubbing rate, which is not constant but dependent on memory load.  The
rate returned by this driver is an estimate based on some experimentation,
but is substantially closer to the truth than the speed supplied in the
documentation.

Also, scrubbing is done once, and then a done-bit is set.  This means that
to accomplish continuous scrubbing a re-enabling mechanism must be used.
I have created the simplest possible such mechanism in the form of a
work-queue which will check every five minutes.  This interval is quite
arbitrary but should be sufficient for all sizes of system memory.

Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:12 -08:00
Nils Carlson
b18dfd05f9 edac: i5100 clean controller to channel terms
The i5100 driver uses the word controller instead of channel in a lot of
places, this is simply a cleanup of the patch.

Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:12 -08:00
Borislav Petkov
35d8069234 edac, mce: correct corenum reporting
Fix core number reporting with NB MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-15 15:52:13 +01:00
Borislav Petkov
505422517d x86, msr: Add support for non-contiguous cpumasks
The current rd/wrmsr_on_cpus helpers assume that the supplied
cpumasks are contiguous. However, there are machines out there
like some K8 multinode Opterons which have a non-contiguous core
enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see
http://www.gossamer-threads.com/lists/linux/kernel/1160268.

This patch fixes out-of-bounds writes (see URL above) by adding per-CPU
msr structs which are used on the respective cores.

Additionally, two helpers, msrs_{alloc,free}, are provided for use by
the callers of the MSR accessors.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091211171440.GD31998@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-11 10:59:21 -08:00
Borislav Petkov
df5b1606bd amd64_edac: bump driver version
This was long overdue ...

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08 13:38:14 +01:00
Andrew Morton
18ba54ac12 amd64_edac: fix use-uninitialised bug
drivers/edac/amd64_edac.c: In function 'amd64_edac_init':
drivers/edac/amd64_edac.c:2840: warning: 'ret' may be used uninitialized in this function

Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08 13:38:13 +01:00
Borislav Petkov
bdc30a0c8c amd64_edac: correct sys address to chip select mapping
The routine does the reverse mapping of the error address of a CECC back
to the node id, DRAM controller and chip select of the DIMM which caused
the error. We should lookup the channel using the syndromes _only_ when
the DCTs are ganged so fix that.

Also, add an early exit when there's an error while scanning for the
csrow thus decreasing indentation levels for better readability.

Finally, fixup comments.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08 13:38:12 +01:00
Borislav Petkov
bfc04aec7d amd64_edac: add a leaner syndrome decoding algorithm
Instead of using the whole syndrome tables for channel decoding, use a
set of eigenvectors with which the tables can be generated to search for
the syndrome in error. The algorithm operates independently of symbol
size and can be used for both x4 and x8 syndromes.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-08 13:37:59 +01:00
Borislav Petkov
986a42a250 amd64_edac: remove early hw support check
The .probe_valid_hardware low_ops member checked whether the DCTs are in
DDR3 mode and bailed out if so. Now that all the needed changes for DDR3
support is in place, remove it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:31 +01:00
Borislav Petkov
6b4c0bdeb0 amd64_edac: detect DDR3 memory type
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:31 +01:00
Borislav Petkov
239642fe19 edac: add memory types strings for debugging
Instead of using deeply-nested conditionals for dumping the DIMM type in
debug mode, add a strings array of the supported DIMM types.

This is useful in cases where an edac driver supports multiple DRAM
types and is only defined in debug builds.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:31 +01:00
Borislav Petkov
cec7924f56 edac, mce: update AMD F10h revD check
F10h revD start with model number 8.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:30 +01:00
Borislav Petkov
1f6bcee75e amd64_edac: remove unneeded extract_error_address wrapper
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:30 +01:00
Borislav Petkov
44e9e2ee21 amd64_edac: rename StinkyIdentifier
SystemAddress -> sys_addr

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:30 +01:00
Borislav Petkov
ad858bfa14 amd64_edac: remove superfluous dbg printk
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:29 +01:00
Borislav Petkov
1433eb9903 amd64_edac: enhance address to DRAM bank mapping
Add cs mode to cs size mapping tables for DDR2 and DDR3 and F10
and all K8 flavors and remove klugdy table of pseudo values. Add a
low_ops->dbam_to_cs member which is family-specific and replaces
low_ops->dbam_map_to_pages since the pages calculation is a one liner
now.

Further cleanups, while at it:

- shorten family name defines
- align amd64_family_types struct members

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:29 +01:00
Borislav Petkov
d16149e8c3 amd64_edac: cleanup f10_early_channel_count
Do not read DCLR[01] again since this is done in
amd64_read_mc_registers() earlier. There can be more than two physical
DIMMs present so clamp the channels value to max 2. Also, do not report
DCT data width - it is also done earlier.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:29 +01:00
Borislav Petkov
8566c4df16 amd64_edac: dump DIMM sizes on K8 too
Extend f10_debug_display_dimm_sizes to dump the logical DIMMs
configuration on K8 revF too. Remove the ganged arg since we print the
DCT operating mode (ganged vs unganged) earlier.

Also, DCT csrow configuration is relevant therefore dump it as
KERN_DEBUG instead of only on debug builds. Remove misleading DIMM
output since there's no reliable way of mapping of chip selects to
actual physical DIMMs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:28 +01:00
Borislav Petkov
8de1d91e62 amd64_edac: cleanup rest of amd64_dump_misc_regs
Clarify bitfields description, add PCI config function/offset names to
registers for easy reference, simplify code layout, remove unneeded
info.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:28 +01:00
Borislav Petkov
68798e1760 amd64_edac: cleanup DRAM cfg low debug output
Carve out the register-specific debug statements into a separate
function, clarify meanings of the single bitfields in the register,
remove irrelevant output and macros.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:28 +01:00
Borislav Petkov
6ba5dcdc44 amd64_edac: wrap-up pci config read error handling
Add a pci config read wrapper for signaling pci config space access
errors instead of them being visible only on a debug build. This is
important on amd64_edac since it uses all those pci config register
values to access the DRAM/DIMM configuration of the nodes.

In addition, the wrapper makes a _lot_ (look at the diffstat!) of
error handling code superfluous and improves much of the overall code
readability by removing error handling details out of the way.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:27 +01:00
Borislav Petkov
f6d6ae9657 amd64_edac: unify MCGCTL ECC switching
Unify almost identical code into one function and remove NUMA-specific
usage (specifically cpumask_of_node()) in favor of generic topology
methods.

Remove unused defines, while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:27 +01:00
Rusty Russell
ba578cb34a cpumask: use modern cpumask style in drivers/edac/amd64_edac.c
cpumask_t -> struct cpumask, and don't put one on the stack.  (Note: this
is actually on the stack unless CONFIG_CPUMASK_OFFSTACK=y).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:27 +01:00
Borislav Petkov
e97f8bb8ce amd64_edac: make DRAM regions output more human-readable
Do not shift the TOP_MEM and TOP_MEM2 values by 23 but rather save the
whole 64-bit value read from the MSR. Although the TOP_MEM/TOP_MEM2 bits
are only a subset of the 64bit register, the values are correct since
the remaining bits are Read-As-Zero and no shifting is needed.

Also, cleanup DRAM base/limit debug output.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:27 +01:00
Borislav Petkov
72381bd55e amd64_edac: clarify DRAM CTL debug reporting
Make debug info formulations about the DRAM and DCT configuration of the
machine more human readable.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-12-07 19:14:26 +01:00
Ingo Molnar
26fb20d008 Merge branch 'perf/mce' into perf/core
Merge reason: It's ready for v2.6.33.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 20:11:06 +01:00
Borislav Petkov
17adea01b9 amd64_edac: fix CECCs reporting
Shift error type bits properly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-11-04 14:04:06 +01:00
Li Hong
a3c4c58085 amd64_edac: fix a wrong goto clause in amd64_edac.c
In amd64_edac_init(void) in amd64_edac.c, cache_k8_northbridges() is
called before pci_register_driver. If it fails, should exit with err
directly.

Signed-off-by: Li Hong <lihong.hi@gmail.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-11-04 14:02:32 +01:00
Keith Mannthey
c2494ace99 edac: i5100 fix initialization code
Allow csrows to properly initialize when the topology only has active
channels on 2 and 3.  This new check allows proper detection and
initialization in this topology.  Only checking the first mrt that
represented channels 0 and 1 is not sufficient.

I also fixed up the related debug information path.  I can submit as a 2nd
patch if needed.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Acked-by: Aristeu Rozanski <aris@ruivo.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29 07:39:30 -07:00
Ira W. Snyder
0616fb003d edac: i5400 fix missing CONFIG_PCI define
When building without CONFIG_PCI the edac_pci_idx variable is unused,
causing a build-time warning.  Wrap the variable in #ifdef CONFIG_PCI,
just like the rest of the PCI support.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29 07:39:30 -07:00
Jeff Roberson
156edd4aaa edac: i5400 fix csrow mapping
The i5400 EDAC driver has several bugs with chip-select row computation
which most likely lead to bugs in detailed error reporting.  Attempts to
contact the authors have gone mostly unanswered so I am presenting my diff
here.  I do not subscribe to lkml and would appreciate being kept in the
cc.

The most egregious problem was miscalculating the addresses of MTR
registers after register 0 by assuming they are 32bit rather than 16.
This caused the driver to miss half of the memories.  Most motherboards
tend to have only 8 dimm slots and not 16, so this may not have been
noticed before.

Further, the row calculations multiplied the number of dimms several
times, ultimately ending up with a maximum row of 32.  The chipset only
supports 4 dimms in each of 4 channels, so csrow could not be higher than
4 unless you use a row per-rank with dual-rank dimms.  I opted to
eliminate this behavior as it is confusing to the user and the error
reporting works by slot and not rank.  This gives a much clearer view of
memory by slot and channel in /sys.

Signed-off-by: Jeff Roberson <jroberson@jroberson.net>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-29 07:39:30 -07:00
Borislav Petkov
4997811e3b amd64_edac: fix DRAM base and limit extraction masks, v2
This is a proper fix as a follow-up to 66216a7 and 916d11b.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-16 18:51:22 +02:00
Borislav Petkov
fb2531953f mce, edac: Use an atomic notifier for MCEs decoding
Add an atomic notifier which ensures proper locking when conveying
MCE info to EDAC for decoding. The actual notifier call overrides a
default, negative priority notifier.

Note: make sure we register the default decoder only once since
mcheck_init() runs on each CPU.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091003065752.GA8935@liondog.tnic>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 12:24:45 +02:00
Linus Torvalds
624235c5b3 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pci: Correct spelling in a comment
  x86: Simplify bound checks in the MTRR code
  x86: EDAC: carve out AMD MCE decoding logic
  initcalls: Add early_initcall() for modules
  x86: EDAC: MCE: Fix MCE decoding callback logic
2009-10-08 12:06:36 -07:00
Borislav Petkov
94baaee494 amd64_edac: beef up DRAM error injection
When injecting DRAM ECC errors (F3xBC_x8), EccVector[15:0] is a bitmask
of which bits should be error injected when written to and holds the
payload of 16-bit DRAM word when read, respectively.

Add /sysfs members to show the DRAM ECC section/word/vector.

Fail wrong injection values entered over /sysfs instead of truncating
them.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:51:28 +02:00
Borislav Petkov
66216a7a15 amd64_edac: fix DRAM base and limit extraction
On Fam10h and above, F1x[1, 0][7C:40] are DRAM Base/Limit registers
which specify the destination node of a DRAM address. Those address
boundaries are being extracted into ->dram_base[] and ->dram_limit[].
Correct the extraction masks to match the respective address bits.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:51:15 +02:00
Borislav Petkov
9d858bb10a amd64_edac: fix chip select handling
Different processor families support a different number of chip selects.
Handle this in a family-dependent way with the proper values assigned at
init time (see amd64_set_dct_base_and_mask).

Remove _DCSM_COUNT defines since they're used at one place and originate
from public documentation.

CC: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:50:50 +02:00
Keith Mannthey
2cff18c22c amd64_edac: simple fix to allow reporting of CECC errors
This allows the errors to be further decoded and mapped to csrows.
Tested with ECC debug dimms and an Rev F cpu based system.

Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:49:58 +02:00
Borislav Petkov
8edc544589 amd64_edac: fix K8 intlv_sel check
The check when DRAM interleaving is enabled should be done against the
pvt->dram_IntlvSel field and not against the ->dram_limit.

Simplify first loop and fixup printk formatting while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:49:43 +02:00
Borislav Petkov
72f158fe6f amd64_edac: fix interleave enable tests
The pvt->dram_IntlvEn saves the 3 "Interleave Enable" bits already
right-shifted by 8 so the check in find_mc_by_sys_addr() by shifting the
values to the left 8 bits is wrong.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:48:08 +02:00
Borislav Petkov
916d11b2b5 amd64_edac: fix DRAM base and limit address extraction
K8 DRAM base and limit addresses from F1x40 +8*i and F1x44 + 8*i, where
i in (0..7) are both bits 39-24 and therefore the shifting should be
done by 24 and not by 8.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:47:51 +02:00
Borislav Petkov
3011b20da9 amd64_edac: fix driver instance lookup table allocation
Allocate memory statically for 8-node machines max for simplicity
instead of relying on MAX_NUMNODES which is 0 on !CONFIG_NUMA builds.

Spotted by Jan Beulich.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-10-07 16:47:34 +02:00
Borislav Petkov
0d18b2e34b x86: EDAC: carve out AMD MCE decoding logic
This converts the MCE decoding logic into a standalone config
option which can be built-in or a module, the first one being the
default for MCEs happening early on in the boot process.

This, beyond being separated in a cleaner way, also saves RAM by
making the decoding logic modular.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <20091002133148.GD28682@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-02 15:42:19 +02:00
Ingo Molnar
f436f8bb73 x86: EDAC: MCE: Fix MCE decoding callback logic
Make decoding of MCEs happen only on AMD hardware by registering a
non-default callback only on CPU families which support it.

While looking at the interaction of decode_mce() with the other MCE
code i also noticed a few other things and made the following
cleanups/fixes:

 - Fixed the mce_decode() weak alias - a weak alias is really not
   good here, it should be a proper callback. A weak alias will be
   overriden if a piece of code is built into the kernel - not
   good, obviously.

 - The patch initializes the callback on AMD family 10h and 11h.

 - Added the more correct fallback printk of:

	No support for human readable MCE decoding on this CPU type.
	Transcribe the message and run it through 'mcelog --ascii' to decode.

   On CPUs that dont have a decoder.

 - Made the surrounding code more readable.

Note that the callback allows us to have a default fallback -
without having to check the CPU versions during the printout
itself. When an EDAC module registers itself, it can install the
decode-print function.

(there's no unregister needed as this is core code.)

version -v2 by Borislav Petkov:

 - add K8 to the set of supported CPUs

 - always build in edac_mce_amd since we use an early_initcall now

 - fix checkpatch warnings

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <20091001141432.GA11410@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-02 15:42:18 +02:00
Jesper Dangaard Brouer
458e5ff13e edac: core: remove completion-wait for complete with rcu_barrier
Module edac_core.ko uses call_rcu() callbacks in edac_device.c, edac_mc.c
and edac_pci.c.

They all use a wait_for_completion() scheme, but this scheme it not 100%
safe on multiple CPUs.  See the _rcu_barrier() implementation which
explains why extra precausion is needed.

The patch adds a comment about rcu_barrier() and as a precausion calls
rcu_barrier().  A maintainer needs to look at removing the
wait_for_completion code.

[dougthompson@xmission.com: remove the wait_for_completion code]
Signed-off-by Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:05 -07:00
Jason Uhlenkott
dd8ef1db87 edac: i3200 memory controller driver
A driver for the Intel 3200 and 3210 memory controllers.  It has only had
light testing so far, and currently makes no attempt to decode error
addresses at anything finer than csrow granularity.

Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Julia Lawall
30a61fff3a edac: fix resource size calculation
Use the function resource_size, which reduces the chance of introducing
off-by-one errors in calculating the resource size.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
struct resource *res;
@@

- (res->end - res->start) + 1
+ resource_size(res)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Ira W. Snyder
b484625172 edac: mpc85xx add mpc83xx support
Add support for the Freescale MPC83xx memory controller to the existing
driver for the Freescale MPC85xx memory controller.  The only difference
between the two processors are in the CS_BNDS register parsing code, which
has been changed so it will work on both processors.

The L2 cache controller does not exist on the MPC83xx, but the OF
subsystem will not use the driver if the device is not present in the OF
device tree.

I had to change the nr_pages calculation to make the math work out.  I
checked it on my board and did the math by hand for a 64GB 85xx using 64K
pages.  In both cases, nr_pages * PAGE_SIZE comes out to the correct
value.

Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Kumar Gala <galak@gate.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Yang Shi
a014554e66 edac: mpc85xx add P2020DS support
Based on Kumar's new compatible types patch, add P2020 into MPC85xx EDAC
compatible lists so that EDAC can recognize P2020 meomry controller and L2
cache controller and export the relevant fields to sysfs.

EDAC MPC85xx DDR3 support is needed if DDR3 memory stick is installed on a
P2020DS board so that EDAC core can recognize DDR3 memory type.

Signed-off-by: Yang Shi <yang.shi@windriver.com>
Acked-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Anand Gadiyar
411c940385 trivial: fix typo "for for" in multiple files
trivial: fix typo "for for" in multiple files

Signed-off-by: Anand Gadiyar <gadiyar@ti.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-21 15:14:54 +02:00
Borislav Petkov
06724535f8 amd64_edac: check NB MCE bank enable on the current node properly
The old code was using smp_call_function_many which skips the current
cpu if it is in the supplied cpumask. Switch to the rdmsr_on_cpus()
interface which takes care of that.

In addition, add get_cpus_on_this_dct_cpumask helper which computes a
cpumask of all the cores on a node and thus on a DCT.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-16 13:05:46 +02:00
Wan Wei
57a30854c8 amd64_edac: Rewrite unganged mode code of f10_early_channel_count
Simplify the procedure by checking if there is any DIMM in each channel.
This patch will fix the bugs such as when there is no DIMMs under
certain node, two DIMMs in the same channel, and only one DIMM in each
channel of the node.

Borislav: minor fixups

Signed-off-by: Wan Wei <wanwei@mail.dawning.com.cn>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-16 12:42:55 +02:00
Borislav Petkov
be3468e8ff amd64_edac: cleanup amd64_check_ecc_enabled
Simplify code flow and make sure return value is always valid since
further driver init depends on it. Carve out long warning string and
make code more readable. Shorten some names, while at it.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-16 12:40:38 +02:00
Andreas Herrmann
6a8126911a x86, EDAC: Provide function to return NodeId of a CPU
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
2009-09-16 11:33:40 +02:00
Ingo Molnar
b9183f9b99 amd64_edac: build driver only on AMD hardware
-tip testing found the following build failure (config attached):

drivers/built-in.o: In function `amd64_check':
amd64_edac.c:(.text+0x3e9491): undefined reference to `amd_decode_nb_mce'
drivers/built-in.o: In function `amd64_init_2nd_stage':
amd64_edac.c:(.text+0x3e9b46): undefined reference to `amd_report_gart_errors'
amd64_edac.c:(.text+0x3e9b55): undefined reference to `amd_register_ecc_decoder'
drivers/built-in.o: In function `amd64_nbea_store':
amd64_edac_dbg.c:(.text+0x3ea22e): undefined reference to `amd_decode_nb_mce'
drivers/built-in.o: In function `amd64_remove_one_instance':
amd64_edac.c:(.devexit.text+0x3eea): undefined reference to `amd_report_gart_errors'
amd64_edac.c:(.devexit.text+0x3ef6): undefined reference to `amd_unregister_ecc_decoder'

the AMD EDAC code has a dependency on CONFIG_CPU_SUP_AMD facilities. The
patch below solves the problem here.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-16 11:31:57 +02:00
Borislav Petkov
53bd5fedca EDAC, AMD: decode FR MCEs
See Fam10h BKDG (31116, rev. 3.28), Table 101.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 19:01:37 +02:00
Borislav Petkov
f9350efd6f EDAC, AMD: decode load store MCEs
See Fam10h BKDG (31116, rev. 3.28), Table 100.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 19:01:33 +02:00
Borislav Petkov
56cad2d6fb EDAC, AMD: decode bus unit MCEs
... according to Table 69, Fam10h BKDG (31116, rev. 3.28).

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 19:01:30 +02:00
Borislav Petkov
ab5535e70f EDAC, AMD: decode instruction cache MCEs
See Fam10h BKDG (31116, rev. 3.28), Table 95

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 19:01:27 +02:00
Borislav Petkov
5196624136 EDAC, AMD: decode data cache MCEs
Those get reported in MC0_STATUS, see Table 92, F10h BKDG (31116, rev.
3.28) for more details.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 19:01:23 +02:00
Borislav Petkov
d93cc222ad EDAC, AMD: carve out decoding of MCi_STATUS ErrorCode
This is the MCE error code from the MCi_STATUS banks, bits [15:0] which
describe what type of error was encountered: GART TLB, Memory or Bus
error. The semantics of those bits are identical across all MCE banks so
decode those separately, irrespectively of MCE type.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 19:01:20 +02:00
Borislav Petkov
b69b29de65 EDAC, AMD: carve out MCi_STATUS decoding
The MCi_STATUS registers have most field definitions in common so decode
them in the general path. Do not pass ecc_type along and compute it in
__amd64_decode_bus_error instead.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 19:01:07 +02:00
Borislav Petkov
549d042df2 x86, mce: pass mce info to EDAC for decoding
Move NB decoder along with required defines to EDAC MCE core. Add
registration routines for further decoding of the MCE info in the AMD64
EDAC module.

CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:59:17 +02:00
Borislav Petkov
ecaf5606de amd64_edac: cleanup amd64_decode_bus_error
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:58:37 +02:00
Borislav Petkov
b7225e4fc1 amd64_edac: remove memory and GART TLB error decoders
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:58:29 +02:00
Borislav Petkov
5110dbdeab amd64_edac: cleanup/complete NB MCE decoding
* don't dump info which mcheck already does
* update to newest BKDG
* mv amd64_process_error_info -> amd64_decode_nb_mce
* shorten error struct names
* remove redundant info ptr in amd64_process_error_info
* remove unused ErrorCodeExt[19:16] (MCx_STATUS) defines

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:58:25 +02:00
Borislav Petkov
ef44cc4c22 amd64_edac: cleanup amd64_process_error_info
* mv amd64_error_info_regs -> err_regs

* remove redundant info ptr

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:58:18 +02:00
Borislav Petkov
1c43f2e24d EDAC: beef up ErrorCodeExt error signatures
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:58:14 +02:00
Borislav Petkov
b70ef01016 EDAC: move MCE error descriptions to EDAC core
This is in preparation of adding AMD-specific MCE decoding functionality
to the EDAC core. The error decoding macros originate from the AMD64
EDAC driver albeit in a simplified and cleaned up version here.

While at it, add macros to generate the error description strings and
use them in the error type decoders directly which removes a bunch of
code and makes the decoding functions much more readable. Also, fix
strings and shorten macro names.

Remove superfluous htlink_msgs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-09-14 18:57:48 +02:00
Doug Thompson
c2718348b4 amd64_edac: print debug statements only on error
Add forgotten return calls for the successful cases.

Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-08-04 12:10:06 +02:00
Doug Thompson
126b67b8d2 amd64_edac: fix ECC checking
On the good path of BIOS enabled ECC and no override, the value returned
is 1 by omission and thus is deemed failing by the probe-function.

Allow proper module initialization by clearing the retval explicitly.

Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-08-03 16:54:20 +02:00
Lu Zhihe
3d768213a6 edac: x38 fix mchbar high register addr
Intel X38 MCHBAR is a 64bits register, base from 0x48, so its higher base
is 0x4C.

Signed-off-by: Lu Zhihe <tombowfly@gmail.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>		[2.6.30.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-29 19:10:34 -07:00
Wan Wei
4afcd2dcc6 amd64_edac: read the right F2 maskoffset reg
Signed-off-by: Wan Wei <onewayforever@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-07-27 14:42:24 +02:00
Yang Shi
b1cfebc923 edac: add DDR3 memory type for MPC85xx EDAC
Since some new MPC85xx SOCs support DDR3 memory now, so add DDR3 memory
type for MPC85xx EDAC.

Signed-off-by: Yang Shi <yang.shi@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-30 18:55:59 -07:00
Borislav Petkov
37da045067 amd64_edac: misc small cleanups
- cleanup debug calls
- shorten function names
- cleanup error exit paths

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-26 13:06:41 +02:00
Borislav Petkov
30c875cbc1 amd64_edac: fix ecc_enable_override handling
amd64_check_ecc_enabled() returns non-zero status when ECC
checking/correcting is disabled and this fails further loading of the
driver even when 'ecc_enable_override' boot param is used.

Fix that by clearing return status in that case.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-26 13:06:41 +02:00
Borislav Petkov
584fcff428 amd64_edac: check only ECC bit in amd64_determine_edac_cap
Checking whether the machine is using ECC enabled DRAM is done through
testing the DimmEccEn bit in the DRAM Cfg Low register (F2x[1,0]90). Do
that instead of testing all bits from the DimmEccEn upwards.

Also, remove mci->edac_cap assignment and use value returned from
amd64_determine_edac_cap().

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-26 13:06:40 +02:00
GeunSik Lim
e24aca672f edac: Kconfig: fix the meaning of EDAC abbreviation
Fix the meaning of EDAC(Error Detection And Correction) correctly.

[akpm@linux-foundation.org: add missing space]
Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:57 -07:00
Mike Frysinger
20ea8fad9e edac: add missing __devexit_p()
The remove function uses __devexit, so the .remove assignment needs
__devexit_p() to fix a build error with hotplug disabled.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:57 -07:00
Harry Ciao
1dc9b70d7d edac: add edac_device_alloc_index()
Add edac_device_alloc_index(), because for MAPLE platform there may
exist several EDAC driver modules that could make use of
edac_device_ctl_info structure at the same time. The index allocation
for these structures should be taken care of by EDAC core.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:56 -07:00
Harry Ciao
2a9036afff edac: add CPC925 Memory Controller driver
Introduce IBM CPC925 EDAC driver, which makes use of ECC, CPU and
HyperTransport Link error detections and corrections on the IBM
CPC925 Bridge and Memory Controller.

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-18 13:03:56 -07:00
Martin Olsson
98a1708de1 trivial: fix typos s/paramter/parameter/ and s/excute/execute/ in documentation and source comments.
Signed-off-by: Martin Olsson <martin@minimum.se>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-06-12 18:01:46 +02:00
Borislav Petkov
9456ffffcf EDAC: do not enable modules by default
Prevent EDAC compilation units from being built by default and let the
user explicitly select the needed modules.

Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:19:41 +02:00
Borislav Petkov
3d37329045 amd64_edac: do not enable module by default
While at it, fix a link failure when !K8_NB.

Acked-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:19:40 +02:00
Doug Thompson
7d6034d321 amd64_edac: add module registration routines
Also, link into Kbuild by adding Kconfig and Makefile entries.

Borislav:
- Kconfig/Makefile splitting
- use zero-sized arrays for the sysfs attrs if not enabled
- rename sysfs attrs to more conform values
- shorten CONFIG_ names
- make multiple structure members assignment vertically aligned
- fix/cleanup comments
- fix function return value patterns
- fix err labels
- fix a memleak bug caught by Ingo
- remove the NUMA dependency and use num_k8_northbrides for initializing
  a driver instance per NB.
- do not copy the pvt contents into the mci struct in
  amd64_init_2nd_stage() and save it in the mci->pvt_info void ptr
  instead.
- cleanup debug calls
- simplify amd64_setup_pci_device()

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:19:28 +02:00
Doug Thompson
f9431992b6 amd64_edac: add ECC reporting initializers
Borislav:
- convert to the new {rd|wr}msr_on_cpus interfaces.
- convert pvt->old_mcgctl to a bitmask thus saving some bytes
- fix/cleanup comments
- fix function return value patterns
- add a proper bugfix found by Doug to amd64_check_ecc_enabled where we
  missed checking for the ECC enabled bit in NB CFG.
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:19:01 +02:00
Doug Thompson
0ec449ee95 amd64_edac: add EDAC core-related initializers
Borislav:

- add a amd64_free_mc_sibling_devices() helper instead of opencoding the
  release-path.
- fix/cleanup comments
- fix function return value patterns
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:19:00 +02:00
Doug Thompson
d27bf6fa36 amd64_edac: add error decoding logic
Borislav:

- fold amd64_error_info_valid() into its only user
- fix/cleanup comments
- fix function return value patterns
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:59 +02:00
Doug Thompson
b1289d6f9d amd64_edac: add ECC chipkill syndrome mapping table
Borislav:

- fix comments
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:58 +02:00
Doug Thompson
4d37607adb amd64_edac: add per-family descriptors
Borislav:

- fix comments
- fix function return value patterns

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:57 +02:00
Doug Thompson
f71d0a0500 amd64_edac: add F10h-and-later methods-p3
Borislav:

- compute dct_sel_base_off in f10_match_to_this_node() correctly since
it cannot be assumed that the Reserved bits are zero and they have to be
masked out instead.

- cleanup, remove StinkyIdentifiers, simplify logic
- fix function return value patterns
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:56 +02:00
Doug Thompson
6163b5d4fb amd64_edac: add F10h-and-later methods-p2
Borislav:

- fix a wrong negation in f10_determine_base_addr_offset()
- fix a wrong mask in f10_determine_base_addr_offset() which should
select DctSelBaseAddr[31:11] and not [31:16] as it was before
- remove StinkyIdentifiers, trivially simplify code.
- fix/cleanup comments
- fix function return value patterns

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:56 +02:00
Doug Thompson
1afd3c98b5 amd64_edac: add F10h-and-later methods-p1
Borislav:

Fail f10_early_channel_count() if error encountered while reading a NB
register since those cached register contents are accessed afterwards.

- fix/cleanup comments
- fix function return value patterns
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:55 +02:00
Doug Thompson
ddff876d20 amd64_edac: add k8-specific methods
Borislav:

- fix/cleanup/move comments
- fix function return value patterns
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:54 +02:00
Doug Thompson
94be4bff21 amd64_edac: assign DRAM chip select base and mask in a family-specific way
Borislav:

- cleanup/fix comments
- fix function return value patterns
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:53 +02:00
Doug Thompson
2da11654ea amd64_edac: add helper to dump relevant registers
Borislav:

- cleanup/fix comments
- fix function return value patterns
- cleanup dbg calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:52 +02:00
Doug Thompson
93c2df58b5 amd64_edac: add DRAM address type conversion facilities
Borislav:

- cleanup/fix comments, add BKDG refs
- fix function return value patterns
- cleanup dbg calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:51 +02:00
Doug Thompson
e2ce7255e8 amd64_edac: add functionality to compute the DRAM hole
Borislav:

- cleanup/fix comments, add BKDG refs
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:50 +02:00
Doug Thompson
6775763a23 amd64_edac: add sys addr to memory controller mapping helpers
Borislav:

- cleanup comments
- cleanup debug calls
- simplify find_mc_by_sys_addr's exit path

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:49 +02:00
Doug Thompson
2bc6541872 amd64_edac: add memory scrubber interface
Borislav:
- fix/cleanup comments
- fix function return value patterns
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:49 +02:00
Doug Thompson
b52401cece amd64_edac: add MCA error types
Borislav:
- cleanup comments

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:48 +02:00
Doug Thompson
eb919690be amd64_edac: add DRAM error injection logic using sysfs
Borislav:
- rename sysfs attrs to more conform names
- cleanup/fix comments according to BKDG text
- fix function return value patterns
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:47 +02:00
Doug Thompson
fd3d6780f7 amd64_edac: add debugging/testing code
This is for dumping different registers and testing the address mapping
logic using the ECC syndromes.

Borislav:

- split sysfs attrs per file
- use more conform names for the sysfs attrs
- fix function return value patterns
- cleanup/fix comments
- cleanup debug calls

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:46 +02:00
Doug Thompson
cfe40fdb4a amd64_edac: add driver header
Borislav:
- remove register bit descriptions (complete text in BKDG)
- cleanup and remove excessive/superfluous comments

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:45 +02:00
Borislav Petkov
d357cbb445 edac: fold __func__ into edac_debug_printk
This shortens debugfX() calls a bit.

Reviewed-by: Mauro Carvalho Chehab <mchehab@redhat.com>
CC: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2009-06-10 12:18:44 +02:00
Harry Ciao
715fe7af9f edac: AMD8111 & AMD8131 Kconfig fixup
The amd8111_edac.c driver will fail allmodconfig on architectures other
than PPC, introduce Kconfig dependency to avoid this, since both AMD8111
and AMD8131 chips are only adopted on Maple so far.

Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-29 08:40:03 -07:00
Harry Ciao
56ec0c7b88 edac: AMD8111 & AMD8131 use dev_name()
The "bus_id" member in the device structure has been obsolete, use
dev_name() instead.

Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-29 08:40:03 -07:00
Dave Jiang
55e5750b3e edac: ppc mpc85xx fix mc err detect
Error found by Jeff Haran.

The error detect register is 0s when no errors are detected.  The check
code is incorrect, so reverse check sense.

Reported-by: Jeff Haran <jharan@Brocade.COM>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-21 13:41:51 -07:00
Jean Delvare
fbeb438474 edac: use to_delayed_work()
The edac-core driver includes code which assumes that the work_struct
which is included in every delayed_work is the first member of that
structure.  This is currently the case but might change in the future, so
use to_delayed_work() instead, which doesn't make such an assumption.

linux-2.6.30-rc1 has the to_delayed_work() function that will allow this
patch to work

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13 15:04:34 -07:00
Jeff Haran
e6da46b273 edac: fix local pci_write_bits32
Fix the edac local pci_write_bits32 to properly note the 'escape' mask if
all ones in a 32-bit word.

Currently no consumer of this function uses that mask, so there is no
danger to existing code.

Signed-off-by: Jeff Haran <jharan@Brocade.COM>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-13 15:04:33 -07:00
Harry Ciao
58b4ce6f24 edac: AMD8111 driver Kconfig & Makefile
Introduce Kconfig and Makefile options for AMD8111 EDAC driver.

Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:05:04 -07:00
Harry Ciao
e876558415 edac: AMD8131 driver Kconfig & Makefile
Introduce Kconfig and Makefile options for AMD8131 EDAC driver.

Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:05:04 -07:00
Harry Ciao
28d16272b1 edac: AMD8131 driver source file
Introduce AMD8131 EDAC driver source file, which makes use of error
detections on the PCI-X Bridge Controllers on the AMD8131 HyperTransport
PCI-X Tunnel.

Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:05:04 -07:00
Harry Ciao
a35a281880 edac: AMD8131 driver header file
Introduce AMD8131 EDAC driver header file, which adds register and bits
definitions for the PCI-X Bridge Controller on the AMD8131 HyperTransport
I/O Hub.

Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:05:03 -07:00
Harry Ciao
8641a3845d edac: Add edac_pci_alloc_index()
Add edac_pci_alloc_index(), because for MAPLE platform there may exist
several EDAC driver modules that could make use of edac_pci_ctl_info
structure at the same time.  The index allocation for these structures
should be taken care of by EDAC core.

Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:05:03 -07:00
Harry Ciao
697dab6484 edac: AMD8111 driver source file
Introduce AMD8111 EDAC driver source file, which makes use of error
detections on the LPC Bridge Controller and PCI Bridge Controller on the
AMD8111 HyperTransport I/O Hub.

Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:05:03 -07:00
Harry Ciao
ec2cf2e272 edac: AMD8111 driver header file
Introduce AMD8111 EDAC driver header file, which adds register and bits
definitions for the LPC Bridge Controller and PCI Bridge Controller on the
AMD8111 HyperTransport I/O Hub.

Signed-off-by: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:05:03 -07:00
Grant Erickson
dba7a77c0e edac: new ppc4xx driver module
This adds support for an EDAC memory controller adaptation driver for the
"ibm,sdram-4xx-ddr2" ECC controller realized in the AMCC PowerPC 405EX[r].

At present, this driver has been developed and tested against the
controller realization in the AMCC PPC405EX[r] on the AMCC Kilauea and
Haleakala boards (256 MiB w/o ECC memory soldered onto the board) and a
proprietary board based on those designs (128 MiB ECC memory, also
soldered onto the board).

In the future, dynamic feature detection and handling needs to be added
for the other realizations of this controller found in the 440SP, 440SPe,
460EX, 460GT and 460SX.

Eventually, this driver will likely be evolved and adapted to the above
variant realizations of this controller as well as broken apart to handle
the other known ECC-capable controllers prevalent in other PPC4xx
processors:

  - IBM SDRAM (405GP, 405CR and 405EP) "ibm,sdram-4xx"
  - IBM DDR1 (440GP, 440GX, 440EP and 440GR) "ibm,sdram-4xx-ddr"
  - Denali DDR1/DDR2 (440EPX and 440GRX) "denali,sdram-4xx-ddr2"

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Grant Erickson <gerickson@nuovations.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:05:03 -07:00
Doug Thompson
4577ca5568 edac: remove EDAC's experimental status
After 3 years, this is a patch to remove the EXPERIMENTAL tag on EDAC.  We
now have many module drivers submitters in EDAC and believe EDAC is no
longer EXPERIMENTAL

Signed-off-by: Doug Thompson <dougthompson@xmission.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:05:03 -07:00
Hitoshi Mitake
cc18e3cd53 edac: add more verbose debug info
A patch for making a debugging information more verbose for use in
development debugging.

By enabling the new option "More verbose debugging", information about
source file and line number will be added to debugging message.

This is sample output,

EDAC MC0: Giving out device to 'e7xxx_edac' 'E7205': DEV 0000:00:00.0
EDAC DEBUG: in drivers/edac/edac_pci.c, line at 48: edac_pci_alloc_ctl_info()
EDAC DEBUG: in drivers/edac/edac_pci.c, line at 334: edac_pci_add_device()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-02 19:05:02 -07:00
Kay Sievers
031d551859 edac: struct device - replace bus_id with dev_name(), dev_set_name()
Cc: dougthompson@xmission.com
Cc: bluesmoke-devel@lists.sourceforge.net
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
2009-03-24 16:38:21 -07:00
Stephen Rothwell
4712fff9be powerpc: More printing warning fixes for the l64 to ll64 conversion
These are all powerpc specific drivers.

res.start in fsl_elbc_nand.c needs to be cast since it may be either 32
or 64 bit.  Thanks to Scott Wood for noticing.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de> call_edac bits in particular
Acked-by: Olof Johansson <olof@lixom.net> pasemi_nand peices
Acked-by: Scott Wood <scottwood@freescale.com> fsl_elbc fixes
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-28 17:15:52 +11:00
Mauro Carvalho Chehab
8375d4909a edac: driver for i5400 MCH (update)
Signed-off-by: Ben Woodard <woodard@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:30 -08:00
Mauro Carvalho Chehab
920c8df6ac edac: driver for i5400 MCH (Seaburg)
EDAC driver for i5400 MCH (Seaburg)

This driver adds support for i5400 MCH chipset.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Ben Woodard <woodard@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:30 -08:00
Kumar Gala
29d6cf26a7 edac: fix mpc85xx and add mpc8536 mpc8560
All other compatibles that are uniquely identifying the processor use a
prefix of the form fsl,mpc85...'.  We add support for it so we can
deprecate the older 'fsl,85...' that was improperly used here.

Additionally added mpc8536 & mpc8560 to the compatible lists.

This patch is based on Nate's 8572 patch.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Dave Jiang <djiang@mvista.com>
Cc: Nate Case <ncase@xes-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:30 -08:00
Kay Sievers
281efb17d8 edac: struct device: replace bus_id with dev_name(), dev_set_name()
This patch is part of a larger patch series which will remove the "char
bus_id[20]" name string from struct device.  The device name is managed in
the kobject anyway, and without any size limitation, and just needlessly
copied into "struct device".

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:30 -08:00
Arjan van de Ven
1dca00bd02 pci: use pci_ioremap_bar() in drivers/edac
Use the newly introduced pci_ioremap_bar() function in drivers/edac.
pci_ioremap_bar() just takes a pci device and a bar number, with the goal
of making it really hard to get wrong, while also having a central place
to stick sanity checks.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:30 -08:00
Linus Torvalds
3c92ec8ae9 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
  powerpc/44x: Support 16K/64K base page sizes on 44x
  powerpc: Force memory size to be a multiple of PAGE_SIZE
  powerpc/32: Wire up the trampoline code for kdump
  powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
  powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
  powerpc/32: Setup OF properties for kdump
  powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
  powerpc: Prepare xmon_save_regs for use with kdump
  powerpc: Remove default kexec/crash_kernel ops assignments
  powerpc: Make default kexec/crash_kernel ops implicit
  powerpc: Setup OF properties for ppc32 kexec
  powerpc/pseries: Fix cpu hotplug
  powerpc: Fix KVM build on ppc440
  powerpc/cell: add QPACE as a separate Cell platform
  powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
  powerpc/mpc5200: fix error paths in PSC UART probe function
  powerpc/mpc5200: add rts/cts handling in PSC UART driver
  powerpc/mpc5200: Make PSC UART driver update serial errors counters
  powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
  powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
  ...

Fix trivial conflict in drivers/char/Makefile as per Paul's directions
2008-12-28 16:54:33 -08:00
Harry Ciao
d519c8d9cc edac: fix edac core deadlock when removing a device
When deleting an edac device, we have to wait for its edac_dev.work to be
completed before deleting the whole edac_dev structure.  Since we have no
idea which work in current edac_poller's workqueue is the work we are
conerned about, we wait for all work in the edac_poller's workqueue to be
proceseed.  This is done via flush_cpu_workqueue() which inserts a
wq_barrier into the tail of the workqueue and then sleeping on the
completion of this wq_barrier.  The edac_poller will wake up sleepers when
it is found.

EDAC core creates only one kernel worker thread, edac_poller, to run the
works of all current edac devices.  They share the same callback function
of edac_device_workq_function(), which would grab the mutex of
device_ctls_mutex first before it checks the device.  This is exactly
where edac_poller and rmmod would have a great chance to deadlock.

In below call trace of rmmod > ... >
edac_device_del_device >
edac_device_workq_teardown > flush_workqueue > flush_cpu_workqueue,

device_ctls_mutex would have already been grabbed by
edac_device_del_device().  So, on one hand rmmod would sleep on the
completion of a wq_barrier, holding device_ctls_mutex; on the other hand
edac_poller would be blocked on the same mutex when it's running any one
of works of existing edac evices(Note, this edac_dev.work is likely to be
totally irrelevant to the one that is being removed right now)and never
would have a chance to run the work of above wq_barrier to wake rmmod up.

edac_device_workq_teardown() should not be called within the critical
region of device_ctls_mutex.  Just like is done in edac_pci_del_device()
and edac_mc_del_mc(), where edac_pci_workq_teardown() and
edac_mc_workq_teardown() are called after related mutex are released.

Moreover, an edac_dev.work should check first if it is being removed.  If
this is the case, then it should bail out immediately.  Since not all of
existing edac devices are to be removed, this "shutting flag" should be
contained to edac device being removed.  The current edac_dev.op_state can
be used to serve this purpose.

The original deadlock problem and the solution have been witnessed and
tested on actual hardware.  Without the solution, rmmod an edac driver
would result in below deadlock:

root@localhost:/root> rmmod mv64x60_edac
EDAC DEBUG: mv64x60_dma_err_remove()
EDAC DEBUG: edac_device_del_device()
EDAC DEBUG: find_edac_device_by_dev()

(hang for a moment)

INFO: task edac-poller:2030 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
edac-poller   D 00000000     0  2030      2
Call Trace:
[df159dc0] [c0071e3c] free_hot_cold_page+0x17c/0x304 (unreliable)
[df159e80] [c000a024] __switch_to+0x6c/0xa0
[df159ea0] [c03587d8] schedule+0x2f4/0x4d8
[df159f00] [c03598a8] __mutex_lock_slowpath+0xa0/0x174
[df159f40] [e1030434] edac_device_workq_function+0x28/0xd8 [edac_core]
[df159f60] [c003beb4] run_workqueue+0x114/0x218
[df159f90] [c003c674] worker_thread+0x5c/0xc8
[df159fd0] [c004106c] kthread+0x5c/0xa0
[df159ff0] [c0013538] original_kernel_thread+0x44/0x60
INFO: task rmmod:2062 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
rmmod         D 0ff2c9fc     0  2062   1839
Call Trace:
[df119c00] [c0437a74] 0xc0437a74 (unreliable)
[df119cc0] [c000a024] __switch_to+0x6c/0xa0
[df119ce0] [c03587d8] schedule+0x2f4/0x4d8
[df119d40] [c03591dc] schedule_timeout+0xb0/0xf4

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-23 15:58:21 -08:00
Benjamin Krill
def434c231 powerpc/cell: add QPACE as a separate Cell platform
Since the QPACE (Chromodynamics Parallel Computing on the
Cell Broadband Engine) platform doesn't use a iommu, doesn't
have PCI devices and a MPIC much lesser setup and
configurations are needed. So far all devices are detected
as OF device. A notifier function is used to set the dma_ops
for the of_platform bus. Further this patch splits the
PPC_CELL_NATIVE into PPC_CELL_COMMON which are parts that are
shared with the QPACE platform and the rest.

Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2008-12-22 22:19:19 +01:00
Jarkko Lavinen
09a81269c7 i82875p_edac: fix module remove
Fix module removal bugs of i82875p_edac.  Also i82975x_edac code seems to
have the same module removal bugs as in i82875p_edac.

The problems were:

1. In module removal i82875p_remove_one() is never called.

   Variable i82875p_registered is newer changed from 1, which
   guarantees i82875p_remove_one() is not called (and even if it were
   called, it would be called in wrong order).

   As a result, the edac_mc workque is not stopped and keeps probing.
   If kernel debugging options are not enabled, user may not notice
   anything going wrong.

   if debugging options are enabled and I do "rmmod i82875p_edac", I
   get:

      edac debug: edac_pci_workq_function() checking
      BUG: unable to handle kernel paging request at f882d16f
      ...
      call trace:
       [<f8834df3>] ? edac_mc_workq_function+0x55/0x7e [edac_core]
       [<c0233974>] ? run_workqueue+0xd7/0x1a5
       [<c023392f>] ? run_workqueue+0x92/0x1a5
       [<f8834d9e>] ? edac_mc_workq_function+0x0/0x7e [edac_core]
       [<c0233af9>] ? worker_thread+0xb7/0xc3
       [<c0236a7b>] ? autoremove_wake_function+0x0/0x33
       [<c0233a42>] ? worker_thread+0x0/0xc3
       [<c0236809>] ? kthread+0x3b/0x61
       [<c02367ce>] ? kthread+0x0/0x61
       [<c0204587>] ? kernel_thread_helper+0x7/0x10

   Fix for this is to get rid of needles variable i82875p_registered
   altogether and run i82875p_remove_one() *before*
   pci_unregister_driver().

2. edac_mc_del_mc() uses mci after freeing mci

   edac_mc_del_mc() calls calls edac_remove_sysfs_mci_device().  The
   kobject refcount of mci drops to 0 and mci is freed.  After this
   mci is accessed via debug print and i82875p_remove_one() still
   uses mci->pvt and tries to free mci again with edac_mc_free().

   The fix for this is add kobject_get(&mci->edac_mci_kobj) after
   edac_mc_alloc(). Then the mci is still available after returning
   from edac_mc_del_mc() with refcount 1, and mci->pvt is still
   available. When i82875p_remove_one() finally calls edac_mc_free(),
   this will cause kobject_put() and mci is released properly.

Signed-off-by: Jarkko Lavinen <jlavi@iki.fi>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01 19:55:25 -08:00
Jarkko Lavinen
307d114441 i82875p_edac: fix overflow device resource setup
When I do "modprobe i82875p_edac" on my Asus P4C800 MB on kernels 2.6.26
or later, the module load fails due to BAR 0 collision.  On 2.6.25 the
module loads just fine.

The overflow device on the MB seems to be hidden and its resources are not
allocated at normal PCI bus init.  Log shows the missing resource problem:

  EDAC DEBUG: i82875p_probe1()
  PCI: 0000:00:06.0 reg 10 32bit mmio: [fecf0000, fecf0fff]
  pci 0000:00:06.0: device not available because of BAR 0
[0xfecf0000-0xfecf0fff] collisions
  EDAC i82875p: i82875p_setup_overfl_dev(): Failed to enable overflow
device

The patch below fixes this by calling pci_bus_assign_resources() after
the overflow device is revealed and added to the bus. With this patch
I am again able to load and use the module.

Signed-off-by: Jarkko Lavinen <jlavi@iki.fi>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-12-01 19:55:25 -08:00
Darrick J. Wong
f0f7e0dc73 i5000-edac: hold reference to mci kobject
It turns out that edac_mc_del_mc will kobject_put the last kref on the
mci object.

If the timing is just right, that means that the mci object is freed
before before i5000_remove_one has a chance to free the resources
associated with it, causing a null pointer exceptions when unloading the
driver.  Insert a kobject_{get,put} pair so that this doesn't happen.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-12 17:17:16 -08:00
Benjamin Herrenschmidt
992b692dcf edac: fix enabling of polling cell module
The edac driver on cell turned out to be not enabled because of a missing
op_state.  This patch introduces it.  Verified to work on top of Ben's
next branch.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Osterkamp <jens@linux.vnet.ibm.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:46 -07:00
Hitoshi Mitake
df8bc08c19 edac x38: new MC driver module
I wrote a new module for Intel X38 chipset.  This chipset is very similar
to Intel 3200 chipset, but there are some different points, so I copyed
i3200_edac.c and modified.

This is Intel's web page describing this chipset.
http://www.intel.com/Products/Desktop/Chipsets/X38/X38-overview.htm

I've tested this new module with broken memory, and it seems to be working
well.

Signed-off-by: Hitoshi Mitake <mitake@clustcom.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-30 11:38:45 -07:00
Benjamin Herrenschmidt
3b274f44d2 edac cell: fix incorrect edac_mode
The cell_edac driver is setting the edac_mode field of the csrow's to an
incorrect value, causing the sysfs show routine for that field to go out
of an array bound and Oopsing the kernel when used.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>		[2.6.27.x, 2.6.26.x. 2.6.25.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:52:40 -07:00
Aristeu Rozanski
8360e81b5d edac i5000: fix thermal issues
Make the Thermal messages (temperature got past Tmid) be displayed only
once because:

1) it's the BIOS job to configure and handle the memory throttling
2) if the BIOS is broken or is aware about the condition, flooding the
   system logs won't help anything.
3) According to the specification update for Intel 5000 MCHs, all the
   revisions of this MCH have problems on the thermal sensors, making
   not automatic (a.k.a. intelligent thermal throttling) impossible.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:48 -07:00
Aristeu Rozanski
c066740739 edac i5000: fix error messages
Update the i5000_edac messages, making everything pass through the EDAC
(so the log controls will work) and being more specific about the errors.
Also, it makes the miscellaneous errors optional and disabled by default.

As I didn't found anywhere information about M23ERR-M26ERR
(FERR_NF_THERMAL) on FERR_NF_FBD, I'm removing them.

Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:48 -07:00
Andrew Kilkenny
60be75515e edac mpc85xx: add support for mpc8572
This adds support for the dual-core MPC8572 processor.  We have
to support making SPR changes on each core.  Also, since we can
have multiple memory controllers sharing an interrupt, flag the
interrupts with IRQF_SHARED.

Signed-off-by: Andrew Kilkenny <akilkenny@xes-inc.com>
Signed-off-by: Nate Case <ncase@xes-inc.com>
Acked-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:48 -07:00
Vladislav Bogdanov
53a2fe5804 edac: make i82443bxgx_edac coexist with intel_agp
Fix 443BX/GX MCH suppport in a EDAC.

It makes i82443bxgx_edac coexist with intel_agp using the same approach as
several other EDAC drivers.

Tested on Intel's L443GX with redhat's 2.6.18 with whole EDAC subsystem
backported a while ago.

[root@host ~]# dmesg|grep -iE '(AGP|EDAC)'
Linux agpgart interface v0.101 (c) Dave Jones
agpgart: Detected an Intel 440GX Chipset.
agpgart: AGP aperture is 64M @ 0xf8000000
EDAC MC: Ver: 2.1.0 Jun 27 2008
EDAC MC0: Giving out device to 'i82443bxgx_edac' 'I82443BXGX': DEV 0000:00:00.0
EDAC PCI0: Giving out device to module 'i82443bxgx_edac' controller 'EDAC PCI controller': DEV '0000:00:00.0' (POLLED)

Signed-off-by: Vladislav Bogdanov <slava@nsys.by>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-16 11:21:48 -07:00
Adrian Bunk
7a8fc9b248 removed unused #include <linux/version.h>'s
This patch lets the files using linux/version.h match the files that
#include it.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-23 12:14:12 -07:00
Dave Jiang
f87bd330ed edac: mpc85xx fix pci ofdev 2nd pass
Convert PCI err device from platform to open firmware of_dev to comply
with powerpc schemes.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:49 -07:00
Dave Jiang
fcb19171d1 edac: mv64x60 add pci fixup
Fixup of missing bit 0 on 64360 PCIx_ERR_MASK and errata FEr-#11 and
FEr-#16 for the 64460.  Bit 0 must remain 0.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <dougthompson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:49 -07:00
Dave Jiang
596d394103 edac: mv64x60 fix get_property
Update get_property() call to use of_get_property() in order to fix compile

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Doug Thompson <dougthompson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:49 -07:00
Doug Thompson
10d33e9c36 edac: e752x fix too loud on nonmemory errors
This module harvests more than just memory errors, it also harvests
various bus and dma errors that the Chipset detects.  Previously, it would
report all such errors, which would cause output to be TOO loud.

This patches therefore adds a parameter which is used to turn off
NON-MEMORY error reports by default.  Or the reporting can be enabled via
the parameter

Also did code style cleanup: less than 80 characters per line rule

Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:49 -07:00
Arthur Jones
124682c785 edac: core fix added newline to sysfs dimm labels
The channel DIMM label does not seem to be used much in the edac code.
However, where it is used (in the core code), it is assumed to not have a
newline embedded.  This leaves the sysfs file newline free which looks
funny when cat'ing it.  Here we just add the trailing newline to the sysfs
chX_dimm_label output...

[Doug Thompson note: the DIMM label is one of the primary uses of EDAC.
User space daemon scripts, edac-utils@sourceforge, populate the DIMM label
fields, via /sys/devices/system/edac attributes, with the silk screen
labels of the motherboard in use.  dmidecode access BIOS tables, but BIOS
tables are well known to be incorrect and useless in these respects.
edac-utils will strip off any newlines before its use of the output, when
displaying DIMM slot silk screen labels.

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:49 -07:00
Arthur Jones
f9fc82adca edac: core fix static to dynamic kset
Static kobjects and ksets are not supported in Linux kernel.  Convert the
mc_kset from static to dynamic.  This patch depends on my previous patch
to remove the module parameter attributes from mc...

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:49 -07:00
Arthur Jones
327dafb1c6 edac: core fix redundant sysfs controls to parameters
/sys/devices/system/edac/mc has a few files which are duplicated in
/sys/module/edac_core/parameters.  Now that all the functionality is
duplicated between these two locations, we remove the former kobject
attributes and update the documentation.

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:49 -07:00
Arthur Jones
096846e2b0 edac: core fix workq timer
When updating the edac_mc_poll_msec module parameter from the sysfs
/sys/module/edac_core/parameters/edac_mc_poll_msec file, we don't update
the workq timers.  So that, if we move from a big poll time to a small
one, the small one won't take effect until the big one has timed out.

Here we provide a new module parameter set method to call out to the
update routine.  This brings the /sys/module/edac_core/parameters
functionality up to that provided by the /sys/drivers/system/edac/mc sysfs
module parameter files so that we can remove them or at least link to the
/sys/module files...

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:49 -07:00
Arthur Jones
14cc571bb1 edac: core fix to use dynamic kobject
Static kobjects are not supported in linux kernel.  Convert the
edac_pci_top_main_kobj from static to dynamic.  This avoids the double
free of the edac_pci_top_main_kobj.name that we see on module reload of
the e752x edac driver (and probably others as well).

In addition Greg KH <greg@kroah.com> has pointed out that this code may be
cleaned up significantly.  I will look at that as a follow-on patch, for
now, I just want the minimum fix to get this double-free oops bug
squashed...

Many thanks to Greg KH for his patience in showing me what the
Documentation/kobject.txt already said (oops)...

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Arthur Jones
b238e57723 edac: i5100: cleanup
Some code cleanliness issues found by Andrew Morton (thanks!) which should
not affect functionality, but which should help make the code more
maintainable.

In particular, we now:

* convert all #define's w/ a parameter to static inlines
* use 1UL rather than 1ULL when calculating an unsigned long
* use pci_disable_device

The resulting code is tested and seems to work fine...

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Arthur Jones
178d5a7422 edac: i5100 fix unmask ecc bits
Explicitly unmask ECC errors we are interested in reporting.

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Arthur Jones
43920a598f edac: i5100 fix enable ecc hardware
It is possible that the BIOS did not enable ECC at boot time.  We check
for that case and fail to load if it is true.

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Arthur Jones
f7952ffcff edac: i5100 fix missing bits
The error mask we use to trigger ECC notifications is missing many bits of
interest.  We add these bits here so that all possible ECC errors can be
reported.

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Arthur Jones
8f421c595a edac: i5100 new intel chipset driver
Preliminary support for the Intel 5100 MCH.  CE and UE errors are reported
along with the current DIMM label information and other memory parameters.

Reasons why this is preliminary:

1) This chip has 2 independent memory controllers which, for best
   perforance, use interleaved accesses to the DDR2 memory.  This
   architecture does not map very well to the current edac data structures
   which depend on symmetric channel access to the interleaved data.
   Without core changes, the best I could do for now is to map both memory
   controllers to different csrows (first all ranks of controller 0, then
   all ranks of controller 1).  Someone much more familiar with the edac
   core than I will probably need to come up with a more general data
   structure to handle the interleaving and de-interleaving of the two
   memory controllers.

2) I have not yet tackled the de-interleaving of the rank/controller
   address space into the physical address space of the CPU.  There is
   nothing fundamentally missing, it is just ending up to be a lot of
   code, and I'd rather keep it separate for now, esp since it doesn't
   work yet...

3) The code depends on a particular i5100 chip select to DIMM mainboard
   chip select mapping.  This mapping seems obvious to me in order to
   support dual and single ranked memory, but it is not unique and DIMM
   labels could be wrong on other mainboards.  There is no way to query
   this mapping that I know of.

4) The code requires that the i5100 is in 32GB mode.  Only 4 ranks per
   controller, 2 ranks per DIMM are supported.  I do not have hardware
   (nor do I expect to have hardware anytime soon) for the 48GB (6 ranks
   per controller) mode.

5) The serial presence detect code should be broken out into a "real"
   i2c driver so that decode-dimms.pl can work.

Signed-off-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-25 10:53:48 -07:00
Maxim Shchetynin
c134fd868f powerpc/cell/edac: Log a syndrome code in case of correctable error
If correctable error occurs the syndrome code was logged as 0. This patch
lets EDAC to log a correct syndrome code to make problem investigation
easier.

Signed-off-by: Maxim Shchetynin <maxim@de.ibm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-07-22 10:39:36 +10:00
Kumar Gala
f99c90094b edac: mpc85xx: fix building as a module
including of <asm/mpc85xx.h> causes build problems since it doesn't exist.

Also removed warning:
drivers/edac/mpc85xx_edac.c:45: warning: 'mpc85xx_ctl_name' defined but not used

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:13 -07:00
Stephen Rothwell
17aa7e0344 dev_name introduction fall out fix
Commit 06916639e2 ("driver-core: add
dev_name() to help transition away from using bus_id") added a static
inline dev_name() and used it in dev_printk.

Unfortunately, drivers/edac/edac_core.h defines a macro called
dev_name().  Rename the latter.

Diagnosis by Tony Breeds and Michael Ellerman.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-05 15:08:38 -07:00
Stephen Rothwell
a94a630a4c pasemi_edac needs to include linux/edac.h
Commit c3c52bce69 ("edac: fix module
initialization on several modules 2nd time") added a call to opstate_init
but did not include linux/edac.h that declares it.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 19:06:57 -07:00
Hitoshi Mitake
c3c52bce69 edac: fix module initialization on several modules 2nd time
I implemented opstate_init() as a inline function in linux/edac.h.

added calling opstate_init() to:
	i82443bxgx_edac.c
	i82860_edac.c
	i82875p_edac.c
	i82975x_edac.c

I wrote a fixed patch of
edac-fix-module-initialization-on-several-modules.patch,
and tested building 2.6.25-rc7 with applying this. It was succeed.
I think the patch is now correct.

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:26 -07:00
Adrian Bunk
1a45027d1a edac: remove unneeded functions and add static accessor
Collection of patches, merged into one, from Adrian that do the following:

1) This patch makes the following needlessly global functions static:
- edac_pci_get_log_pe()
- edac_pci_get_log_npe()
- edac_pci_get_panic_on_pe()
- edac_pci_unregister_sysfs_instance_kobj()
- edac_pci_main_kobj_setup()

2) Remove unneeded function edac_device_find()

3) Added #if 0 around function  edac_pci_find()

4) make the needlessly global edac_pci_generic_check() static

5) Removed function edac_check_mc_devices()

Doug Thompson modified Adrian's patches, to bettern represent
the direction of EDAC, and make them one patch.

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:26 -07:00
Robert P. J. Day
ff6ac2a616 edac: use the shorter LIST_HEAD for brevity
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:26 -07:00
Peter Tyser
94ee1cf5a8 edac: add e752x parameter for sysbus_parity selection
Add a module parameter "sysbus_parity" to allow forcing system bus parity
error checking on or off.  Also add support to automatically disable system
bus parity errors for processors which do not support it.

If the sysbus_parity parameter is specified, sysbus parity detection will be
forced on or off.  If it is not specified, the driver will attempt to look at
the CPU identifier string and determine if the CPU supports system bus parity.
 A blacklist was used instead of a whitelist so that system bus parity would
be enabled by default and to minimize the chances of breaking things for those
people already using the driver which for some reason have a processor that
does not have a valid CPU identifier string.

[akpm@linux-foundation.org: coding-style fixes]
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Peter Tyser <ptyser@xes-inc.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:26 -07:00
Andrei Konovalov
5135b797c8 edac: new support for Intel 3100 chipset
Add Intel 3100 chipset support to e752x EDAC driver.

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrei Konovalov <akonovalov@ru.mvista.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:25 -07:00
Jason Uhlenkott
870897a5ab drivers/edac/i3000: document type promotion
By popular request, add a comment documenting the implicit type promotion
here.

Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Hitoshi Mitake
7ed31e0fa0 drivers/edac: i3000: missing init code
There is a missing sequence of initialization code during startup.

Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Doug Thompson <dougthompson@xmisson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Doug Thompson
cd4755c2a9 drivers/edac: mpc85xx: add static scope
Made a previous global variable, static in scope

Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Jason Uhlenkott
f5c0454c86 drivers/edac: i3000: 64bit build
Modified to run on x86_64 as well as x86

i3000_edac builds (and runs) fine on x86_64.

Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Bryan Boatright
6b09ff9d78 drivers/edac: pci: broken parity regression
Using the EDAC code in kernel.org kernel version 2.6.23.8 I am seeing the
following problem:

    In the kernel there is a pci device attribute located in sysfs that is
    checked by the EDAC PCI scanning code. If that attribute is set,
    PCI parity/error scannining is skipped for that device. The attribute
    is:

            broken_parity_status

    as is located in /sys/devices/pci<XXX>/0000:XX:YY.Z directorys for
    PCI devices.

I don't think this check was actually implemented.  I have a misbehaved card
that reports a parity error every 1000 ms:

Nov 25 07:28:43 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Nov 25 07:28:44 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Nov 25 07:28:45 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0

Setting that card's broken_parity_status bit did not mask the error:

echo "1" > /sys/bus/pci/devices/0000:05:01.0/broken_parity_status

I looked through the EDAC code and did not readily see any reference to
broken_parity_status at all (which makes sense based on the behavior I am
seeing).  I applied the following patch as a proof-of-concept and now EDAC's
PCI parity error reporting behaves as documented:

bryan

Good regression find, bryan. It used to work. sigh.
I added more logic to your patch, for more coverage of the error.

Doug T

Signed-off-by: Bryan Boatright <b1@omega71.com>
Signed-off-by: Doug Thompson <dougthompson@xmisson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Dave Jiang
4f4aeeabc0 drivers-edac: add marvell mv64x60 driver
Marvell mv64x60 SoC support for EDAC.  Used on PPC and MIPS platforms.
Development and testing done on PPC Motorola prpmc2800 ATCA board.

[akpm@linux-foundation.org: make mv64x60_ctl_name static]
Signed-off-by: Dave Jiang <djiang@mvista.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Dave Jiang
a9a753d532 drivers-edac: add freescale mpc85xx driver
EDAC chip driver support for Freescale MPC85xx platforms. PPC based.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by:	Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Jason Uhlenkott
4d2b165eca drivers-edac: i3000 replace macros with functions
Replace function-like macros with functions.

Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Jason Uhlenkott
ce783d70b9 drivers-edac: i3000 code tidying
Style cleanup, mostly just 80-column fixes.

Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Benjamin Herrenschmidt
48764e4143 drivers-edac: add Cell MC driver
Adds driver for the Cell memory controller when used without a Hypervisor such
as on the IBM Cell blades.  There might still be some improvements to do to
this such as finding if it's possible to properly obtain more details about
the address of the error but it's good enough already to report CE counts
which is our main priority at the moment.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Benjamin Herrenschmidt
1d5f726cbf drivers-edac: add Cell XDR memory types
Add the definitions for the Rambus XDR memory type used by the Cell processor.
It's a pre-requisite for the followup Cell EDAC patch.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Anton Blanchard
c2ae24cfd1 drivers-edac: use round_jiffies_relative
When rounding a relative timeout we need to use round_jiffies_relative().

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Doug Thompson
56e61a9c5f drivers-edac: turn on edac device error logging
ENABLE the 'logging' of CE and UE events for the EDAC_DEVICE class of error
harvester in EDAC

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:23 -08:00
Joe Perches
6f042b50e0 drivers/edac/: Spelling fixes
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2008-02-03 17:12:34 +02:00
Paul Mackerras
bd45ac0c5d Merge branch 'linux-2.6' 2008-01-31 11:25:51 +11:00
Kay Sievers
af5ca3f4ec Driver core: change sysdev classes to use dynamic kobject names
All kobjects require a dynamically allocated name now. We no longer
need to keep track if the name is statically assigned, we can just
unconditionally free() all kobject names on cleanup.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman
c10997f657 Kobject: convert drivers/* from kobject_unregister() to kobject_put()
There is no need for kobject_unregister() anymore, thanks to Kay's
kobject cleanup changes, so replace all instances of it with
kobject_put().


Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:40 -08:00
Greg Kroah-Hartman
b2ed215a33 Kobject: change drivers/edac to use kobject_init_and_add
Stop using kobject_register, as this way we can control the sending of
the uevent properly, after everything is properly initialized.

Acked-by: Doug Thompson <dougthompson@xmission.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:28 -08:00
Greg Kroah-Hartman
3514faca19 kobject: remove struct kobj_type from struct kset
We don't need a "default" ktype for a kset.  We should set this
explicitly every time for each kset.  This change is needed so that we
can make ksets dynamic, and cleans up one of the odd, undocumented
assumption that the kset/kobject/ktype model has.

This patch is based on a lot of help from Kay Sievers.

Nasty bug in the block code was found by Dave Young
<hidave.darkstar@gmail.com>

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-01-24 20:40:10 -08:00
Olof Johansson
0d08a84770 [POWERPC] pasemi: Broaden specific references to 1682M
There will be more product numbers in the future than just PA6T-1682M,
but they will share much of the features. Remove some of the explicit
references and compatibility checks with 1682M, and replace most of them
with the more generic term "PWRficient".

Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Michael Buesch <mb@bu3sch.de>
Acked-by: Doug Thompson <dougthompson@xmission.com>
2007-11-29 22:30:47 -06:00
Darrick J. Wong
57510c2f93 i5000_edac: no need to __stringify() KBUILD_BASENAME
The i5000_edac driver's PCI registration structure has the name
""i5000_edac"" (with extra set of double-quotes) which is probably not
intentional.  Get rid of __stringify.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-11-14 18:45:41 -08:00
Stephen Rothwell
1b3e4c706c NULL terminate the pci_device_ids in pasemi_edac
Fixes:
drivers/edac/pasemi_edac: struct pci_device_id is 32 bytes.  The last of 1 is:
0x00 0x00 0x19 0x59 0x00 0x00 0xa0 0x0a 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
FATAL: drivers/edac/pasemi_edac: struct pci_device_id is not terminated with a NULL entry!

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:56 -07:00
Jiri Slaby
93043ece03 define global BIT macro
define global BIT macro

move all local BIT defines to the new globally define macro.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 11:53:42 -07:00
Greg Kroah-Hartman
34980ca8fa Drivers: clean up direct setting of the name of a kset
A kset should not have its name set directly, so dynamically set the
name at runtime.

This is needed to remove the static array in the kobject structure which
will be changed in a future patch.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-12 14:51:02 -07:00
Aristeu Rozanski
f9b5a5d193 drivers/edac: fix e752x correct return code
This patch changes the error code when dev0:fun1 was hidden by BIOS to one
more appropriate.

Signed-off-by: Aristeu Rozanski <aris@ruivo.org>
Signed-off-by: Mark Gross <mark.gross@intel.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-11 17:21:19 -07:00
Doug Thompson
3c8bb2cfa2 drivers/edac: fix printk level down to debug from emerg
When EDAC is configured for EDAC DEBUGGING, the debug printk output level
was set TOO high (EMERG). This patch brings it down to a DEBUG level

Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-11 17:21:19 -07:00
Doug Thompson
ddcc3050bd drivers/edac: fix pasemi kconfig depends
Fixed 'depends on PPC_PASEMI' in EDAC Kconfig.  Module PASEMI depends ONLY on
the PASEMI on PPC.

Was previously enabled for ALL PPC

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Egor N. Martovetsky <egor@pasemi.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:35:18 -07:00