Commit Graph

919 Commits

Author SHA1 Message Date
Mauro Carvalho Chehab
a7d7d2e1a0 edac: Create a dimm struct and move the labels into it
The way a DIMM is currently represented implies that they're
linked into a per-csrow struct. However, some drivers don't see
csrows, as they're ridden behind some chip like the AMB's
on FBDIMM's, for example.

This forced drivers to fake^Wvirtualize a csrow struct, and to create
a mess under csrow/channel original's concept.

Move the DIMM labels into a per-DIMM struct, and add there
the real location of the socket, in terms of csrow/channel.
Latter patches will modify the location to properly represent the
memory architecture.

All other drivers will use a per-csrow type of location.
Some of those drivers will require a latter conversion, as
they also fake the csrows internally.

TODO: While this patch doesn't change the existing behavior, on
csrows-based memory controllers, a csrow/channel pair points to a memory
rank. There's a known bug at the EDAC core that allows having different
labels for the same DIMM, if it has more than one rank. A latter patch
is need to merge the several ranks for a DIMM into the same dimm_info
struct, in order to avoid having different labels for the same DIMM.

The edac_mc_alloc() will now contain a per-dimm initialization loop that
will be changed by latter patches in order to match other types of
memory architectures.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-05-28 19:10:57 -03:00
Borislav Petkov
e8f380e008 x86/bitops: Move BIT_64() for a wider use
Needed for shifting 64-bit values on 32-bit, like MSR values,
for example.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frank Arnold <frank.arnold@amd.com>
Link: http://lkml.kernel.org/r/1337684026-19740-1-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-05-23 17:16:42 +02:00
Jiri Kosina
f70d4a95ed edac, mips: don't change code that has been removed in edac/mips tree
This is a partial revert of

	15ed103a98 ("edac: Fix spelling errors")
	6997991ab0 ("mips: Fix printk typos in arc/mips")

which change code that doesn't exist any more in edac/mips trees.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-05-22 11:00:09 +02:00
David Mackey
15ed103a98 edac: Fix spelling errors.
Signed-off-by: David Mackey <tdmackey@twitter.com>
Signed-off-by: Vinson Lee <vlee@twitter.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-04-30 13:28:41 +02:00
Linus Torvalds
4157368edb Merge branch 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull arch/tile bug fixes from Chris Metcalf:
 "This includes Paul Gortmaker's change to fix the <asm/system.h>
  disintegration issues on tile, a fix to unbreak the tilepro ethernet
  driver, and a backlog of bugfix-only changes from internal Tilera
  development over the last few months.

  They have all been to LKML and on linux-next for the last few days.
  The EDAC change to MAINTAINERS is an oddity but discussion on the
  linux-edac list suggested I ask you to pull that change through my
  tree since they don't have a tree to pull edac changes from at the
  moment."

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (39 commits)
  drivers/net/ethernet/tile: fix netdev_alloc_skb() bombing
  MAINTAINERS: update EDAC information
  tilepro ethernet driver: fix a few minor issues
  tile-srom.c driver: minor code cleanup
  edac: say "TILEGx" not "TILEPro" for the tilegx edac driver
  arch/tile: avoid accidentally unmasking NMI-type interrupt accidentally
  arch/tile: remove bogus performance optimization
  arch/tile: return SIGBUS for addresses that are unaligned AND invalid
  arch/tile: fix finv_buffer_remote() for tilegx
  arch/tile: use atomic exchange in arch_write_unlock()
  arch/tile: stop mentioning the "kvm" subdirectory
  arch/tile: export the page_home() function.
  arch/tile: fix pointer cast in cacheflush.c
  arch/tile: fix single-stepping over swint1 instructions on tilegx
  arch/tile: implement panic_smp_self_stop()
  arch/tile: add "nop" after "nap" to help GX idle power draw
  arch/tile: use proper memparse() for "maxmem" options
  arch/tile: fix up locking in pgtable.c slightly
  arch/tile: don't leak kernel memory when we unload modules
  arch/tile: fix bug in delay_backoff()
  ...
2012-04-06 17:56:20 -07:00
Borislav Petkov
ec3e82d6dc MCE, AMD: Drop too granulary family model checks
MCA details seldom change inbetween the models of a family so don't
be too conservative and enable decoding on everything starting from
K8 onwards. Minor adjustments can come in later but most importantly,
we have some decoding infrastructure in place for upcoming models by
default.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-04-04 15:50:11 +02:00
Chris Metcalf
e2e110d759 edac: say "TILEGx" not "TILEPro" for the tilegx edac driver
This is just an aesthetic change but it was silly to say TILEPro
when booting up on the tilegx architecture.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2012-04-02 12:14:06 -04:00
Linus Torvalds
f0f3680e50 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC fixes from Mauro Carvalho Chehab:
 "A series of EDAC driver fixes.  It also has one core fix at the
  documentation, and a rename patch, fixing the name of the struct that
  contains the rank information."

* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac:
  edac: rename channel_info to rank_info
  i5400_edac: Avoid calling pci_put_device() twice
  edac: i5100 ack error detection register after each read
  edac: i5100 fix erroneous define for M1Err
  edac: sb_edac: Fix a wrong value setting for the previous value
  edac: sb_edac: Fix a INTERLEAVE_MODE() misuse
  edac: sb_edac: Let the driver depend on PCI_MMCONFIG
  edac: Improve the comments to better describe the memory concepts
  edac/ppc4xx_edac: Fix compilation
  Fix sb_edac compilation with 32 bits kernels
2012-03-28 14:24:40 -07:00
Linus Torvalds
250f6715a4 The following text was taken from the original review request:
"[RFC PATCH 0/2] audit of linux/device.h users in include/*"
 		https://lkml.org/lkml/2012/3/4/159
 --
 
 Nearly every subsystem has some kind of header with a proto like:
 
 	void foo(struct device *dev);
 
 and yet there is no reason for most of these guys to care about the
 sub fields within the device struct.  This allows us to significantly
 reduce the scope of headers including headers.  For this instance, a
 reduction of about 40% is achieved by replacing the include with the
 simple fact that the device is some kind of a struct.
 
 Unlike the much larger module.h cleanup, this one is simply two
 commits.  One to fix the implicit <linux/device.h> users, and then
 one to delete the device.h includes from the linux/include/ dir
 wherever possible.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPbNxLAAoJEOvOhAQsB9HWR6QQAMRUZ94O2069/nW9h4TO/xTr
 Hq/80lo/TBBiRmob3iWBP76lzgeeMPPVEX1I6N7YYlhL3IL7HsaJH1DvpIPPHXQP
 GFKcBsZ5ZLV8c4CBDSr+/HFNdhXc0bw0awBjBvR7gAsWuZpNFn4WbhizJi4vWAoE
 4ydhPu55G1G8TkBtYLJQ8xavxsmiNBSDhd2i+0vn6EVpgmXynjOMG8qXyaS97Jvg
 pZLwnN5Wu21coj6+xH3QUKCl1mJ+KGyamWX5gFBVIfsDB3k5H4neijVm7t1en4b0
 cWxmXeR/JE3VLEl/17yN2dodD8qw1QzmTWzz1vmwJl2zK+rRRAByBrL0DP7QCwCZ
 ppeJbdhkMBwqjtknwrmMwsuAzUdJd79GXA+6Vm+xSEkr6FEPK1M0kGbvaqV9Usgd
 ohMewewbO6ddgR9eF7Kw2FAwo0hwkPNEplXIym9rZzFG1h+T0STGSHvkn7LV765E
 ul1FapSV3GCxEVRwWTwD28FLU2+0zlkOZ5sxXwNPTT96cNmW+R7TGuslZKNaMNjX
 q7eBZxo8DtVt/jqJTntR8bs8052c8g1Ac1IKmlW8VSmFwT1M6VBGRn1/JWAhuUgv
 dBK/FF+I1GJTAJWIhaFcKXLHvmV9uhS6JaIhLMDOetoOkpqSptJ42hDG+89WkFRk
 o55GQ5TFdoOpqxVzGbvE
 =3j4+
 -----END PGP SIGNATURE-----

Merge tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull <linux/device.h> avoidance patches from Paul Gortmaker:
 "Nearly every subsystem has some kind of header with a proto like:

	void foo(struct device *dev);

  and yet there is no reason for most of these guys to care about the
  sub fields within the device struct.  This allows us to significantly
  reduce the scope of headers including headers.  For this instance, a
  reduction of about 40% is achieved by replacing the include with the
  simple fact that the device is some kind of a struct.

  Unlike the much larger module.h cleanup, this one is simply two
  commits.  One to fix the implicit <linux/device.h> users, and then one
  to delete the device.h includes from the linux/include/ dir wherever
  possible."

* tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
  device.h: audit and cleanup users in main include dir
  device.h: cleanup users outside of linux/include (C files)
2012-03-24 10:41:37 -07:00
Linus Torvalds
dae430c6f6 A bunch of fixes/updates for the AMD side of EDAC including
* MCE decoding updates
 * tree-wide EDAC sweep making pci_device_ids __devinitconst
 * Scrub rate API correction
 * two amd64_edac corrections for K8 boxes and sysfs csrow nodes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPZ0GkAAoJEBLB8Bhh3lVKjr4P/2q+7iZY8skYJqjrGel1vpKG
 aMwn1DTv9Mhy2omE+UDSuMVWrZ8CI/G+5aW5j6uulj6uko+3Q1aQtkN3zQhoa1RG
 7Q2X4t0skRnr7xOJKIYour+ufX4MLoVdEZqu4jvIFoniAQEXbDfbgT7KnA/hRwFp
 i04GaFRsPb97cvi7OfMpczTVan3lDUhT81kvdlrVZGhu3dVIkZw//hmsrt98SF1G
 qeBSvY/ji45D/mro6QMDZcN5+AWgLuI4ivM3Bk1tFgxs6KVaHcCOfPC8szQAuupl
 fQl/N4sHyj7BuTHlCWEog+V67ifpZcb5peya9Vs5nArhj/GzFmVhE0JBgFG7aCXQ
 hnAEhmgUI5NdHhLmrmGWSPOEQbG+V53/mBEk45sdH9zzl7yBH20gILdYFejmwEuX
 HfTTugIby4ncsw9Gkmi2rx33wNuzdOsXNRILk56bSf9X3e1m7LDtZV6EMeQ9I3D9
 ONjrrmNrkz5AmI765IsGs6vJT6ZkpZk6DNbykEOTPtoOhwiCSsbapowh9NpdhL1k
 EwUuZfyWoulVeYJw+m6a6Aw9bp2stj694p0tfdXh4C9g09Qlvx2RANBAgEJQE/oD
 5rWE/p/HiDFuMd7IrRuLaC1vV7YnWNX/EHLvtV7Ei2qslaK9XxFIb9p7jqyUIpoj
 nwUDJYIB4Dbk+NSV34N3
 =Q+wu
 -----END PGP SIGNATURE-----

Merge tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp

Pull AMD64 EDAC fixes from Borislav Petkov:
 "A bunch of fixes/updates for the AMD side of EDAC including

   * MCE decoding updates
   * tree-wide EDAC sweep making pci_device_ids __devinitconst
   * Scrub rate API correction
   * two amd64_edac corrections for K8 boxes and sysfs csrow nodes"

* tag 'amd64-edac-updates-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  MCE, AMD: Constify error tables
  MCE, AMD: Correct bank 5 error signatures
  MCE, AMD: Rework NB MCE signatures
  MCE, AMD: Correct VB data error description
  MCE, AMD: Correct ucode patch buffer description
  MCE, AMD: Correct some MC0 error types
  EDAC: Make pci_device_id tables __devinitconst.
  EDAC: Correct scrub rate API
  amd64_edac: Fix K8 revD and later chip select sizes
  amd64_edac: Fix missing csrows sysfs nodes
2012-03-23 17:59:47 -07:00
Mauro Carvalho Chehab
a4b4be3fd7 edac: rename channel_info to rank_info
What it is pointed by a csrow/channel vector is a rank information, and
not a channel information.

On a traditional architecture, the memory controller directly access the
memory ranks, via chip select rows. Different ranks at the same DIMM is
selected via different chip select rows. So, typically, one
csrow/channel pair means one different DIMM.

On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced
Memory Buffer (AMB) that serves as the interface between the memory
controller and the memory chips.

The AMB selection is via the DIMM slot, and not via a csrow.

It is up to the AMB to talk with the csrows of the DRAM chips.

So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM
rank. RAMBUS is similar.

Newer memory controllers, like the ones found on Intel Sandy Bridge and
Nehalem, even working with normal DDR3 DIMM's, don't use the usual
channel A/channel B interleaving schema to provide 128 bits data access.

Instead, they have more channels (3 or 4 channels), and they can use
several interleaving schemas. Such memory controllers see the DIMMs
directly on their registers, instead of the ranks, which is better for
the driver, as its main usageis to point to a broken DIMM stick (the
Field Repleceable Unit), and not to point to a broken DRAM chip.

The drivers that support such such newer memory architecture models
currently need to fake information and to abuse on EDAC structures, as
the subsystem was conceived with the idea that the csrow would always be
visible by the CPU.

To make things a little worse, those drivers don't currently fake
csrows/channels on a consistent way, as the concepts there don't apply
to the memory controllers they're talking with. So, each driver author
interpreted the concepts using a different logic.

In order to fix it, let's rename the data structure that points into a
DIMM rank to "rank_info", in order to be clearer about what's stored
there.

Latter patches will provide a better way to represent the memory
hierarchy for the other types of memory controller.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21 15:22:50 -03:00
Mauro Carvalho Chehab
0142877aa4 i5400_edac: Avoid calling pci_put_device() twice
When i5400_edac driver is removed and re-loaded a few times, it causes
an OOPS, as it is currently decrementing some PCI device usage two
times.

When called inside a loop, pci_get_device() will call
pci_put_device(). That mangles the error count. In this specific
case, it seems easier to just duplicate the call.

Also fixes the error logic when pci_get_device fails.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21 15:22:49 -03:00
Niklas Söderlund
df95e42e1f edac: i5100 ack error detection register after each read
If I only ack the detection register after a error have been detected
I'm unable to reliably detect errors. I have verified this behavior
using both an error injection DIMM and software to inject errors.

I can't find any documentation supporting this behavior in Intel 5100
Memory Controller Hub Chipset, see 1. So this is all based on
experimentation.

[1] Intel® 5100 Memory Controller Hub Chipset
    http://www.intel.com/content/dam/doc/datasheet/5100-
	memory-controller-hub-chipset-datasheet.pdf

Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21 15:22:49 -03:00
Niklas Söderlund
b6378cb3e5 edac: i5100 fix erroneous define for M1Err
According to [1] the define for M1Err in the FERR_NF_MEM register is
wrong. It should be at position 1 not 0.

[1] Intel 5100 Memory Controller Hub Chipset Doc.Nr: 318378
    http://www.intel.com/content/dam/doc/datasheet/5100-
    memory-controller-hub-chipset-datasheet.pdf

Reported-by: Ba Thang Nguyen <thang.b.nguyen@dektech.com.au>
Signed-off-by: Niklas Söderlund <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21 15:20:55 -03:00
Hui Wang
7fae0db439 edac: sb_edac: Fix a wrong value setting for the previous value
>From the driver design, the variable limit wants to compare with its
previous value, we should set the value of limit instead of the value
of tmp_mb to the variable prev.

Signed-off-by: Hui Wang <jason77.wang@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21 15:20:11 -03:00
Hui Wang
ad9c40b7dd edac: sb_edac: Fix a INTERLEAVE_MODE() misuse
We can identify dram interleave mode from the Dram Rule register
rather than Dram Interleave list register.

In this context, the reg of INTERLEAVE_MODE(reg) contains the Dram
Interleave list register, we can't get interleave mode from the reg,
while the variable interleave_mode saves the the mode got from the
Dram Rule register, so we use the variable to replace
INTERLEAVE_MDDE(reg) here.

Signed-off-by: Hui Wang <jason77.wang@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21 15:20:02 -03:00
Hui Wang
22a5c27bec edac: sb_edac: Let the driver depend on PCI_MMCONFIG
This driver needs to access PCIe Extended Configuration Space
Registers (0x100~0xfff), to correctly access those registers, we need
to enable PCI_MMCONFIG option. Since this option is not enabled for
X86_64 by default, we let the driver depend on it to prevent users
forgetting to enable this option.

Signed-off-by: Hui Wang <jason77.wang@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21 15:19:56 -03:00
Mauro Carvalho Chehab
b877763ea0 edac/ppc4xx_edac: Fix compilation
It seems that nobody is cross-compiling for this arch anymore...

drivers/edac/ppc4xx_edac.c: In function 'ppc4xx_edac_probe':
drivers/edac/ppc4xx_edac.c:188:12: error: storage class specified for parameter 'ppc4xx_edac_remove'
...
drivers/edac/ppc4xx_edac.c:1068:19: error: 'match' undeclared (first use in this function)
drivers/edac/ppc4xx_edac.c:1068:19: note: each undeclared identifier is reported only once for each function it appears in
drivers/edac/ppc4xx_edac.c:1068:36: warning: left-hand operand of comma expression has no effect [-Wunused-value]

Acked-by: Josh Boyer <jwboyer@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21 15:19:44 -03:00
Mauro Carvalho Chehab
5b889e379f Fix sb_edac compilation with 32 bits kernels
As reported by Josh Boyer <jwboyer@redhat.com>:
>	drivers/edac/sb_edac.c: In function 'get_memory_error_data':
> 	drivers/edac/sb_edac.c:861:2: warning: left shift count >= width of type
> 	[enabled by default]
> 	<snip>
> 	ERROR: "__udivdi3" [drivers/edac/sb_edac.ko] undefined!
> 	make[1]: *** [__modpost] Error 1
> 	make: *** [modules] Error 2

PS.: compile-tested only

Reported-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2012-03-21 15:19:38 -03:00
Cong Wang
4e5df7ca30 edac: remove the second argument of k[un]map_atomic()
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:17 +08:00
Borislav Petkov
ebe2aea868 MCE, AMD: Constify error tables
... so that checkpatch can chill out.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19 12:06:26 +01:00
Borislav Petkov
ae615b4b5f MCE, AMD: Correct bank 5 error signatures
... and remove superfluous ErrorCodeExt check.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19 12:06:26 +01:00
Borislav Petkov
68782673e6 MCE, AMD: Rework NB MCE signatures
Correct their formulation, replace per-family functions with a single,
unified lookup table.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19 12:06:25 +01:00
Borislav Petkov
b64a99c175 MCE, AMD: Correct VB data error description
Sync with latest BKDG error types.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19 12:06:25 +01:00
Borislav Petkov
6c1173a61e MCE, AMD: Correct ucode patch buffer description
This MC1 error signature is called differently now, fix it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19 12:06:24 +01:00
Borislav Petkov
344f0a0631 MCE, AMD: Correct some MC0 error types
Use "System Read Data Error" as a more general name for MC0 bus errors
on F15h and update some error definitions.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
2012-03-19 12:06:23 +01:00
Lionel Debroux
36c46f31df EDAC: Make pci_device_id tables __devinitconst.
These const tables are currently marked __devinitdata, but
Documentation/PCI/pci.txt says:

"o The ID table array should be marked __devinitconst; this is done
automatically if the table is declared with DEFINE_PCI_DEVICE_TABLE()."

So use DEFINE_PCI_DEVICE_TABLE(x).

Based on PaX and earlier work by Andi Kleen.

Signed-off-by: Lionel Debroux <lionel_debroux@yahoo.fr>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-19 12:04:54 +01:00
Borislav Petkov
5e8e19bf6c EDAC: Correct scrub rate API
The original scrub rate API definition states that if scrub rate
accessors are not implemented, a negative value (-1) should be written
to the sysfs file (/sys/devices/system/edac/mc/mc<N>/sdram_scrub_rate,
where N is the memory controller number on the system). This is
counter-intuitive and awkward at the very least because, when setting
the scrub rate, userspace has to write to sysfs and then read it back to
check error status of the operation.

As Tony notes, best it would be to not have the sdram_scrub_rate in
sysfs if scrub rate support is not implemented. It is too late about
that and a bunch of drivers on a bunch of arches would need to be
changed and tested which is not a trivial task ATM.

Instead, settle for the next best thing of returning -ENODEV when
implementation is missing and -EINVAL when there was an error
encountered while setting the scrub rate.

Reported-by: Han Pingtian <phan@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20110916105856.GA13253@hpt.nay.redhat.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-19 12:03:58 +01:00
Borislav Petkov
11b0a31473 amd64_edac: Fix K8 revD and later chip select sizes
Fix DRAM chip select sizes calculation for K8, revisions D and E.

Reported-by: Niklas Söderlund <niklas.soderlund@ericsson.com
Link: http://lkml.kernel.org/r/1320849178-23340-1-git-send-email-niklas.soderlund@ericsson.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-19 12:02:46 +01:00
Ashish Shenoy
f92cae4526 amd64_edac: Fix missing csrows sysfs nodes
While initializing the array of csrow attribute instances, a few csrows
were uninitialized. This happened because the module only performed a
check for DRAM base ctl register0's and not DRAM base ctl register1's
chip select enable bit. There could be systems with DIMMs populated
on only single memory channel whereas the module also assumed that a
dual channel dimm had double the memory size of a single memory channel
instead of checking the memory on each channel.

This patch fixes these above issues.

Signed-off-by: Ashish Shenoy <ashenoy@riverbed.com>
Signed-off-by: Prasanna S. Panchamukhi <ppanchamukhi@riverbed.com>
Link: http://lkml.kernel.org/r/4F459CFA.5090604@riverbed.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2012-03-19 11:57:28 +01:00
Paul Gortmaker
51990e8254 device.h: cleanup users outside of linux/include (C files)
For files that are actively using linux/device.h, make sure
that they call it out.  This will allow us to clean up some
of the implicit uses of linux/device.h within include/*
without introducing build regressions.

Yes, this was created by "cheating" -- i.e. the headers were
cleaned up, and then the fallout was found and fixed, and then
the two commits were reordered.  This ensures we don't introduce
build regressions into the git history.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-03-11 14:27:37 -04:00
Hitoshi Mitake
797a796a13 asm-generic: architecture independent readq/writeq for 32bit environment
This provides unified readq()/writeq() helper functions for 32-bit
drivers.

For some cases, readq/writeq without atomicity is harmful, and order of
io access has to be specified explicitly.  So in this patch, new two
header files which contain non-atomic readq/writeq are added.

 - <asm-generic/io-64-nonatomic-lo-hi.h> provides non-atomic readq/
   writeq with the order of lower address -> higher address

 - <asm-generic/io-64-nonatomic-hi-lo.h> provides non-atomic readq/
   writeq with reversed order

This allows us to remove some readq()s that were added drivers when the
default non-atomic ones were removed in commit dbee8a0aff ("x86:
remove 32-bit versions of readq()/writeq()")

The drivers which need readq/writeq but can do with the non-atomic ones
must add the line:

  #include <asm-generic/io-64-nonatomic-lo-hi.h> /* or hi-lo.h */

But this will be nop in 64-bit environments, and no other #ifdefs are
required.  So I believe that this patch can solve the problem of
 1. driver-specific readq/writeq
 2. atomicity and order of io access

This patch is tested with building allyesconfig and allmodconfig as
ARCH=x86 and ARCH=i386 on top of tip/master.

Cc: Kashyap Desai <Kashyap.Desai@lsi.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: James Bottomley <jbottomley@parallels.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Hitoshi Mitake <h.mitake@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-02-21 16:47:28 -08:00
Rusty Russell
90ab5ee941 module_param: make bool parameters really bool (drivers & misc)
module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2012-01-13 09:32:20 +10:30
Linus Torvalds
98793265b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
  Kconfig: acpi: Fix typo in comment.
  misc latin1 to utf8 conversions
  devres: Fix a typo in devm_kfree comment
  btrfs: free-space-cache.c: remove extra semicolon.
  fat: Spelling s/obsolate/obsolete/g
  SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
  tools/power turbostat: update fields in manpage
  mac80211: drop spelling fix
  types.h: fix comment spelling for 'architectures'
  typo fixes: aera -> area, exntension -> extension
  devices.txt: Fix typo of 'VMware'.
  sis900: Fix enum typo 'sis900_rx_bufer_status'
  decompress_bunzip2: remove invalid vi modeline
  treewide: Fix comment and string typo 'bufer'
  hyper-v: Update MAINTAINERS
  treewide: Fix typos in various parts of the kernel, and fix some comments.
  clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
  gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
  leds: Kconfig: Fix typo 'D2NET_V2'
  sound: Kconfig: drop unknown symbol ARCH_CLPS7500
  ...

Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
2012-01-08 13:21:22 -08:00
Linus Torvalds
7affca3537 Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
* 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
  arm: fix up some samsung merge sysdev conversion problems
  firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
  Drivers:hv: Fix a bug in vmbus_driver_unregister()
  driver core: remove __must_check from device_create_file
  debugfs: add missing #ifdef HAS_IOMEM
  arm: time.h: remove device.h #include
  driver-core: remove sysdev.h usage.
  clockevents: remove sysdev.h
  arm: convert sysdev_class to a regular subsystem
  arm: leds: convert sysdev_class to a regular subsystem
  kobject: remove kset_find_obj_hinted()
  m86k: gpio - convert sysdev_class to a regular subsystem
  mips: txx9_sram - convert sysdev_class to a regular subsystem
  mips: 7segled - convert sysdev_class to a regular subsystem
  sh: dma - convert sysdev_class to a regular subsystem
  sh: intc - convert sysdev_class to a regular subsystem
  power: suspend - convert sysdev_class to a regular subsystem
  power: qe_ic - convert sysdev_class to a regular subsystem
  power: cmm - convert sysdev_class to a regular subsystem
  s390: time - convert sysdev_class to a regular subsystem
  ...

Fix up conflicts with 'struct sysdev' removal from various platform
drivers that got changed:
 - arch/arm/mach-exynos/cpu.c
 - arch/arm/mach-exynos/irq-eint.c
 - arch/arm/mach-s3c64xx/common.c
 - arch/arm/mach-s3c64xx/cpu.c
 - arch/arm/mach-s5p64x0/cpu.c
 - arch/arm/mach-s5pv210/common.c
 - arch/arm/plat-samsung/include/plat/cpu.h
 - arch/powerpc/kernel/sysfs.c
and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h
2012-01-07 12:03:30 -08:00
Linus Torvalds
edf7c8148e Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: add IRQ context simulation in module mce-inject
  x86, mce, therm_throt: Don't report power limit and package level thermal throttle events in mcelog
  x86, MCE: Drain mcelog buffer
  x86, mce: Add wrappers for registering on the decode chain
2012-01-06 15:02:37 -08:00
Greg Kroah-Hartman
ff4b8a57f0 Merge branch 'driver-core-next' into Linux 3.2
This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
and it fixes the build error in the arch/x86/kernel/microcode_core.c
file, that the merge did not catch.

The microcode_core.c patch was provided by Stephen Rothwell
<sfr@canb.auug.org.au> who was invaluable in the merge issues involved
with the large sysdev removal process in the driver-core tree.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2012-01-06 11:42:52 -08:00
Kevin Winchester
141168c36c x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'
Several fields in struct cpuinfo_x86 were not defined for the
!SMP case, likely to save space.  However, those fields still
have some meaning for UP, and keeping them allows some #ifdef
removal from other files.  The additional size of the UP kernel
from this change is not significant enough to worry about
keeping up the distinction:

	   text    data     bss     dec     hex filename
	4737168	 506459	 972040	6215667	 5ed7f3	vmlinux.o.before
	4737444	 506459	 972040	6215943	 5ed907	vmlinux.o.after

for a difference of 276 bytes for an example UP config.

If someone wants those 276 bytes back badly then it should
be implemented in a cleaner way.

Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Cc: Steffen Persvold <sp@numascale.com>
Link: http://lkml.kernel.org/r/1324428742-12498-1-git-send-email-kjwinchester@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-12-21 09:25:09 +01:00
Kay Sievers
fe5ff8b84c edac: convert sysdev_class to a regular subsystem
After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-12-14 15:21:07 -08:00
Borislav Petkov
3653ada5d3 x86, mce: Add wrappers for registering on the decode chain
No functionality change, this is done so that in a follow-on patch all
queued-up MCEs can be decoded after registering on the chain.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-12-14 12:50:12 +01:00
Justin P. Mattock
42b2aa86c6 treewide: Fix typos in various parts of the kernel, and fix some comments.
The below patch fixes some typos in various parts of the kernel, as well as fixes some comments.
Please let me know if I missed anything, and I will try to get it changed and resent.

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-12-02 14:57:31 +01:00
Shaohui Xie
86f9a43305 drivers/edac/mpc85xx_edac.c: fix memory controller compatible for edac
compatible in dts has been changed, so the driver needs to be updated
accordingly.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2011-11-24 01:59:38 -06:00
Jiri Kosina
2290c0d06d Merge branch 'master' into for-next
Sync with Linus tree to have 157550ff ("mtd: add GPMI-NAND driver
in the config and Makefile") as I have patch depending on that one.
2011-11-13 20:55:53 +01:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Linus Torvalds
1197ab2942 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (106 commits)
  powerpc/p3060qds: Add support for P3060QDS board
  powerpc/83xx: Add shutdown request support to MCU handling on MPC8349 MITX
  powerpc/85xx: Make kexec to interate over online cpus
  powerpc/fsl_booke: Fix comment in head_fsl_booke.S
  powerpc/85xx: issue 15 EOI after core reset for FSL CoreNet devices
  powerpc/8xxx: Fix interrupt handling in MPC8xxx GPIO driver
  powerpc/85xx: Add 'fsl,pq3-gpio' compatiable for GPIO driver
  powerpc/86xx: Correct Gianfar support for GE boards
  powerpc/cpm: Clear muram before it is in use.
  drivers/virt: add ioctl for 32-bit compat on 64-bit to fsl-hv-manager
  powerpc/fsl_msi: add support for "msi-address-64" property
  powerpc/85xx: Setup secondary cores PIR with hard SMP id
  powerpc/fsl-booke: Fix settlbcam for 64-bit
  powerpc/85xx: Adding DCSR node to dtsi device trees
  powerpc/85xx: clean up FPGA device tree nodes for Freecsale QorIQ boards
  powerpc/85xx: fix PHYS_64BIT selection for P1022DS
  powerpc/fsl-booke: Fix setup_initial_memory_limit to not blindly map
  powerpc: respect mem= setting for early memory limit setup
  powerpc: Update corenet64_smp_defconfig
  powerpc: Update mpc85xx/corenet 32-bit defconfigs
  ...

Fix up trivial conflicts in:
 - arch/powerpc/configs/40x/hcu4_defconfig
	removed stale file, edited elsewhere
 - arch/powerpc/include/asm/udbg.h, arch/powerpc/kernel/udbg.c:
	added opal and gelic drivers vs added ePAPR driver
 - drivers/tty/serial/8250.c
	moved UPIO_TSI to powerpc vs removed UPIO_DWAPB support
2011-11-06 17:12:03 -08:00
Josh Boyer
f04c045f8c edac: Only build sb_edac on 64-bit
The sb_edac driver is marginally useful on a 32-bit kernel, and
currently has 64-bit divide compile errors when building that config.
For now, make this build on only for 64-bit kernels.

Signed-off-by: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-11-03 11:22:44 -07:00
Linus Torvalds
6681ba7ec4 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (21 commits)
  MAINTAINERS: add an entry for Edac Sandy Bridge driver
  edac: tag sb_edac as EXPERIMENTAL, as it requires more testing
  EDAC: Fix incorrect edac mode reporting in sb_edac
  edac: sb_edac: Add it to the building system
  edac: Add an experimental new driver to support Sandy Bridge CPU's
  i7300_edac: Fix error cleanup logic
  i7core_edac: Initialize memory name with cpu, channel, bank
  i7core_edac: Fix compilation on 32 bits arch
  i7core_edac: scrubbing fixups
  EDAC: Correct Kconfig dependencies
  i7core_edac: return -ENODEV if no MC is found
  i7core_edac: use edac's own way to print errors
  MAINTAINERS: remove dropped edac_mce.* from the file
  i7core_edac: Drop the edac_mce facility
  x86, MCE: Use notifier chain only for MCE decoding
  EDAC i7core: Use mce socketid for better compatibility
  i7core_edac: Don't enable memory scrubbing for Xeon 35xx
  i7core_edac: Add scrubbing support
  edac: Move edac main structs to include/linux/edac.h
  i7core_edac: Fix oops when trying to inject errors
  ...
2011-11-02 16:55:15 -07:00
Mauro Carvalho Chehab
124a02c9b4 edac: tag sb_edac as EXPERIMENTAL, as it requires more testing
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:55 -02:00
Mark A. Grondona
c6e13b528f EDAC: Fix incorrect edac mode reporting in sb_edac
The edac driver for Sandy Bridge was found to be reporting "FPM"
for edac_mode, which clearly doesn't make sense. It was found that
sb_edac.c:get_dimm_config was reusing a variable for both mem_type
and edac_type, and thus was overwriting the value after setting
it correctly. This patch fixes that issue.

Before the patch:
/sys/devices/system/edac/mc/mc0/csrow0/edac_mode:FPM
/sys/devices/system/edac/mc/mc0/csrow1/edac_mode:FPM
/sys/devices/system/edac/mc/mc0/csrow2/edac_mode:FPM
/sys/devices/system/edac/mc/mc0/csrow3/edac_mode:FPM

After:
/sys/devices/system/edac/mc/mc0/csrow0/edac_mode:S4ECD4ED
/sys/devices/system/edac/mc/mc0/csrow1/edac_mode:S4ECD4ED
/sys/devices/system/edac/mc/mc0/csrow2/edac_mode:S4ECD4ED
/sys/devices/system/edac/mc/mc0/csrow3/edac_mode:S4ECD4ED

Signed-off-by: Mark A. Grondona <mgrondona@llnl.gov>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:55 -02:00
Mauro Carvalho Chehab
3d78c9af78 edac: sb_edac: Add it to the building system
Some changes on it were required due to changeset cd90cc84c6bf0, that
changed the glue with the MCE logic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:54 -02:00
Mauro Carvalho Chehab
eebf11a016 edac: Add an experimental new driver to support Sandy Bridge CPU's
This driver is known to work on mine and Tony's test environments,
using software error injection, and a partial hardware/software
error injection tool.

There's no broader range test yet to double check if the error decoding
logic will actually point to the right DIMM, so use it with care.
More tests are required to be sure that the driver will work on all
different types of memory configurations.

If you're willing to risk using it, I suggest you to enable EDAC debugs
for your test machines, as the debug logs helps to track what's going
inside the driver.

Please feed me with bug reports, if you notice that the driver
is miss-behaving.

Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:53 -02:00
Mauro Carvalho Chehab
5f032119d6 i7300_edac: Fix error cleanup logic
The error cleanup logic was broken. Due to that, one error is generated for
every error polling.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:53 -02:00
Mauro Carvalho Chehab
767ba4a52a i7core_edac: Initialize memory name with cpu, channel, bank
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:52 -02:00
Sedat Dilek
4fad8098bc i7core_edac: Fix compilation on 32 bits arch
on i386:
	ERROR: "__udivdi3" [drivers/edac/i7core_edac.ko] undefined!\

In both get_sdram_scrub_rate() and set_sdram_scrub_rate()

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:51 -02:00
Nils Carlson
535e9c78e1 i7core_edac: scrubbing fixups
Get a more reliable DCLK value from DMI, name the SCRUBINTERVAL mask
and guard against potential overflow in the scrub rate computations.

Signed-off-by: Nils Carlson <nils.carlson@ericsson.com>
2011-11-01 10:01:51 -02:00
Borislav Petkov
168eb34ded EDAC: Correct Kconfig dependencies
Both AMD and Intel i7 EDAC drivers use MCE features and are thus
dependent of this functionality present in the kernel. Express this in
Kconfig so that randconfig builds don't break.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:50 -02:00
Mauro Carvalho Chehab
4055759145 i7core_edac: return -ENODEV if no MC is found
Nehalem-EX uses a different memory controller. However, as the
memory controller is not visible on some Nehalem/Nehalem-EP, we
need to indirectly probe via a X58 PCI device. The same devices
are found on (some) Nehalem-EX. So, on those machines, the
probe routine needs to return -ENODEV, as the actual Memory
Controller registers won't be detected.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:49 -02:00
Mauro Carvalho Chehab
f9902f24fc i7core_edac: use edac's own way to print errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:48 -02:00
Borislav Petkov
4140c54266 i7core_edac: Drop the edac_mce facility
Remove edac_mce pieces and use the normal MCE decoder notifier chain by
retaining the same functionality with considerably less code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-11-01 10:01:24 -02:00
Paul Gortmaker
80a2e2e35d drivers/edac: Add module.h to mce_amd_inj.c
This file really needs the full module.h header file present, but
was just getting it implicitly before.  Fix it up in advance so we
avoid build failures once the cleanup commit is present.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:45 -04:00
Thomas Renninger
5034086b72 EDAC i7core: Use mce socketid for better compatibility
mce->socketid and cpu_data(mce->cpu).phys_proc_id are the same,
compare with mce_setup (in mce.c):
	m->cpu = m->extcpu = smp_processor_id();
        ...
	m->socketid = cpu_data(m->extcpu).phys_proc_id;

This makes it easier for example for XEN patches to hook into
the MCE subsystem.
Compile tested on x86_64.

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: JBeulich@novell.com
CC: linux-edac@vger.kernel.org
CC: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31 15:10:05 -02:00
Mauro Carvalho Chehab
27100db0e0 i7core_edac: Don't enable memory scrubbing for Xeon 35xx
Xeon 35xx doesn't mention memory scrub. It seems that only Xeon 55xx
and above supports it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31 15:10:05 -02:00
Samuel Gabrielsson
e8b6a12710 i7core_edac: Add scrubbing support
Add scrubbing support to i7core_edac, tested on intel Xeon L5638.

Signed-off-by: Samuel Gabrielsson <samuel.gabrielsson@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31 15:10:05 -02:00
Mauro Carvalho Chehab
ddeb3547d4 edac: Move edac main structs to include/linux/edac.h
As we'll need to use those structs for trace functions, they should
be on a more public place. So, move struct mem_ctl_info & friends
to edac.h.

No functional changes on this patch.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
2011-10-31 15:10:04 -02:00
Mauro Carvalho Chehab
224e871f36 i7core_edac: Fix oops when trying to inject errors
Error injection needs the pci device 0:0. So, we need to revert
this changeset: 79daef2099.

Tests need to be made to be sure that refcount won't be wrong
as noticed before.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31 15:10:04 -02:00
David Sterba
80b8ce89eb i7core_edac: fix misuse of logical operation in place of bitop
CC: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2011-10-31 15:10:04 -02:00
Arvind R
6099e419ae edac:i82975x fix 32bit compile and cleanup
the clean up achieves:
  1. fix warning on 32-bit compile
  2. reorder info extraction for clarity
  3. add error-trap diagnostic message
  4. handle ALL modes of memory configurations

Signed-off-by: Arvind R. <arvino55@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-10-29 21:29:42 +02:00
Dan Carpenter
1f6189ed18 amd64_edac: Cleanup return type of amd64_determine_edac_cap()
Sparse complains that edac_cap was declared as dev_type and we are
returning edac_type.  Historically, edac_type was correct but since
then we have changed it to return a bit field.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: http://lkml.kernel.org/r/20111006063025.GA2615@mwanda
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06 12:35:46 +02:00
Borislav Petkov
73ba85937b amd64_edac: Add a fix for Erratum 505
When accessing the scrub rate control register (F3x58) on F15h, the DRAM
controller selector (F1x10C[DctCfgSel]) has to point to DCT0 so that the
scrub rate configuration can take effect. See Erratum 505 in the AMD
F15h revision guide for more details.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06 12:34:05 +02:00
Borislav Petkov
b0b07a2bd4 EDAC, MCE, AMD: Simplify NB MCE decoder interface
Drop third nbcfg argument which is old remains and not required anymore.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06 12:34:04 +02:00
Borislav Petkov
295d8cda26 EDAC, MCE, AMD: Drop local coreid reporting
MCE decoding code is reporting the core which encountered the error
unconditionally now so drop this piece. Besides, it reported the
coreid in the local processor package which is not that valuable as a
datapoint.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06 12:34:03 +02:00
Borislav Petkov
086be786ca EDAC, MCE, AMD: Print valid addr when reporting an error
The MCi_STATUS bank has a AddrV bit which, when set, denotes that the
corresponding MCi_ADDR MSR contains a valid address belonging to the
MCE currently being reported. Dump it since it is definitely relevant
information.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06 12:34:03 +02:00
Borislav Petkov
bff7b81246 EDAC, MCE, AMD: Print CPU number when reporting the error
Currently, correctable ECCs go through mcelog and do not print the scary
MCE banner. In that case, however, reporting the core where the CECC
happened is important information so dump it along with the decoded
string albeit at risk of having a minor redundancy.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-10-06 12:34:02 +02:00
Dmitry Eremin-Solenikov
ce39508883 cpc925_edac: Support single-processor configurations
If second CPU is not enabled, CPC925 EDAC driver will spill out warnings
about errors on second Processor Interface. Support masking that out,
by detecting at runtime which CPUs are present in device tree.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Harry Ciao <qingtao.cao@windriver.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-09-01 16:00:18 +10:00
Benjamin Herrenschmidt
9bb7361d99 Merge remote-tracking branch 'jwb/next' into next 2011-08-30 15:14:46 +10:00
Mathias Krause
8cf2d2399a i7core_edac: fixed typo in error count calculation
Based on a patch from the PaX Team, found during a clang analysis pass.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: stable@kernel.org [v2.6.35+]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-18 14:07:15 -07:00
Mike Williams
e57b708bea powerpc/4xx: edac: Add comma to fix build error
Commit 4018294b53 broke the ppc4xx_edac driver at
line 210 where the struct member is missing a comma.

Signed-off-by: Mike Williams <mike@mikebwilliams.com>
Signed-off-by: Josh Boyer <jwboyer@gmail.com>
2011-08-11 13:52:03 -04:00
Linus Torvalds
a9f729f0e2 Revert "EDAC: Correct Kconfig dependencies"
This reverts commit af9d220bac.

It turns out that one was meant to be applied on top of the edac.git
tree in -next that has more i7core_edac changes, but that wasn't clear
in the original email.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-11 08:58:41 -07:00
Borislav Petkov
af9d220bac EDAC: Correct Kconfig dependencies
Both AMD and Intel i7 EDAC drivers use MCE features and are thus
dependent of this functionality present in the kernel. Express this in
Kconfig so that randconfig builds don't break.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-08-10 10:57:42 -07:00
Arun Sharma
60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Kai.Jiang
444d292121 drivers/edac/mpc85xx_edac.c: correct offset_in_page mask bits in edac_mc_handle_ce()
Parameter offset_in_page in edac_mc_handle_ce() should mask the higher
bits above the page size, not the lower bits.  The original input
sometimes causes a crash.

Signed-off-by: Kai.Jiang <Kai.Jiang@freescale.com>
Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Anton Vorontsov <avorontsov@mvista.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:45 -07:00
Joe Perches
28f65c11f2 treewide: Convert uses of struct resource to resource_size(ptr)
Several fixes as well where the +1 was missing.

Done via coccinelle scripts like:

@@
struct resource *ptr;
@@

- ptr->end - ptr->start + 1
+ resource_size(ptr)

and some grep and typing.

Mostly uncompiled, no cross-compilers.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-06-10 14:55:36 +02:00
Lai Jiangshan
e2e7709876 edac,rcu: use synchronize_rcu() instead of call_rcu()+rcu_barrier()
synchronize_rcu() does the stuff as needed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-26 17:12:37 -07:00
Linus Torvalds
b7c2f03628 Merge branch 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6
* 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
  gfs2: Drop __TIME__ usage
  isdn/diva: Drop __TIME__ usage
  atm: Drop __TIME__ usage
  dlm: Drop __TIME__ usage
  wan/pc300: Drop __TIME__ usage
  parport: Drop __TIME__ usage
  hdlcdrv: Drop __TIME__ usage
  baycom: Drop __TIME__ usage
  pmcraid: Drop __DATE__ usage
  edac: Drop __DATE__ usage
  rio: Drop __DATE__ usage
  scsi/wd33c93: Drop __TIME__ usage
  scsi/in2000: Drop __TIME__ usage
  aacraid: Drop __TIME__ usage
  media/cx231xx: Drop __TIME__ usage
  media/radio-maxiradio: Drop __TIME__ usage
  nozomi: Drop __TIME__ usage
  cyclades: Drop __TIME__ usage
2011-05-26 13:19:00 -07:00
Roland Dreier
dbee8a0aff x86: remove 32-bit versions of readq()/writeq()
The presense of a writeq() implementation on 32-bit x86 that splits the
64-bit write into two 32-bit writes turns out to break the mpt2sas driver
(and in general is risky for drivers as was discussed in
<http://lkml.kernel.org/r/adaab6c1h7c.fsf@cisco.com>).  To fix this,
revert 2c5643b1c5 ("x86: provide readq()/writeq() on 32-bit too") and
follow-on cleanups.

This unfortunately leads to pushing non-atomic definitions of readq() and
write() to various x86-only drivers that in the meantime started using the
definitions in the x86 version of <asm/io.h>.  However as discussed
exhaustively, this is actually the right thing to do, because the right
way to split a 64-bit transaction is hardware dependent and therefore
belongs in the hardware driver (eg mpt2sas needs a spinlock to make sure
no other accesses occur in between the two halves of the access).

Build tested on 32- and 64-bit x86 allmodconfig.

Link: http://lkml.kernel.org/r/x86-32-writeq-is-broken@mdm.bga.com
Acked-by: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Kashyap Desai <Kashyap.Desai@lsi.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ravi Anand <ravi.anand@qlogic.com>
Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Acked-by: James Bottomley <James.Bottomley@parallels.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25 08:39:44 -07:00
Grant Likely
b1608d69cb drivercore: revert addition of of_match to struct device
Commit b826291c, "drivercore/dt: add a match table pointer to struct
device" added an of_match pointer to struct device to cache the
of_match_table entry discovered at driver match time.  This was unsafe
because matching is not an atomic operation with probing a driver.  If
two or more drivers are attempted to be matched to a driver at the
same time, then the cached matching entry pointer could get
overwritten.

This patch reverts the of_match cache pointer and reworks all users to
call of_match_device() directly instead.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-05-18 12:32:23 -06:00
Borislav Petkov
c1ae68309b amd64_edac: Erratum #637 workaround
F15h CPUs may report a non-DRAM address when reporting an error address
belonging to a CC6 state save area. Add a workaround to detect this
condition and compute the actual DRAM address of the error as documented
in the Revision Guide for AMD Family 15h Models 00h-0Fh Processors.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-26 16:18:56 +02:00
Borislav Petkov
f08e457cec amd64_edac: Factor in CC6 save area
F15h and later use a portion of DRAM as a CC6 storage area. BIOS
programs D18F1x[17C:140,7C:40] DRAM Base/Limit accordingly by
subtracting the storage area from the DRAM limit setting. However, in
order for edac to consider that part of DRAM too, we need to include it
into the per-node range.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-26 16:18:44 +02:00
Borislav Petkov
f030ddfb37 amd64_edac: Remove node interleave warning
This warning was wrongfully added for a normal condition - intlvsel
actually selects the destination node when node interleaving is enabled
and it is not a mismatch. For a detailed example, see section 2.8.10.2
"Node Interleaving" in F10h BKDG.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-26 16:18:12 +02:00
Markus Trippelsdorf
4949603a6f EDAC: Remove debugging output in scrub rate handling
This patch removes superfluous debugging output in the sysfs scrub rate
handler. It also consolidates the error handling in the scrub rate
accessors.

Signed-off-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-04-21 12:44:58 +02:00
Michal Marek
152ba39422 edac: Drop __DATE__ usage
The kernel already prints its build timestamp during boot, no need to
repeat it in random drivers and produce different object files each
time.

Cc: Doug Thompson <dougthompson@xmission.com>
Cc: bluesmoke-devel@lists.sourceforge.net
Cc: linux-edac@vger.kernel.org
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2011-04-19 00:23:22 +02:00
Linus Torvalds
42933bac11 Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6:
  Fix common misspellings
2011-04-07 11:14:49 -07:00
Kumar Gala
a94d7b3506 edac/mpc85xx: Limit setting/clearing of HID1[RFXE] to e500v1/v2 cores
Only the e500v1/v2 cores have HID1[RXFE] so we should attempt to set or
clear this register bit on them.  Otherwise we get crashes like:

NIP: c0579f84 LR: c006d550 CTR: c0579f84
REGS: ef857ec0 TRAP: 0700   Not tainted  (2.6.38.2-00072-gf15ba3c)
MSR: 00021002 <ME,CE>  CR: 22044022  XER: 00000000
TASK = ef8559c0[1] 'swapper' THREAD: ef856000 CPU: 0
GPR00: c006d538 ef857f70 ef8559c0 00000000 00000004 00000000 00000000 00000000
GPR08: c0590000 c30170a8 00000000 c30170a8 00000001 0fffe000 00000000 00000000
GPR16: 00000000 7ffa0e60 00000000 00000000 7ffb0bd8 7ff3b844 c05be000 00000000
GPR24: 00000000 00000000 c05c28b0 c0579fac 00000000 00029002 00000000 c0579f84
NIP [c0579f84] mpc85xx_mc_clear_rfxe+0x0/0x28
LR [c006d550] on_each_cpu+0x34/0x50
Call Trace:
[ef857f70] [c006d538] on_each_cpu+0x1c/0x50 (unreliable)
[ef857f90] [c057a070] mpc85xx_mc_init+0xc4/0xdc
[ef857fa0] [c0001cd4] do_one_initcall+0x34/0x1a8
[ef857fd0] [c055d9d8] kernel_init+0x17c/0x218
[ef857ff0] [c000cda4] kernel_thread+0x4c/0x68
Instruction dump:
40be0018 3c60c052 3863c70c 4be9baad 3be0ffed 4bd7c99d 80010014 7fe3fb78
83e1000c 38210010 7c0803a6 4e800020 <7c11faa6> 54290024 81290008
3d60c06e
Oops: Exception in kernel mode, sig: 4 [#2]
---[ end trace 49ff3b8f93efde1a ]---

Also use the HID1_RFXE define rather than a magic number.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2011-04-04 09:31:35 -05:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Borislav Petkov
a9f0fbe2bb amd64_edac: Fix potential memleak
We check the pointers together but at least one of them could be invalid
due to failed allocation. Since we cannot continue if either of the two
allocations has failed, exit early by freeing them both.

Cc: <stable@kernel.org> # 38.x
Reported-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-29 18:19:06 +02:00
Linus Torvalds
e16b396ce3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
  doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
  Update cpuset info & webiste for cgroups
  dcdbas: force SMI to happen when expected
  arch/arm/Kconfig: remove one to many l's in the word.
  asm-generic/user.h: Fix spelling in comment
  drm: fix printk typo 'sracth'
  Remove one to many n's in a word
  Documentation/filesystems/romfs.txt: fixing link to genromfs
  drivers:scsi Change printk typo initate -> initiate
  serial, pch uart: Remove duplicate inclusion of linux/pci.h header
  fs/eventpoll.c: fix spelling
  mm: Fix out-of-date comments which refers non-existent functions
  drm: Fix printk typo 'failled'
  coh901318.c: Change initate to initiate.
  mbox-db5500.c Change initate to initiate.
  edac: correct i82975x error-info reported
  edac: correct i82975x mci initialisation
  edac: correct commented info
  fs: update comments to point correct document
  target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
  ...

Trivial conflict in fs/eventpoll.c (spelling vs addition)
2011-03-18 10:37:40 -07:00
Linus Torvalds
08351fc6a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (27 commits)
  arch/tile: support newer binutils assembler shift semantics
  arch/tile: fix deadlock bugs in rwlock implementation
  drivers/edac: provide support for tile architecture
  tile on-chip network driver: sync up with latest fixes
  arch/tile: support 4KB page size as well as 64KB
  arch/tile: add some more VMSPLIT options and use consistent naming
  arch/tile: fix some comments and whitespace
  arch/tile: export some additional module symbols
  arch/tile: enhance existing finv_buffer_remote() routine
  arch/tile: fix two bugs in the backtracer code
  arch/tile: use extended assembly to inline __mb_incoherent()
  arch/tile: use a cleaner technique to enable interrupt for cpu_idle()
  arch/tile: sync up with <arch/sim.h> and <arch/sim_def.h> changes
  arch/tile: fix reversed test of strict_strtol() return value
  arch/tile: avoid a simulator warning during bootup
  arch/tile: export <asm/hardwall.h> to userspace
  arch/tile: warn and retry if an IPI is not accepted by the target cpu
  arch/tile: stop disabling INTCTRL_1 interrupts during hypervisor downcalls
  arch/tile: fix __ndelay etc to work better
  arch/tile: bug fix: exec'ed task thought it was still single-stepping
  ...

Fix up trivial conflict in arch/tile/kernel/vmlinux.lds.S (percpu
alignment vs section naming convention fix)
2011-03-17 19:34:12 -07:00
Linus Torvalds
978ca164bd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (38 commits)
  amd64_edac: Fix decode_syndrome types
  amd64_edac: Fix DCT argument type
  amd64_edac: Fix ranges signedness
  amd64_edac: Drop local variable
  amd64_edac: Fix PCI config addressing types
  amd64_edac: Fix DRAM base macros
  amd64_edac: Fix node id signedness
  amd64_edac: Drop redundant declarations
  amd64_edac: Enable driver on F15h
  amd64_edac: Adjust ECC symbol size to F15h
  amd64_edac: Simplify scrubrate setting
  PCI: Rename CPU PCI id define
  amd64_edac: Improve DRAM address mapping
  amd64_edac: Sanitize ->read_dram_ctl_register
  amd64_edac: Adjust sys_addr to chip select conversion routine to F15h
  amd64_edac: Beef up early exit reporting
  amd64_edac: Revamp online spare handling
  amd64_edac: Fix channel interleave removal
  amd64_edac: Correct node interleaving removal
  amd64_edac: Add support for interleaved region swapping
  ...

Fix up trivial conflict in include/linux/pci_ids.h due to
AMD_15H_NB_MISC being renamed as AMD_15H_NB_F3 next to the new
AMD_15H_NB_LINK entry.
2011-03-17 17:21:32 -07:00
Borislav Petkov
d34a6ecd45 amd64_edac: Fix decode_syndrome types
Those should all be unsigned.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:40 +01:00
Borislav Petkov
8c6717510f amd64_edac: Fix DCT argument type
Fix amd64_debug_display_dimm_sizes() arguments order per convention (pvt
is always first). Also, the now second arg denotes the DCT so adjust its
type.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:34 +01:00
Borislav Petkov
e761359a25 amd64_edac: Fix ranges signedness
The dram ranges make sense only as an unsigned type.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:33 +01:00
Borislav Petkov
972ea17ab9 amd64_edac: Drop local variable
Use the macro directly instead

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:32 +01:00
Borislav Petkov
71d2a32e8e amd64_edac: Fix PCI config addressing types
Adjust argument types to the PCI config API's types.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:31 +01:00
Borislav Petkov
151fa71c58 amd64_edac: Fix DRAM base macros
Return unsigned u8 values only.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:30 +01:00
Borislav Petkov
b487c33e55 amd64_edac: Fix node id signedness
A node id can never be negative since we use it as an index into
the DRAM ranges array. This also makes one of the BUG_ON conditions
redundant.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:28 +01:00
Borislav Petkov
d88977a9c4 amd64_edac: Drop redundant declarations
Those were moved to the mce_amd.h header.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:27 +01:00
Borislav Petkov
df71a05324 amd64_edac: Enable driver on F15h
Add the PCI device ids required for driver registration. Remove
pvt->ctl_name and use the family descriptor directly, instead. Then,
bump driver version and fixup its format. Finally, enable DRAM ECC
decoding on F15h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:26 +01:00
Borislav Petkov
a3b7db09a6 amd64_edac: Adjust ECC symbol size to F15h
F15h has the same ECC symbol size options as F10h revD and later so
adjust checks to that. Simplify code a bit.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:26 +01:00
Borislav Petkov
87b3e0e6e4 amd64_edac: Simplify scrubrate setting
Drop per-instance variable and compute min scrubrate dynamically.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:25 +01:00
Borislav Petkov
41d8bfaba7 amd64_edac: Improve DRAM address mapping
Drop static tables which map the bits in F2x80 to a chip select size in
favor of functions doing the mapping with some bit fiddling. Also, add
F15 support.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:24 +01:00
Borislav Petkov
5a5d237169 amd64_edac: Sanitize ->read_dram_ctl_register
This function is relevant for F10h and higher, and it has only one
callsite so drop its function pointer from the low_ops struct.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:23 +01:00
Borislav Petkov
b15f0fcab1 amd64_edac: Adjust sys_addr to chip select conversion routine to F15h
F15h sys_addr to chip select mapping is almost identical to F10h's so
reuse that. Rename functions on that path accordingly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:22 +01:00
Borislav Petkov
355fba6005 amd64_edac: Beef up early exit reporting
Add paranoid checks for the sys address before going off and decoding
it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:22 +01:00
Borislav Petkov
614ec9d853 amd64_edac: Revamp online spare handling
Replace per-DCT macros with smarter ones, drop hack and look for the
spare rank on all chip selects on a channel.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:21 +01:00
Borislav Petkov
5d4b58e84a amd64_edac: Fix channel interleave removal
Remove the channel interleave select bit properly. See
F2x110[DctSelIntLvAddr] for details.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:21 +01:00
Borislav Petkov
e2f79dbdfb amd64_edac: Correct node interleaving removal
When node interleaving is enabled, a subset of the addr[14:12] bits has
to be removed in order to get the normalized DCT address of the DRAM
channel. The actual number of bits to remove is determined by F1x[1,
0][7C:40][IntlvEn]. Do this correctly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:20 +01:00
Borislav Petkov
95b0ef55cd amd64_edac: Add support for interleaved region swapping
On revC3 and revE Fam10h machines and later, non-interleaved graphics
framebuffer memory under the 16G mark can be swapped with a region
located at the bottom of memory so that the GPU can use the interleaved
region and thus two channels. Add support for that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:20 +01:00
Borislav Petkov
700466249f amd64_edac: Unify get_error_address
The address bits from MC4_STATUS differ only between K8 and the rest so
no need for a per-family method.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov
f192c7b16c amd64_edac: Simplify decoding path
Use the struct mce directly instead of copying from it into a custom
struct err_regs.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov
7d20d14da1 amd64_edac: Adjust channel counting to F15h
The only difference is that F10h used to sport ganged DCTs and F15h
doesn't so adjust the F10h routine and reuse it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:19 +01:00
Borislav Petkov
5980bb9cd8 amd64_edac: Cleanup old defines cruft
Remove unused defines, drop family names from define names.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:18 +01:00
Borislav Petkov
bcd781f46a amd64_edac: Cleanup NBSH cruft
Remove reporting of errors with UC bit set - this is done by the MCE
decoding code anyway and this driver deals with DRAM ECC errors only. UC
(NB uncorrectable error) doesn't necessarily mean it is a DRAM error.
Remove unused macros while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:17 +01:00
Borislav Petkov
a97fa68ec4 amd64_edac: Cleanup NBCFG handling
The fact whether we are chipkill capable or not does not have any
bearing when computing the channel index on a ganged DCT configuration
so remove that. Also, simplify debug statements. Finally, remove old
error injection leftovers, while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:16 +01:00
Borislav Petkov
c9f4f26eae amd64_edac: Cleanup NBCTL code
Remove family names from macro names, drop single bit defines and
comment their meaning instead.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:15 +01:00
Borislav Petkov
78da121e15 amd64_edac: Cleanup DCT Select Low/High code
Shorten macro names, remove family name from macros, fix macro
arguments, shorten debug strings.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:15 +01:00
Borislav Petkov
cb32850744 amd64_edac: Cleanup Dram Configuration registers handling
* Restrict DCT ganged mode check since only Fam10h supports it
* Adjust DRAM type detection for BD since it only supports DDR3
* Remove second and thus unneeded DCLR read in k8_early_channel_count() - we do
  that in read_mc_regs()
* Cleanup comments and remove family names from register macros
* Remove unused defines

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov
525a1b20a6 amd64_edac: Cleanup DBAM handling
Do not read DBAM regs twice and simplify code around them.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov
f678b8ccce amd64_edac: Replace huge bitmasks with a macro
Replace hard to read hex constants with a continuous masks macro.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:14 +01:00
Borislav Petkov
c8e518d567 amd64_edac: Sanitize f10_get_base_addr_offset
This function maps the system address to the normalized DCT address.
Document what the code does for more clarity and wrap insane bitmasks in
a more understandable macro which generates them. Also, reduce number of
arguments passed to the function. Finally, rename this function to what
it actually does.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:13 +01:00
Borislav Petkov
229a7a11ac amd64_edac: Sanitize channel extraction
Cleanup and simplify f10_determine_channel(); make it more readable.
Also drop f10_map_intlv_en_to_shift() in favor of simply counting the
bits in F1x124[DramIntlvEn] which is equivalent.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:13 +01:00
Borislav Petkov
11c75eadaf amd64_edac: Cleanup chipselect handling
Add a struct representing the DRAM chip select base/limit register
pairs. Concentrate all CS handling in a single function. Also, add CS
looping macros for cleaner, more readable code. While at it, adjust code
to F15h. Finally, do smaller macro names cleanups (remove family names
from register macros) and debug messages clarification.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:12 +01:00
Borislav Petkov
bc21fa5787 amd64_edac: Cleanup DHAR handling
Adjust to F15h, simplify code, fixup macros.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:12 +01:00
Borislav Petkov
7f19bf755c amd64_edac: Remove DRAM base/limit subfields caching
Add a struct representing the DRAM base/limit range pairs and remove all
cached subfields. Replace them with accessor functions, which actually
saves us some space:

   text    data     bss     dec     hex filename
  14712    1577     336   16625    40f1 drivers/edac/amd64_edac_mod.o.after
  14831    1609     336   16776    4188 drivers/edac/amd64_edac_mod.o.before

Also, it simplifies the code a lot allowing to merge the K8 and F10h
routines.

No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:11 +01:00
Borislav Petkov
b2b0c60543 amd64_edac: Add support for F15h DCT PCI config accesses
F15h "multiplexes" between the configuration space of the two DRAM
controllers by toggling D18F1x10C[DctCfgSel] while F10h has a different
set of registers for DCT0, and DCT1 in extended PCI config space.

Add DCT configuration space accessors per family thus wrapping all the
different access prerequisites. Clean up code while at it, shorten
names.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:11 +01:00
Borislav Petkov
b6a280bb96 EDAC: Shut up sysfs registration debug code
Raise the debug level of these routines so that their output get issued
out only when the highest debug level is selected. Otherwise, don't
pollute driver debug output.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-03-17 14:46:10 +01:00
Chris Metcalf
5c77075548 drivers/edac: provide support for tile architecture
Add tile support for the EDAC driver, which provides unified system
error (memory, PCI, etc.) reporting. For now, the TILEPro port
reports memory correctable error (CE) only.

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
2011-03-10 13:30:14 -05:00
Grant Likely
000061245a dt/powerpc: Eliminate users of of_platform_{,un}register_driver
Get rid of old users of of_platform_driver in arch/powerpc.  Most
of_platform_driver users can be converted to use the platform_bus
directly.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-02-28 01:36:39 -07:00
Arvind R
cb60a42269 edac: correct i82975x error-info reported
to edac-core

fix the totally wrong info w.r.t page,row,dimm-label previously reported to
edac-core by i82975x driver

Signed-off-by: Arvind R. <arvino55@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-02-17 16:47:04 +01:00
Arvind R
da95b3d21f edac: correct i82975x mci initialisation
corrected mtype, and added dev_name,scrubmode initialisers
in i82975x struct mem_ctl initialisation

Signed-off-by: Arvind R. <arvino55@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-02-17 16:46:22 +01:00
Arvind R
7ba9957581 edac: correct commented info
wrong comments in i82975x driver corrected

Signed-off-by: Arvind R. <arvino55@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-02-17 16:44:45 +01:00
Jiri Kosina
0a9d59a246 Merge branch 'master' into for-next 2011-02-15 10:24:31 +01:00
Borislav Petkov
4d7963648f amd64_edac: Fix DIMMs per DCTs output
amd64_debug_display_dimm_sizes() reports the distribution of the DIMMs
on each DRAM controller and its chip select sizes. Thus, the last don't
have anything to do with whether we're running in ganged DCT mode or not
- their sizes don't change all of a sudden. Fix that by removing the
ganged-check and dump DCT0's config for DCT1 when in ganged mode since
they're identical.

Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-02-10 14:41:49 +01:00
Arvind R
25527885e3 edac: i82975x author/maintainer email address change
edac-i82975x author/maintainer email address change

Signed-off-by: Arvind R. <arvino55@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-01-24 16:17:51 +01:00
Jesper Juhl
42b16b3fbb Kill off warning: ‘inline’ is not at beginning of declaration
Fix a bunch of
	warning: ‘inline’ is not at beginning of declaration
messages when building a 'make allyesconfig' kernel with -Wextra.

These warnings are trivial to kill, yet rather annoying when building with
-Wextra.
The more we can cut down on pointless crap like this the better (IMHO).

A previous patch to do this for a 'allnoconfig' build has already been
merged. This just takes the cleanup a little further.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-01-19 15:43:08 +01:00
Linus Torvalds
008d23e485 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
  Documentation/trace/events.txt: Remove obsolete sched_signal_send.
  writeback: fix global_dirty_limits comment runtime -> real-time
  ppc: fix comment typo singal -> signal
  drivers: fix comment typo diable -> disable.
  m68k: fix comment typo diable -> disable.
  wireless: comment typo fix diable -> disable.
  media: comment typo fix diable -> disable.
  remove doc for obsolete dynamic-printk kernel-parameter
  remove extraneous 'is' from Documentation/iostats.txt
  Fix spelling milisec -> ms in snd_ps3 module parameter description
  Fix spelling mistakes in comments
  Revert conflicting V4L changes
  i7core_edac: fix typos in comments
  mm/rmap.c: fix comment
  sound, ca0106: Fix assignment to 'channel'.
  hrtimer: fix a typo in comment
  init/Kconfig: fix typo
  anon_inodes: fix wrong function name in comment
  fix comment typos concerning "consistent"
  poll: fix a typo in comment
  ...

Fix up trivial conflicts in:
 - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c)
 - fs/ext4/ext4.h

Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.
2011-01-13 10:05:56 -08:00
Linus Torvalds
128283a47e Merge branch 'mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  EDAC, MCE: Fix NB error formatting
  EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit
  EDAC, MCE: Enable MCE decoding on F15h
  EDAC, MCE: Allow F15h bank 6 MCE injection
  EDAC, MCE: Shorten error report formatting
  EDAC, MCE: Overhaul error fields extraction macros
  EDAC, MCE: Add F15h FP MCE decoder
  EDAC, MCE: Add F15 EX MCE decoder
  EDAC, MCE: Add an F15h NB MCE decoder
  EDAC, MCE: No F15h LS MCE decoder
  EDAC, MCE: Add F15h CU MCE decoder
  EDAC, MCE: Add F15h IC MCE decoder
  EDAC, MCE: Add F15h DC MCE decoder
  EDAC, MCE: Select extended error code mask
2011-01-07 14:54:03 -08:00
Borislav Petkov
6d5db46687 EDAC, MCE: Fix NB error formatting
Minor formatting fixup since the information which core was associated
with the MCE is not always valid.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:26 +01:00
Randy Dunlap
50adbbd8a8 EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit
Building for X86_32 produces shift count warnings, so use BIT_64() to
eliminate the warnings.

drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
drivers/edac/mce_amd.c:778: warning: left shift count >= width of type

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: bluesmoke-devel@lists.sourceforge.net
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:25 +01:00
Borislav Petkov
bad11e0318 EDAC, MCE: Enable MCE decoding on F15h
Now that everything is inplace, enable MCE decoding on F15h. Make
initcall routine a bit more readable.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:24 +01:00
Borislav Petkov
1b07ca47ff EDAC, MCE: Allow F15h bank 6 MCE injection
F15h adds a sixth MCE bank: adjust bank number check in the injection
code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:23 +01:00
Borislav Petkov
fa7ae8cc8c EDAC, MCE: Shorten error report formatting
Shorten up MCi_STATUS flags and add BD's new deferred and poison types.
Also, simplify formatting.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:22 +01:00
Borislav Petkov
6245288232 EDAC, MCE: Overhaul error fields extraction macros
Make macro names shorter thus making code shorter and more clear.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:21 +01:00
Borislav Petkov
b8f85c477b EDAC, MCE: Add F15h FP MCE decoder
Add decoder for FP MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:20 +01:00
Borislav Petkov
8259a7e572 EDAC, MCE: Add F15 EX MCE decoder
Integrate the single FIROB signature into an expanded table along with
the new BD MCE types.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:19 +01:00
Borislav Petkov
05cd667d66 EDAC, MCE: Add an F15h NB MCE decoder
by (almost) reusing the F10h one since the signatures are the same.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:18 +01:00
Borislav Petkov
b18434cad1 EDAC, MCE: No F15h LS MCE decoder
F15h BD doesn't generate LS MCEs so warn about it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:17 +01:00
Borislav Petkov
70fdb494aa EDAC, MCE: Add F15h CU MCE decoder
MCE bank 2 is redefined from a BU to a CU (Combined Unit) bank on F15h.
Add a decoder function for CU MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:16 +01:00
Borislav Petkov
86039cd401 EDAC, MCE: Add F15h IC MCE decoder
Add support for decoding F15h IC MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:15 +01:00
Borislav Petkov
25a4f8b059 EDAC, MCE: Add F15h DC MCE decoder
Add a decoder for F15h DC MCEs to support the new types of DC MCEs
introduced by the BD microarchitecture.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:14 +01:00
Borislav Petkov
2be64bfac7 EDAC, MCE: Select extended error code mask
F15h enlarges the extended error code of an MCE to a 5-bit field
(MCi_STATUS[20:16]). Add a mask variable which default 0xf is overridden
on F15h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:54:12 +01:00
Borislav Petkov
a135cef79a amd64_edac: Disable DRAM ECC injection on K8
K8 does not allow for an atomic RMW to a cacheline as F10h does so
disable the error injection interface for it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:38:46 +01:00
Borislav Petkov
390944439f EDAC: Fixup scrubrate manipulation
Make the ->{get|set}_sdram_scrub_rate return the actual scrub rate
bandwidth it succeeded setting and remove superfluous arg pointer used
for that. A negative value returned still means that an error occurred
while setting the scrubrate. Document this for future reference.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:38:31 +01:00
Borislav Petkov
360b7f3c60 amd64_edac: Remove two-stage initialization
Now that all prerequisites are in place, drop the two-stage driver
instances initialization in favor of the following simple init sequence:

1. Probe PCI device: we only test ECC capabilities here and if none exit
early.

2. If the hw supports ECC and it is/can be enabled, we init the per-node
instance.

Remove "amd64_" prefix from static functions touched, while at it.

There actually should be no visible functional change resulting from
this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:03 +01:00
Borislav Petkov
2299ef7114 amd64_edac: Check ECC capabilities initially
Rework the code to check the hardware ECC capabilities at PCI probing
time. We do all further initialization only if we actually can/have ECC
enabled.

While at it:
0. Fix function naming.
1. Simplify/clarify debug output.
2. Remove amd64_ prefix from the static functions
3. Reorganize code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:02 +01:00
Borislav Petkov
ae7bb7c679 amd64_edac: Carve out ECC-related hw settings
This is in preparation for the init path reorganization where we want
only to

1) test whether a particular node supports ECC
2) can it be enabled

and only then do the necessary allocation/initialization. For that,
we need to decouple the ECC settings of the node from the instance's
descriptor.

The should be no functional change introduced by this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:34:00 +01:00
Borislav Petkov
f1db274e1b amd64_edac: Remove PCI ECS enabling functions
PCI ECS is being enabled by default since 2.6.26 on AMD so this code is
just superfluous now, remove it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:59 +01:00
Borislav Petkov
027dbd6f5d amd64_edac: Remove explicit Kconfig PCI dependency
AMD_NB pulls in the dependency on PCI. Clarify/fix help text while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:58 +01:00
Borislav Petkov
cc4d8860fc amd64_edac: Allocate driver instances dynamically
Remove static allocation in favor of dynamically allocating space for as
many driver instances as northbridges present on the system.

There should be no functional change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:57 +01:00
Borislav Petkov
24f9a7fe3f amd64_edac: Rework printk macros
Add a macro per printk level, shorten up error messages. Add relevant
information to KERN_INFO level. No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:56 +01:00
Borislav Petkov
8d5b5d9c7b amd64_edac: Rename CPU PCI devices
Rename variables representing PCI devices to their BKDG names for faster
search and shorter, clearer code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:54 +01:00
Borislav Petkov
b8cfa02f83 amd64_edac: Concentrate per-family init even more
Move the remaining per-family init code into the proper place and
simplify the rest of the initialization. Reorganize error handling in
amd64_init_one_instance().

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:53 +01:00
Borislav Petkov
bbd0c1f675 amd64_edac: Cleanup the CPU PCI device reservation
Shorten code and clarify comments, return proper -E* values on error.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:52 +01:00
Borislav Petkov
0092b20d4c amd64_edac: Simplify CPU family detection
Concentrate CPU family detection in the per-family init function.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:51 +01:00
Borislav Petkov
395ae783b3 amd64_edac: Add per-family init function
Run a per-family init function which does all the settings based on
the family this driver instance is running on. Move the scrubrate
calculation in it and simplify code.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:50 +01:00
Borislav Petkov
9f56da0e3c amd64_edac: Use cached extended CPU model
... instead of computing it needlessly again.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:49 +01:00
Borislav Petkov
3ab0e7dc2e amd64_edac: Remove F11h support
F11h doesn't support DRAM ECC so whack it away.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2011-01-07 11:33:47 +01:00
Linus Torvalds
42cbd8efb0 Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cacheinfo: Cleanup L3 cache index disable support
  x86, amd-nb: Cleanup AMD northbridge caching code
  x86, amd-nb: Complete the rename of AMD NB and related code
2011-01-06 10:50:28 -08:00
David Sterba
e7bf068aa3 i7core_edac: fix typos in comments
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-12-28 01:20:51 +01:00
Jiri Kosina
4b7bd36470 Merge branch 'master' into for-next
Conflicts:
	MAINTAINERS
	arch/arm/mach-omap2/pm24xx.c
	drivers/scsi/bfa/bfa_fcpim.c

Needed to update to apply fixes for which the old branch was too
outdated.
2010-12-22 18:57:02 +01:00
Borislav Petkov
e726f3c368 amd64_edac: Fix interleaving check
When matching error address to the range contained by one memory node,
we're in valid range when node interleaving

1. is disabled, or
2. enabled and when the address bits we interleave on match the
interleave selector on this node (see the "Node Interleaving" section in
the BKDG for an enlightening example).

Thus, when we early-exit, we need to reverse the compound logic
statement properly.

Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-12-08 19:52:54 +01:00
Andrei Konovalov
76f04f2591 EDAC: Correct MiB_TO_PAGES() macro
This corrects the misprint introduced when moving '#if
PAGE_SHIFT' from i7core_edac.c to edac_core.h (commit
e9144601d3)

Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Andrei Konovalov <akonovalov@mvista.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-12-08 19:52:53 +01:00
Borislav Petkov
bb31b3122c EDAC: Fix workqueue-related crashes
00740c5854 changed edac_core to
un-/register a workqueue item only if a lowlevel driver supplies a
polling routine. Normally, when we remove a polling low-level driver, we
go and cancel all the queued work. However, the workqueue unreg happens
based on the ->op_state setting, and edac_mc_del_mc() sets this to
OP_OFFLINE _before_ we cancel the work item, leading to NULL ptr oops on
the workqueue list.

Fix it by putting the unreg stuff in proper order.

Cc: <stable@kernel.org> #36.x
Reported-and-tested-by: Tobias Karnat <tobias.karnat@googlemail.com>
LKML-Reference: <1291201307.3029.21.camel@Tobias-Karnat>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-12-08 19:52:27 +01:00
Axel Lin
df4b2a30e0 EDAC, MCE: Fix edac_init_mce_inject error handling
Otherwise, variable i will be -1 inside the latest iteration of the
while loop.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-22 15:35:32 +01:00
Tracey Dent
f570e1dd84 EDAC: Remove deprecated kbuild goal definitions
Change EDAC's Makefile to use <modules>-y instead of
<modules>-objs because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

 [bp: Fixup commit message]
 [bp: Fixup indentation]

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-22 15:35:31 +01:00
Hans Rosenfeld
9653a5c76c x86, amd-nb: Cleanup AMD northbridge caching code
Support more than just the "Misc Control" part of the northbridges.
Support more flags by turning "gart_supported" into a single bit flag
that is stored in a flags member. Clean up related code by using a set
of functions (amd_nb_num(), amd_nb_has_feature() and node_to_amd_nb())
instead of accessing the NB data structures directly. Reorder the
initialization code and put the GART flush words caching in a separate
function.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:05 +01:00
Hans Rosenfeld
eec1d4fa00 x86, amd-nb: Complete the rename of AMD NB and related code
Not only the naming of the files was confusing, it was even more so for
the function and variable names.

Renamed the K8 NB and NUMA stuff that is also used on other AMD
platforms. This also renames the CONFIG_K8_NUMA option to
CONFIG_AMD_NUMA and the related file k8topology_64.c to
amdtopology_64.c. No functional changes intended.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-11-18 15:53:04 +01:00
Uwe Kleine-König
b595076a18 tree-wide: fix comment/printk typos
"gadget", "through", "command", "maintain", "maintain", "controller", "address",
"between", "initiali[zs]e", "instead", "function", "select", "already",
"equal", "access", "management", "hierarchy", "registration", "interest",
"relative", "memory", "offset", "already",

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-11-01 15:38:34 -04:00
Linus Torvalds
da62aa69c1 Merge branch 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (34 commits)
  i7core_edac: return -ENODEV when devices were already probed
  i7core_edac: properly terminate pci_dev_table
  i7core_edac: Avoid PCI refcount to reach zero on successive load/reload
  i7core_edac: Fix refcount error at PCI devices
  i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL
  i7core_edac: Fix an oops at i7core probe
  i7core_edac: Remove unused member channels in i7core_pvt
  i7core_edac: Remove unused arg csrow from get_dimm_config
  i7core_edac: Reduce args of i7core_register_mci
  i7core_edac: Introduce i7core_unregister_mci
  i7core_edac: Use saved pointers
  i7core_edac: Check probe counter in i7core_remove
  i7core_edac: Call pci_dev_put() when alloc_i7core_dev()  failed
  i7core_edac: Fix error path of i7core_register_mci
  i7core_edac: Fix order of lines in i7core_register_mci
  i7core_edac: Always do get/put for all devices
  i7core_edac: Introduce i7core_pci_ctl_create/release
  i7core_edac: Introduce free_i7core_dev
  i7core_edac: Introduce alloc_i7core_dev
  i7core_edac: Reduce args of i7core_get_onedevice
  ...
2010-10-26 10:13:48 -07:00
Linus Torvalds
229aebb873 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
  Update broken web addresses in arch directory.
  Update broken web addresses in the kernel.
  Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
  Revert "Fix typo: configuation => configuration" partially
  ida: document IDA_BITMAP_LONGS calculation
  ext2: fix a typo on comment in ext2/inode.c
  drivers/scsi: Remove unnecessary casts of private_data
  drivers/s390: Remove unnecessary casts of private_data
  net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
  drivers/infiniband: Remove unnecessary casts of private_data
  drivers/gpu/drm: Remove unnecessary casts of private_data
  kernel/pm_qos_params.c: Remove unnecessary casts of private_data
  fs/ecryptfs: Remove unnecessary casts of private_data
  fs/seq_file.c: Remove unnecessary casts of private_data
  arm: uengine.c: remove C99 comments
  arm: scoop.c: remove C99 comments
  Fix typo configue => configure in comments
  Fix typo: configuation => configuration
  Fix typo interrest[ing|ed] => interest[ing|ed]
  Fix various typos of valid in comments
  ...

Fix up trivial conflicts in:
	drivers/char/ipmi/ipmi_si_intf.c
	drivers/usb/gadget/rndis.c
	net/irda/irnet/irnet_ppp.c
2010-10-24 13:41:39 -07:00
Linus Torvalds
8de547e182 Merge branch 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/edac
* 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/edac: (25 commits)
  i7300_edac: Properly initialize per-csrow memory size
  V4L/DVB: i7300_edac: better initialize page counts
  MAINTAINERS: Add maintainer for i7300-edac driver
  i7300-edac: CodingStyle cleanup
  i7300_edac: Improve comments
  i7300_edac: Cleanup: reorganize the file contents
  i7300_edac: Properly detect channel on CE errors
  i7300_edac: enrich FBD error info for corrected errors
  i7300_edac: enrich FBD error info for fatal errors
  i7300_edac: pre-allocate a buffer used to prepare err messages
  i7300_edac: Fix MTR x4/x8 detection logic
  i7300_edac: Make the debug messages coherent with the others
  i7300_edac: Cleanup: remove get_error_info logic
  i7300_edac: Add a code to cleanup error registers
  i7300_edac: Add support for reporting FBD errors
  i7300_edac: Properly detect the type of error correction
  i7300_edac: Detect if the device is on single mode
  i7300_edac: Adds detection for enhanced scrub mode on x8
  i7300_edac: Clear the error bit after reading
  i7300_edac: Add error detection code for global errors
  ...
2010-10-24 13:06:57 -07:00
Mauro Carvalho Chehab
76a7bd8113 i7core_edac: return -ENODEV when devices were already probed
Due to the nature of i7core, we need to probe and attach all PCI
devices used by this driver during the first time probe is called.
However, PCI core will call the probe routine one time for each CPU
socket. If we return -EINVAL to those calls, it would seem that the
driver fails, when, in fact, there's no more devices left to initialize.

Changing the return code to -ENODEV solves this issue.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:36:19 -02:00
Mauro Carvalho Chehab
3c52cc57cc i7core_edac: properly terminate pci_dev_table
At pci_xeon_fixup(), it waits for a null-terminated table, while at
i7core_get_all_devices, it just do a for 0..ARRAY_SIZE. As other tables
are zero-terminated, change it to be terminate with 0 as well, and fixes
a bug where it may be running out of the table elements.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:31:50 -02:00
Mauro Carvalho Chehab
a3e1541637 i7core_edac: Avoid PCI refcount to reach zero on successive load/reload
That's a nasty bug that took me a lot of time to track, and whose
solution took just one line to solve. The best fragrances and the worse
poisons are shipped on the smalest bottles.

The drivers/pci/quick.c implements the pci_get_device function. The normal
behavior is that you call it, the function returns you a pdev pointer
and increment pdev->kobj.kref.refcount of the pci device. However,
if you want to keep searching an object, you need to pass the previous
pdev function to the search.

When you use a not null pointer to pdev "from" field, pci_get_device
will decrement pdev->kobj.kref.refcount, assuming that the driver won't
be using the previous pdev.

The solution is simple: we just need to call pci_dev_get() manually,
for the pdev's that the driver will actually use.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Mauro Carvalho Chehab
79daef2099 i7core_edac: Fix refcount error at PCI devices
Probably due to a bug or some testing logic at PCI level, device
refcount for <bus>:00.0 device is decremented at the end of the
pci_get_device, made by i7core_get_all_devices(). The fact is that
the first versions of the driver relied on those devices to probe
for Nehalem, but the current versions don't use it at all.

So, let's just remove those devices from the driver, making it simpler
and fixing the bug.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Mauro Carvalho Chehab
88ef5ea976 i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL
i7core_unregister_mci() checks internally when mci=NULL. There's no
need to test it outside.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Mauro Carvalho Chehab
6d37d240f2 i7core_edac: Fix an oops at i7core probe
changeset c91d57ba9ce5b5c93a7077e2f72510eb1f9131c4 moved the init
of the priv pointer to the end of the probe routine. However, we need
them before that, otherwise, we hit an OOPS:

[   67.743453] EDAC DEBUG: mci_bind_devs: Associated fn 0.0, dev = ffff88011b46e000, socket 0
[   67.751861] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[   67.759685] IP: [<ffffffffa017e484>] i7core_probe+0x979/0x130c [i7core_edac]
[   67.766721] PGD 10bd38067 PUD 10bd37067 PMD 0
[   67.771178] Oops: 0000 [#1] SMP
[   67.774414] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[   67.782213] CPU 1
[   67.784042] Modules linked in: i7core_edac(+) edac_core cpufreq_ondemand binfmt_misc dm_multipath video output pci_slot snd_hda_codd

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Hidetoshi Seto
21b6806a8c i7core_edac: Remove unused member channels in i7core_pvt
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Hidetoshi Seto
2e5185f7ff i7core_edac: Remove unused arg csrow from get_dimm_config
A local is enough.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:41 -02:00
Hidetoshi Seto
aace42831a i7core_edac: Reduce args of i7core_register_mci
We can check the number of channels in i7core_register_mci.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
1c6edbbe25 i7core_edac: Introduce i7core_unregister_mci
In i7core_probe, when setup of mci for 2nd or later socket failed,
we should cleanup prepared mci for 1st socket or so before "put" of
all devices.

So let have i7core_unregister_mci that can be shared between here
and i7core_remove.

While here fix a typo "hanler".

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
73589c80cd i7core_edac: Use saved pointers
We already have saved pointers.  Use shorter ones.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
71fe01706d i7core_edac: Check probe counter in i7core_remove
Prevent i7core_remove from running multiple times.
Otherwise value proved will be negative and something will be wrong.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
2896637b86 i7core_edac: Call pci_dev_put() when alloc_i7core_dev() failed
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
628c5ddfb0 i7core_edac: Fix error path of i7core_register_mci
Release resources properly.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:40 -02:00
Hidetoshi Seto
5939813b9c i7core_edac: Fix order of lines in i7core_register_mci
The flag is_registered is not initialized until mci_bind_devs()
is called.  Refer it properly.

The mci->dev and mci->edac_check is required in edac_mc_add_mc(),
so prepare them just before the call.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
64c10f6e0e i7core_edac: Always do get/put for all devices
We already do 'get' for all sockets at once. So do 'put' in the
same way.

And let args of the 'get' function to void since it handles
only the single, static and known size table pci_dev_table[].

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
a3aa0a4ab5 i7core_edac: Introduce i7core_pci_ctl_create/release
Have a couple of method.
while here sort out lines in the i7core_register_mci() a bit.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
2aa9be448d i7core_edac: Introduce free_i7core_dev
Have a method to make a couple with alloc_i7core_dev() previously
introduced.  Using in pair will help proper resource handling.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:39 -02:00
Hidetoshi Seto
848b2f7ed6 i7core_edac: Introduce alloc_i7core_dev
It's nice to have a method for a single purpose.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Hidetoshi Seto
b197cba071 i7core_edac: Reduce args of i7core_get_onedevice
Since we need to pass the index of the entry, pass the table itself
instead of passing individual members of the table.

While here make it static.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Hidetoshi Seto
45b7c981ae i7core_edac: Fix the logic in i7core_remove()
commit 47251b4d960bdfa648b0d06dbc6d445f41cb3906 have changed
the logic for unexplained reasons.  It looks strange that it
can release i7core_dev without calling i7core_put_devices()
that releases i7core_dev->pdev.

Fix the part.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
54a08ab153 i7core_edac: Don't do the legacy PCI probe by default
The legacy PCI probe sometimes cause hangs. Better to have it
disabled by default, and have a parameter to enable it.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
accf74fff3 i7core_edac: don't use a freed mci struct
This is a nasty bug. Since kobject count will be reduced by zero by
edac_mc_del_mc(), and this triggers the kobj release method, the
mci memory will be freed automatically. So, all we have left is ctl_name,
as shown by enabling debug:

[   80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device()  remove_link
[   80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device()  remove_mci_instance
[   80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[   80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
[   80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
[   80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
[   80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
[   80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()

Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
when edac_remove_sysfs_mci_device() is called, but the logic is:
	edac_remove_sysfs_mci_device(mci);
	edac_printk(KERN_INFO, EDAC_MC,
		"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
		mci->mod_name, mci->ctl_name, edac_dev_name(mci));
So, as the edac_printk() needs the mci struct, this generates an OOPS.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
bbc560ae67 edac_core: Print debug messages at release calls
This is important to track a nasty bug at the free logic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:38 -02:00
Mauro Carvalho Chehab
ac99768c53 edac_core: Don't let free(mci) happen while using it
A very nasty bug were happening on edac core, due to the way mci objects are
freed. mci memory is freed when kobject count reaches zero, by
edac_mci_control_release(). However, from the logs, this is clearly happening
before the final usage of mci struct:

[15799.607454] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[15799.618773] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 769: edac_inst_grp_release()
[15799.627326] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 894: edac_remove_mci_instance_attributes() end of seeking for group all_channel_counts
[15799.640887] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 877: edac_remove_mci_instance_attributes() sysfs_attrib = ffffffffa01d7240
[15799.653412] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device()  remove_link
[15799.663753] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device()  remove_mci_instance

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
6fe1108f14 edac_core: Do a better job with node removal
Make sure we remove groups at the right order

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
39300e7143 i7core_edac: explicitly remove PCI devices from the devices list
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
41ba6c1058 i7core_edac: MCE NMI handling should stop first
Otherwise, a NMI may happen causing a race condition and a panic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
6ee7dd5044 i7core_edac: Initialize all priv vars before start polling
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
3cfd01468b i7core_edac: Improve debug to seek for register/remove errors
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:37 -02:00
Mauro Carvalho Chehab
e9144601d3 i7core_edac: move #if PAGE_SHIFT to edac_core.h
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:36 -02:00
Mauro Carvalho Chehab
1288c18f48 i7core_edac: Properly mark const static vars as such
There are two groups of sysfs attributes: one for rdimm and another
for udimm. Instead of changing dynamically the unique static struct
for handling udimm's, declare two vars and make them constant.

This avoids the risk of having two or more memory controllers, each
needing a different set of attributes.

While here, use const on all places where it is applicable.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

edac_core: use const for constant sysfs arguments

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:14 -02:00
Mauro Carvalho Chehab
18c29002f9 i7core_edac: move static vars to the beginning of the file
While here, don't initialize probed with 0.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:12 -02:00
Mauro Carvalho Chehab
939747bd68 i7core_edac: Be sure that the edac pci handler will be properly released
With multi-sockets, more than one edac pci handler is enabled. Be sure to
un-register all instances.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
2010-10-24 11:20:12 -02:00
Linus Torvalds
c029e405bd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: (21 commits)
  EDAC, MCE: Fix shift warning on 32-bit
  EDAC, MCE: Add a BIT_64() macro
  EDAC, MCE: Enable MCE decoding on F12h
  EDAC, MCE: Add F12h NB MCE decoder
  EDAC, MCE: Add F12h IC MCE decoder
  EDAC, MCE: Add F12h DC MCE decoder
  EDAC, MCE: Add support for F11h MCEs
  EDAC, MCE: Enable MCE decoding on F14h
  EDAC, MCE: Fix FR MCEs decoding
  EDAC, MCE: Complete NB MCE decoders
  EDAC, MCE: Warn about LS MCEs on F14h
  EDAC, MCE: Adjust IC decoders to F14h
  EDAC, MCE: Adjust DC decoders to F14h
  EDAC, MCE: Rename files
  EDAC, MCE: Rework MCE injection
  EDAC: Export edac sysfs class to users.
  EDAC, MCE: Pass complete MCE info to decoders
  EDAC, MCE: Sanitize error codes
  EDAC, MCE: Remove unused function parameter
  EDAC, MCE: Add HW_ERR prefix
  ...
2010-10-21 14:04:58 -07:00
Linus Torvalds
2f0384e5fc Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, amd_nb: Enable GART support for AMD family 0x15 CPUs
  x86, amd: Use compute unit information to determine thread siblings
  x86, amd: Extract compute unit information for AMD CPUs
  x86, amd: Add support for CPUID topology extension of AMD CPUs
  x86, nmi: Support NMI watchdog on newer AMD CPU families
  x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs
  x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB
  x86, k8-gart: Decouple handling of garts and northbridges
  x86, cacheinfo: Fix dependency of AMD L3 CID
  x86, kvm: add new AMD SVM feature bits
  x86, cpu: Fix allowed CPUID bits for KVM guests
  x86, cpu: Update AMD CPUID feature bits
  x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit
  x86, AMD: Remove needless CPU family check (for L3 cache info)
  x86, tsc: Remove CPU frequency calibration on AMD
2010-10-21 13:01:08 -07:00
Borislav Petkov
525906bc89 EDAC, MCE: Fix shift warning on 32-bit
Fix

drivers/edac/mce_amd.c:262: warning: left shift count >= width of type

on 32-bit builds.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:07 +02:00
Borislav Petkov
cf1d2200db EDAC, MCE: Add a BIT_64() macro
Add a macro for 64-bit vectors to use when accessing MSR contents.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:06 +02:00
Borislav Petkov
fda7561f43 EDAC, MCE: Enable MCE decoding on F12h
Turn on MCE decoding on F12h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:06 +02:00
Borislav Petkov
cb9d5ecdff EDAC, MCE: Add F12h NB MCE decoder
F12h is completely covered by the generic path.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:05 +02:00
Borislav Petkov
e7281eb37d EDAC, MCE: Add F12h IC MCE decoder
... which is the same as for K8 and F10h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:05 +02:00
Borislav Petkov
9be0bb1072 EDAC, MCE: Add F12h DC MCE decoder
F12h DC MCE signatures are a subset of F10h's so reuse them.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:04 +02:00
Borislav Petkov
f0157b3afd EDAC, MCE: Add support for F11h MCEs
F11h has almost the same MCE signatures as K8 except DRAM ECC and MC5
bank errors. Reuse functionality from the other families.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:04 +02:00
Borislav Petkov
9530d608ef EDAC, MCE: Enable MCE decoding on F14h
Now that all decoders have been taught about F14h, models < 0x10
MCEs, enable decoding on this family of CPUs. Also, issue a short
informational message upon boot that MCE decoding gets enabled.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:03 +02:00
Borislav Petkov
fe4ea2623b EDAC, MCE: Fix FR MCEs decoding
Those are N/A on K8, so don't decode them there.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:03 +02:00
Borislav Petkov
5ce88f6ea6 EDAC, MCE: Complete NB MCE decoders
Add support for decoding F14h BU MCEs and improve decoding of the
remaining families.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:02 +02:00
Borislav Petkov
ded5062328 EDAC, MCE: Warn about LS MCEs on F14h
F14h CPUs do not generate LS MCEs so exit early and warn the user in
case this path is ever hit that something else might be going haywire.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:02 +02:00
Borislav Petkov
dd53bce4e8 EDAC, MCE: Adjust IC decoders to F14h
Add support for IC MCEs for F14h CPUs. K8 and F10h are almost identical
so use one function for both.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:01 +02:00
Borislav Petkov
888ab8e6eb EDAC, MCE: Adjust DC decoders to F14h
Add a per-family data cache decoders. Since there is a certain overlap
between the different DC MCE signatures, reuse functionality between the
families as far as possible.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:00 +02:00
Borislav Petkov
47ca08a40b EDAC, MCE: Rename files
Drop "edac_" string from the filenames since they're prefixed with edac/
in their pathname anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:48:00 +02:00
Borislav Petkov
9cdeb404a1 EDAC, MCE: Rework MCE injection
Add sysfs injection facilities for testing of the MCE decoding code.
Remove large parts of amd64_edac_dbg.c, as a result, which did only
NB MCE injection anyway and the new injection code supports that
functionality already.

Add an injection module so that MCE decoding code in production kernels
like those in RHEL and SLES can be tested.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:59 +02:00
Borislav Petkov
30e1f7a812 EDAC: Export edac sysfs class to users.
Move toplevel sysfs class to the stub and make it available to
non-modularized code too. Add proper refcounting of its users and move
the registration functionality into the reference counting routines.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:59 +02:00
Borislav Petkov
7cfd4a8744 EDAC, MCE: Pass complete MCE info to decoders
... instead of the MCi_STATUS info only for improved handling of certain
types of errors later.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:58 +02:00
Borislav Petkov
6337583d7d EDAC, MCE: Sanitize error codes
Clean up error codes names, shorten to mnemonics, add RRRR boundary
checking.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:58 +02:00
Borislav Petkov
0ee8efa8f4 EDAC, MCE: Remove unused function parameter
Remove remains from previous functionality.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:57 +02:00
Borislav Petkov
c9f281fd96 EDAC, MCE: Add HW_ERR prefix
.. so that the user knows what she's looking at there in dmesg. Also,
fix a minor cosmetic output inconsistency.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:57 +02:00
Borislav Petkov
ca755e0a49 EDAC: Fix error return
We should return a negative value when we cannot get the toplevel edac
sysfs class.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-10-21 14:47:56 +02:00
Justin P. Mattock
631dd1a885 Update broken web addresses in the kernel.
The patch below updates broken web addresses in the kernel

Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Finn Thain <fthain@telegraphics.com.au>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Dimitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Ben Pfaff <blp@cs.stanford.edu>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Reviewed-by: Finn Thain <fthain@telegraphics.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2010-10-18 11:03:14 +02:00
Marcin Slusarz
64aab720bd i7core_edac: fix panic in udimm sysfs attributes registration
Array of udimm sysfs attributes was not ended with NULL marker, leading to
dereference of random memory.

  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm0
  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm1
  EDAC DEBUG: edac_create_mci_instance_attributes: edac_create_mci_instance_attributes() file udimm2
  BUG: unable to handle kernel NULL pointer dereference at 00000000000001a4
  IP: [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  Pid: 1, comm: swapper Not tainted 2.6.36-rc3-nv+ #483 P6T SE/System Product Name
  RIP: 0010:[<ffffffff81330b36>]  [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  (...)
  Call Trace:
   [<ffffffff81330b86>] edac_create_mci_instance_attributes+0x198/0x1f1
   [<ffffffff81330c9a>] edac_create_sysfs_mci_device+0xbb/0x2b2
   [<ffffffff8132f533>] edac_mc_add_mc+0x46b/0x557
   [<ffffffff81428901>] i7core_probe+0xccf/0xec0
  RIP  [<ffffffff81330b36>] edac_create_mci_instance_attributes+0x148/0x1f1
  ---[ end trace 20de320855b81d78 ]---
  Kernel panic - not syncing: Attempted to kill init!

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-01 10:50:58 -07:00
Borislav Petkov
00740c5854 amd64_edac: Fix driver module removal
f4347553b3 removed the edac polling
mechanism in favor of using a notifier chain for conveying MCE
information to edac. However, the module removal path didn't test
whether the driver had setup the polling function workqueue at all and
the rmmod process was hanging in the kernel at try_to_del_timer_sync()
in the cancel_delayed_work() path, trying to cancel an uninitialized
work struct.

Fix that by adding a balancing check to the workqueue removal path.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2010-09-27 12:52:58 +02:00