Due to the nature of i7core, we need to probe and attach all PCI
devices used by this driver during the first time probe is called.
However, PCI core will call the probe routine one time for each CPU
socket. If we return -EINVAL to those calls, it would seem that the
driver fails, when, in fact, there's no more devices left to initialize.
Changing the return code to -ENODEV solves this issue.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
At pci_xeon_fixup(), it waits for a null-terminated table, while at
i7core_get_all_devices, it just do a for 0..ARRAY_SIZE. As other tables
are zero-terminated, change it to be terminate with 0 as well, and fixes
a bug where it may be running out of the table elements.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
That's a nasty bug that took me a lot of time to track, and whose
solution took just one line to solve. The best fragrances and the worse
poisons are shipped on the smalest bottles.
The drivers/pci/quick.c implements the pci_get_device function. The normal
behavior is that you call it, the function returns you a pdev pointer
and increment pdev->kobj.kref.refcount of the pci device. However,
if you want to keep searching an object, you need to pass the previous
pdev function to the search.
When you use a not null pointer to pdev "from" field, pci_get_device
will decrement pdev->kobj.kref.refcount, assuming that the driver won't
be using the previous pdev.
The solution is simple: we just need to call pci_dev_get() manually,
for the pdev's that the driver will actually use.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Probably due to a bug or some testing logic at PCI level, device
refcount for <bus>:00.0 device is decremented at the end of the
pci_get_device, made by i7core_get_all_devices(). The fact is that
the first versions of the driver relied on those devices to probe
for Nehalem, but the current versions don't use it at all.
So, let's just remove those devices from the driver, making it simpler
and fixing the bug.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
changeset c91d57ba9ce5b5c93a7077e2f72510eb1f9131c4 moved the init
of the priv pointer to the end of the probe routine. However, we need
them before that, otherwise, we hit an OOPS:
[ 67.743453] EDAC DEBUG: mci_bind_devs: Associated fn 0.0, dev = ffff88011b46e000, socket 0
[ 67.751861] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 67.759685] IP: [<ffffffffa017e484>] i7core_probe+0x979/0x130c [i7core_edac]
[ 67.766721] PGD 10bd38067 PUD 10bd37067 PMD 0
[ 67.771178] Oops: 0000 [#1] SMP
[ 67.774414] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[ 67.782213] CPU 1
[ 67.784042] Modules linked in: i7core_edac(+) edac_core cpufreq_ondemand binfmt_misc dm_multipath video output pci_slot snd_hda_codd
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
We can check the number of channels in i7core_register_mci.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
In i7core_probe, when setup of mci for 2nd or later socket failed,
we should cleanup prepared mci for 1st socket or so before "put" of
all devices.
So let have i7core_unregister_mci that can be shared between here
and i7core_remove.
While here fix a typo "hanler".
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Prevent i7core_remove from running multiple times.
Otherwise value proved will be negative and something will be wrong.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The flag is_registered is not initialized until mci_bind_devs()
is called. Refer it properly.
The mci->dev and mci->edac_check is required in edac_mc_add_mc(),
so prepare them just before the call.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
We already do 'get' for all sockets at once. So do 'put' in the
same way.
And let args of the 'get' function to void since it handles
only the single, static and known size table pci_dev_table[].
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Have a couple of method.
while here sort out lines in the i7core_register_mci() a bit.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Have a method to make a couple with alloc_i7core_dev() previously
introduced. Using in pair will help proper resource handling.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
It's nice to have a method for a single purpose.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Since we need to pass the index of the entry, pass the table itself
instead of passing individual members of the table.
While here make it static.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
commit 47251b4d960bdfa648b0d06dbc6d445f41cb3906 have changed
the logic for unexplained reasons. It looks strange that it
can release i7core_dev without calling i7core_put_devices()
that releases i7core_dev->pdev.
Fix the part.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The legacy PCI probe sometimes cause hangs. Better to have it
disabled by default, and have a parameter to enable it.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This is a nasty bug. Since kobject count will be reduced by zero by
edac_mc_del_mc(), and this triggers the kobj release method, the
mci memory will be freed automatically. So, all we have left is ctl_name,
as shown by enabling debug:
[ 80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device() remove_link
[ 80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device() remove_mci_instance
[ 80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[ 80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
[ 80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
[ 80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
[ 80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
[ 80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()
Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
when edac_remove_sysfs_mci_device() is called, but the logic is:
edac_remove_sysfs_mci_device(mci);
edac_printk(KERN_INFO, EDAC_MC,
"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
mci->mod_name, mci->ctl_name, edac_dev_name(mci));
So, as the edac_printk() needs the mci struct, this generates an OOPS.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
There are two groups of sysfs attributes: one for rdimm and another
for udimm. Instead of changing dynamically the unique static struct
for handling udimm's, declare two vars and make them constant.
This avoids the risk of having two or more memory controllers, each
needing a different set of attributes.
While here, use const on all places where it is applicable.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
edac_core: use const for constant sysfs arguments
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
With multi-sockets, more than one edac pci handler is enabled. Be sure to
un-register all instances.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Don't print failure to detect Core i7 EDAC facilities to the console at
boot time, most often occurring on Core i7 desktops and laptops.
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As Nehalem/Nehalem-EP/Westmere devices uses several devices for the same
functionality (memory controller), the default way of proping devices doesn't
work. So, instead of a per-device probe, all devices should be probed at once.
This means that we should block any new attempt of probe, otherwise, it will
try to register the same device several times.
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
On Nehalem/Nehalem-EP/Westmere, the first QPI device is the last PCI bus.
The last bus is generally at 0x3f or 0xff, but there are also other systems
using different setups. For example, HP Z800 has 0x7f as the last bus.
This patch adds a logic to discover the last bus, dynamically detecting it
at runtime.
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This adds new PCI IDs for the Westmere's memory controller
devices and modifies the i7core_edac driver to be able to
probe both Nehalem and Westmere processors.
Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
This fixes an error in function i7core_check_error
In commit ca9c90ba09 which converts the
driver to use double buffering, there is a change in the logic. Before,
if mce_count was zero, it skipped over a couple of statements and
finished out with a call to the *check_mc_ecc_err function. The current
code checks to see if mce_count is 0 and then exits.
This change reverts the behavior back to the original where if there are
no errors to report, we skip to the end and call the *check_mc_ecc_err
function.
This fix allows the driver to work again on my Nehalem based blades
again.
Signed-off-by: Vernon Mauery <vernux@us.ibm.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
It's called only from an __init function and is the only user
of pcibios_scan_specific_bus which will be marked as __devinit in
the next patch.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Fix build warning (missing header file) and
build error when CONFIG_SMP=n.
drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep'
drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>