* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, efi: Ensure that the entirity of a region is mapped
x86, efi: Pass a minimal map to SetVirtualAddressMap()
x86, efi: Merge contiguous memory regions of the same type and attribute
x86, efi: Consolidate EFI nx control
x86, efi: Remove virtual-mode SetVirtualAddressMap call
* 'x86-gart-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, gart: Don't enforce GART aperture lower-bound by alignment
* 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Don't unmask disabled irqs when migrating them
x86: Skip migrating IRQF_PER_CPU irqs in fixup_irqs()
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, mce: Drop the default decoding notifier
x86, MCE: Do not taint when handling correctable errors
This patch fixes a bug reported by a customer, who found
that many unreasonable error interrupts reported on all
non-boot CPUs (APs) during the system boot stage.
According to Chapter 10 of Intel Software Developer Manual
Volume 3A, Local APIC may signal an illegal vector error when
an LVT entry is set as an illegal vector value (0~15) under
FIXED delivery mode (bits 8-11 is 0), regardless of whether
the mask bit is set or an interrupt actually happen. These
errors are seen as error interrupts.
The initial value of thermal LVT entries on all APs always reads
0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
sequence to them and LVT registers are reset to 0s except for
the mask bits which are set to 1s when APs receive INIT IPI.
When the BIOS takes over the thermal throttling interrupt,
the LVT thermal deliver mode should be SMI and it is required
from the kernel to keep AP's LVT thermal monitoring register
programmed as such as well.
This issue happens when BIOS does not take over thermal throttling
interrupt, AP's LVT thermal monitor register will be restored to
0x10000 which means vector 0 and fixed deliver mode, so all APs will
signal illegal vector error interrupts.
This patch check if interrupt delivery mode is not fixed mode before
restoring AP's LVT thermal monitor register.
Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
Cc: hpa@linux.intel.com
Cc: joe@perches.com
Cc: jbaron@redhat.com
Cc: trenn@suse.de
Cc: kent.liu@intel.com
Cc: chaohong.guo@intel.com
Cc: <stable@kernel.org> # As far back as possible
Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b may be added to a list, but is not removed before being freed
in the case of an error. This is done in the corresponding
deallocation function, so the code here has been changed to
follow that.
The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression E,E1,E2;
identifier l;
@@
*list_add(&E->l,E1);
... when != E1
when != list_del(&E->l)
when != list_del_init(&E->l)
when != E = E2
*kfree(E);// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The default notifier doesn't make a lot of sense to call in the
correctable errors case. Drop it and emit the mcelog decoding
hint only in the uncorrectable errors case and when no notifier
is registered. Also, limit issuing the "mcelog --ascii" message
in the rare case when we dump unreported CEs before panicking.
While at it, remove unused old x86_mce_decode_callback from the
header.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Nagananda Chumbalkar <Nagananda.Chumbalkar@hp.com>
Cc: Russ Anderson <rja@sgi.com>
Link: http://lkml.kernel.org/r/20110420102349.GB1361@aftab
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Correctable errors are considered something rather normal on
modern hardware these days. Even more importantly, correctable
errors mean exactly that - they've been corrected by the
hardware - and there's no need to taint the kernel since
execution hasn't been compromised so far.
Also, drop tainting in the thermal throttling code for a similar
reason: crossing a thermal threshold does not mean corruption.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Nagananda Chumbalkar <Nagananda.Chumbalkar@hp.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1303135222-17118-1-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The MCE subsystem needs to sample an RCU-protected index outside of
any protection for that index. If this was a pointer, we would use
rcu_access_pointer(), but there is no corresponding rcu_access_index().
This commit therefore creates an rcu_access_index() and applies it
to MCE.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Zdenek Kabelac <zkabelac@redhat.com>
It is more effective to use a segment prefix instead of calculating the
address of the current cpu area amd then testing flags.
Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
* 'syscore' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
Introduce ARCH_NO_SYSDEV_OPS config option (v2)
cpufreq: Use syscore_ops for boot CPU suspend/resume (v2)
KVM: Use syscore_ops instead of sysdev class and sysdev
PCI / Intel IOMMU: Use syscore_ops instead of sysdev class and sysdev
timekeeping: Use syscore_ops instead of sysdev class and sysdev
x86: Use syscore_ops instead of sysdev classes and sysdevs
Some subsystems in the x86 tree need to carry out suspend/resume and
shutdown operations with one CPU on-line and interrupts disabled and
they define sysdev classes and sysdevs or sysdev drivers for this
purpose. This leads to unnecessarily complicated code and excessive
memory usage, so switch them to using struct syscore_ops objects for
this purpose instead.
Generally, there are three categories of subsystems that use
sysdevs for implementing PM operations: (1) subsystems whose
suspend/resume callbacks ignore their arguments entirely (the
majority), (2) subsystems whose suspend/resume callbacks use their
struct sys_device argument, but don't really need to do that,
because they can be implemented differently in an arguably simpler
way (io_apic.c), and (3) subsystems whose suspend/resume callbacks
use their struct sys_device argument, but the value of that argument
is always the same and could be ignored (microcode_core.c). In all
of these cases the subsystems in question may be readily converted to
using struct syscore_ops objects for power management and shutdown.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
APEI ERST firmware interface and implementation has no multiple users
in mind. For example, if there is four records in storage with ID: 1,
2, 3 and 4, if two ERST readers enumerate the records via
GET_NEXT_RECORD_ID as follow,
reader 1 reader 2
1
2
3
4
-1
-1
where -1 signals there is no more record ID.
Reader 1 has no chance to check record 2 and 4, while reader 2 has no
chance to check record 1 and 3. And any other GET_NEXT_RECORD_ID will
return -1, that is, other readers will has no chance to check any
record even they are not cleared by anyone.
This makes raw GET_NEXT_RECORD_ID not suitable for used by multiple
users.
To solve the issue, an in-memory ERST record ID cache is designed and
implemented. When enumerating record ID, the ID returned by
GET_NEXT_RECORD_ID is added into cache in addition to be returned to
caller. So other readers can check the cache to get all record ID
available.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
They were generated by 'codespell' and then manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: trivial@kernel.org
LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cpu_info is already with per_cpu, We can take llc_shared_map out
of cpu_info, and declare it as per_cpu variable directly.
So later referencing could be simple and directly instead of
diving to find cpu_info at first.
Also could make smp_store_cpu_info() much simple to avoid to do
save and restore trick.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Alok N Kataria <akataria@vmware.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Hans J. Koch <hjk@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <4D3A16E8.5020608@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In therm_throt.c, commit
9e76a97efd patch doesn't export
the symbol platform_thermal_notify.
Other drivers (e.g. drivers/hwmon/coretemp.c) can not find the
symbol platform_thermal_notify when defining threshould
interrupt handler.
Please apply this patch to allow threshold interrupt handler in
coretemp.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Cc: R Durgadoss <durgadoss.r@intel.com>
Cc: khali@linux-fr.org <khali@linux-fr.org>
Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
LKML-Reference: <20110121041239.GB26954@linux-os.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
gameport: use this_cpu_read instead of lookup
x86: udelay: Use this_cpu_read to avoid address calculation
x86: Use this_cpu_inc_return for nmi counter
x86: Replace uses of current_cpu_data with this_cpu ops
x86: Use this_cpu_ops to optimize code
vmstat: User per cpu atomics to avoid interrupt disable / enable
irq_work: Use per cpu atomics instead of regular atomics
cpuops: Use cmpxchg for xchg to avoid lock semantics
x86: this_cpu_cmpxchg and this_cpu_xchg operations
percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
percpu,x86: relocate this_cpu_add_return() and friends
connector: Use this_cpu operations
xen: Use this_cpu_inc_return
taskstats: Use this_cpu_ops
random: Use this_cpu_inc_return
fs: Use this_cpu_inc_return in buffer.c
highmem: Use this_cpu_xx_return() operations
vmstat: Use this_cpu_inc_return for vm statistics
x86: Support for this_cpu_add, sub, dec, inc_return
percpu: Generic support for this_cpu_add, sub, dec, inc_return
...
Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
as per Tejun.
With priorities in place and no one really understanding the difference between
DIE_NMI and DIE_NMI_IPI, just remove DIE_NMI_IPI and convert everyone to DIE_NMI.
This also simplifies default_do_nmi() a little bit. Instead of calling the
die_notifier in both the if and else part, just pull it out and call it before
the if-statement. This has the side benefit of avoiding a call to the ioport
to see if there is an external NMI sitting around until after the (more frequent)
internal NMIs are dealt with.
Patch-Inspired-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-5-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In order to consolidate the NMI die_chain events, we need to setup the priorities
for the die notifiers.
I started by defining a bunch of common priorities that can be used by the
notifier blocks. Then I modified the notifier blocks to use the newly created
priorities.
Now that the priorities are straightened out, it should be easier to remove the
event DIE_NMI_IPI.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1294348732-15030-4-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-64, asm: Use fxsaveq/fxrestorq in more places
* 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, hwmon: Add core threshold notification to therm_throt.c
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, paravirt: Use native_halt on a halt, not native_safe_halt
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
locking, lockdep: Convert sprintf_symbol to %pS
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
irq: Better struct irqaction layout
This patch adds code to therm_throt.c to notify core thermal threshold
events. These thresholds are supported by the IA32_THERM_INTERRUPT register.
The status/log for the same is monitored using the IA32_THERM_STATUS register.
The necessary #defines are in msr-index.h. A call back is added to mce.h, to
further notify the thermal stack, about the threshold events.
Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
LKML-Reference: <D6D887BA8C9DFF48B5233887EF04654105C1251710@bgsmsx502.gar.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Replace all uses of current_cpu_data with this_cpu operations on the
per cpu structure cpu_info. The scala accesses are replaced with the
matching this_cpu ops which results in smaller and more efficient
code.
In the long run, it might be a good idea to remove cpu_data() macro
too and use per_cpu macro directly.
tj: updated description
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Go through x86 code and replace __get_cpu_var and get_cpu_var
instances that refer to a scalar and are not used for address
determinations.
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This patch reworks and cleans up mce_amd_feature_init() by
introducing helper functions to setup and check the LVT offset.
It also fixes line endings in pr_err() calls.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1288015419-29543-4-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Shorten this variables to make later changes more readable.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1288015419-29543-3-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds a helper function for the initial setup of an
mce threshold block. The LVT offset is passed as argument. Also
making variable threshold_defaults local as it is only used in
function mce_amd_feature_init(). Function
threshold_restart_bank() is extended to setup the LVT offset,
the change is backward compatible. Thus, now there is only a
single wrmsrl() to setup the block.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1288015419-29543-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
vfs: make no_llseek the default
vfs: don't use BKL in default_llseek
llseek: automatically add .llseek fop
libfs: use generic_file_llseek for simple_attr
mac80211: disallow seeks in minstrel debug code
lirc: make chardev nonseekable
viotape: use noop_llseek
raw: use explicit llseek file operations
ibmasmfs: use generic_file_llseek
spufs: use llseek in all file operations
arm/omap: use generic_file_llseek in iommu_debug
lkdtm: use generic_file_llseek in debugfs
net/wireless: use generic_file_llseek in debugfs
drm: use noop_llseek
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (96 commits)
apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets
apic, x86: Check if EILVT APIC registers are available (AMD only)
x86: ioapic: Call free_irte only if interrupt remapping enabled
arm: Use ARCH_IRQ_INIT_FLAGS
genirq, ARM: Fix boot on ARM platforms
genirq: Fix CONFIG_GENIRQ_NO_DEPRECATED=y build
x86: Switch sparse_irq allocations to GFP_KERNEL
genirq: Switch sparse_irq allocator to GFP_KERNEL
genirq: Make sparse_lock a mutex
x86: lguest: Use new irq allocator
genirq: Remove the now unused sparse irq leftovers
genirq: Sanitize dynamic irq handling
genirq: Remove arch_init_chip_data()
x86: xen: Sanitise sparse_irq handling
x86: Use sane enumeration
x86: uv: Clean up the direct access to irq_desc
x86: Make io_apic.c local functions static
genirq: Remove irq_2_iommu
x86: Speed up the irq_remapped check in hot pathes
intr_remap: Simplify the code further
...
Fix up trivial conflicts in arch/x86/Kconfig
* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Remove pr_<level> uses of KERN_<level>
therm_throt.c: Trivial printk message fix for a unsuitable abbreviation of 'thermal'
x86: Use {push,pop}{l,q}_cfi in more places
i386: Add unwind directives to syscall ptregs stubs
x86-64: Use symbolics instead of raw numbers in entry_64.S
x86-64: Adjust frame type at paranoid_exit:
x86-64: Fix unwind annotations in syscall stubs
We want the BIOS to setup the EILVT APIC registers. The offsets
were hardcoded and BIOS settings were overwritten by the OS.
Now, the subsystems for MCE threshold and IBS determine the LVT
offset from the registers the BIOS has setup. If the BIOS setup
is buggy on a family 10h system, a workaround enables IBS. If
the OS determines an invalid register setup, a "[Firmware Bug]:
" error message is reported.
We need this change also for upcomming cpu families.
Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1286360874-1471-3-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.
The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.
New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.
The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.
Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.
Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.
===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}
@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}
@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}
@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}
@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}
@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};
@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};
@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};
@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};
@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};
// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};
@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};
// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};
// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};
// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};
@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};
// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////
@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};
@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};
@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};
@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
This fixes possible cases of not collecting valid error info in
the MCE error thresholding groups on F10h hardware.
The current code contains a subtle problem of checking only the
Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM
thresholding group) in its first iteration and breaking out if
the bit is cleared.
But (!), this MSR contains an offset value, BlkPtr[31:24], which
points to the remaining MSRs in this thresholding group which
might contain valid information too. But if we bail out only
after we checked the valid bit in the first MSR and not the
block pointer too, we miss that other information.
The thing is, MC4_MISC0[BlkPtr] is not predicated on
MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked
prior to iterating over the MCI_MISCj thresholding group,
irrespective of the MC4_MISC0[Valid] setting.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When the feature PTS is not supported by CPU, the sysfile
package_power_limit_count for package should not be
generated.
This patch is used for fixing missing { and }.
The patch is not complete as there are other error handling
problems in this function - but that can wait until the
merge window.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Fenghua Yu <fenghua.yu@initel.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: Brown Len <len.brown@intel.com>
Cc: Guenter Roeck <guenter.roeck@ericsson.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org>
LKML-Reference: <4C7625D1.4060201@np.css.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
In unexpected_thermal_interrupt(), "LVT TMR interrupt" is used
in error message.
I don't think TMR is a suitable abbreviation for thermal.
1.TMR has been used in IA32 Architectures Software Developer's
Manual, and is the abbreviation for Trigger Mode Register.
2.There is not an standard abbreviation "TMR" defined for thermal
in IA32 Architectures Software Developer's Manual.
3.Though we could understand it as Thermal Monitor Register, it is
easy to be misunderstood as a *TIMER* interrupt also.
I think this patch will fix it.
Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Jean Delvare <khali@linux-fr.org>
Cc: Brown Len <len.brown@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <4C7C492D.5020704@np.css.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix BUG: using smp_processor_id() in preemptible thermal_throttle_add_dev.
We know the cpu number when calling thermal_throttle_add_dev, so we can
remove smp_processor_id call in thermal_throttle_add_dev by supplying
the cpu number as argument.
This should resolve kernel bugzilla 16615/16629.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
LKML-Reference: <20100820073634.GB5209@swordfish.minsk.epam.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The abbreviation of severity should be SEV instead of SER, so the CPER
severity constants are renamed accordingly. GHES severity constants
are renamed in the same way too.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
Revert "net: Make accesses to ->br_port safe for sparse RCU"
mce: convert to rcu_dereference_index_check()
net: Make accesses to ->br_port safe for sparse RCU
vfs: add fs.h to define struct file
lockdep: Add an in_workqueue_context() lockdep-based test function
rcu: add __rcu API for later sparse checking
rcu: add an rcu_dereference_index_check()
tree/tiny rcu: Add debug RCU head objects
mm: remove all rcu head initializations
fs: remove all rcu head initializations, except on_stack initializations
powerpc: remove all rcu head initializations
Power limit notification feature is published in Intel 64 and IA-32
Architectures SDMV Vol 3A 14.5.6 Power Limit Notification.
It is implemented first on Intel Sandy Bridge platform.
The patch handles notification interrupt. Interrupt handler dumps power limit
information in log_buf, logs the event in mce log, and increases the event
counters (core_power_limit and package_power_limit). Upper level applications
could use the data to detect system health or diagnose functionality/performance
issues.
In the future, the event could be handled in a more fancy way.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-5-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Add package level thermal throttle interrupt support. The interrupt handler
increases package level thermal throttle count. It also logs the event in MCE
log.
The package level thermal throttle interrupt happens across threads in a
package. Each thread handles the interrupt individually. User level application
is supposed to retrieve correct event count and log based on package/thread
topology. This is the same situation for core level interrupt handler. In the
future, interrupt may be reported only per package or per core.
core_throttle_count and package_throttle_count are used for user interface.
Previously only throttle_count is used for core throttle count. If you think
new core_throttle_count name breaks user interface, I can change this part.
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-4-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The mce processing applies rcu_dereference_check() to integers used as
array indices. This patch therefore moves mce to the new RCU API
rcu_dereference_index_check() that avoids the sparse processing that
would otherwise result in compiler errors.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Use HW_ERR printk prefix in MCE handler. To make it more explicit that
this is hardware error instead of software error.
Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1275978939.3444.668.camel@yhuang-dev.sh.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
It is reported that CMCI is not raised when number of corrected error
reaches preset threshold. After inspection, it is found that
MSR_IA32_MCI_CTL2 threshold field is not setup properly. This patch
fixed it.
Value of MCI_CTL2_CMCI_THRESHOLD_MASK is fixed according to x86_64
Software Developer's Manual too.
Reported-by: Shaohui Zheng <shaohui.zheng@intel.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1275977350.3444.660.camel@yhuang-dev.sh.intel.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Rename CMCI_EN to MCI_CTL2_CMCI_EN and CMCI_THRESHOLD_MASK to
MCI_CTL2_CMCI_THRESHOLD_MASK to make naming consistent.
Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1275977348.3444.659.camel@yhuang-dev.sh.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (83 commits)
i7core_edac: Better describe the supported devices
Add support for Westmere to i7core_edac driver
i7core_edac: don't free on success
i7core_edac: Add support for X5670
Always call i7core_[ur]dimm_check_mc_ecc_err
i7core_edac: fix memory leak of i7core_dev
EDAC: add __init to i7core_xeon_pci_fixup
i7core_edac: Fix wrong device id for channel 1 devices
i7core: add support for Lynnfield alternate address
i7core_edac: Add initial support for Lynnfield
i7core_edac: do not export static functions
edac: fix i7core build
edac: i7core_edac produces undefined behaviour on 32bit
i7core_edac: Use a more generic approach for probing PCI devices
i7core_edac: PCI device is called NONCORE, instead of NOCORE
i7core_edac: Fix ringbuffer maxsize
i7core_edac: First store, then increment
i7core_edac: Better parse "any" addrmask
i7core_edac: Use a lockless ringbuffer
edac: Create an unique instance for each kobj
...
By the previous modification, the cpu notifier can return encapsulate
errno value. This converts the cpu notifiers for msr, cpuid, and
therm_throt.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Traditionally, fatal MCE will cause Linux print error log to console
then reboot. Because MCE registers will preserve their content after
warm reboot, the hardware error can be logged to disk or network after
reboot. But system may fail to warm reboot, then you may lose the
hardware error log. ERST can help here. Through saving the hardware
error log into flash via ERST before go panic, the hardware error log
can be gotten from the flash after system boot successful again.
The fatal MCE processing procedure with ERST involved is as follow:
- Hardware detect error, MCE raised
- MCE read MCE registers, check error severity (fatal), prepare error record
- Write MCE error record into flash via ERST
- Go panic, then trigger system reboot
- System reboot, /sbin/mcelog run, it reads /dev/mcelog to check flash
for error record of previous boot via ERST, and output and clear
them if available
- /sbin/mcelog logs error records into disk or network
ERST only accepts CPER record format, but there is no pre-defined CPER
section can accommodate all information in struct mce, so a customized
section type is defined to hold struct mce inside a CPER record as an
error section.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Generic Hardware Error Source provides a way to report platform
hardware errors (such as that from chipset). It works in so called
"Firmware First" mode, that is, hardware errors are reported to
firmware firstly, then reported to Linux by firmware. This way, some
non-standard hardware error registers or non-standard hardware link
can be checked by firmware to produce more valuable hardware error
information for Linux.
Now, only SCI notification type and memory errors are supported. More
notification type and hardware error type will be added later. These
memory errors are reported to user space through /dev/mcelog via
faking a corrected Machine Check, so that the error memory page can be
offlined by /sbin/mcelog if the error count for one page is beyond the
threshold.
On some machines, Machine Check can not report physical address for
some corrected memory errors, but GHES can do that. So this simplified
GHES is implemented firstly.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
edac_mce module is an interface module that gets mcelog data and
forwards to any registered edac module that expects to receive data via
mce.
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Commit f56e8a076 "x86/mce: Fix RCU lockdep splats" introduced the
following build bug:
arch/x86/kernel/cpu/mcheck/mce.c: In function 'mce_log':
arch/x86/kernel/cpu/mcheck/mce.c:166: error: 'mce_read_mutex' undeclared (first use in this function)
arch/x86/kernel/cpu/mcheck/mce.c:166: error: (Each undeclared identifier is reported only once
arch/x86/kernel/cpu/mcheck/mce.c:166: error: for each function it appears in.)
Move the in-the-middle-of-file lock variable up to the variable
definition section, the top of the .c file.
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
LKML-Reference: <1267830207-9474-3-git-send-email-paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, k8 nb: Fix boot crash: enable k8_northbridges unconditionally on AMD systems
x86, UV: Fix target_cpus() in x2apic_uv_x.c
x86: Reduce per cpu warning boot up messages
x86: Reduce per cpu MCA boot up messages
x86_64, cpa: Don't work hard in preserving kernel 2M mappings when using 4K already
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
locking: Make sparse work with inline spinlocks and rwlocks
x86/mce: Fix RCU lockdep splats
rcu: Increase RCU CPU stall timeouts if PROVE_RCU
ftrace: Replace read_barrier_depends() with rcu_dereference_raw()
rcu: Suppress RCU lockdep warnings during early boot
rcu, ftrace: Fix RCU lockdep splat in ftrace_perf_buf_prepare()
rcu: Suppress __mpol_dup() false positive from RCU lockdep
rcu: Make rcu_read_lock_sched_held() handle !PREEMPT
rcu: Add control variables to lockdep_rcu_dereference() diagnostics
rcu, cgroup: Relax the check in task_subsys_state() as early boot is now handled by lockdep-RCU
rcu: Use wrapper function instead of exporting tasklist_lock
sched, rcu: Fix rcu_dereference() for RCU-lockdep
rcu: Make task_subsys_state() RCU-lockdep checks handle boot-time use
rcu: Fix holdoff for accelerated GPs for last non-dynticked CPU
x86/gart: Unexport gart_iommu_aperture
Fix trivial conflicts in kernel/trace/ftrace.c
These are the non-static sysfs attributes that exist on
my test machine. Fix them to use sysfs_attr_init or
sysfs_bin_attr_init as appropriate. It simply requires
making a sysfs attribute present to see this. So this
is a little bit tedious but otherwise not too bad.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It looks better to have a common function. No change in functionality.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <4B25FDDC.407@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Add check if APIC is not disabled since thermal
monitoring depends on it. As only apic gets disabled
we should not try to install "thermal monitor" vector,
print out that thermal monitoring is enabled and etc...
Note that "Intel Correct Machine Check Interrupts" already
has such a check.
Also I decided to not add cpu_has_apic check into
mcheck_intel_therm_init since even if it'll call apic_read on
disabled apic -- it's safe here and allow us to save a few code
bytes.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: <4B25FDC2.3020401@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When there are a large number of processors in a system, there
is an excessive amount of messages sent to the system console.
It's estimated that with 4096 processors in a system, and the
console baudrate set to 56K, the startup messages will take
about 84 minutes to clear the serial port.
This set of patches limits the number of repetitious messages
which contain no additional information. Much of this information
is obtainable from the /proc and /sysfs. Some of the messages
are also sent to the kernel log buffer as KERN_DEBUG messages so
dmesg can be used to examine more closely any details specific to
a problem.
The new cpu bootup sequence for system_state == SYSTEM_BOOTING:
Booting Node 0, Processors #1#2#3#4#5#6#7 Ok.
Booting Node 1, Processors #8#9#10#11#12#13#14#15 Ok.
...
Booting Node 3, Processors #56#57#58#59#60#61#62#63 Ok.
Brought up 64 CPUs
After the system is running, a single line boot message is displayed
when CPU's are hotplugged on:
Booting Node %d Processor %d APIC 0x%x
Status of the following lines:
CPU: Physical Processor ID: printed once (for boot cpu)
CPU: Processor Core ID: printed once (for boot cpu)
CPU: Hyper-Threading is disabled printed once (for boot cpu)
CPU: Thermal monitoring enabled printed once (for boot cpu)
CPU %d/0x%x -> Node %d: removed
CPU %d is now offline: only if system_state == RUNNING
Initializing CPU#%d: KERN_DEBUG
Signed-off-by: Mike Travis <travis@sgi.com>
LKML-Reference: <4B219E28.8080601@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Commit cebe182033 had an unnecessary,
wrong change: &mce_banks[i].attr is equivalent to the former
bank_attrs[i], not to mce_attrs[i].
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <4B1E05CC.4040703@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
mce_timer must be passed to setup_timer() in all cases, no
matter whether it is going to be actually used. Otherwise, when
the CPU gets brought down, its call to del_timer_sync() will
never return, as the timer won't have a base associated, and
hence lock_timer_base() will loop infinitely.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: <stable@kernel.org>
LKML-Reference: <4B1DB831.2030801@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Even it is in error path unlikely taken, add_timer_on() at
CPU_DOWN_FAILED* needs to be skipped if mce_timer is disabled.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: <stable@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The mce_disable_cpu() and mce_reenable_cpu() are called only
from mce_cpu_callback() which is marked as __cpuinit.
So these functions can be __cpuinit too.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <4B0E3C4E.4090809@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The intel_init_thermal() is called from resume path, so it
cannot be marked as __init.
OTOH mce_banks_init() is only called from
__mcheck_cpu_cap_init() which is marked as __cpuinit, so it can
be also marked as __cpuinit.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Yong Wang <yong.y.wang@linux.intel.com>
LKML-Reference: <4AFBB0B8.2070501@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mark the thermal init functions __init so that the init memory
can be freed.
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
LKML-Reference: <20091111075125.GA17900@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On platforms where the BIOS handles the thermal monitor interrupt,
APIC_LVTTHMR on each logical CPU is programmed to generate a SMI
and OS must not touch it.
Unfortunately AP bringup sequence using INIT-SIPI-SIPI clears all
the LVT entries except the mask bit. Essentially this results in
all LVT entries including the thermal monitoring interrupt set
to masked (clearing the bios programmed value for APIC_LVTTHMR).
And this leads to kernel take over the thermal monitoring
interrupt on AP's but not on BSP (leaving the bios programmed
value only on BSP).
As a result of this, we have seen system hangs when the thermal
monitoring interrupt is generated.
Fix this by reading the initial value of thermal LVT entry on
BSP and if bios has taken over the control, then program the
same value on all AP's and leave the thermal monitoring
interrupt control on all the logical cpu's to the bios.
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <20091110013824.GA24940@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
Note that there's no freeing the cpu var, since this module has
no unload function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
LKML-Reference: <200911031458.30987.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Prefix global/setup routines with "mcheck_" thus differentiating
from the internal facilities prefixed with "mce_". Also, prefix
the per cpu calls with mcheck_cpu and rename them to reflect the
MCE setup hierarchy of calls better.
There should be no functionality change resulting from this
patch.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <1255689093-26921-1-git-send-email-borislav.petkov@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The MCE initialization code explicitly says it doesn't handle
asymmetric configurations where different CPUs support different
numbers of MCE banks, and it prints a big warning in that case.
Therefore, printing the "mce: CPU supports <x> MCE banks"
message into the kernel log for every CPU is pure redundancy
that clutters the log significantly for systems with lots of
CPUs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
LKML-Reference: <adaeip473qt.fsf@cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This approach is the first baby step towards solving many of the
structural problems the x86 MCE logging code is having today:
- It has a private ring-buffer implementation that has a number
of limitations and has been historically fragile and buggy.
- It is using a quirky /dev/mcelog ioctl driven ABI that is MCE
specific. /dev/mcelog is not part of any larger logging
framework and hence has remained on the fringes for many years.
- The MCE logging code is still very unclean partly due to its ABI
limitations. Fields are being reused for multiple purposes, and
the whole message structure is limited and x86 specific to begin
with.
All in one, the x86 tree would like to move away from this private
implementation of an event logging facility to a broader framework.
By using perf events we gain the following advantages:
- Multiple user-space agents can access MCE events. We can have an
mcelog daemon running but also a system-wide tracer capturing
important events in flight-recorder mode.
- Sampling support: the kernel and the user-space call-chain of MCE
events can be stored and analyzed as well. This way actual patterns
of bad behavior can be matched to precisely what kind of activity
happened in the kernel (and/or in the app) around that moment in
time.
- Coupling with other hardware and software events: the PMU can track a
number of other anomalies - monitoring software might chose to
monitor those plus the MCE events as well - in one coherent stream of
events.
- Discovery of MCE sources - tracepoints are enumerated and tools can
act upon the existence (or non-existence) of various channels of MCE
information.
- Filtering support: we just subscribe to and act upon the events we
are interested in. Then even on a per event source basis there's
in-kernel filter expressions available that can restrict the amount
of data that hits the event channel.
- Arbitrary deep per cpu buffering of events - we can buffer 32
entries or we can buffer as much as we want, as long as we have
the RAM.
- An NMI-safe ring-buffer implementation - mappable to user-space.
- Built-in support for timestamping of events, PID markers, CPU
markers, etc.
- A rich ABI accessible over system call interface. Per cpu, per task
and per workload monitoring of MCE events can be done this way. The
ABI itself has a nice, meaningful structure.
- Extensible ABI: new fields can be added without breaking tooling.
New tracepoints can be added as the hardware side evolves. There's
various parsers that can be used.
- Lots of scheduling/buffering/batching modes of operandi for MCE
events. poll() support. mmap() support. read() support. You name it.
- Rich tooling support: even without any MCE specific extensions added
the 'perf' tool today offers various views of MCE data: perf report,
perf stat, perf trace can all be used to view logged MCE events and
perhaps correlate them to certain user-space usage patterns. But it
can be used directly as well, for user-space agents and policy action
in mcelog, etc.
With this we hope to achieve significant code cleanup and feature
improvements in the MCE code, and we hope to be able to drop the
/dev/mcelog facility in the end.
This patch is just a plain dumb dump of mce_log() records to
the tracepoints / perf events framework - a first proof of
concept step.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <4AD42A0D.7050104@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add an atomic notifier which ensures proper locking when conveying
MCE info to EDAC for decoding. The actual notifier call overrides a
default, negative priority notifier.
Note: make sure we register the default decoder only once since
mcheck_init() runs on each CPU.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091003065752.GA8935@liondog.tnic>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Make decoding of MCEs happen only on AMD hardware by registering a
non-default callback only on CPU families which support it.
While looking at the interaction of decode_mce() with the other MCE
code i also noticed a few other things and made the following
cleanups/fixes:
- Fixed the mce_decode() weak alias - a weak alias is really not
good here, it should be a proper callback. A weak alias will be
overriden if a piece of code is built into the kernel - not
good, obviously.
- The patch initializes the callback on AMD family 10h and 11h.
- Added the more correct fallback printk of:
No support for human readable MCE decoding on this CPU type.
Transcribe the message and run it through 'mcelog --ascii' to decode.
On CPUs that dont have a decoder.
- Made the surrounding code more readable.
Note that the callback allows us to have a default fallback -
without having to check the CPU versions during the printout
itself. When an EDAC module registers itself, it can install the
decode-print function.
(there's no unregister needed as this is core code.)
version -v2 by Borislav Petkov:
- add K8 to the set of supported CPUs
- always build in edac_mce_amd since we use an early_initcall now
- fix checkpatch warnings
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <20091001141432.GA11410@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 22223c9b41, as
requested by Andi Kleen:
"Obviously kernels compiled with AMD support can still run on non AMD
systems, so messages like this can never be removed at compile time."
Requsted-by: Andi Kleen <andi@firstfloor.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use rdmsrl_safe() when accessing MCE registers. While in
theory we always 'know' which ones are safe to access from
the capability bits, there's a lot of hardware variations
and reality might differ from theory, as it did in this case:
http://bugzilla.kernel.org/show_bug.cgi?id=14204
[ 0.010016] mce: CPU supports 5 MCE banks
[ 0.011029] general protection fault: 0000 [#1]
[ 0.011998] last sysfs file:
[ 0.011998] Modules linked in:
[ 0.011998]
[ 0.011998] Pid: 0, comm: swapper Not tainted (2.6.31_router #1) HP Vectra
[ 0.011998] EIP: 0060:[<c100d9b9>] EFLAGS: 00010246 CPU: 0
[ 0.011998] EIP is at mce_rdmsrl+0x19/0x60
[ 0.011998] EAX: 00000000 EBX: 00000001 ECX: 00000407 EDX: 08000000
[ 0.011998] ESI: 00000000 EDI: 8c000000 EBP: 00000405 ESP: c17d5eac
So WARN_ONCE() instead of crashing the box.
( also fix a number of stylistic inconsistencies in the code. )
Note, we might still crash in wrmsrl() if we get that far, but
we shouldnt if the registers are truly inaccessible.
Reported-by: GNUtoo <GNUtoo@no-log.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <bug-14204-5438@http.bugzilla.kernel.org/>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Current raise_local() uses a struct mce that comes from mce_write()
as a parameter instead of the real inject-msg, so when we set
mce.finished = 0 to clear injected MCE, the real inject stays
valid.
This will cause the remaining inject-msg affect the next injection,
which is not desired.
To fix this, real inject-msg is used in raise_local instead of the
one on the stack.
This patch is based on the diagnosis and the fixes by Dean Nelson.
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <1253601357.15717.757.camel@yhuang-dev.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If a system switches back and forth between hot and cold mode,
the MCE code will print a stream of critical kernel messages.
Extend the throttling code to properly notice this, by
only printing the first hot + cold transition and omitting
the rest up to CHECK_INTERVAL (5 minutes).
This way we'll only get a single incident of:
[ 102.356584] CPU0: Temperature above threshold, cpu clock throttled (total events = 1)
[ 102.357000] Disabling lock debugging due to kernel taint
[ 102.369223] CPU0: Temperature/speed normal
Every 5 minutes. The 'total events' count tells the number of cold/hot
transitions detected, should overheating occur after 5 minutes again:
[ 402.357580] CPU0: Temperature above threshold, cpu clock throttled (total events = 24891)
[ 402.358001] CPU0: Temperature/speed normal
[ 450.704142] Machine check events logged
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of a mess of three separate percpu variables, consolidate
the state into a single structure.
Also clean up therm_throt_process(), use cleaner and more
understandable variable names and a clearer logic.
This, without changing the logic, makes the code more
streamlined, more readable and smaller as well:
text data bss dec hex filename
1487 169 4 1660 67c therm_throt.o.before
1432 176 4 1612 64c therm_throt.o.after
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix following compile warning:
arch/x86/kernel/cpu/mcheck/mce_amd.c: In function 'threshold_create_bank':
arch/x86/kernel/cpu/mcheck/mce_amd.c:492: warning: unused variable 'c'
which shows up when kernel is compiled with CONFIG_SMP=n.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20090915151727.GB21670@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
x86, mce: Fix compilation with !CONFIG_DEBUG_FS in mce-severity.c
x86, mce: CE in last bank prevents panic by unknown MCE
x86, mce: Fake panic support for MCE testing
x86, mce: Move debugfs mce dir creating to mce.c
x86, mce: Support specifying raise mode for software MCE injection
x86, mce: Support specifying context for software mce injection
x86, mce: fix reporting of Thermal Monitoring mechanism enabled
x86, mce: remove never executed code
x86, mce: add missing __cpuinit tags
x86, mce: fix "mce" boot option handling for CONFIG_X86_NEW_MCE
x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs
x86: mce: Lower maximum number of banks to architecture limit
x86: mce: macros to compute banks MSRs
x86: mce: Move per bank data in a single datastructure
x86: mce: Move code in mce.c
x86: mce: Rename CONFIG_X86_NEW_MCE to CONFIG_X86_MCE
x86: mce: Remove old i386 machine check code
x86: mce: Update X86_MCE description in x86/Kconfig
x86: mce: Make CONFIG_X86_ANCIENT_MCE dependent on CONFIG_X86_MCE
x86, mce: use atomic_inc_return() instead of add by 1
...
Manually fixed up trivial conflicts:
Documentation/feature-removal-schedule.txt
arch/x86/kernel/cpu/mcheck/mce.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
powerpc64: convert to dynamic percpu allocator
sparc64: use embedding percpu first chunk allocator
percpu: kill lpage first chunk allocator
x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
percpu: update embedding first chunk allocator to handle sparse units
percpu: use group information to allocate vmap areas sparsely
vmalloc: implement pcpu_get_vm_areas()
vmalloc: separate out insert_vmalloc_vm()
percpu: add chunk->base_addr
percpu: add pcpu_unit_offsets[]
percpu: introduce pcpu_alloc_info and pcpu_group_info
percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
percpu: add @align to pcpu_fc_alloc_fn_t
percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
percpu: drop @static_size from first chunk allocators
percpu: generalize first chunk allocator selection
percpu: build first chunk allocators selectively
percpu: rename 4k first chunk allocator to page
percpu: improve boot messages
percpu: fix pcpu_reclaim() locking
...
Fix trivial conflict as by Tejun Heo in kernel/sched.c
Fix compilation error in arch/x86/kernel/cpu/mcheck/mce-severity.c
when CONFIG_DEBUG_FS is disabled, introduced in commit
5be9ed251f.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Move NB decoder along with required defines to EDAC MCE core. Add
registration routines for further decoding of the MCE info in the AMD64
EDAC module.
CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
This fixes threshold_bank4 support on multi-node processors.
The correct mask to use is llc_shared_map, representing an internal
node on Magny-Cours.
We need to create 2 sets of symlinks for sibling shared banks -- one
set for each internal node, symlinks of each set should target the
first core on same internal node.
Currently only one set is created where all symlinks are targeting
the first core of the entire socket.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
If MCE handler is called but none of mces_seen have machine
check event which might signal the MCE (i.e. event higher than
MCE_KEEP_SEVERITY), panic with "Machine check from unknown
source" will be taken since the MCE is assumed to be signaled
from external agent or so.
Usually mces_seen never point MCE_KEEP_SEVERITY event such as
CE. But it can happen because initial value of mces_seen is
accidentally modified by mce_no_way_out() - in case if
mce_no_way_out() run through all banks and the last bank has
the CE, mces_seen points the CE and the "panic by unknown" will
not be taken.
This patch fixes this undesired behavior, and clarifies the logic.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
LKML-Reference: <4A94E244.3020301@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
An older test-box started hanging at the following point during
bootup:
[ 0.022996] Mount-cache hash table entries: 512
[ 0.024996] Initializing cgroup subsys debug
[ 0.025996] Initializing cgroup subsys cpuacct
[ 0.026995] Initializing cgroup subsys devices
[ 0.027995] Initializing cgroup subsys freezer
[ 0.028995] mce: CPU supports 5 MCE banks
I've bisected it down to commit 4efc0670 ("x86, mce: use 64bit
machine check code on 32bit"), which utilizes the MCE code on
32-bit systems too.
The problem is caused by this detail in my config:
# CONFIG_CPU_SUP_INTEL is not set
This disables the quirks in mce_cpu_quirks() but still enables
MCE support - which then hangs due to the missing quirk
workaround needed on this CPU:
if (c->x86 == 6 && c->x86_model < 0x1A && banks > 0)
mce_banks[0].init = 0;
The safe solution is to not initialize MCEs if we dont know on
what CPU we are running (or if that CPU's support code got
disabled in the config).
Also be a bit more defensive on 32-bit systems: dont do a
boot-time dump of pending MCEs not just on the specific system
that we found a problem with (Pentium-M), but earlier ones as
well.
Now this problem is probably not common and disabling CPU
support is rare - but still being more defensive in something
we turned on for a wide range of CPUs is prudent.
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: Message-ID: <4A88E3E4.40506@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On my legacy Pentium M laptop (Acer Extensa 2900) I get bogus MCE on a cold
boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog):
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 1 MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-1 UNKNOWN Error
STATUS f200000000000195 MCGSTATUS 0
[ The other STATUS values observed: f2000000000001b5 (... UNKNOWN error)
and f200000000000115 (... READ Error).
To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified
the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump
content of STATUS MSR before it is cleared during initialization. ]
Since the bogus MCE results in a kernel taint (which in turn disables
lockdep support) don't log boot MCEs on Pentium M (model == 13) CPUs
by default ("mce=bootlog" boot parameter can be be used to get the old
behavior).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/sparc/kernel/smp_64.c
arch/x86/kernel/cpu/perf_counter.c
arch/x86/kernel/setup_percpu.c
drivers/cpufreq/cpufreq_ondemand.c
mm/percpu.c
Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
My Latitude d630 seems to be handling thermal events in SMI by
lowering the max frequency of the CPU till it cools down but
still leaks the "everything is normal" events.
This spams the console and with high priority printks.
Adjust therm_throt driver to only print messages about the fact
that temperatire returned back to normal when leaving the
throttling state.
Also lower the severity of "back to normal" message from
KERN_CRIT to KERN_INFO.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Acked-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20090810051513.0558F526EC9@mailhub.coreip.homeip.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If "fake panic" mode is turned on, just log panic message instead of
go real panic. This is used for testing only, so that the test suite
can check for the correct panic message and do regression testing for
MCE would go panic.
This patch is based on x86-tip.git/mce.
ChangeLog:
v5:
- Rebased on x86-tip.git/mce
v4:
- Move config file from sysfs to debugfs
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Because more debugfs files under mce dir will be create in mce.c.
ChangeLog:
v5:
- Rebased on x86-tip.git/mce
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Raise mode include raising as exception or raising as poll, it is
specified via the mce.inject_flags field.
This can be used to specify raise mode of UCNA, which is UC error but
raised not as exception. And this can be used to test the filter code
of poll handler or exception handler too. For example, enforce a poll
raise mode for a fatal MCE.
ChangeLog:
v2:
- Re-base on latest x86-tip.git/mce3
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The cpu context is specified via the new mce.inject_flags fields.
This allows more realistic machine check testing in different
situations. "RANDOM" context is implemented via NMI broadcasting to
add randomization to testing.
AK: Fix NMI broadcasting check. Fix 32-bit building. Some race
fixes. Move to module. Various changes
ChangeLog:
v3:
- Re-based on latest x86-tip.git/mce4
- Fix 32-bit building
v2:
- Re-base on latest x86-tip.git/mce3
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Early Pentium M models use different method for enabling TM2
(per paragraph 13.5.2.3 of the "Intel 64 and IA-32 Architectures
Software Developer's Manual Volume 3A: System Programming Guide,
Part 1").
Tested on the affected Pentium M variant (model == 13).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
fseverities_coverage is never NULL in err_out code path.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
mce_cap_init() and mce_cpu_quirks() can be tagged with __cpuinit.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
"mce argument mce ignored. Please use /sys" message shouldn't
be printed when using "mce" boot option.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
On my legacy Pentium M laptop (Acer Extensa 2900) I get bogus MCE on a cold
boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog):
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 1 MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-1 UNKNOWN Error
STATUS f200000000000195 MCGSTATUS 0
[ The other STATUS values observed: f2000000000001b5 (... UNKNOWN error)
and f200000000000115 (... READ Error).
To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified
the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump
content of STATUS MSR before it is cleared during initialization. ]
Since the bogus MCE results in a kernel taint (which in turn disables
lockdep support) don't log boot MCEs on Pentium M (model == 13) CPUs
by default ("mce=bootlog" boot parameter can be be used to get the old
behavior).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Fix the condition checking the result of strchr() (which previously
could result in an oops), and make the function return the number of
bytes actively used.
[ Impact: fix oops ]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <4A5F04B7020000780000AB59@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Instead of open coded calculations for bank MSRs hide the indexing of higher
banks MCE register MSRs in new macros.
No semantic changes.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This addresses one of the leftover review comments.
Move the per bank data into a single structure. This avoids
several separate variables and also separate allocation of sysfs objects.
I didn't move the CMCI ownership information so far because
that would have needed some non trivial changes in the algorithms.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Now that the X86_OLD_MCE ifdefs are gone move some code that
used to be outside the big ifdef to a more natural place
near its user.
No code change.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Drop the CONFIG_X86_NEW_MCE symbol and change all
references to it to check for CONFIG_X86_MCE directly.
No code changes
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
As announced in feature-remove-schedule.txt remove CONFIG_X86_OLD_MCE
This patch only removes code.
The ancient machine check code for very old systems that are not supported
by CONFIG_X86_NEW_MCE is still kept.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Commit 5fd29d6ccb ("printk: clean up
handling of log-levels and newlines") changed printk semantics. printk
lines with multiple KERN_<level> prefixes are no longer emitted as
before the patch.
<level> is now included in the output on each additional use.
Remove all uses of multiple KERN_<level>s in formats.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes. As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.
Conflicts:
arch/alpha/include/asm/percpu.h
arch/mn10300/kernel/vmlinux.lds.S
include/linux/percpu-defs.h
If CONFIG_NO_HZ + CONFIG_SMP, timer added via add_timer() might
be migrated on other cpu. Use add_timer_on() instead.
Avoids the following failure:
Maciej Rutecki wrote:
> > After normal boot I try:
> >
> > echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval
> >
> > I found this in dmesg:
> >
> > [ 141.704025] ------------[ cut here ]------------
> > [ 141.704039] WARNING: at arch/x86/kernel/cpu/mcheck/mce.c:1102
> > mcheck_timer+0xf5/0x100()
Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Percpu variable definition is about to be updated such that all percpu
symbols including the static ones must be unique. Update percpu
variable definitions accordingly.
* as,cfq: rename ioc_count uniquely
* cpufreq: rename cpu_dbs_info uniquely
* xen: move nesting_count out of xen_evtchn_do_upcall() and rename it
* mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
rename it
* ipv4,6: rename cookie_scratch uniquely
* x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
pmc_irq_entry and nmi_entry to pmc_nmi_entry
* perf_counter: rename disable_count to perf_disable_count
* ftrace: rename test_event_disable to ftrace_test_event_disable
* kmemleak: rename test_pointer to kmemleak_test_pointer
* mce: rename next_interval to mce_next_interval
[ Impact: percpu usage cleanups, no duplicate static percpu var names ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Currently, the following three different ways to define percpu arrays
are in use.
1. DEFINE_PER_CPU(elem_type[array_len], array_name);
2. DEFINE_PER_CPU(elem_type, array_name[array_len]);
3. DEFINE_PER_CPU(elem_type, array_name)[array_len];
Unify to #1 which correctly separates the roles of the two parameters
and thus allows more flexibility in the way percpu variables are
defined.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm@kvack.org
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
Use atomic_inc_return() instead of atomic_add_return() by 1.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
We need a cleared cpu_mask to record if mce is initialized, especially
when MAXSMP is used.
used zalloc_... instead
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The sysfs attribute cmci_disabled was accidentall turned into a
duplicate of ignore_ce, breaking all other attributes.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
mce_intel.c uses apic_write() and lapic_get_maxlvt(), and so it needs
<asm/apic.h>.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
If APIC was disabled (for some reason) and as result
it's not even mapped we should not try to enable thermal
interrupts at all.
Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Tested-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090615182633.GA7606@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rename files that are no longer 64bit specific:
mce_amd_64.c => mce_amd.c
mce_intel_64.c => mce_intel.c
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Now all symbols in the header are static. Remove the header.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
move intel_init_thermal() into therm_throt.c
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Put common functions into therm_throt.c, modify Makefile.
unexpected_thermal_interrupt
intel_thermal_interrupt
smp_thermal_interrupt
intel_set_thermal_handler
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Break smp_thermal_interrupt() into two functions.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Remove unused argument regs from handlers, and use inc_irq_stat.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The mce_disabled on 32bit is a tristate variable [1,0,-1],
while 64bit version is boolean [0,1].
This patch makes mce_disabled always boolean, and use mce_p5_enabled
to indicate the third state instead.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
There are 2 headers:
arch/x86/include/asm/mce.h
arch/x86/kernel/cpu/mcheck/mce.h
and in the latter small header:
#include <asm/mce.h>
This patch move all contents in the latter header into the former,
and fix all files using the latter to include the former instead.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add sysfs interface for admins who want to tweak these options without
rebooting the system.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
"trigger" is not straight forward name for valiable that holds name
of user mode helper program which triggered by machine check events.
This patch renames this valiable and kins to more recognizable names.
trigger => mce_helper
trigger_argv => mce_helper_argv
notify_user => mce_need_notify
No functional changes.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add __read_mostly to data written during setup.
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Simplify interface of mce_start():
- no_way_out = mce_start(no_way_out, &order);
+ order = mce_start(&no_way_out);
Now Monarch and Subjects share same exit(return) in usual path.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
In mce_cpu_restart, mce_init_timer is called unconditionally.
If !mce_available (e.g. mce is disabled), there are no useful work
for timer. Stop running it.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
If one CPU has no_way_out == 1, all other CPUs should have no_way_out
== 1. But despite global_nwo is read after mce_callin, global_nwo is
updated after mce_callin too. So it is possible that some CPU read
global_nwo before some other CPU update global_nwo, so that no_way_out
== 1 for some CPU, while no_way_out == 0 for some other CPU.
This patch fixes this race condition via moving mce_callin updating
after global_nwo updating, with a smp_wmb in between. A smp_rmb is
added between their reading too.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>