Otherwise, system with apci id lifting will have wrong apicid in
/proc/cpuinfo.
and use that in srat_detect_node().
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <4A998CCA.1040407@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Kernel BTS tracing generates too much data too fast for us to
handle, causing the kernel to hang.
Fail for BTS requests for kernel code.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zjilstra@chello.nl>
LKML-Reference: <20090902140616.901253000@intel.com>
[ This is really a workaround - but we want BTS tracing in .32
so make sure we dont regress. The lockup should be fixed
ASAP. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On 32bit, pointers in the DS AREA configuration are cast to
u64. The current (long) cast to avoid compiler warnings results
in a signed 64bit address.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090902140615.305889000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reserve PERF_COUNT_HW_BRANCH_INSTRUCTIONS with sample_period ==
1 for BTS tracing and fail, if BTS is not available.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090902140612.943801000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pack aligned things together into a special section to minimize
padding holes.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Tejun Heo <tj@kernel.org>
LKML-Reference: <4AA035C0.9070202@goop.org>
[ queued up in tip:x86/asm because it depends on this commit:
x86/i386: Make sure stack-protector segment base is cache aligned ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This fixes threshold_bank4 support on multi-node processors.
The correct mask to use is llc_shared_map, representing an internal
node on Magny-Cours.
We need to create 2 sets of symlinks for sibling shared banks -- one
set for each internal node, symlinks of each set should target the
first core on same internal node.
Currently only one set is created where all symlinks are targeting
the first core of the entire socket.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
L3 cache size, associativity and shared_cpu information need to be
adapted to show information for an internal node instead of the
entire physical package.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Construct entire NodeID and use it as cpu_llc_id. Thus internal node
siblings are stored in llc_shared_map.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The Intel Optimization Reference Guide says:
In Intel Atom microarchitecture, the address generation unit
assumes that the segment base will be 0 by default. Non-zero
segment base will cause load and store operations to experience
a delay.
- If the segment base isn't aligned to a cache line
boundary, the max throughput of memory operations is
reduced to one [e]very 9 cycles.
[...]
Assembly/Compiler Coding Rule 15. (H impact, ML generality)
For Intel Atom processors, use segments with base set to 0
whenever possible; avoid non-zero segment base address that is
not aligned to cache line boundary at all cost.
We can't avoid having a non-zero base for the stack-protector
segment, but we can make it cache-aligned.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: <stable@kernel.org>
LKML-Reference: <4AA01893.6000507@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Create a blacklist for processors that should not load the acpi-cpufreq module.
The initial entry in the blacklist function is the Intel 0f68 processor. It's
specification update mentions errata AL30 which implies that cpufreq should not
run on this processor.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Remove an obsolete check that used to prevent there being more
than 2 low P-states. Now that low-to-low P-states changes are
enabled, it prevents otherwise workable configurations with
multiple low P-states.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Krists Krilovs <pow@pow.za.net>
Signed-off-by: Dave Jones <davej@redhat.com>
fbd8b1819e turns off the bit for
/proc/cpuinfo. However, a proper/full fix would be to additionally
turn off the bit in the CPUID output so that future callers get
correct CPU features info.
Do that by basically reversing what the BIOS wrongfully does at boot.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1251705011-18636-3-git-send-email-petkovbb@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
TSC calibration is modified by the vmware hypervisor and paravirt by
separate means. Moorestown wants to add its own calibration routine as
well. So make calibrate_tsc a proper x86_init_ops function and
override it by paravirt or by the early setup of the vmware
hypervisor.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reason: Change to is_new_memtype_allowed() in x86/urgent
Resolved semantic conflicts in:
arch/x86/mm/pat.c
arch/x86/mm/ioremap.c
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
If MCE handler is called but none of mces_seen have machine
check event which might signal the MCE (i.e. event higher than
MCE_KEEP_SEVERITY), panic with "Machine check from unknown
source" will be taken since the MCE is assumed to be signaled
from external agent or so.
Usually mces_seen never point MCE_KEEP_SEVERITY event such as
CE. But it can happen because initial value of mces_seen is
accidentally modified by mce_no_way_out() - in case if
mce_no_way_out() run through all banks and the last bank has
the CE, mces_seen points the CE and the "panic by unknown" will
not be taken.
This patch fixes this undesired behavior, and clarifies the logic.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
LKML-Reference: <4A94E244.3020301@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
mtr_aps_delayed_init was declared u32 and made global, but it only
ever takes boolean values and is only ever used in
arch/x86/kernel/cpu/mtrr/main.c. Declare it "static bool" and remove
external references.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
SDM Vol 3a section titled "MTRR considerations in MP systems" specifies
the need for synchronizing the logical cpu's while initializing/updating
MTRR.
Currently Linux kernel does the synchronization of all cpu's only when
a single MTRR register is programmed/updated. During an AP online
(during boot/cpu-online/resume) where we initialize all the MTRR/PAT registers,
we don't follow this synchronization algorithm.
This can lead to scenarios where during a dynamic cpu online, that logical cpu
is initializing MTRR/PAT with cache disabled (cr0.cd=1) etc while other logical
HT sibling continue to run (also with cache disabled because of cr0.cd=1
on its sibling).
Starting from Westmere, VMX transitions with cr0.cd=1 don't work properly
(because of some VMX performance optimizations) and the above scenario
(with one logical cpu doing VMX activity and another logical cpu coming online)
can result in system crash.
Fix the MTRR initialization by doing rendezvous of all the cpus. During
boot and resume, we delay the MTRR/PAT init for APs till all the
logical cpu's come online and the rendezvous process at the end of AP's bringup,
will initialize the MTRR/PAT for all AP's.
For dynamic single cpu online, we synchronize all the logical cpus and
do the MTRR/PAT init on the AP that is coming online.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
load_percpu_segment() is used to set up the per-cpu segment registers,
which are also used for -fstack-protector. Make sure that the
load_percpu_segment() function doesn't have stackprotector enabled.
[ Impact: allow percpu setup before calling stack-protected functions ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
An older test-box started hanging at the following point during
bootup:
[ 0.022996] Mount-cache hash table entries: 512
[ 0.024996] Initializing cgroup subsys debug
[ 0.025996] Initializing cgroup subsys cpuacct
[ 0.026995] Initializing cgroup subsys devices
[ 0.027995] Initializing cgroup subsys freezer
[ 0.028995] mce: CPU supports 5 MCE banks
I've bisected it down to commit 4efc0670 ("x86, mce: use 64bit
machine check code on 32bit"), which utilizes the MCE code on
32-bit systems too.
The problem is caused by this detail in my config:
# CONFIG_CPU_SUP_INTEL is not set
This disables the quirks in mce_cpu_quirks() but still enables
MCE support - which then hangs due to the missing quirk
workaround needed on this CPU:
if (c->x86 == 6 && c->x86_model < 0x1A && banks > 0)
mce_banks[0].init = 0;
The safe solution is to not initialize MCEs if we dont know on
what CPU we are running (or if that CPU's support code got
disabled in the config).
Also be a bit more defensive on 32-bit systems: dont do a
boot-time dump of pending MCEs not just on the specific system
that we found a problem with (Pentium-M), but earlier ones as
well.
Now this problem is probably not common and disabling CPU
support is rare - but still being more defensive in something
we turned on for a wide range of CPUs is prudent.
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
LKML-Reference: Message-ID: <4A88E3E4.40506@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On my legacy Pentium M laptop (Acer Extensa 2900) I get bogus MCE on a cold
boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog):
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 1 MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-1 UNKNOWN Error
STATUS f200000000000195 MCGSTATUS 0
[ The other STATUS values observed: f2000000000001b5 (... UNKNOWN error)
and f200000000000115 (... READ Error).
To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified
the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump
content of STATUS MSR before it is cleared during initialization. ]
Since the bogus MCE results in a kernel taint (which in turn disables
lockdep support) don't log boot MCEs on Pentium M (model == 13) CPUs
by default ("mce=bootlog" boot parameter can be be used to get the old
behavior).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
kernel/perf_counter.c
Merge reason: update to latest upstream (-rc6) and resolve
the conflict with urgent fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
arch/sparc/kernel/smp_64.c
arch/x86/kernel/cpu/perf_counter.c
arch/x86/kernel/setup_percpu.c
drivers/cpufreq/cpufreq_ondemand.c
mm/percpu.c
Conflicts in core and arch percpu codes are mostly from commit
ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many
num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all
the first chunk allocators into mm/percpu.c, the changes are moved
from arch code to mm/percpu.c.
Signed-off-by: Tejun Heo <tj@kernel.org>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Report the cloning task as parent on perf_counter_fork()
perf_counter: Fix an ipi-deadlock
perf: Rework/fix the whole read vs group stuff
perf_counter: Fix swcounter context invariance
perf report: Don't show unresolved DSOs and symbols when -S/-d is used
perf tools: Add a general option to enable raw sample records
perf tools: Add a per tracepoint counter attribute to get raw sample
perf_counter: Provide hw_perf_counter_setup_online() APIs
perf list: Fix large list output by using the pager
perf_counter, x86: Fix/improve apic fallback
perf record: Add missing -C option support for specifying profile cpu
perf tools: Fix dso__new handle() to handle deleted DSOs
perf tools: Fix fallback to cplus_demangle() when bfd_demangle() is not available
perf report: Show the tid too in -D
perf record: Fix .tid and .pid fill-in when synthesizing events
perf_counter, x86: Fix generic cache events on P6-mobile CPUs
perf_counter, x86: Fix lapic printk message
Johannes Stezenbach reported that his Pentium-M based
laptop does not have the local APIC enabled by default,
and hence perfcounters do not get initialized.
Add a fallback for this case: allow non-sampled counters
and return with an error on sampled counters. This allows
'perf stat' to work out of box - and allows 'perf top'
and 'perf record' to fall back on a hrtimer based sampling
method.
( Passing 'lapic' on the boot line will allow hardware
sampling to occur - but if the APIC is disabled
permanently by the hardware then this fallback still
allows more systems to use perfcounters. )
Also decouple perfcounter support from X86_LOCAL_APIC.
-v2: fix typo breaking counters on all other systems ...
Reported-by: Johannes Stezenbach <js@sig21.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Kernel is broken for x86 CPUs without CPUID since 2.6.28. It
crashes with NULL pointer dereference in identify_cpu():
766 generic_identify(c);
767
768--> if (this_cpu->c_identify)
769 this_cpu->c_identify(c);
this_cpu is NULL. This is because it's only initialized in
get_cpu_vendor() function, which is not called if the CPU has
no CPUID instruction.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
LKML-Reference: <200908112000.15993.linux@rainbow-software.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Due to an erratum with certain AMD Athlon 64 processors, the
BIOS may need to force enable the LAHF_LM capability.
Unfortunately, in at least one case, the BIOS does this even
for processors that do not support the functionality.
Add a specific check that will clear the feature bit for
processors known not to support the LAHF/SAHF instructions.
Signed-off-by: Kevin Winchester <kjwinchester@gmail.com>
Acked-by: Borislav Petkov <petkovbb@googlemail.com>
LKML-Reference: <4A80A5AD.2000209@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Johannes Stezenbach reported that 'perf stat' does not count
cache-miss and cache-references events on his Pentium-M based
laptop.
This is because we left them blank in p6_perfmon_event_map[],
fill them in.
Reported-by: Johannes Stezenbach <js@sig21.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
My Latitude d630 seems to be handling thermal events in SMI by
lowering the max frequency of the CPU till it cools down but
still leaks the "everything is normal" events.
This spams the console and with high priority printks.
Adjust therm_throt driver to only print messages about the fact
that temperatire returned back to normal when leaving the
throttling state.
Also lower the severity of "back to normal" message from
KERN_CRIT to KERN_INFO.
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Acked-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20090810051513.0558F526EC9@mailhub.coreip.homeip.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If "fake panic" mode is turned on, just log panic message instead of
go real panic. This is used for testing only, so that the test suite
can check for the correct panic message and do regression testing for
MCE would go panic.
This patch is based on x86-tip.git/mce.
ChangeLog:
v5:
- Rebased on x86-tip.git/mce
v4:
- Move config file from sysfs to debugfs
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Because more debugfs files under mce dir will be create in mce.c.
ChangeLog:
v5:
- Rebased on x86-tip.git/mce
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Raise mode include raising as exception or raising as poll, it is
specified via the mce.inject_flags field.
This can be used to specify raise mode of UCNA, which is UC error but
raised not as exception. And this can be used to test the filter code
of poll handler or exception handler too. For example, enforce a poll
raise mode for a fatal MCE.
ChangeLog:
v2:
- Re-base on latest x86-tip.git/mce3
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The cpu context is specified via the new mce.inject_flags fields.
This allows more realistic machine check testing in different
situations. "RANDOM" context is implemented via NMI broadcasting to
add randomization to testing.
AK: Fix NMI broadcasting check. Fix 32-bit building. Some race
fixes. Move to module. Various changes
ChangeLog:
v3:
- Re-based on latest x86-tip.git/mce4
- Fix 32-bit building
v2:
- Re-base on latest x86-tip.git/mce3
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Implement a performance counter with:
attr.type = PERF_TYPE_HARDWARE
attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS
attr.sample_period = 1
Using branch trace store (BTS) on x86 hardware, if available.
The from and to address for each branch can be sampled using:
PERF_SAMPLE_IP for the from address
PERF_SAMPLE_ADDR for the to address
[ v2: address review feedback, fix bugs ]
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
GDT_ENTRY_INIT is static initializer of desc_struct.
We already have similar macro GDT_ENTRY() but it's static
initializer for u64 and it cannot be used for desc_struct.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
LKML-Reference: <20090718151219.GD11294@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On x86_64, percpu variables current_task and kernel_stack are used for
get_current() and current_thread_info() respectively and thus are
often used close to each other. Move definition of current_task to
kernel/cpu/common.c right above kernel_stack definition and align it
to cacheline so that they always fall into the same cacheline. Two
percpu variables defined there together - irq_stack_ptr and irq_count
- are also pretty hot and will benefit from sharing the cacheline.
For consistency, current_task definition for x86_32 is also moved to
kernel/cpu/common.c.
Putting current_task and kernel_stack into the same cacheline was
suggested by Linus Torvalds.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED() put percpu variables in
.page_aligned section without adding any alignment restrictions.
Currently, this doesn't cause any problem because all users of the
macros have explicit page alignment and page-sized but it's much safer
to enforce page alignment from the macros. After all, it's what they
claim to do.
Add __aligned(PAGE_SIZE) to DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED() and
drop explicit alignment from it users.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Early Pentium M models use different method for enabling TM2
(per paragraph 13.5.2.3 of the "Intel 64 and IA-32 Architectures
Software Developer's Manual Volume 3A: System Programming Guide,
Part 1").
Tested on the affected Pentium M variant (model == 13).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
fseverities_coverage is never NULL in err_out code path.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
mce_cap_init() and mce_cpu_quirks() can be tagged with __cpuinit.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
"mce argument mce ignored. Please use /sys" message shouldn't
be printed when using "mce" boot option.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
On my legacy Pentium M laptop (Acer Extensa 2900) I get bogus MCE on a cold
boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog):
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 1 MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-1 UNKNOWN Error
STATUS f200000000000195 MCGSTATUS 0
[ The other STATUS values observed: f2000000000001b5 (... UNKNOWN error)
and f200000000000115 (... READ Error).
To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified
the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump
content of STATUS MSR before it is cleared during initialization. ]
Since the bogus MCE results in a kernel taint (which in turn disables
lockdep support) don't log boot MCEs on Pentium M (model == 13) CPUs
by default ("mce=bootlog" boot parameter can be be used to get the old
behavior).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: geode: Mark mfgpt irq IRQF_TIMER to prevent resume failure
x86, amd: Don't probe for extended APIC ID if APICs are disabled
x86, mce: Rename incorrect macro name "CONFIG_X86_THRESHOLD"
x86-64: Fix bad_srat() to clear all state
x86, mce: Fix set_trigger() accessor
x86: Fix movq immediate operand constraints in uaccess.h
x86: Fix movq immediate operand constraints in uaccess_64.h
x86: Add reboot fixup for SBC-fitPC2
x86: Include all of .data.* sections in _edata on 64-bit
x86: Add quirk for Intel DG45ID board to avoid low memory corruption
* 'perf-counters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf: (31 commits)
perf_counter tools: Give perf top inherit option
perf_counter tools: Fix vmlinux symbol generation breakage
perf_counter: Detect debugfs location
perf_counter: Add tracepoint support to perf list, perf stat
perf symbol: C++ demangling
perf: avoid structure size confusion by using a fixed size
perf_counter: Fix throttle/unthrottle event logging
perf_counter: Improve perf stat and perf record option parsing
perf_counter: PERF_SAMPLE_ID and inherited counters
perf_counter: Plug more stack leaks
perf: Fix stack data leak
perf_counter: Remove unused variables
perf_counter: Make call graph option consistent
perf_counter: Add perf record option to log addresses
perf_counter: Log vfork as a fork event
perf_counter: Synthesize VDSO mmap event
perf_counter: Make sure we dont leak kernel memory to userspace
perf_counter tools: Fix index boundary check
perf_counter: Fix the tracepoint channel to perfcounters
perf_counter, x86: Extend perf_counter Pentium M support
...
If we've logically disabled apics, don't probe the PCI space for the
AMD extended APIC ID.
[ Impact: prevent boot crash under Xen. ]
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reported-by: Bastian Blank <bastian@waldi.eu.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Fix the condition checking the result of strchr() (which previously
could result in an oops), and make the function return the number of
bytes actively used.
[ Impact: fix oops ]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <4A5F04B7020000780000AB59@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
I've attached a patch to remove the Pentium M special casing of
EMON and as noticed at least with my Pentium M the hardware PMU
now works:
Performance counter stats for '/bin/ls /var/tmp':
1.809988 task-clock-msecs # 0.125 CPUs
1 context-switches # 0.001 M/sec
0 CPU-migrations # 0.000 M/sec
224 page-faults # 0.124 M/sec
1425648 cycles # 787.656 M/sec
912755 instructions # 0.640 IPC
Vince suggested that this code was trying to address erratum
Y17 in Pentium-M's:
http://download.intel.com/support/processors/mobile/pm/sb/25266532.pdf
But that erratum (related to IA32_MISC_ENABLES.7) does not
affect perfcounters as we dont use this toggle to disable RDPMC
and WRMSR/RDMSR access to performance counters. We keep cr4's
bit 8 (X86_CR4_PCE) clear so unprivileged RDPMC access is not
allowed anyway.
Cc: Vince Weaver <vince@deater.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@googlemail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
No code changes except printk levels (although some of the K6
mtrr code might be clearer if there were a few as would
splitting out some of the intel cache code).
Signed-off-by: Alan Cox <alan@linux.intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
perf report: Add "Fractal" mode output - support callchains with relative overhead rate
perf_counter tools: callchains: Manage the cumul hits on the fly
perf report: Change default callchain parameters
perf report: Use a modifiable string for default callchain options
perf report: Warn on callchain output request from non-callchain file
x86: atomic64: Inline atomic64_read() again
x86: atomic64: Clean up atomic64_sub_and_test() and atomic64_add_negative()
x86: atomic64: Improve atomic64_xchg()
x86: atomic64: Export APIs to modules
x86: atomic64: Improve atomic64_read()
x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
x86: atomic64: Fix unclean type use in atomic64_xchg()
x86: atomic64: Make atomic_read() type-safe
x86: atomic64: Reduce size of functions
x86: atomic64: Improve atomic64_add_return()
x86: atomic64: Improve cmpxchg8b()
x86: atomic64: Improve atomic64_read()
x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too
perf report: Annotate variable initialization
...
Ingo noticed that both AMD and P6 call
x86_pmu_disable_counter() on *_pmu_enable_counter(). This is
because we rely on the side effect of that call to program
the event config but not touch the EN bit.
We change that for AMD by having enable_all() simply write
the full config in, and for P6 by explicitly coding it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The P6 doesn't seem to support cache ref/hit/miss counts, so
we extend the generic hardware event codes to have 0 and -1
mean the same thing as for the generic cache events.
Furthermore, it turns out the 0 event does not count
(that is, its reported that on PPro it actually does count
something), therefore use a event configuration that's
specified not to count to disable the counters.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to
enable/disable both its counters. We use this for the
global enable/disable, and clear all config bits (except EN)
to disable individual counters.
Actual ia32 hardware doesn't support lfence, so use a locked
op without side-effect to implement a full barrier.
perf stat and perf record seem to function correctly.
[a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code]
Signed-off-by: Vince Weaver <vince@deater.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <Pine.LNX.4.64.0907081718450.2715@pianoman.cluster.toy>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Instead of open coded calculations for bank MSRs hide the indexing of higher
banks MCE register MSRs in new macros.
No semantic changes.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This addresses one of the leftover review comments.
Move the per bank data into a single structure. This avoids
several separate variables and also separate allocation of sysfs objects.
I didn't move the CMCI ownership information so far because
that would have needed some non trivial changes in the algorithms.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Now that the X86_OLD_MCE ifdefs are gone move some code that
used to be outside the big ifdef to a more natural place
near its user.
No code change.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Drop the CONFIG_X86_NEW_MCE symbol and change all
references to it to check for CONFIG_X86_MCE directly.
No code changes
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
As announced in feature-remove-schedule.txt remove CONFIG_X86_OLD_MCE
This patch only removes code.
The ancient machine check code for very old systems that are not supported
by CONFIG_X86_NEW_MCE is still kept.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Commit 5fd29d6ccb ("printk: clean up
handling of log-levels and newlines") changed printk semantics. printk
lines with multiple KERN_<level> prefixes are no longer emitted as
before the patch.
<level> is now included in the output on each additional use.
Remove all uses of multiple KERN_<level>s in formats.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide support for family 0xf processors with 2 P-states
below the elevator voltage. Remove the checks that prevent
this configuration from being supported and increase the
transition voltage to prevent errors during the transition.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix usage of bios intcall()
x86: Remove unused function lapic_watchdog_ok()
x86: Remove unused variable disable_x2apic
x86, kvm: Fix section mismatches in kvm.c
x86: Add missing annotation to arch/x86/lib/copy_user_64.S::copy_to_user
x86: Fix fixmap page order for FIX_TEXT_POKE0,1
amd-iommu: set evt_buf_size correctly
amd-iommu: handle alias entries correctly in init code
x86: Fix printk call in print_local_apic()
x86: Declare check_efer() before it gets used
x86: Mark device_nb as static and fix NULL noise
x86: Remove double declaration of MSR_P6_EVNTSEL0 and MSR_P6_EVNTSEL1
xen: Use kcalloc() in xen_init_IRQ()
x86: Fix fixmap ordering
x86: Fix symbol annotation for arch/x86/lib/clear_page_64.S::clear_page_c
Yinghai noticed that i defined BIOS_BUG_MSG but added no
usage for it. The usage is to clean up this turd in generic.c:
printk(KERN_WARNING "WARNING: BIOS bug: VAR MTRR %d "
"contains strange UC entry under 1M, check "
"with your system vendor!\n", i);
Breaking printk lines in the middle looks ugly, is hard to read
and breaks 'git grep'. Use the BIOS_BUG_MSG instead.
Also complete the moving of structure definitions and variables
to the top of the file.
Reported-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix following trivial style problems:
ERROR: trailing whitespace X 25
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
WARNING: Use #include <linux/kvm_para.h> instead of <asm/kvm_para.h>
ERROR: do not initialise externals to 0 or NULL X 2
ERROR: "foo * bar" should be "foo *bar" X 5
ERROR: do not use assignment in if condition X 2
WARNING: line over 80 characters X 8
ERROR: return is not a function, parentheses are not required
WARNING: braces {} are not necessary for any arm of this statement
ERROR: space required before the open parenthesis '(' X 2
ERROR: open brace '{' following function declarations go on the next line
ERROR: space required after that ',' (ctx:VxV) X 8
ERROR: space required before the open parenthesis '(' X 3
ERROR: else should follow close brace '}'
WARNING: space prohibited between function name and open parenthesis '('
WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable X 2
Also use pr_debug and pr_warning where possible.
total: 50 errors, 14 warnings
arch/x86/kernel/cpu/mtrr/main.o:
text data bss dec hex filename
3668 116 4156 7940 1f04 main.o.before
3668 116 4156 7940 1f04 main.o.after
md5:
e01af2fd28deef77c8d01e71acfbd365 main.o.before.asm
e01af2fd28deef77c8d01e71acfbd365 main.o.after.asm
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
Cc: Avi Kivity <avi@redhat.com> # Avi, please have a look at the kvm_para.h bit
[ More cleanups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix:
ERROR: do not use C99 // comments
ERROR: "foo * bar" should be "foo *bar" X 2
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ More tidyups ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix:
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
ERROR: trailing whitespace X 7
ERROR: trailing statements should be on next line X 3
WARNING: line over 80 characters X 5
ERROR: space required before the open parenthesis '('
arch/x86/kernel/cpu/mtrr/if.o:
text data bss dec hex filename
2239 4 0 2243 8c3 if.o.before
2239 4 0 2243 8c3 if.o.after
md5:
78d1f2aa4843ec6509c18e2dee54bc7f if.o.before.asm
78d1f2aa4843ec6509c18e2dee54bc7f if.o.after.asm
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ More cleanups to make the code more consistent. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix following trivial style problems:
ERROR: trailing whitespace X 4
WARNING: Use #include <linux/io.h> instead of <asm/io.h>
WARNING: braces {} are not necessary for single statement blocks X 3
ERROR: "foo * bar" should be "foo *bar"
WARNING: line over 80 characters X 6
ERROR: "foo * bar" should be "foo *bar"
ERROR: spaces required around that '=' (ctx:VxO)
ERROR: space required before that '-' (ctx:OxV)
WARNING: suspect code indent for conditional statements (8, 12)
ERROR: spaces required around that '=' (ctx:VxV)
ERROR: do not initialise statics to 0 or NULL
ERROR: space prohibited after that open parenthesis '(' X 2
ERROR: space prohibited before that close parenthesis ')' X 2
ERROR: trailing statements should be on next line
ERROR: return is not a function, parentheses are not required
Also use pr_debug and pr_warning where possible.
arch/x86/kernel/cpu/mtrr/generic.o:
text data bss dec hex filename
5652 77 4224 9953 26e1 generic.o.before
5652 77 4220 9949 26dd generic.o.after
The md5 changed:
b34d6c045f06daa4ed092b90cc760e8f generic.o.before.asm
a490c6251cfd8442fbffecc0e09a573d generic.o.after.asm
Because mtrr_state moved from data to bss, changing its
offsets - and also because __LINE__ numbers changed.
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ Further cleanups to make the code more consistent ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix trivial style problems:
WARNING: Use #include <linux/io.h> instead of <asm/io.h>
WARNING: line over 80 characters
ERROR: do not initialise statics to 0 or NULL
ERROR: space prohibited after that open parenthesis '(' X 2
ERROR: space prohibited before that close parenthesis ')' X 2
ERROR: trailing whitespace X 2
ERROR: trailing statements should be on next line
ERROR: do not use C99 // comments X 2
arch/x86/kernel/cpu/mtrr/cyrix.o:
text data bss dec hex filename
1637 32 8 1677 68d cyrix.o.before
1637 32 8 1677 68d cyrix.o.after
md5:
6f52abd06905be3f4cabb5239f9b0ff0 cyrix.o.before.asm
6f52abd06905be3f4cabb5239f9b0ff0 cyrix.o.after.asm
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ Made the code more consistent ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix trivial style problems:
WARNING: Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
WARNING: Use #include <linux/kvm_para.h> instead of <asm/kvm_para.h>
Also, nr_mtrr_spare_reg should be unsigned long.
arch/x86/kernel/cpu/mtrr/cleanup.o:
text data bss dec hex filename
6241 8992 2056 17289 4389 cleanup.o.before
6241 8992 2056 17289 4389 cleanup.o.after
The md5 has changed:
1a7a27513aef1825236daf29110fe657 cleanup.o.before.asm
bcea358efa2532b6020e338e158447af cleanup.o.after.asm
Because a WARN_ON()'s __LINE__ value changed by 3 lines.
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ Did lots of other cleanups to make the code look more consistent. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix trivial style problems :
ERROR: trailing whitespace
WARNING: line over 80 characters
ERROR: do not use C99 // comments
arch/x86/kernel/cpu/mtrr/amd.o:
text data bss dec hex filename
501 32 0 533 215 amd.o.before
501 32 0 533 215 amd.o.after
md5:
62f795eb840ee2d17b03df89e789e76c amd.o.before.asm
62f795eb840ee2d17b03df89e789e76c amd.o.after.asm
Suggested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20090703164225.GA21447@elte.hu>
[ Also restructured comments to be standard, removed stray return,
converted function description to DocBook style, etc. ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
changes. As alpha in percpu tree uses 'weak' attribute instead of
inline assembly, there's no need for __used attribute.
Conflicts:
arch/alpha/include/asm/percpu.h
arch/mn10300/kernel/vmlinux.lds.S
include/linux/percpu-defs.h
lapic_watchdog_ok() is a global function but no one is using it.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1246554335.2242.29.camel@jaswinder.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
About every callchains recorded with perf record are filled up
including the internal perfcounter nmi frame:
perf_callchain
perf_counter_overflow
intel_pmu_handle_irq
perf_counter_nmi_handler
notifier_call_chain
atomic_notifier_call_chain
notify_die
do_nmi
nmi
We want ignore this frame as it's not interesting for
instrumentation. To solve this, we simply ignore every frames
from nmi context.
New example of "perf report -s sym -c" after this patch:
9.59% [k] search_by_key
4.88%
search_by_key
reiserfs_read_locked_inode
reiserfs_iget
reiserfs_lookup
do_lookup
__link_path_walk
path_walk
do_path_lookup
user_path_at
vfs_fstatat
vfs_lstat
sys_newlstat
system_call_fastpath
__lxstat
0x406fb1
3.19%
search_by_key
search_by_entry_key
reiserfs_find_entry
reiserfs_lookup
do_lookup
__link_path_walk
path_walk
do_path_lookup
user_path_at
vfs_fstatat
vfs_lstat
sys_newlstat
system_call_fastpath
__lxstat
0x406fb1
[...]
For now this patch only solves the problem in x86-64.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246474930-6088-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
perf report: Add --symbols parameter
perf report: Add --comms parameter
perf report: Add --dsos parameter
perf_counter tools: Adjust only prelinked symbol's addresses
perf_counter: Provide a way to enable counters on exec
perf_counter tools: Reduce perf stat measurement overhead/skew
perf stat: Use percentages for scaling output
perf_counter, x86: Update x86_pmu after WARN()
perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
perf stat: Improve output
perf stat: Fix multi-run stats
perf stat: Add -n/--null option to run without counters
perf_counter tools: Remove dead code
perf_counter: Complete counter swap
perf report: Print sorted callchains per histogram entries
perf_counter tools: Prepare a small callchain framework
perf record: Fix unhandled io return value
perf_counter tools: Add alias for 'l1d' and 'l1i'
perf-report: Add bare minimum PERF_EVENT_READ parsing
perf-report: Add modes for inherited stats and no-samples
...
The print out should read the value before changing the value.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4A487017.4090007@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 95ee14e437.
Mikael Petterson <mikepe@it.uu.se> reported that at least one of his
systems will not boot as a result. We have ruled out the detection
algorithm malfunctioning, so it is not a matter of producing the
incorrect bitmasks; rather, something in the application of them
fails.
Revert the commit until we can root cause and correct this problem.
-stable team: this means the underlying commit should be rejected.
Reported-and-isolated-by: Mikael Petterson <mikpe@it.uu.se>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <200906261559.n5QFxJH8027336@pilspetsen.it.uu.se>
Cc: stable@kernel.org
Cc: Grant Grundler <grundler@parisc-linux.org>
If CONFIG_NO_HZ + CONFIG_SMP, timer added via add_timer() might
be migrated on other cpu. Use add_timer_on() instead.
Avoids the following failure:
Maciej Rutecki wrote:
> > After normal boot I try:
> >
> > echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval
> >
> > I found this in dmesg:
> >
> > [ 141.704025] ------------[ cut here ]------------
> > [ 141.704039] WARNING: at arch/x86/kernel/cpu/mcheck/mce.c:1102
> > mcheck_timer+0xf5/0x100()
Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Update the mmap control page with the needed information to
use the userspace RDPMC instruction for self monitoring.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Previous code made an assumption that the power on value of global
control MSR has enabled all fixed and general purpose counters properly.
However, this is not the case for certain Intel processors, such as
Atom - and it might also be firmware dependent.
Each enable bit in IA32_PERF_GLOBAL_CTRL is AND'ed with the
enable bits for all privilege levels in the respective IA32_PERFEVTSELx
or IA32_PERF_FIXED_CTR_CTRL MSRs to start/stop the counting of
respective counters. Counting is enabled if the AND'ed results is true;
counting is disabled when the result is false.
The end result is that all fixed counters are always disabled on Atom
processors because the assumption is just invalid.
Fix this by not initializing the ctrl-mask out of the global MSR,
but setting it to perf_counter_mask.
Reported-by: Stephane Eranian <eranian@googlemail.com>
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090624021324.GA2788@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Percpu variable definition is about to be updated such that all percpu
symbols including the static ones must be unique. Update percpu
variable definitions accordingly.
* as,cfq: rename ioc_count uniquely
* cpufreq: rename cpu_dbs_info uniquely
* xen: move nesting_count out of xen_evtchn_do_upcall() and rename it
* mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
rename it
* ipv4,6: rename cookie_scratch uniquely
* x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
pmc_irq_entry and nmi_entry to pmc_nmi_entry
* perf_counter: rename disable_count to perf_disable_count
* ftrace: rename test_event_disable to ftrace_test_event_disable
* kmemleak: rename test_pointer to kmemleak_test_pointer
* mce: rename next_interval to mce_next_interval
[ Impact: percpu usage cleanups, no duplicate static percpu var names ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Currently, the following three different ways to define percpu arrays
are in use.
1. DEFINE_PER_CPU(elem_type[array_len], array_name);
2. DEFINE_PER_CPU(elem_type, array_name[array_len]);
3. DEFINE_PER_CPU(elem_type, array_name)[array_len];
Unify to #1 which correctly separates the roles of the two parameters
and thus allows more flexibility in the way percpu variables are
defined.
[ Impact: cleanup ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm@kvack.org
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
This counts when building sched domains in case NUMA information
is not available.
( See cpu_coregroup_mask() which uses llc_shared_map which in turn is
created based on cpu_llc_id. )
Currently Linux builds domains as follows:
(example from a dual socket quad-core system)
CPU0 attaching sched-domain:
domain 0: span 0-7 level CPU
groups: 0 1 2 3 4 5 6 7
...
CPU7 attaching sched-domain:
domain 0: span 0-7 level CPU
groups: 7 0 1 2 3 4 5 6
Ever since that is borked for multi-core AMD CPU systems.
This patch fixes that and now we get a proper:
CPU0 attaching sched-domain:
domain 0: span 0-3 level MC
groups: 0 1 2 3
domain 1: span 0-7 level CPU
groups: 0-3 4-7
...
CPU7 attaching sched-domain:
domain 0: span 4-7 level MC
groups: 7 4 5 6
domain 1: span 0-7 level CPU
groups: 4-7 0-3
This allows scheduler to assign tasks to cores on different sockets
(i.e. that don't share last level cache) for performance reasons.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20090619085909.GJ5218@alberich.amd.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use atomic_inc_return() instead of atomic_add_return() by 1.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
perfcounter: Handle some IO return values
perf_counter: Push perf_sample_data through the swcounter code
perf_counter tools: Define and use our own u64, s64 etc. definitions
perf_counter: Close race in perf_lock_task_context()
perf_counter, x86: Improve interactions with fast-gup
perf_counter: Simplify and fix task migration counting
perf_counter tools: Add a data file header
perf_counter: Update userspace callchain sampling uses
perf_counter: Make callchain samples extensible
perf report: Filter to parent set by default
perf_counter tools: Handle lost events
perf_counter: Add event overlow handling
fs: Provide empty .set_page_dirty() aop for anon inodes
perf_counter: tools: Makefile tweaks for 64-bit powerpc
perf_counter: powerpc: Add processor back-end for MPC7450 family
perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
perf_counter: powerpc: Change how processor-specific back-ends get selected
perf_counter: powerpc: Use unsigned long for register and constraint values
perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
perf_counter tools: Add and use isprint()
...
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits)
x86, mce: fix error path in mce_create_device()
x86: use zalloc_cpumask_var for mce_dev_initialized
x86: fix duplicated sysfs attribute
x86: de-assembler-ize asm/desc.h
i386: fix/simplify espfix stack switching, move it into assembly
i386: fix return to 16-bit stack from NMI handler
x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present
x86: Remove duplicated #include's
x86: msr.h linux/types.h is only required for __KERNEL__
x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
x86, mce: mce_intel.c needs <asm/apic.h>
x86: apic/io_apic.c: dmar_msi_type should be static
x86, io_apic.c: Work around compiler warning
x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present
x86: mce: Handle banks == 0 case in K7 quirk
x86, boot: use .code16gcc instead of .code16
x86: correct the conversion of EFI memory types
x86: cap iomem_resource to addressable physical memory
x86, mce: rename _64.c files which are no longer 64-bit-specific
x86, mce: mce.h cleanup
...
Manually fix up trivial conflict in arch/x86/mm/fault.c
Before exposing upstream tools to a callchain-samples ABI, tidy it
up to make it more extensible in the future:
Use markers in the IP chain to denote context, use (u64)-1..-4095 range
for these context markers because we use them for ERR_PTR(), so these
addresses are unlikely to be mapped.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We need a cleared cpu_mask to record if mce is initialized, especially
when MAXSMP is used.
used zalloc_... instead
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The sysfs attribute cmci_disabled was accidentall turned into a
duplicate of ignore_ce, breaking all other attributes.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The espfix code triggers if we have a protected mode userspace
application with a 16-bit stack. On returning to userspace, with iret,
the CPU doesn't restore the high word of the stack pointer. This is an
"official" bug, and the work-around used in the kernel is to temporarily
switch to a 32-bit stack segment/pointer pair where the high word of the
pointer is equal to the high word of the userspace stackpointer.
The current implementation uses THREAD_SIZE to determine the cut-off,
but there is no good reason not to use the more natural 64kb... However,
implementing this by simply substituting THREAD_SIZE with 65536 in
patch_espfix_desc crashed the test application. patch_espfix_desc tries
to do what is described above, but gets it subtly wrong if the userspace
stack pointer is just below a multiple of THREAD_SIZE: an overflow
occurs to bit 13... With a bit of luck, when the kernelspace
stackpointer is just below a 64kb-boundary, the overflow then ripples
trough to bit 16 and userspace will see its stack pointer changed by
65536.
This patch moves all espfix code into entry_32.S. Selecting a 16-bit
cut-off simplifies the code. The game with changing the limit dynamically
is removed too. It complicates matters and I see no value in it. Changing
only the top 16-bit word of ESP is one instruction and it also implies
that only two bytes of the ESPFIX GDT entry need to be changed and this
can be implemented in just a handful simple to understand instructions.
As a side effect, the operation to compute the original ESP from the
ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining
three instructions have been expanded inline in entry_32.S.
impact: can now reliably run userspace with ESP=xxxxfffc on 16-bit
stack segment
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Acked-by: Stas Sergeev <stsp@aknet.ru>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Commit 9e350de37a ("perf_counter: Accurate period data")
missed a spot, which caused all Intel-PMU samples to have a
period of 0.
This broke auto-freq sampling.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] cpumask: new cpumask operators for arch/x86/kernel/cpu/cpufreq/powernow-k8.c
[CPUFREQ] cpumask: avoid playing with cpus_allowed in powernow-k8.c
[CPUFREQ] cpumask: avoid cpumask games in arch/x86/kernel/cpu/cpufreq/speedstep-centrino.c
[CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c
[CPUFREQ] powernow-k8: get drv data for correct CPU
[CPUFREQ] powernow-k8: read P-state from HW
[CPUFREQ] reduce scope of ACPI_PSS_BIOS_BUG_MSG[]
[CPUFREQ] Clean up convoluted code in arch/x86/kernel/tsc.c:time_cpufreq_notifier()
[CPUFREQ] minor correction to cpu-freq documentation
[CPUFREQ] powernow-k8.c: mess cleanup
[CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful
[CPUFREQ] powernow-k8: Set transition latency to 1 if ACPI tables export 0
[CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case
Expand Intel NMI perfctr1 workaround to include a Core2 processor stepping
(cpuid family-6, model-f, stepping-4). Resolves a situation where the NMI
would not enable on these processors.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: prarit@redhat.com
Cc: suresh.b.siddha@intel.com
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
mce_intel.c uses apic_write() and lapic_get_maxlvt(), and so it needs
<asm/apic.h>.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
If APIC was disabled (for some reason) and as result
it's not even mapped we should not try to enable thermal
interrupts at all.
Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Tested-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
LKML-Reference: <20090615182633.GA7606@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* akpm: (182 commits)
fbdev: bf54x-lq043fb: use kzalloc over kmalloc/memset
fbdev: *bfin*: fix __dev{init,exit} markings
fbdev: *bfin*: drop unnecessary calls to memset
fbdev: bfin-t350mcqb-fb: drop unused local variables
fbdev: blackfin has __raw I/O accessors, so use them in fb.h
fbdev: s1d13xxxfb: add accelerated bitblt functions
tcx: use standard fields for framebuffer physical address and length
fbdev: add support for handoff from firmware to hw framebuffers
intelfb: fix a bug when changing video timing
fbdev: use framebuffer_release() for freeing fb_info structures
radeon: P2G2CLK_ALWAYS_ONb tested twice, should 2nd be P2G2CLK_DAC_ALWAYS_ONb?
s3c-fb: CPUFREQ frequency scaling support
s3c-fb: fix resource releasing on error during probing
carminefb: fix possible access beyond end of carmine_modedb[]
acornfb: remove fb_mmap function
mb862xxfb: use CONFIG_OF instead of CONFIG_PPC_OF
mb862xxfb: restrict compliation of platform driver to PPC
Samsung SoC Framebuffer driver: add Alpha Channel support
atmel-lcdc: fix pixclock upper bound detection
offb: use framebuffer_alloc() to allocate fb_info struct
...
Manually fix up conflicts due to kmemcheck in mm/slab.c
There are some places to be able to use printk_once instead of hard coding.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iomem_resource is by default initialized to -1, which means 64 bits of
physical address space if 64-bit resources are enabled. However, x86
CPUs cannot address 64 bits of physical address space. Thus, we want
to cap the physical address space to what the union of all CPU can
actually address.
Without this patch, we may end up assigning inaccessible values to
uninitialized 64-bit PCI memory resources.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Martin Mares <mj@ucw.cz>
Cc: stable@kernel.org
Rename files that are no longer 64bit specific:
mce_amd_64.c => mce_amd.c
mce_intel_64.c => mce_intel.c
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Now all symbols in the header are static. Remove the header.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
move intel_init_thermal() into therm_throt.c
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Put common functions into therm_throt.c, modify Makefile.
unexpected_thermal_interrupt
intel_thermal_interrupt
smp_thermal_interrupt
intel_set_thermal_handler
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Break smp_thermal_interrupt() into two functions.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Remove unused argument regs from handlers, and use inc_irq_stat.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The mce_disabled on 32bit is a tristate variable [1,0,-1],
while 64bit version is boolean [0,1].
This patch makes mce_disabled always boolean, and use mce_p5_enabled
to indicate the third state instead.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
There are 2 headers:
arch/x86/include/asm/mce.h
arch/x86/kernel/cpu/mcheck/mce.h
and in the latter small header:
#include <asm/mce.h>
This patch move all contents in the latter header into the former,
and fix all files using the latter to include the former instead.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add sysfs interface for admins who want to tweak these options without
rebooting the system.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
"trigger" is not straight forward name for valiable that holds name
of user mode helper program which triggered by machine check events.
This patch renames this valiable and kins to more recognizable names.
trigger => mce_helper
trigger_argv => mce_helper_argv
notify_user => mce_need_notify
No functional changes.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Add __read_mostly to data written during setup.
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Simplify interface of mce_start():
- no_way_out = mce_start(no_way_out, &order);
+ order = mce_start(&no_way_out);
Now Monarch and Subjects share same exit(return) in usual path.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
In mce_cpu_restart, mce_init_timer is called unconditionally.
If !mce_available (e.g. mce is disabled), there are no useful work
for timer. Stop running it.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
If one CPU has no_way_out == 1, all other CPUs should have no_way_out
== 1. But despite global_nwo is read after mce_callin, global_nwo is
updated after mce_callin too. So it is possible that some CPU read
global_nwo before some other CPU update global_nwo, so that no_way_out
== 1 for some CPU, while no_way_out == 0 for some other CPU.
This patch fixes this race condition via moving mce_callin updating
after global_nwo updating, with a smp_wmb in between. A smp_rmb is
added between their reading too.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Remove all old-style cpumask operators, and cpumask_t.
Also: get rid of the unused define_siblings function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
cpumask: avoid playing with cpus_allowed in powernow-k8.c
It's generally a very bad idea to mug some process's cpumask: it could
legitimately and reasonably be changed by root, which could break us
(if done before our code) or them (if we restore the wrong value).
I did not replace powernowk8_target; it needs fixing, but it grabs a
mutex (so no smp_call_function_single here) but Mark points out it can
be called multiple times per second, so work_on_cpu is too heavy.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: cpufreq@vger.kernel.org
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Impact: don't play with current's cpumask
It's generally a very bad idea to mug some process's cpumask: it could
legitimately and reasonably be changed by root, which could break us
(if done before our code) or them (if we restore the wrong value).
Use rdmsr_on_cpu and wrmsr_on_cpu instead.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: cpufreq@vger.kernel.org
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Impact: don't play with current's cpumask
It's generally a very bad idea to mug some process's cpumask: it could
legitimately and reasonably be changed by root, which could break us
(if done before our code) or them (if we restore the wrong value).
We use smp_call_function_single: this had the advantage of being more
efficient, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: cpufreq@vger.kernel.org
Cc: Dominik Brodowski <linux@brodo.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Make powernowk8_get() similar to powernowk8_target() and powernowk8_verify()
in the way it obtains "powernow_data" for a given CPU.
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Langsdorf, Mark <mark.langsdorf@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
By definition, "cpuinfo_cur_freq" should report the value from HW. So, don't
depend on the cached value. Instead read P-state directly from HW, while
taking into account the erratum 311 workaround for Fam 11h processors.
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Acked-by: Langsdorf, Mark <mark.langsdorf@amd.com>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
This symbol doesn't need file-global scope.
Cc: "Zhang, Rui" <rui.zhang@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Cc: Leo Milano <lmilano@gmx.net>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
This doesn't fix anything, but it's expected that a transition latency of 0
could cause trouble in the future.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Langsdorf, Mark <mark.langsdorf@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
__copy_from_user_inatomic() isn't NMI safe in that it can trigger
the page fault handler which is another trap and its return path
invokes IRET which will also close the NMI context.
Therefore use a GUP based approach to copy the stack frames over.
We tried an alternative solution as well: we used a forward ported
version of Mathieu Desnoyers's "NMI safe INT3 and Page Fault" patch
that modifies the exception return path to use an open-coded IRET with
explicit stack unrolling and TF checking.
This didnt work as it interacted with faulting user-space instructions,
causing them not to restart properly, which corrupts user-space
registers.
Solving that would probably involve disassembling those instructions
and backtracing the RIP. But even without that, the code was deemed
rather complex to the already non-trivial x86 entry assembly code,
so instead we went for this GUP based method that does a
software-walk of the pagetables.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The hooks that we modify are:
- Page fault handler (to handle kmemcheck faults)
- Debug exception handler (to hide pages after single-stepping
the instruction that caused the page fault)
Also redefine memset() to use the optimized version if kmemcheck is
enabled.
(Thanks to Pekka Enberg for minimizing the impact on the page fault
handler.)
As kmemcheck doesn't handle MMX/SSE instructions (yet), we also disable
the optimized xor code, and rely instead on the generic C implementation
in order to avoid false-positive warnings.
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
[whitespace fixlet]
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Kernel-space call-chains were trimmed at the first entry because
we never processed anything beyond the first stack context.
Allow the backtrace to jump from NMI to IRQ stack then to task stack
and finally user-space stack.
Also calculate the stack and bp variables correctly so that the
stack walker does not exit early.
We can get deep traces as a result, visible in perf report -D output:
0x32af0 [0xe0]: PERF_EVENT (IP, 5): 15134: 0xffffffff815225fd period: 1
... chain: u:2, k:22, nr:24
..... 0: 0xffffffff815225fd
..... 1: 0xffffffff810ac51c
..... 2: 0xffffffff81018e29
..... 3: 0xffffffff81523939
..... 4: 0xffffffff81524b8f
..... 5: 0xffffffff81524bd9
..... 6: 0xffffffff8105e498
..... 7: 0xffffffff8152315a
..... 8: 0xffffffff81522c3a
..... 9: 0xffffffff810d9b74
..... 10: 0xffffffff810dbeec
..... 11: 0xffffffff810dc3fb
This is a 22-entries kernel-space chain.
(We still only record reliable stack entries.)
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix the ptregs variant when we hit user-mode tasks.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
32 bits can also access x86_cache_alignment, x86_phys_bits and
x86_virt_bits, make them available to user space just as on 64 bits.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
LKML-Reference: <1244921390.11733.30.camel@ht.satnam>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
All AMD models share the same hw caching related event table.
Also complete the table with more events.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1244835381.2802.2.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
AMD supports performance monitoring start from K7 (i.e. family 6),
so disable it for earlier AMD CPUs.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1244714289.6923.0.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The fixed-function performance counters do not work on current Atom
processors. Use the general-purpose ones instead.
Signed-off-by: Yong Wang <yong.y.wang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <20090612080855.GA2286@ywang-moblin2.bj.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>