Use asm/syscall.h interfaces that do the same things.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
After fixing the u32 thinko I sill had occasional hickups on ATI chipsets
with small deltas. There seems to be a delay between writing the compare
register and the transffer to the internal register which triggers the
interrupt. Reading back the value makes sure, that it hit the internal
match register befor we compare against the counter value.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We use the HPET only in 32bit mode because:
1) some HPETs are 32bit only
2) on i386 there is no way to read/write the HPET atomic 64bit wide
The HPET code unification done by the "moron of the year" did
not take into account that unsigned long is different on 32 and
64 bit.
This thinko results in a possible endless loop in the clockevents
code, when the return comparison fails due to the 64bit/332bit
unawareness.
unsigned long cnt = (u32) hpet_read() + delta can wrap over 32bit.
but the final compare will fail and return -ETIME causing endless
loops.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use X86_FEATURE_NOPL to determine if it is safe to use P6 NOPs in
alternatives. Also, replace table and loop with simple if statement.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The long noops ("NOPL") are supposed to be detected by family >= 6.
Unfortunately, several non-Intel x86 implementations, both hardware
and software, don't obey this dictum. Instead, probe for NOPL
directly by executing a NOPL instruction and see if we get #UD.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Some BIOSes (the Intel DG33BU, for example) wrongly claim to have DMAR
when they don't. Avoid the resulting crashes when it doesn't work as
expected.
I'd still be grateful if someone could test it on a DG33BU with the old
BIOS though, since I've killed mine. I tested the DMI version, but not
this one.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds the detection of the northbridges in the AMD family 0x11
processors. It also fixes the magic numbers there while changing this code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move reset_lazy_tlbstate into tlb_32.c, and define noop versions of
play_dead() in process_{32,64}.c when !CONFIG_SMP.
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The minimum reprogramming delta was hardcoded in HPET ticks,
which is stupid as it does not work with faster running HPETs.
The C1E idle patches made this prominent on AMD/RS690 chipsets,
where the HPET runs with 25MHz. Set it to 5us which seems to be
a reasonable value and fixes the problems on the bug reporters
machines. We have a further sanity check now in the clock events,
which increases the delta when it is not sufficient.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br>
Tested-by: Dmitry Nezhevenko <dion@inhex.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use cpu/common.c on both 64-bit and 32-bit and remove cpu/common_64.c.
We started out with this linecount:
816 arch/x86/kernel/cpu/common_64.c
805 arch/x86/kernel/cpu/common.c
and the resulting common.c is 1197 lines long, so there's already
424 lines of code eliminated in this phase of the unification.
Signed-off-by: Yinghai <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge leftover whitespaces, to make arch/x86/kernel/cpu/common_64.c
exactly identical to arch/x86/kernel/cpu/common.c.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
hard to merge by lines... (as here we have material differences between
32-bit and 64-bit mode) - will try to do it later.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move the 32-bit and 64-bit gdt_page definitions next to each
other, separated with an #ifdef.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Make the files more similar in preparation to unification, no
code changed.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
64-bit has X86_HT set too, so use that instead of SMP.
This also removes a include/asm-x86/processor.h ifdef.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce a fast TSC-calibration method on sane hardware.
It only uses 17920 PIT timer ticks to calibrate the TSC, plus 256 ticks on
each side to make sure the TSC values were very close to the tick, so the
whole calibration takes 15ms. Yet, despite only takign 15ms,
we can actually give pretty stringent guarantees of accuracy:
- the code requires that we hit each 256-counter block at least 50 times,
so the TSC error is basically at *MOST* just a few PIT cycles off in
any direction. In practice, it's going to be about one microseconds
off (which is how long it takes to read the counter)
- so over 17920 PIT cycles, we can pretty much guarantee that the
calibration error is less than one half of a percent.
My testing bears this out: on my machine, the quick-calibration reports
2934.085kHz, while the slow one reports 2933.415.
Yes, the slower calibration is still more precise. For me, the slow
calibration is stable to within about one hundreth of a percent, so it's
(at a guess) roughly an order-and-a-half of magnitude more precise. The
longer you wait, the more precise you can be.
However, the nice thing about the fast TSC PIT synchronization is that
it's pretty much _guaranteed_ to give that 0.5% precision, and fail
gracefully (and very quickly) if it doesn't get it. And it really is
fairly simple (even if there's a lot of _details_ there, and I didn't get
all of those right ont he first try or even the second ;)
The patch says "110 insertions", but 63 of those new lines are actually
comments.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/kernel/tsc.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 110 insertions(+), 1 deletions(-)
1. add c_x86_vendor into cpu_dev
2. change cpu_devs to static
3. check c_x86_vendor before put that cpu_dev into array
4. remove alignment for 64bit
5. order the sequence in cpu_devs according to link sequence...
so could put intel at first, then amd...
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
v2: make 64 bit get c->x86_cache_alignment = c->x86_clfush_size
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1. add extended_cpuid_level for 32bit
2. add generic_identify for 64bit
3. add early_identify_cpu for 32bit
4. early_identify_cpu not be called by identify_cpu
5. remove early in get_cpu_vendor for 32bit
6. add get_cpu_cap
7. add cpu_detect for 64bit
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move early cpu initialization after cpu early get cap so the
early cpu initialization can fix up cpu caps.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Krzysztof Helt found MTRR is not detected on k6-2
root cause:
we moved mtrr_bp_init() early for mtrr trimming,
and in early_detect we only read the CPU capability from cpuid,
so some cpu doesn't have that bit in cpuid.
So we need to add early_init_xxxx to preset those bit before mtrr_bp_init
for those earlier cpus.
this patch is for v2.6.27
Reported-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
try to insert_resource second time, by expanding the resource...
for case: e820 reserved entry is partially overlapped with bar res...
hope it will never happen
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: aestetic
Capitalize function call interrupts consistently.
All other descriptions in /proc/interrupts are capitalized except
for "function call interrupts". Capitalize it too for consistency.
While that's technically a published ABI I think the risk of anyone
relying on that text to stay the same is negligible.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
this one replaces:
| commit a2bd7274b4
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date: Mon Aug 25 00:56:08 2008 -0700
|
| x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
v2: insert e820 reserve resources before pnp_system_init
v3: fix merging problem in tip/x86/core
v4: address Linus's review about comments and condition in _late()
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so could let BAR res register at first, or even pnp.
v2: insert e820 reserve resources before pnp_system_init
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The last changes made the calibration loop 250ms long which is far
too much. Try to do that more clever.
Experiments have shown that using a 10ms delay for the PIT based calibration
gives us a good enough value. If we have a reference (HPET/PMTIMER) and the
result of the PIT and the reference is close enough, then we can break out of
the calibration loop on a match right away and use the reference value.
Otherwise we just loop 3 times and decide then, which value to take.
One caveat is that for virtualized environments the PIT calibration often does
not work at all and I found out that 10us is a bit too short as well for the
reference to give a sane result. The solution here is to make the last loop
longer when the first two PIT calibrations failed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When calibration against PIT fails, the warning that we print is misleading.
In a virtualized environment the VM may get descheduled while calibration
or, the check in PIT calibration may fail due to other virtualization
overheads.
The warning message explicitly assumes that calibration failed due to SMI's
which may not be the case. Change that to something proper.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manually adding "io_delay=0xed" fixes system lockups in ioapic
mode on this machine.
System Information
Manufacturer: Hewlett-Packard
Product Name: Presario F700 (KA695EA#ABF)
Base Board Information
Manufacturer: Quanta
Product Name: 30D3
Reference:
https://bugzilla.redhat.com/show_bug.cgi?id=459546
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The TSC calibration function is still very complicated, but this makes
it at least a little bit less so by moving the PIT part out into a
helper function of its own.
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
Larry Finger reported at http://lkml.org/lkml/2008/9/1/90:
An ancient laptop of mine started throwing errors from b43legacy when
I started using 2.6.27 on it. This has been bisected to commit bfc0f59
"x86: merge tsc calibration".
The unification of the TSC code adopted mostly the 64bit code, which
prefers PMTIMER/HPET over the PIT calibration.
Larrys system has an AMD K6 CPU. Such systems are known to have
PMTIMER incarnations which run at double speed. This results in a
miscalibration of the TSC by factor 0.5. So the resulting calibrated
CPU/TSC speed is half of the real CPU speed, which means that the TSC
based delay loop will run half the time it should run. That might
explain why the b43legacy driver went berserk.
On the other hand we know about systems, where the PIT based
calibration results in random crap due to heavy SMI/SMM
disturbance. On those systems the PMTIMER/HPET based calibration logic
with SMI detection shows better results.
According to Alok also virtualized systems suffer from the PIT
calibration method.
The solution is to use a more wreckage aware aproach than the current
either/or decision.
1) reimplement the retry loop which was dropped from the 32bit code
during the merge. It repeats the calibration and selects the lowest
frequency value as this is probably the closest estimate to the real
frequency
2) Monitor the delta of the TSC values in the delay loop which waits
for the PIT counter to reach zero. If the maximum value is
significantly different from the minimum, then we have a pretty safe
indicator that the loop was disturbed by an SMI.
3) keep the pmtimer/hpet reference as a backup solution for systems
where the SMI disturbance is a permanent point of failure for PIT
based calibration
4) do the loop iteration for both methods, record the lowest value and
decide after all iterations finished.
5) Set a clear preference to PIT based calibration when the result
makes sense.
The implementation does the reference calibration based on
HPET/PMTIMER around the delay, which is necessary for the PIT anyway,
but keeps separate TSC values to ensure the "independency" of the
resulting calibration values.
Tested on various 32bit/64bit machines including Geode 266Mhz, AMD K6
(affected machine with a double speed pmtimer which I grabbed out of
the dump), Pentium class machines and AMD/Intel 64 bit boxen.
Bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make poll_idle() behave more like the other idle methods.
Currently, poll_idle() returns immediately. The other
idle methods all wait indefinately for some condition
to come true before returning. poll_idle should emulate
these other methods and also wait for a return condition,
in this case, for need_resched() to become 'true'.
Without this delay the idle loop spends all of its time
in the outer loop that calls poll_idle. This outer loop,
these days, does real work, some of it under rcu locks.
That work should only be done when idle is entered and
when idle exits, not continuously while idle is spinning.
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have had a number of cases where <asm/cpufeature.h> (and its
predecessors) have diverged substantially from the names list in
/proc/cpuinfo. This patch generates the latter from the former.
It retains the option for explicitly overriding the strings, but by
making that require a separate action it should at least be less
likely to happen.
It would be good to do a future pass and rename strings that are
gratuituously different in the kernel (/proc/cpuinfo is a userspace
interface and must remain constant.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
acpi_mcfg_64bit_base_addr is used when CONFIG_PCI_MMCONFIG is enabled.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Return the correct return value when the CPUID driver partially
completes a request (we should return the number of bytes actually
read or written, instead of the error code.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Return the correct return value when the MSR driver partially
completes a request (we should return the number of bytes actually
read or written, instead of the error code.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Propagate error (-ENXIO) from smp_call_function_single() in the CPUID
driver. This can happen when a CPU is unplugged while the CPUID
driver is open.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Propagate error (-ENXIO) from smp_call_function_single(). These
errors can happen when a CPU is unplugged while the MSR driver is
open.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
I noticed that my sched_clock() was slow on a number of machine, so I
started looking at cpufreq.
The below seems to fix the problem for me.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Triple-fault and keyboard reset may assert INIT instead of RESET; however
INIT is blocked when Intel VT is enabled. This leads to a partially reset
machine when invoking emergency_restart via sysrq-b: the processor is still
working but other parts of the system are dead.
Default to rebooting via ACPI, which correctly asserts RESET and reboots the
machine.
This is safe since we will fall back to keyboard reset and triple fault if
acpi is not enabled or if the reset is not successful.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It allows paravirt implementations of cpu_disable to share the
cpu_disable_common code, without having to take on board APIC
writes, which may not be appropriate.
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the new play_dead into smpboot.c, as it fits more cleanly in there
alongside other CONFIG_HOTPLUG functions.
Separate out the common code into its own function.
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The removal of the CPU from the various maps was redundant as it already
happened in cpu_disable.
After cleaning this up, cpu_uninit only resets the tlb state, so rename
it and create a noop version for the X86_64 case (so the two play_deads
can be unified later).
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: crash on non-TSC-equipped CPUs
Don't enable the TSC notifier if we *either*:
1. don't have a CPU, or
2. have a CPU with constant TSC.
In either of those cases, the notifier is either damaging (1) or useless(2).
From: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
During CPU hot-remove the sysfs directory created by
threshold_create_bank(), defined in
arch/x86/kernel/cpu/mcheck/mce_amd_64.c, has to be removed before
its parent directory, created by mce_create_device(), defined in
arch/x86/kernel/cpu/mcheck/mce_64.c . Moreover, when the CPU in
question is hotplugged again, obviously the latter has to be created
before the former. At present, the right ordering is not enforced,
because all of these operations are carried out by CPU hotplug
notifiers which are not appropriately ordered with respect to each
other. This leads to serious problems on systems with two or more
multicore AMD CPUs, among other things during suspend and hibernation.
Fix the problem by placing threshold bank CPU hotplug callbacks in
mce_cpu_callback(), so that they are invoked at the right places,
if defined. Additionally, use kobject_del() to remove the sysfs
directory associated with the kobject created by
kobject_create_and_add() in threshold_create_bank(), to prevent the
kernel from crashing during CPU hotplug operations on systems with
two or more multicore AMD CPUs.
This patch fixes bug #11337.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Andi Kleen <andi@firstfloor.org>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
use x2apic id reported by cpuid during topology discovery, instead of the
apic id configured in the APIC. For most of the systems, x2apic id
reported by cpuid leaf 0xb will be same as the physical apic id reported
by the APIC_ID register of the APIC. We follow the suggested guidelines
and use the apic id reported by the cpuid.
No change to non-generic UV platforms, will use the apic id reported in the
APIC_ID register as the cpuid reported apic id's may not be unique.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cpuid leaf 0xb provides extended topology enumeration. This interface provides
the 32-bit x2APIC id of the logical processor and it also provides a new
mechanism to detect SMT and core siblings (which provides increased
addressability).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: work around MTRR mask setting, v2
x86: fix section mismatch warning - uv_cpu_init
x86: fix VMI for early params
x86: fix two modpost warnings in mm/init_64.c
x86: fix 1:1 mapping init on 64-bit (memory hotplug case)
x86: work around MTRR mask setting
x86: PAT Update validate_pat_support for intel CPUs
devmem, x86: PAT Change /dev/mem mmap with O_SYNC to use UC_MINUS
x86: PAT proper tracking of set_memory_uc and friends
x86: fix BUG: unable to handle kernel paging request (numaq_tsc_disable)
x86: export pv_lock_ops non-GPL
x86, mmiotrace: silence section mismatch warning - leave_uniprocessor
x86: use WARN() in arch/x86/kernel
x86: use WARN() in arch/x86/mm/ioremap.c
werror: fix pci calgary
x86: fix oprofile + hibernation badness
x86, SGI UV: hardcode the TLB flush interrupt system vector
x86: fix Xorg startup/shutdown slowdown with PAT
x86: fix "kernel won't boot on a Cyrix MediaGXm (Geode)"
x86 iommu: remove unneeded parenthesis
None of the spinlock API is exported GPL, so there's no reason for
pv_lock_ops to be.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: drago01 <drago01@gmail.com>
improve the debug printout:
- make it actually display something
- print it only once
would be nice to have a WARN_ONCE() facility, to feed such things to
kerneloops.org.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.cpuinit.text+0x3cc4): Section mismatch in reference from the function uv_cpu_init() to the function .init.text:uv_system_init()
The function __cpuinit uv_cpu_init() references
a function __init uv_system_init().
If uv_system_init is only used by uv_cpu_init then
annotate uv_system_init with a matching annotation.
uv_system_init was ment to be called only once, so do it from codepath
(native_smp_prepare_cpus) which is called once, right before activation
of other cpus (smp_init).
Note: old code relied on uv_node_to_blade being initialized to 0,
but it'a not initialized from anywhere.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
alloc_coherent dma_ops callback was added to GART, however, it doesn't
return a size aligned address wrt dma_alloc_coherent, as
DMA-mapping.txt defines. This patch fixes it.
This patch also removes unused gart_map_simple
(dma_mapping_ops->map_simple has gone).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pci-dma.c doesn't use map_simple hook any more so we can remove it
from struct dma_mapping_ops now.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All the x86 DMA-API functions are defined in asm/dma-mapping.h. This patch
moves the dma_*_coherent functions also to this header file because they are
now small enough to do so.
This is done as a separate patch because it also includes some renaming and
restructuring of the dma-mapping.h file.
Signed-off-by: Joerg Roedel <joerg.roede@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All dma_ops implementations support the alloc_coherent and free_coherent
callbacks now. This allows a big simplification of the dma_alloc_coherent
function which is done with this patch. The dma_free_coherent functions is also
cleaned up and calls now the free_coherent callback of the dma_ops
implementation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ v2 - x86: make gart_alloc_coherent return zeroed memory
FUJITA Tomonori pointed it out that the dma_alloc_coherent function
should return memory set to zero. This patch adds this to the GART
implementation too. ]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
while fixing a different bug i moved the call to vmi_init before
early params could be parsed.
This broke the vmi specific commandline parameters.
Fix that, by moving vmi initialization after kernel has got a chance to
parse early parameters.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
microcode_amd.c uses ">> 32" on a 32-bit value, so gcc warns about that.
The code could use something like this *untested* patch.
linux-next-20080821/arch/x86/kernel/microcode_amd.c:229: warning: right shift count >= width of type
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Joshua Hoblitt reported that only 3 GB of his 16 GB of RAM is
usable. Booting with mtrr_show showed us the BIOS-initialized
MTRR settings - which are all wrong.
So the root cause is that the BIOS has not set the mask correctly:
> [ 0.429971] MSR00000200: 00000000d0000000
> [ 0.433305] MSR00000201: 0000000ff0000800
> should be ==> [ 0.433305] MSR00000201: 0000003ff0000800
>
> [ 0.436638] MSR00000202: 00000000e0000000
> [ 0.439971] MSR00000203: 0000000fe0000800
> should be ==> [ 0.439971] MSR00000203: 0000003fe0000800
>
> [ 0.443304] MSR00000204: 0000000000000006
> [ 0.446637] MSR00000205: 0000000c00000800
> should be ==> [ 0.446637] MSR00000205: 0000003c00000800
>
> [ 0.449970] MSR00000206: 0000000400000006
> [ 0.453303] MSR00000207: 0000000fe0000800
> should be ==> [ 0.453303] MSR00000207: 0000003fe0000800
>
> [ 0.456636] MSR00000208: 0000000420000006
> [ 0.459970] MSR00000209: 0000000ff0000800
> should be ==> [ 0.459970] MSR00000209: 0000003ff0000800
So detect this borkage and add the prefix 111.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes the pfn args from 'u32' to 'unsigned long'
on alloc_p*() functions on paravirt_ops, and the corresponding
implementations for Xen and VMI. The prototypes for CONFIG_PARAVIRT=n
are already using unsigned long, so paravirt.h now matches the prototypes
on asm-x86/pgalloc.h.
It shouldn't result in any changes on generated code on 32-bit, with
or without CONFIG_PARAVIRT. On both cases, 'codiff -f' didn't show any
change after applying this patch.
On 64-bit, there are (expected) binary changes only when CONFIG_PARAVIRT
is enabled, as the patch is really supposed to change the size of the
pfn args.
[ v2: KVM_GUEST: use the right parameter type on kvm_release_pt() ]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pentium III and Core Solo/Duo CPUs have an erratum
" Page with PAT set to WC while associated MTRR is UC may consolidate to UC "
which can result in WC setting in PAT to be ineffective. We will disable
PAT on such CPUs, so that we can continue to use MTRR WC setting.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add support for the E820_UNUSABLE memory type, which is defined in
Revision 3.0b (Oct. 10, 2006) of the ACPI Specification on p. 394 Table
14-1:
AddressRangeUnusuable This range of address contains memory in which
errors have been detected. This range must not be used by the OSPM.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Gang Wei <gang.wei@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This section mismatch:
>> Seems to be a section mismatch; init_intel() is __cpuinit while
>> numaq_tsc_disable() is __init. Seems to be introduced in:
>>
>> commit 64898a8bad
>> Author: Yinghai Lu <yhlu.kernel@gmail.com>
>> Date: Sat Jul 19 18:01:16 2008 -0700
>>
>> x86: extend and use x86_quirks to clean up NUMAQ code
>
> Oops, I am wrong about numaq_tsc_disable() being __init. Still, I
> believe that Yinghai might be able to say what's really wrong :-)
Would lead to this crash:
BUG: unable to handle kernel paging request at c08a45f0
IP: [<c08a45f0>] numaq_tsc_disable+0x0/0x40
Fixed by the patch below.
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
None of the spinlock API is exported GPL, so there's no reason for
pv_lock_ops to be.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: drago01 <drago01@gmail.com>
after following patch,
commit 1b313f4a6d
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Mon Aug 18 20:45:57 2008 +0400
x86: apic - generic_processor_info
- use physid_set instead of phys_cpu and physids_or
- set phys_cpu_present_map bit AFTER check for allowed
number of processors
- add checking for APIC valid version in 64bit mode
(mostly not needed but added for merging purpose)
- add apic_version definition for 64bit mode which
is used now
we are getting warning for acpi path on 64 bit system.
make the 64-bit side fill in apic_version[] as well.
[ mingo@elte.hu: build fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.
This also allowed the folding of some if()'s into the WARN()
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix an integer comparison always false warning in the PCI Calgary 64 driver.
A u8 is being compared to something that's 512 by default, resulting in the
following warning:
arch/x86/kernel/pci-calgary_64.c:1285: warning: comparison is always false due to limited range of data type
This was introduced by patch b34e90b8f0.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is useful for a pv_lock_ops backend to know whether interrupts are
enabled or not in the context a spin_lock is being called. This
allows it to enable interrupts while spinning, which could be
particularly helpful when spinning becomes blocking.
The default implementation just calls the normal spin_lock op,
ignoring the flags.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The UV TLB shootdown mechanism needs a system interrupt vector.
Its vector had been hardcoded as 200, but needs to moved to the reserved
system vector range so that it does not collide with some device vector.
This is still temporary until dynamic system IRQ allocation is provided.
But it will be needed when real UV hardware becomes available and runs 2.6.27.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is the 1st patch in the series. Here the aim was to avoid any
significant changes, logically-wise.
So it's mainly about generic interface refactoring: e.g. make
microcode_{intel,amd}.c more about arch-specific details and less
about policies like make-sure-we-run-on-a-target-cpu
(no more set_cpus_allowed_ptr() here) and generic synchronization (no
more microcode_mutex here).
All in all, more line have been deleted than added.
4 files changed, 145 insertions(+), 198 deletions(-)
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Don't use get_cpu() at all. Resort to checking a boot-up CPU (#0) in
microcode_{intel,amd}_module_init().
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cyrix MediaGXm/Cx5530 Unicorn Revision 1.19.3B has stopped
booting starting at v2.6.22.
The reason is this commit:
> commit f25f64ed5b
> Author: Juergen Beisert <juergen@kreuzholzen.de>
> Date: Sun Jul 22 11:12:38 2007 +0200
>
> x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.
this commit activated a macro which was dormant before due to (buggy)
macro side-effects.
I've looked through various datasheets and found that the GXm and GXLV
Geode processors don't have an incrementor.
Remove the incrementor setup entirely. As the incrementor value
differs according to clock speed and we would hope that the BIOS
configures it correctly, it is probably the right solution.
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 34ae7f35a2, which has
been reported to cause a number of problems. During suspend and resume,
it apparently causes a crash in a CPU hotplug notifier to happen,
although the exact details are sketchy because of the inability to get
good traces during the suspend sequence.
See buzilla entries
http://bugzilla.kernel.org/show_bug.cgi?id=11296http://bugzilla.kernel.org/show_bug.cgi?id=11339
for more examples and details.
[ Mark: "Revert the patch for now. I'm still looking into getting a
reliable reproduction and I do not have a fix at this time." ]
Requested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@inux-foundation.org>
Use X86_FEATURE_NOPL to determine if it is safe to use P6 NOPs in
alternatives. Also, replace table and loop with simple if statement.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The long noops ("NOPL") are supposed to be detected by family >= 6.
Unfortunately, several non-Intel x86 implementations, both hardware
and software, don't obey this dictum. Instead, probe for NOPL
directly by executing a NOPL instruction and see if we get #UD.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The parenthesis in __iommu_queue_command() are not needed when assigning
into 'target' variable.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/early_printk.c:404:13: warning: incorrect type in assignment (different base types)
arch/x86/kernel/early_printk.c:404:13: expected restricted __le16 [assigned] [usertype] wValue
arch/x86/kernel/early_printk.c:404:13: got int [signed] value
arch/x86/kernel/early_printk.c:405:13: warning: incorrect type in assignment (different base types)
arch/x86/kernel/early_printk.c:405:13: expected restricted __le16 [assigned] [usertype] wIndex
arch/x86/kernel/early_printk.c:405:13: got int [signed] index
arch/x86/kernel/early_printk.c:406:14: warning: incorrect type in assignment (different base types)
arch/x86/kernel/early_printk.c:406:14: expected restricted __le16 [assigned] [usertype] wLength
arch/x86/kernel/early_printk.c:406:14: got int [signed] size
arch/x86/kernel/early_printk.c:845:16: warning: Using plain integer as NULL pointer
arch/x86/kernel/early_printk.c:992:13: warning: symbol 'enable_debug_console' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Just add parenthesis to be identical of current
64bit implementation (so diff will not complain).
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- use physid_set instead of phys_cpu and physids_or
- set phys_cpu_present_map bit AFTER check for allowed
number of processors
- add checking for APIC valid version in 64bit mode
(mostly not needed but added for merging purpose)
- add apic_version definition for 64bit mode which
is used now
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We use 32bit code former for 64bit
mode since it's much better implementation
and easier to merge.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds some configuration options that allow to compile out
CPU vendor-specific code in x86 kernels (in arch/x86/kernel/cpu). The
new configuration options are only visible when CONFIG_EMBEDDED is
selected, as they are mostly interesting for space savings reasons.
An example of size saving, on x86 with only Intel CPU support:
text data bss dec hex filename
1125479 118760 212992 1457231 163c4f vmlinux.old
1121355 116536 212992 1450883 162383 vmlinux
-4124 -2224 0 -6348 -18CC +/-
However, I'm not exactly sure that the Kconfig wording is correct with
regard to !64BIT / 64BIT.
[ mingo@elte.hu: convert macro to inline ]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/cpu/intel.c defines a few fallback functions
(cmpxchg_*()) that are used when the CPU doesn't support cmpxchg
and/or cmpxchg64 natively. However, while defined in an Intel-specific
file, these functions are also used for CPUs from other vendors when
they don't support cmpxchg and/or cmpxchg64. This breaks the
compilation when support for Intel CPUs is disabled.
This patch moves these functions to a new
arch/x86/kernel/cpu/cmpxchg.c file, unconditionally compiled when
X86_32 is enabled.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: michael@free-electrons.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
movsl_mask is currently defined in arch/x86/kernel/cpu/intel.c, which
contains code specific to Intel CPUs. However, movsl_mask is used in
the non-CPU specific code in arch/x86/lib/usercopy_32.c, which breaks
the compilation when support for Intel CPUs is compiled out.
This patch solves this problem by moving movsl_mask's definition close
to its users in arch/x86/lib/usercopy_32.c.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: michael@free-electrons.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x27032): Section mismatch in reference from the function get_tce_space_from_tar() to the function .init.text:calgary_bus_has_devices()
The function get_tce_space_from_tar() references
the function __init calgary_bus_has_devices().
This is often because get_tce_space_from_tar lacks a __init
annotation or the annotation of calgary_bus_has_devices is wrong.
get_tce_space_from_tar is called only from __init function (calgary_init)
and calls __init function (calgary_bus_has_devices).
So annotate it properly.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Chandru Siddalingappa <chandru@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Take out part of get_local_pda referencing __init function (free_bootmem)
to new (static) function marked as __ref. It's safe to do because free_bootmem
is called before __init sections are dropped.
WARNING: vmlinux.o(.cpuinit.text+0x3cd7): Section mismatch in reference from the function get_local_pda() to the function .init.text:free_bootmem()
The function __cpuinit get_local_pda() references
a function __init free_bootmem().
If free_bootmem is only used by get_local_pda then
annotate free_bootmem with a matching annotation.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/power/cpu_32.c __save_processor_state calls read_cr4()
only a i486 CPU doesn't have the CR4 register. Trying to read it
produces an invalid opcode oops during suspend to disk.
Use the safe rc4 reading op instead. If the value to be written is
zero the write is skipped.
arch/x86/power/hibernate_asm_32.S
done: swapped the use of %eax and %ecx to use jecxz for
the zero test and jump over store to %cr4.
restore_image: s/%ecx/%eax/ to be consistent with done:
In addition to __save_processor_state, acpi_save_state_mem,
efi_call_phys_prelog, and efi_call_phys_epilog had checks added
(acpi restore was in assembly and already had a check for
non-zero). There were other reads and writes of CR4, but MCE and
virtualization shouldn't be executed on a i486 anyway.
Signed-off-by: David Fries <david@fries.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x118f7): Section mismatch in reference from the function construct_ioapic_table() to the function .init.text:MP_bus_info()
The function construct_ioapic_table() references
the function __init MP_bus_info().
This is often because construct_ioapic_table lacks a __init
annotation or the annotation of MP_bus_info is wrong.
construct_ioapic_table is called only from construct_default_ISA_mptable which is __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1591): Section mismatch in reference from the function init_amd() to the function .init.text:check_enable_amd_mmconf_dmi()
The function __cpuinit init_amd() references
a function __init check_enable_amd_mmconf_dmi().
If check_enable_amd_mmconf_dmi is only used by init_amd then
annotate check_enable_amd_mmconf_dmi with a matching annotation.
check_enable_amd_mmconf_dmi is only called from init_amd which is __cpuinit
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1fe7): Section mismatch in reference from the function MP_processor_info() to the variable .init.data:x86_quirks
The function __cpuinit MP_processor_info() references
a variable __initdata x86_quirks.
If x86_quirks is only used by MP_processor_info then
annotate x86_quirks with a matching annotation.
MP_processor_info uses x86_quirks which is __init and is used only from
smp_read_mpc and construct_default_ISA_mptable which are __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x7950): Section mismatch in reference from the function native_calibrate_tsc() to the function .init.text:tsc_read_refs()
The function native_calibrate_tsc() references
the function __init tsc_read_refs().
This is often because native_calibrate_tsc lacks a __init
annotation or the annotation of tsc_read_refs is wrong.
tsc_read_refs is called from native_calibrate_tsc which is not __init
and native_calibrate_tsc cannot be marked __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes GART IOMMU to return a size aligned address wrt
dma_alloc_coherent, as DMA-mapping.txt defines:
The cpu return address and the DMA bus master address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size. This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rearrange functions and comments to find differences
easier.
Also use apic_printk in setup_boot_APIC_clock for
64bit mode.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Remove redundant masking of APIC_LVTTHMR register in apic_32.c
- Add masking of APIC_LVTTHMR register to apic_64.c. We use a bit
complicated #ifdef here: CONFIG_X86_MCE_P4THERMAL is 32bit specific
and X86_MCE_INTEL is 64bit specific so the appropriate config variable
will be set by Kconfig.
- the APIC_ESR register clearing in apic_64.c now uses not straightforward
way but this is allowed tradeoff.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
x86: add MAP_STACK mmap flag
x86: fix section mismatch warning - spp_getpage()
x86: change init_gdt to update the gdt via write_gdt, rather than a direct write.
x86-64: fix overlap of modules and fixmap areas
x86, geode-mfgpt: check IRQ before using MFGPT as clocksource
x86, acpi: cleanup, temp_stack is used only when CONFIG_SMP is set
x86: fix spin_is_contended()
x86, nmi: clean UP NMI watchdog failure message
x86, NMI: fix watchdog failure message
x86: fix /proc/meminfo DirectMap
x86: fix readb() et al compile error with gcc-3.2.3
arch/x86/Kconfig: clean up, experimental adjustement
x86: invalidate caches before going into suspend
x86, perfctr: don't use CCCR_OVF_PMI1 on Pentium 4Ds
x86, AMD IOMMU: initialize dma_ops after sysfs registration
x86m AMD IOMMU: cleanup: replace LOW_U32 macro with generic lower_32_bits
x86, AMD IOMMU: initialize device table properly
x86, AMD IOMMU: use status bit instead of memory write-back for completion wait
x86: silence mmconfig printk
x86, msr: fix NULL pointer deref due to msr_open on nonexistent CPUs
...
- remove redundant read of APIC_LVR register in 64bit mode
- APIC is always integrated for 64bit mode so
gcc will eliminate lapic_is_integrated call
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
No changes on binary level
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If a kernel thread is preempted in single-cpu mode right after the NOP (nop
about to be turned into a lock prefix), then we CPU hotplug a CPU, and then the
thread is scheduled back again, a SMP-unsafe atomic operation will be used on
shared SMP variables, leading to corruption. No corruption would happen in the
reverse case : going from SMP to UP is ok because we split a bit instruction
into tiny pieces, which does not present this condition.
Changing the 0x90 (single-byte nop) currently used into a 0x3E DS segment
override prefix should fix this issue. Since the default of the atomic
instructions is to use the DS segment anyway, it should not affect the
behavior.
The exception to this are references that use ESP/RSP and EBP/RBP as
the base register (they will use the SS segment), however, in Linux
(a) DS == SS at all times, and (b) we do not distinguish between
segment violations reported as #SS as opposed to #GP, so there is no
need to disassemble the instruction to figure out the suitable segment.
This patch assumes that the 0x3E prefix will leave atomic operations as-is (thus
assuming they normally touch data in the DS segment). Since there seem to be no
obvious ill-use of other segment override prefixes for atomic operations, it
should be safe. It can be verified with a quick
grep -r LOCK_PREFIX include/asm-x86/
grep -A 1 -r LOCK_PREFIX arch/x86/
Taken from
This source :
AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System
Instructions
States
"Instructions that Reference a Non-Stack Segment—If an instruction encoding
references any base register other than rBP or rSP, or if an instruction
contains an immediate offset, the default segment is the data segment (DS).
These instructions can use the segment-override prefix to select one of the
non-default segments, as shown in Table 1-5."
Therefore, forcing the DS segment on the atomic operations, which already use
the DS segment, should not change.
This source :
http://wiki.osdev.org/X86_Instruction_Encoding
States
"In 64-bit the CS, SS, DS and ES segment overrides are ignored."
Confirmed by "AMD 64-Bit Technology" A.7
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/x86-64_overview.pdf
"In 64-bit mode, the DS, ES, SS and CS segment-override prefixes have no effect.
These four prefixes are no longer treated as segment-override prefixes in the
context of multipleprefix rules. Instead, they are treated as null prefixes."
This patch applies to 2.6.27-rc2, but would also have to be applied to earlier
kernels (2.6.26, 2.6.25, ...).
Performance impact of the fix : tests done on "xaddq" and "xaddl" shows it
actually improves performances on Intel Xeon, AMD64, Pentium M. It does not
change the performance on Pentium II, Pentium 3 and Pentium 4.
Xeon E5405 2.0GHz :
NR_TESTS 10000000
test empty cycles : 162207948
test test 1-byte nop xadd cycles : 170755422
test test DS override prefix xadd cycles : 170000118 *
test test LOCK xadd cycles : 472012134
AMD64 2.0GHz :
NR_TESTS 10000000
test empty cycles : 146674549
test test 1-byte nop xadd cycles : 150273860
test test DS override prefix xadd cycles : 149982382 *
test test LOCK xadd cycles : 270000690
Pentium 4 3.0GHz
NR_TESTS 10000000
test empty cycles : 290001195
test test 1-byte nop xadd cycles : 310000560
test test DS override prefix xadd cycles : 310000575 *
test test LOCK xadd cycles : 1050103740
Pentium M 2.0GHz
NR_TESTS 10000000
test empty cycles : 180000523
test test 1-byte nop xadd cycles : 320000345
test test DS override prefix xadd cycles : 310000374 *
test test LOCK xadd cycles : 480000357
Pentium 3 550MHz
NR_TESTS 10000000
test empty cycles : 510000231
test test 1-byte nop xadd cycles : 620000128
test test DS override prefix xadd cycles : 620000110 *
test test LOCK xadd cycles : 800000088
Pentium II 350MHz
NR_TESTS 10000000
test empty cycles : 200833494
test test 1-byte nop xadd cycles : 340000130
test test DS override prefix xadd cycles : 340000126 *
test test LOCK xadd cycles : 530000078
Speed test modules can be found at
http://ltt.polymtl.ca/svn/trunk/tests/kernel/test-prefix-speed-32.chttp://ltt.polymtl.ca/svn/trunk/tests/kernel/test-prefix-speed.c
Macro-benchmarks
2.0GHz E5405 Core 2 dual Quad-Core Xeon
Summary
* replace smp lock prefixes with DS segment selector prefixes
no lock prefix (s) with lock prefix (s) Speedup
make -j1 kernel/ 33.94 +/- 0.07 34.91 +/- 0.27 2.8 %
hackbench 50 2.99 +/- 0.01 3.74 +/- 0.01 25.1 %
* replace smp lock prefixes with 0x90 nops
no lock prefix (s) with lock prefix (s) Speedup
make -j1 kernel/ 34.16 +/- 0.32 34.91 +/- 0.27 2.2 %
hackbench 50 3.00 +/- 0.01 3.74 +/- 0.01 24.7 %
Detail :
1 CPU, replace smp lock prefixes with DS segment selector prefixes
make -j1 kernel/
real 0m34.067s
user 0m30.630s
sys 0m2.980s
real 0m33.867s
user 0m30.582s
sys 0m3.024s
real 0m33.939s
user 0m30.738s
sys 0m2.876s
real 0m33.913s
user 0m30.806s
sys 0m2.808s
avg : 33.94s
std. dev. : 0.07s
hackbench 50
Time: 2.978
Time: 2.982
Time: 3.010
Time: 2.984
Time: 2.982
avg : 2.99
std. dev. : 0.01
1 CPU, noreplace-smp
make -j1 kernel/
real 0m35.326s
user 0m30.630s
sys 0m3.260s
real 0m34.325s
user 0m30.802s
sys 0m3.084s
real 0m35.568s
user 0m30.722s
sys 0m3.168s
real 0m34.435s
user 0m30.886s
sys 0m2.996s
avg.: 34.91s
std. dev. : 0.27s
hackbench 50
Time: 3.733
Time: 3.750
Time: 3.761
Time: 3.737
Time: 3.741
avg : 3.74
std. dev. : 0.01
1 CPU, replace smp lock prefixes with 0x90 nops
make -j1 kernel/
real 0m34.139s
user 0m30.782s
sys 0m2.820s
real 0m34.010s
user 0m30.630s
sys 0m2.976s
real 0m34.777s
user 0m30.658s
sys 0m2.916s
real 0m33.924s
user 0m30.634s
sys 0m2.924s
real 0m33.962s
user 0m30.774s
sys 0m2.800s
real 0m34.141s
user 0m30.770s
sys 0m2.828s
avg : 34.16
std. dev. : 0.32
hackbench 50
Time: 2.999
Time: 2.994
Time: 3.004
Time: 2.991
Time: 2.988
avg : 3.00
std. dev. : 0.01
I did more runs (20 runs of each) to compare the nop case to the DS
prefix case. Results in seconds. They actually does not seems to show a
significant difference.
NOP
34.155
33.955
34.012
35.299
35.679
34.141
33.995
35.016
34.254
33.957
33.957
34.008
35.013
34.494
33.893
34.295
34.314
34.854
33.991
34.132
DS
34.080
34.304
34.374
35.095
34.291
34.135
33.940
34.208
35.276
34.288
33.861
33.898
34.610
34.709
33.851
34.256
35.161
34.283
33.865
35.078
Used http://www.graphpad.com/quickcalcs/ttest1.cfm?Format=C to do the
T-test (yeah, I'm lazy) :
Group Group One (DS prefix) Group Two (nops)
Mean 34.37815 34.37070
SD 0.46108 0.51905
SEM 0.10310 0.11606
N 20 20
P value and statistical significance:
The two-tailed P value equals 0.9620
By conventional criteria, this difference is considered to be not statistically significant.
Confidence interval:
The mean of Group One minus Group Two equals 0.00745
95% confidence interval of this difference: From -0.30682 to 0.32172
Intermediate values used in calculations:
t = 0.0480
df = 38
standard error of difference = 0.155
So, unless these calculus are completely bogus, the difference between the nop
and the DS case seems not to be statistically significant.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: H. Peter Anvin <hpa@zytor.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Roland McGrath <roland@redhat.com>
CC: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
CC: Steven Rostedt <srostedt@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: David Miller <davem@davemloft.net>
CC: Ulrich Drepper <drepper@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Gregory Haskins <ghaskins@novell.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
CC: Clark Williams <williams@redhat.com>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Andi Kleen <andi@firstfloor.org>
CC: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
By writing directly, a memory access violation can occur whilst
hotplugging a CPU if the entry was previously marked read-only.
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix coding style of traps_64.c with improvements suggested by Ingo.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ftrace depends on some processor state that we destroyed during kexec and
restored by restore_processor_state(). So save_processor_state() and
restore_processor_state() are moved into machine_kexec() and ftrace is
restored after restore_processor_state().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kexec/Kexec-jump require code size in control page is less than
PAGE_SIZE/2. This patch add link-time checking for this.
ASSERT() of ld link script is used as the link-time checking mechanism.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control
page is used for not only code on some platform. For example in kexec
jump, it is used for data and stack too.
[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Plus add a build time check so this doesn't go unnoticed again.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dereference took place in code part responsible for manual installation
of microcode patches through /dev/cpu/microcode.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Adds a simple IRQ autodetection to the AMD Geode MFGPT driver, and more
importantly, adds some checks, if IRQs can actually be received on the
chosen line. This fixes cases where MFGPT is selected as clocksource
though not producing any ticks, so the kernel simply starves during
boot.
Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de>
Cc: Andres Salomon <dilinger@debian.org>
Cc: linux-geode@bombadil.infradead.org
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/acpi/sleep.c:24: warning: 'temp_stack' defined but not used
[ Sven Wegener <sven.wegener@stealer.net>: fix build bug ]
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
prefill_possible_map() is defined inside CONFIG_HOTPLUG_CPU,
so the nesting CONFIG_HOTPLUG_CPU is just redundant.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Allow x86 to support a built-in kernel command line. The built-in
command line can override the one provided by the boot loader, for
those cases where the boot loader is broken or it is difficult
to change the command line in the the boot loader.
H. Peter Anvin wrote:
> Ingo Molnar wrote:
>> Best would be to make it really apparent in the code that nothing
>> changes if this config option is not set. Preferably there should be
>> no extra code at all in that case.
>>
>
> I would like to see this:
[...Nested ifdefs...]
OK. This version changes absolutely nothing if CONFIG_CMDLINE_BOOL is not
set (the default). Also, no space is appended even when CONFIG_CMDLINE_BOOL
is set, but the builtin string is empty. This is less sloppy all the way
around, IMHO.
Note that I use the same option names as on other arches for
this feature.
[ mingo@elte.hu: build fix ]
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When a CPU core is shut down, all of its caches need to be flushed
to prevent stale data from causing errors if the core is resumed.
Current Linux suspend code performs an assignment after the flush,
which can add dirty data back to the cache. On some AMD platforms,
additional speculative reads have caused crashes on resume because
of this dirty data.
Relocate the cache flush to be the very last thing done before
halting. Tie into an assembly line so the compile will not
reorder it. Add some documentation explaining what is going
on and why we're doing this.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Acked-by: Mark Borden <mark.borden@amd.com>
Acked-by: Michael Hohmuth <michael.hohmuth@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, setup_p4_watchdog() use CCCR_OVF_PMI1 to enable the counter
overflow interrupts to the second logical core. But this bit doesn't work
on Pentium 4 Ds (model 4, stepping 4) and this patch avoids its use on
these processors. Tested on 4 different machines that have this
specific model with success.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: jvillalovos@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If sysfs registration fails all memory used by IOMMU is freed. This
happens after dma_ops initialization and the functions will access the
freed memory then.
Fix this by initializing dma_ops after the sysfs registration.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds device table initializations which forbids memory accesses
for devices per default and disables all page faults.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We are able to use clock_event_device as it's done in
64bit apic code so lets get rid of local_apic_timer_verify_ok
variable.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no need to clear APIC twice since
disable_local_APIC will clear it anyway.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To be able to unify this function we RE-introduce
APIC_DIVISOR for 64bit mode. This snipped was eliminated
in some time ago in a sake of clenup but now we need it
again since it allow up to get rid of #ifdef(s).
And lapic_is_integrated call is added in apic_64.c but
since we always have APIC integrated on 64bit cpu compiler
will ignore this call.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Get rid of local_apic_timer_disabled and use disable_apic_timer instead.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
msr_open tests for someone trying to open a device for a nonexistent CPU.
However, the function always returns 0, not ret like it should, hence
userspace can BUG the kernel trivially. This bug was introduced by the
cdev lock_kernel pushdown patch last May.
The BUG can be reproduced with these commands:
# mknod fubar c 202 8 <-- pick a number less than NR_CPUS that is not
the number of an online CPU
# cat fubar
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
AMD SB700 based systems with spread spectrum enabled use a SMM based
HPET emulation to provide proper frequency setting. The SMM code is
initialized with the first HPET register access and takes some time to
complete. During this time the config register reads 0xffffffff. We
check for max. 1000 loops whether the config register reads a non
0xffffffff value to make sure that HPET is up and running before we go
further. A counting loop is safe, as the HPET access takes thousands
of CPU cycles. On non SB700 based machines this check is only done
once and has no side effects.
Based on a quirk patch from: crane cai <crane.cai@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
clear bits for cpu nr > 8.
This allows us to boot the full range of possible CPUs that the
supported APIC model will allow. Previously we'd hang or boot up
with less than 8 CPUs.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For some reason we had two parsers registered for maxcpus=. One in init/main.c
and another in arch/x86/smpboot.c. So I nuked the one in arch/x86.
Also 64-bit kernels used to handle maxcpus= as documented in
Documentation/cpu-hotplug.txt. CPUs with 'id > maxcpus' are initialized
but not booted. 32-bit version for some reason ignored them even though
all the infrastructure for booting them later is there.
In the current mainline both 64 and 32 bit versions are broken.
This patch restores the correct behaviour. I've tested x86_64 version on
4- and 8- way Core2 and 2-way Opteron based machines. Various config
combinations SMP, !SMP, CPU_HOTPLUG, !CPU_HOTPLUG.
Booted with maxcpus=1 and maxcpus=4, etc. Everything is working as expected.
So far we've received two reports from different people confirming that 32-bit
version also works fine, both on dual core laptops and 16way server machines.
[v2: This version fixes visws breakage pointed out by Ingo.]
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Cc: lizf@cn.fujitsu.com
Cc: jeff.chua.linux@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All these structure sizes are runtime determined. So use a runtime
bug check.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fxsave/xsave instructions will not touch all the bytes in the
fxsave/xsave frame. Clear the user buffer before doing fxsave/xsave
directly to user buffer during the sigcontext setup.
This is essentially needed in the context of xsave(for example,
some of the fields in the xsave header are not touched by the xsave
and defined as must be zero).
This will also present uniform and clean context to the user (from
which user can safely do fxrstor/xrstor).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
save_i387_xstate() is already doing the required access_ok(). Remove
the redundant access_ok() before it.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The microcode stores its date in a uint32_t in some weird order
approximating pdp-endian. Rather than printing it like that, print it
properly in ISO standard form.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
SGI UV will have MMCFG base addresses that are greater than 4GB (32 bits).
v2: Use CONFIG_RESOURCES_64BIT instead of CONFIG_X86_64.
v3: Create a flag, that is set by platform specific code,
to disable the > 4GB check.
Signed-off-by: John Keller <jpk@sgi.com>
Cc: jpk@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0xcd1f): Section mismatch in reference from the function find_and_reserve_crashkernel() to the function .init.text:find_e820_area()
The function find_and_reserve_crashkernel() references
the function __init find_e820_area().
This is often because find_and_reserve_crashkernel lacks a __init
annotation or the annotation of find_e820_area is wrong.
WARNING: vmlinux.o(.text+0xcd38): Section mismatch in reference from the function find_and_reserve_crashkernel() to the function .init.text:reserve_bootmem_generic()
The function find_and_reserve_crashkernel() references
the function __init reserve_bootmem_generic().
This is often because find_and_reserve_crashkernel lacks a __init
annotation or the annotation of reserve_bootmem_generic is wrong.
find_and_reserve_crashkernel is called from __init function (reserve_crashkernel)
and calls 2 __init functions (find_e820_area, reserve_bootmem_generic),
so mark it __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x14cf8): Section mismatch in reference from the function map_high() to the function .init.text:init_extra_mapping_uc()
The function map_high() references
the function __init init_extra_mapping_uc().
This is often because map_high lacks a __init
annotation or the annotation of init_extra_mapping_uc is wrong.
WARNING: vmlinux.o(.text+0x14d05): Section mismatch in reference from the function map_high() to the function .init.text:init_extra_mapping_wb()
The function map_high() references
the function __init init_extra_mapping_wb().
This is often because map_high lacks a __init
annotation or the annotation of init_extra_mapping_wb is wrong.
map_high is called only from __init functions (map_*_high)
and calls 2 __init_functions (init_extra_mapping_*)
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix 2.6.27rc1 cannot boot more than 8CPUs
x86: make "apic" an early_param() on 32-bit, NULL check
EFI, x86: fix function prototype
x86, pci-calgary: fix function declaration
x86: work around gcc 3.4.x bug
x86: make "apic" an early_param() on 32-bit
x86, debug: tone down arch/x86/kernel/mpparse.c debugging printk
x86_64: restore the proper NR_IRQS define so larger systems work.
x86: Restore proper vector locking during cpu hotplug
x86: Fix broken VMI in 2.6.27-rc..
x86: fdiv bug detection fix
Jeff Chua reported that booting a !bigsmp kernel on a 16-way box
hangs silently.
this is a long-standing issue, smp start AP cpu could check the
apic id >=8 etc before trying to start it.
achieve this by moving the def_to_bigsmp check later and skip the
apicid id > 8
[ mingo@elte.hu: clean up the message that is printed. ]
Reported-by: "Jeff Chua" <jeff.chua.linux@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/setup.c | 6 ------
arch/x86/kernel/smpboot.c | 10 ++++++++++
2 files changed, 10 insertions(+), 6 deletions(-)
Exception stacks are allocated each time a CPU is set online.
But the allocated space is never freed. Thus with one CPU hotplug
offline/online cycle there is a memory leak of 24K (6 pages) for
a CPU.
Fix is to allocate exception stacks only once -- when the CPU is
set online for the first time.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pda->irqstackptr is allocated whenever a CPU is set online.
But it is never freed. This results in a memory leak of 16K
for each CPU offline/online cycle.
Fix is to allocate pda->irqstackptr only once.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cyrill Gorcunov observed:
> you turned it into early_param so now it's NULL injecting vulnerabled.
> Could you please add checking for NULL str param?
fix that.
Also, change the name of 'str' into 'arg', to make it more apparent
that this is an optional argument that can be NULL, not a string
parameter that is empty when unset.
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix function declaration:
linux-next-20080807/arch/x86/kernel/pci-calgary_64.c:1353:36: warning: non-ANSI function declaration of function 'get_tce_space_from_tar'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On 32-bit, "apic" is a __setup() param meaning it is parsed rather
late in the game. Make it an early_param() for apic_printk() use
by arch/x86/kernel/mpparse.c.
On 64-bit, it already is an early_param().
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
commit 11a62a0560 turns some formerly
nopped debugging printks in arch/x86/kernel/mppparse.c into regular
ones. The one at the top of smp_scan_config() in particular also
prints on !CONFIG_SMP/CONFIG_X86_LOCAL_APIC kernels and UP machines
without anything resembling MP tables which makes their lowly UP
owners wonder...
Turn the former Dprintk()s into apic_printk()s instead meaning that
their printing is dependent on passing the apic=verbose (or =debug)
command line param.
On 32-bit, "apic" is a __setup() param which isn't early enough
for this code and therefore needs a followup changing it into an
early_param(). On 64-bit, it already is.
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
64bit mode APIC interrupt handlers are set within irqinit_64.c.
Lets do tha same for 32bit mode which would help in furter code merging.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Having cpu_online_map change during assign_irq_vector can result
in some really nasty and weird things happening. The one that
bit me last time was accessing non existent per cpu memory for non
existent cpus.
This locking was removed in a sloppy x86_64 and x86_32 merge patch.
Guys can we please try and avoid subtly breaking x86 when we are
merging files together?
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
WARNING: vmlinux.o(.text+0x118f7): Section mismatch in reference from the function construct_ioapic_table() to the function .init.text:MP_bus_info()
The function construct_ioapic_table() references
the function __init MP_bus_info().
This is often because construct_ioapic_table lacks a __init
annotation or the annotation of MP_bus_info is wrong.
construct_ioapic_table is called only from construct_default_ISA_mptable which is __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1591): Section mismatch in reference from the function init_amd() to the function .init.text:check_enable_amd_mmconf_dmi()
The function __cpuinit init_amd() references
a function __init check_enable_amd_mmconf_dmi().
If check_enable_amd_mmconf_dmi is only used by init_amd then
annotate check_enable_amd_mmconf_dmi with a matching annotation.
check_enable_amd_mmconf_dmi is only called from init_amd which is __cpuinit
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1fe7): Section mismatch in reference from the function MP_processor_info() to the variable .init.data:x86_quirks
The function __cpuinit MP_processor_info() references
a variable __initdata x86_quirks.
If x86_quirks is only used by MP_processor_info then
annotate x86_quirks with a matching annotation.
MP_processor_info uses x86_quirks which is __init and is used only from
smp_read_mpc and construct_default_ISA_mptable which are __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
WARNING: vmlinux.o(.text+0x7950): Section mismatch in reference from the function native_calibrate_tsc() to the function .init.text:tsc_read_refs()
The function native_calibrate_tsc() references
the function __init tsc_read_refs().
This is often because native_calibrate_tsc lacks a __init
annotation or the annotation of tsc_read_refs is wrong.
tsc_read_refs is called from native_calibrate_tsc which is not __init
and native_calibrate_tsc cannot be marked __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The lowmem mapping table created by VMI need not depend on max_low_pfn
at all. Instead we now create an extra large mapping which covers all
possible lowmem instead of the physical ram that is actually available.
This allows the vmi initialization to be done before max_low_pfn could
be computed. We also move the vmi_init code very early in the boot process
so that nobody accidentally breaks the fixmap dependancy.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch provides support for the _PSD ACPI object in the Powernow-k8
driver. Although it looks like an invasive patch, most of it is
simply the consequence of turning the static acpi_performance_data
structure into a pointer.
AMD has tested it on several machines over the past few days without issue.
[trivial checkpatch warnings fixed up by davej]
[X86_POWERNOW_K8_ACPI=n buildfix from Randy Dunlap]
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Frank Arnold <frank.arnold@amd.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>
arch/x86/kernel/cpu/cpufreq/elanfreq.c:47:26: warning: symbol 'elan_multiplier' was not declared. Should it be static?
Yes, yes it should.
Signed-off-by: Dave Jones <davej@redhat.com>
The fdiv detection code writes s32 integer into
the boot_cpu_data.fdiv_bug.
However, the boot_cpu_data.fdiv_bug is only char (s8)
field so the detection overwrites already set fields for
other bugs, e.g. the f00f bug field.
Use local s32 variable to receive result.
This is a partial fix to Bugzilla #9928 - fixes wrong
information about the f00f bug (tested) and probably
for coma bug (I have no cpu to test this).
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix all errors and many warnings reported by checkpatch.pl
without change sys_x86_64.o
arch/x86/kernel/sys_x86_64.o:
text data bss dec hex filename
1567 0 0 1567 61f sys_x86_64.o.after
1567 0 0 1567 61f sys_x86_64.o.before
md5:
de28ffedcb5851dfd7ec87a03afec1fd sys_x86_64.o.after
de28ffedcb5851dfd7ec87a03afec1fd sys_x86_64.o.before
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix all errors and many warnings reported by checkpath.pl.
Except the change of include <asm/io.h> to <linux/io.h>
the traps.o before and after changes are the same.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix all errors and many warnings reported by checkpatch.pl
without change signal_64.o
arch/x86/kernel/signal_64.o
text data bss dec hex filename
5143 0 8 5151 141f signal_64.o.after
5143 0 8 5151 141f signal_64.o.before
md5:
e68718092b3641cb27e79e55ce57e3ad signal_64.o.after
e68718092b3641cb27e79e55ce57e3ad signal_64.o.before
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The XSAVE feature mask is a 64-bit number; keep it that way, in order
to avoid the mistake done with rdmsr/wrmsr. Use the xsetbv() function
provided in the previous patch.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
FP/SSE bits may be zero in the xsave header(representing the init state).
Update these bits during the ptrace fpregs set operation, to indicate the
non-init state.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On cpu's supporting xsave/xrstor, fpstate pointer in the sigcontext, will
include the extended state information along with fpstate information. Presence
of extended state information is indicated by the presence
of FP_XSTATE_MAGIC1 at fpstate.sw_reserved.magic1 and FP_XSTATE_MAGIC2
at fpstate + (fpstate.sw_reserved.extended_size - FP_XSTATE_MAGIC2_SIZE).
Extended feature bit mask that is saved in the memory layout is represented
by the fpstate.sw_reserved.xstate_bv
For RT signal frames, UC_FP_XSTATE in the uc_flags also indicate the
presence of extended state information in the sigcontext's fpstate
pointer.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move 64bit routines that saves/restores fpstate in/from user stack from
signal_64.c to xsave.c
restore_i387_xstate() now handles the condition when user passes
NULL fpstate.
Other misc changes for prepartion of xsave/xrstor sigcontext support.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
dynamically allocate fpstate on the stack, instead of static allocation
in the current sigframe layout on the user stack. This will allow the
fpstate structure to grow in the future, which includes extended state
information supporting xsave/xrstor.
signal handlers will be able to access the fpstate pointer from the
sigcontext structure asusual, with no change. For the non RT sigframe's
(which are supported only for 32bit apps), current static fpstate layout
in the sigframe will be unused(so that we don't change the extramask[]
offset in the sigframe and thus prevent breaking app's which modify
extramask[]).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Uses xsave/xrstor (instead of traditional fxsave/fxrstor) in context switch
when available.
Introduces TS_XSAVE flag, which determine the need to use xsave/xrstor
instructions during context switch instead of the legacy fxsave/fxrstor
instructions. Thread-synchronous status word is already in L1 cache during
this code patch and thus minimizes the performance penality compared to
(cpu_has_xsave) checks.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Enables xsave/xrstor by turning on cr4.osxsave on cpu's which have
the xsave support. For now, features that OS supports/enabled are
FP and SSE.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Exports needed by the GRU driver.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This IOMMU helper function doesn't work for some architectures:
http://marc.info/?l=linux-kernel&m=121699304403202&w=2
It also breaks POWER and SPARC builds:
http://marc.info/?l=linux-kernel&m=121730388001890&w=2
Currently, only x86 IOMMUs use this so let's move it to x86 for
now.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix tons of build errors:
arch/x86/kernel/built-in.o: In function `microcode_fini_cpu':
microcode_intel.c:(.text+0x11598): undefined reference to `microcode_mutex'
microcode_intel.c:(.text+0x115a4): undefined reference to `ucode_cpu_info'
microcode_intel.c:(.text+0x115ae): undefined reference to `ucode_cpu_info'
microcode_intel.c:(.text+0x115bc): undefined reference to `microcode_mutex'
[...]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/microcode.c:412: error: static declaration of ‘microcode_init’ follows non-static declaration
include/asm/microcode.h:1: error: previous declaration of ‘microcode_init’ was here
arch/x86/kernel/microcode.c:454: error: static declaration of ‘microcode_exit’ follows non-static declaration
include/asm/microcode.h:2: error: previous declaration of ‘microcode_exit’ was here
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits)
x86/PCI: use dev_printk when possible
PCI: add D3 power state avoidance quirk
PCI: fix bogus "'device' may be used uninitialized" warning in pci_slot
PCI: add an option to allow ASPM enabled forcibly
PCI: disable ASPM on pre-1.1 PCIe devices
PCI: disable ASPM per ACPI FADT setting
PCI MSI: Don't disable MSIs if the mask bit isn't supported
PCI: handle 64-bit resources better on 32-bit machines
PCI: rewrite PCI BAR reading code
PCI: document pci_target_state
PCI hotplug: fix typo in pcie hotplug output
x86 gart: replace to_pages macro with iommu_num_pages
x86, AMD IOMMU: replace to_pages macro with iommu_num_pages
iommu: add iommu_num_pages helper function
dma-coherent: add documentation to new interfaces
Cris: convert to using generic dma-coherent mem allocator
Sh: use generic per-device coherent dma allocator
ARM: support generic per-device coherent dma mem
Generic dma-coherent: fix DMA_MEMORY_EXCLUSIVE
x86: use generic per-device dma coherent allocator
...
Clean up and optimize cpumask_of_cpu(), by sharing all the zero words.
Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns
creating a huge array of constant bitmasks, realize that the zero words
can be shared.
In other words, on a 64-bit architecture, we only ever need 64 of these
arrays - with a different bit set in one single world (with enough zero
words around it so that we can create any bitmask by just offsetting in
that big array). And then we just put enough zeroes around it that we
can point every single cpumask to be one of those things.
So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each,
with one bit set in each array - 2MB memory total), we have exactly 64
arrays instead, each 8k bits in size (64kB total).
And then we just point cpumask(n) to the right position (which we can
calculate dynamically). Once we have the right arrays, getting
"cpumask(n)" ends up being:
static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
{
const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
p -= cpu / BITS_PER_LONG;
return (const cpumask_t *)p;
}
This brings other advantages and simplifications as well:
- we are not wasting memory that is just filled with a single bit in
various different places
- we don't need all those games to re-create the arrays in some dense
format, because they're already going to be dense enough.
if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory
is a non-issue (especially since by doing this "overlapping" trick we
probably get better cache behaviour anyway).
[ mingo@elte.hu:
Converted Linus's mails into a commit. See:
http://lkml.org/lkml/2008/7/27/156http://lkml.org/lkml/2008/7/28/320
Also applied a family filter - which also has the side-effect of leaving
out the bits where Linus calls me an idio... Oh, never mind ;-)
]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch introduces microcode patch loading for AMD
processors. It is based on previous corresponding work
for Intel processors.
It hooks into the general patch loading module. Main
difference is that a container file format is used to hold
all patch data for multiple processors as well as an
equivalent CPU table, which comes seperately, as opposed
to Intel's microcode patching solution.
Kconfig and Makefile have been changed provice config
and build option for new source file.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Refactored code by introducing a two-module solution.
There is one general module in which vendor specific modules can hook into.
However, that is exclusive, there is only one vendor specific module
allowed at a time. A CPU vendor check makes sure only the correct
module for the underlying system gets called.
Functinally in terms of patch loading itself there are no changes. This
refactoring provides a basis for future implementations of other vendors'
patch loaders.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Renamed common structures to vendor specific naming scheme
so other vendors will be able to use the same naming
convention.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Split off existing code into two seperate files. One file holds general
code, the other file vendor specific parts.
No functional changes, only refactoring.
Temporarily Introduced a new module name 'ucode' for result,
due to already taken name 'microcode'.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This structure will be later used by other modules as well and
needs therfore to be moved out to a header file.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Removed typedefs. No functional changes to the code.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Intel specific microcode declarations have been moved to a seperate header file.
There are no code changes to the code itself and no side effects to other parts.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix !PCI build failure:
arch/x86/kernel/cpu/intel_cacheinfo.c: In function 'get_k8_northbridge':
arch/x86/kernel/cpu/intel_cacheinfo.c:675: error: implicit declaration of function 'pci_match_id'
Signed-off-by: Ingo Molnar <mingo@elte.hu>