try to insert_resource second time, by expanding the resource...
for case: e820 reserved entry is partially overlapped with bar res...
hope it will never happen
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: aestetic
Capitalize function call interrupts consistently.
All other descriptions in /proc/interrupts are capitalized except
for "function call interrupts". Capitalize it too for consistency.
While that's technically a published ABI I think the risk of anyone
relying on that text to stay the same is negligible.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
this one replaces:
| commit a2bd7274b4
| Author: Yinghai Lu <yhlu.kernel@gmail.com>
| Date: Mon Aug 25 00:56:08 2008 -0700
|
| x86: fix HPET regression in 2.6.26 versus 2.6.25, check hpet against BAR, v3
v2: insert e820 reserve resources before pnp_system_init
v3: fix merging problem in tip/x86/core
v4: address Linus's review about comments and condition in _late()
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
so could let BAR res register at first, or even pnp.
v2: insert e820 reserve resources before pnp_system_init
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The last changes made the calibration loop 250ms long which is far
too much. Try to do that more clever.
Experiments have shown that using a 10ms delay for the PIT based calibration
gives us a good enough value. If we have a reference (HPET/PMTIMER) and the
result of the PIT and the reference is close enough, then we can break out of
the calibration loop on a match right away and use the reference value.
Otherwise we just loop 3 times and decide then, which value to take.
One caveat is that for virtualized environments the PIT calibration often does
not work at all and I found out that 10us is a bit too short as well for the
reference to give a sane result. The solution here is to make the last loop
longer when the first two PIT calibrations failed.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When calibration against PIT fails, the warning that we print is misleading.
In a virtualized environment the VM may get descheduled while calibration
or, the check in PIT calibration may fail due to other virtualization
overheads.
The warning message explicitly assumes that calibration failed due to SMI's
which may not be the case. Change that to something proper.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manually adding "io_delay=0xed" fixes system lockups in ioapic
mode on this machine.
System Information
Manufacturer: Hewlett-Packard
Product Name: Presario F700 (KA695EA#ABF)
Base Board Information
Manufacturer: Quanta
Product Name: 30D3
Reference:
https://bugzilla.redhat.com/show_bug.cgi?id=459546
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The TSC calibration function is still very complicated, but this makes
it at least a little bit less so by moving the PIT part out into a
helper function of its own.
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
Larry Finger reported at http://lkml.org/lkml/2008/9/1/90:
An ancient laptop of mine started throwing errors from b43legacy when
I started using 2.6.27 on it. This has been bisected to commit bfc0f59
"x86: merge tsc calibration".
The unification of the TSC code adopted mostly the 64bit code, which
prefers PMTIMER/HPET over the PIT calibration.
Larrys system has an AMD K6 CPU. Such systems are known to have
PMTIMER incarnations which run at double speed. This results in a
miscalibration of the TSC by factor 0.5. So the resulting calibrated
CPU/TSC speed is half of the real CPU speed, which means that the TSC
based delay loop will run half the time it should run. That might
explain why the b43legacy driver went berserk.
On the other hand we know about systems, where the PIT based
calibration results in random crap due to heavy SMI/SMM
disturbance. On those systems the PMTIMER/HPET based calibration logic
with SMI detection shows better results.
According to Alok also virtualized systems suffer from the PIT
calibration method.
The solution is to use a more wreckage aware aproach than the current
either/or decision.
1) reimplement the retry loop which was dropped from the 32bit code
during the merge. It repeats the calibration and selects the lowest
frequency value as this is probably the closest estimate to the real
frequency
2) Monitor the delta of the TSC values in the delay loop which waits
for the PIT counter to reach zero. If the maximum value is
significantly different from the minimum, then we have a pretty safe
indicator that the loop was disturbed by an SMI.
3) keep the pmtimer/hpet reference as a backup solution for systems
where the SMI disturbance is a permanent point of failure for PIT
based calibration
4) do the loop iteration for both methods, record the lowest value and
decide after all iterations finished.
5) Set a clear preference to PIT based calibration when the result
makes sense.
The implementation does the reference calibration based on
HPET/PMTIMER around the delay, which is necessary for the PIT anyway,
but keeps separate TSC values to ensure the "independency" of the
resulting calibration values.
Tested on various 32bit/64bit machines including Geode 266Mhz, AMD K6
(affected machine with a double speed pmtimer which I grabbed out of
the dump), Pentium class machines and AMD/Intel 64 bit boxen.
Bisected-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make poll_idle() behave more like the other idle methods.
Currently, poll_idle() returns immediately. The other
idle methods all wait indefinately for some condition
to come true before returning. poll_idle should emulate
these other methods and also wait for a return condition,
in this case, for need_resched() to become 'true'.
Without this delay the idle loop spends all of its time
in the outer loop that calls poll_idle. This outer loop,
these days, does real work, some of it under rcu locks.
That work should only be done when idle is entered and
when idle exits, not continuously while idle is spinning.
Signed-off-by: Joe Korty <joe.korty@ccur.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have had a number of cases where <asm/cpufeature.h> (and its
predecessors) have diverged substantially from the names list in
/proc/cpuinfo. This patch generates the latter from the former.
It retains the option for explicitly overriding the strings, but by
making that require a separate action it should at least be less
likely to happen.
It would be good to do a future pass and rename strings that are
gratuituously different in the kernel (/proc/cpuinfo is a userspace
interface and must remain constant.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
acpi_mcfg_64bit_base_addr is used when CONFIG_PCI_MMCONFIG is enabled.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Return the correct return value when the CPUID driver partially
completes a request (we should return the number of bytes actually
read or written, instead of the error code.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Return the correct return value when the MSR driver partially
completes a request (we should return the number of bytes actually
read or written, instead of the error code.)
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Propagate error (-ENXIO) from smp_call_function_single() in the CPUID
driver. This can happen when a CPU is unplugged while the CPUID
driver is open.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Propagate error (-ENXIO) from smp_call_function_single(). These
errors can happen when a CPU is unplugged while the MSR driver is
open.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
I noticed that my sched_clock() was slow on a number of machine, so I
started looking at cpufreq.
The below seems to fix the problem for me.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Triple-fault and keyboard reset may assert INIT instead of RESET; however
INIT is blocked when Intel VT is enabled. This leads to a partially reset
machine when invoking emergency_restart via sysrq-b: the processor is still
working but other parts of the system are dead.
Default to rebooting via ACPI, which correctly asserts RESET and reboots the
machine.
This is safe since we will fall back to keyboard reset and triple fault if
acpi is not enabled or if the reset is not successful.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It allows paravirt implementations of cpu_disable to share the
cpu_disable_common code, without having to take on board APIC
writes, which may not be appropriate.
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add the new play_dead into smpboot.c, as it fits more cleanly in there
alongside other CONFIG_HOTPLUG functions.
Separate out the common code into its own function.
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The removal of the CPU from the various maps was redundant as it already
happened in cpu_disable.
After cleaning this up, cpu_uninit only resets the tlb state, so rename
it and create a noop version for the X86_64 case (so the two play_deads
can be unified later).
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: crash on non-TSC-equipped CPUs
Don't enable the TSC notifier if we *either*:
1. don't have a CPU, or
2. have a CPU with constant TSC.
In either of those cases, the notifier is either damaging (1) or useless(2).
From: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
During CPU hot-remove the sysfs directory created by
threshold_create_bank(), defined in
arch/x86/kernel/cpu/mcheck/mce_amd_64.c, has to be removed before
its parent directory, created by mce_create_device(), defined in
arch/x86/kernel/cpu/mcheck/mce_64.c . Moreover, when the CPU in
question is hotplugged again, obviously the latter has to be created
before the former. At present, the right ordering is not enforced,
because all of these operations are carried out by CPU hotplug
notifiers which are not appropriately ordered with respect to each
other. This leads to serious problems on systems with two or more
multicore AMD CPUs, among other things during suspend and hibernation.
Fix the problem by placing threshold bank CPU hotplug callbacks in
mce_cpu_callback(), so that they are invoked at the right places,
if defined. Additionally, use kobject_del() to remove the sysfs
directory associated with the kobject created by
kobject_create_and_add() in threshold_create_bank(), to prevent the
kernel from crashing during CPU hotplug operations on systems with
two or more multicore AMD CPUs.
This patch fixes bug #11337.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Andi Kleen <andi@firstfloor.org>
Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
use x2apic id reported by cpuid during topology discovery, instead of the
apic id configured in the APIC. For most of the systems, x2apic id
reported by cpuid leaf 0xb will be same as the physical apic id reported
by the APIC_ID register of the APIC. We follow the suggested guidelines
and use the apic id reported by the cpuid.
No change to non-generic UV platforms, will use the apic id reported in the
APIC_ID register as the cpuid reported apic id's may not be unique.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cpuid leaf 0xb provides extended topology enumeration. This interface provides
the 32-bit x2APIC id of the logical processor and it also provides a new
mechanism to detect SMT and core siblings (which provides increased
addressability).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: work around MTRR mask setting, v2
x86: fix section mismatch warning - uv_cpu_init
x86: fix VMI for early params
x86: fix two modpost warnings in mm/init_64.c
x86: fix 1:1 mapping init on 64-bit (memory hotplug case)
x86: work around MTRR mask setting
x86: PAT Update validate_pat_support for intel CPUs
devmem, x86: PAT Change /dev/mem mmap with O_SYNC to use UC_MINUS
x86: PAT proper tracking of set_memory_uc and friends
x86: fix BUG: unable to handle kernel paging request (numaq_tsc_disable)
x86: export pv_lock_ops non-GPL
x86, mmiotrace: silence section mismatch warning - leave_uniprocessor
x86: use WARN() in arch/x86/kernel
x86: use WARN() in arch/x86/mm/ioremap.c
werror: fix pci calgary
x86: fix oprofile + hibernation badness
x86, SGI UV: hardcode the TLB flush interrupt system vector
x86: fix Xorg startup/shutdown slowdown with PAT
x86: fix "kernel won't boot on a Cyrix MediaGXm (Geode)"
x86 iommu: remove unneeded parenthesis
None of the spinlock API is exported GPL, so there's no reason for
pv_lock_ops to be.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: drago01 <drago01@gmail.com>
improve the debug printout:
- make it actually display something
- print it only once
would be nice to have a WARN_ONCE() facility, to feed such things to
kerneloops.org.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.cpuinit.text+0x3cc4): Section mismatch in reference from the function uv_cpu_init() to the function .init.text:uv_system_init()
The function __cpuinit uv_cpu_init() references
a function __init uv_system_init().
If uv_system_init is only used by uv_cpu_init then
annotate uv_system_init with a matching annotation.
uv_system_init was ment to be called only once, so do it from codepath
(native_smp_prepare_cpus) which is called once, right before activation
of other cpus (smp_init).
Note: old code relied on uv_node_to_blade being initialized to 0,
but it'a not initialized from anywhere.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
alloc_coherent dma_ops callback was added to GART, however, it doesn't
return a size aligned address wrt dma_alloc_coherent, as
DMA-mapping.txt defines. This patch fixes it.
This patch also removes unused gart_map_simple
(dma_mapping_ops->map_simple has gone).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pci-dma.c doesn't use map_simple hook any more so we can remove it
from struct dma_mapping_ops now.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All the x86 DMA-API functions are defined in asm/dma-mapping.h. This patch
moves the dma_*_coherent functions also to this header file because they are
now small enough to do so.
This is done as a separate patch because it also includes some renaming and
restructuring of the dma-mapping.h file.
Signed-off-by: Joerg Roedel <joerg.roede@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All dma_ops implementations support the alloc_coherent and free_coherent
callbacks now. This allows a big simplification of the dma_alloc_coherent
function which is done with this patch. The dma_free_coherent functions is also
cleaned up and calls now the free_coherent callback of the dma_ops
implementation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
[ v2 - x86: make gart_alloc_coherent return zeroed memory
FUJITA Tomonori pointed it out that the dma_alloc_coherent function
should return memory set to zero. This patch adds this to the GART
implementation too. ]
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
while fixing a different bug i moved the call to vmi_init before
early params could be parsed.
This broke the vmi specific commandline parameters.
Fix that, by moving vmi initialization after kernel has got a chance to
parse early parameters.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
microcode_amd.c uses ">> 32" on a 32-bit value, so gcc warns about that.
The code could use something like this *untested* patch.
linux-next-20080821/arch/x86/kernel/microcode_amd.c:229: warning: right shift count >= width of type
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Joshua Hoblitt reported that only 3 GB of his 16 GB of RAM is
usable. Booting with mtrr_show showed us the BIOS-initialized
MTRR settings - which are all wrong.
So the root cause is that the BIOS has not set the mask correctly:
> [ 0.429971] MSR00000200: 00000000d0000000
> [ 0.433305] MSR00000201: 0000000ff0000800
> should be ==> [ 0.433305] MSR00000201: 0000003ff0000800
>
> [ 0.436638] MSR00000202: 00000000e0000000
> [ 0.439971] MSR00000203: 0000000fe0000800
> should be ==> [ 0.439971] MSR00000203: 0000003fe0000800
>
> [ 0.443304] MSR00000204: 0000000000000006
> [ 0.446637] MSR00000205: 0000000c00000800
> should be ==> [ 0.446637] MSR00000205: 0000003c00000800
>
> [ 0.449970] MSR00000206: 0000000400000006
> [ 0.453303] MSR00000207: 0000000fe0000800
> should be ==> [ 0.453303] MSR00000207: 0000003fe0000800
>
> [ 0.456636] MSR00000208: 0000000420000006
> [ 0.459970] MSR00000209: 0000000ff0000800
> should be ==> [ 0.459970] MSR00000209: 0000003ff0000800
So detect this borkage and add the prefix 111.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes the pfn args from 'u32' to 'unsigned long'
on alloc_p*() functions on paravirt_ops, and the corresponding
implementations for Xen and VMI. The prototypes for CONFIG_PARAVIRT=n
are already using unsigned long, so paravirt.h now matches the prototypes
on asm-x86/pgalloc.h.
It shouldn't result in any changes on generated code on 32-bit, with
or without CONFIG_PARAVIRT. On both cases, 'codiff -f' didn't show any
change after applying this patch.
On 64-bit, there are (expected) binary changes only when CONFIG_PARAVIRT
is enabled, as the patch is really supposed to change the size of the
pfn args.
[ v2: KVM_GUEST: use the right parameter type on kvm_release_pt() ]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pentium III and Core Solo/Duo CPUs have an erratum
" Page with PAT set to WC while associated MTRR is UC may consolidate to UC "
which can result in WC setting in PAT to be ineffective. We will disable
PAT on such CPUs, so that we can continue to use MTRR WC setting.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add support for the E820_UNUSABLE memory type, which is defined in
Revision 3.0b (Oct. 10, 2006) of the ACPI Specification on p. 394 Table
14-1:
AddressRangeUnusuable This range of address contains memory in which
errors have been detected. This range must not be used by the OSPM.
Signed-off-by: Joseph Cihula <joseph.cihula@intel.com>
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Gang Wei <gang.wei@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This section mismatch:
>> Seems to be a section mismatch; init_intel() is __cpuinit while
>> numaq_tsc_disable() is __init. Seems to be introduced in:
>>
>> commit 64898a8bad
>> Author: Yinghai Lu <yhlu.kernel@gmail.com>
>> Date: Sat Jul 19 18:01:16 2008 -0700
>>
>> x86: extend and use x86_quirks to clean up NUMAQ code
>
> Oops, I am wrong about numaq_tsc_disable() being __init. Still, I
> believe that Yinghai might be able to say what's really wrong :-)
Would lead to this crash:
BUG: unable to handle kernel paging request at c08a45f0
IP: [<c08a45f0>] numaq_tsc_disable+0x0/0x40
Fixed by the patch below.
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
None of the spinlock API is exported GPL, so there's no reason for
pv_lock_ops to be.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: drago01 <drago01@gmail.com>
after following patch,
commit 1b313f4a6d
Author: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Mon Aug 18 20:45:57 2008 +0400
x86: apic - generic_processor_info
- use physid_set instead of phys_cpu and physids_or
- set phys_cpu_present_map bit AFTER check for allowed
number of processors
- add checking for APIC valid version in 64bit mode
(mostly not needed but added for merging purpose)
- add apic_version definition for 64bit mode which
is used now
we are getting warning for acpi path on 64 bit system.
make the 64-bit side fill in apic_version[] as well.
[ mingo@elte.hu: build fix ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use WARN() instead of a printk+WARN_ON() pair; this way the message
becomes part of the warning section for better reporting/collection.
This also allowed the folding of some if()'s into the WARN()
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix an integer comparison always false warning in the PCI Calgary 64 driver.
A u8 is being compared to something that's 512 by default, resulting in the
following warning:
arch/x86/kernel/pci-calgary_64.c:1285: warning: comparison is always false due to limited range of data type
This was introduced by patch b34e90b8f0.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It is useful for a pv_lock_ops backend to know whether interrupts are
enabled or not in the context a spin_lock is being called. This
allows it to enable interrupts while spinning, which could be
particularly helpful when spinning becomes blocking.
The default implementation just calls the normal spin_lock op,
ignoring the flags.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The UV TLB shootdown mechanism needs a system interrupt vector.
Its vector had been hardcoded as 200, but needs to moved to the reserved
system vector range so that it does not collide with some device vector.
This is still temporary until dynamic system IRQ allocation is provided.
But it will be needed when real UV hardware becomes available and runs 2.6.27.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This is the 1st patch in the series. Here the aim was to avoid any
significant changes, logically-wise.
So it's mainly about generic interface refactoring: e.g. make
microcode_{intel,amd}.c more about arch-specific details and less
about policies like make-sure-we-run-on-a-target-cpu
(no more set_cpus_allowed_ptr() here) and generic synchronization (no
more microcode_mutex here).
All in all, more line have been deleted than added.
4 files changed, 145 insertions(+), 198 deletions(-)
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Don't use get_cpu() at all. Resort to checking a boot-up CPU (#0) in
microcode_{intel,amd}_module_init().
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cyrix MediaGXm/Cx5530 Unicorn Revision 1.19.3B has stopped
booting starting at v2.6.22.
The reason is this commit:
> commit f25f64ed5b
> Author: Juergen Beisert <juergen@kreuzholzen.de>
> Date: Sun Jul 22 11:12:38 2007 +0200
>
> x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.
this commit activated a macro which was dormant before due to (buggy)
macro side-effects.
I've looked through various datasheets and found that the GXm and GXLV
Geode processors don't have an incrementor.
Remove the incrementor setup entirely. As the incrementor value
differs according to clock speed and we would hope that the BIOS
configures it correctly, it is probably the right solution.
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reverts commit 34ae7f35a2, which has
been reported to cause a number of problems. During suspend and resume,
it apparently causes a crash in a CPU hotplug notifier to happen,
although the exact details are sketchy because of the inability to get
good traces during the suspend sequence.
See buzilla entries
http://bugzilla.kernel.org/show_bug.cgi?id=11296http://bugzilla.kernel.org/show_bug.cgi?id=11339
for more examples and details.
[ Mark: "Revert the patch for now. I'm still looking into getting a
reliable reproduction and I do not have a fix at this time." ]
Requested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@inux-foundation.org>
Use X86_FEATURE_NOPL to determine if it is safe to use P6 NOPs in
alternatives. Also, replace table and loop with simple if statement.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The long noops ("NOPL") are supposed to be detected by family >= 6.
Unfortunately, several non-Intel x86 implementations, both hardware
and software, don't obey this dictum. Instead, probe for NOPL
directly by executing a NOPL instruction and see if we get #UD.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The parenthesis in __iommu_queue_command() are not needed when assigning
into 'target' variable.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/early_printk.c:404:13: warning: incorrect type in assignment (different base types)
arch/x86/kernel/early_printk.c:404:13: expected restricted __le16 [assigned] [usertype] wValue
arch/x86/kernel/early_printk.c:404:13: got int [signed] value
arch/x86/kernel/early_printk.c:405:13: warning: incorrect type in assignment (different base types)
arch/x86/kernel/early_printk.c:405:13: expected restricted __le16 [assigned] [usertype] wIndex
arch/x86/kernel/early_printk.c:405:13: got int [signed] index
arch/x86/kernel/early_printk.c:406:14: warning: incorrect type in assignment (different base types)
arch/x86/kernel/early_printk.c:406:14: expected restricted __le16 [assigned] [usertype] wLength
arch/x86/kernel/early_printk.c:406:14: got int [signed] size
arch/x86/kernel/early_printk.c:845:16: warning: Using plain integer as NULL pointer
arch/x86/kernel/early_printk.c:992:13: warning: symbol 'enable_debug_console' was not declared. Should it be static?
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Just add parenthesis to be identical of current
64bit implementation (so diff will not complain).
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- use physid_set instead of phys_cpu and physids_or
- set phys_cpu_present_map bit AFTER check for allowed
number of processors
- add checking for APIC valid version in 64bit mode
(mostly not needed but added for merging purpose)
- add apic_version definition for 64bit mode which
is used now
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We use 32bit code former for 64bit
mode since it's much better implementation
and easier to merge.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds some configuration options that allow to compile out
CPU vendor-specific code in x86 kernels (in arch/x86/kernel/cpu). The
new configuration options are only visible when CONFIG_EMBEDDED is
selected, as they are mostly interesting for space savings reasons.
An example of size saving, on x86 with only Intel CPU support:
text data bss dec hex filename
1125479 118760 212992 1457231 163c4f vmlinux.old
1121355 116536 212992 1450883 162383 vmlinux
-4124 -2224 0 -6348 -18CC +/-
However, I'm not exactly sure that the Kconfig wording is correct with
regard to !64BIT / 64BIT.
[ mingo@elte.hu: convert macro to inline ]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/cpu/intel.c defines a few fallback functions
(cmpxchg_*()) that are used when the CPU doesn't support cmpxchg
and/or cmpxchg64 natively. However, while defined in an Intel-specific
file, these functions are also used for CPUs from other vendors when
they don't support cmpxchg and/or cmpxchg64. This breaks the
compilation when support for Intel CPUs is disabled.
This patch moves these functions to a new
arch/x86/kernel/cpu/cmpxchg.c file, unconditionally compiled when
X86_32 is enabled.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: michael@free-electrons.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
movsl_mask is currently defined in arch/x86/kernel/cpu/intel.c, which
contains code specific to Intel CPUs. However, movsl_mask is used in
the non-CPU specific code in arch/x86/lib/usercopy_32.c, which breaks
the compilation when support for Intel CPUs is compiled out.
This patch solves this problem by moving movsl_mask's definition close
to its users in arch/x86/lib/usercopy_32.c.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: michael@free-electrons.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x27032): Section mismatch in reference from the function get_tce_space_from_tar() to the function .init.text:calgary_bus_has_devices()
The function get_tce_space_from_tar() references
the function __init calgary_bus_has_devices().
This is often because get_tce_space_from_tar lacks a __init
annotation or the annotation of calgary_bus_has_devices is wrong.
get_tce_space_from_tar is called only from __init function (calgary_init)
and calls __init function (calgary_bus_has_devices).
So annotate it properly.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Chandru Siddalingappa <chandru@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Take out part of get_local_pda referencing __init function (free_bootmem)
to new (static) function marked as __ref. It's safe to do because free_bootmem
is called before __init sections are dropped.
WARNING: vmlinux.o(.cpuinit.text+0x3cd7): Section mismatch in reference from the function get_local_pda() to the function .init.text:free_bootmem()
The function __cpuinit get_local_pda() references
a function __init free_bootmem().
If free_bootmem is only used by get_local_pda then
annotate free_bootmem with a matching annotation.
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/power/cpu_32.c __save_processor_state calls read_cr4()
only a i486 CPU doesn't have the CR4 register. Trying to read it
produces an invalid opcode oops during suspend to disk.
Use the safe rc4 reading op instead. If the value to be written is
zero the write is skipped.
arch/x86/power/hibernate_asm_32.S
done: swapped the use of %eax and %ecx to use jecxz for
the zero test and jump over store to %cr4.
restore_image: s/%ecx/%eax/ to be consistent with done:
In addition to __save_processor_state, acpi_save_state_mem,
efi_call_phys_prelog, and efi_call_phys_epilog had checks added
(acpi restore was in assembly and already had a check for
non-zero). There were other reads and writes of CR4, but MCE and
virtualization shouldn't be executed on a i486 anyway.
Signed-off-by: David Fries <david@fries.net>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x118f7): Section mismatch in reference from the function construct_ioapic_table() to the function .init.text:MP_bus_info()
The function construct_ioapic_table() references
the function __init MP_bus_info().
This is often because construct_ioapic_table lacks a __init
annotation or the annotation of MP_bus_info is wrong.
construct_ioapic_table is called only from construct_default_ISA_mptable which is __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1591): Section mismatch in reference from the function init_amd() to the function .init.text:check_enable_amd_mmconf_dmi()
The function __cpuinit init_amd() references
a function __init check_enable_amd_mmconf_dmi().
If check_enable_amd_mmconf_dmi is only used by init_amd then
annotate check_enable_amd_mmconf_dmi with a matching annotation.
check_enable_amd_mmconf_dmi is only called from init_amd which is __cpuinit
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1fe7): Section mismatch in reference from the function MP_processor_info() to the variable .init.data:x86_quirks
The function __cpuinit MP_processor_info() references
a variable __initdata x86_quirks.
If x86_quirks is only used by MP_processor_info then
annotate x86_quirks with a matching annotation.
MP_processor_info uses x86_quirks which is __init and is used only from
smp_read_mpc and construct_default_ISA_mptable which are __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x7950): Section mismatch in reference from the function native_calibrate_tsc() to the function .init.text:tsc_read_refs()
The function native_calibrate_tsc() references
the function __init tsc_read_refs().
This is often because native_calibrate_tsc lacks a __init
annotation or the annotation of tsc_read_refs is wrong.
tsc_read_refs is called from native_calibrate_tsc which is not __init
and native_calibrate_tsc cannot be marked __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch changes GART IOMMU to return a size aligned address wrt
dma_alloc_coherent, as DMA-mapping.txt defines:
The cpu return address and the DMA bus master address are both
guaranteed to be aligned to the smallest PAGE_SIZE order which
is greater than or equal to the requested size. This invariant
exists (for example) to guarantee that if you allocate a chunk
which is smaller than or equal to 64 kilobytes, the extent of the
buffer you receive will not cross a 64K boundary.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Rearrange functions and comments to find differences
easier.
Also use apic_printk in setup_boot_APIC_clock for
64bit mode.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- Remove redundant masking of APIC_LVTTHMR register in apic_32.c
- Add masking of APIC_LVTTHMR register to apic_64.c. We use a bit
complicated #ifdef here: CONFIG_X86_MCE_P4THERMAL is 32bit specific
and X86_MCE_INTEL is 64bit specific so the appropriate config variable
will be set by Kconfig.
- the APIC_ESR register clearing in apic_64.c now uses not straightforward
way but this is allowed tradeoff.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
x86: add MAP_STACK mmap flag
x86: fix section mismatch warning - spp_getpage()
x86: change init_gdt to update the gdt via write_gdt, rather than a direct write.
x86-64: fix overlap of modules and fixmap areas
x86, geode-mfgpt: check IRQ before using MFGPT as clocksource
x86, acpi: cleanup, temp_stack is used only when CONFIG_SMP is set
x86: fix spin_is_contended()
x86, nmi: clean UP NMI watchdog failure message
x86, NMI: fix watchdog failure message
x86: fix /proc/meminfo DirectMap
x86: fix readb() et al compile error with gcc-3.2.3
arch/x86/Kconfig: clean up, experimental adjustement
x86: invalidate caches before going into suspend
x86, perfctr: don't use CCCR_OVF_PMI1 on Pentium 4Ds
x86, AMD IOMMU: initialize dma_ops after sysfs registration
x86m AMD IOMMU: cleanup: replace LOW_U32 macro with generic lower_32_bits
x86, AMD IOMMU: initialize device table properly
x86, AMD IOMMU: use status bit instead of memory write-back for completion wait
x86: silence mmconfig printk
x86, msr: fix NULL pointer deref due to msr_open on nonexistent CPUs
...
- remove redundant read of APIC_LVR register in 64bit mode
- APIC is always integrated for 64bit mode so
gcc will eliminate lapic_is_integrated call
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
No changes on binary level
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If a kernel thread is preempted in single-cpu mode right after the NOP (nop
about to be turned into a lock prefix), then we CPU hotplug a CPU, and then the
thread is scheduled back again, a SMP-unsafe atomic operation will be used on
shared SMP variables, leading to corruption. No corruption would happen in the
reverse case : going from SMP to UP is ok because we split a bit instruction
into tiny pieces, which does not present this condition.
Changing the 0x90 (single-byte nop) currently used into a 0x3E DS segment
override prefix should fix this issue. Since the default of the atomic
instructions is to use the DS segment anyway, it should not affect the
behavior.
The exception to this are references that use ESP/RSP and EBP/RBP as
the base register (they will use the SS segment), however, in Linux
(a) DS == SS at all times, and (b) we do not distinguish between
segment violations reported as #SS as opposed to #GP, so there is no
need to disassemble the instruction to figure out the suitable segment.
This patch assumes that the 0x3E prefix will leave atomic operations as-is (thus
assuming they normally touch data in the DS segment). Since there seem to be no
obvious ill-use of other segment override prefixes for atomic operations, it
should be safe. It can be verified with a quick
grep -r LOCK_PREFIX include/asm-x86/
grep -A 1 -r LOCK_PREFIX arch/x86/
Taken from
This source :
AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System
Instructions
States
"Instructions that Reference a Non-Stack Segment—If an instruction encoding
references any base register other than rBP or rSP, or if an instruction
contains an immediate offset, the default segment is the data segment (DS).
These instructions can use the segment-override prefix to select one of the
non-default segments, as shown in Table 1-5."
Therefore, forcing the DS segment on the atomic operations, which already use
the DS segment, should not change.
This source :
http://wiki.osdev.org/X86_Instruction_Encoding
States
"In 64-bit the CS, SS, DS and ES segment overrides are ignored."
Confirmed by "AMD 64-Bit Technology" A.7
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/x86-64_overview.pdf
"In 64-bit mode, the DS, ES, SS and CS segment-override prefixes have no effect.
These four prefixes are no longer treated as segment-override prefixes in the
context of multipleprefix rules. Instead, they are treated as null prefixes."
This patch applies to 2.6.27-rc2, but would also have to be applied to earlier
kernels (2.6.26, 2.6.25, ...).
Performance impact of the fix : tests done on "xaddq" and "xaddl" shows it
actually improves performances on Intel Xeon, AMD64, Pentium M. It does not
change the performance on Pentium II, Pentium 3 and Pentium 4.
Xeon E5405 2.0GHz :
NR_TESTS 10000000
test empty cycles : 162207948
test test 1-byte nop xadd cycles : 170755422
test test DS override prefix xadd cycles : 170000118 *
test test LOCK xadd cycles : 472012134
AMD64 2.0GHz :
NR_TESTS 10000000
test empty cycles : 146674549
test test 1-byte nop xadd cycles : 150273860
test test DS override prefix xadd cycles : 149982382 *
test test LOCK xadd cycles : 270000690
Pentium 4 3.0GHz
NR_TESTS 10000000
test empty cycles : 290001195
test test 1-byte nop xadd cycles : 310000560
test test DS override prefix xadd cycles : 310000575 *
test test LOCK xadd cycles : 1050103740
Pentium M 2.0GHz
NR_TESTS 10000000
test empty cycles : 180000523
test test 1-byte nop xadd cycles : 320000345
test test DS override prefix xadd cycles : 310000374 *
test test LOCK xadd cycles : 480000357
Pentium 3 550MHz
NR_TESTS 10000000
test empty cycles : 510000231
test test 1-byte nop xadd cycles : 620000128
test test DS override prefix xadd cycles : 620000110 *
test test LOCK xadd cycles : 800000088
Pentium II 350MHz
NR_TESTS 10000000
test empty cycles : 200833494
test test 1-byte nop xadd cycles : 340000130
test test DS override prefix xadd cycles : 340000126 *
test test LOCK xadd cycles : 530000078
Speed test modules can be found at
http://ltt.polymtl.ca/svn/trunk/tests/kernel/test-prefix-speed-32.chttp://ltt.polymtl.ca/svn/trunk/tests/kernel/test-prefix-speed.c
Macro-benchmarks
2.0GHz E5405 Core 2 dual Quad-Core Xeon
Summary
* replace smp lock prefixes with DS segment selector prefixes
no lock prefix (s) with lock prefix (s) Speedup
make -j1 kernel/ 33.94 +/- 0.07 34.91 +/- 0.27 2.8 %
hackbench 50 2.99 +/- 0.01 3.74 +/- 0.01 25.1 %
* replace smp lock prefixes with 0x90 nops
no lock prefix (s) with lock prefix (s) Speedup
make -j1 kernel/ 34.16 +/- 0.32 34.91 +/- 0.27 2.2 %
hackbench 50 3.00 +/- 0.01 3.74 +/- 0.01 24.7 %
Detail :
1 CPU, replace smp lock prefixes with DS segment selector prefixes
make -j1 kernel/
real 0m34.067s
user 0m30.630s
sys 0m2.980s
real 0m33.867s
user 0m30.582s
sys 0m3.024s
real 0m33.939s
user 0m30.738s
sys 0m2.876s
real 0m33.913s
user 0m30.806s
sys 0m2.808s
avg : 33.94s
std. dev. : 0.07s
hackbench 50
Time: 2.978
Time: 2.982
Time: 3.010
Time: 2.984
Time: 2.982
avg : 2.99
std. dev. : 0.01
1 CPU, noreplace-smp
make -j1 kernel/
real 0m35.326s
user 0m30.630s
sys 0m3.260s
real 0m34.325s
user 0m30.802s
sys 0m3.084s
real 0m35.568s
user 0m30.722s
sys 0m3.168s
real 0m34.435s
user 0m30.886s
sys 0m2.996s
avg.: 34.91s
std. dev. : 0.27s
hackbench 50
Time: 3.733
Time: 3.750
Time: 3.761
Time: 3.737
Time: 3.741
avg : 3.74
std. dev. : 0.01
1 CPU, replace smp lock prefixes with 0x90 nops
make -j1 kernel/
real 0m34.139s
user 0m30.782s
sys 0m2.820s
real 0m34.010s
user 0m30.630s
sys 0m2.976s
real 0m34.777s
user 0m30.658s
sys 0m2.916s
real 0m33.924s
user 0m30.634s
sys 0m2.924s
real 0m33.962s
user 0m30.774s
sys 0m2.800s
real 0m34.141s
user 0m30.770s
sys 0m2.828s
avg : 34.16
std. dev. : 0.32
hackbench 50
Time: 2.999
Time: 2.994
Time: 3.004
Time: 2.991
Time: 2.988
avg : 3.00
std. dev. : 0.01
I did more runs (20 runs of each) to compare the nop case to the DS
prefix case. Results in seconds. They actually does not seems to show a
significant difference.
NOP
34.155
33.955
34.012
35.299
35.679
34.141
33.995
35.016
34.254
33.957
33.957
34.008
35.013
34.494
33.893
34.295
34.314
34.854
33.991
34.132
DS
34.080
34.304
34.374
35.095
34.291
34.135
33.940
34.208
35.276
34.288
33.861
33.898
34.610
34.709
33.851
34.256
35.161
34.283
33.865
35.078
Used http://www.graphpad.com/quickcalcs/ttest1.cfm?Format=C to do the
T-test (yeah, I'm lazy) :
Group Group One (DS prefix) Group Two (nops)
Mean 34.37815 34.37070
SD 0.46108 0.51905
SEM 0.10310 0.11606
N 20 20
P value and statistical significance:
The two-tailed P value equals 0.9620
By conventional criteria, this difference is considered to be not statistically significant.
Confidence interval:
The mean of Group One minus Group Two equals 0.00745
95% confidence interval of this difference: From -0.30682 to 0.32172
Intermediate values used in calculations:
t = 0.0480
df = 38
standard error of difference = 0.155
So, unless these calculus are completely bogus, the difference between the nop
and the DS case seems not to be statistically significant.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: H. Peter Anvin <hpa@zytor.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Roland McGrath <roland@redhat.com>
CC: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
CC: Steven Rostedt <srostedt@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: David Miller <davem@davemloft.net>
CC: Ulrich Drepper <drepper@redhat.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Gregory Haskins <ghaskins@novell.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
CC: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
CC: Clark Williams <williams@redhat.com>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Andi Kleen <andi@firstfloor.org>
CC: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
By writing directly, a memory access violation can occur whilst
hotplugging a CPU if the entry was previously marked read-only.
Signed-off-by: Alex Nixon <alex.nixon@citrix.com>
Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix coding style of traps_64.c with improvements suggested by Ingo.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ftrace depends on some processor state that we destroyed during kexec and
restored by restore_processor_state(). So save_processor_state() and
restore_processor_state() are moved into machine_kexec() and ftrace is
restored after restore_processor_state().
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kexec/Kexec-jump require code size in control page is less than
PAGE_SIZE/2. This patch add link-time checking for this.
ASSERT() of ld link script is used as the link-time checking mechanism.
[akpm@linux-foundation.org: build fix]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control
page is used for not only code on some platform. For example in kexec
jump, it is used for data and stack too.
[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Plus add a build time check so this doesn't go unnoticed again.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dereference took place in code part responsible for manual installation
of microcode patches through /dev/cpu/microcode.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Adds a simple IRQ autodetection to the AMD Geode MFGPT driver, and more
importantly, adds some checks, if IRQs can actually be received on the
chosen line. This fixes cases where MFGPT is selected as clocksource
though not producing any ticks, so the kernel simply starves during
boot.
Signed-off-by: Jens Rottmann <JRottmann@LiPPERTEmbedded.de>
Cc: Andres Salomon <dilinger@debian.org>
Cc: linux-geode@bombadil.infradead.org
Cc: Jordan Crouse <jordan.crouse@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/acpi/sleep.c:24: warning: 'temp_stack' defined but not used
[ Sven Wegener <sven.wegener@stealer.net>: fix build bug ]
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
prefill_possible_map() is defined inside CONFIG_HOTPLUG_CPU,
so the nesting CONFIG_HOTPLUG_CPU is just redundant.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Allow x86 to support a built-in kernel command line. The built-in
command line can override the one provided by the boot loader, for
those cases where the boot loader is broken or it is difficult
to change the command line in the the boot loader.
H. Peter Anvin wrote:
> Ingo Molnar wrote:
>> Best would be to make it really apparent in the code that nothing
>> changes if this config option is not set. Preferably there should be
>> no extra code at all in that case.
>>
>
> I would like to see this:
[...Nested ifdefs...]
OK. This version changes absolutely nothing if CONFIG_CMDLINE_BOOL is not
set (the default). Also, no space is appended even when CONFIG_CMDLINE_BOOL
is set, but the builtin string is empty. This is less sloppy all the way
around, IMHO.
Note that I use the same option names as on other arches for
this feature.
[ mingo@elte.hu: build fix ]
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When a CPU core is shut down, all of its caches need to be flushed
to prevent stale data from causing errors if the core is resumed.
Current Linux suspend code performs an assignment after the flush,
which can add dirty data back to the cache. On some AMD platforms,
additional speculative reads have caused crashes on resume because
of this dirty data.
Relocate the cache flush to be the very last thing done before
halting. Tie into an assembly line so the compile will not
reorder it. Add some documentation explaining what is going
on and why we're doing this.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Acked-by: Mark Borden <mark.borden@amd.com>
Acked-by: Michael Hohmuth <michael.hohmuth@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, setup_p4_watchdog() use CCCR_OVF_PMI1 to enable the counter
overflow interrupts to the second logical core. But this bit doesn't work
on Pentium 4 Ds (model 4, stepping 4) and this patch avoids its use on
these processors. Tested on 4 different machines that have this
specific model with success.
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Cc: jvillalovos@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If sysfs registration fails all memory used by IOMMU is freed. This
happens after dma_ops initialization and the functions will access the
freed memory then.
Fix this by initializing dma_ops after the sysfs registration.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds device table initializations which forbids memory accesses
for devices per default and disables all page faults.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We are able to use clock_event_device as it's done in
64bit apic code so lets get rid of local_apic_timer_verify_ok
variable.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no need to clear APIC twice since
disable_local_APIC will clear it anyway.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
To be able to unify this function we RE-introduce
APIC_DIVISOR for 64bit mode. This snipped was eliminated
in some time ago in a sake of clenup but now we need it
again since it allow up to get rid of #ifdef(s).
And lapic_is_integrated call is added in apic_64.c but
since we always have APIC integrated on 64bit cpu compiler
will ignore this call.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Get rid of local_apic_timer_disabled and use disable_apic_timer instead.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
msr_open tests for someone trying to open a device for a nonexistent CPU.
However, the function always returns 0, not ret like it should, hence
userspace can BUG the kernel trivially. This bug was introduced by the
cdev lock_kernel pushdown patch last May.
The BUG can be reproduced with these commands:
# mknod fubar c 202 8 <-- pick a number less than NR_CPUS that is not
the number of an online CPU
# cat fubar
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
AMD SB700 based systems with spread spectrum enabled use a SMM based
HPET emulation to provide proper frequency setting. The SMM code is
initialized with the first HPET register access and takes some time to
complete. During this time the config register reads 0xffffffff. We
check for max. 1000 loops whether the config register reads a non
0xffffffff value to make sure that HPET is up and running before we go
further. A counting loop is safe, as the HPET access takes thousands
of CPU cycles. On non SB700 based machines this check is only done
once and has no side effects.
Based on a quirk patch from: crane cai <crane.cai@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
clear bits for cpu nr > 8.
This allows us to boot the full range of possible CPUs that the
supported APIC model will allow. Previously we'd hang or boot up
with less than 8 CPUs.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For some reason we had two parsers registered for maxcpus=. One in init/main.c
and another in arch/x86/smpboot.c. So I nuked the one in arch/x86.
Also 64-bit kernels used to handle maxcpus= as documented in
Documentation/cpu-hotplug.txt. CPUs with 'id > maxcpus' are initialized
but not booted. 32-bit version for some reason ignored them even though
all the infrastructure for booting them later is there.
In the current mainline both 64 and 32 bit versions are broken.
This patch restores the correct behaviour. I've tested x86_64 version on
4- and 8- way Core2 and 2-way Opteron based machines. Various config
combinations SMP, !SMP, CPU_HOTPLUG, !CPU_HOTPLUG.
Booted with maxcpus=1 and maxcpus=4, etc. Everything is working as expected.
So far we've received two reports from different people confirming that 32-bit
version also works fine, both on dual core laptops and 16way server machines.
[v2: This version fixes visws breakage pointed out by Ingo.]
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Cc: lizf@cn.fujitsu.com
Cc: jeff.chua.linux@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All these structure sizes are runtime determined. So use a runtime
bug check.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fxsave/xsave instructions will not touch all the bytes in the
fxsave/xsave frame. Clear the user buffer before doing fxsave/xsave
directly to user buffer during the sigcontext setup.
This is essentially needed in the context of xsave(for example,
some of the fields in the xsave header are not touched by the xsave
and defined as must be zero).
This will also present uniform and clean context to the user (from
which user can safely do fxrstor/xrstor).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
save_i387_xstate() is already doing the required access_ok(). Remove
the redundant access_ok() before it.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The microcode stores its date in a uint32_t in some weird order
approximating pdp-endian. Rather than printing it like that, print it
properly in ISO standard form.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
SGI UV will have MMCFG base addresses that are greater than 4GB (32 bits).
v2: Use CONFIG_RESOURCES_64BIT instead of CONFIG_X86_64.
v3: Create a flag, that is set by platform specific code,
to disable the > 4GB check.
Signed-off-by: John Keller <jpk@sgi.com>
Cc: jpk@sgi.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0xcd1f): Section mismatch in reference from the function find_and_reserve_crashkernel() to the function .init.text:find_e820_area()
The function find_and_reserve_crashkernel() references
the function __init find_e820_area().
This is often because find_and_reserve_crashkernel lacks a __init
annotation or the annotation of find_e820_area is wrong.
WARNING: vmlinux.o(.text+0xcd38): Section mismatch in reference from the function find_and_reserve_crashkernel() to the function .init.text:reserve_bootmem_generic()
The function find_and_reserve_crashkernel() references
the function __init reserve_bootmem_generic().
This is often because find_and_reserve_crashkernel lacks a __init
annotation or the annotation of reserve_bootmem_generic is wrong.
find_and_reserve_crashkernel is called from __init function (reserve_crashkernel)
and calls 2 __init functions (find_e820_area, reserve_bootmem_generic),
so mark it __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
WARNING: vmlinux.o(.text+0x14cf8): Section mismatch in reference from the function map_high() to the function .init.text:init_extra_mapping_uc()
The function map_high() references
the function __init init_extra_mapping_uc().
This is often because map_high lacks a __init
annotation or the annotation of init_extra_mapping_uc is wrong.
WARNING: vmlinux.o(.text+0x14d05): Section mismatch in reference from the function map_high() to the function .init.text:init_extra_mapping_wb()
The function map_high() references
the function __init init_extra_mapping_wb().
This is often because map_high lacks a __init
annotation or the annotation of init_extra_mapping_wb is wrong.
map_high is called only from __init functions (map_*_high)
and calls 2 __init_functions (init_extra_mapping_*)
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix 2.6.27rc1 cannot boot more than 8CPUs
x86: make "apic" an early_param() on 32-bit, NULL check
EFI, x86: fix function prototype
x86, pci-calgary: fix function declaration
x86: work around gcc 3.4.x bug
x86: make "apic" an early_param() on 32-bit
x86, debug: tone down arch/x86/kernel/mpparse.c debugging printk
x86_64: restore the proper NR_IRQS define so larger systems work.
x86: Restore proper vector locking during cpu hotplug
x86: Fix broken VMI in 2.6.27-rc..
x86: fdiv bug detection fix
Jeff Chua reported that booting a !bigsmp kernel on a 16-way box
hangs silently.
this is a long-standing issue, smp start AP cpu could check the
apic id >=8 etc before trying to start it.
achieve this by moving the def_to_bigsmp check later and skip the
apicid id > 8
[ mingo@elte.hu: clean up the message that is printed. ]
Reported-by: "Jeff Chua" <jeff.chua.linux@gmail.com>
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/x86/kernel/setup.c | 6 ------
arch/x86/kernel/smpboot.c | 10 ++++++++++
2 files changed, 10 insertions(+), 6 deletions(-)
Exception stacks are allocated each time a CPU is set online.
But the allocated space is never freed. Thus with one CPU hotplug
offline/online cycle there is a memory leak of 24K (6 pages) for
a CPU.
Fix is to allocate exception stacks only once -- when the CPU is
set online for the first time.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pda->irqstackptr is allocated whenever a CPU is set online.
But it is never freed. This results in a memory leak of 16K
for each CPU offline/online cycle.
Fix is to allocate pda->irqstackptr only once.
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cyrill Gorcunov observed:
> you turned it into early_param so now it's NULL injecting vulnerabled.
> Could you please add checking for NULL str param?
fix that.
Also, change the name of 'str' into 'arg', to make it more apparent
that this is an optional argument that can be NULL, not a string
parameter that is empty when unset.
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix function declaration:
linux-next-20080807/arch/x86/kernel/pci-calgary_64.c:1353:36: warning: non-ANSI function declaration of function 'get_tce_space_from_tar'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On 32-bit, "apic" is a __setup() param meaning it is parsed rather
late in the game. Make it an early_param() for apic_printk() use
by arch/x86/kernel/mpparse.c.
On 64-bit, it already is an early_param().
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
commit 11a62a0560 turns some formerly
nopped debugging printks in arch/x86/kernel/mppparse.c into regular
ones. The one at the top of smp_scan_config() in particular also
prints on !CONFIG_SMP/CONFIG_X86_LOCAL_APIC kernels and UP machines
without anything resembling MP tables which makes their lowly UP
owners wonder...
Turn the former Dprintk()s into apic_printk()s instead meaning that
their printing is dependent on passing the apic=verbose (or =debug)
command line param.
On 32-bit, "apic" is a __setup() param which isn't early enough
for this code and therefore needs a followup changing it into an
early_param(). On 64-bit, it already is.
Signed-off-by: Rene Herman <rene.herman@gmail.com>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
64bit mode APIC interrupt handlers are set within irqinit_64.c.
Lets do tha same for 32bit mode which would help in furter code merging.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Having cpu_online_map change during assign_irq_vector can result
in some really nasty and weird things happening. The one that
bit me last time was accessing non existent per cpu memory for non
existent cpus.
This locking was removed in a sloppy x86_64 and x86_32 merge patch.
Guys can we please try and avoid subtly breaking x86 when we are
merging files together?
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
WARNING: vmlinux.o(.text+0x118f7): Section mismatch in reference from the function construct_ioapic_table() to the function .init.text:MP_bus_info()
The function construct_ioapic_table() references
the function __init MP_bus_info().
This is often because construct_ioapic_table lacks a __init
annotation or the annotation of MP_bus_info is wrong.
construct_ioapic_table is called only from construct_default_ISA_mptable which is __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1591): Section mismatch in reference from the function init_amd() to the function .init.text:check_enable_amd_mmconf_dmi()
The function __cpuinit init_amd() references
a function __init check_enable_amd_mmconf_dmi().
If check_enable_amd_mmconf_dmi is only used by init_amd then
annotate check_enable_amd_mmconf_dmi with a matching annotation.
check_enable_amd_mmconf_dmi is only called from init_amd which is __cpuinit
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x1fe7): Section mismatch in reference from the function MP_processor_info() to the variable .init.data:x86_quirks
The function __cpuinit MP_processor_info() references
a variable __initdata x86_quirks.
If x86_quirks is only used by MP_processor_info then
annotate x86_quirks with a matching annotation.
MP_processor_info uses x86_quirks which is __init and is used only from
smp_read_mpc and construct_default_ISA_mptable which are __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
WARNING: vmlinux.o(.text+0x7950): Section mismatch in reference from the function native_calibrate_tsc() to the function .init.text:tsc_read_refs()
The function native_calibrate_tsc() references
the function __init tsc_read_refs().
This is often because native_calibrate_tsc lacks a __init
annotation or the annotation of tsc_read_refs is wrong.
tsc_read_refs is called from native_calibrate_tsc which is not __init
and native_calibrate_tsc cannot be marked __init
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
The lowmem mapping table created by VMI need not depend on max_low_pfn
at all. Instead we now create an extra large mapping which covers all
possible lowmem instead of the physical ram that is actually available.
This allows the vmi initialization to be done before max_low_pfn could
be computed. We also move the vmi_init code very early in the boot process
so that nobody accidentally breaks the fixmap dependancy.
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Acked-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
This patch provides support for the _PSD ACPI object in the Powernow-k8
driver. Although it looks like an invasive patch, most of it is
simply the consequence of turning the static acpi_performance_data
structure into a pointer.
AMD has tested it on several machines over the past few days without issue.
[trivial checkpatch warnings fixed up by davej]
[X86_POWERNOW_K8_ACPI=n buildfix from Randy Dunlap]
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Tested-by: Frank Arnold <frank.arnold@amd.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>
arch/x86/kernel/cpu/cpufreq/elanfreq.c:47:26: warning: symbol 'elan_multiplier' was not declared. Should it be static?
Yes, yes it should.
Signed-off-by: Dave Jones <davej@redhat.com>
The fdiv detection code writes s32 integer into
the boot_cpu_data.fdiv_bug.
However, the boot_cpu_data.fdiv_bug is only char (s8)
field so the detection overwrites already set fields for
other bugs, e.g. the f00f bug field.
Use local s32 variable to receive result.
This is a partial fix to Bugzilla #9928 - fixes wrong
information about the f00f bug (tested) and probably
for coma bug (I have no cpu to test this).
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix all errors and many warnings reported by checkpatch.pl
without change sys_x86_64.o
arch/x86/kernel/sys_x86_64.o:
text data bss dec hex filename
1567 0 0 1567 61f sys_x86_64.o.after
1567 0 0 1567 61f sys_x86_64.o.before
md5:
de28ffedcb5851dfd7ec87a03afec1fd sys_x86_64.o.after
de28ffedcb5851dfd7ec87a03afec1fd sys_x86_64.o.before
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix all errors and many warnings reported by checkpath.pl.
Except the change of include <asm/io.h> to <linux/io.h>
the traps.o before and after changes are the same.
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix all errors and many warnings reported by checkpatch.pl
without change signal_64.o
arch/x86/kernel/signal_64.o
text data bss dec hex filename
5143 0 8 5151 141f signal_64.o.after
5143 0 8 5151 141f signal_64.o.before
md5:
e68718092b3641cb27e79e55ce57e3ad signal_64.o.after
e68718092b3641cb27e79e55ce57e3ad signal_64.o.before
Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The XSAVE feature mask is a 64-bit number; keep it that way, in order
to avoid the mistake done with rdmsr/wrmsr. Use the xsetbv() function
provided in the previous patch.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
FP/SSE bits may be zero in the xsave header(representing the init state).
Update these bits during the ptrace fpregs set operation, to indicate the
non-init state.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On cpu's supporting xsave/xrstor, fpstate pointer in the sigcontext, will
include the extended state information along with fpstate information. Presence
of extended state information is indicated by the presence
of FP_XSTATE_MAGIC1 at fpstate.sw_reserved.magic1 and FP_XSTATE_MAGIC2
at fpstate + (fpstate.sw_reserved.extended_size - FP_XSTATE_MAGIC2_SIZE).
Extended feature bit mask that is saved in the memory layout is represented
by the fpstate.sw_reserved.xstate_bv
For RT signal frames, UC_FP_XSTATE in the uc_flags also indicate the
presence of extended state information in the sigcontext's fpstate
pointer.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
move 64bit routines that saves/restores fpstate in/from user stack from
signal_64.c to xsave.c
restore_i387_xstate() now handles the condition when user passes
NULL fpstate.
Other misc changes for prepartion of xsave/xrstor sigcontext support.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
dynamically allocate fpstate on the stack, instead of static allocation
in the current sigframe layout on the user stack. This will allow the
fpstate structure to grow in the future, which includes extended state
information supporting xsave/xrstor.
signal handlers will be able to access the fpstate pointer from the
sigcontext structure asusual, with no change. For the non RT sigframe's
(which are supported only for 32bit apps), current static fpstate layout
in the sigframe will be unused(so that we don't change the extramask[]
offset in the sigframe and thus prevent breaking app's which modify
extramask[]).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Uses xsave/xrstor (instead of traditional fxsave/fxrstor) in context switch
when available.
Introduces TS_XSAVE flag, which determine the need to use xsave/xrstor
instructions during context switch instead of the legacy fxsave/fxrstor
instructions. Thread-synchronous status word is already in L1 cache during
this code patch and thus minimizes the performance penality compared to
(cpu_has_xsave) checks.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Enables xsave/xrstor by turning on cr4.osxsave on cpu's which have
the xsave support. For now, features that OS supports/enabled are
FP and SSE.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Exports needed by the GRU driver.
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This IOMMU helper function doesn't work for some architectures:
http://marc.info/?l=linux-kernel&m=121699304403202&w=2
It also breaks POWER and SPARC builds:
http://marc.info/?l=linux-kernel&m=121730388001890&w=2
Currently, only x86 IOMMUs use this so let's move it to x86 for
now.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix tons of build errors:
arch/x86/kernel/built-in.o: In function `microcode_fini_cpu':
microcode_intel.c:(.text+0x11598): undefined reference to `microcode_mutex'
microcode_intel.c:(.text+0x115a4): undefined reference to `ucode_cpu_info'
microcode_intel.c:(.text+0x115ae): undefined reference to `ucode_cpu_info'
microcode_intel.c:(.text+0x115bc): undefined reference to `microcode_mutex'
[...]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix:
arch/x86/kernel/microcode.c:412: error: static declaration of ‘microcode_init’ follows non-static declaration
include/asm/microcode.h:1: error: previous declaration of ‘microcode_init’ was here
arch/x86/kernel/microcode.c:454: error: static declaration of ‘microcode_exit’ follows non-static declaration
include/asm/microcode.h:2: error: previous declaration of ‘microcode_exit’ was here
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits)
x86/PCI: use dev_printk when possible
PCI: add D3 power state avoidance quirk
PCI: fix bogus "'device' may be used uninitialized" warning in pci_slot
PCI: add an option to allow ASPM enabled forcibly
PCI: disable ASPM on pre-1.1 PCIe devices
PCI: disable ASPM per ACPI FADT setting
PCI MSI: Don't disable MSIs if the mask bit isn't supported
PCI: handle 64-bit resources better on 32-bit machines
PCI: rewrite PCI BAR reading code
PCI: document pci_target_state
PCI hotplug: fix typo in pcie hotplug output
x86 gart: replace to_pages macro with iommu_num_pages
x86, AMD IOMMU: replace to_pages macro with iommu_num_pages
iommu: add iommu_num_pages helper function
dma-coherent: add documentation to new interfaces
Cris: convert to using generic dma-coherent mem allocator
Sh: use generic per-device coherent dma allocator
ARM: support generic per-device coherent dma mem
Generic dma-coherent: fix DMA_MEMORY_EXCLUSIVE
x86: use generic per-device dma coherent allocator
...
Clean up and optimize cpumask_of_cpu(), by sharing all the zero words.
Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns
creating a huge array of constant bitmasks, realize that the zero words
can be shared.
In other words, on a 64-bit architecture, we only ever need 64 of these
arrays - with a different bit set in one single world (with enough zero
words around it so that we can create any bitmask by just offsetting in
that big array). And then we just put enough zeroes around it that we
can point every single cpumask to be one of those things.
So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each,
with one bit set in each array - 2MB memory total), we have exactly 64
arrays instead, each 8k bits in size (64kB total).
And then we just point cpumask(n) to the right position (which we can
calculate dynamically). Once we have the right arrays, getting
"cpumask(n)" ends up being:
static inline const cpumask_t *get_cpu_mask(unsigned int cpu)
{
const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG];
p -= cpu / BITS_PER_LONG;
return (const cpumask_t *)p;
}
This brings other advantages and simplifications as well:
- we are not wasting memory that is just filled with a single bit in
various different places
- we don't need all those games to re-create the arrays in some dense
format, because they're already going to be dense enough.
if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory
is a non-issue (especially since by doing this "overlapping" trick we
probably get better cache behaviour anyway).
[ mingo@elte.hu:
Converted Linus's mails into a commit. See:
http://lkml.org/lkml/2008/7/27/156http://lkml.org/lkml/2008/7/28/320
Also applied a family filter - which also has the side-effect of leaving
out the bits where Linus calls me an idio... Oh, never mind ;-)
]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch introduces microcode patch loading for AMD
processors. It is based on previous corresponding work
for Intel processors.
It hooks into the general patch loading module. Main
difference is that a container file format is used to hold
all patch data for multiple processors as well as an
equivalent CPU table, which comes seperately, as opposed
to Intel's microcode patching solution.
Kconfig and Makefile have been changed provice config
and build option for new source file.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Refactored code by introducing a two-module solution.
There is one general module in which vendor specific modules can hook into.
However, that is exclusive, there is only one vendor specific module
allowed at a time. A CPU vendor check makes sure only the correct
module for the underlying system gets called.
Functinally in terms of patch loading itself there are no changes. This
refactoring provides a basis for future implementations of other vendors'
patch loaders.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Renamed common structures to vendor specific naming scheme
so other vendors will be able to use the same naming
convention.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Split off existing code into two seperate files. One file holds general
code, the other file vendor specific parts.
No functional changes, only refactoring.
Temporarily Introduced a new module name 'ucode' for result,
due to already taken name 'microcode'.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This structure will be later used by other modules as well and
needs therfore to be moved out to a header file.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Removed typedefs. No functional changes to the code.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Intel specific microcode declarations have been moved to a seperate header file.
There are no code changes to the code itself and no side effects to other parts.
Signed-off-by: Peter Oruba <peter.oruba@amd.com>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fix !PCI build failure:
arch/x86/kernel/cpu/intel_cacheinfo.c: In function 'get_k8_northbridge':
arch/x86/kernel/cpu/intel_cacheinfo.c:675: error: implicit declaration of function 'pci_match_id'
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Monday 21 July 2008, Ingo Molnar wrote:
> > applied to tip/x86/cpu, thanks Mark.
> >
> > I've done some coding style fixes for the new functions you've
> > introduced, see that commit below.
>
> -tip testing found the following build failure:
>
> arch/x86/kernel/built-in.o: In function `show_cache_disable':
> intel_cacheinfo.c:(.text+0xbbf2): undefined reference to `k8_northbridges'
> arch/x86/kernel/built-in.o: In function `store_cache_disable':
> intel_cacheinfo.c:(.text+0xbd91): undefined reference to `k8_northbridges'
>
> please send a delta fix patch against the tip/x86/cpu branch:
>
> http://people.redhat.com/mingo/tip.git/README
>
> which has your patch plus the cleanup applied.
delta fix patch follows. It removes the dependency on k8_northbridges.
-Mark Langsdorf
Operating System Research Center
AMD
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
New versions of AMD processors have support to disable parts
of their L3 caches if too many MCEs are generated by the
L3 cache.
This patch provides a /sysfs interface under the cache
hierarchy to display which caches indices are disabled
(if any) and to monitoring applications to disable a
cache index.
This patch does not set an automatic policy to disable
the L3 cache. Policy decisions would need to be made
by a RAS handler. This patch merely makes it easier to
see what indices are currently disabled.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
commit 3e9704739d ("x86: boot secondary
cpus through initial_code") causes the kernel to crash when a CPU is
brought online after the read only sections have been write
protected. The write to initial_code in do_boot_cpu() fails.
Move inital_code to .cpuinit.data section.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
intr_remapping_enabled get assigned later, so need to check that
in setup_apic_routing
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This adds TIF_NOTIFY_RESUME support for x86, both 64-bit and 32-bit.
When set, we call tracehook_notify_resume() on the way to user mode.
Signed-off-by: Roland McGrath <roland@redhat.com>
This changes x86 syscall tracing to use the new tracehook.h entry points.
There is no change, only cleanup.
Signed-off-by: Roland McGrath <roland@redhat.com>
This makes the x86 signal handling code use tracehook_signal_handler() in
place of calling into ptrace guts. The call is moved after the sa_mask
processing, but there is no other change. This cleanup doesn't matter to
existing debuggers, but is the sensible thing: have all facets of the
handler setup complete before the debugger inspects the task again.
Signed-off-by: Roland McGrath <roland@redhat.com>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, AMD IOMMU: include amd_iommu_last_bdf in device initialization
x86: fix IBM Summit based systems' phys_cpu_present_map on 32-bit kernels
x86, RDC321x: remove gpio.h complications
x86, RDC321x: add to mach-default
crashdump: fix undefined reference to `elfcorehdr_addr'
flag parameters: fix compile error of sys_epoll_create1
This patch implements devices state save/restore before after kexec.
This patch together with features in kexec_jump patch can be used for
following:
- A simple hibernation implementation without ACPI support. You can kexec a
hibernating kernel, save the memory image of original system and shutdown
the system. When resuming, you restore the memory image of original system
via ordinary kexec load then jump back.
- Kernel/system debug through making system snapshot. You can make system
snapshot, jump back, do some thing and make another system snapshot.
- Cooperative multi-kernel/system. With kexec jump, you can switch between
several kernels/systems quickly without boot process except the first time.
This appears like swap a whole kernel/system out/in.
- A general method to call program in physical mode (paging turning
off). This can be used to invoke BIOS code under Linux.
The following user-space tools can be used with kexec jump:
- kexec-tools needs to be patched to support kexec jump. The patches
and the precompiled kexec can be download from the following URL:
source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
- makedumpfile with patches are used as memory image saving tool, it
can exclude free pages from original kernel memory image file. The
patches and the precompiled makedumpfile can be download from the
following URL:
source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10
- An initramfs image can be used as the root file system of kexeced
kernel. An initramfs image built with "BuildRoot" can be downloaded
from the following URL:
initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz
All user space tools above are included in the initramfs image.
Usage example of simple hibernation:
1. Compile and install patched kernel with following options selected:
CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y
CONFIG_HIBERNATION=y
CONFIG_KEXEC_JUMP=y
2. Build an initramfs image contains kexec-tool and makedumpfile, or
download the pre-built initramfs image, called rootfs.gz in
following text.
3. Prepare a partition to save memory image of original kernel, called
hibernating partition in following text.
4. Boot kernel compiled in step 1 (kernel A).
5. In the kernel A, load kernel compiled in step 1 (kernel B) with
/sbin/kexec. The shell command line can be as follow:
/sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x100000
--mem-max=0xffffff --initrd=rootfs.gz
6. Boot the kernel B with following shell command line:
/sbin/kexec -e
7. The kernel B will boot as normal kexec. In kernel B the memory
image of kernel A can be saved into hibernating partition as
follow:
jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '='`
echo $jump_back_entry > kexec_jump_back_entry
cp /proc/vmcore dump.elf
Then you can shutdown the machine as normal.
8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
root file system.
9. In kernel C, load the memory image of kernel A as follow:
/sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf
10. Jump back to the kernel A as follow:
/sbin/kexec -e
Then, kernel A is resumed.
Implementation point:
To support jumping between two kernels, before jumping to (executing)
the new kernel and jumping back to the original kernel, the devices
are put into quiescent state, and the state of devices and CPU is
saved. After jumping back from kexeced kernel and jumping to the new
kernel, the state of devices and CPU are restored accordingly. The
devices/CPU state save/restore code of software suspend is called to
implement corresponding function.
Known issues:
- Because the segment number supported by sys_kexec_load is limited,
hibernation image with many segments may not be load. This is
planned to be eliminated by adding a new flag to sys_kexec_load to
make a image can be loaded with multiple sys_kexec_load invoking.
Now, only the i386 architecture is supported.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch provides an enhancement to kexec/kdump. It implements the
following features:
- Backup/restore memory used by the original kernel before/after
kexec.
- Save/restore CPU state before/after kexec.
The features of this patch can be used as a general method to call program in
physical mode (paging turning off). This can be used to call BIOS code under
Linux.
kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:
source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
Usage example of calling some physical mode code and return:
1. Compile and install patched kernel with following options selected:
CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y
2. Build patched kexec-tool or download the pre-built one.
3. Build some physical mode executable named such as "phy_mode"
4. Boot kernel compiled in step 1.
5. Load physical mode executable with /sbin/kexec. The shell command
line can be as follow:
/sbin/kexec --load-preserve-context --args-none phy_mode
6. Call physical mode executable with following shell command line:
/sbin/kexec -e
Implementation point:
To support jumping without reserving memory. One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page). When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped. Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.
C ABI (calling convention) is used as communication protocol between
kernel and called code.
A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.
Now, only the i386 architecture is supported.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The calgary code can give drivers addresses above 4GB which is very bad
for hardware that is only 32bit DMA addressable.
With this patch, the calgary code sets the global dma_ops to swiotlb or
nommu properly, and the dma_ops of devices behind the Calgary/CalIOC2
to calgary_dma_ops. So the calgary code can handle devices safely that
aren't behind the Calgary/CalIOC2.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexis Bruemmer <alexisb@us.ibm.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix:
arch/x86/kernel/visws_quirks.c: In function ‘visws_early_detect’:
arch/x86/kernel/visws_quirks.c:290: error: ‘skip_ioapic_setup’ undeclared (first use in this function)
arch/x86/kernel/visws_quirks.c:290: error: (Each undeclared identifier is reported only once
arch/x86/kernel/visws_quirks.c:290: error: for each function it appears in.)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Replace previous instances of the cpumask_of_cpu_ptr* macros
with a the new (lvalue capable) generic cpumask_of_cpu().
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Create the cpumask_of_cpu_map statically in the init data section
using NR_CPUS but replace it during boot up with one sized by
nr_cpu_ids (num possible cpus).
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
On Thu, Jul 24, 2008 at 03:43:44PM -0700, Linus Torvalds wrote:
> So how about this patch as a starting point? This is the RightThing(tm) to
> do regardless, and if it then makes it easier to do some other cleanups,
> we should do it first. What do you think?
restore_fpu_checking() calls init_fpu() in error conditions.
While this is wrong(as our main intention is to clear the fpu state of
the thread), this was benign before commit 92d140e21f ("x86: fix taking
DNA during 64bit sigreturn").
Post commit 92d140e21f, live FPU registers may not belong to this
process at this error scenario.
In the error condition for restore_fpu_checking() (especially during the
64bit signal return), we are doing init_fpu(), which saves the live FPU
register state (possibly belonging to some other process context) into
the thread struct (through unlazy_fpu() in init_fpu()). This is wrong
and can leak the FPU data.
For the signal handler restore error condition in restore_i387(), clear
the fpu state present in the thread struct(before ultimately sending a
SIGSEGV for badframe).
For the paranoid error condition check in math_state_restore(), send a
SIGSEGV, if we fail to restore the state.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: <stable@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
based on work from Eric, and add some timeout so don't dead loop when debug
device is not installed
v2: fix checkpatch warning
v3: move ehci struct def to linux/usrb/ehci_def.h from host/ehci.h
also add CONFIG_EARLY_PRINTK_DBGP to disable it by default
v4: address comments from Ingo, seperate ehci reg def moving to another patch
also add auto detect port that connect to debug device for Nvidia
southbridge
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "Arjan van de Ven" <arjan@infradead.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Greg KH" <greg@kroah.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
All the values read while searching for amd_iommu_last_bdf are defined as
inclusive. Let the code handle this value as such. Found by Wei Wang. Thanks
Wei.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: Wei Wang <wei.wang2@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kdump kernel fails to boot with calgary iommu and aacraid driver on a x366
box. The ongoing dma's of aacraid from the first kernel continue to exist
until the driver is loaded in the kdump kernel. Calgary is initialized
prior to aacraid and creation of new tce tables causes wrong dma's to
occur. Here we try to get the tce tables of the first kernel in kdump
kernel and use them. While in the kdump kernel we do not allocate new tce
tables but instead read the base address register contents of calgary
iommu and use the tables that the registers point to. With these changes
the kdump kernel and hence aacraid now boots normally.
Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table. We have one
global kretprobe lock to serialise the access to these lists. This causes
only one kretprobe handler to execute at a time. Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).
Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.
Solution:
1) Instead of having one global lock to protect kretprobe instances
present in kretprobe object and kretprobe hash table. We will have
two locks, one lock for protecting kretprobe hash table and another
lock for kretporbe object.
2) We hold lock present in kretprobe object while we modify kretprobe
instance in kretprobe object and we hold per-hash-list lock while
modifying kretprobe instances present in that hash list. To prevent
deadlock, we never grab a per-hash-list lock while holding a kretprobe
lock.
3) We can remove used_instances from struct kretprobe, as we can
track used instances of kretprobe instances using kretprobe hash
table.
Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.
cacheline non-cacheline Un-patched kernel
aligned patch aligned patch
===============================================================================
real 9m46.784s 9m54.412s 10m2.450s
user 40m5.715s 40m7.142s 40m4.273s
sys 2m57.754s 2m58.583s 3m17.430s
===========================================================
Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real 9m26.389s
user 40m8.775s
sys 2m7.283s
=========================
Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fix:
arch/x86/kernel/ds.c: In function ‘ds_allocate_buffer':
arch/x86/kernel/ds.c:339: error: implicit declaration of function ‘PAGE_ALIGN'
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suresh Siddha wants to fix a possible FPU leakage in error conditions,
but the fact that save/restore_i387() are inlines in a header file makes
that harder to do than necessary. So start off with an obvious cleanup.
This just moves the x86-64 version of save/restore_i387() out of the
header file, and moves it to the only file that it is actually used in:
arch/x86/kernel/signal_64.c. So exposing it in a header file was wrong
to begin with.
[ Side note: I'd like to fix up some of the games we play with the
32-bit version of these functions too, but that's a separate
matter. The 32-bit versions are shared - under different names
at that! - by both the native x86-32 code and the x86-64 32-bit
compatibility code ]
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well
nohz: prevent tick stop outside of the idle loop
Commit 9d25d4db81 ("x86: BUILD_IRQ say
.text to avoid .data.percpu") added a ".text" specifier to make sure
that BUILD_IRQ() builds the irq trampoline in the text segment rather
than in some random left-over segment that the compiler happened to
leave the asm in.
However, we should also make sure that we switch back by adding a
".previous" at the end, so that there are no subtle issues with
subsequent compiler-generated code.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix header export, asm-x86/processor-flags.h, CONFIG_* leaks
x86: BUILD_IRQ say .text to avoid .data.percpu
xen: don't use sysret for sysexit32
x86: call early_cpu_init at the same point
This fixes kernel http://bugzilla.kernel.org/show_bug.cgi?id=11112 (bogus
RTC update IRQs reported) for rtc-cmos, in two ways:
- When HPET is stealing the IRQs, use the first IRQ to grab
the seconds counter which will be monitored (instead of
using whatever was previously in that memory);
- In sane IRQ handling modes, scrub out old IRQ status before
enabling IRQs.
That latter is done by tightening up IRQ handling for rtc-cmos everywhere,
also ensuring that when HPET is used it's the only thing triggering IRQ
reports to userspace; net object shrink.
Also fix a bogus HPET message related to its RTC emulation.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Report-by: W Unruh <unruh@physics.ubc.ca>
Cc: Andrew Victor <avictor.za@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces the new syscall inotify_init1 (note: the 1 stands for
the one parameter the syscall takes, as opposed to no parameter before). The
values accepted for this parameter are function-specific and defined in the
inotify.h header. Here the values must match the O_* flags, though. In this
patch CLOEXEC support is introduced.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_inotify_init1
# ifdef __x86_64__
# define __NR_inotify_init1 294
# elif defined __i386__
# define __NR_inotify_init1 332
# else
# error "need __NR_inotify_init1"
# endif
#endif
#define IN_CLOEXEC O_CLOEXEC
int
main (void)
{
int fd;
fd = syscall (__NR_inotify_init1, 0);
if (fd == -1)
{
puts ("inotify_init1(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("inotify_init1(0) set close-on-exit");
return 1;
}
close (fd);
fd = syscall (__NR_inotify_init1, IN_CLOEXEC);
if (fd == -1)
{
puts ("inotify_init1(IN_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("inotify_init1(O_CLOEXEC) does not set close-on-exit");
return 1;
}
close (fd);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch introduces the new syscall pipe2 which is like pipe but it also
takes an additional parameter which takes a flag value. This patch implements
the handling of O_CLOEXEC for the flag. I did not add support for the new
syscall for the architectures which have a special sys_pipe implementation. I
think the maintainers of those archs have the chance to go with the unified
implementation but that's up to them.
The implementation introduces do_pipe_flags. I did that instead of changing
all callers of do_pipe because some of the callers are written in assembler.
I would probably screw up changing the assembly code. To avoid breaking code
do_pipe is now a small wrapper around do_pipe_flags. Once all callers are
changed over to do_pipe_flags the old do_pipe function can be removed.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_pipe2
# ifdef __x86_64__
# define __NR_pipe2 293
# elif defined __i386__
# define __NR_pipe2 331
# else
# error "need __NR_pipe2"
# endif
#endif
int
main (void)
{
int fd[2];
if (syscall (__NR_pipe2, fd, 0) != 0)
{
puts ("pipe2(0) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
printf ("pipe2(0) set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
{
puts ("pipe2(O_CLOEXEC) failed");
return 1;
}
for (int i = 0; i < 2; ++i)
{
int coe = fcntl (fd[i], F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
printf ("pipe2(O_CLOEXEC) does not set close-on-exit for fd[%d]\n", i);
return 1;
}
}
close (fd[0]);
close (fd[1]);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the new dup3 syscall. It extends the old dup2 syscall by one
parameter which is meant to hold a flag value. Support for the O_CLOEXEC flag
is added in this patch.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_dup3
# ifdef __x86_64__
# define __NR_dup3 292
# elif defined __i386__
# define __NR_dup3 330
# else
# error "need __NR_dup3"
# endif
#endif
int
main (void)
{
int fd = syscall (__NR_dup3, 1, 4, 0);
if (fd == -1)
{
puts ("dup3(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("dup3(0) set close-on-exec flag");
return 1;
}
close (fd);
fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC);
if (fd == -1)
{
puts ("dup3(O_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("dup3(O_CLOEXEC) set close-on-exec flag");
return 1;
}
close (fd);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the new epoll_create2 syscall. It extends the old epoll_create
syscall by one parameter which is meant to hold a flag value. In this
patch the only flag support is EPOLL_CLOEXEC which causes the close-on-exec
flag for the returned file descriptor to be set.
A new name EPOLL_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_epoll_create2
# ifdef __x86_64__
# define __NR_epoll_create2 291
# elif defined __i386__
# define __NR_epoll_create2 329
# else
# error "need __NR_epoll_create2"
# endif
#endif
#define EPOLL_CLOEXEC O_CLOEXEC
int
main (void)
{
int fd = syscall (__NR_epoll_create2, 1, 0);
if (fd == -1)
{
puts ("epoll_create2(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("epoll_create2(0) set close-on-exec flag");
return 1;
}
close (fd);
fd = syscall (__NR_epoll_create2, 1, EPOLL_CLOEXEC);
if (fd == -1)
{
puts ("epoll_create2(EPOLL_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("epoll_create2(EPOLL_CLOEXEC) set close-on-exec flag");
return 1;
}
close (fd);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the new eventfd2 syscall. It extends the old eventfd
syscall by one parameter which is meant to hold a flag value. In this
patch the only flag support is EFD_CLOEXEC which causes the close-on-exec
flag for the returned file descriptor to be set.
A new name EFD_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_eventfd2
# ifdef __x86_64__
# define __NR_eventfd2 290
# elif defined __i386__
# define __NR_eventfd2 328
# else
# error "need __NR_eventfd2"
# endif
#endif
#define EFD_CLOEXEC O_CLOEXEC
int
main (void)
{
int fd = syscall (__NR_eventfd2, 1, 0);
if (fd == -1)
{
puts ("eventfd2(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("eventfd2(0) sets close-on-exec flag");
return 1;
}
close (fd);
fd = syscall (__NR_eventfd2, 1, EFD_CLOEXEC);
if (fd == -1)
{
puts ("eventfd2(EFD_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("eventfd2(EFD_CLOEXEC) does not set close-on-exec flag");
return 1;
}
close (fd);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds the new signalfd4 syscall. It extends the old signalfd
syscall by one parameter which is meant to hold a flag value. In this
patch the only flag support is SFD_CLOEXEC which causes the close-on-exec
flag for the returned file descriptor to be set.
A new name SFD_CLOEXEC is introduced which in this implementation must
have the same value as O_CLOEXEC.
The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#ifndef __NR_signalfd4
# ifdef __x86_64__
# define __NR_signalfd4 289
# elif defined __i386__
# define __NR_signalfd4 327
# else
# error "need __NR_signalfd4"
# endif
#endif
#define SFD_CLOEXEC O_CLOEXEC
int
main (void)
{
sigset_t ss;
sigemptyset (&ss);
sigaddset (&ss, SIGUSR1);
int fd = syscall (__NR_signalfd4, -1, &ss, 8, 0);
if (fd == -1)
{
puts ("signalfd4(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("signalfd4(0) set close-on-exec flag");
return 1;
}
close (fd);
fd = syscall (__NR_signalfd4, -1, &ss, 8, SFD_CLOEXEC);
if (fd == -1)
{
puts ("signalfd4(SFD_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("signalfd4(SFD_CLOEXEC) does not set close-on-exec flag");
return 1;
}
close (fd);
puts ("OK");
return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[akpm@linux-foundation.org: add sys_ni stub]
Signed-off-by: Ulrich Drepper <drepper@redhat.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ACPI defines a hardware signature. BIOS calculates the signature according to
hardware configure and if hardware changes while hibernated, the signature
will change. In that case, S4 resume should fail.
Still, there may be systems on which this mechanism does not work correctly,
so it is better to provide a workaround for them. For this reason, add a new
switch to the acpi_sleep= command line argument allowing one to disable
hardware signature checking.
[shaohua.li@intel.com: build fix]
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Pavel Machek <pavel@ucw.cz>
Cc: <Valdis.Kletnieks@vt.edu>
Cc: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the obsolete and no longer used include/linux/pm_legacy.h
Reviewed-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Pavel Machek <pavel@suse.cz>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:
u64 val = PAGE_ALIGN(size);
always returns a value < 4GB even if size is greater than 4GB.
The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):
#define PAGE_SHIFT 12
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr) (((addr)+PAGE_SIZE-1)&PAGE_MASK)
The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
Using the ALIGN() macro seems to be the right way, because it uses
typeof(addr) for the mask.
Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.
See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
[akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
[akpm@linux-foundation.org: fix v850]
[akpm@linux-foundation.org: fix powerpc]
[akpm@linux-foundation.org: fix arm]
[akpm@linux-foundation.org: fix mips]
[akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
[akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
[akpm@linux-foundation.org: fix powerpc]
Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
it is for uv only
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
When I edit the x86_64 Makefile to -fno-unit-at-a-time, bootup panics
on 0xCCs in IRQ0x3e_interrupt(): IRQ0x20_interrupt etc. have got linked
into .data.percpu. Perhaps there are other ways of triggering that:
specify ".text" in the BUILD_IRQ() macro for safety.
I've been using -fno-unit-at-a-time (to lessen inlining, for easier
debugging) for a long time.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Dump all the PIC, local APIC and I/O APIC information at the
fs_initcall() level, which is after ACPI (if used) has initialised PCI
information, making the point of invocation consistent across MP-table and
ACPI platforms. Remove explicit calls to print_IO_APIC() from elsewhere.
Make the interface of all the functions involved consistent between 32-bit
and 64-bit versions and make them all static by default by the means of a
New-and-Improved(TM) __apicdebuginit() macro.
Note that like print_IO_APIC() all these only output anything if
"apic=debug" has been passed to the kernel through the command line.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The CFI_ADJUST_CFA_OFFSET dwarf2 annotation of a push/popf
pair in ret_from_fork wrongly used a value of 4. It should
have been 8. Fix that.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: heukelum@fastmail.fm
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ftrace requires certain low-level code, like spinlocks and timestamps,
to be compiled without -pg in order to avoid infinite recursion. This
patch splits out the core paravirt spinlocks and the Xen spinlocks
into separate files which can be compiled without -pg.
Also do xen/time.c while we're about it. As a result, we can now use
ftrace within a Xen domain.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LTP testing showed that Xen does not properly implement
sys_modify_ldt(). This patch does the final little bits needed to
make the ldt work properly.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Call early_cpu_init() at the same (early) point in setup_arch().
The x86_64 code was calling it relatively late, after when other arch
code need to do cpu-related setup which depends on it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: hrtick_enabled() should use cpu_active()
sched, x86: clean up hrtick implementation
sched: fix build error, provide partition_sched_domains() unconditionally
sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed
cpu hotplug: Make cpu_active_map synchronization dependency clear
cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2)
sched: rework of "prioritize non-migratable tasks over migratable ones"
sched: reduce stack size in isolated_cpu_setup()
Revert parts of "ftrace: do not trace scheduler functions"
Fixed up conflicts in include/asm-x86/thread_info.h (due to the
TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and
kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr()
introduction).
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
cpumask: Provide a generic set of CPUMASK_ALLOC macros
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
Revert "cpumask: introduce new APIs"
cpumask: make for_each_cpu_mask a bit smaller
net: Pass reference to cpumask variable in net/sunrpc/svc.c
...
Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
This adds fast paths for 32-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
These paths does not need to save and restore all registers as
the general case of tracing does. Avoiding the iret return path
when syscall audit is enabled helps performance a lot.
Signed-off-by: Roland McGrath <roland@redhat.com>
This adds fast paths for 32-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
These paths does not need to save and restore all registers as
the general case of tracing does. Avoiding the iret return path
when syscall audit is enabled helps performance a lot.
Signed-off-by: Roland McGrath <roland@redhat.com>
This adds a fast path for 64-bit syscall entry and exit when
TIF_SYSCALL_AUDIT is set, but no other kind of syscall tracing.
This path does not need to save and restore all registers as
the general case of tracing does. Avoiding the iret return path
when syscall audit is enabled helps performance a lot.
Signed-off-by: Roland McGrath <roland@redhat.com>
This short-circuit path in sysret_signal looks wrong to me.
AFAICT, in practice the branch is never taken--and if it were,
it would go wrong. To wit, try loading a module whose init
function does set_thread_flag(TIF_IRET), and see insmod crash
(presumably with a wrong user stack pointer).
This is because the FIXUP_TOP_OF_STACK work hasn't been done yet
when we jump around the call to ptregscall_common and get to
int_with_check--where it expects the user RSP,SS,CS and EFLAGS to
have been stored by FIXUP_TOP_OF_STACK.
I don't think it's normally possible to get to sysret_signal with no
_TIF_DO_NOTIFY_MASK bits set anyway, so these two instructions are
already superfluous. If it ever did happen, it is harmless to call
do_notify_resume with nothing for it to do.
Signed-off-by: Roland McGrath <roland@redhat.com>
It is possible that alloc_iommu()'s boundary_size overflows as
dma_get_seg_boundary can return 0xffffffff. In that case, further usage of
boundary_size triggers a BUG_ON() in the iommu code.
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix crash due to missing debugctlmsr on AMD K6-3
x86: add PTE_FLAGS_MASK
x86: rename PTE_MASK to PTE_PFN_MASK
x86: fix pte_flags() to only return flags, fix lguest (updated)
x86: use setup_clear_cpu_cap with disable_apic, fix
x86: move the last Dprintk instance to pr_debug()
This patch consolidates the header guard names which are also used
externally, i.e. in .c files.
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Declare time_init() in asm-x86/time.h
Also did cleanup in asm-x86/timer.h :
timer_ack is only required for X86_32
int recalibrate_cpu_khz(void) is for X86_32
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
Moved DECLARE_PER_CPU(int, cpu_number) from CONFIG_X86_32_SMP to CONFIG_X86_32
because cpu_number is required for both.
And include asm/smp.h in process_32.c
Signed-off-by: Jaswinder Singh <jaswinder@infradead.org>
(Jeremy said:
rusty: use PTE_MASK
rusty: use PTE_MASK
rusty: use PTE_MASK
When I asked:
jsgf: does that include the NX flag?
He responded eloquently:
rusty: use PTE_MASK
rusty: use PTE_MASK
yes, it's the official constant of masking flags out of ptes
)
Change a15af1c9ea 'x86/paravirt: add
pte_flags to just get pte flags' removed lguest's private pte_flags()
in favor of a generic one.
Unfortunately, the generic one doesn't filter out the non-flags bits:
this results in lguest creating corrupt shadow page tables and blowing
up host memory.
Since noone is supposed to use the pfn part of pte_flags(), it seems
safest to always do the filtering.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-and-morning-tea-spilled-by: Ingo Molnar <mingo@elte.hu>
introducing an APIC handling probing abstraction:
static struct genapic *apic_probe[] __initdata = {
&apic_x2apic_uv_x,
&apic_x2apic_phys,
&apic_x2apic_cluster,
&apic_physflat,
NULL,
};
This way we can remove UV, x2apic specific code from genapic_64.c and
move them to their specific genapic files.
[ v2: fix compiling when CONFIG_ACPI is not set ]
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
beauty fix: /proc/cpuinfo will still show apic feature even if
we booted up with it disabled.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Use the new generic int attribute accessors for the x86 mce tolerant
attribute. Simple example to illustrate the new macros.
There are much more places all over the tree that could be converted
like this.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This allow to dynamically generate attributes and share show/store
functions between attributes. Right now most attributes are generated
by special macros and lots of duplicated code. With the attribute
passed it's instead possible to attach some data to the attribute
and then use that in shared low level functions to do different things.
I need this for the dynamically generated bank attributes in the x86
machine check code, but it'll allow some further cleanups.
I converted all users in tree to the new show/store prototype. It's a single
huge patch to avoid unbisectable sections.
Runtime tested: x86-32, x86-64
Compiled only: ia64, powerpc
Not compile tested/only grep converted: sh, arm, avr32
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We have the dev_printk() variants for this kind of thing, use them
instead of directly trying to access the bus_id field of struct device.
This is done in order to remove bus_id entirely.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
device_create() is race-prone, so use the race-free
device_create_drvdata() instead as device_create() is going away.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are a couple of places where (P)Dprintk is used which is an old
compile time enabled printk wrapper. Convert it to the generic
pr_debug().
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
... so don't need to call clear_cpu_cap again in early_identify_cpu,
and could use cleared_cpu_caps like other places.
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch makes the needlessly global kvm_smp_prepare_boot_cpu() static.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
random uvesafb failures were reported against Gentoo:
http://bugs.gentoo.org/show_bug.cgi?id=222799
and Mihai Moldovan bisected it back to:
> 8f4d37ec07 is first bad commit
> commit 8f4d37ec07
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Fri Jan 25 21:08:29 2008 +0100
>
> sched: high-res preemption tick
Linus suspected it to be hrtick + vm86 interaction and observed:
> Btw, Peter, Ingo: I think that commit is doing bad things. They aren't
> _incorrect_ per se, but they are definitely bad.
>
> Why?
>
> Using random _TIF_WORK_MASK flags is really impolite for doing
> "scheduling" work. There's a reason that arch/x86/kernel/entry_32.S
> special-cases the _TIF_NEED_RESCHED flag: we don't want to exit out of
> vm86 mode unnecessarily.
>
> See the "work_notifysig_v86" label, and how it does that
> "save_v86_state()" thing etc etc.
Right, I never liked having to fiddle with those TIF flags. Initially I
needed it because the hrtimer base lock could not nest in the rq lock.
That however is fixed these days.
Currently the only reason left to fiddle with the TIF flags is remote
wakeups. We cannot program a remote cpu's hrtimer. I've been thinking
about using the new and improved IPI function call stuff to implement
hrtimer_start_on().
However that does require that smp_call_function_single(.wait=0) works
from interrupt context - /me looks at the latest series from Jens - Yes
that does seem to be supported, good.
Here's a stab at cleaning this stuff up ...
Mihai reported test success as well.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Mihai Moldovan <ionic@ionic.de>
Cc: Michal Januszewski <spock@gentoo.org>
Cc: Antonino Daplas <adaplas@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some cleanups in speedstep-centrino.c for NR_CPUS=4096.
* Use new CPUMASK_PTR (instead of old CPUMASK_VAR).
* Replace arrays sized by NR_CPUS with percpu variables.
* Cleanup some formatting problems (>80 chars per line)
and other checkpatch complaints.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>