The different cpu_dev structures are all used from __cpuinit callers what
I can tell. So mark them as __cpuinitdata instead of __initdata. I am a
little bit unsure about arch/i386/common.c:default_cpu, especially when it
comes to the purpose of this_cpu.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
The init_amd() function is only called from identify_cpu() which is already
marked as __cpuinit. So let's mark it as __cpuinit.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
cpu_dev->c_identify is only called from arch/i386/common.c:identify_cpu(), and
this after generic_identify() already has been called. There is no need to call
this function twice and hook it in c_identify - but I may be wrong, please
double check before applying.
This patch also removes generic_identify() from cpu.h to avoid unnecessary
future nesting.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Fix for the x86_64 kernel mapping code. Without this patch the update path
only inits one pmd_page worth of memory and tramples any entries on it. now
the calling convention to phys_pmd_init and phys_init is to always pass a
[pmd/pud] page not an offset within a page.
Signed-off-by: Keith Mannthey<kmannth@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Implement pause_on_oops() on x86_64.
AK: I redid the patch to do the oops_enter/exit in the existing
oops_begin()/end(). This makes it much shorter.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
context switch a trap is taken for the first FPU use to restore the FPU
context lazily. This is of course great for applications that have very
sporadic or no FPU use (since then you avoid doing the expensive
save/restore all the time). However for very frequent FPU users... you
take an extra trap every context switch.
The patch below adds a simple heuristic to this code: After 5 consecutive
context switches of FPU use, the lazy behavior is disabled and the context
gets restored every context switch. If the app indeed uses the FPU, the
trap is avoided. (the chance of the 6th time slice using FPU after the
previous 5 having done so are quite high obviously).
After 256 switches, this is reset and lazy behavior is returned (until
there are 5 consecutive ones again). The reason for this is to give apps
that do longer bursts of FPU use still the lazy behavior back after some
time.
[akpm@osdl.org: place new task_struct field next to jit_keyring to save space]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
This patch enables ACPI based physical CPU hotplug support for x86_64.
Implements acpi_map_lsapic() and acpi_unmap_lsapic() to support physical cpu
hotplug.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Cc: "Brown, Len" <len.brown@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Make an mmconfig warning print the bus id with a regular format.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
cyrix_identify() should be __init because transmeta_identify() is.
tsc_init() is only called from setup_arch() which is marked as __init.
These two section mismatches have been detected using running modpost on
a vmlinux image compiled with CONFIG_RELOCATABLE=y.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Now for a completely different but trivial approach.
I just boot tested it with 255 CPUS and everything worked.
Currently everything (except module data) we place in
the per cpu area we know about at compile time. So
instead of allocating a fixed size for the per_cpu area
allocate the number of bytes we need plus a fixed constant
for to be used for modules.
It isn't perfect but it is much less of a pain to
work with than what we are doing now.
AK: fixed warning
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The implementation comes from Zach's [RFC, PATCH 10/24] i386 Vmi
descriptor changes:
Descriptor and trap table cleanups. Add cleanly written accessors for
IDT and GDT gates so the subarch may override them. Note that this
allows the hypervisor to transparently tweak the DPL of the descriptors
as well as the RPL of segments in those descriptors, with no unnecessary
kernel code modification. It also allows the hypervisor implementation
of the VMI to tweak the gates, allowing for custom exception frames or
extra layers of indirection above the guest fault / IRQ handlers.
Signed-off-by: Zachary Amsden <zach@vmware.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
And add proper CFI annotation to it which was previously
impossible. This prevents "stuck" messages by the dwarf2 unwinder
when reaching the top of a kernel stack.
Includes feedback from Jan Beulich
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
It's needed for external debuggers and overhead is very small.
Also make the actual notifier chain they use static
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
It's needed for external debuggers and overhead is very small.
Also make the actual notifier chain they use static
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Fix
linux/arch/i386/kernel/mpparse.c: In function #MP_bus_info#:
linux/arch/i386/kernel/mpparse.c:232: warning: comparison is always false due to limited range of data type
Signed-off-by: Andi Kleen <ak@suse.de>
Improve Kconfig description of CONFIG_CRASH_DUMP. Previously
it was too brief to be useful.
Cc: vgoyal@in.ibm.com
Cc: ebiederm@xmission.com
Signed-off-by: Andi Kleen <ak@suse.de>
I've noticed some erratic behavior while testing the X86_64 version
of monotonic_clock().
While spinning in a loop reading monotonic clock values (pinned to a
single cpu) I noticed that the difference between subsequent values
occasionally went negative (time going backwards).
I found that in the following code:
this_offset = get_cycles_sync();
/* FIXME: 1000 or 1000000? */
--> offset = (this_offset - last_offset)*1000 / cpu_khz;
}
return base + offset;
the offset sometimes turns out to be 0, even though
this_offset > last_offset.
+Added fix From: Toyo Abe <toyoa@mvista.com>
The x86_64-mm-monotonic-clock.patch in 2.6.18-rc4-mm2 made a change to
the updating of monotonic_base. It now uses cycles_2_ns().
I suggest that a set_cyc2ns_scale() should be done prior to the setup_irq().
Because cycles_2_ns() can be called from the timer ISR right after the irq0
is enabled.
Signed-off-by: Toyo Abe <toyoa@mvista.com>
Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch moves the entry.S:error_entry to .kprobes.text section,
since code marked unsafe for kprobes jumps directly to entry.S::error_entry,
that must be marked unsafe as well.
This patch also moves all the ".previous.text" asm directives to ".previous"
for kprobes section.
AK: Following a similar i386 patch from Chuck Ebbert
AK: Also merged Jeremy's fix in.
+From: Jeremy Fitzhardinge <jeremy@goop.org>
KPROBE_ENTRY does a .section .kprobes.text, and expects its users to
do a .previous at the end of the function.
Unfortunately, if any code within the function switches sections, for
example .fixup, then the .previous ends up putting all subsequent code
into .fixup. Worse, any subsequent .fixup code gets intermingled with
the code its supposed to be fixing (which is also in .fixup). It's
surprising this didn't cause more havok.
The fix is to use .pushsection/.popsection, so this stuff nests
properly. A further cleanup would be to get rid of all
.section/.previous pairs, since they're inherently fragile.
+From: Chuck Ebbert <76306.1226@compuserve.com>
Because code marked unsafe for kprobes jumps directly to
entry.S::error_code, that must be marked unsafe as well.
The easiest way to do that is to move the page fault entry
point to just before error_code and let it inherit the same
section.
Also moved all the ".previous" asm directives for kprobes
sections to column 1 and removed ".text" from them.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
We have a test that looks for invalid pairings of certain athlon/durons
that weren't designed for SMP, and taint accordingly (with 'S') if we find
such a configuration. However, this test shouldn't fire if there's only
a single CPU present. It's perfectly valid for an SMP kernel to boot on UP
hardware for example.
AK: changed to num_possible_cpus()
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Fix a very dubious piece of code in
arch/i386/kernel/cpu/common.c:cpu_init(). This clears out %fs and
%gs, but clobbers %eax in the process without telling gcc. It turns
out that gcc happens to be not using %eax at that point anyway so it
doesn't matter much, but it looks like a bomb waiting to go off.
This does end up saving an instruction, because gcc wants %eax==0 for
the set_debugreg()s below.
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Following x86-64 patches. Reuses code from them in fact.
Convert the standard backtracer to do all output using
callbacks. Use the x86-64 stack tracer implementation
that uses these callbacks to implement the stacktrace interface.
This allows to use the new dwarf2 unwinder for stacktrace
and get better backtraces.
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
This unifies the standard backtracer and the new stacktrace
in memory backtracer. The standard one is converted to use callbacks
and then reimplement stacktrace using new callbacks.
The main advantage is that stacktrace can now use the new dwarf2 unwinder
and avoid false positives in many cases.
I kept it simple to make sure the standard backtracer stays reliable.
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
Lockdep can call the dwarf2 unwinder early, and the dwarf2 code
uses safe_smp_processor_id which tries to access the local APIC page.
But that doesn't work before the APIC code has set up its fixmap.
Check for this case and always return boot cpu then.
Cc: jbeulich@novell.com
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
- Remove unused all_contexts parameter
No caller used it
- Move skip argument into the structure (needed for
followon patches)
Cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
tce_cache_blast_stress was useful during bringup to stress the IOMMU's
cache flushing. Now that we quiesce DMAs on every cache flush, using
_stress() brings the machine down to its knees once you put it under
load. Remove this debug / bringup code that isn't useful anymore
completely.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Introduce new function verify_bit_range(). Define two versions, one
for CONFIG_IOMMU_DEBUG enabled and one for disabled. Previously we
were checking that the bitmap was consistent every time we allocated
or freed an entry in the TCE table, which is good for debugging but
incurs an unnecessary penalty on non debug builds.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
is_at_popf() needs to test for the iret instruction as well as
popf. So add that test and rename it to is_setting_trap_flag().
Also change max insn length from 16 to 15 to match reality.
LAHF / SAHF can't affect TF, so the comment in x86_64 is removed.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The combination of "local_save_flags" and "local_irq_disable" seems to be
equivalent to "local_irq_save" (see code snips below). Consequently, replace
occurrences of local_save_flags+local_irq_disable with local_irq_save.
* local_irq_save
#define raw_local_irq_save(flags) \
do { (flags) = __raw_local_irq_save(); } while (0)
static inline unsigned long __raw_local_irq_save(void)
{
unsigned long flags = __raw_local_save_flags();
raw_local_irq_disable();
return flags;
}
* local_save_flags
#define raw_local_save_flags(flags) \
do { (flags) = __raw_local_save_flags(); } while (0)
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp>
Signed-off-by: Andi Kleen <ak@suse.de>
Move it into srat.c No need to clutter up setup.c for it
And remove use in setup.c completely - it only guarded a printk
which can be done unconditionally.
Signed-off-by: Andi Kleen <ak@suse.de>
Removes code duplication between i386/x86-64.
Not needed anymore in setup.c since early_param cleanup
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
I think it was only needed for the printks and we can do them later.
I put in a single early_printk so that we know the kernel is alive
(early_printk doesn't need any locks)
This makes some things easier for initialization of unwind for
lockdep, which is needed by later patches.
cc: mingo@elte.hu
Signed-off-by: Andi Kleen <ak@suse.de>
Instead of hackish manual parsing
Requires earlier i386 patchkit, but also fixes i386 early_printk again.
I removed some obsolete really early parameters which didn't do anything useful.
Also made a few parameters that needed it early (mostly oops printing setup)
Also removed one panic check that wasn't visible without
early console anyways (the early console is now initialized after that
panic)
This cleans up a lot of code.
Signed-off-by: Andi Kleen <ak@suse.de>
This patch replaces the open-coded early commandline parsing
throughout the i386 boot code with the generic mechanism (already used
by ppc, powerpc, ia64 and s390). The code was inconsistent with
whether it deletes the option from the cmdline or not, meaning some of
these will get passed through the environment into init.
This transformation is mainly mechanical, but there are some notable
parts:
1) Grammar: s/linux never set's it up/linux never sets it up/
2) Remove hacked-in earlyprintk= option scanning. When someone
actually implements CONFIG_EARLY_PRINTK, then they can use
early_param().
[AK: actually it is implemented, but I'm adding the early_param it in the next
x86-64 patch]
3) Move declaration of generic_apic_probe() from setup.c into asm/apic.h
4) Various parameters now moved into their appropriate files (thanks Andi).
5) All parse functions which examine arg need to check for NULL,
except one where it has subtle humor value.
AK: readded acpi_sci handling which was completely dropped
AK: moved some more variables into acpi/boot.c
Cc: len.brown@intel.com
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
This makes it possible to modify CPU flags in command line
options without hacks.
And remove another copy in head64.c
Signed-off-by: Andi Kleen <ak@suse.de>
The lock prefix will cause an exception when used with the
popf instruction, so no need to continue searching after it's
found.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
There's no need to check for invalid DMA data direction in nommu and
gart since we do it in dma-mapping.h anyway before calling the
individual dma-ops.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Allows easier extension of the GDT by using the proper C symbol
for the size in the descriptor.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Add unwind annotations to arch/x86_64/lib/*.S, and also use the macros
provided by linux/linkage.h where-ever possible.
Some of the alternative instructions handling needed to be adjusted so
that the replacement code would also have valid unwind information.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
When testing for the REX instruction prefix, first check
for 32-bit mode because in compat mode the REX prefix is an
increment instruction.
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Lock sections don't work the new dwarf2 unwinder
This generates slightly smaller code. It adds one more taken
jump to the fast path.
Also move the trampolines into semaphore.S and add proper CFI
annotations.
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Make translation_disabled a uchar rather than an int
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The pci_get_device() API decrements the reference count on the 'from'
parameter when it continues searching. Therefore, take a ref count on
Calgary bus when we initialize them in either translated or
non-translated mode.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
We were freeing the iommu_table and leaking the bitmap pages. Also
rename it to calgary_free_bus, which is more accurate.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Move the tce_table_kva array, disabled bitmap and bus_to_phb array
into a new per bus 'struct calgary_bus_info'. Also slightly reorganize
build_tce_table and tce_table_setparms to avoid exporting bus_info to
tce.c.
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Jon Mason <jdmason@us.ibm.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Move initialization of all memory end variables to as early as
possible, so that dependent code doesn't need to check whether these
variables have already been set.
Change the range check in kunmap_atomic to actually make use of this
so that the no-mapping-estabished path (under CONFIG_DEBUG_HIGHMEM)
gets used only when the address is inside the lowmem area (and BUG()
otherwise).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The genapic field and the accessor macro weren't used anywhere.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
While an earlier patch already did a small step into that direction,
this patch moves initialization of all memory end variables to as
early as possible, so that dependent code doesn't need to check
whether these variables have already been set.
Also, remove a misleading (perhaps just outdated) comment, and make
static a variable only used in a single file.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andi Kleen <ak@suse.de>
... instead of using a CONFIG option. The config option still controls
if the resulting executable actually has unwind information.
This is useful to prevent compilation errors when users select
CONFIG_STACK_UNWIND on old binutils and also allows to use
CFI in the future for non kernel debugging applications.
Cc: jbeulich@novell.com
Cc: sam@ravnborg.org
Signed-off-by: Andi Kleen <ak@suse.de>
Remove some unlinuxy ways to write function parameter definitions.
Remove some stray "return;"s
No functional change.
Cc: len.brown@intel.com
Signed-off-by: Andi Kleen <ak@suse.de>
They did not really belong into io_apic.c. Move them into a new file
and clean it up a bit.
Also remove outdated ATI quirk that was obsolete,
Signed-off-by: Andi Kleen <ak@suse.de>
The MPS table specification says that the operating system should
renumber the IO-APICs following the table as needed. However in
ACPI this is not allowed or neeeded and all x86-64 systems are ACPI
compliant.
The code was already disabled on some systems because it caused
problems there. Remove it completely now.
CC: mdomsch@dell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Bugzilla #6552 says:
"In arch/i386/boot/setup.S, movw is used instead of movb for PS/2 mouse
information, although it is unsigned char. This does not harm, because
the jmp instruction overwritten by movw is used before executing movw,
and never be used again"
I've no idea if this is a real bug or how it gets fixed, so I'm submitting
it for review instead of letting it die of boredom in bugzilla. Aditionally
to i386, I've changed x86-64, which mirrors the same code.
Credits to Yoshinori K. Okuji, who found the problem and suggested a fix.
Signed-off-by: Diego Calleja <diegocg@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
The IO APIC code had lots of duplicated code to read/write 64bit
routing entries into the IO-APIC. Factor this out int common read/write
functions
In a few cases the IO APIC lock is taken more often now, but this
isn't a problem because it's all initialization/shutdown only
slow path code.
Similar to earlier x86-64 patch.
Includes a fix by Jiri Slaby for a mistake that broke resume
Signed-off-by: Andi Kleen <ak@suse.de>
The IO APIC code had lots of duplicated code to read/write 64bit
routing entries into the IO-APIC. Factor this out int common read/write
functions
In a few cases the IO APIC lock is taken more often now, but this
isn't a problem because it's all initialization/shutdown only
slow path code.
Signed-off-by: Andi Kleen <ak@suse.de>
PIC mode is an outdated way to drive the APICs that was used on
some early MP boards. It is not supported in the ACPI model.
It is unlikely to be ever configured by any x86-64 system
Remove it thus.
Signed-off-by: Andi Kleen <ak@suse.de>
No 64bit EISA or Microchannel systems ever. Remove the left over code
in the IO-APIC driver and the mptable parser
Signed-off-by: Andi Kleen <ak@suse.de>
This was an old workaround for broken MP-BIOS. The user could
specify overwrites on the command line.
I've never seen it being used for anything on 64bit. So get
rid of it for now.
Signed-off-by: Andi Kleen <ak@suse.de>
IO-APIC or local APIC can only be disabled at runtime anyways and
Kconfig has forced these options on for a long time now.
The Kconfigs are kept only now for the benefit of the shared acpi
boot.c code.
Signed-off-by: Andi Kleen <ak@suse.de>
- Move them to a pure assembly file. Previously they were in
a C file that only consisted of inline assembly. Doing it in pure
assembler is much nicer.
- Add a frame.i include with FRAME/ENDFRAME macros to easily
add frame pointers to assembly functions
- Add dwarf2 annotation to them so that the new dwarf2 unwinder
doesn't get stuck on them
- Random cleanups
Includes feedback from Jan Beulich and a UML build fix from Andrew
Morton.
Cc: jbeulich@novell.com
Cc: jdike@addtoit.com
Signed-off-by: Andi Kleen <ak@suse.de>
Previously it didn't align. Use the same one as the C compiler
in blended mode, which is good for K8 and Core2 and doesn't hurt
on P4.
Signed-off-by: Andi Kleen <ak@suse.de>
- Move the slow path fallbacks to their own assembly files
This makes them much easier to read and is needed for the next change.
- Add CFI annotations for unwinding (XXX need review)
- Remove constant case which can never happen with out of line spinlocks
- Use patchable LOCK prefixes
- Don't use lock sections anymore for inline code because they can't
be expressed by the unwinder (this adds one taken jump to the lock
fast path)
Cc: jbeulich@novell.com
Signed-off-by: Andi Kleen <ak@suse.de>
Use knowledge about EFLAGS layout (bits 22:63 are always 0) to distingush
EFLAGS word and kernel address in the spin lock stack frame.
Signed-off-by: Andi Kleen <ak@suse.de>
This ports the algorithm from x86-64 (with improvements) to i386.
Previously this only worked for frame pointer enabled kernels.
But spinlocks have a very simple stack frame that can be manually
analyzed. Do this.
Signed-off-by: Andi Kleen <ak@suse.de>
A few trivial spelling and grammar mistakes picked up in
"arch/x86_64/aperture.c", "arch/x86_64/crash.c" and
"arch/x86_64/apic.c". I think all are correct fixes but am ever aware
of my fallibility :o) This is my first patch submission so all
feedback is appreciated, esp. WRT CCing to Linus, Andi and
trivial@kernel.org, is this correct? And which is the most appropriate
kernel version to diff against? If any.
Should apply cleanly to 2.6.18-rc1
Signed-off-by: Adam Henley <adamazing@gmail.com>
Signed-off-by: Andi Kleen <ak@suse.de>
- adam
Hello,
Following my discussion with Andi. Here is a patch that introduces
two new TIF flags to simplify the context switch code in __switch_to().
The idea is to minimize the number of cache lines accessed in the common
case, i.e., when neither the debug registers nor the I/O bitmap are used.
This patch covers the x86-64 modifications. A patch for i386 follows.
Changelog:
- add TIF_DEBUG to track when debug registers are active
- add TIF_IO_BITMAP to track when I/O bitmap is used
- modify __switch_to() to use the new TIF flags
<signed-off-by>: eranian@hpl.hp.com
Signed-off-by: Andi Kleen <ak@suse.de>
For NUMA optimization and some other algorithms it is useful to have a fast
to get the current CPU and node numbers in user space.
x86-64 added a fast way to do this in a vsyscall. This adds a generic
syscall for other architectures to make it a generic portable facility.
I expect some of them will also implement it as a faster vsyscall.
The cache is an optimization for the x86-64 vsyscall optimization. Since
what the syscall returns is an approximation anyways and user space
often wants very fast results it can be cached for some time. The norma
methods to get this information in user space are relatively slow
The vsyscall is in a better position to manage the cache because it has direct
access to a fast time stamp (jiffies). For the generic syscall optimization
it doesn't help much, but enforce a valid argument to keep programs
portable
I only added an i386 syscall entry for now. Other architectures can follow
as needed.
AK: Also added some cleanups from Andrew Morton
Signed-off-by: Andi Kleen <ak@suse.de>
This patch adds a vgetcpu vsyscall, which depending on the CPU RDTSCP
capability uses either the RDTSCP or CPUID to obtain a CPU and node
numbers and pass them to the program.
AK: Lots of changes over Vojtech's original code:
Better prototype for vgetcpu()
It's better to pass the cpu / node numbers as separate arguments
to avoid mistakes when going from SMP to NUMA.
Also add a fast time stamp based cache using a user supplied
argument to speed things more up.
Use fast method from Chuck Ebbert to retrieve node/cpu from
GDT limit instead of CPUID
Made sure RDTSCP init is always executed after node is known.
Drop printk
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
This patch adds initalization of the RDTSCP auxilliary values to CPU numbers
to time.c. If RDTSCP is available, the MSRs are written with the respective
values. It can be later used to initalize per-cpu timekeeping variables.
AK: Some cleanups. Move externs into headers and fix CPU hotplug.
Signed-off-by: Vojtech Pavlik <vojtech@suse.cz>
Signed-off-by: Andi Kleen <ak@suse.de>
AK: This redoes the changes I temporarily reverted.
Intel now has support for Architectural Performance Monitoring Counters
( Refer to IA-32 Intel Architecture Software Developer's Manual
http://www.intel.com/design/pentium4/manuals/253669.htm ). This
feature is present starting from Intel Core Duo and Intel Core Solo processors.
What this means is, the performance monitoring counters and some performance
monitoring events are now defined in an architectural way (using cpuid).
And there will be no need to check for family/model etc for these architectural
events.
Below is the patch to use this performance counters in nmi watchdog driver.
Patch handles both i386 and x86-64 kernels.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>