Commit Graph

10431 Commits

Author SHA1 Message Date
H. Peter Anvin
d7be0ce6af Merge commit 'v2.6.34-rc6' into x86/cpu 2010-05-08 14:59:58 -07:00
Cyrill Gorcunov
c7993165ef x86, perf: P4 PMU -- check for proper event index in RAW events
RAW events are special and we should be ready for user passing
in insane event index values.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100508112717.315897547@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-08 14:17:53 +02:00
Cyrill Gorcunov
3f51b7119d x86, perf: P4 PMU -- Get rid of redundant check for array index
The caller already has done such a check.
And it was wrong anyway, it had to be '>=' rather than '>'

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100508112717.130386882@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-08 14:17:53 +02:00
Cyrill Gorcunov
137351e0fe x86, perf: P4 PMU -- protect sensible procedures from preemption
Steven reported:

|
| I'm getting:
|
| Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727
| Call Trace:
|  [<ffffffff811c7565>] debug_smp_processor_id+0xd5/0xf0
|  [<ffffffff81019874>] p4_hw_config+0x2b/0x15c
|  [<ffffffff8107acbc>] ? trace_hardirqs_on_caller+0x12b/0x14f
|  [<ffffffff81019143>] hw_perf_event_init+0x468/0x7be
|  [<ffffffff810782fd>] ? debug_mutex_init+0x31/0x3c
|  [<ffffffff810c68b2>] T.850+0x273/0x42e
|  [<ffffffff810c6cab>] sys_perf_event_open+0x23e/0x3f1
|  [<ffffffff81009e6a>] ? sysret_check+0x2e/0x69
|  [<ffffffff81009e32>] system_call_fastpath+0x16/0x1b
|
| When running perf record in latest tip/perf/core
|

Due to the fact that p4 counters are shared between HT threads
we synthetically divide the whole set of counters into two
non-intersected subsets. And while we're "borrowing" counters
from these subsets we should not be preempted (well, strictly
speaking in p4_hw_config we just pre-set reference to the
subset which allow to save some cycles in schedule routine
if it happens on the same cpu). So use get_cpu/put_cpu pair.

Also p4_pmu_schedule_events should use smp_processor_id rather
than raw_ version. This allow us to catch up preemption issue
(if there will ever be).

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <20100508112716.963478928@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-08 14:17:53 +02:00
Cyrill Gorcunov
de902d967f x86, perf: P4 PMU -- configure predefined events
If an event is not RAW we should not exit p4_hw_config
early but call x86_setup_perfctr as well.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Robert Richter <robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-08 14:17:52 +02:00
H. Peter Anvin
e08cae4181 x86: Clean up the hypervisor layer
Clean up the hypervisor layer and the hypervisor drivers, using an ops
structure instead of an enumeration with if statements.

The identity of the hypervisor, if needed, can be tested by testing
the pointer value in x86_hyper.

The MS-HyperV private state is moved into a normal global variable
(it's per-system state, not per-CPU state).  Being a normal bss
variable, it will be left at all zero on non-HyperV platforms, and so
can generally be tested for HyperV-specific features without
additional qualification.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Greg KH <greg@kroah.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Ky Srinivasan <ksrinivasan@novell.com>
LKML-Reference: <4BE49778.6060800@zytor.com>
2010-05-07 17:13:04 -07:00
Greg Kroah-Hartman
9fa0231742 x86, HyperV: fix up the license to mshyperv.c
This should have been GPLv2 only, we cut and pasted from the wrong file
originally, sorry.

Also removed some unneeded boilerplate license code, we all know where
to find the GPLv2, and that there's no warranty as that is implicit from
the license.

Cc: Ky Srinivasan <ksrinivasan@novell.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
LKML-Reference: <20100507235541.GA15448@kroah.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-07 17:00:25 -07:00
Jacob Pan
2b107d9363 x86: Avoid check hlt for newer cpus
Check hlt instruction was targeted for some older CPUs. It is an expensive
operation in that it takes 4 ticks to break out the check.  We can avoid
such check completely for newer x86 cpus (family >= 5).

[ hpa: corrected family > 5 to family >= 5 ]

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1273269585-14346-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-07 15:31:17 -07:00
Lin Ming
4d1c52b02d perf, x86: implement group scheduling transactional APIs
Convert to the transactional PMU API and remove the duplication of
group_sched_in().

Reviewed-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1272002172.5707.61.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07 11:31:03 +02:00
Peter Zijlstra
ab608344bc perf, x86: Improve the PEBS ABI
Rename perf_event_attr::precise to perf_event_attr::precise_ip and
widen it to 2 bits. This new field describes the required precision of
the PERF_SAMPLE_IP field:

  0 - SAMPLE_IP can have arbitrary skid
  1 - SAMPLE_IP must have constant skid
  2 - SAMPLE_IP requested to have 0 skid
  3 - SAMPLE_IP must have 0 skid

And modify the Intel PEBS code accordingly. The PEBS implementation
now supports up to precise_ip == 2, where we perform the IP fixup.

Also s/PERF_RECORD_MISC_EXACT/&_IP/ to clarify its meaning, this bit
should be set for each PERF_SAMPLE_IP field known to match the actual
instruction triggering the event.

This new scheme allows for a PEBS mode that uses the buffer for more
than a single event.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07 11:31:02 +02:00
Peter Zijlstra
2b0b5c6fe9 perf, x86: Consolidate some code repetition
Remove some duplicated logic.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07 11:31:02 +02:00
Peter Zijlstra
1e9a6d8d44 perf, x86: Remove PEBS SAMPLE_RAW support
Its broken, we really should get PERF_SAMPLE_REGS sorted.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07 11:31:01 +02:00
Robert Richter
a1f2b70a94 perf, x86: Use weight instead of cmask in for_each_event_constraint()
There may exist constraints with a cmask set to zero. In this case
for_each_event_constraint() will not work properly. Now weight is used
instead of the cmask for loop exit detection. Weight is always a value
other than zero since the default contains the HWEIGHT from the
counter mask and in other cases a value of zero does not fit too.

This is in preparation of ibs event constraints that wont have a
cmask.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-7-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07 11:31:01 +02:00
Robert Richter
31fa58af57 perf, x86: Pass enable bit mask to __x86_pmu_enable_event()
To reuse this function for events with different enable bit masks,
this mask is part of the function's argument list now.

The function will be used later to control ibs events too.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-6-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07 11:31:00 +02:00
Robert Richter
9d0fcba67e perf, x86: Call x86_setup_perfctr() from .hw_config()
The perfctr setup calls are in the corresponding .hw_config()
functions now. This makes it possible to introduce config functions
for other pmu events that are not perfctr specific.

Also, all of a sudden the code looks much nicer.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-4-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07 11:31:00 +02:00
Robert Richter
c1726f343b perf, x86: Move x86_setup_perfctr()
Move x86_setup_perfctr(), no other changes made.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-3-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07 11:31:00 +02:00
Robert Richter
4261e0e0ef perf, x86: Move perfctr init code to x86_setup_perfctr()
Split __hw_perf_event_init() to configure pmu events other than
perfctrs. Perfctr code is moved to a separate function
x86_setup_perfctr(). This and the following patches refactor the code.

Split in multiple patches for better review.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1271190201-25705-2-git-send-email-robert.richter@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07 11:30:59 +02:00
Ingo Molnar
cce9131781 Merge branch 'perf/urgent' into perf/core
Merge reason: Resolve patch dependency

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-07 11:30:30 +02:00
Ky Srinivasan
a2a47c6c3d x86: Detect running on a Microsoft HyperV system
This patch integrates HyperV detection within the framework currently
used by VmWare. With this patch, we can avoid having to replicate the
HyperV detection code in each of the Microsoft HyperV drivers.

Reworked and tweaked by Greg K-H to build properly.

Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com>
LKML-Reference: <20100506190841.GA1605@kroah.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vadim Rozenfeld <vrozenfe@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "K.Prasad" <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-06 18:24:15 -07:00
Robert Richter
bae663bc63 oprofile/x86: make AMD IBS hotplug capable
Current IBS code is not hotplug capable. An offline cpu might not be
initialized or deinitialized properly. This patch fixes this by
removing on_each_cpu() functions. The IBS init/deinit code is executed
in the per-cpu functions model->setup_ctrs() and model->cpu_down()
which are also called by hotplug notifiers. model->cpu_down() replaces
model->exit() that became obsolete.

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-06 15:11:15 +02:00
Robert Richter
3de668ee8d oprofile/x86: notify cpus only when daemon is running
This patch moves the cpu notifier registration from nmi_init() to
nmi_setup(). The corresponding unregistration function is now in
nmi_shutdown(). Thus, the hotplug code is only active, if the oprofile
daemon is running.

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-06 13:15:39 +02:00
David Rientjes
b0c4d952a1 x86: Fix fake apicid to node mapping for numa emulation
With NUMA emulation, it's possible for a single cpu to be bound
to multiple nodes since more than one may have affinity if
allocated on a physical node that is local to the cpu.

APIC ids must therefore be mapped to the lowest node ids to
maintain generic kernel use of functions such as cpu_to_node()
that determine device affinity.  For example, if a device has
proximity to physical node 1, for instance, and a cpu happens to
be mapped to a higher emulated node id 8, the proximity may not
be correctly determined by comparison in generic code even
though the cpu may be truly local and allocated on physical node 1.

When this happens, the true topology of the machine isn't
accurately represented in the emulated environment; although
this isn't critical to the system's uptime, any generic code
that is NUMA aware benefits from the physical topology being
accurately represented.

This can affect any system that maps multiple APIC ids to a
single node and is booted with numa=fake=N where N is greater
than the number of physical nodes.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <alpine.DEB.2.00.1005060224140.19473@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-06 12:02:05 +02:00
Eric W. Biederman
4f47b4c9f0 x86, acpi/irq: Define gsi_end when X86_IO_APIC is undefined
My recent changes introducing a global gsi_end variable
failed to take into account the case of using acpi on a system
not built to support IO_APICs, causing the build to fail.

Define gsi_end to 15 when CONFIG_X86_IO_APIC is not set to avoid
compile errors.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <m1tyqm14la.fsf_-_@fess.ebiederm.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-06 08:17:51 +02:00
Linus Torvalds
d7526f271f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  powernow-k8: Fix frequency reporting
  x86: Fix parse_reservetop() build failure on certain configs
  x86: Fix NULL pointer access in irq_force_complete_move() for Xen guests
  x86: Fix 'reservetop=' functionality
2010-05-04 19:07:35 -07:00
David Howells
a66f6375bd Fix the x86_64 implementation of call_rwsem_wait()
The x86_64 call_rwsem_wait() treats the active state counter part of the
R/W semaphore state as being 16-bit when it's actually 32-bit (it's half
of the 64-bit state).  It should do "decl %edx" not "decw %dx".

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-04 15:24:14 -07:00
Eric W. Biederman
7b20bd5fb9 x86, irq: Kill io_apic_renumber_irq
Now that the generic irq layer is performing the exact same remapping as
io_apic_renumber_irq we can kill this weird  es7000 specific function.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-15-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:35:20 -07:00
Eric W. Biederman
988856ee16 x86, acpi/irq: Handle isa irqs that are not identity mapped to gsi's.
ACPI irq source overrides are allowed for the 16 isa irqs and are
allowed to map any gsi to any isa irq.  A few motherboards have been
seen to take advantage of this and put the isa irqs on the 2nd or
3rd ioapic.  This causes some problems, most notably the fact
that we can not use any gsi < 16.

To correct this move the gsis that are not isa irqs and have
a gsi number < 16 into the linux irq space just past gsi_end.
This is what the es7000 platform is doing today.  Moving only the
low 16 gsis above the rest of the gsi's only penalizes weird
platforms, leaving sane acpi implementations with a 1-1 mapping
of gsis and irqs.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-14-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:35:17 -07:00
Eric W. Biederman
4afc51a835 x86, ioapic: Simplify probe_nr_irqs_gsi.
Use the global gsi_end value now that all ioapics have
valid gsi numbers instead of a combination of acpi_probe_gsi
and walking all of the ioapics and couting their number of
entries by hand if acpi_probe_gsi gave us an answer we did
not like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-13-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:35:11 -07:00
Eric W. Biederman
d464207c4f x86, ioapic: Optimize pin_2_irq
Now that all ioapics have valid gsi_base values use this to
accellerate pin_2_irq.  In the case of acpi this also ensures
that pin_2_irq will compute the same irq value for an ioapic
pin as acpi will.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-12-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:35:08 -07:00
Eric W. Biederman
7716a5c4ff x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic.
Now that all ioapic registration happens in mp_register_ioapic we can
move the calculation of nr_ioapic_registers there from enable_IO_APIC.
The number of ioapic registers is already calucated in mp_register_ioapic
so all that really needs to be done is to save the caluclated value
in nr_ioapic_registers.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-11-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:35:03 -07:00
Eric W. Biederman
cf7500c0ea x86, ioapic: In mpparse use mp_register_ioapic
Long ago MP_ioapic_info was the primary way of setting up our
ioapic data structures and mp_register_ioapic was a compatibility
shim for acpi code.  Now the situation is reversed and
and mp_register_ioapic is the primary way of setting up our
ioapic data structures.

Keep the setting up of ioapic data structures uniform by
having mp_register_ioapic call mp_register_ioapic.

This changes a few fields:

- type: is now hardset to MP_IOAPIC but type had to
  bey MP_IOAPIC or MP_ioapic_info would not have been called.

- flags: is now hard coded to MPC_APIC_USABLE.
  We require flags to contain at least MPC_APIC_USEBLE in
  MP_ioapic_info and we don't ever examine flags so dropping
  a few flags that might possibly exist that we have never
  used is harmless.

- apicaddr: Unchanged

- apicver: Read from the ioapic instead of using the cached
  hardware value in the MP table.  The real hardware value
  will be more accurate.

- apicid: Now verified to be unique and changed if it is not.
  If the BIOS got this right this is a noop.  If the BIOS did
  not fixing things appears to be the better solution.

This adds gsi_base and gsi_end values to our ioapics defined with
the mpatable, which will make our lives simpler later since
we can always assume gsi_base and gsi_end are valid.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-10-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:34:59 -07:00
Eric W. Biederman
5777372af5 x86, ioapic: Teach mp_register_ioapic to compute a global gsi_end
Add the global variable gsi_end and teach mp_register_ioapic
to keep it uptodate as we add more ioapics into the system.

ioapics can only be added early in boot so the code that
runs later can treat gsi_end as a constant.

Remove the have hacks in sfi.c to second guess mp_register_ioapic
by keeping t's own running total of how many gsi's have been seen,
and instead use the gsi_end.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-9-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:34:56 -07:00
Eric W. Biederman
eddb0c55a1 x86, ioapic: Fix the types of gsi values
This patches fixes the types of gsi_base and gsi_end values in
struct mp_ioapic_gsi, and the gsi parameter of mp_find_ioapic
and mp_find_ioapic_pin

A gsi is cannonically a u32, not an int.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-8-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:34:52 -07:00
Eric W. Biederman
4b6b19a1c7 x86, ioapic: Fix io_apic_redir_entries to return the number of entries.
io_apic_redir_entries has a huge conceptual bug.  It returns the maximum
redirection entry not the number of redirection entries.  Which simply
does not match what the name of the function.  This just caught me
and it caught  Feng Tang, and  Len Brown when they wrote sfi_parse_ioapic.

Modify io_apic_redir_entries to actually return the number of redirection
entries, and fix the callers so that they properly handle receiving the
number of the number of redirection table entries, instead of the
number of redirection table entries less one.

While the usage in sfi.c does not show up in this patch it is fixed
by virtue of the fact that io_apic_redir_entries now has the semantics
sfi_parse_ioapic most reasonably expects.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-7-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:34:48 -07:00
Eric W. Biederman
9638fa521e x86, ioapic: Only export mp_find_ioapic and mp_find_ioapic_pin in io_apic.h
Multiple declarations of the same function in different headers
is a pain to maintain.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-6-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:34:44 -07:00
Eric W. Biederman
0fd52670fb x86, acpi/irq: Generalize mp_config_acpi_legacy_irqs
Remove the assumption that there is not an override for isa irq 0.
Instead lookup the gsi and from that lookup the ioapic and pin of each
isa irq indivdually.

In general this should not have any behavioural affect but in
perverse cases this gets all of the details correct, instead of
doing something weird.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-5-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:34:38 -07:00
Eric W. Biederman
9d2062b879 x86, acpi/irq: Fix acpi_sci_ioapic_setup so it has both bus_irq and gsi
Currently acpi_sci_ioapic_setup calls mp_override_legacy_irq with
bus_irq == gsi, which is wrong if we are comming from an override
Instead pass the bus_irq into acpi_sci_ioapic_setup.

This fix was inspired by a similar fix from:
Yinghai Lu <yinghai@kernel.org>

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-4-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:34:34 -07:00
Eric W. Biederman
9a0a91bb56 x86, acpi/irq: Teach acpi_get_override_irq to take a gsi not an isa_irq
In perverse acpi implementations the isa irqs are not identity mapped
to the first 16 gsi.  Furthermore at least the extended interrupt
resource capability may return gsi's and not isa irqs.  So since
what we get from acpi is a gsi teach acpi_get_overrride_irq to
operate on a gsi instead of an isa_irq.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-2-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:34:27 -07:00
Eric W. Biederman
2c2df8418a x86, acpi/irq: Introduce apci_isa_irq_to_gsi
There are a number of cases where the current code makes the assumption
that isa irqs identity map to the first 16 acpi global system intereupts.
In most instances that assumption is correct as that is the required
behaviour in dual i8259 mode and the default behavior in ioapic mode.

However there are some systems out there that take advantage of acpis
interrupt remapping  for the isa irqs to have a completely different
mapping of isa_irq to gsi.

Introduce acpi_isa_irq_to_gsi to perform this mapping explicitly in the
code that needs it.  Initially this will be just the current assumed
identity mapping to ensure it's introduction does not cause regressions.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <1269936436-7039-1-git-send-email-ebiederm@xmission.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-04 13:34:23 -07:00
Robert Richter
d30d64c6da oprofile/x86: reordering some functions
Reordering some functions. Necessary for the next patch. No functional
changes.

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:40:11 +02:00
Robert Richter
de65464973 oprofile/x86: stop disabled counters in nmi handler
This patch adds checks to the nmi handler. Now samples are only
generated and counters reenabled, if the counters are running.
Otherwise the counters are stopped, if oprofile is using the nmi. In
other cases it will ignore the nmi notification.

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:39:38 +02:00
Robert Richter
6ae56b55bc oprofile/x86: protect cpu hotplug sections
This patch reworks oprofile cpu hotplug code as follows:

Introduce ctr_running variable to check, if counters are running or
not. The state must be known for taking a cpu on or offline and when
switching counters during counter multiplexing.

Protect on_each_cpu() sections with get_online_cpus()/put_online_cpu()
functions. This is necessary if notifiers or states are
modified. Within these sections the cpu mask may not change.

Switch only between counters in nmi_cpu_switch(), if counters are
running. Otherwise the switch may restart a counter though they are
disabled.

Add nmi_cpu_setup() and nmi_cpu_shutdown() to cpu hotplug code. The
function must also be called to avoid uninitialzed counter usage.

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:37:56 +02:00
Robert Richter
216f3d9b4e oprofile/x86: remove CONFIG_SMP macros
CPU notifier register functions also exist if CONFIG_SMP is
disabled. This change is part of hotplug code rework and also
necessary for later patches.

Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:37:56 +02:00
Robert Richter
2623a1d55a oprofile/x86: fix uninitialized counter usage during cpu hotplug
This fixes a NULL pointer dereference that is triggered when taking a
cpu offline after oprofile was initialized, e.g.:

 $ opcontrol --init
 $ opcontrol --start-daemon
 $ opcontrol --shutdown
 $ opcontrol --deinit
 $ echo 0 > /sys/devices/system/cpu/cpu1/online

See the crash dump below. Though the counter has been disabled the cpu
notifier is still active and trying to use already freed counter data.

This fix is for linux-stable. To proper fix this, the hotplug code
must be rewritten. Thus I will leave a WARN_ON_ONCE() message with
this patch.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8132ad57>] op_amd_stop+0x2d/0x8e
PGD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu1/online
CPU 1
Modules linked in:

Pid: 0, comm: swapper Not tainted 2.6.34-rc5-oprofile-x86_64-standard-00210-g8c00f06 #16 Anaheim/Anaheim
RIP: 0010:[<ffffffff8132ad57>]  [<ffffffff8132ad57>] op_amd_stop+0x2d/0x8e
RSP: 0018:ffff880001843f28  EFLAGS: 00010006
RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000200200
RDX: ffff880001843f68 RSI: dead000000100100 RDI: 0000000000000000
RBP: ffff880001843f48 R08: 0000000000000000 R09: ffff880001843f08
R10: ffffffff8102c9a5 R11: ffff88000184ea80 R12: 0000000000000000
R13: ffff88000184f6c0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007fec6a92e6f0(0000) GS:ffff880001840000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000000163b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88042fcd8000, task ffff88042fcd51d0)
Stack:
 ffff880001843f48 0000000000000001 ffff88042e9f7d38 ffff880001843f68
<0> ffff880001843f58 ffffffff8132a602 ffff880001843f98 ffffffff810521b3
<0> ffff880001843f68 ffff880001843f68 ffff880001843f88 ffff88042fcd9fd8
Call Trace:
 <IRQ>
 [<ffffffff8132a602>] nmi_cpu_stop+0x21/0x23
 [<ffffffff810521b3>] generic_smp_call_function_single_interrupt+0xdf/0x11b
 [<ffffffff8101804f>] smp_call_function_single_interrupt+0x22/0x31
 [<ffffffff810029f3>] call_function_single_interrupt+0x13/0x20
 <EOI>
 [<ffffffff8102c9a5>] ? wake_up_process+0x10/0x12
 [<ffffffff81008701>] ? default_idle+0x22/0x37
 [<ffffffff8100896d>] c1e_idle+0xdf/0xe6
 [<ffffffff813f1170>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff810012fb>] cpu_idle+0x4b/0x7e
 [<ffffffff813e8a4e>] start_secondary+0x1ae/0x1b2
Code: 89 e5 41 55 49 89 fd 41 54 45 31 e4 53 31 db 48 83 ec 08 89 df e8 be f8 ff ff 48 98 48 83 3c c5 10 67 7a 81 00 74 1f 49 8b 45 08 <42> 8b 0c 20 0f 32 48 c1 e2 20 25 ff ff bf ff 48 09 d0 48 89 c2
RIP  [<ffffffff8132ad57>] op_amd_stop+0x2d/0x8e
 RSP <ffff880001843f28>
CR2: 0000000000000000
---[ end trace 679ac372d674b757 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Pid: 0, comm: swapper Tainted: G      D    2.6.34-rc5-oprofile-x86_64-standard-00210-g8c00f06 #16
Call Trace:
 <IRQ>  [<ffffffff813ebd6a>] panic+0x9e/0x10c
 [<ffffffff810474b0>] ? up+0x34/0x39
 [<ffffffff81031ccc>] ? kmsg_dump+0x112/0x12c
 [<ffffffff813eeff1>] oops_end+0x81/0x8e
 [<ffffffff8101efee>] no_context+0x1f3/0x202
 [<ffffffff8101f1b7>] __bad_area_nosemaphore+0x1ba/0x1e0
 [<ffffffff81028d24>] ? enqueue_task_fair+0x16d/0x17a
 [<ffffffff810264dc>] ? activate_task+0x42/0x53
 [<ffffffff8102c967>] ? try_to_wake_up+0x272/0x284
 [<ffffffff8101f1eb>] bad_area_nosemaphore+0xe/0x10
 [<ffffffff813f0f3f>] do_page_fault+0x1c8/0x37c
 [<ffffffff81028d24>] ? enqueue_task_fair+0x16d/0x17a
 [<ffffffff813ee55f>] page_fault+0x1f/0x30
 [<ffffffff8102c9a5>] ? wake_up_process+0x10/0x12
 [<ffffffff8132ad57>] ? op_amd_stop+0x2d/0x8e
 [<ffffffff8132ad46>] ? op_amd_stop+0x1c/0x8e
 [<ffffffff8132a602>] nmi_cpu_stop+0x21/0x23
 [<ffffffff810521b3>] generic_smp_call_function_single_interrupt+0xdf/0x11b
 [<ffffffff8101804f>] smp_call_function_single_interrupt+0x22/0x31
 [<ffffffff810029f3>] call_function_single_interrupt+0x13/0x20
 <EOI>  [<ffffffff8102c9a5>] ? wake_up_process+0x10/0x12
 [<ffffffff81008701>] ? default_idle+0x22/0x37
 [<ffffffff8100896d>] c1e_idle+0xdf/0xe6
 [<ffffffff813f1170>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff810012fb>] cpu_idle+0x4b/0x7e
 [<ffffffff813e8a4e>] start_secondary+0x1ae/0x1b2
------------[ cut here ]------------
WARNING: at /local/rrichter/.source/linux/arch/x86/kernel/smp.c:118 native_smp_send_reschedule+0x27/0x53()
Hardware name: Anaheim
Modules linked in:
Pid: 0, comm: swapper Tainted: G      D    2.6.34-rc5-oprofile-x86_64-standard-00210-g8c00f06 #16
Call Trace:
 <IRQ>  [<ffffffff81017f32>] ? native_smp_send_reschedule+0x27/0x53
 [<ffffffff81030ee2>] warn_slowpath_common+0x77/0xa4
 [<ffffffff81030f1e>] warn_slowpath_null+0xf/0x11
 [<ffffffff81017f32>] native_smp_send_reschedule+0x27/0x53
 [<ffffffff8102634b>] resched_task+0x60/0x62
 [<ffffffff8102653a>] check_preempt_curr_idle+0x10/0x12
 [<ffffffff8102c8ea>] try_to_wake_up+0x1f5/0x284
 [<ffffffff8102c986>] default_wake_function+0xd/0xf
 [<ffffffff810a110d>] pollwake+0x57/0x5a
 [<ffffffff8102c979>] ? default_wake_function+0x0/0xf
 [<ffffffff81026be5>] __wake_up_common+0x46/0x75
 [<ffffffff81026ed0>] __wake_up+0x38/0x50
 [<ffffffff81031694>] printk_tick+0x39/0x3b
 [<ffffffff8103ac37>] update_process_times+0x3f/0x5c
 [<ffffffff8104dc63>] tick_periodic+0x5d/0x69
 [<ffffffff8104dc90>] tick_handle_periodic+0x21/0x71
 [<ffffffff81018fd0>] smp_apic_timer_interrupt+0x82/0x95
 [<ffffffff81002853>] apic_timer_interrupt+0x13/0x20
 [<ffffffff81030cb5>] ? panic_blink_one_second+0x0/0x7b
 [<ffffffff813ebdd6>] ? panic+0x10a/0x10c
 [<ffffffff810474b0>] ? up+0x34/0x39
 [<ffffffff81031ccc>] ? kmsg_dump+0x112/0x12c
 [<ffffffff813eeff1>] ? oops_end+0x81/0x8e
 [<ffffffff8101efee>] ? no_context+0x1f3/0x202
 [<ffffffff8101f1b7>] ? __bad_area_nosemaphore+0x1ba/0x1e0
 [<ffffffff81028d24>] ? enqueue_task_fair+0x16d/0x17a
 [<ffffffff810264dc>] ? activate_task+0x42/0x53
 [<ffffffff8102c967>] ? try_to_wake_up+0x272/0x284
 [<ffffffff8101f1eb>] ? bad_area_nosemaphore+0xe/0x10
 [<ffffffff813f0f3f>] ? do_page_fault+0x1c8/0x37c
 [<ffffffff81028d24>] ? enqueue_task_fair+0x16d/0x17a
 [<ffffffff813ee55f>] ? page_fault+0x1f/0x30
 [<ffffffff8102c9a5>] ? wake_up_process+0x10/0x12
 [<ffffffff8132ad57>] ? op_amd_stop+0x2d/0x8e
 [<ffffffff8132ad46>] ? op_amd_stop+0x1c/0x8e
 [<ffffffff8132a602>] ? nmi_cpu_stop+0x21/0x23
 [<ffffffff810521b3>] ? generic_smp_call_function_single_interrupt+0xdf/0x11b
 [<ffffffff8101804f>] ? smp_call_function_single_interrupt+0x22/0x31
 [<ffffffff810029f3>] ? call_function_single_interrupt+0x13/0x20
 <EOI>  [<ffffffff8102c9a5>] ? wake_up_process+0x10/0x12
 [<ffffffff81008701>] ? default_idle+0x22/0x37
 [<ffffffff8100896d>] ? c1e_idle+0xdf/0xe6
 [<ffffffff813f1170>] ? atomic_notifier_call_chain+0x13/0x15
 [<ffffffff810012fb>] ? cpu_idle+0x4b/0x7e
 [<ffffffff813e8a4e>] ? start_secondary+0x1ae/0x1b2
---[ end trace 679ac372d674b758 ]---

Cc: Andi Kleen <andi@firstfloor.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:37:56 +02:00
Robert Richter
5bdb7934ca oprofile/x86: remove duplicate IBS capability check
The check is already done in ibs_exit().

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:29 +02:00
Robert Richter
da759fe5be oprofile/x86: move IBS code
Moving code to make future changes easier. This groups all IBS code
together.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:28 +02:00
Robert Richter
8617f98c00 oprofile/x86: return -EBUSY if counters are already reserved
In case a counter is already reserved by the watchdog or perf_event
subsystem, oprofile ignored this counters silently. This case is
handled now and oprofile_setup() now reports an error.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:28 +02:00
Robert Richter
83300ce0df oprofile/x86: moving shutdown functions
Moving some code in preparation of the next patch.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:27 +02:00
Robert Richter
d0e4120fda oprofile/x86: reserve counter msrs pairwise
For AMD's and Intel's P6 generic performance counters have pairwise
counter and control msrs. This patch changes the counter reservation
in a way that both msrs must be registered. It joins some counter
loops and also removes the unnecessary NUM_CONTROLS macro in the AMD
implementation.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:26 +02:00
Robert Richter
8f5a2dd83a oprofile/x86: rework error handler in nmi_setup()
This patch improves the error handler in nmi_setup(). Most parts of
the code are moved to allocate_msrs(). In case of an error
allocate_msrs() also frees already allocated memory. nmi_setup()
becomes easier and better extendable.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-05-04 11:35:07 +02:00
H. Peter Anvin
097c1bd567 x86, cpu: Make APERF/MPERF a normal table-driven flag
APERF/MPERF can be handled via the table like all the other scattered
CPU flags.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-4-git-send-email-bp@amd64.org>
2010-05-03 15:49:31 -07:00
Borislav Petkov
d88d95eb1c x86, k8: Fix build error when K8_NB is disabled
K8_NB depends on PCI and when the last is disabled (allnoconfig) we fail
at the final linking stage due to missing exported num_k8_northbridges.
Add a header stub for that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100503183036.GJ26107@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 14:34:24 -07:00
Brian Gerst
250825008f x86-32: Don't set ignore_fpu_irq in simd exception
Any processor that supports simd will have an internal fpu, and the
irq13 handler will not be enabled.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 13:39:32 -07:00
Brian Gerst
e2e75c915d x86: Merge kernel_math_error() into math_error()
Clean up the kernel exception handling and make it more similar to
the other traps.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 13:39:31 -07:00
Brian Gerst
9b6dba9e07 x86: Merge simd_math_error() into math_error()
The only difference between FPU and SIMD exceptions is where the
status bits are read from (cwd/swd vs. mxcsr).  This also fixes
the discrepency introduced by commit adf77bac, which fixed FPU
but not SIMD.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 13:39:29 -07:00
Brian Gerst
40d2e76315 x86-32: Rework cache flush denied handler
The cache flush denied error is an erratum on some AMD 486 clones.  If an invd
instruction is executed in userspace, the processor calls exception 19 (13 hex)
instead of #GP (13 decimal).  On cpus where XMM is not supported, redirect
exception 19 to do_general_protection().  Also, remove die_if_kernel(), since
this was the last user.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1269176446-2489-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 13:39:26 -07:00
Mark Langsdorf
b810e94c9d powernow-k8: Fix frequency reporting
With F10, model 10, all valid frequencies are in the ACPI _PST table.

Cc: <stable@kernel.org> # 33.x 32.x
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <1270065406-1814-6-git-send-email-bp@amd64.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-03 15:04:18 +02:00
Ingo Molnar
56f0e74c9c x86: Fix parse_reservetop() build failure on certain configs
Commit e67a807 ("x86: Fix 'reservetop=' functionality") added a
fixup_early_ioremap() call to parse_reservetop() and declared it
in io.h.

But asm/io.h was only included indirectly - and on some configs
not at all, causing a build failure on those configs.

Cc: Liang Li <liang.li@windriver.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Wang Chen <wangchen@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-05-03 09:22:19 +02:00
Ingo Molnar
53ba4f2fa7 Merge commit 'v2.6.34-rc6' into core/locking 2010-05-03 09:17:01 +02:00
Frederic Weisbecker
feef47d0cb hw-breakpoints: Get the number of available registers on boot dynamically
The breakpoint generic layer assumes that archs always know in advance
the static number of address registers available to host breakpoints
through the HBP_NUM macro.

However this is not true for every archs. For example Arm needs to get
this information dynamically to handle the compatiblity between
different versions.

To solve this, this patch proposes to drop the static HBP_NUM macro
and let the arch provide the number of available slots through a
new hw_breakpoint_slots() function. For archs that have
CONFIG_HAVE_MIXED_BREAKPOINTS_REGS selected, it will be called once
as the number of registers fits for instruction and data breakpoints
together.
For the others it will be called first to get the number of
instruction breakpoint registers and another time to get the
data breakpoint registers, the targeted type is given as a
parameter of hw_breakpoint_slots().

Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
2010-05-01 04:32:14 +02:00
Frederic Weisbecker
0102752e4c hw-breakpoints: Separate constraint space for data and instruction breakpoints
There are two outstanding fashions for archs to implement hardware
breakpoints.

The first is to separate breakpoint address pattern definition
space between data and instruction breakpoints. We then have
typically distinct instruction address breakpoint registers
and data address breakpoint registers, delivered with
separate control registers for data and instruction breakpoints
as well. This is the case of PowerPc and ARM for example.

The second consists in having merged breakpoint address space
definition between data and instruction breakpoint. Address
registers can host either instruction or data address and
the access mode for the breakpoint is defined in a control
register. This is the case of x86 and Super H.

This patch adds a new CONFIG_HAVE_MIXED_BREAKPOINTS_REGS config
that archs can select if they belong to the second case. Those
will have their slot allocation merged for instructions and
data breakpoints.

The others will have a separate slot tracking between data and
instruction breakpoints.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
2010-05-01 04:32:11 +02:00
Frederic Weisbecker
b2812d031d hw-breakpoints: Change/Enforce some breakpoints policies
The current policies of breakpoints in x86 and SH are the following:

- task bound breakpoints can only break on userspace addresses
- cpu wide breakpoints can only break on kernel addresses

The former rule prevents ptrace breakpoints to be set to trigger on
kernel addresses, which is good. But as a side effect, we can't
breakpoint on kernel addresses for task bound breakpoints.

The latter rule simply makes no sense, there is no reason why we
can't set breakpoints on userspace while performing cpu bound
profiles.

We want the following new policies:

- task bound breakpoint can set userspace address breakpoints, with
no particular privilege required.
- task bound breakpoints can set kernelspace address breakpoints but
must be privileged to do that.
- cpu bound breakpoints can do what they want as they are privileged
already.

To implement these new policies, this patch checks if we are dealing
with a kernel address breakpoint, if so and if the exclude_kernel
parameter is set, we tell the user that the breakpoint is invalid,
which makes a good generic ptrace protection.
If we don't have exclude_kernel, ensure the user has the right
privileges as kernel breakpoints are quite sensitive (risk of
trap recursion attacks and global performance impacts).

[ Paul Mundt: keep addr space check for sh signal delivery and fix
  double function declaration]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2010-05-01 04:32:10 +02:00
Frederic Weisbecker
73266fc1df hw-breakpoints: Tag ptrace breakpoint as exclude_kernel
Tag ptrace breakpoints with the exclude_kernel attribute set. This
will make it easier to set generic policies on breakpoints, when it
comes to ensure nobody unpriviliged try to breakpoint on the kernel.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: K. Prasad <prasad@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
2010-05-01 04:32:07 +02:00
Prarit Bhargava
bbd391a15d x86: Fix NULL pointer access in irq_force_complete_move() for Xen guests
Upstream PV guests fail to boot because of a NULL pointer in
irq_force_complete_move().  It is possible that xen guests have
irq_desc->chip_data = NULL.

Test for NULL chip_data pointer before attempting to complete an irq move.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
LKML-Reference: <20100427152434.16193.49104.sendpatchset@prarit.bos.redhat.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org> [2.6.33]
2010-04-30 14:31:38 -07:00
Liang Li
e67a807f3d x86: Fix 'reservetop=' functionality
When specifying the 'reservetop=0xbadc0de' kernel parameter,
the kernel will stop booting due to a early_ioremap bug that
relates to commit 8827247ff.

The root cause of boot failure problem is the value of
'slot_virt[i]' was initialized in setup_arch->early_ioremap_init().
But later in setup_arch, the function 'parse_early_param' will
modify 'FIXADDR_TOP' when 'reservetop=0xbadc0de' being specified.

The simplest fix might be use __fix_to_virt(idx0) to get updated
value of 'FIXADDR_TOP' in '__early_ioremap' instead of reference
old value from slot_virt[slot] directly.

Changelog since v0:

-v1: When reservetop being handled then FIXADDR_TOP get
     adjusted, Hence check prev_map then re-initialize slot_virt and
     PMD based on new FIXADDR_TOP.

-v2: place fixup_early_ioremap hence call early_ioremap_init in
     reserve_top_address  to re-initialize slot_virt and
     corresponding PMD when parse_reservertop

-v3: move fixup_early_ioremap out of reserve_top_address to make
     sure other clients of reserve_top_address like xen/lguest won't
     broken

Signed-off-by: Liang Li <liang.li@windriver.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Wang Chen <wangchen@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com>
[ fixed three small cleanliness details in fixup_early_ioremap() ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-30 12:19:53 +02:00
Ingo Molnar
3ca50496c2 Merge commit 'v2.6.34-rc6' into perf/core
Merge reason: update to the latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-30 09:56:44 +02:00
H. Peter Anvin
d9c5841e22 Merge branch 'x86/asm' into x86/atomic
Merge reason:
	Conflict between LOCK_PREFIX_HERE and relative alternatives
	pointers

Resolved Conflicts:
	arch/x86/include/asm/alternative.h
	arch/x86/kernel/alternative.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-29 16:53:17 -07:00
H. Peter Anvin
b701a47ba4 x86: Fix LOCK_PREFIX_HERE for uniprocessor build
Checkin b3ac891b67:
x86: Add support for lock prefix in alternatives

... did not define LOCK_PREFIX_HERE in the case of a uniprocessor
build.  As a result, it would cause any of the usages of this macro to
fail on a uniprocessor build.  Fix this by defining LOCK_PREFIX_HERE
as a null string.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Luca Barbieri <luca@luca-barbieri.com>
LKML-Reference: <1267005265-27958-2-git-send-email-luca@luca-barbieri.com>
2010-04-29 16:08:54 -07:00
Linus Torvalds
dfad53d48e Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86: Disable large pages on CPUs with Atom erratum AAE44
  x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero
  x86, mrst: Conditionally register cpu hotplug notifier for apbt
2010-04-28 20:41:55 -07:00
Jan Beulich
6fc108a08d x86: Clean up arch/x86/Kconfig*
No functional change intended.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF2690020000780003B340@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 17:25:53 -07:00
Jan Beulich
47f9fe2629 x86-64: Don't export init_level4_pgt
It's not used by any module, and i386 (as well as some other arches)
also doesn't export its equivalent (swapper_pg_dir).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF33BD020000780003B3E4@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 17:25:47 -07:00
Jan Beulich
5967ed87ad x86-64: Reduce SMP locks table size
Reduce the SMP locks table size by using relative pointers instead of
absolute ones, thus cutting the table size by half.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF30FE020000780003B3B6@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 17:15:47 -07:00
Jan Beulich
2e61878698 x86-64: Combine SRAT regions when possible
... i.e. when the hole between two regions isn't occupied by memory on
another node. This reduces the memory->node table size, thus reducing
cache footprint of lookups, which got increased significantly some
time ago, and things go back to how they were before that change on
the systems I looked at.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF3230020000780003B3CA@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 17:14:11 -07:00
Jan Beulich
402af0d7c6 x86, asm: Introduce and use percpu_inc()
... generating slightly smaller code.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4BCF261F020000780003B33C@vpn.id2.novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-28 16:58:49 -07:00
Bjorn Helgaas
48728e0774 x86/PCI: compute Address Space length rather than using _LEN
ACPI _CRS Address Space Descriptors have _MIN, _MAX, and _LEN.  Linux has
been computing Address Spaces as [_MIN to _MIN + _LEN - 1].  Based on the
tests in the bug reports below, Windows apparently uses [_MIN to _MAX].

Per spec (ACPI 4.0, Table 6-40), for _CRS fixed-size, fixed location
descriptors, "_LEN must be (_MAX - _MIN + 1)", and when that's true, it
doesn't matter which way we compute the end.  But of course, there are
BIOSes that don't follow this rule, and we're better off if Linux handles
those exceptions the same way as Windows.

This patch makes Linux use [_MIN to _MAX], as Windows seems to do.  This
effectively reverts d558b483d5 and 03db42adfe and replaces them with
simpler code.

    https://bugzilla.kernel.org/show_bug.cgi?id=14337 (round)
    https://bugzilla.kernel.org/show_bug.cgi?id=15480 (truncate)

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-28 09:17:45 -07:00
Bjorn Helgaas
55051feb57 x86/PCI: never allocate PCI MMIO resources below BIOS_END
When we move a PCI device or assign resources to a device not configured
by the BIOS, we want to avoid the BIOS region below 1MB.  Note that if the
BIOS places devices below 1MB, we leave them there.

See https://bugzilla.kernel.org/show_bug.cgi?id=15744
and https://bugzilla.kernel.org/show_bug.cgi?id=15841

Tested-by: Andy Isaacson <adi@hexapodia.org>
Tested-by: Andy Bailey <bailey@akamai.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-26 12:30:03 -07:00
Linus Torvalds
383bee6b54 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: Ensure we re-enable devices on resume
  x86/PCI: parse additional host bridge window resource types
  PCI: revert broken device warning
  PCI aerdrv: use correct bit defines and add 2ms delay to aer_root_reset
  x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions
2010-04-24 11:32:12 -07:00
Dmitry Torokhov
453dc65931 VMware Balloon driver
This is a standalone version of VMware Balloon driver.  Ballooning is a
technique that allows hypervisor dynamically limit the amount of memory
available to the guest (with guest cooperation).  In the overcommit
scenario, when hypervisor set detects that it needs to shuffle some
memory, it instructs the driver to allocate certain number of pages, and
the underlying memory gets returned to the hypervisor.  Later hypervisor
may return memory to the guest by reattaching memory to the pageframes and
instructing the driver to "deflate" balloon.

We are submitting a standalone driver because KVM maintainer (Avi Kivity)
expressed opinion (rightly) that our transport does not fit well into
virtqueue paradigm and thus it does not make much sense to integrate with
virtio.

There were also some concerns whether current ballooning technique is the
right thing.  If there appears a better framework to achieve this we are
prepared to evaluate and switch to using it, but in the meantime we'd like
to get this driver upstream.

We want to get the driver accepted in distributions so that users do not
have to deal with an out-of-tree module and many distributions have
"upstream first" requirement.

The driver has been shipping for a number of years and users running on
VMware platform will have it installed as part of VMware Tools even if it
will not come from a distribution, thus there should not be additional
risk in pulling the driver into mainline.  The driver will only activate
if host is VMware so everyone else should not be affected at all.

Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24 11:31:26 -07:00
H. Peter Anvin
7a0fc404ae x86: Disable large pages on CPUs with Atom erratum AAE44
Atom erratum AAE44/AAF40/AAG38/AAH41:

"If software clears the PS (page size) bit in a present PDE (page
directory entry), that will cause linear addresses mapped through this
PDE to use 4-KByte pages instead of using a large page after old TLB
entries are invalidated. Due to this erratum, if a code fetch uses
this PDE before the TLB entry for the large page is invalidated then
it may fetch from a different physical address than specified by
either the old large page translation or the new 4-KByte page
translation. This erratum may also cause speculative code fetches from
incorrect addresses."

[http://download.intel.com/design/processor/specupdt/319536.pdf]

Where as commit 211b3d03c7 seems to
workaround errata AAH41 (mixed 4K TLBs) it reduces the window of
opportunity for the bug to occur and does not totally remove it.  This
patch disables mixed 4K/4MB page tables totally avoiding the page
splitting and not tripping this processor issue.

This is based on an original patch by Colin King.

Originally-by: Colin Ian King <colin.king@canonical.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
Cc: <stable@kernel.org>
2010-04-23 16:49:51 -07:00
H. Peter Anvin
7ce5a2b9bb x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero
When we do a thread switch, we clear the outgoing FS/GS base if the
corresponding selector is nonzero.  This is taken by __switch_to() as
an entry invariant; it does not verify that it is true on entry.
However, copy_thread() doesn't enforce this constraint, which can
result in inconsistent results after fork().

Make copy_thread() match the behavior of __switch_to().

Reported-and-tested-by: Samuel Thibault <samuel.thibault@inria.fr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4BD1E061.8030605@zytor.com>
Cc: <stable@kernel.org>
2010-04-23 16:49:51 -07:00
Robin Holt
1f9cc3cb6a x86, pat: Update the page flags for memtype atomically instead of using memtype_lock
While testing an application using the xpmem (out of kernel) driver, we
noticed a significant page fault rate reduction of x86_64 with respect
to ia64.  For one test running with 32 cpus, one thread per cpu, it
took 01:08 for each of the threads to vm_insert_pfn 2GB worth of pages.
For the same test running on 256 cpus, one thread per cpu, it took 14:48
to vm_insert_pfn 2 GB worth of pages.

The slowdown was tracked to lookup_memtype which acquires the
spinlock memtype_lock.  This heavily contended lock was slowing down
vm_insert_pfn().

With the cmpxchg on page->flags method, both the 32 cpu and 256 cpu
cases take approx 00:01.3 seconds to complete.

Signed-off-by: Robin Holt <holt@sgi.com>
LKML-Reference: <20100423153627.751194346@gulag1.americas.sgi.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
Cc: Rafael Wysocki <rjw@novell.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-23 15:57:23 -07:00
Ingo Molnar
70bce3ba77 Merge branch 'linus' into perf/core
Merge reason: merge the latest fixes, update to latest -rc.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-23 11:10:30 +02:00
Borislav Petkov
59d3b38874 x86, cacheinfo: Disable index in all four subcaches
When disabling an L3 cache index, make sure we disable that index in
all four subcaches of the L3. Clarify nomenclature while at it, wrt to
disable slots versus disable index and rename accordingly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-6-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:27 -07:00
Borislav Petkov
ba06edb63f x86, cacheinfo: Make L3 cache info per node
Currently, we're allocating L3 cache info and calculating indices for
each online cpu which is clearly superfluous. Instead, we need to do
this per-node as is each L3 cache.

No functional change, only per-cpu memory savings.

-v2: Allocate L3 cache descriptors array dynamically.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-5-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:23 -07:00
Borislav Petkov
9350f982e4 x86, cacheinfo: Reorganize AMD L3 cache structure
Add a struct representing L3 cache attributes (subcache sizes and
indices count) and move the respective members out of _cpuid4_info.
Also, stash the struct pci_dev ptr into the struct simplifying the code
even more.

There should be no functionality change resulting from this patch except
slightly slimming the _cpuid4_info per-cpu vars.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-4-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:23 -07:00
Frank Arnold
f2b20e4140 x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
When running a quest kernel on xen we get:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
PGD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x
2ca/0x3df
RSP: 0018:ffff880002203e08  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
Stack:
 ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
<0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
<0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
Call Trace:
 <IRQ>
 [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
 [<ffffffff81059140>] ? mod_timer+0x23/0x25
 [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
 [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
 [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
 [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
 [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
 [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
 [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
 [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
 [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
 <EOI>
 [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
 [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
 [<ffffffff810114eb>] ? default_idle+0x36/0x53
 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
 [<ffffffff81423a9a>] rest_init+0x7e/0x80
 [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
 [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
 [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
 00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
 ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
 RSP <ffff880002203e08>
CR2: 0000000000000038
---[ end trace a7919e7f17c0a726 ]---

The L3 cache index disable feature of AMD CPUs has to be disabled if the
kernel is running as guest on top of a hypervisor because northbridge
devices are not available to the guest. Currently, this fixes a boot
crash on top of Xen. In the future this will become an issue on KVM as
well.

Check if northbridge devices are present and do not enable the feature
if there are none.

Signed-off-by: Frank Arnold <frank.arnold@amd.com>
LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:21 -07:00
Borislav Petkov
b1ab1b4d9a x86, cacheinfo: Unify AMD L3 cache index disable checking
All F10h CPUs starting with model 8 resp. 9, stepping 1, support L3
cache index disable. Concentrate the family, model, stepping checking at
one place and enable the feature implicitly on upcoming Fam10h models.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-2-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:20 -07:00
Bjorn Helgaas
66528fdd45 x86/PCI: parse additional host bridge window resource types
This adds support for Memory24, Memory32, and Memory32Fixed descriptors in
PCI host bridge _CRS.

I experimentally determined that Windows (2008 R2) accepts these descriptors
and treats them as windows that are forwarded to the PCI bus, e.g., if
it finds any PCI devices with BARs outside the windows, it moves them into
the windows.

I don't know whether any machines actually use these descriptors in PCI
host bridge _CRS methods, but if any exist and they're new enough that we
automatically turn on "pci=use_crs", they will work with Windows but not
with Linux.

Here are the details: https://bugzilla.kernel.org/show_bug.cgi?id=15817

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-22 16:13:22 -07:00
Linus Torvalds
a486b0af79 Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix TSS size check for 16-bit tasks
  KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()
  KVM: Increase NR_IOBUS_DEVS limit to 200
  KVM: fix the handling of dirty bitmaps to avoid overflows
  KVM: MMU: fix kvm_mmu_zap_page() and its calling path
  KVM: VMX: Save/restore rflags.vm correctly in real mode
  KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
  KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
  KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
  KVM: take srcu lock before call to complete_pio()
2010-04-21 12:29:46 -07:00
Jan Kiszka
e8861cfe2c KVM: x86: Fix TSS size check for 16-bit tasks
A 16-bit TSS is only 44 bytes long. So make sure to test for the correct
size on task switch.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-21 13:51:42 +03:00
Jacob Pan
ae7c9b70dc x86, mrst: Conditionally register cpu hotplug notifier for apbt
APB timer is used on Moorestown platforms but not on a standard PC.
If APB timer code is compiled in but not initialized at run-time due
to lack of FW reported SFI table, kernel would panic when the non-boot
CPUs are offlined and notifier is called.

https://bugzilla.kernel.org/show_bug.cgi?id=15786

This patch ensures CPU hotplug notifier for APB timer is only registered
when the APBT timer block is initialized.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1271701423-1162-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-20 14:38:28 -07:00
Linus Torvalds
34388d1c4f Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix unsafe frame rewinding with hot regs fetching
2010-04-20 09:20:23 -07:00
Christoph Hellwig
4cecd935f6 x86: correctly wire up the newuname system call
Before commit e28cbf2293 ("improve
sys_newuname() for compat architectures") 64-bit x86 had a private
implementation of sys_uname which was just called sys_uname, which other
architectures used for the old uname.

Due to some merge issues with the uname refactoring patches we ended up
calling the old uname version for both the old and new system call
slots, which lead to the domainname filed never be set which caused
failures with libnss_nis.

Reported-and-tested-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-20 09:17:21 -07:00
Takuya Yoshikawa
87bf6e7de1 KVM: fix the handling of dirty bitmaps to avoid overflows
Int is not long enough to store the size of a dirty bitmap.

This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.

Note: in mark_page_dirty(), we have to consider the fact that
  __set_bit() takes the offset as int, not long.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-04-20 13:06:55 +03:00
Xiao Guangrong
77662e0028 KVM: MMU: fix kvm_mmu_zap_page() and its calling path
This patch fix:

- calculate zapped page number properly in mmu_zap_unsync_children()
- calculate freeed page number properly kvm_mmu_change_mmu_pages()
- if zapped children page it shoud restart hlist walking

KVM-Stable-Tag.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-04-20 12:59:32 +03:00
Avi Kivity
78ac8b47c5 KVM: VMX: Save/restore rflags.vm correctly in real mode
Currently we set eflags.vm unconditionally when entering real mode emulation
through virtual-8086 mode, and clear it unconditionally when we enter protected
mode.  The means that the following sequence

  KVM_SET_REGS  (rflags.vm=1)
  KVM_SET_SREGS (cr0.pe=1)

Ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode().

Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode:
reads and writes to those bits access a shadow register instead of the actual
register.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-04-20 12:59:31 +03:00
Andre Przywara
114be429c8 KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
There is a quirk for AMD K8 CPUs in many Linux kernels (see
arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
clears bit 10 in that MCE related MSR. KVM can only cope with all
zeros or all ones, so it will inject a #GP into the guest, which
will let it panic.
So lets add a quirk to the quirk and ignore this single cleared bit.
This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
with some guest Linux kernels.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:59:31 +03:00
Avi Kivity
d6a23895aa KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
These are guest-triggerable.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:55:05 +03:00
Takuya Yoshikawa
b7af404338 KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
svm_create_vcpu() does not free the pages allocated during the creation
when it fails to complete the allocations. This patch fixes it.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:55:04 +03:00
Gleb Natapov
7567cae105 KVM: take srcu lock before call to complete_pio()
complete_pio() may use slot table which is protected by srcu.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:55:04 +03:00