Commit Graph

10229 Commits

Author SHA1 Message Date
H. Peter Anvin
d7be0ce6af Merge commit 'v2.6.34-rc6' into x86/cpu 2010-05-08 14:59:58 -07:00
H. Peter Anvin
e08cae4181 x86: Clean up the hypervisor layer
Clean up the hypervisor layer and the hypervisor drivers, using an ops
structure instead of an enumeration with if statements.

The identity of the hypervisor, if needed, can be tested by testing
the pointer value in x86_hyper.

The MS-HyperV private state is moved into a normal global variable
(it's per-system state, not per-CPU state).  Being a normal bss
variable, it will be left at all zero on non-HyperV platforms, and so
can generally be tested for HyperV-specific features without
additional qualification.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Greg KH <greg@kroah.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Ky Srinivasan <ksrinivasan@novell.com>
LKML-Reference: <4BE49778.6060800@zytor.com>
2010-05-07 17:13:04 -07:00
Greg Kroah-Hartman
9fa0231742 x86, HyperV: fix up the license to mshyperv.c
This should have been GPLv2 only, we cut and pasted from the wrong file
originally, sorry.

Also removed some unneeded boilerplate license code, we all know where
to find the GPLv2, and that there's no warranty as that is implicit from
the license.

Cc: Ky Srinivasan <ksrinivasan@novell.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
LKML-Reference: <20100507235541.GA15448@kroah.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-07 17:00:25 -07:00
Ky Srinivasan
a2a47c6c3d x86: Detect running on a Microsoft HyperV system
This patch integrates HyperV detection within the framework currently
used by VmWare. With this patch, we can avoid having to replicate the
HyperV detection code in each of the Microsoft HyperV drivers.

Reworked and tweaked by Greg K-H to build properly.

Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com>
LKML-Reference: <20100506190841.GA1605@kroah.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vadim Rozenfeld <vrozenfe@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "K.Prasad" <prasad@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Hank Janssen <hjanssen@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-06 18:24:15 -07:00
H. Peter Anvin
097c1bd567 x86, cpu: Make APERF/MPERF a normal table-driven flag
APERF/MPERF can be handled via the table like all the other scattered
CPU flags.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-4-git-send-email-bp@amd64.org>
2010-05-03 15:49:31 -07:00
Borislav Petkov
d88d95eb1c x86, k8: Fix build error when K8_NB is disabled
K8_NB depends on PCI and when the last is disabled (allnoconfig) we fail
at the final linking stage due to missing exported num_k8_northbridges.
Add a header stub for that.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100503183036.GJ26107@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-05-03 14:34:24 -07:00
Linus Torvalds
dfad53d48e Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86: Disable large pages on CPUs with Atom erratum AAE44
  x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero
  x86, mrst: Conditionally register cpu hotplug notifier for apbt
2010-04-28 20:41:55 -07:00
Bjorn Helgaas
48728e0774 x86/PCI: compute Address Space length rather than using _LEN
ACPI _CRS Address Space Descriptors have _MIN, _MAX, and _LEN.  Linux has
been computing Address Spaces as [_MIN to _MIN + _LEN - 1].  Based on the
tests in the bug reports below, Windows apparently uses [_MIN to _MAX].

Per spec (ACPI 4.0, Table 6-40), for _CRS fixed-size, fixed location
descriptors, "_LEN must be (_MAX - _MIN + 1)", and when that's true, it
doesn't matter which way we compute the end.  But of course, there are
BIOSes that don't follow this rule, and we're better off if Linux handles
those exceptions the same way as Windows.

This patch makes Linux use [_MIN to _MAX], as Windows seems to do.  This
effectively reverts d558b483d5 and 03db42adfe and replaces them with
simpler code.

    https://bugzilla.kernel.org/show_bug.cgi?id=14337 (round)
    https://bugzilla.kernel.org/show_bug.cgi?id=15480 (truncate)

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-28 09:17:45 -07:00
Bjorn Helgaas
55051feb57 x86/PCI: never allocate PCI MMIO resources below BIOS_END
When we move a PCI device or assign resources to a device not configured
by the BIOS, we want to avoid the BIOS region below 1MB.  Note that if the
BIOS places devices below 1MB, we leave them there.

See https://bugzilla.kernel.org/show_bug.cgi?id=15744
and https://bugzilla.kernel.org/show_bug.cgi?id=15841

Tested-by: Andy Isaacson <adi@hexapodia.org>
Tested-by: Andy Bailey <bailey@akamai.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-26 12:30:03 -07:00
Linus Torvalds
383bee6b54 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: Ensure we re-enable devices on resume
  x86/PCI: parse additional host bridge window resource types
  PCI: revert broken device warning
  PCI aerdrv: use correct bit defines and add 2ms delay to aer_root_reset
  x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions
2010-04-24 11:32:12 -07:00
Dmitry Torokhov
453dc65931 VMware Balloon driver
This is a standalone version of VMware Balloon driver.  Ballooning is a
technique that allows hypervisor dynamically limit the amount of memory
available to the guest (with guest cooperation).  In the overcommit
scenario, when hypervisor set detects that it needs to shuffle some
memory, it instructs the driver to allocate certain number of pages, and
the underlying memory gets returned to the hypervisor.  Later hypervisor
may return memory to the guest by reattaching memory to the pageframes and
instructing the driver to "deflate" balloon.

We are submitting a standalone driver because KVM maintainer (Avi Kivity)
expressed opinion (rightly) that our transport does not fit well into
virtqueue paradigm and thus it does not make much sense to integrate with
virtio.

There were also some concerns whether current ballooning technique is the
right thing.  If there appears a better framework to achieve this we are
prepared to evaluate and switch to using it, but in the meantime we'd like
to get this driver upstream.

We want to get the driver accepted in distributions so that users do not
have to deal with an out-of-tree module and many distributions have
"upstream first" requirement.

The driver has been shipping for a number of years and users running on
VMware platform will have it installed as part of VMware Tools even if it
will not come from a distribution, thus there should not be additional
risk in pulling the driver into mainline.  The driver will only activate
if host is VMware so everyone else should not be affected at all.

Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-24 11:31:26 -07:00
H. Peter Anvin
7a0fc404ae x86: Disable large pages on CPUs with Atom erratum AAE44
Atom erratum AAE44/AAF40/AAG38/AAH41:

"If software clears the PS (page size) bit in a present PDE (page
directory entry), that will cause linear addresses mapped through this
PDE to use 4-KByte pages instead of using a large page after old TLB
entries are invalidated. Due to this erratum, if a code fetch uses
this PDE before the TLB entry for the large page is invalidated then
it may fetch from a different physical address than specified by
either the old large page translation or the new 4-KByte page
translation. This erratum may also cause speculative code fetches from
incorrect addresses."

[http://download.intel.com/design/processor/specupdt/319536.pdf]

Where as commit 211b3d03c7 seems to
workaround errata AAH41 (mixed 4K TLBs) it reduces the window of
opportunity for the bug to occur and does not totally remove it.  This
patch disables mixed 4K/4MB page tables totally avoiding the page
splitting and not tripping this processor issue.

This is based on an original patch by Colin King.

Originally-by: Colin Ian King <colin.king@canonical.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
Cc: <stable@kernel.org>
2010-04-23 16:49:51 -07:00
H. Peter Anvin
7ce5a2b9bb x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero
When we do a thread switch, we clear the outgoing FS/GS base if the
corresponding selector is nonzero.  This is taken by __switch_to() as
an entry invariant; it does not verify that it is true on entry.
However, copy_thread() doesn't enforce this constraint, which can
result in inconsistent results after fork().

Make copy_thread() match the behavior of __switch_to().

Reported-and-tested-by: Samuel Thibault <samuel.thibault@inria.fr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4BD1E061.8030605@zytor.com>
Cc: <stable@kernel.org>
2010-04-23 16:49:51 -07:00
Borislav Petkov
59d3b38874 x86, cacheinfo: Disable index in all four subcaches
When disabling an L3 cache index, make sure we disable that index in
all four subcaches of the L3. Clarify nomenclature while at it, wrt to
disable slots versus disable index and rename accordingly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-6-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:27 -07:00
Borislav Petkov
ba06edb63f x86, cacheinfo: Make L3 cache info per node
Currently, we're allocating L3 cache info and calculating indices for
each online cpu which is clearly superfluous. Instead, we need to do
this per-node as is each L3 cache.

No functional change, only per-cpu memory savings.

-v2: Allocate L3 cache descriptors array dynamically.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-5-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:23 -07:00
Borislav Petkov
9350f982e4 x86, cacheinfo: Reorganize AMD L3 cache structure
Add a struct representing L3 cache attributes (subcache sizes and
indices count) and move the respective members out of _cpuid4_info.
Also, stash the struct pci_dev ptr into the struct simplifying the code
even more.

There should be no functionality change resulting from this patch except
slightly slimming the _cpuid4_info per-cpu vars.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-4-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:23 -07:00
Frank Arnold
f2b20e4140 x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
When running a quest kernel on xen we get:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
IP: [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
PGD 0
Oops: 0000 [#1] SMP
last sysfs file:
CPU 0
Modules linked in:

Pid: 0, comm: swapper Tainted: G        W  2.6.34-rc3 #1 /HVM domU
RIP: 0010:[<ffffffff8142f2fb>]  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x
2ca/0x3df
RSP: 0018:ffff880002203e08  EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000060
RDX: 0000000000000000 RSI: 0000000000000040 RDI: 0000000000000000
RBP: ffff880002203ed8 R08: 00000000000017c0 R09: ffff880002203e38
R10: ffff8800023d5d40 R11: ffffffff81a01e28 R12: ffff880187e6f5c0
R13: ffff880002203e34 R14: ffff880002203e58 R15: ffff880002203e68
FS:  0000000000000000(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000038 CR3: 0000000001a3c000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a44020)
Stack:
 ffffffff810d7ecb ffff880002203e20 ffffffff81059140 ffff880002203e30
<0> ffffffff810d7ec9 0000000002203e40 000000000050d140 ffff880002203e70
<0> 0000000002008140 0000000000000086 ffff880040020140 ffffffff81068b8b
Call Trace:
 <IRQ>
 [<ffffffff810d7ecb>] ? sync_supers_timer_fn+0x0/0x1c
 [<ffffffff81059140>] ? mod_timer+0x23/0x25
 [<ffffffff810d7ec9>] ? arm_supers_timer+0x34/0x36
 [<ffffffff81068b8b>] ? hrtimer_get_next_event+0xa7/0xc3
 [<ffffffff81058e85>] ? get_next_timer_interrupt+0x19a/0x20d
 [<ffffffff8142fa23>] get_cpu_leaves+0x5c/0x232
 [<ffffffff8106a7b1>] ? sched_clock_local+0x1c/0x82
 [<ffffffff8106a9a0>] ? sched_clock_tick+0x75/0x7a
 [<ffffffff8107748c>] generic_smp_call_function_single_interrupt+0xae/0xd0
 [<ffffffff8101f6ef>] smp_call_function_single_interrupt+0x18/0x27
 [<ffffffff8100a773>] call_function_single_interrupt+0x13/0x20
 <EOI>
 [<ffffffff8143c468>] ? notifier_call_chain+0x14/0x63
 [<ffffffff810295c6>] ? native_safe_halt+0xc/0xd
 [<ffffffff810114eb>] ? default_idle+0x36/0x53
 [<ffffffff81008c22>] cpu_idle+0xaa/0xe4
 [<ffffffff81423a9a>] rest_init+0x7e/0x80
 [<ffffffff81b10dd2>] start_kernel+0x40e/0x419
 [<ffffffff81b102c8>] x86_64_start_reservations+0xb3/0xb7
 [<ffffffff81b103c4>] x86_64_start_kernel+0xf8/0x107
Code: 14 d5 40 ff ae 81 8b 14 02 31 c0 3b 15 47 1c 8b 00 7d 0e 48 8b 05 36 1c 8b
 00 48 63 d2 48 8b 04 d0 c7 85 5c ff ff ff 00 00 00 00 <8b> 70 38 48 8d 8d 5c ff
 ff ff 48 8b 78 10 ba c4 01 00 00 e8 eb
RIP  [<ffffffff8142f2fb>] cpuid4_cache_lookup_regs+0x2ca/0x3df
 RSP <ffff880002203e08>
CR2: 0000000000000038
---[ end trace a7919e7f17c0a726 ]---

The L3 cache index disable feature of AMD CPUs has to be disabled if the
kernel is running as guest on top of a hypervisor because northbridge
devices are not available to the guest. Currently, this fixes a boot
crash on top of Xen. In the future this will become an issue on KVM as
well.

Check if northbridge devices are present and do not enable the feature
if there are none.

Signed-off-by: Frank Arnold <frank.arnold@amd.com>
LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:21 -07:00
Borislav Petkov
b1ab1b4d9a x86, cacheinfo: Unify AMD L3 cache index disable checking
All F10h CPUs starting with model 8 resp. 9, stepping 1, support L3
cache index disable. Concentrate the family, model, stepping checking at
one place and enable the feature implicitly on upcoming Fam10h models.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1271945222-5283-2-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-22 17:17:20 -07:00
Bjorn Helgaas
66528fdd45 x86/PCI: parse additional host bridge window resource types
This adds support for Memory24, Memory32, and Memory32Fixed descriptors in
PCI host bridge _CRS.

I experimentally determined that Windows (2008 R2) accepts these descriptors
and treats them as windows that are forwarded to the PCI bus, e.g., if
it finds any PCI devices with BARs outside the windows, it moves them into
the windows.

I don't know whether any machines actually use these descriptors in PCI
host bridge _CRS methods, but if any exist and they're new enough that we
automatically turn on "pci=use_crs", they will work with Windows but not
with Linux.

Here are the details: https://bugzilla.kernel.org/show_bug.cgi?id=15817

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-22 16:13:22 -07:00
Linus Torvalds
a486b0af79 Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix TSS size check for 16-bit tasks
  KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()
  KVM: Increase NR_IOBUS_DEVS limit to 200
  KVM: fix the handling of dirty bitmaps to avoid overflows
  KVM: MMU: fix kvm_mmu_zap_page() and its calling path
  KVM: VMX: Save/restore rflags.vm correctly in real mode
  KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
  KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
  KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
  KVM: take srcu lock before call to complete_pio()
2010-04-21 12:29:46 -07:00
Jan Kiszka
e8861cfe2c KVM: x86: Fix TSS size check for 16-bit tasks
A 16-bit TSS is only 44 bytes long. So make sure to test for the correct
size on task switch.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-21 13:51:42 +03:00
Jacob Pan
ae7c9b70dc x86, mrst: Conditionally register cpu hotplug notifier for apbt
APB timer is used on Moorestown platforms but not on a standard PC.
If APB timer code is compiled in but not initialized at run-time due
to lack of FW reported SFI table, kernel would panic when the non-boot
CPUs are offlined and notifier is called.

https://bugzilla.kernel.org/show_bug.cgi?id=15786

This patch ensures CPU hotplug notifier for APB timer is only registered
when the APBT timer block is initialized.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
LKML-Reference: <1271701423-1162-1-git-send-email-jacob.jun.pan@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-20 14:38:28 -07:00
Linus Torvalds
34388d1c4f Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf: Fix unsafe frame rewinding with hot regs fetching
2010-04-20 09:20:23 -07:00
Christoph Hellwig
4cecd935f6 x86: correctly wire up the newuname system call
Before commit e28cbf2293 ("improve
sys_newuname() for compat architectures") 64-bit x86 had a private
implementation of sys_uname which was just called sys_uname, which other
architectures used for the old uname.

Due to some merge issues with the uname refactoring patches we ended up
calling the old uname version for both the old and new system call
slots, which lead to the domainname filed never be set which caused
failures with libnss_nis.

Reported-and-tested-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-04-20 09:17:21 -07:00
Takuya Yoshikawa
87bf6e7de1 KVM: fix the handling of dirty bitmaps to avoid overflows
Int is not long enough to store the size of a dirty bitmap.

This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.

Note: in mark_page_dirty(), we have to consider the fact that
  __set_bit() takes the offset as int, not long.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-04-20 13:06:55 +03:00
Xiao Guangrong
77662e0028 KVM: MMU: fix kvm_mmu_zap_page() and its calling path
This patch fix:

- calculate zapped page number properly in mmu_zap_unsync_children()
- calculate freeed page number properly kvm_mmu_change_mmu_pages()
- if zapped children page it shoud restart hlist walking

KVM-Stable-Tag.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-04-20 12:59:32 +03:00
Avi Kivity
78ac8b47c5 KVM: VMX: Save/restore rflags.vm correctly in real mode
Currently we set eflags.vm unconditionally when entering real mode emulation
through virtual-8086 mode, and clear it unconditionally when we enter protected
mode.  The means that the following sequence

  KVM_SET_REGS  (rflags.vm=1)
  KVM_SET_SREGS (cr0.pe=1)

Ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode().

Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode:
reads and writes to those bits access a shadow register instead of the actual
register.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-04-20 12:59:31 +03:00
Andre Przywara
114be429c8 KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
There is a quirk for AMD K8 CPUs in many Linux kernels (see
arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
clears bit 10 in that MCE related MSR. KVM can only cope with all
zeros or all ones, so it will inject a #GP into the guest, which
will let it panic.
So lets add a quirk to the quirk and ignore this single cleared bit.
This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
with some guest Linux kernels.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:59:31 +03:00
Avi Kivity
d6a23895aa KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
These are guest-triggerable.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:55:05 +03:00
Takuya Yoshikawa
b7af404338 KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
svm_create_vcpu() does not free the pages allocated during the creation
when it fails to complete the allocations. This patch fixes it.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:55:04 +03:00
Gleb Natapov
7567cae105 KVM: take srcu lock before call to complete_pio()
complete_pio() may use slot table which is protected by srcu.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-20 12:55:04 +03:00
Linus Torvalds
dc57da3875 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/gart: Disable GART explicitly before initialization
  dma-debug: Cleanup for copy-loop in filter_write()
  x86/amd-iommu: Remove obsolete parameter documentation
  x86/amd-iommu: use for_each_pci_dev
  Revert "x86: disable IOMMUs on kernel crash"
  x86/amd-iommu: warn when issuing command to uninitialized cmd buffer
  x86/amd-iommu: enable iommu before attaching devices
  x86/amd-iommu: Use helper function to destroy domain
  x86/amd-iommu: Report errors in acpi parsing functions upstream
  x86/amd-iommu: Pt mode fix for domain_destroy
  x86/amd-iommu: Protect IOMMU-API map/unmap path
  x86/amd-iommu: Remove double NULL check in check_device
2010-04-15 12:20:56 -07:00
Rusty Russell
091ebf07a2 lguest: stop using KVM hypercall mechanism
This is a partial revert of 4cd8b5e2a1 "lguest: use KVM hypercalls";
we revert to using (just as questionable but more reliable) int $15 for
hypercalls.  I didn't revert the register mapping, so we still use the
same calling convention as kvm.

KVM in more recent incarnations stopped injecting a fault when a guest
tried to use the VMCALL instruction from ring 1, so lguest under kvm
fails to make hypercalls.  It was nice to share code with our KVM
cousins, but this was overreach.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Matias Zabaljauregui <zabaljauregui@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
2010-04-14 21:43:56 +09:30
Ingo Molnar
2b2f862ee6 Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2010-04-13 13:24:54 +02:00
Mark Langsdorf
679370641e powernow-k8: Fix frequency reporting
With F10, model 10, all valid frequencies are in the ACPI _PST table.

Cc: <stable@kernel.org> # 33.x 32.x
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <1270065406-1814-6-git-send-email-bp@amd64.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:07:47 -07:00
Mark Langsdorf
a2fed573f0 x86, cpufreq: Add APERF/MPERF support for AMD processors
Starting with model 10 of Family 0x10, AMD processors may have
support for APERF/MPERF.  Add support for identifying it and using
it within cpufreq.  Move the APERF/MPERF functions out of the
acpi-cpufreq code and into their own file so they can easily be
shared.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
LKML-Reference: <20100401141956.GA1930@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:07:40 -07:00
Borislav Petkov
d65ad45cd8 x86: Unify APERF/MPERF support
Initialize this CPUID flag feature in common code. It could be made a
standalone function later, maybe, if more functionality is duplicated.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-4-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:05:50 -07:00
Borislav Petkov
73860c6b2f powernow-k8: Add core performance boost support
Starting with F10h, revE, AMD processors add support for a dynamic
core boosting feature called Core Performance Boost. When a specific
condition is present, a subset of the cores on a system are boosted
beyond their P0 operating frequency to speed up the performance of
single-threaded applications.

In the normal case, the system comes out of reset with core boosting
enabled. This patch adds a sysfs knob with which core boosting can be
switched on or off for benchmarking purposes.

While at it, make the CPB code hotplug-aware so that taking cores
offline wouldn't interfere with boosting the remaining online cores.
Furthermore, add cpu_online_mask hotplug protection as suggested by
Andrew.

Finally, cleanup the driver init codepath and update copyrights.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-3-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:05:43 -07:00
Borislav Petkov
5958f1d5d7 x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfo
By semi-popular demand, this adds the Core Performance Boost feature
flag to /proc/cpuinfo. Possible use case for this is userspace tools
like cpufreq-aperf, for example, so that they don't have to jump through
hoops of accessing "/dev/cpu/%d/cpuid" in order to check for CPB hw
support, or call cpuid from userspace.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-2-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-04-09 14:05:23 -07:00
Frederic Weisbecker
ab285f2b52 perf: Fix unsafe frame rewinding with hot regs fetching
When we fetch the hot regs and rewind to the nth caller, it
might happen that we dereference a frame pointer outside the
kernel stack boundaries, like in this example:

	perf_trace_sched_switch+0xd5/0x120
        schedule+0x6b5/0x860
        retint_careful+0xd/0x21

Since we directly dereference a userspace frame pointer here while
rewinding behind retint_careful, this may end up in a crash.

Fix this by simply using probe_kernel_address() when we rewind the
frame pointer.

This issue will have a much more proper fix in the next version of the
perf_arch_fetch_caller_regs() API that will only need to rewind to the
first caller.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: David Miller <davem@davemloft.net>
Cc: Archs <linux-arch@vger.kernel.org>
2010-04-08 19:03:28 +02:00
Bjorn Helgaas
73a0e61458 x86/PCI: ignore Consumer/Producer bit in ACPI window descriptions
ACPI Address Space Descriptors (used in _CRS) have a Consumer/Producer
bit that is supposed to distinguish regions that are consumed directly
by a device from those that are forwarded ("produced") by a bridge.
But BIOSes have apparently not used this consistently, and Windows
seems to ignore it, so I think Linux should ignore it as well.

I can't point to any of these supposed broken BIOSes, but since we
now rely on _CRS by default, I think it's safer to ignore this bit
from the start.

Here are details of my experiments with how Windows handles it:
    https://bugzilla.kernel.org/show_bug.cgi?id=15701

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-04-08 09:23:42 -07:00
Linus Torvalds
48de8cb784 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Enable Nehalem-EX support
  perf kmem: Fix breakage introduced by 5a0e3ad slab.h script
2010-04-07 14:01:51 -07:00
Linus Torvalds
fb1ae63577 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
  x86: Increase CONFIG_NODES_SHIFT max to 10
  ibft, x86: Change reserve_ibft_region() to find_ibft_region()
  x86, hpet: Fix bug in RTC emulation
  x86, hpet: Erratum workaround for read after write of HPET comparator
  bootmem, x86: Fix 32bit numa system without RAM on node 0
  nobootmem, x86: Fix 32bit numa system without RAM on node 0
  x86: Handle overlapping mptables
  x86: Make e820_remove_range to handle all covered case
  x86-32, resume: do a global tlb flush in S4 resume
2010-04-07 11:02:23 -07:00
Joerg Roedel
4b83873d3d x86/gart: Disable GART explicitly before initialization
If we boot into a crash-kernel the gart might still be
enabled and its caches might be dirty. This can result in
undefined behavior later. Fix it by explicitly disabling the
gart hardware before initialization and flushing the caches
after enablement.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-04-07 14:36:30 +02:00
Joerg Roedel
12ff4bf58b Merge branch 'amd-iommu/fixes' into iommu/fixes 2010-04-07 14:36:20 +02:00
Chris Wright
d18c69d389 x86/amd-iommu: use for_each_pci_dev
Replace open coded version with for_each_pci_dev

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-04-07 11:51:34 +02:00
Chris Wright
8f9f55e83e Revert "x86: disable IOMMUs on kernel crash"
This effectively reverts commit 61d047be99.

Disabling the IOMMU can potetially allow DMA transactions to
complete without being translated.  Leave it enabled, and allow
crash kernel to do the IOMMU reinitialization properly.

Cc: stable@kernel.org
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-04-07 11:51:17 +02:00
Chris Wright
549c90dc9a x86/amd-iommu: warn when issuing command to uninitialized cmd buffer
To catch future potential issues we can add a warning whenever we issue
a command before the command buffer is fully initialized.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-04-07 11:51:15 +02:00
Chris Wright
75f66533bc x86/amd-iommu: enable iommu before attaching devices
Hit another kdump problem as reported by Neil Horman.  When initializaing
the IOMMU, we attach devices to their domains before the IOMMU is
fully (re)initialized.  Attaching a device will issue some important
invalidations.  In the context of the newly kexec'd kdump kernel, the
IOMMU may have stale cached data from the original kernel.  Because we
do the attach too early, the invalidation commands are placed in the new
command buffer before the IOMMU is updated w/ that buffer.  This leaves
the stale entries in the kdump context and can renders device unusable.
Simply enable the IOMMU before we do the attach.

Cc: stable@kernel.org
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-04-07 11:50:50 +02:00
Vince Weaver
134fbadf02 perf, x86: Enable Nehalem-EX support
According to Intel Software Devel Manual Volume 3B, the
Nehalem-EX PMU is just like regular Nehalem (except for the
uncore support, which is completely different).

Signed-off-by:  Vince Weaver <vweaver1@eecs.utk.edu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Lin Ming <ming.m.lin@intel.com>
LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-06 17:52:59 +02:00