Commit Graph

824 Commits

Author SHA1 Message Date
Paolo Bonzini
cf39d37539 KVM/arm updates for Linux 5.7
- GICv4.1 support
 - 32bit host removal
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl6DKKIPHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDDe0P/30Oda6HJdcUY+g0dnHkH8N7t+VKjPPnihlX
 WBaT0Y4SzMsfAtG5lQqS48A50dXKWW70QvwkZjxu7abQhYFWGd2SGtTQxwqJXT8J
 I6MBh4r9xrIfiqzVT2BXslA6id5H6wCyyFI6vKm/IFkIu1J6JtwnKakQ0CIddS1d
 Blbgj5jcxGw+2xOppHCQXbWwwDdmYWkMZEBZjmhkezddqLDK+oaAUiUhHHHizTsB
 kLjgqYBVENpR1zDIsGpQAJloKXAiHfBQshQAmnhnBNzXE60LZ0n0/iODU9U5FDEO
 5j0DRWccKvsIMsUh7JpPr5xerGJ0rqk1IwPC2JcyzfRbvRLMpK1IOWfhI5Tg5lbP
 4Ev96QLEMBnKOWMSE0MqnMdq6JPzDLA6WZ28HZe2nc3/oWNgsSDtlXigx4xFFxTX
 zfc2YpAgFu3xJkPf8PtWTFvItm0AvFNFynPg0Rr/NsGf/FGeszYR4cLcHmv5NlWS
 IiV4+lgnlmr2LZr3VjUaumbtWIpuVF4Db5Al2K2E/PCN7ObfEkyCweDic8ophkH8
 sMS9TI38aH1Efy+I2Nfxxqpy8BcElZAMrAWt9R27A4JRLHdr7j5DsGnyRigXHgRe
 pFgbqtk/EjWkHwjaJVg8kPxf2+2P05VZsQeGG721nbKAIKDetM3RA2BflexdsptY
 kXplNsVr
 =eILh
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm updates for Linux 5.7

- GICv4.1 support
- 32bit host removal
2020-03-31 10:44:53 -04:00
Paolo Bonzini
d55c9d4009 KVM: nSVM: check for EFER.SVME=1 before entering guest
EFER is set for L2 using svm_set_efer, which hardcodes EFER_SVME to 1 and hides
an incorrect value for EFER.SVME in the L1 VMCB.  Perform the check manually
to detect invalid guest state.

Reported-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-18 13:41:32 +01:00
Miaohe Lin
0b66465344 KVM: nSVM: Remove an obsolete comment.
The function does not return bool anymore.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:59:00 +01:00
Suravee Suthikulpanit
ab56f8e62d kvm: svm: Introduce GA Log tracepoint for AVIC
GA Log tracepoint is useful when debugging AVIC performance
issue as it can be used with perf to count the number of times
IOMMU AVIC injects interrupts through the slow-path instead of
directly inject interrupts to the target vcpu.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:56 +01:00
Paolo Bonzini
78f2145c4d KVM: nSVM: avoid loss of pending IRQ/NMI before entering L2
This patch reproduces for nSVM the change that was made for nVMX in
commit b5861e5cf2 ("KVM: nVMX: Fix loss of pending IRQ/NMI before
entering L2").  While I do not have a test that breaks without it, I
cannot see why it would not be necessary since all events are unblocked
by VMRUN's setting of GIF back to 1.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:55 +01:00
Paolo Bonzini
b518ba9fa6 KVM: nSVM: implement check_nested_events for interrupts
The current implementation of physical interrupt delivery to a nested guest
is quite broken.  It relies on svm_interrupt_allowed returning false if
VINTR=1 so that the interrupt can be injected from enable_irq_window,
but this does not work for guests that do not intercept HLT or that rely
on clearing the host IF to block physical interrupts while L2 runs.

This patch can be split in two logical parts, but including only
one breaks tests so I am combining both changes together.

The first and easiest is simply to return true for svm_interrupt_allowed
if HF_VINTR_MASK is set and HIF is set.  This way the semantics of
svm_interrupt_allowed are respected: svm_interrupt_allowed being false
does not mean "call enable_irq_window", it means "interrupts cannot
be injected now".

After doing this, however, we need another place to inject the
interrupt, and fortunately we already have one, check_nested_events,
which nested SVM does not implement but which is meant exactly for this
purpose.  It is called before interrupts are injected, and it can
therefore do the L2->L1 switch while leaving inject_pending_event
none the wiser.

This patch was developed together with Cathy Avery, who wrote the
test and did a lot of the initial debugging.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:54 +01:00
Paolo Bonzini
64b5bd2704 KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1
If a nested VM is started while an IRQ was pending and with
V_INTR_MASKING=1, the behavior of the guest depends on host IF.  If it
is 1, the VM should exit immediately, before executing the first
instruction of the guest, because VMRUN sets GIF back to 1.

If it is 0 and the host has VGIF, however, at the time of the VMRUN
instruction L0 is running the guest with a pending interrupt window
request.  This interrupt window request is completely irrelevant to
L2, since IF only controls virtual interrupts, so this patch drops
INTERCEPT_VINTR from the VMCB while running L2 under these circumstances.
To simplify the code, both steps of enabling the interrupt window
(setting the VINTR intercept and requesting a fake virtual interrupt
in svm_inject_irq) are grouped in the svm_set_vintr function, and
likewise for dismissing the interrupt window request in svm_clear_vintr.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:53 +01:00
Paolo Bonzini
b5ec2e020b KVM: nSVM: do not change host intercepts while nested VM is running
Instead of touching the host intercepts so that the bitwise OR in
recalc_intercepts just works, mask away uninteresting intercepts
directly in recalc_intercepts.

This is cleaner and keeps the logic in one place even for intercepts
that can change even while L2 is running.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:53 +01:00
Paolo Bonzini
727a7e27cf KVM: x86: rename set_cr3 callback and related flags to load_mmu_pgd
The set_cr3 callback is not setting the guest CR3, it is setting the
root of the guest page tables, either shadow or two-dimensional.
To make this clearer as well as to indicate that the MMU calls it
via kvm_mmu_load_cr3, rename it to load_mmu_pgd.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:52 +01:00
Paolo Bonzini
689f3bf216 KVM: x86: unify callbacks to load paging root
Similar to what kvm-intel.ko is doing, provide a single callback that
merges svm_set_cr3, set_tdp_cr3 and nested_svm_set_tdp_cr3.

This lets us unify the set_cr3 and set_tdp_cr3 entries in kvm_x86_ops.
I'm doing that in this same patch because splitting it adds quite a bit
of churn due to the need for forward declarations.  For the same reason
the assignment to vcpu->arch.mmu->set_cr3 is moved to kvm_init_shadow_mmu
from init_kvm_softmmu and nested_svm_init_mmu_context.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:51 +01:00
Sean Christopherson
f91af5176c KVM: x86: Refactor kvm_cpuid() param that controls out-of-range logic
Invert and rename the kvm_cpuid() param that controls out-of-range logic
to better reflect the semantics of the affected callers, i.e. callers
that bypass the out-of-range logic do so because they are looking up an
exact guest CPUID entry, e.g. to query the maxphyaddr.

Similarly, rename kvm_cpuid()'s internal "found" to "exact" to clarify
that it tracks whether or not the exact requested leaf was found, as
opposed to any usable leaf being found.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:50 +01:00
Paolo Bonzini
408e9a318f KVM: CPUID: add support for supervisor states
Current CPUID 0xd enumeration code does not support supervisor
states, because KVM only supports setting IA32_XSS to zero.
Change it instead to use a new variable supported_xss, to be
set from the hardware_setup callback which is in charge of CPU
capabilities.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:45 +01:00
Sean Christopherson
257038745c KVM: x86: Move nSVM CPUID 0x8000000A handling into common x86 code
Handle CPUID 0x8000000A in the main switch in __do_cpuid_func() and drop
->set_supported_cpuid() now that both VMX and SVM implementations are
empty.  Like leaf 0x14 (Intel PT) and leaf 0x8000001F (SEV), leaf
0x8000000A is is (obviously) vendor specific but can be queried in
common code while respecting SVM's wishes by querying kvm_cpu_cap_has().

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:45 +01:00
Sean Christopherson
4eb87460c4 KVM: nSVM: Advertise and enable NRIPS for L1 iff nrips is enabled
Set NRIPS in KVM capabilities if and only if nrips=true, which naturally
incorporates the boot_cpu_has() check, and set nrips_enabled only if the
KVM capability is enabled.

Note, previously KVM would set nrips_enabled based purely on userspace
input, but at worst that would cause KVM to propagate garbage into L1,
i.e. userspace would simply be hosing its VM.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:44 +01:00
Sean Christopherson
a50718cc3f KVM: nSVM: Expose SVM features to L1 iff nested is enabled
Set SVM feature bits in KVM capabilities if and only if nested=true, KVM
shouldn't advertise features that realistically can't be used.  Use
kvm_cpu_cap_has(X86_FEATURE_SVM) to indirectly query "nested" in
svm_set_supported_cpuid() in anticipation of moving CPUID 0x8000000A
adjustments into common x86 code.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:43 +01:00
Sean Christopherson
703c335d06 KVM: x86/mmu: Configure max page level during hardware setup
Configure the max page level during hardware setup to avoid a retpoline
in the page fault handler.  Drop ->get_lpage_level() as the page fault
handler was the last user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:40 +01:00
Sean Christopherson
bde7723559 KVM: x86/mmu: Merge kvm_{enable,disable}_tdp() into a common function
Combine kvm_enable_tdp() and kvm_disable_tdp() into a single function,
kvm_configure_mmu(), in preparation for doing additional configuration
during hardware setup.  And because having separate helpers is silly.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:39 +01:00
Sean Christopherson
213e0e1f50 KVM: SVM: Refactor logging of NPT enabled/disabled
Tweak SVM's logging of NPT enabled/disabled to handle the logging in a
single pr_info() in preparation for merging kvm_enable_tdp() and
kvm_disable_tdp() into a single function.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:38 +01:00
Sean Christopherson
a1bead2aba KVM: VMX: Directly query Intel PT mode when refreshing PMUs
Use vmx_pt_mode_is_host_guest() in intel_pmu_refresh() instead of
bouncing through kvm_x86_ops->pt_supported, and remove ->pt_supported()
as the PMU code was the last remaining user.

Opportunistically clean up the wording of a comment that referenced
kvm_x86_ops->pt_supported().

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:38 +01:00
Sean Christopherson
139085101f KVM: x86: Use KVM cpu caps to detect MSR_TSC_AUX virt support
Check for MSR_TSC_AUX virtualization via kvm_cpu_cap_has() and drop
->rdtscp_supported().

Note, vmx_rdtscp_supported() needs to hang around a tiny bit longer due
other usage in VMX code.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:35 +01:00
Sean Christopherson
93c380e7b5 KVM: x86: Set emulated/transmuted feature bits via kvm_cpu_caps
Set emulated and transmuted (set based on other features) feature bits
via kvm_cpu_caps now that the CPUID output for KVM_GET_SUPPORTED_CPUID
is direcly overidden with kvm_cpu_caps.

Note, VMX emulation of UMIP already sets kvm_cpu_caps.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:33 +01:00
Sean Christopherson
bd79199990 KVM: x86: Override host CPUID results with kvm_cpu_caps
Override CPUID entries with kvm_cpu_caps during KVM_GET_SUPPORTED_CPUID
instead of masking the host CPUID result, which is redundant now that
the host CPUID is incorporated into kvm_cpu_caps at runtime.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:32 +01:00
Sean Christopherson
90d2f60f41 KVM: x86: Use KVM cpu caps to track UMIP emulation
Set UMIP in kvm_cpu_caps when it is emulated by VMX, even though the
bit will effectively be dropped by do_host_cpuid().  This allows
checking for UMIP emulation via kvm_cpu_caps instead of a dedicated
kvm_x86_ops callback.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:28 +01:00
Sean Christopherson
8721f5b061 KVM: x86: Add a helper to check kernel support when setting cpu cap
Add a helper, kvm_cpu_cap_check_and_set(), to query boot_cpu_has() as
part of setting a KVM cpu capability.  VMX in particular has a number of
features that are dependent on both a VMCS capability and kernel
support.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:26 +01:00
Sean Christopherson
b3d895d5c4 KVM: x86: Move XSAVES CPUID adjust to VMX's KVM cpu cap update
Move the clearing of the XSAVES CPUID bit into VMX, which has a separate
VMCS control to enable XSAVES in non-root, to eliminate the last ugly
renmant of the undesirable "unsigned f_* = *_supported ? F(*) : 0"
pattern in the common CPUID handling code.

Drop ->xsaves_supported(), CPUID adjustment was the only user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:25 +01:00
Sean Christopherson
9b58b9857f KVM: SVM: Convert feature updates from CPUID to KVM cpu caps
Use the recently introduced KVM CPU caps to propagate SVM-only (kernel)
settings to supported CPUID flags.

Note, there are a few subtleties:

  - Setting a flag based on a *different* feature is effectively
    emulation, and must be done at runtime via ->set_supported_cpuid().

  - CPUID 0x8000000A.EDX is a feature leaf that was previously not
    adjusted by kvm_cpu_cap_mask() because all features are hidden by
    default.

Opportunistically add a technically unnecessary break and fix an
indentation issue in svm_set_supported_cpuid().

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:24 +01:00
Sean Christopherson
66a6950f99 KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking
Calculate the CPUID masks for KVM_GET_SUPPORTED_CPUID at load time using
what is effectively a KVM-adjusted copy of boot_cpu_data, or more
precisely, the x86_capability array in boot_cpu_data.

In terms of KVM support, the vast majority of CPUID feature bits are
constant, and *all* feature support is known at KVM load time.  Rather
than apply boot_cpu_data, which is effectively read-only after init,
at runtime, copy it into a KVM-specific array and use *that* to mask
CPUID registers.

In additional to consolidating the masking, kvm_cpu_caps can be adjusted
by SVM/VMX at load time and thus eliminate all feature bit manipulation
in ->set_supported_cpuid().

Opportunistically clean up a few warts:

  - Replace bare "unsigned" with "unsigned int" when a feature flag is
    captured in a local variable, e.g. f_nx.

  - Sort the CPUID masks by function, index and register (alphabetically
    for registers, i.e. EBX comes before ECX/EDX).

  - Remove the superfluous /* cpuid 7.0.ecx */ comments.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Call kvm_set_cpu_caps from kvm_x86_ops->hardware_setup due to fixed
 GBPAGES patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:23 +01:00
Sean Christopherson
d64d83d1e0 KVM: x86: Handle PKU CPUID adjustment in VMX code
Move the setting of the PKU CPUID bit into VMX to eliminate an instance
of the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in
the common CPUID handling code.  Drop ->pku_supported(), CPUID
adjustment was the only user.

Note, some AMD CPUs now support PKU, but SVM doesn't yet support
exposing it to a guest.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:19 +01:00
Sean Christopherson
5ffec6f910 KVM: x86: Handle INVPCID CPUID adjustment in VMX code
Move the INVPCID CPUID adjustments into VMX to eliminate an instance of
the undesirable "unsigned f_* = *_supported ? F(*) : 0" pattern in the
common CPUID handling code.  Drop ->invpcid_supported(), CPUID
adjustment was the only user.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:17 +01:00
Sean Christopherson
b32666b13a KVM: x86: Introduce cpuid_entry_{change,set,clear}() mutators
Introduce mutators to modify feature bits in CPUID entries and use the
new mutators where applicable.  Using the mutators eliminates the need
to manually specify the register to modify query at no extra cost and
will allow adding runtime consistency checks on the function/index in a
future patch.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:15 +01:00
Sean Christopherson
160b486f65 KVM: x86: Drop explicit @func param from ->set_supported_cpuid()
Drop the explicit @func param from ->set_supported_cpuid() and instead
pull the CPUID function from the relevant entry.  This sets the stage
for hardening guest CPUID updates in future patches, e.g. allows adding
run-time assertions that the CPUID feature being changed is actually
a bit in the referenced CPUID entry.

No functional change intended.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:12 +01:00
Sean Christopherson
7f5581f592 KVM: x86: Use supported_xcr0 to detect MPX support
Query supported_xcr0 when checking for MPX support instead of invoking
->mpx_supported() and drop ->mpx_supported() as kvm_mpx_supported() was
its last user.  Rename vmx_mpx_supported() to cpu_has_vmx_mpx() to
better align with VMX/VMCS nomenclature.

Modify VMX's adjustment of xcr0 to call cpus_has_vmx_mpx() (renamed from
vmx_mpx_supported()) directly to avoid reading supported_xcr0 before
it's fully configured.

No functional change intended.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Test that *all* bits are set. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:10 +01:00
Sean Christopherson
cfc481810c KVM: x86: Calculate the supported xcr0 mask at load time
Add a new global variable, supported_xcr0, to track which xcr0 bits can
be exposed to the guest instead of calculating the mask on every call.
The supported bits are constant for a given instance of KVM.

This paves the way toward eliminating the ->mpx_supported() call in
kvm_mpx_supported(), e.g. eliminates multiple retpolines in VMX's nested
VM-Enter path, and eventually toward eliminating ->mpx_supported()
altogether.

No functional change intended.

Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:58:09 +01:00
Sean Christopherson
21f1b8f29e KVM: x86: Explicitly pass an exception struct to check_intercept
Explicitly pass an exception struct when checking for intercept from
the emulator, which eliminates the last reference to arch.emulate_ctxt
in vendor specific code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:50 +01:00
Sean Christopherson
d8dd54e063 KVM: x86/mmu: Rename kvm_mmu->get_cr3() to ->get_guest_pgd()
Rename kvm_mmu->get_cr3() to call out that it is retrieving a guest
value, as opposed to kvm_mmu->set_cr3(), which sets a host value, and to
note that it will return something other than CR3 when nested EPT is in
use.  Hopefully the new name will also make it more obvious that L1's
nested_cr3 is returned in SVM's nested NPT case.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:46 +01:00
Oliver Upton
cc7f5577ad KVM: SVM: Inhibit APIC virtualization for X2APIC guest
The AVIC does not support guest use of the x2APIC interface. Currently,
KVM simply chooses to squash the x2APIC feature in the guest's CPUID
If the AVIC is enabled. Doing so prevents KVM from running a guest
with greater than 255 vCPUs, as such a guest necessitates the use
of the x2APIC interface.

Instead, inhibit AVIC enablement on a per-VM basis whenever the x2APIC
feature is set in the guest's CPUID.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:35 +01:00
Sean Christopherson
562b6b089d KVM: x86: Consolidate VM allocation and free for VMX and SVM
Move the VM allocation and free code to common x86 as the logic is
more or less identical across SVM and VMX.

Note, although hyperv.hv_pa_pg is part of the common kvm->arch, it's
(currently) only allocated by VMX VMs.  But, since kfree() plays nice
when passed a NULL pointer, the superfluous call for SVM is harmless
and avoids future churn if SVM gains support for HyperV's direct TLB
flush.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
[Make vm_size a field instead of a function. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:33 +01:00
Sean Christopherson
1a625056cc KVM: x86: Directly return __vmalloc() result in ->vm_alloc()
Directly return the __vmalloc() result in {svm,vmx}_vm_alloc() to pave
the way for handling VM alloc/free in common x86 code, and to obviate
the need to check the result of __vmalloc() in vendor specific code.
Add a build-time assertion to ensure each structs' "kvm" field stays at
offset 0, which allows interpreting a "struct kvm_{svm,vmx}" as a
"struct kvm".

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:32 +01:00
Sean Christopherson
d18b2f43b9 KVM: x86: Gracefully handle __vmalloc() failure during VM allocation
Check the result of __vmalloc() to avoid dereferencing a NULL pointer in
the event that allocation failres.

Fixes: d1e5b0e98e ("kvm: Make VM ioctl do valloc for some archs")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16 17:57:31 +01:00
Valdis Klētnieks
575b255c16 KVM: x86: allow compiling as non-module with W=1
Compile error with CONFIG_KVM_INTEL=y and W=1:

  CC      arch/x86/kvm/vmx/vmx.o
arch/x86/kvm/vmx/vmx.c:68:32: error: 'vmx_cpu_id' defined but not used [-Werror=unused-const-variable=]
   68 | static const struct x86_cpu_id vmx_cpu_id[] = {
      |                                ^~~~~~~~~~
cc1: all warnings being treated as errors

When building with =y, the MODULE_DEVICE_TABLE macro doesn't generate a
reference to the structure (or any code at all).  This makes W=1 compiles
unhappy.

Wrap both in a #ifdef to avoid the issue.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
[Do the same for CONFIG_KVM_AMD. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 10:35:37 +01:00
Paolo Bonzini
7943f4acea KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
Even if APICv is disabled at startup, the backing page and ir_list need
to be initialized in case they are needed later.  The only case in
which this can be skipped is for userspace irqchip, and that must be
done because avic_init_backing_page dereferences vcpu->arch.apic
(which is NULL for userspace irqchip).

Tested-by: rmuncrief@humanavance.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206579
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-28 10:33:17 +01:00
Oliver Upton
5ef8acbdd6 KVM: nVMX: Emulate MTF when performing instruction emulation
Since commit 5f3d45e7f2 ("kvm/x86: add support for
MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap
flag processor-based execution control for its L2 guest. KVM simply
forwards any MTF VM-exits to the L1 guest, which works for normal
instruction execution.

However, when KVM needs to emulate an instruction on the behalf of an L2
guest, the monitor trap flag is not emulated. Add the necessary logic to
kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon
instruction emulation for L2.

Fixes: 5f3d45e7f2 ("kvm/x86: add support for MONITOR_TRAP_FLAG")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23 09:36:23 +01:00
Li RongQing
dd58f3c95c KVM: fix error handling in svm_hardware_setup
rename svm_hardware_unsetup as svm_hardware_teardown, move
it before svm_hardware_setup, and call it to free all memory
if fail to setup in svm_hardware_setup, otherwise memory will
be leaked

remove __exit attribute for it since it is called in __init
function

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-23 09:34:26 +01:00
Miaohe Lin
d80b64ff29 KVM: SVM: Fix potential memory leak in svm_cpu_init()
When kmalloc memory for sd->sev_vmcbs failed, we forget to free the page
held by sd->save_area. Also get rid of the var r as '-ENOMEM' is actually
the only possible outcome here.

Reviewed-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21 18:06:04 +01:00
Vitaly Kuznetsov
91a5f413af KVM: nVMX: handle nested posted interrupts when apicv is disabled for L1
Even when APICv is disabled for L1 it can (and, actually, is) still
available for L2, this means we need to always call
vmx_deliver_nested_posted_interrupt() when attempting an interrupt
delivery.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21 18:05:21 +01:00
Suravee Suthikulpanit
93fd9666c2 kvm: x86: svm: Fix NULL pointer dereference when AVIC not enabled
Launching VM w/ AVIC disabled together with pass-through device
results in NULL pointer dereference bug with the following call trace.

    RIP: 0010:svm_refresh_apicv_exec_ctrl+0x17e/0x1a0 [kvm_amd]

    Call Trace:
     kvm_vcpu_update_apicv+0x44/0x60 [kvm]
     kvm_arch_vcpu_ioctl_run+0x3f4/0x1c80 [kvm]
     kvm_vcpu_ioctl+0x3d8/0x650 [kvm]
     do_vfs_ioctl+0xaa/0x660
     ? tomoyo_file_ioctl+0x19/0x20
     ksys_ioctl+0x67/0x90
     __x64_sys_ioctl+0x1a/0x20
     do_syscall_64+0x57/0x190
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

Investigation shows that this is due to the uninitialized usage of
struct vapu_svm.ir_list in the svm_set_pi_irte_mode(), which is
called from svm_refresh_apicv_exec_ctrl().

The ir_list is initialized only if AVIC is enabled. So, fixes by
adding a check if AVIC is enabled in the svm_refresh_apicv_exec_ctrl().

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206579
Fixes: 8937d76239 ("kvm: x86: svm: Add support to (de)activate posted interrupts.")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-21 18:05:20 +01:00
Paolo Bonzini
bab0c318ba KVM: x86: do not reset microcode version on INIT or RESET
Do not initialize the microcode version at RESET or INIT, only on vCPU
creation.   Microcode updates are not lost during INIT, and exact
behavior across a warm RESET is not specified by the architecture.

Since we do not support a microcode update directly from the hypervisor,
but only as a result of userspace setting the microcode version MSR,
it's simpler for userspace if we do nothing in KVM and let userspace
emulate behavior for RESET as it sees fit.

Userspace can tie the fix to the availability of MSR_IA32_UCODE_REV in
the list of emulated MSRs.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 09:25:37 +01:00
Paolo Bonzini
df7e881892 KVM: SVM: relax conditions for allowing MSR_IA32_SPEC_CTRL accesses
Userspace that does not know about the AMD_IBRS bit might still
allow the guest to protect itself with MSR_IA32_SPEC_CTRL using
the Intel SPEC_CTRL bit.  However, svm.c disallows this and will
cause a #GP in the guest when writing to the MSR.  Fix this by
loosening the test and allowing the Intel CPUID bit, and in fact
allow the AMD_STIBP bit as well since it allows writing to
MSR_IA32_SPEC_CTRL too.

Reported-by: Zhiyi Guo <zhguo@redhat.com>
Analyzed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:12:57 +01:00
Paolo Bonzini
e8ef2a19a0 KVM: SVM: allow AVIC without split irqchip
SVM is now able to disable AVIC dynamically whenever the in-kernel PIT sets
up an ack notifier, so we can enable it even if in-kernel IOAPIC/PIC/PIT
are in use.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00
Suravee Suthikulpanit
e2ed4078a6 kvm: i8254: Deactivate APICv when using in-kernel PIT re-injection mode.
AMD SVM AVIC accelerates EOI write and does not trap. This causes
in-kernel PIT re-injection mode to fail since it relies on irq-ack
notifier mechanism. So, APICv is activated only when in-kernel PIT
is in discard mode e.g. w/ qemu option:

  -global kvm-pit.lost_tick_policy=discard

Also, introduce APICV_INHIBIT_REASON_PIT_REINJ bit to be used for this
reason.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:44 +01:00