Commit Graph

401 Commits

Author SHA1 Message Date
Sean Christopherson
148d735eb5 KVM: nVMX: Use correct root level for nested EPT shadow page tables
Hardcode the EPT page-walk level for L2 to be 4 levels, as KVM's MMU
currently also hardcodes the page walk level for nested EPT to be 4
levels.  The L2 guest is all but guaranteed to soft hang on its first
instruction when L1 is using EPT, as KVM will construct 4-level page
tables and then tell hardware to use 5-level page tables.
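
The mismatch described above can be illustrated with a minimal sketch of how an EPT pointer encodes the walk depth. It assumes the SDM's EPTP layout (memory type in bits 2:0, page-walk length minus one in bits 5:3). The toy_* names are hypothetical helpers for illustration, not KVM functions.

```c
#include <assert.h>
#include <stdint.h>

/* EPTP layout assumed from the SDM: bits 2:0 memory type, bits 5:3 walk length - 1. */
#define TOY_EPTP_MEMTYPE_WB  0x6ull
#define TOY_EPTP_PWL_SHIFT   3
#define TOY_EPTP_PWL_MASK    0x7ull

/* Hypothetical helper: combine a 4 KiB-aligned root address with a level count. */
static uint64_t toy_build_eptp(uint64_t root_hpa, unsigned int levels)
{
	assert(levels >= 1 && levels <= 8);
	return (root_hpa & ~0xfffull) |
	       ((uint64_t)(levels - 1) << TOY_EPTP_PWL_SHIFT) |
	       TOY_EPTP_MEMTYPE_WB;
}

/* Number of levels hardware would walk for a given EPTP. */
static unsigned int toy_eptp_levels(uint64_t eptp)
{
	return (unsigned int)((eptp >> TOY_EPTP_PWL_SHIFT) & TOY_EPTP_PWL_MASK) + 1;
}
```

If the page tables are built with 4 levels but the EPTP is built with 5, toy_eptp_levels() reports a depth that no longer matches the tables, which is the inconsistency this commit removes.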

Fixes: 855feb6736 ("KVM: MMU: Add 5 level EPT & Shadow page table support.")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:43 +01:00
Miaohe Lin
ffdbd50dca KVM: nVMX: Fix some comment typos and coding style
Fix some typos in the comments. Also fix coding style.
[Sean Christopherson rewrites the comment of write_fault_to_shadow_pgtable
field in struct kvm_vcpu_arch.]

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:43 +01:00
Oliver Upton
684c0422da KVM: nVMX: Handle pending #DB when injecting INIT VM-exit
SDM 27.3.4 states that the 'pending debug exceptions' VMCS field will
be populated if a VM-exit caused by an INIT signal takes priority over a
debug-trap. Emulate this behavior when synthesizing an INIT signal
VM-exit into L1.

Fixes: 4b9852f4f3 ("KVM: x86: Fix INIT signal handling in various CPU states")
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 12:34:09 +01:00
Paolo Bonzini
bab0c318ba KVM: x86: do not reset microcode version on INIT or RESET
Do not initialize the microcode version at RESET or INIT, only on vCPU
creation.   Microcode updates are not lost during INIT, and exact
behavior across a warm RESET is not specified by the architecture.

Since we do not support a microcode update directly from the hypervisor,
but only as a result of userspace setting the microcode version MSR,
it's simpler for userspace if we do nothing in KVM and let userspace
emulate behavior for RESET as it sees fit.

Userspace can tie the fix to the availability of MSR_IA32_UCODE_REV in
the list of emulated MSRs.

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 09:25:37 +01:00
Miaohe Lin
a8be1ad01b KVM: vmx: delete meaningless vmx_decache_cr0_guest_bits() declaration
The function vmx_decache_cr0_guest_bits() is only called below its
implementation. So this is meaningless and should be removed.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:44:06 +01:00
Eric Hankland
4400cf546b KVM: x86: Fix perfctr WRMSR for running counters
Correct the logic in intel_pmu_set_msr() for fixed and general purpose
counters. This was recently changed to set pmc->counter without taking
into account the value of pmc_read_counter(), which is incorrect if the
counter is currently running and non-zero. Change back to the old logic,
which accounted for the value of currently running counters.
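
A minimal sketch of the difference, assuming a toy model in which a counter's visible value is a saved base plus whatever the backing event has accumulated since. The struct and the toy_* functions are hypothetical and only mirror the idea, not the actual KVM code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: 'counter' is the saved base, 'running' is what the backing
 * event has accumulated since then (both field names are hypothetical). */
struct toy_pmc {
	uint64_t counter;
	uint64_t running;
	uint64_t mask;      /* counter width, e.g. 48 bits */
};

static uint64_t toy_pmc_read(const struct toy_pmc *pmc)
{
	assert(pmc);
	return (pmc->counter + pmc->running) & pmc->mask;
}

/* Buggy variant: overwrites the base and ignores the running delta. */
static void toy_pmc_write_buggy(struct toy_pmc *pmc, uint64_t data)
{
	pmc->counter = data;
}

/* Fixed variant: adjust the base so that the next read returns 'data'. */
static void toy_pmc_write(struct toy_pmc *pmc, uint64_t data)
{
	pmc->counter += data - toy_pmc_read(pmc);
}
```

With a base of 100 and a running delta of 50, writing 10 through the buggy variant makes the next read return 60, whereas the fixed variant makes it return 10.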

Signed-off-by: Eric Hankland <ehankland@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 16:01:15 +01:00
Vitaly Kuznetsov
a83502314c x86/kvm/hyper-v: don't allow to turn on unsupported VMX controls for nested guests
Sane L1 hypervisors are not supposed to turn any of the unsupported VMX
controls on for their guests and nested_vmx_check_controls() checks for
that. This is, however, not the case for the controls which are supported
on the host but are missing in enlightened VMCS and when eVMCS is in use.

It would certainly be possible to add these missing checks to
nested_check_vm_execution_controls()/_vm_exit_controls()/.. but it seems
preferable to keep eVMCS-specific stuff in eVMCS and reduce the impact on
non-eVMCS guests by doing fewer unrelated checks. Create a separate
nested_evmcs_check_controls() for this purpose.
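
A separate eVMCS-only check of this kind might look roughly like the sketch below. The mask value, the struct, and the function name are illustrative assumptions; the real list of unsupported controls lives in KVM's eVMCS code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative mask standing in for controls that have no eVMCS field. */
#define TOY_EVMCS_UNSUPPORTED_2NDEXEC  (1u << 0)

/* Hypothetical, heavily reduced stand-in for the guest-provided VMCS. */
struct toy_evmcs_vmcs12 {
	uint32_t secondary_vm_exec_control;
};

/* Reject any control that is set but cannot be represented in eVMCS. */
static int toy_evmcs_check_controls(const struct toy_evmcs_vmcs12 *vmcs12)
{
	assert(vmcs12);
	if (vmcs12->secondary_vm_exec_control & TOY_EVMCS_UNSUPPORTED_2NDEXEC)
		return -EINVAL;
	return 0;
}
```

Keeping the check in its own function means non-eVMCS guests never pay for it.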

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:55:26 +01:00
Vitaly Kuznetsov
31de3d2500 x86/kvm/hyper-v: move VMX controls sanitization out of nested_enable_evmcs()
With fine grained VMX feature enablement QEMU>=4.2 tries to do KVM_SET_MSRS
with default (matching CPU model) values and in case eVMCS is also enabled,
fails.

It would be possible to drop VMX feature filtering completely and make
this a guest's responsibility: if it decides to use eVMCS it should know
which fields are available and which are not. Hyper-V mostly complies with
this; however, there are some problematic controls:
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES
VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL

which Hyper-V enables. As there are no corresponding fields in eVMCS, we
can't handle this properly in KVM. This is a Hyper-V issue.

Move VMX controls sanitization from nested_enable_evmcs() to vmx_get_msr(),
and do the bare minimum (only clear controls which are known to cause issues).
This allows userspace to keep setting controls it wants and at the same
time hides them from the guest.
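
The read-side filtering can be sketched as follows, assuming the usual VMX capability MSR layout (low 32 bits are the allowed-0 settings, high 32 bits are the allowed-1 settings). The function name and the mask passed in are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hide 'unsupported' controls from the value a guest reads, leaving the
 * value userspace stored untouched (the caller keeps the original). */
static uint64_t toy_filter_control_msr(uint64_t msr_val, uint32_t unsupported)
{
	uint32_t allowed0 = (uint32_t)msr_val;          /* bits that must be 1 */
	uint32_t allowed1 = (uint32_t)(msr_val >> 32);  /* bits that may be 1  */

	/* A control that must be 1 cannot be hidden. */
	assert((allowed0 & unsupported) == 0);

	allowed1 &= ~unsupported;
	return allowed0 | ((uint64_t)allowed1 << 32);
}
```

Because only the returned value is filtered, userspace can still set and read back the full capability value while the guest sees the reduced one.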

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:55:06 +01:00
Sean Christopherson
ea79a75092 KVM: nVMX: Remove stale comment from nested_vmx_load_cr3()
The blurb pertaining to the return value of nested_vmx_load_cr3() no
longer matches reality, remove it entirely as the behavior it is
attempting to document is quite obvious when reading the actual code.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:31:25 +01:00
Miaohe Lin
33aabd029f KVM: nVMX: delete meaningless nested_vmx_run() declaration
The function nested_vmx_run() declaration is below its implementation. So
this is meaningless and should be removed.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:45 +01:00
Suravee Suthikulpanit
f4fdc0a2ed kvm: x86: hyperv: Use APICv update request interface
Since disabling APICv has to be done for all vcpus on AMD-based
systems, adopt the newly introduced kvm_request_apicv_update()
interface, and introduce a new APICV_INHIBIT_REASON_HYPERV.

Also, remove kvm_vcpu_deactivate_apicv() since it is no longer used.

Cc: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:43 +01:00
Suravee Suthikulpanit
ef8efd7a15 kvm: x86: Introduce APICv x86 ops for checking APIC inhibit reasons
Inhibit reason bits are used to determine if APICv deactivation is
applicable for a particular hardware virtualization architecture.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:42 +01:00
Paolo Bonzini
7e3e67a987 KVM: x86: remove get_enable_apicv from kvm_x86_ops
It is unused now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:40 +01:00
Suravee Suthikulpanit
4e19c36f2d kvm: x86: Introduce APICv inhibit reason bits
There are several reasons for which a VM needs to deactivate APICv,
e.g. APICv is disabled via a parameter during module loading, or
Hyper-V SynIC support is enabled. Additional inhibit reasons will be
introduced later on when dynamic APICv is supported.

Introduce KVM APICv inhibit reason bits along with a new variable,
apicv_inhibit_reasons, to help keep track of APICv state for each VM.

Initially, the APICV_INHIBIT_REASON_DISABLE bit is used to indicate
the case where APICv is disabled during KVM module load.
(e.g. insmod kvm_amd avic=0 or insmod kvm_intel enable_apicv=0).
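
The bookkeeping described above amounts to a per-VM bitmask in which APICv is usable only while no reason bit is set. A minimal sketch, with hypothetical toy_* names and bit numbers chosen for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative reason numbers; each one maps to a bit in the mask. */
enum toy_inhibit_reason {
	TOY_INHIBIT_DISABLE = 0,  /* disabled via module parameter */
	TOY_INHIBIT_HYPERV  = 1,  /* Hyper-V SynIC in use */
	TOY_INHIBIT_MAX
};

struct toy_vm {
	unsigned long apicv_inhibit_reasons;
};

/* Set or clear one reason without disturbing the others. */
static void toy_set_inhibit(struct toy_vm *vm, enum toy_inhibit_reason r,
			    bool inhibit)
{
	assert(r < TOY_INHIBIT_MAX);
	if (inhibit)
		vm->apicv_inhibit_reasons |= 1ul << r;
	else
		vm->apicv_inhibit_reasons &= ~(1ul << r);
}

/* APICv may be active only when no reason inhibits it. */
static bool toy_apicv_activated(const struct toy_vm *vm)
{
	return vm->apicv_inhibit_reasons == 0;
}
```

Tracking reasons as independent bits means clearing one reason does not re-enable APICv while another reason is still set.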

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
[Do not use get_enable_apicv; consider irqchip_split in svm.c. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-05 15:17:40 +01:00
Linus Torvalds
e813e65038 Merge tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "This is the first batch of KVM changes.

  ARM:
   - cleanups and corner case fixes.

  PPC:
   - Bugfixes

  x86:
   - Support for mapping DAX areas with large nested page table entries.

   - Cleanups and bugfixes here too. A particularly important one is a
     fix for FPU load when the thread has TIF_NEED_FPU_LOAD. There is
     also a race condition which could be used in guest userspace to
     exploit the guest kernel, for which the embargo expired today.

   - Fast path for IPI delivery vmexits, shaving about 200 clock cycles
     from IPI latency.

   - Protect against "Spectre-v1/L1TF" (bring data in the cache via
     speculative out of bound accesses, use L1TF on the sibling
     hyperthread to read it), which unfortunately is an even bigger
     whack-a-mole game than SpectreV1.

  Sean continues his mission to rewrite KVM. In addition to a sizable
  number of x86 patches, this time he contributed a pretty large
  refactoring of vCPU creation that affects all architectures but should
  not have any visible effect.

  s390 will come next week together with some more x86 patches"

* tag 'kvm-5.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
  x86/KVM: Clean up host's steal time structure
  x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed
  x86/kvm: Cache gfn to pfn translation
  x86/kvm: Introduce kvm_(un)map_gfn()
  x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit
  KVM: PPC: Book3S PR: Fix -Werror=return-type build failure
  KVM: PPC: Book3S HV: Release lock on page-out failure path
  KVM: arm64: Treat emulated TVAL TimerValue as a signed 32-bit integer
  KVM: arm64: pmu: Only handle supported event counters
  KVM: arm64: pmu: Fix chained SW_INCR counters
  KVM: arm64: pmu: Don't mark a counter as chained if the odd one is disabled
  KVM: arm64: pmu: Don't increment SW_INCR if PMCR.E is unset
  KVM: x86: Use a typedef for fastop functions
  KVM: X86: Add 'else' to unify fastop and execute call path
  KVM: x86: inline memslot_valid_for_gpte
  KVM: x86/mmu: Use huge pages for DAX-backed files
  KVM: x86/mmu: Remove lpage_is_disallowed() check from set_spte()
  KVM: x86/mmu: Fold max_mapping_level() into kvm_mmu_hugepage_adjust()
  KVM: x86/mmu: Zap any compound page when collapsing sptes
  KVM: x86/mmu: Remove obsolete gfn restoration in FNAME(fetch)
  ...
2020-01-31 09:30:41 -08:00
Miaohe Lin
cef6db76f3 KVM: VMX: remove duplicated segment cache clear
vmx_set_segment() clears segment cache unconditionally, so we should not
clear it again by calling vmx_segment_cache_clear().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:57 +01:00
Krish Sadhukhan
b91991bf6b KVM: nVMX: Check GUEST_DR7 on vmentry of nested guests
According to section "Checks on Guest Control Registers, Debug Registers, and
MSRs" in Intel SDM vol 3C, the following checks are performed on vmentry
of nested guests:

    If the "load debug controls" VM-entry control is 1, bits 63:32 in the DR7
    field must be 0.

In KVM, GUEST_DR7 is set prior to the vmcs02 VM-entry by kvm_set_dr() and the
latter synthesizes a #GP if any bit in the high dword in the former is set.
Hence this field needs to be checked in software.
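
The software check can be sketched as below. The struct is a hypothetical, reduced stand-in for vmcs12, and the control bit value is assumed to be bit 2 of the VM-entry controls as described in the SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "Load debug controls" VM-entry control, assumed to be bit 2. */
#define TOY_VM_ENTRY_LOAD_DEBUG_CONTROLS  0x00000004u

/* Hypothetical, reduced view of the fields involved. */
struct toy_vmcs12_dbg {
	uint32_t vm_entry_controls;
	uint64_t guest_dr7;
};

/* DR7 bits 63:32 must be zero, but only when debug controls are loaded. */
static bool toy_guest_dr7_valid(const struct toy_vmcs12_dbg *vmcs12)
{
	assert(vmcs12);
	if (!(vmcs12->vm_entry_controls & TOY_VM_ENTRY_LOAD_DEBUG_CONTROLS))
		return true;
	return (vmcs12->guest_dr7 >> 32) == 0;
}
```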

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:55 +01:00
Peter Xu
6a3c623ba8 KVM: X86: Drop x86_set_memory_region()
The helper x86_set_memory_region() is only used in vmx_set_tss_addr()
and kvm_arch_destroy_vm().  Push the lock up into both callers.  With
that, drop x86_set_memory_region().

This prepares to allow __x86_set_memory_region() to return the mapped
HVA, because the HVA will need to be protected by the lock too even
after __x86_set_memory_region() returns.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:53 +01:00
Peter Xu
2a5755bb21 KVM: X86: Don't take srcu lock in init_rmode_identity_map()
We've already got the slots_lock, so we should be safe.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:53 +01:00
Vitaly Kuznetsov
de9bf4d4ce x86/kvm/hyper-v: remove stale evmcs_already_enabled check from nested_enable_evmcs()
In nested_enable_evmcs() evmcs_already_enabled check doesn't really do
anything: controls are already sanitized and we return '0' regardless.
Just drop the check.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:50 +01:00
Sean Christopherson
de761ea792 KVM: x86: Perform non-canonical checks in 32-bit KVM
Remove the CONFIG_X86_64 condition from the low level non-canonical
helpers to effectively enable non-canonical checks on 32-bit KVM.
Non-canonical checks are performed by hardware if the CPU *supports*
64-bit mode, whether or not the CPU is actually in 64-bit mode is
irrelevant.
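
For reference, a canonical-address test can be sketched as follows. This is a hedged illustration rather than the kernel's helper: an address is canonical when all bits from the top implemented virtual-address bit upward are identical. The function name and the restriction to 48 or 57 address bits are assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if 'la' is NOT canonical for the given virtual-address width. */
static bool toy_is_noncanonical(uint64_t la, unsigned int vaddr_bits)
{
	assert(vaddr_bits == 48 || vaddr_bits == 57);

	/* Bits 63:(vaddr_bits - 1) must be all zeros or all ones. */
	uint64_t upper = la >> (vaddr_bits - 1);
	uint64_t all_ones = ~0ull >> (vaddr_bits - 1);

	return upper != 0 && upper != all_ones;
}
```

The check operates on the full 64-bit value, so it has to run before the value is narrowed to a 32-bit "unsigned long".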

For the most part, skipping non-canonical checks on 32-bit KVM is ok-ish
because 32-bit KVM always (hopefully) drops bits 63:32 of whatever value
it's checking before propagating it to hardware, and architecturally,
the expected behavior for the guest is a bit of a grey area since the
vCPU itself doesn't support 64-bit mode.  I.e. a 32-bit KVM guest can
observe the missed checks in several paths, e.g. INVVPID and VM-Enter,
but it's debatable whether or not the missed checks constitute a bug
because technically the vCPU doesn't support 64-bit mode.

The primary motivation for enabling the non-canonical checks is defense
in depth.  As mentioned above, a guest can trigger a missed check via
INVVPID or VM-Enter.  INVVPID is straightforward as it takes a 64-bit
virtual address as part of its 128-bit INVVPID descriptor and fails if
the address is non-canonical, even if INVVPID is executed in 32-bit PM.
Nested VM-Enter is a bit more convoluted as it requires the guest to
write natural width VMCS fields via memory accesses and then VMPTRLD the
VMCS, but it's still possible.  In both cases, KVM is saved from a true
bug only because its flows that propagate values to hardware (correctly)
take "unsigned long" parameters and so drop bits 63:32 of the bad value.

Explicitly performing the non-canonical checks makes it less likely that
a bad value will be propagated to hardware, e.g. in the INVVPID case,
if __invvpid() didn't implicitly drop bits 63:32 then KVM would BUG() on
the resulting unexpected INVVPID failure due to hardware rejecting the
non-canonical address.

The only downside to enabling the non-canonical checks is that it adds a
relatively small amount of overhead, but the affected flows are not hot
paths, i.e. the overhead is negligible.

Note, KVM technically could gate the non-canonical checks on 32-bit KVM
with static_cpu_has(X86_FEATURE_LM), but on bare metal that's an even
bigger waste of code for everyone except the 0.00000000000001% of the
population running on Yonah, and nested 32-bit on 64-bit already fudges
things with respect to 64-bit CPU behavior.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
[Also do so in nested_vmx_check_host_state as reported by Krish. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:50 +01:00
Oliver Upton
d196842150 KVM: nVMX: WARN on failure to set IA32_PERF_GLOBAL_CTRL
Writes to MSR_CORE_PERF_GLOBAL_CONTROL should never fail if the VM-exit
and VM-entry controls are exposed to L1. Promote the checks to perform a
full WARN if kvm_set_msr() fails and remove the now unused macro
SET_MSR_OR_WARN().

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:49 +01:00
Marios Pomonis
66061740f1 KVM: x86: Protect pmu_intel.c from Spectre-v1/L1TF attacks
This fixes Spectre-v1/L1TF vulnerabilities in intel_find_fixed_event()
and intel_rdpmc_ecx_to_pmc().
kvm_rdpmc() (ancestor of intel_find_fixed_event()) and
reprogram_fixed_counter() (ancestor of intel_rdpmc_ecx_to_pmc()) are
exported symbols so KVM should treat them conservatively from a security
perspective.
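
The usual mitigation pattern is to clamp an index with a branch-free mask after the bounds check, in the style of the kernel's array_index_nospec(). The sketch below only shows the arithmetic; the toy_* names are hypothetical, and plain C like this gives no guarantee about what a compiler or CPU does under speculation, which is why the kernel relies on architecture-specific implementations.

```c
#include <assert.h>
#include <stdint.h>

/* All ones if index < size, zero otherwise, computed without a branch. */
static uint64_t toy_index_mask(uint64_t index, uint64_t size)
{
	/* The trick needs size below 2^63 so the top bit acts as a borrow flag. */
	assert((size >> 63) == 0);
	return ((index | (size - 1 - index)) >> 63) - 1;
}

/* Clamp an out-of-range index to 0 instead of letting it reach the array. */
static uint64_t toy_index_nospec(uint64_t index, uint64_t size)
{
	return index & toy_index_mask(index, size);
}

/* Bounds check followed by the clamp, as done before a table lookup. */
static int toy_lookup(const int *table, uint64_t size, uint64_t index)
{
	if (index >= size)
		return -1;
	index = toy_index_nospec(index, size);
	return table[index];
}
```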

Fixes: 25462f7f52 ("KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch")

Signed-off-by: Nick Finco <nifi@google.com>
Signed-off-by: Marios Pomonis <pomonis@google.com>
Reviewed-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:44 +01:00
Marios Pomonis
c926f2f723 KVM: x86: Protect exit_reason from being used in Spectre-v1/L1TF attacks
This fixes a Spectre-v1/L1TF vulnerability in vmx_handle_exit().
While exit_reason is set by the hardware and therefore should not be
attacker-influenced, an unknown exit_reason could potentially be used to
perform such an attack.

Fixes: 55d2375e58 ("KVM: nVMX: Move nested code to dedicated files")

Signed-off-by: Marios Pomonis <pomonis@google.com>
Signed-off-by: Nick Finco <nifi@google.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:42 +01:00
John Allen
a47970ed74 kvm/svm: PKU not currently supported
Current SVM implementation does not have support for handling PKU. Guests
running on a host with future AMD cpus that support the feature will read
garbage from the PKRU register and will hit segmentation faults on boot as
memory is getting marked as protected that should not be. Ensure that cpuid
from SVM does not advertise the feature.

Signed-off-by: John Allen <john.allen@amd.com>
Cc: stable@vger.kernel.org
Fixes: 0556cbdc2f ("x86/pkeys: Don't check if PKRU is zero before writing it")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-27 19:59:35 +01:00
Sean Christopherson
987b2594ed KVM: x86: Move kvm_vcpu_init() invocation to common code
Move the kvm_vcpu_{un}init() calls to common x86 code as an intermediate
step to removing kvm_vcpu_{un}init() altogether.

Note, VMX's alloc_apic_access_page() and init_rmode_identity_map() are
per-VM allocations and are intentionally kept if vCPU creation fails.
They are freed by kvm_arch_destroy_vm().

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 09:18:57 +01:00
Sean Christopherson
fc6e2a1845 KVM: x86: Move FPU allocation to common x86 code
The allocation of FPU structs is identical across VMX and SVM, move it
to common x86 code.  Somewhat arbitrarily place the allocation so that
it resides directly above the associated initialization via fx_init(),
e.g. instead of retaining its position with respect to the overall vcpu
creation flow.  Although the names kvm_arch_vcpu_create() and
kvm_arch_vcpu_init() might suggest otherwise, x86 does not have a clean
split between 'create' and 'init'.  Allocating the struct immediately
prior to the first use arguably improves readability *now*, and will
yield even bigger improvements when kvm_arch_vcpu_init() is removed in
a future patch.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 09:18:56 +01:00
Sean Christopherson
a9dd6f09d7 KVM: x86: Allocate vcpu struct in common x86 code
Move allocation of VMX and SVM vcpus to common x86.  Although the struct
being allocated is technically a VMX/SVM struct, it can be interpreted
directly as a 'struct kvm_vcpu' because of the pre-existing requirement
that 'struct kvm_vcpu' be located at offset zero of the arch/vendor vcpu
struct.
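
The offset-zero requirement is what makes the shared allocation work: a pointer to the vendor struct is also a valid pointer to its first member. A minimal user-space sketch, with hypothetical toy_* types standing in for the kernel structs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the common vcpu struct. */
struct toy_vcpu {
	int vcpu_id;
};

/* Stand-in for a vendor (VMX/SVM) struct that embeds it first. */
struct toy_vmx_vcpu {
	struct toy_vcpu vcpu;
	unsigned long vendor_state;
};

static_assert(offsetof(struct toy_vmx_vcpu, vcpu) == 0,
	      "common vcpu must be the first member");

/* Common code allocates the vendor-sized object but sees only the base. */
static struct toy_vcpu *toy_vcpu_alloc(size_t vendor_size)
{
	assert(vendor_size >= sizeof(struct toy_vcpu));
	return calloc(1, vendor_size);
}

/* Vendor code recovers its own struct from the common pointer. */
static struct toy_vmx_vcpu *to_toy_vmx(struct toy_vcpu *vcpu)
{
	return (struct toy_vmx_vcpu *)vcpu;
}
```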

Remove the message from the build-time assertions regarding placement of
the struct, as compatibility with the arch usercopy region is no longer
the sole dependent on 'struct kvm_vcpu' being at offset zero.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 09:18:55 +01:00
Sean Christopherson
34109c0476 KVM: VMX: Use direct vcpu pointer during vCPU create/free
Capture the vcpu pointer in a local variable and replace '&vmx->vcpu'
references with a direct reference to the pointer in anticipation of
moving bits of the code to common x86 and passing the vcpu pointer into
vmx_create_vcpu(), i.e. eliminate unnecessary noise from future patches.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 09:18:54 +01:00
Sean Christopherson
034d8e2cb9 KVM: VMX: Allocate VPID after initializing VCPU
Do VPID allocation after calling the common kvm_vcpu_init() as a step
towards doing vCPU allocation (via kmem_cache_zalloc()) and calling
kvm_vcpu_init() back-to-back.  Squishing allocation and initialization
together will eventually allow the sequence to be moved to arch-agnostic
creation code.

Note, the VPID is not consumed until KVM_RUN, so slightly delaying its
allocation should have no real functional impact.  VPID allocation was
arbitrarily placed in the original patch, commit 2384d2b326 ("KVM:
VMX: Enable Virtual Processor Identification (VPID)").

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 09:18:53 +01:00
Paolo Bonzini
6441fa6178 KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL
If the guest is configured to have SPEC_CTRL but the host does not
(which is a nonsensical configuration but these are not explicitly
forbidden) then a host-initiated MSR write can write vmx->spec_ctrl
(respectively svm->spec_ctrl) and trigger a #GP when KVM tries to
restore the host value of the MSR.  Add a more comprehensive check
for valid bits of SPEC_CTRL, covering host CPUID flags and,
since we are at it and it is more correct that way, guest CPUID
flags too.
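
A valid-bits check of this kind can be sketched as below. The SPEC_CTRL bit positions (IBRS, STIBP, SSBD) follow the architectural definition; the capability struct and the toy_* function names are hypothetical, and the way host and guest capabilities are combined is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SPEC_CTRL_IBRS   (1ull << 0)
#define TOY_SPEC_CTRL_STIBP  (1ull << 1)
#define TOY_SPEC_CTRL_SSBD   (1ull << 2)

/* Hypothetical summary of the relevant CPUID flags. */
struct toy_cpu_caps {
	bool ibrs;
	bool stibp;
	bool ssbd;
};

static uint64_t toy_caps_valid_bits(const struct toy_cpu_caps *caps)
{
	uint64_t bits = 0;

	assert(caps);
	if (caps->ibrs)
		bits |= TOY_SPEC_CTRL_IBRS;
	if (caps->stibp)
		bits |= TOY_SPEC_CTRL_STIBP;
	if (caps->ssbd)
		bits |= TOY_SPEC_CTRL_SSBD;
	return bits;
}

/* Accept a write only if every set bit is valid for both host and guest. */
static bool toy_spec_ctrl_write_ok(const struct toy_cpu_caps *host,
				   const struct toy_cpu_caps *guest,
				   uint64_t data)
{
	uint64_t valid = toy_caps_valid_bits(host) & toy_caps_valid_bits(guest);

	return (data & ~valid) == 0;
}
```

Rejecting bits the host lacks is what keeps a bad value from later reaching the host MSR.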

For AMD, remove the unnecessary is_guest_mode check around setting
the MSR interception bitmap, so that the code looks the same as
for Intel.

Cc: Jim Mattson <jmattson@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-24 09:18:47 +01:00
Miaohe Lin
a4d956b939 KVM: nVMX: vmread should not set rflags to specify success in case of #PF
If writing to the vmread destination operand results in a #PF, vmread
should not call nested_vmx_succeed() to set rflags to specify success.
This is similar to what is done for VMPTRST (see handle_vmptrst()).

Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 14:45:32 +01:00
Miaohe Lin
d8010a779a KVM: vmx: delete meaningless nested_vmx_prepare_msr_bitmap() declaration
The function nested_vmx_prepare_msr_bitmap() declaration is below its
implementation. So this is meaningless and should be removed.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 14:45:29 +01:00
Sean Christopherson
87382003e3 KVM: x86: Refactor and rename bit() to feature_bit() macro
Rename bit() to __feature_bit() to give it a more descriptive name, and
add a macro, feature_bit(), to stuff the X86_FEATURE_ prefix to keep
line lengths manageable for code that hardcodes the bit to be retrieved.
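
The prefix stuffing is plain token pasting. The sketch below is a simplified user-space approximation: the real __feature_bit() also performs compile-time sanity checks that are omitted here, and toy_has_xsave() is a hypothetical caller.

```c
#include <assert.h>
#include <stdint.h>

/* Feature numbers are (word * 32 + bit); XSAVE is CPUID.1:ECX bit 26. */
#define X86_FEATURE_XSAVE   (4 * 32 + 26)

/* Simplified: turn a feature number into a mask within its 32-bit word. */
#define __feature_bit(x)    (1u << ((x) & 31))

/* Paste the X86_FEATURE_ prefix so callers can write feature_bit(XSAVE). */
#define feature_bit(name)   __feature_bit(X86_FEATURE_##name)

static_assert(feature_bit(XSAVE) == (1u << 26), "XSAVE is bit 26 of its word");

/* Hypothetical caller testing a raw CPUID.1:ECX value. */
static int toy_has_xsave(uint32_t cpuid_1_ecx)
{
	return !!(cpuid_1_ecx & feature_bit(XSAVE));
}
```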

No functional change intended.

Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 14:45:28 +01:00
Sean Christopherson
96be4e069c KVM: x86: Drop special XSAVE handling from guest_cpuid_has()
Now that KVM prevents setting host-reserved CR4 bits, drop the dedicated
XSAVE check in guest_cpuid_has() in favor of open coding similar checks
in the SVM/VMX XSAVES enabling flows.

Note, checking boot_cpu_has(X86_FEATURE_XSAVE) in the XSAVES flows is
technically redundant with respect to the CR4 reserved bit checks, e.g.
XSAVES #UDs if CR4.OSXSAVE=0 and arch.xsaves_enabled is consumed if and
only if CR4.OSXSAVE=1 in guest.  Keep (add?) the explicit boot_cpu_has()
checks to help document KVM's usage of arch.xsaves_enabled.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:58:10 +01:00
Sean Christopherson
e348ac7c9e KVM: VMX: Add helper to consolidate up PT/RTIT WRMSR fault logic
Add a helper to consolidate the common checks for writing PT MSRs,
and opportunistically clean up the formatting of the affected code.

No functional change intended.

Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:57:54 +01:00
Sean Christopherson
fe6ed369fc KVM: VMX: Add non-canonical check on writes to RTIT address MSRs
Reject writes to RTIT address MSRs if the data being written is a
non-canonical address as the MSRs are subject to canonical checks, e.g.
KVM will trigger an unchecked #GP when loading the values to hardware
during pt_guest_enter().

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:57:50 +01:00
Miaohe Lin
311497e0c5 KVM: Fix some writing mistakes
Fix some writing mistakes in the comments.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:57:44 +01:00
Miaohe Lin
67b0ae43df KVM: Fix some comment typos and missing parentheses
Fix some typos and add missing parentheses in the comments.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:57:32 +01:00
Miaohe Lin
4d516fe7d3 KVM: Fix some out-dated function names in comment
Since commit b1346ab2af ("KVM: nVMX: Rename prepare_vmcs02_*_full to
prepare_vmcs02_*_rare"), prepare_vmcs02_full has been renamed to
prepare_vmcs02_rare.
nested_vmx_merge_msr_bitmap is renamed to nested_vmx_prepare_msr_bitmap
since commit c992384bde ("KVM: vmx: speed up MSR bitmap merge").

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:57:27 +01:00
Miaohe Lin
668effb63d KVM: Fix some wrong function names in comment
Fix some wrong function names in comment. mmu_check_roots is a typo for
mmu_check_root, vmcs_read_any should be vmcs12_read_any and so on.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:57:23 +01:00
Wanpeng Li
1e9e2622a1 KVM: VMX: FIXED+PHYSICAL mode single target IPI fastpath
Writes to the ICR and TSCDEADLINE MSRs cause most of the MSR-write vmexits
observed in our products; multicast IPIs are not as common as unicast IPIs
such as RESCHEDULE_VECTOR and CALL_FUNCTION_SINGLE_VECTOR.

This patch introduces a mechanism to handle certain performance-critical
WRMSRs at a very early stage of the KVM VMExit handler.

This mechanism is specifically used for accelerating writes to x2APIC ICR
that attempt to send a virtual IPI with physical destination-mode, fixed
delivery-mode and single target, which was found to be one of the main
causes of VMExits for Linux workloads.

This mechanism significantly reduces the latency of such virtual IPIs by
sending the physical IPI to the target vCPU at a very early stage of the
KVM VMExit handler, before host interrupts are enabled and before expensive
operations such as reacquiring KVM’s SRCU lock.
Latency is reduced even more when KVM is able to use the APICv
posted-interrupt mechanism (which allows delivering the virtual IPI directly
to the target vCPU without the need to kick it to the host).
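
The eligibility test for such a write can be sketched as a pure predicate on the 64-bit x2APIC ICR value. The field positions follow the x2APIC ICR layout (delivery mode in bits 10:8, destination mode in bit 11, shorthand in bits 19:18, destination ID in bits 63:32). The toy_* names are hypothetical, and the shorthand test is an addition of this sketch rather than something the commit message mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_APIC_MODE_MASK      0x00700u     /* delivery mode, bits 10:8 */
#define TOY_APIC_DM_FIXED       0x00000u
#define TOY_APIC_DEST_MASK      0x00800u     /* destination mode, bit 11 */
#define TOY_APIC_DEST_PHYSICAL  0x00000u
#define TOY_APIC_SHORT_MASK     0xc0000u     /* shorthand, bits 19:18 */
#define TOY_X2APIC_BROADCAST    0xffffffffu

static_assert((TOY_APIC_MODE_MASK & TOY_APIC_DEST_MASK) == 0,
	      "ICR fields must not overlap");

/* True for a fixed, physical-mode IPI aimed at exactly one destination. */
static bool toy_icr_is_fastpath(uint64_t icr)
{
	uint32_t lo = (uint32_t)icr;
	uint32_t dest = (uint32_t)(icr >> 32);

	return (lo & TOY_APIC_DEST_MASK) == TOY_APIC_DEST_PHYSICAL &&
	       (lo & TOY_APIC_MODE_MASK) == TOY_APIC_DM_FIXED &&
	       (lo & TOY_APIC_SHORT_MASK) == 0 &&
	       dest != TOY_X2APIC_BROADCAST;
}
```

Anything that fails the predicate simply falls through to the regular, slower WRMSR handling.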

Testing on Xeon Skylake server:

The virtual IPI latency from sender send to receiver receive is reduced
by more than 200 CPU cycles.

Reviewed-by: Liran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-21 13:57:12 +01:00
Sean Christopherson
ff10e22e65 KVM: VMX: Check for full VMX support when verifying CPU compatibility
Explicitly check the current CPU's IA32_FEAT_CTL and VMX feature flags
when verifying compatibility across physical CPUs.  This effectively
adds a check on IA32_FEAT_CTL to ensure that VMX is fully enabled on
all CPUs.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-17-sean.j.christopherson@intel.com
2020-01-13 19:26:58 +01:00
Sean Christopherson
a4d0b2fdbc KVM: VMX: Use VMX feature flag to query BIOS enabling
Replace KVM's manual checks on IA32_FEAT_CTL with a query on the boot
CPU's MSR_IA32_FEAT_CTL and VMX feature flags.  The MSR_IA32_FEAT_CTL
flag indicates that IA32_FEAT_CTL has been configured and that dependent
features are accurately reflected in cpufeatures, e.g. the VMX flag is
now cleared during boot if VMX isn't fully enabled via IA32_FEAT_CTL,
including the case where the MSR isn't supported.
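
For context, the kind of manual MSR check being replaced can be sketched as below. The bit positions (lock in bit 0, VMX inside SMX in bit 1, VMX outside SMX in bit 2) follow the SDM. The toy_* names are hypothetical, and treating an unlocked MSR as unusable is a simplifying assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_FEAT_CTL_LOCKED                   (1ull << 0)
#define TOY_FEAT_CTL_VMX_ENABLED_INSIDE_SMX   (1ull << 1)
#define TOY_FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX  (1ull << 2)

static_assert((TOY_FEAT_CTL_LOCKED & TOY_FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX) == 0,
	      "bits must be distinct");

/* VMX is usable if the MSR is locked with the enable bit for the current
 * mode of operation (inside or outside SMX) set. */
static bool toy_vmx_usable(uint64_t feat_ctl, bool in_smx)
{
	uint64_t need = TOY_FEAT_CTL_LOCKED |
			(in_smx ? TOY_FEAT_CTL_VMX_ENABLED_INSIDE_SMX
				: TOY_FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX);

	return (feat_ctl & need) == need;
}
```

Once the boot code clears the VMX feature flag whenever this condition does not hold, callers only need to query the flag.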

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-16-sean.j.christopherson@intel.com
2020-01-13 19:19:32 +01:00
Sean Christopherson
21bd3467a5 KVM: VMX: Drop initialization of IA32_FEAT_CTL MSR
Remove KVM's code to initialize IA32_FEAT_CTL MSR when KVM is loaded now
that the MSR is initialized during boot on all CPUs that support VMX,
i.e. on all CPUs that can possibly load kvm_intel.

Note, don't WARN if IA32_FEAT_CTL is unlocked, even though the MSR is
unconditionally locked by init_ia32_feat_ctl().  KVM isn't tied directly
to a CPU vendor detection, whereas init_ia32_feat_ctl() is invoked if
and only if the CPU vendor is recognized and known to support VMX.  As a
result, vmx_disabled_by_bios() may be reached without going through
init_ia32_feat_ctl() and thus without locking IA32_FEAT_CTL.  This quirk
will be eliminated in a future patch.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lkml.kernel.org/r/20191221044513.21680-15-sean.j.christopherson@intel.com
2020-01-13 19:04:37 +01:00
Sean Christopherson
32ad73db7f x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR
As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
are quite a mouthful, especially the VMX bits which must differentiate
between enabling VMX inside and outside SMX (TXT) operation.  Rename the
MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
make them a little friendlier on the eyes.

Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
to match Intel's SDM, but a future patch will add a dedicated Kconfig,
file and functions for the MSR. Using the full name for those assets is
rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
nomenclature is consistent throughout the kernel.

Opportunistically, fix a few other annoyances with the defines:

  - Relocate the bit defines so that they immediately follow the MSR
    define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
  - Add whitespace around the block of feature control defines to make
    it clear they're all related.
  - Use BIT() instead of manually encoding the bit shift.
  - Use "VMX" instead of "VMXON" to match the SDM.
  - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
    be consistent with the kernel's verbiage used for all other feature
    control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
    likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
    the (literal) one-off usage of _ON, the SDM is simply "wrong".
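
A sketch of the shape of the rename, written from memory rather than
copied from the patch, so treat the exact old names as an assumption.
BIT() is redefined here as a userspace stand-in for the kernel macro, and
the static assertions only demonstrate that the values are unchanged.

```c
/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (1ULL << (n))

/* Old style (assumed names): long identifiers, manual shifts. */
#define FEATURE_CONTROL_LOCKED                     (1 << 0)
#define FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX   (1 << 1)
#define FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX  (1 << 2)
#define FEATURE_CONTROL_LMCE                       (1 << 20)

/* New style: FEAT_CTL prefix, BIT(), "VMX" and "_ENABLED" suffix. */
#define FEAT_CTL_LOCKED                   BIT(0)
#define FEAT_CTL_VMX_ENABLED_INSIDE_SMX   BIT(1)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX  BIT(2)
#define FEAT_CTL_LMCE_ENABLED             BIT(20)

/* The rename is cosmetic: every bit keeps its position. */
_Static_assert(FEAT_CTL_LOCKED == FEATURE_CONTROL_LOCKED, "lock bit");
_Static_assert(FEAT_CTL_VMX_ENABLED_INSIDE_SMX ==
	       FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX, "VMX inside SMX");
_Static_assert(FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX ==
	       FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX, "VMX outside SMX");
_Static_assert(FEAT_CTL_LMCE_ENABLED == FEATURE_CONTROL_LMCE, "LMCE");
```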

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
2020-01-13 17:23:08 +01:00
Jim Mattson
c90f4d03cc kvm: nVMX: Aesthetic cleanup of handle_vmread and handle_vmwrite
Apply reverse fir tree declaration order, shorten some variable names
to avoid line wrap, reformat a block comment, delete an extra blank
line, and use BIT(10) instead of (1u << 10).

Signed-off-by: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jon Cargille <jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 18:16:00 +01:00
Jim Mattson
693e02cc24 kvm: nVMX: VMWRITE checks unsupported field before read-only field
According to the SDM, VMWRITE checks to see if the secondary source
operand corresponds to an unsupported VMCS field before it checks to
see if the secondary source operand corresponds to a VM-exit
information field and the processor does not support writing to
VM-exit information fields.
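
The ordering can be sketched as a standalone C function.  The error
numbers are the SDM's VM-instruction error numbers 12 and 13.  The
function vmwrite_field_check() and its boolean parameters are
hypothetical simplifications, not the actual handle_vmwrite() code.

```c
#include <stdbool.h>

/* VM-instruction error numbers as listed in the SDM. */
#define VMXERR_UNSUPPORTED_VMCS_COMPONENT          12
#define VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT    13

/*
 * Hypothetical model of the field checks done by VMWRITE.
 * Returns 0 if the write may proceed, else a VM-instruction error.
 */
static int vmwrite_field_check(bool field_supported,
			       bool field_is_exit_info,
			       bool cpu_can_write_exit_info)
{
	/* The unsupported-field check comes first... */
	if (!field_supported)
		return VMXERR_UNSUPPORTED_VMCS_COMPONENT;

	/* ...and only then the read-only (VM-exit information) check. */
	if (field_is_exit_info && !cpu_can_write_exit_info)
		return VMXERR_VMWRITE_READ_ONLY_VMCS_COMPONENT;

	return 0;
}
```

In this model, a field that is both unsupported and a VM-exit information
field reports error 12, which is the behavior the SDM ordering implies.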

Fixes: 49f705c532 ("KVM: nVMX: Implement VMREAD and VMWRITE")
Signed-off-by: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jon Cargille <jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 18:16:00 +01:00
Jim Mattson
dd2d6042b7 kvm: nVMX: VMWRITE checks VMCS-link pointer before VMCS field
According to the SDM, a VMWRITE in VMX non-root operation with an
invalid VMCS-link pointer results in VMfailInvalid before the validity
of the VMCS field in the secondary source operand is checked.

For consistency, modify both handle_vmwrite and handle_vmread, even
though there was no problem with the latter.
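
A minimal sketch of the intended order, assuming an all-ones value marks
an invalid VMCS-link pointer.  The enum, the function vmwrite_precheck()
and its parameters are hypothetical and only model the sequencing; they
are not the handler code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: an all-ones link pointer means "no valid shadow VMCS". */
#define INVALID_LINK_POINTER UINT64_MAX

enum vmx_result {
	VM_SUCCEED,
	VMFAIL_INVALID,	/* no usable VMCS, so no error number is recorded */
	VMFAIL_VALID,	/* an error number is recorded in the current VMCS */
};

/* Hypothetical model of the checks in the order described above. */
static enum vmx_result vmwrite_precheck(bool in_non_root,
					uint64_t vmcs_link_pointer,
					bool field_valid)
{
	/* In VMX non-root operation, the link pointer is checked first. */
	if (in_non_root && vmcs_link_pointer == INVALID_LINK_POINTER)
		return VMFAIL_INVALID;

	/* Only afterwards is the VMCS field encoding validated. */
	if (!field_valid)
		return VMFAIL_VALID;

	return VM_SUCCEED;
}
```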

Fixes: 6d894f498f ("KVM: nVMX: vmread/vmwrite: Use shadow vmcs12 if running L2")
Signed-off-by: Jim Mattson <jmattson@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Jon Cargille <jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 18:16:00 +01:00
Xiaoyao Li
5e3d394fdd KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING
The misspelling was found by checkpatch.pl, so fix it.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-01-08 18:15:59 +01:00