Commit Graph

6486 Commits

Author SHA1 Message Date
Sean Christopherson
59505b55aa KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
Refactor the shadow NPT role calculation into a separate helper to
better differentiate it from the non-nested shadow MMU, e.g. the NPT
variant is never direct and derives its root level from the TDP level.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200716034122.5998-3-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-30 18:14:34 -04:00
Sean Christopherson
f291a358e0 KVM: VMX: Drop a duplicate declaration of construct_eptp()
Remove an extra declaration of construct_eptp() from vmx.h.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200716034122.5998-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-30 18:13:47 -04:00
Sean Christopherson
096586fda5 KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
Move the initialization of shadow NPT MMU's shadow_root_level into
kvm_init_shadow_npt_mmu() and explicitly set the level in the shadow NPT
MMU's role to be the TDP level.  This ensures the role and MMU levels
are synchronized and also initialized before __kvm_mmu_new_pgd(), which
consumes the level when attempting a fast PGD switch.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Fixes: 9fa72119b2 ("kvm: x86: Introduce kvm_mmu_calc_root_page_role()")
Fixes: a506fdd223 ("KVM: nSVM: implement nested_svm_load_cr3() and use it for host->guest switch")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200716034122.5998-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-30 18:13:23 -04:00
Thomas Gleixner
f3020b8891 x86/kvm: Use __xfer_to_guest_mode_work_pending() in kvm_run_vcpu()
The comments explicitely explain that the work flags check and handling in
kvm_run_vcpu() is done with preemption and interrupts enabled as KVM
invokes the check again right before entering guest mode with interrupts
disabled which guarantees that the work flags are observed and handled
before VMENTER.

Nevertheless the flag pending check in kvm_run_vcpu() uses the helper
variant which requires interrupts to be disabled triggering an instant
lockdep splat. This was caught in testing before and then not fixed up in
the patch before applying. :(

Use the relaxed and intentionally racy __xfer_to_guest_mode_work_pending()
instead.

Fixes: 72c3c0fe54 ("x86/kvm: Use generic xfer to guest work function")
Reported-by: Qian Cai <cai@lca.pw> writes:
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/87bljxa2sa.fsf@nanos.tec.linutronix.de
2020-07-30 12:31:47 +02:00
Haiwei Li
9c2475f3e4 KVM: Using macros instead of magic values
Instead of using magic values, use macros.

Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Message-Id: <4c072161-80dd-b7ed-7adb-02acccaa0701@gmail.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-27 10:40:25 -04:00
Paolo Bonzini
5e105c88ab KVM: nVMX: check for invalid hdr.vmx.flags
hdr.vmx.flags is meant for future extensions to the ABI, rejecting
invalid flags is necessary to avoid broken half-loads of the
nVMX state.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-27 09:04:50 -04:00
Paolo Bonzini
0f02bd0ade KVM: nVMX: check for required but missing VMCS12 in KVM_SET_NESTED_STATE
A missing VMCS12 was not causing -EINVAL (it was just read with
copy_from_user, so it is not a security issue, but it is still
wrong).  Test for VMCS12 validity and reject the nested state
if a VMCS12 is required but not present.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-27 09:04:49 -04:00
Thomas Gleixner
72c3c0fe54 x86/kvm: Use generic xfer to guest work function
Use the generic infrastructure to check for and handle pending work before
transitioning into guest mode.

This now handles TIF_NOTIFY_RESUME as well which was ignored so
far. Handling it is important as this covers task work and task work will
be used to offload the heavy lifting of POSIX CPU timers to thread context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200722220520.979724969@linutronix.de
2020-07-24 15:05:01 +02:00
Kees Cook
3f649ab728 treewide: Remove uninitialized_var() usage
Using uninitialized_var() is dangerous as it papers over real bugs[1]
(or can in the future), and suppresses unrelated compiler warnings
(e.g. "unused variable"). If the compiler thinks it is uninitialized,
either simply initialize the variable or make compiler changes.

In preparation for removing[2] the[3] macro[4], remove all remaining
needless uses with the following script:

git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \
	xargs perl -pi -e \
		's/\buninitialized_var\(([^\)]+)\)/\1/g;
		 s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;'

drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid
pathological white-space.

No outstanding warnings were found building allmodconfig with GCC 9.3.0
for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64,
alpha, and m68k.

[1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/
[2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/
[3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/
[4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/

Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5
Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB
Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers
Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-07-16 12:35:15 -07:00
Paolo Bonzini
e8af9e9f45 KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
The "if" that drops the present bit from the page structure fauls makes no sense.
It was added by yours truly in order to be bug-compatible with pre-existing code
and in order to make the tests pass; however, the tests are wrong.  The behavior
after this patch matches bare metal.

Reported-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 17:01:54 -04:00
Mohammed Gamal
3edd68399d KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
This patch adds a new capability KVM_CAP_SMALLER_MAXPHYADDR which
allows userspace to query if the underlying architecture would
support GUEST_MAXPHYADDR < HOST_MAXPHYADDR and hence act accordingly
(e.g. qemu can decide if it should warn for -cpu ..,phys-bits=X)

The complications in this patch are due to unexpected (but documented)
behaviour we see with NPF vmexit handling in AMD processor.  If
SVM is modified to add guest physical address checks in the NPF
and guest #PF paths, we see the followning error multiple times in
the 'access' test in kvm-unit-tests:

            test pte.p pte.36 pde.p: FAIL: pte 2000021 expected 2000001
            Dump mapping: address: 0x123400000000
            ------L4: 24c3027
            ------L3: 24c4027
            ------L2: 24c5021
            ------L1: 1002000021

This is because the PTE's accessed bit is set by the CPU hardware before
the NPF vmexit. This is handled completely by hardware and cannot be fixed
in software.

Therefore, availability of the new capability depends on a boolean variable
allow_smaller_maxphyaddr which is set individually by VMX and SVM init
routines. On VMX it's always set to true, on SVM it's only set to true
when NPT is not enabled.

CC: Tom Lendacky <thomas.lendacky@amd.com>
CC: Babu Moger <babu.moger@amd.com>
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Message-Id: <20200710154811.418214-10-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 17:01:53 -04:00
Paolo Bonzini
8c4182bd27 KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
Ignore non-present page faults, since those cannot have reserved
bits set.

When running access.flat with "-cpu Haswell,phys-bits=36", the
number of trapped page faults goes down from 8872644 to 3978948.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200710154811.418214-9-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 17:01:52 -04:00
Mohammed Gamal
1dbf5d68af KVM: VMX: Add guest physical address check in EPT violation and misconfig
Check guest physical address against its maximum, which depends on the
guest MAXPHYADDR. If the guest's physical address exceeds the
maximum (i.e. has reserved bits set), inject a guest page fault with
PFERR_RSVD_MASK set.

This has to be done both in the EPT violation and page fault paths, as
there are complications in both cases with respect to the computation
of the correct error code.

For EPT violations, unfortunately the only possibility is to emulate,
because the access type in the exit qualification might refer to an
access to a paging structure, rather than to the access performed by
the program.

Trapping page faults instead is needed in order to correct the error code,
but the access type can be obtained from the original error code and
passed to gva_to_gpa.  The corrections required in the error code are
subtle. For example, imagine that a PTE for a supervisor page has a reserved
bit set.  On a supervisor-mode access, the EPT violation path would trigger.
However, on a user-mode access, the processor will not notice the reserved
bit and not include PFERR_RSVD_MASK in the error code.

Co-developed-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200710154811.418214-8-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 17:01:52 -04:00
Paolo Bonzini
a0c134347b KVM: VMX: introduce vmx_need_pf_intercept
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200710154811.418214-7-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 17:01:52 -04:00
Paolo Bonzini
32de2b5ee3 KVM: x86: update exception bitmap on CPUID changes
Allow vendor code to observe changes to MAXPHYADDR and start/stop
intercepting page faults.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 17:01:29 -04:00
Paolo Bonzini
6986982fef KVM: x86: rename update_bp_intercept to update_exception_bitmap
We would like to introduce a callback to update the #PF intercept
when CPUID changes.  Just reuse update_bp_intercept since VMX is
already using update_exception_bitmap instead of a bespoke function.

While at it, remove an unnecessary assignment in the SVM version,
which is already done in the caller (kvm_arch_vcpu_ioctl_set_guest_debug)
and has nothing to do with the exception bitmap.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 13:20:18 -04:00
Mohammed Gamal
ec7771ab47 KVM: x86: mmu: Add guest physical address check in translate_gpa()
Intel processors of various generations have supported 36, 39, 46 or 52
bits for physical addresses.  Until IceLake introduced MAXPHYADDR==52,
running on a machine with higher MAXPHYADDR than the guest more or less
worked, because software that relied on reserved address bits (like KVM)
generally used bit 51 as a marker and therefore the page faults where
generated anyway.

Unfortunately this is not true anymore if the host MAXPHYADDR is 52,
and this can cause problems when migrating from a MAXPHYADDR<52
machine to one with MAXPHYADDR==52.  Typically, the latter are machines
that support 5-level page tables, so they can be identified easily from
the LA57 CPUID bit.

When that happens, the guest might have a physical address with reserved
bits set, but the host won't see that and trap it.  Hence, we need
to check page faults' physical addresses against the guest's maximum
physical memory and if it's exceeded, we need to add the PFERR_RSVD_MASK
bits to the page fault error code.

This patch does this for the MMU's page walks.  The next patches will
ensure that the correct exception and error code is produced whenever
no host-reserved bits are set in page table entries.

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200710154811.418214-4-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 13:09:59 -04:00
Mohammed Gamal
cd313569f5 KVM: x86: mmu: Move translate_gpa() to mmu.c
Also no point of it being inline since it's always called through
function pointers. So remove that.

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200710154811.418214-3-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 13:09:40 -04:00
Mohammed Gamal
897861479c KVM: x86: Add helper functions for illegal GPA checking and page fault injection
This patch adds two helper functions that will be used to support virtualizing
MAXPHYADDR in both kvm-intel.ko and kvm.ko.

kvm_fixup_and_inject_pf_error() injects a page fault for a user-specified GVA,
while kvm_mmu_is_illegal_gpa() checks whether a GPA exceeds vCPU address limits.

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200710154811.418214-2-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 13:07:28 -04:00
Vitaly Kuznetsov
fe9304d318 KVM: x86: drop superfluous mmu_check_root() from fast_pgd_switch()
The mmu_check_root() check in fast_pgd_switch() seems to be
superfluous: when GPA is outside of the visible range
cached_root_available() will fail for non-direct roots
(as we can't have a matching one on the list) and we don't
seem to care for direct ones.

Also, raising #TF immediately when a non-existent GFN is written to CR3
doesn't seem to mach architectural behavior. Drop the check.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710141157.1640173-10-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 13:03:12 -04:00
Vitaly Kuznetsov
d82aaef9c8 KVM: nSVM: use nested_svm_load_cr3() on guest->host switch
Make nSVM code resemble nVMX where nested_vmx_load_cr3() is used on
both guest->host and host->guest transitions. Also, we can now
eliminate unconditional kvm_mmu_reset_context() and speed things up.

Note, nVMX has two different paths: load_vmcs12_host_state() and
nested_vmx_restore_host_state() and the later is used to restore from
'partial' switch to L2, it always uses kvm_mmu_reset_context().
nSVM doesn't have this yet. Also, nested_svm_vmexit()'s return value
is almost always ignored nowadays.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710141157.1640173-9-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 12:59:39 -04:00
Vitaly Kuznetsov
a506fdd223 KVM: nSVM: implement nested_svm_load_cr3() and use it for host->guest switch
Undesired triple fault gets injected to L1 guest on SVM when L2 is
launched with certain CR3 values. #TF is raised by mmu_check_root()
check in fast_pgd_switch() and the root cause is that when
kvm_set_cr3() is called from nested_prepare_vmcb_save() with NPT
enabled CR3 points to a nGPA so we can't check it with
kvm_is_visible_gfn().

Using generic kvm_set_cr3() when switching to nested guest is not
a great idea as we'll have to distinguish between 'real' CR3s and
'nested' CR3s to e.g. not call kvm_mmu_new_pgd() with nGPA. Following
nVMX implement nested-specific nested_svm_load_cr3() doing the job.

To support the change, nested_svm_load_cr3() needs to be re-ordered
with nested_svm_init_mmu_context().

Note: the current implementation is sub-optimal as we always do TLB
flush/MMU sync but this is still an improvement as we at least stop doing
kvm_mmu_reset_context().

Fixes: 7c390d350f ("kvm: x86: Add fast CR3 switch code path")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710141157.1640173-8-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 12:57:37 -04:00
Vitaly Kuznetsov
bf7dea4253 KVM: nSVM: move kvm_set_cr3() after nested_svm_uninit_mmu_context()
kvm_mmu_new_pgd() refers to arch.mmu and at this point it still references
arch.guest_mmu while arch.root_mmu is expected.

Note, the change is effectively a nop: when !npt_enabled,
nested_svm_uninit_mmu_context() does nothing (as we don't do
nested_svm_init_mmu_context()) and with npt_enabled we don't
do kvm_set_cr3().  However, it will matter when we move the
call to kvm_mmu_new_pgd into nested_svm_load_cr3().

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710141157.1640173-7-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 12:56:30 -04:00
Vitaly Kuznetsov
62156f6cd1 KVM: nSVM: introduce nested_svm_load_cr3()/nested_npt_enabled()
As a preparatory change for implementing nSVM-specific PGD switch
(following nVMX' nested_vmx_load_cr3()), introduce nested_svm_load_cr3()
instead of relying on kvm_set_cr3().

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710141157.1640173-6-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 12:55:36 -04:00
Vitaly Kuznetsov
59cd9bc5b0 KVM: nSVM: prepare to handle errors from enter_svm_guest_mode()
Some operations in enter_svm_guest_mode() may fail, e.g. currently
we suppress kvm_set_cr3() return value. Prepare the code to proparate
errors.

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710141157.1640173-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 12:55:13 -04:00
Vitaly Kuznetsov
ebdb3dba7b KVM: nSVM: reset nested_run_pending upon nested_svm_vmrun_msrpm() failure
WARN_ON_ONCE(svm->nested.nested_run_pending) in nested_svm_vmexit()
will fire if nested_run_pending remains '1' but it doesn't really
need to, we are already failing and not going to run nested guest.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710141157.1640173-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 12:54:25 -04:00
Paolo Bonzini
8c008659aa KVM: MMU: stop dereferencing vcpu->arch.mmu to get the context for MMU init
kvm_init_shadow_mmu() was actually the only function that could be called
with different vcpu->arch.mmu values.  Now that kvm_init_shadow_npt_mmu()
is separated from kvm_init_shadow_mmu(), we always know the MMU context
we need to use and there is no need to dereference vcpu->arch.mmu pointer.

Based on a patch by Vitaly Kuznetsov <vkuznets@redhat.com>.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710141157.1640173-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 12:53:58 -04:00
Vitaly Kuznetsov
0f04a2ac4f KVM: nSVM: split kvm_init_shadow_npt_mmu() from kvm_init_shadow_mmu()
As a preparatory change for moving kvm_mmu_new_pgd() from
nested_prepare_vmcb_save() to nested_svm_init_mmu_context() split
kvm_init_shadow_npt_mmu() from kvm_init_shadow_mmu(). This also makes
the code look more like nVMX (kvm_init_shadow_ept_mmu()).

No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710141157.1640173-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 12:53:27 -04:00
Vitaly Kuznetsov
d574c539c3 KVM: x86: move MSR_IA32_PERF_CAPABILITIES emulation to common x86 code
state_test/smm_test selftests are failing on AMD with:
"Unexpected result from KVM_GET_MSRS, r: 51 (failed MSR was 0x345)"

MSR_IA32_PERF_CAPABILITIES is an emulated MSR on Intel but it is not
known to AMD code, we can move the emulation to common x86 code. For
AMD, we basically just allow the host to read and write zero to the MSR.

Fixes: 27461da310 ("KVM: x86/pmu: Support full width counting")
Suggested-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200710152559.1645827-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 12:52:56 -04:00
Paolo Bonzini
83d31e5271 KVM: nVMX: fixes for preemption timer migration
Commit 850448f35a ("KVM: nVMX: Fix VMX preemption timer migration",
2020-06-01) accidentally broke nVMX live migration from older version
by changing the userspace ABI.  Restore it and, while at it, ensure
that vmx->nested.has_preemption_timer_deadline is always initialized
according to the KVM_STATE_VMX_PREEMPTION_TIMER_DEADLINE flag.

Cc: Makarand Sonare <makarandsonare@google.com>
Fixes: 850448f35a ("KVM: nVMX: Fix VMX preemption timer migration")
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10 06:15:36 -04:00
Sean Christopherson
6926f95acc KVM: Move x86's MMU memory cache helpers to common KVM code
Move x86's memory cache helpers to common KVM code so that they can be
reused by arm64 and MIPS in future patches.

Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-16-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:42 -04:00
Sean Christopherson
94ce87ef81 KVM: x86/mmu: Prepend "kvm_" to memory cache helpers that will be global
Rename the memory helpers that will soon be moved to common code and be
made globaly available via linux/kvm_host.h.  "mmu" alone is not a
sufficient namespace for globally available KVM symbols.

Opportunistically add "nr_" in mmu_memory_cache_free_objects() to make
it clear the function returns the number of free objects, as opposed to
freeing existing objects.

Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-14-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:41 -04:00
Sean Christopherson
378f5cd64a KVM: x86/mmu: Skip filling the gfn cache for guaranteed direct MMU topups
Don't bother filling the gfn array cache when the caller is a fully
direct MMU, i.e. won't need a gfn array for shadow pages.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-13-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:41 -04:00
Sean Christopherson
9688088378 KVM: x86/mmu: Zero allocate shadow pages (outside of mmu_lock)
Set __GFP_ZERO for the shadow page memory cache and drop the explicit
clear_page() from kvm_mmu_get_page().  This moves the cost of zeroing a
page to the allocation time of the physical page, i.e. when topping up
the memory caches, and thus avoids having to zero out an entire page
while holding mmu_lock.

Cc: Peter Feiner <pfeiner@google.com>
Cc: Peter Shier <pshier@google.com>
Cc: Junaid Shahid <junaids@google.com>
Cc: Jim Mattson <jmattson@google.com>
Suggested-by: Ben Gardon <bgardon@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-12-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:40 -04:00
Sean Christopherson
5f6078f9f1 KVM: x86/mmu: Make __GFP_ZERO a property of the memory cache
Add a gfp_zero flag to 'struct kvm_mmu_memory_cache' and use it to
control __GFP_ZERO instead of hardcoding a call to kmem_cache_zalloc().
A future patch needs such a flag for the __get_free_page() path, as
gfn arrays do not need/want the allocator to zero the memory.  Convert
the kmem_cache paths to __GFP_ZERO now so as to avoid a weird and
inconsistent API in the future.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-11-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:40 -04:00
Sean Christopherson
171a90d70f KVM: x86/mmu: Separate the memory caches for shadow pages and gfn arrays
Use separate caches for allocating shadow pages versus gfn arrays.  This
sets the stage for specifying __GFP_ZERO when allocating shadow pages
without incurring extra cost for gfn arrays.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-10-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:40 -04:00
Sean Christopherson
531281ad98 KVM: x86/mmu: Clean up the gorilla math in mmu_topup_memory_caches()
Clean up the minimums in mmu_topup_memory_caches() to document the
driving mechanisms behind the minimums.  Now that encountering an empty
cache is unlikely to trigger BUG_ON(), it is less dangerous to be more
precise when defining the minimums.

For rmaps, the logic is 1 parent PTE per level, plus a single rmap, and
prefetched rmaps.  The extra objects in the current '8 + PREFETCH'
minimum came about due to an abundance of paranoia in commit
c41ef344de ("KVM: MMU: increase per-vcpu rmap cache alloc size"),
i.e. it could have increased the minimum to 2 rmaps.  Furthermore, the
unexpected extra rmap case was killed off entirely by commits
f759e2b4c7 ("KVM: MMU: avoid pte_list_desc running out in
kvm_mmu_pte_write") and f5a1e9f895 ("KVM: MMU: remove call to
kvm_mmu_pte_write from walk_addr").

For the so called page cache, replace '8' with 2*PT64_ROOT_MAX_LEVEL.
The 2x multiplier is needed because the cache is used for both shadow
pages and gfn arrays for indirect MMUs.

And finally, for page headers, replace '4' with PT64_ROOT_MAX_LEVEL.

Note, KVM now supports 5-level paging, i.e. the old minimums that used a
baseline derived from 4-level paging were technically wrong.  But, KVM
always allocates roots in a separate flow, e.g. it's impossible in the
current implementation to actually need 5 new shadow pages in a single
flow.  Use PT64_ROOT_MAX_LEVEL unmodified instead of subtracting 1, as
the direct usage is likely more intuitive to uninformed readers, and the
inflated minimum is unlikely to affect functionality in practice.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-9-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:39 -04:00
Sean Christopherson
f3747a5a9e KVM: x86/mmu: Topup memory caches after walking GVA->GPA
Topup memory caches after walking the GVA->GPA translation during a
shadow page fault, there is no need to ensure the caches are full when
walking the GVA.  As of commit f5a1e9f895 ("KVM: MMU: remove call
to kvm_mmu_pte_write from walk_addr"), the FNAME(walk_addr) flow no
longer add rmaps via kvm_mmu_pte_write().

This avoids allocating memory in the case that the GVA is unmapped in
the guest, and also provides a paper trail of why/when the memory caches
need to be filled.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-8-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:39 -04:00
Sean Christopherson
832914452a KVM: x86/mmu: Move fast_page_fault() call above mmu_topup_memory_caches()
Avoid refilling the memory caches and potentially slow reclaim/swap when
handling a fast page fault, which does not need to allocate any new
objects.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:39 -04:00
Sean Christopherson
53a3f48771 KVM: x86/mmu: Try to avoid crashing KVM if a MMU memory cache is empty
Attempt to allocate a new object instead of crashing KVM (and likely the
kernel) if a memory cache is unexpectedly empty.  Use GFP_ATOMIC for the
allocation as the caches are used while holding mmu_lock.  The immediate
BUG_ON() makes the code unnecessarily explosive and led to confusing
minimums being used in the past, e.g. allocating 4 objects where 1 would
suffice.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:38 -04:00
Sean Christopherson
284aa86868 KVM: x86/mmu: Remove superfluous gotos from mmu_topup_memory_caches()
Return errors directly from mmu_topup_memory_caches() instead of
branching to a label that does the same.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:38 -04:00
Sean Christopherson
356ec69adf KVM: x86/mmu: Use consistent "mc" name for kvm_mmu_memory_cache locals
Use "mc" for local variables to shorten line lengths and provide
consistent names, which will be especially helpful when some of the
helpers are moved to common KVM code in future patches.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:37 -04:00
Sean Christopherson
45177cccd9 KVM: x86/mmu: Consolidate "page" variant of memory cache helpers
Drop the "page" variants of the topup/free memory cache helpers, using
the existence of an associated kmem_cache to select the correct alloc
or free routine.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:37 -04:00
Sean Christopherson
5962bfb748 KVM: x86/mmu: Track the associated kmem_cache in the MMU caches
Track the kmem_cache used for non-page KVM MMU memory caches instead of
passing in the associated kmem_cache when filling the cache.  This will
allow consolidating code and other cleanups.

No functional change intended.

Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 13:29:37 -04:00
Thomas Gleixner
2245d39886 x86/kvm/vmx: Use native read/write_cr2()
read/write_cr2() go throuh the paravirt XXL indirection, but nested VMX in
a XEN_PV guest is not supported.

Use the native variants.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Message-Id: <20200708195322.344731916@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:42 -04:00
Thomas Gleixner
c3f08ed150 x86/kvm/svm: Use uninstrumented wrmsrl() to restore GS
On guest exit MSR_GS_BASE contains whatever the guest wrote to it and the
first action after returning from the ASM code is to set it to the host
kernel value. This uses wrmsrl() which is interesting at least.

wrmsrl() is either using native_write_msr() or the paravirt variant. The
XEN_PV code is uninteresting as nested SVM in a XEN_PV guest does not work.

But native_write_msr() can be placed out of line by the compiler especially
when paravirtualization is enabled in the kernel configuration. The
function is marked notrace, but still can be probed if
CONFIG_KPROBE_EVENTS_ON_NOTRACE is enabled.

That would be a fatal problem as kprobe events use per-CPU variables which
are GS based and would be accessed with the guest GS. Depending on the GS
value this would either explode in colorful ways or lead to completely
undebugable data corruption.

Aside of that native_write_msr() contains a tracepoint which objtool
complains about as it is invoked from the noinstr section.

As this cannot run inside a XEN_PV guest there is no point in using
wrmsrl(). Use native_wrmsrl() instead which is just a plain native WRMSR
without tracing or anything else attached.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Juergen Gross <jgross@suse.com>
Message-Id: <20200708195322.244847377@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:41 -04:00
Thomas Gleixner
135961e0a7 x86/kvm/svm: Move guest enter/exit into .noinstr.text
Move the functions which are inside the RCU off region into the
non-instrumentable text section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Message-Id: <20200708195322.144607767@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:41 -04:00
Thomas Gleixner
3ebccdf373 x86/kvm/vmx: Move guest enter/exit into .noinstr.text
Move the functions which are inside the RCU off region into the
non-instrumentable text section.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Message-Id: <20200708195322.037311579@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:40 -04:00
Thomas Gleixner
9fc975e9ef x86/kvm/svm: Add hardirq tracing on guest enter/exit
Entering guest mode is more or less the same as returning to user
space. From an instrumentation point of view both leave kernel mode and the
transition to guest or user mode reenables interrupts on the host. In user
mode an interrupt is served directly and in guest mode it causes a VM exit
which then handles or reinjects the interrupt.

The transition from guest mode or user mode to kernel mode disables
interrupts, which needs to be recorded in instrumentation to set the
correct state again.

This is important for e.g. latency analysis because otherwise the execution
time in guest or user mode would be wrongly accounted as interrupt disabled
and could trigger false positives.

Add hardirq tracing to guest enter/exit functions in the same way as it
is done in the user mode enter/exit code, respecting the RCU requirements.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Message-Id: <20200708195321.934715094@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:39 -04:00
Thomas Gleixner
0642391e21 x86/kvm/vmx: Add hardirq tracing to guest enter/exit
Entering guest mode is more or less the same as returning to user
space. From an instrumentation point of view both leave kernel mode and the
transition to guest or user mode reenables interrupts on the host. In user
mode an interrupt is served directly and in guest mode it causes a VM exit
which then handles or reinjects the interrupt.

The transition from guest mode or user mode to kernel mode disables
interrupts, which needs to be recorded in instrumentation to set the
correct state again.

This is important for e.g. latency analysis because otherwise the execution
time in guest or user mode would be wrongly accounted as interrupt disabled
and could trigger false positives.

Add hardirq tracing to guest enter/exit functions in the same way as it
is done in the user mode enter/exit code, respecting the RCU requirements.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Message-Id: <20200708195321.822002354@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:39 -04:00
Thomas Gleixner
87fa7f3e98 x86/kvm: Move context tracking where it belongs
Context tracking for KVM happens way too early in the vcpu_run()
code. Anything after guest_enter_irqoff() and before guest_exit_irqoff()
cannot use RCU and should also be not instrumented.

The current way of doing this covers way too much code. Move it closer to
the actual vmenter/exit code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20200708195321.724574345@linutronix.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:38 -04:00
Maxim Levitsky
841c2be09f kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host
To avoid complex and in some cases incorrect logic in
kvm_spec_ctrl_test_value, just try the guest's given value on the host
processor instead, and if it doesn't #GP, allow the guest to set it.

One such case is when host CPU supports STIBP mitigation
but doesn't support IBRS (as is the case with some Zen2 AMD cpus),
and in this case we were giving guest #GP when it tried to use STIBP

The reason why can can do the host test is that IA32_SPEC_CTRL msr is
passed to the guest, after the guest sets it to a non zero value
for the first time (due to performance reasons),
and as as result of this, it is pointless to emulate #GP condition on
this first access, in a different way than what the host CPU does.

This is based on a patch from Sean Christopherson, who suggested this idea.

Fixes: 6441fa6178 ("KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL")
Cc: stable@vger.kernel.org
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200708115731.180097-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:38 -04:00
Like Xu
632a4cf57f KVM/x86: pmu: Fix #GP condition check for RDPMC emulation
In guest protected mode, if the current privilege level
is not 0 and the PCE flag in the CR4 register is cleared,
we will inject a #GP for RDPMC usage.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20200708074409.39028-1-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:37 -04:00
Vitaly Kuznetsov
995decb6c4 KVM: x86: take as_id into account when checking PGD
OVMF booted guest running on shadow pages crashes on TRIPLE FAULT after
enabling paging from SMM. The crash is triggered from mmu_check_root() and
is caused by kvm_is_visible_gfn() searching through memslots with as_id = 0
while vCPU may be in a different context (address space).

Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200708140023.1476020-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:37 -04:00
Xiaoyao Li
5668821aef KVM: x86: Move kvm_x86_ops.vcpu_after_set_cpuid() into kvm_vcpu_after_set_cpuid()
kvm_x86_ops.vcpu_after_set_cpuid() is used to update vmx/svm specific
vcpu settings based on updated CPUID settings. So it's supposed to be
called after CPUIDs are updated, i.e., kvm_update_cpuid_runtime().

Currently, kvm_update_cpuid_runtime() only updates CPUID bits of OSXSAVE,
APIC, OSPKE, MWAIT, KVM_FEATURE_PV_UNHALT and CPUID(0xD,0).ebx and
CPUID(0xD, 1).ebx. None of them is consumed by vmx/svm's
update_vcpu_after_set_cpuid(). So there is no dependency between them.

Move kvm_x86_ops.vcpu_after_set_cpuid() into kvm_vcpu_after_set_cpuid() is
obviously more reasonable.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200709043426.92712-6-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:25 -04:00
Xiaoyao Li
7c1b761be0 KVM: x86: Rename cpuid_update() callback to vcpu_after_set_cpuid()
The name of callback cpuid_update() is misleading that it's not about
updating CPUID settings of vcpu but updating the configurations of vcpu
based on the CPUIDs. So rename it to vcpu_after_set_cpuid().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200709043426.92712-5-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:18 -04:00
Xiaoyao Li
346ce3591d KVM: x86: Rename kvm_update_cpuid() to kvm_vcpu_after_set_cpuid()
Now there is no updating CPUID bits behavior in kvm_update_cpuid(),
rename it to kvm_vcpu_after_set_cpuid().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200709043426.92712-4-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 07:08:08 -04:00
Xiaoyao Li
aedbaf4f6a KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()
Beside called in kvm_vcpu_ioctl_set_cpuid*(), kvm_update_cpuid() is also
called 5 places else in x86.c and 1 place else in lapic.c. All those 6
places only need the part of updating guest CPUIDs (OSXSAVE, OSPKE, APIC,
KVM_FEATURE_PV_UNHALT, ...) based on the runtime vcpu state, so extract
them as a separate kvm_update_cpuid_runtime().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200709043426.92712-3-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 06:53:49 -04:00
Xiaoyao Li
a76733a987 KVM: x86: Introduce kvm_check_cpuid()
Use kvm_check_cpuid() to validate if userspace provides legal cpuid
settings and call it before KVM take any action to update CPUID or
update vcpu states based on given CPUID settings.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200709043426.92712-2-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 06:49:14 -04:00
Xiaoyao Li
36f37648ca KVM: X86: Move kvm_apic_set_version() to kvm_update_cpuid()
There is no dependencies between kvm_apic_set_version() and
kvm_update_cpuid() because kvm_apic_set_version() queries X2APIC CPUID bit,
which is not touched/changed by kvm_update_cpuid().

Obviously, kvm_apic_set_version() belongs to the category of updating
vcpu model.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200708065054.19713-9-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-09 06:48:09 -04:00
Xiaoyao Li
565b782073 KVM: lapic: Use guest_cpuid_has() in kvm_apic_set_version()
Only code cleanup and no functional change.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200708065054.19713-8-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:22:01 -04:00
Xiaoyao Li
0d3b2ba16b KVM: X86: Go on updating other CPUID leaves when leaf 1 is absent
As handling of bits out of leaf 1 added over time, kvm_update_cpuid()
should not return directly if leaf 1 is absent, but should go on
updateing other CPUID leaves.

Keep the update of apic->lapic_timer.timer_mode_mask in a separate
wrapper, to minimize churn for code since it will be moved out of this
function in a future patch.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200708065054.19713-3-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:22:00 -04:00
Xiaoyao Li
1896409282 KVM: X86: Reset vcpu->arch.cpuid_nent to 0 if SET_CPUID* fails
Current implementation keeps userspace input of CPUID configuration and
cpuid->nent even if kvm_update_cpuid() fails. Reset vcpu->arch.cpuid_nent
to 0 for the case of failure as a simple fix.

Besides, update the doc to explicitly state that if IOCTL SET_CPUID*
fail KVM gives no gurantee that previous valid CPUID configuration is
kept.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200708065054.19713-2-xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:22:00 -04:00
Like Xu
2e8cd7a3b8 kvm: x86: limit the maximum number of vPMU fixed counters to 3
Some new Intel platforms (such as TGL) already have the
fourth fixed counter TOPDOWN.SLOTS, but it has not been
fully enabled on KVM and the host.

Therefore, we limit edx.split.num_counters_fixed to 3,
so that it does not break the kvm-unit-tests PMU test
case and bad-handled userspace.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20200624015928.118614-1-like.xu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:59 -04:00
Krish Sadhukhan
761e416934 KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests
According to section "Canonicalization and Consistency Checks" in APM vol. 2
the following guest state is illegal:

    "Any MBZ bit of CR3 is set."
    "Any MBZ bit of CR4 is set."

Suggeted-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <1594168797-29444-3-git-send-email-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:59 -04:00
Paolo Bonzini
53efe527ca KVM: x86: Make CR4.VMXE reserved for the guest
CR4.VMXE is reserved unless the VMX CPUID bit is set.  On Intel,
it is also tested by vmx_set_cr4, but AMD relies on kvm_valid_cr4,
so fix it.

Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:58 -04:00
Krish Sadhukhan
b899c13277 KVM: x86: Create mask for guest CR4 reserved bits in kvm_update_cpuid()
Instead of creating the mask for guest CR4 reserved bits in kvm_valid_cr4(),
do it in kvm_update_cpuid() so that it can be reused instead of creating it
each time kvm_valid_cr4() is called.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <1594168797-29444-2-git-send-email-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:58 -04:00
Jim Mattson
d42e3fae6f kvm: x86: Read PDPTEs on CR0.CD and CR0.NW changes
According to the SDM, when PAE paging would be in use following a
MOV-to-CR0 that modifies any of CR0.CD, CR0.NW, or CR0.PG, then the
PDPTEs are loaded from the address in CR3. Previously, kvm only loaded
the PDPTEs when PAE paging would be in use following a MOV-to-CR0 that
modified CR0.PG.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200707223630.336700-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:58 -04:00
Sean Christopherson
e47c4aee5b KVM: x86/mmu: Rename page_header() to to_shadow_page()
Rename KVM's accessor for retrieving a 'struct kvm_mmu_page' from the
associated host physical address to better convey what the function is
doing.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622202034.15093-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:55 -04:00
Sean Christopherson
573546820b KVM: x86/mmu: Add sptep_to_sp() helper to wrap shadow page lookup
Introduce sptep_to_sp() to reduce the boilerplate code needed to get the
shadow page associated with a spte pointer, and to improve readability
as it's not immediately obvious that "page_header" is a KVM-specific
accessor for retrieving a shadow page.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622202034.15093-6-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:55 -04:00
Sean Christopherson
985ab27801 KVM: x86/mmu: Make kvm_mmu_page definition and accessor internal-only
Make 'struct kvm_mmu_page' MMU-only, nothing outside of the MMU should
be poking into the gory details of shadow pages.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622202034.15093-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:54 -04:00
Sean Christopherson
6ca9a6f3ad KVM: x86/mmu: Add MMU-internal header
Add mmu/mmu_internal.h to hold declarations and definitions that need
to be shared between various mmu/ files, but should not be used by
anything outside of the MMU.

Begin populating mmu_internal.h with declarations of the helpers used by
page_track.c.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622202034.15093-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:54 -04:00
Sean Christopherson
afe8d7e611 KVM: x86/mmu: Move kvm_mmu_available_pages() into mmu.c
Move kvm_mmu_available_pages() from mmu.h to mmu.c, it has a single
caller and has no business being exposed via mmu.h.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622202034.15093-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:53 -04:00
Sean Christopherson
33e3042dac KVM: x86/mmu: Move mmu_audit.c and mmutrace.h into the mmu/ sub-directory
Move mmu_audit.c and mmutrace.h under mmu/ where they belong.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622202034.15093-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:53 -04:00
Sean Christopherson
7bd7ded642 KVM: x86/mmu: Exit to userspace on make_mmu_pages_available() error
Propagate any error returned by make_mmu_pages_available() out to
userspace instead of resuming the guest if the error occurs while
handling a page fault.  Now that zapping the oldest MMU pages skips
active roots, i.e. fails if and only if there are no zappable pages,
there is no chance for a false positive, i.e. no chance of returning a
spurious error to userspace.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200623193542.7554-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:52 -04:00
Sean Christopherson
ebdb292dac KVM: x86/mmu: Batch zap MMU pages when shrinking the slab
Use the recently introduced kvm_mmu_zap_oldest_mmu_pages() to batch zap
MMU pages when shrinking a slab.  This fixes a long standing issue where
KVM's shrinker implementation is completely ineffective due to zapping
only a single page.  E.g. without batch zapping, forcing a scan via
drop_caches basically has no impact on a VM with ~2k shadow pages.  With
batch zapping, the number of shadow pages can be reduced to a few
hundred pages in one or two runs of drop_caches.

Note, if the default batch size (currently 128) is problematic, e.g.
zapping 128 pages holds mmu_lock for too long, KVM can bound the batch
size by setting @batch in mmu_shrinker.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200623193542.7554-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:52 -04:00
Sean Christopherson
6b82ef2c9c KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages
Collect MMU pages for zapping in a loop when making MMU pages available,
and skip over active roots when doing so as zapping an active root can
never immediately free up a page.  Batching the zapping avoids multiple
remote TLB flushes and remedies the issue where the loop would bail
early if an active root was encountered.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200623193542.7554-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:51 -04:00
Sean Christopherson
f95eec9bed KVM: x86/mmu: Don't put invalid SPs back on the list of active pages
Delete a shadow page from the invalidation list instead of throwing it
back on the list of active pages when it's a root shadow page with
active users.  Invalid active root pages will be explicitly freed by
mmu_free_root_page() when the root_count hits zero, i.e. they don't need
to be put on the active list to avoid leakage.

Use sp->role.invalid to detect that a shadow page has already been
zapped, i.e. is not on a list.

WARN if an invalid page is encountered when zapping pages, as it should
now be impossible.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200623193542.7554-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:51 -04:00
Sean Christopherson
fb58a9c345 KVM: x86/mmu: Optimize MMU page cache lookup for fully direct MMUs
Skip the unsync checks and the write flooding clearing for fully direct
MMUs, which are guaranteed to not have unsync'd or indirect pages (write
flooding detection only applies to indirect pages).  For TDP, this
avoids unnecessary memory reads and writes, and for the write flooding
count will also avoid dirtying a cache line (unsync_child_bitmap itself
consumes a cache line, i.e. write_flooding_count is guaranteed to be in
a different cache line than parent_ptes).

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200623194027.23135-3-sean.j.christopherson@intel.com>
Reviewed-By: Jon Cargille <jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:50 -04:00
Sean Christopherson
ac101b7cb1 KVM: x86/mmu: Avoid multiple hash lookups in kvm_get_mmu_page()
Refactor for_each_valid_sp() to take the list of shadow pages instead of
retrieving it from a gfn to avoid doing the gfn->list hash and lookup
multiple times during kvm_get_mmu_page().

Cc: Peter Feiner <pfeiner@google.com>
Cc: Jon Cargille <jcargill@google.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200623194027.23135-2-sean.j.christopherson@intel.com>
Reviewed-By: Jon Cargille <jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:50 -04:00
Joerg Roedel
01c3b2b5cd KVM: SVM: Rename svm_nested_virtualize_tpr() to nested_svm_virtualize_tpr()
Match the naming with other nested svm functions.

No functional changes.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Message-Id: <20200625080325.28439-5-joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:49 -04:00
Joerg Roedel
a284ba56a0 KVM: SVM: Add svm_ prefix to set/clr/is_intercept()
Make clear the symbols belong to the SVM code when they are built-in.

No functional changes.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Message-Id: <20200625080325.28439-4-joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:48 -04:00
Joerg Roedel
06e7852c0f KVM: SVM: Add vmcb_ prefix to mark_*() functions
Make it more clear what data structure these functions operate on.

No functional changes.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Message-Id: <20200625080325.28439-3-joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:48 -04:00
Joerg Roedel
7693b3eb53 KVM: SVM: Rename struct nested_state to svm_nested_state
Renaming is only needed in the svm.h header file.

No functional changes.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Message-Id: <20200625080325.28439-2-joro@8bytes.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:47 -04:00
Sean Christopherson
b2656e4d8b KVM: nVMX: Wrap VM-Fail valid path in generic VM-Fail helper
Add nested_vmx_fail() to wrap VM-Fail paths that _may_ result in VM-Fail
Valid to make it clear at the call sites that the Valid flavor isn't
guaranteed.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200609015607.6994-1-sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:47 -04:00
Jim Mattson
c967118ddb kvm: x86: Set last_vmentry_cpu in vcpu_enter_guest
Since this field is now in kvm_vcpu_arch, clean things up a little by
setting it in vendor-agnostic code: vcpu_enter_guest. Note that it
must be set after the call to kvm_x86_ops.run(), since it can't be
updated before pre_sev_run().

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-7-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:46 -04:00
Jim Mattson
8a14fe4f0c kvm: x86: Move last_cpu into kvm_vcpu_arch as last_vmentry_cpu
Both the vcpu_vmx structure and the vcpu_svm structure have a
'last_cpu' field. Move the common field into the kvm_vcpu_arch
structure. For clarity, rename it to 'last_vmentry_cpu.'

Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-6-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:45 -04:00
Jim Mattson
1aa561b1a4 kvm: x86: Add "last CPU" to some KVM_EXIT information
More often than not, a failed VM-entry in an x86 production
environment is induced by a defective CPU. To help identify the bad
hardware, include the id of the last logical CPU to run a vCPU in the
information provided to userspace on a KVM exit for failed VM-entry or
for KVM internal errors not associated with emulation. The presence of
this additional information is indicated by a new capability,
KVM_CAP_LAST_CPU.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-5-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:45 -04:00
Jim Mattson
80a1684c01 kvm: vmx: Add last_cpu to struct vcpu_vmx
As we already do in svm, record the last logical processor on which a
vCPU has run, so that it can be communicated to userspace for
potential hardware errors.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-4-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:43 -04:00
Jim Mattson
242636343c kvm: svm: Always set svm->last_cpu on VMRUN
Previously, this field was only set when using SEV. Set it for all
vCPU configurations, so that it can be communicated to userspace for
diagnosing potential hardware errors.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20200603235623.245638-3-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:43 -04:00
Jim Mattson
73cd6e5f7f kvm: svm: Prefer vcpu->cpu to raw_smp_processor_id()
The current logical processor id is cached in vcpu->cpu. Use it
instead of raw_smp_processor_id() when a kvm_vcpu struct is available.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Message-Id: <20200603235623.245638-2-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:42 -04:00
Paolo Bonzini
a8d908b587 KVM: x86: report sev_pin_memory errors with PTR_ERR
Callers of sev_pin_memory() treat
NULL differently:

sev_launch_secret()/svm_register_enc_region() return -ENOMEM
sev_dbg_crypt() returns -EFAULT.

Switching to ERR_PTR() preserves the error and enables cleaner reporting of
different kinds of failures.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:42 -04:00
John Hubbard
dc42c8ae0a KVM: SVM: convert get_user_pages() --> pin_user_pages()
This code was using get_user_pages*(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages*() + put_page() calls to
pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
    https://lwn.net/Articles/807108/

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Message-Id: <20200526062207.1360225-3-jhubbard@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:41 -04:00
John Hubbard
78824fabc7 KVM: SVM: fix svn_pin_memory()'s use of get_user_pages_fast()
There are two problems in svn_pin_memory():

1) The return value of get_user_pages_fast() is stored in an
unsigned long, although the declared return value is of type int.
This will not cause any symptoms, but it is misleading.
Fix this by changing the type of npinned to "int".

2) The number of pages passed into get_user_pages_fast() is stored
in an unsigned long, even though get_user_pages_fast() accepts an
int. This means that it is possible to silently overflow the number
of pages.

Fix this by adding a WARN_ON_ONCE() and an early error return. The
npages variable is left as an unsigned long for convenience in
checking for overflow.

Fixes: 89c5058090 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_UPDATE_DATA command")
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Message-Id: <20200526062207.1360225-2-jhubbard@nvidia.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:41 -04:00
Krish Sadhukhan
1aef8161b3 KVM: nSVM: Check that DR6[63:32] and DR7[64:32] are not set on vmrun of nested guests
According to section "Canonicalization and Consistency Checks" in APM vol. 2
the following guest state is illegal:

    "DR6[63:32] are not zero."
    "DR7[63:32] are not zero."
    "Any MBZ bit of EFER is set."

Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20200522221954.32131-3-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:41 -04:00
Krish Sadhukhan
f5f6145e41 KVM: x86: Move the check for upper 32 reserved bits of DR6 to separate function
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Message-Id: <20200522221954.32131-2-krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:40 -04:00
Peter Xu
12bc2132b1 KVM: X86: Do the same ignore_msrs check for feature msrs
Logically the ignore_msrs and report_ignored_msrs should also apply to feature
MSRs.  Add them in.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200622220442.21998-3-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:40 -04:00
Peter Xu
6abe9c1386 KVM: X86: Move ignore_msrs handling upper the stack
MSR accesses can be one of:

  (1) KVM internal access,
  (2) userspace access (e.g., via KVM_SET_MSRS ioctl),
  (3) guest access.

The ignore_msrs was previously handled by kvm_get_msr_common() and
kvm_set_msr_common(), which is the bottom of the msr access stack.  It's
working in most cases, however it could dump unwanted warning messages to dmesg
even if kvm get/set the msrs internally when calling __kvm_set_msr() or
__kvm_get_msr() (e.g. kvm_cpuid()).  Ideally we only want to trap cases (2)
or (3), but not (1) above.

To achieve this, move the ignore_msrs handling upper until the callers of
__kvm_get_msr() and __kvm_set_msr().  To identify the "msr missing" event, a
new return value (KVM_MSR_RET_INVALID==2) is used for that.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200622220442.21998-2-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:39 -04:00
Sean Christopherson
02f5fb2e69 KVM: x86/mmu: Make .write_log_dirty a nested operation
Move .write_log_dirty() into kvm_x86_nested_ops to help differentiate it
from the non-nested dirty log hooks.  And because it's a nested-only
operation.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622215832.22090-5-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:38 -04:00
Sean Christopherson
2f1d48aae2 KVM: nVMX: WARN if PML emulation helper is invoked outside of nested guest
WARN if vmx_write_pml_buffer() is called outside of guest mode instead
of silently ignoring the condition.  The only caller is nested EPT's
ept_update_accessed_dirty_bits(), which should only be reachable when
L2 is active.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622215832.22090-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-08 16:21:37 -04:00