Commit Graph

1980 Commits

Author SHA1 Message Date
Babu Moger
0107973a80 KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch
SEV guests fail to boot on a system that supports the PCID feature.

While emulating the RSM instruction, KVM reads the guest CR3
and calls kvm_set_cr3(). If the vCPU is in the long mode,
kvm_set_cr3() does a sanity check for the CR3 value. In this case,
it validates whether the value has any reserved bits set. The
reserved bit range is 63:cpuid_maxphysaddr(). When AMD memory
encryption is enabled, the memory encryption bit is set in the CR3
value. The memory encryption bit may fall within the KVM reserved
bit range, causing the KVM emulation failure.

Introduce a new field cr3_lm_rsvd_bits in kvm_vcpu_arch which will
cache the reserved bits in the CR3 value. This will be initialized
to rsvd_bits(cpuid_maxphyaddr(vcpu), 63).

If the architecture has any special bits(like AMD SEV encryption bit)
that needs to be masked from the reserved bits, should be cleared
in vendor specific kvm_x86_ops.vcpu_after_set_cpuid handler.

Fixes: a780a3ea62 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <160521947657.32054.3264016688005356563.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-13 06:28:37 -05:00
Pankaj Gupta
2cdef91cf8 KVM: x86: handle MSR_IA32_DEBUGCTLMSR with report_ignored_msrs
Windows2016 guest tries to enable LBR by setting the corresponding bits
in MSR_IA32_DEBUGCTLMSR. KVM does not emulate MSR_IA32_DEBUGCTLMSR and
spams the host kernel logs with error messages like:

	kvm [...]: vcpu1, guest rIP: 0xfffff800a8b687d3 kvm_set_msr_common: MSR_IA32_DEBUGCTLMSR 0x1, nop"

This patch fixes this by enabling error logging only with
'report_ignored_msrs=1'.

Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Message-Id: <20201105153932.24316-1-pankaj.gupta.linux@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08 04:41:31 -05:00
Oliver Upton
1e293d1ae8 kvm: x86: request masterclock update any time guest uses different msr
Commit 5b9bb0ebbc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME)
emulation in helper fn", 2020-10-21) subtly changed the behavior of guest
writes to MSR_KVM_SYSTEM_TIME(_NEW). Restore the previous behavior; update
the masterclock any time the guest uses a different msr than before.

Fixes: 5b9bb0ebbc ("kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn", 2020-10-21)
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-6-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08 04:41:30 -05:00
Oliver Upton
01b4f510b9 kvm: x86: ensure pv_cpuid.features is initialized when enabling cap
Make the paravirtual cpuid enforcement mechanism idempotent to ioctl()
ordering by updating pv_cpuid.features whenever userspace requests the
capability. Extract this update out of kvm_update_cpuid_runtime() into a
new helper function and move its other call site into
kvm_vcpu_after_set_cpuid() where it more likely belongs.

Fixes: 66570e966d ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-5-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08 04:41:29 -05:00
Oliver Upton
1930e5ddce kvm: x86: reads of restricted pv msrs should also result in #GP
commit 66570e966d ("kvm: x86: only provide PV features if enabled in
guest's CPUID") only protects against disallowed guest writes to KVM
paravirtual msrs, leaving msr reads unchecked. Fix this by enforcing
KVM_CPUID_FEATURES for msr reads as well.

Fixes: 66570e966d ("kvm: x86: only provide PV features if enabled in guest's CPUID")
Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Message-Id: <20201027231044.655110-4-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08 04:41:29 -05:00
Maxim Levitsky
cc4cb01767 KVM: x86: use positive error values for msr emulation that causes #GP
Recent introduction of the userspace msr filtering added code that uses
negative error codes for cases that result in either #GP delivery to
the guest, or handled by the userspace msr filtering.

This breaks an assumption that a negative error code returned from the
msr emulation code is a semi-fatal error which should be returned
to userspace via KVM_RUN ioctl and usually kill the guest.

Fix this by reusing the already existing KVM_MSR_RET_INVALID error code,
and by adding a new KVM_MSR_RET_FILTERED error code for the
userspace filtered msrs.

Fixes: 291f35fb2c1d1 ("KVM: x86: report negative values from wrmsr emulation to userspace")
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201101115523.115780-1-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-08 04:41:28 -05:00
Takashi Iwai
d383b3146d KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()
The newly introduced kvm_msr_ignored_check() tries to print error or
debug messages via vcpu_*() macros, but those may cause Oops when NULL
vcpu is passed for KVM_GET_MSRS ioctl.

Fix it by replacing the print calls with kvm_*() macros.

(Note that this will leave vcpu argument completely unused in the
 function, but I didn't touch it to make the fix as small as
 possible.  A clean up may be applied later.)

Fixes: 12bc2132b1 ("KVM: X86: Do the same ignore_msrs check for feature msrs")
BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1178280
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Message-Id: <20201030151414.20165-1-tiwai@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-30 13:40:31 -04:00
Paolo Bonzini
c0623f5e5d Merge branch 'kvm-fixes' into 'next'
Pick up bugfixes from 5.9, otherwise various tests fail.
2020-10-21 18:05:58 -04:00
Maxim Levitsky
72f211ecaa KVM: x86: allow kvm_x86_ops.set_efer to return an error value
This will be used to signal an error to the userspace, in case
the vendor code failed during handling of this msr. (e.g -ENOMEM)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:48:48 -04:00
Maxim Levitsky
7dffecaf4e KVM: x86: report negative values from wrmsr emulation to userspace
This will allow the KVM to report such errors (e.g -ENOMEM)
to the userspace.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:34 -04:00
Maxim Levitsky
36385ccc9b KVM: x86: xen_hvm_config: cleanup return values
Return 1 on errors that are caused by wrong guest behavior
(which will inject #GP to the guest)

And return a negative error value on issues that are
the kernel's fault (e.g -ENOMEM)

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20201001112954.6258-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:34 -04:00
Vitaly Kuznetsov
255cbecfe0 KVM: x86: allocate vcpu->arch.cpuid_entries dynamically
The current limit for guest CPUID leaves (KVM_MAX_CPUID_ENTRIES, 80)
is reported to be insufficient but before we bump it let's switch to
allocating vcpu->arch.cpuid_entries[] array dynamically. Currently,
'struct kvm_cpuid_entry2' is 40 bytes so vcpu->arch.cpuid_entries is
3200 bytes which accounts for 1/4 of the whole 'struct kvm_vcpu_arch'
but having it pre-allocated (for all vCPUs which we also pre-allocate)
gives us no real benefits.

Another plus of the dynamic allocation is that we now do kvm_check_cpuid()
check before we assign anything to vcpu->arch.cpuid_nent/cpuid_entries so
no changes are made in case the check fails.

Opportunistically remove unneeded 'out' labels from
kvm_vcpu_ioctl_set_cpuid()/kvm_vcpu_ioctl_set_cpuid2() and return
directly whenever possible.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20201001130541.1398392-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
2020-10-21 17:36:33 -04:00
Oliver Upton
66570e966d kvm: x86: only provide PV features if enabled in guest's CPUID
KVM unconditionally provides PV features to the guest, regardless of the
configured CPUID. An unwitting guest that doesn't check
KVM_CPUID_FEATURES before use could access paravirt features that
userspace did not intend to provide. Fix this by checking the guest's
CPUID before performing any paravirtual operations.

Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the
aforementioned enforcement. Migrating a VM from a host w/o this patch to
a host with this patch could silently change the ABI exposed to the
guest, warranting that we default to the old behavior and opt-in for
the new one.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I202a0926f65035b872bfe8ad15307c026de59a98
Message-Id: <20200818152429.1923996-4-oupton@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:32 -04:00
Oliver Upton
210dfd93ea kvm: x86: set wall_clock in kvm_write_wall_clock()
Small change to avoid meaningless duplication in the subsequent patch.
No functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I77ab9cdad239790766b7a49d5cbae5e57a3005ea
Message-Id: <20200818152429.1923996-3-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:31 -04:00
Oliver Upton
5b9bb0ebbc kvm: x86: encapsulate wrmsr(MSR_KVM_SYSTEM_TIME) emulation in helper fn
No functional change intended.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I7cbe71069db98d1ded612fd2ef088b70e7618426
Message-Id: <20200818152429.1923996-2-oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:31 -04:00
Paolo Bonzini
043248b328 KVM: VMX: Forbid userspace MSR filters for x2APIC
Allowing userspace to intercept reads to x2APIC MSRs when APICV is
fully enabled for the guest simply can't work.   But more in general,
the LAPIC could be set to in-kernel after the MSR filter is setup
and allowing accesses by userspace would be very confusing.

We could in principle allow userspace to intercept reads and writes to TPR,
and writes to EOI and SELF_IPI, but while that could be made it work, it
would still be silly.

Cc: Alexander Graf <graf@amazon.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:24 -04:00
Sean Christopherson
9389b9d5d3 KVM: VMX: Ignore userspace MSR filters for x2APIC
Rework the resetting of the MSR bitmap for x2APIC MSRs to ignore userspace
filtering.  Allowing userspace to intercept reads to x2APIC MSRs when
APICV is fully enabled for the guest simply can't work; the LAPIC and thus
virtual APIC is in-kernel and cannot be directly accessed by userspace.
To keep things simple we will in fact forbid intercepting x2APIC MSRs
altogether, independent of the default_allow setting.

Cc: Alexander Graf <graf@amazon.com>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201005195532.8674-3-sean.j.christopherson@intel.com>
[Modified to operate even if APICv is disabled, adjust documentation. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-21 17:36:19 -04:00
Paolo Bonzini
0c899c25d7 KVM: x86: do not attempt TSC synchronization on guest writes
KVM special-cases writes to MSR_IA32_TSC so that all CPUs have
the same base for the TSC.  This logic is complicated, and we
do not want it to have any effect once the VM is started.

In particular, if any guest started to synchronize its TSCs
with writes to MSR_IA32_TSC rather than MSR_IA32_TSC_ADJUST,
the additional effect of kvm_write_tsc code would be uncharted
territory.

Therefore, this patch makes writes to MSR_IA32_TSC behave
essentially the same as writes to MSR_IA32_TSC_ADJUST when
they come from the guest.  A new selftest (which passes
both before and after the patch) checks the current semantics
of writes to MSR_IA32_TSC and MSR_IA32_TSC_ADJUST originating
from both the host and the guest.

Upcoming work to remove the special side effects
of host-initiated writes to MSR_IA32_TSC and MSR_IA32_TSC_ADJUST
will be able to build onto this test, adjusting the host side
to use the new APIs and achieve the same effect.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:59:52 -04:00
Paolo Bonzini
729c15c20f KVM: x86: rename KVM_REQ_GET_VMCS12_PAGES
We are going to use it for SVM too, so use a more generic name.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:49 -04:00
Alexander Graf
1a155254ff KVM: x86: Introduce MSR filtering
It's not desireable to have all MSRs always handled by KVM kernel space. Some
MSRs would be useful to handle in user space to either emulate behavior (like
uCode updates) or differentiate whether they are valid based on the CPU model.

To allow user space to specify which MSRs it wants to see handled by KVM,
this patch introduces a new ioctl to push filter rules with bitmaps into
KVM. Based on these bitmaps, KVM can then decide whether to reject MSR access.
With the addition of KVM_CAP_X86_USER_SPACE_MSR it can also deflect the
denied MSR events to user space to operate on.

If no filter is populated, MSR handling stays identical to before.

Signed-off-by: Alexander Graf <graf@amazon.com>

Message-Id: <20200925143422.21718-8-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:08 -04:00
Alexander Graf
51de8151bd KVM: x86: Add infrastructure for MSR filtering
In the following commits we will add pieces of MSR filtering.
To ensure that code compiles even with the feature half-merged, let's add
a few stubs and struct definitions before the real patches start.

Signed-off-by: Alexander Graf <graf@amazon.com>

Message-Id: <20200925143422.21718-4-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:05 -04:00
Alexander Graf
1ae099540e KVM: x86: Allow deflecting unknown MSR accesses to user space
MSRs are weird. Some of them are normal control registers, such as EFER.
Some however are registers that really are model specific, not very
interesting to virtualization workloads, and not performance critical.
Others again are really just windows into package configuration.

Out of these MSRs, only the first category is necessary to implement in
kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against
certain CPU models and MSRs that contain information on the package level
are much better suited for user space to process. However, over time we have
accumulated a lot of MSRs that are not the first category, but still handled
by in-kernel KVM code.

This patch adds a generic interface to handle WRMSR and RDMSR from user
space. With this, any future MSR that is part of the latter categories can
be handled in user space.

Furthermore, it allows us to replace the existing "ignore_msrs" logic with
something that applies per-VM rather than on the full system. That way you
can run productive VMs in parallel to experimental ones where you don't care
about proper MSR handling.

Signed-off-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Jim Mattson <jmattson@google.com>

Message-Id: <20200925143422.21718-3-graf@amazon.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:04 -04:00
Alexander Graf
90218e434c KVM: x86: Return -ENOENT on unimplemented MSRs
When we find an MSR that we can not handle, bubble up that error code as
MSR error return code. Follow up patches will use that to expose the fact
that an MSR is not handled by KVM to user space.

Suggested-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20200925143422.21718-2-graf@amazon.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:58:02 -04:00
Sean Christopherson
7e34fbd05c KVM: x86: Rename "shared_msrs" to "user_return_msrs"
Rename the "shared_msrs" mechanism, which is used to defer restoring
MSRs that are only consumed when running in userspace, to a more banal
but less likely to be confusing "user_return_msrs".

The "shared" nomenclature is confusing as it's not obvious who is
sharing what, e.g. reasonable interpretations are that the guest value
is shared by vCPUs in a VM, or that the MSR value is shared/common to
guest and host, both of which are wrong.

"shared" is also misleading as the MSR value (in hardware) is not
guaranteed to be shared/reused between VMs (if that's indeed the correct
interpretation of the name), as the ability to share values between VMs
is simply a side effect (albiet a very nice side effect) of deferring
restoration of the host value until returning from userspace.

"user_return" avoids the above confusion by describing the mechanism
itself instead of its effects.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923180409.32255-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:54 -04:00
Sean Christopherson
b2d522552c KVM: x86: Add RIP to the kvm_entry, i.e. VM-Enter, tracepoint
Add RIP to the kvm_entry tracepoint to help debug if the kvm_exit
tracepoint is disabled or if VM-Enter fails, in which case the kvm_exit
tracepoint won't be hit.

Read RIP from within the tracepoint itself to avoid a potential VMREAD
and retpoline if the guest's RIP isn't available.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923201349.16097-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:49 -04:00
Sean Christopherson
09e3e2a1cc KVM: x86: Add kvm_x86_ops hook to short circuit emulation
Replace the existing kvm_x86_ops.need_emulation_on_page_fault() with a
more generic is_emulatable(), and unconditionally call the new function
in x86_emulate_instruction().

KVM will use the generic hook to support multiple security related
technologies that prevent emulation in one way or another.  Similar to
the existing AMD #NPF case where emulation of the current instruction is
not possible due to lack of information, AMD's SEV-ES and Intel's SGX
and TDX will introduce scenarios where emulation is impossible due to
the guest's register state being inaccessible.  And again similar to the
existing #NPF case, emulation can be initiated by kvm_mmu_page_fault(),
i.e. outside of the control of vendor-specific code.

While the cause and architecturally visible behavior of the various
cases are different, e.g. SGX will inject a #UD, AMD #NPF is a clean
resume or complete shutdown, and SEV-ES and TDX "return" an error, the
impact on the common emulation code is identical: KVM must stop
emulation immediately and resume the guest.

Query is_emulatable() in handle_ud() as well so that the
force_emulation_prefix code doesn't incorrectly modify RIP before
calling emulate_instruction() in the absurdly unlikely scenario that
KVM encounters forced emulation in conjunction with "do not emulate".

Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200915232702.15945-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:19 -04:00
Maxim Levitsky
cc5b54dd58 KVM: x86: fix MSR_IA32_TSC read for nested migration
MSR reads/writes should always access the L1 state, since the (nested)
hypervisor should intercept all the msrs it wants to adjust, and these
that it doesn't should be read by the guest as if the host had read it.

However IA32_TSC is an exception. Even when not intercepted, guest still
reads the value + TSC offset.
The write however does not take any TSC offset into account.

This is documented in Intel's SDM and seems also to happen on AMD as well.

This creates a problem when userspace wants to read the IA32_TSC value and then
write it. (e.g for migration)

In this case it reads L2 value but write is interpreted as an L1 value.
To fix this make the userspace initiated reads of IA32_TSC return L1 value
as well.

Huge thanks to Dave Gilbert for helping me understand this very confusing
semantic of MSR writes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200921103805.9102-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:18 -04:00
Babu Moger
9715092f8d KVM: X86: Move handling of INVPCID types to x86
INVPCID instruction handling is mostly same across both VMX and
SVM. So, move the code to common x86.c.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985255212.11252.10322694343971983487.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:17 -04:00
Babu Moger
3f3393b3ce KVM: X86: Rename and move the function vmx_handle_memory_failure to x86.c
Handling of kvm_read/write_guest_virt*() errors can be moved to common
code. The same code can be used by both VMX and SVM.

Signed-off-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <159985254493.11252.6603092560732507607.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:16 -04:00
Wanpeng Li
68ca7663c7 KVM: LAPIC: Narrow down the kick target vCPU
The kick after setting KVM_REQ_PENDING_TIMER is used to handle the timer
fires on a different pCPU which vCPU is running on.  This kick costs about
1000 clock cycles and we don't need this when injecting already-expired
timer or when using the VMX preemption timer because
kvm_lapic_expired_hv_timer() is called from the target vCPU.

Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1599731444-3525-6-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-28 07:57:09 -04:00
Sean Christopherson
8d214c4816 KVM: x86: Reset MMU context if guest toggles CR4.SMAP or CR4.PKE
Reset the MMU context during kvm_set_cr4() if SMAP or PKE is toggled.
Recent commits to (correctly) not reload PDPTRs when SMAP/PKE are
toggled inadvertantly skipped the MMU context reset due to the mask
of bits that triggers PDPTR loads also being used to trigger MMU context
resets.

Fixes: 427890aff8 ("kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode")
Fixes: cb957adb4e ("kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode")
Cc: Jim Mattson <jmattson@google.com>
Cc: Peter Shier <pshier@google.com>
Cc: Oliver Upton <oupton@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200923215352.17756-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-25 08:56:35 -04:00
Maxim Levitsky
ee6fa05301 KVM: x86: fix MSR_IA32_TSC read for nested migration
MSR reads/writes should always access the L1 state, since the (nested)
hypervisor should intercept all the msrs it wants to adjust, and these
that it doesn't should be read by the guest as if the host had read it.

However IA32_TSC is an exception. Even when not intercepted, guest still
reads the value + TSC offset.
The write however does not take any TSC offset into account.

This is documented in Intel's SDM and seems also to happen on AMD as well.

This creates a problem when userspace wants to read the IA32_TSC value and then
write it. (e.g for migration)

In this case it reads L2 value but write is interpreted as an L1 value.
To fix this make the userspace initiated reads of IA32_TSC return L1 value
as well.

Huge thanks to Dave Gilbert for helping me understand this very confusing
semantic of MSR writes.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200921103805.9102-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-24 13:35:07 -04:00
Mohammed Gamal
b96e6506c2 KVM: x86: VMX: Make smaller physical guest address space support user-configurable
This patch exposes allow_smaller_maxphyaddr to the user as a module parameter.
Since smaller physical address spaces are only supported on VMX, the
parameter is only exposed in the kvm_intel module.

For now disable support by default, and let the user decide if they want
to enable it.

Modifications to VMX page fault and EPT violation handling will depend
on whether that parameter is enabled.

Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Message-Id: <20200903141122.72908-1-mgamal@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-23 09:47:24 -04:00
Paolo Bonzini
bf3c0e5e71 Merge branch 'x86-seves-for-paolo' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD 2020-09-22 06:43:17 -04:00
Vitaly Kuznetsov
d831de1772 KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN
Even without in-kernel LAPIC we should allow writing '0' to
MSR_KVM_ASYNC_PF_EN as we're not enabling the mechanism. In
particular, QEMU with 'kernel-irqchip=off' fails to start
a guest with

qemu-system-x86_64: error: failed to set MSR 0x4b564d02 to 0x0

Fixes: 9d3c447c72 ("KVM: X86: Fix async pf caused null-ptr-deref")
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200911093147.484565-1-vkuznets@redhat.com>
[Actually commit the version proposed by Sean Christopherson. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-11 13:26:47 -04:00
Paolo Bonzini
1b67fd086d KVM/arm64 fixes for Linux 5.9, take #1
- Multiple stolen time fixes, with a new capability to match x86
 - Fix for hugetlbfs mappings when PUD and PMD are the same level
 - Fix for hugetlbfs mappings when PTE mappings are enforced
   (dirty logging, for example)
 - Fix tracing output of 64bit values
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl9SGP4PHG1hekBrZXJu
 ZWwub3JnAAoJECPQ0LrRPXpDWSoP/iZdsgkcMXM1qGRpMtn0FDVQiL2zd8QOEki+
 /ldkeLvcUC9aqVOWdCM1fTweIvyPM0KSRxfbQRVGRGXym9bUMzx1laSl0CLXgi9q
 fa/lbmZOLG5PovQLe8eNon6aXybWWwuxfh/dhpBWLg5VGb0fXFutH3Cs2MGbX/Ec
 6qMvbwO09SGSTrXSQepbhFkAmViBAgUH2kiRhKReDfvRc4OwsbxdlmA5r9Ek6R5L
 x7tH8mYwOJVP1OEZWYRX+9GG1n8hCIKKSuRrhMbZtErTSNdf+YhldC6DuJF0zWVE
 nsoKzIEl15kIl0akkC6oA3MLNX1sfRh9C2J85Rt+odN4fY4MidrfcqRgWE2X3cuu
 CKDsr0Lb6aTtfZASHm7QQRsM4hujWoArBq6ZvUNjpfNXOPe4ovxX9TkPHp6OezuK
 v3PRzXQxUtmreK+02ZzalL6IBwAQrmLKxXM2P2Nuh4gDMgFC/BrHMxc1QVSnmb/m
 flMuKtvm+fkwKySQvX22FZrzhPfCMAuxCh28WdDSW2pnmZ8H0M3922y45xw3QTpg
 SltMtIcpO6ipzMsrVvO/hI/GvByFNN6jcLVGUV1Wx8mNdcf2kPeebA4hINKt5UDh
 gpwkz4zb2Bgp/YdgiG8NzBjpk2FMO0IPiAnouPrXenizbesAlKR2V3uFqa70PW/A
 BBkHnakS
 =VBrL
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-fixes-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for Linux 5.9, take #1

- Multiple stolen time fixes, with a new capability to match x86
- Fix for hugetlbfs mappings when PUD and PMD are the same level
- Fix for hugetlbfs mappings when PTE mappings are enforced
  (dirty logging, for example)
- Fix tracing output of 64bit values
2020-09-11 13:12:11 -04:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Linus Torvalds
b2d9e99622 * PAE and PKU bugfixes for x86
* selftests fix for new binutils
 * MMU notifier fix for arm64
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl9ARnoUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroP2YAf/dgLrPm4y4jxm7Aiz3/txqrHEwogT
 ZtvnzqUPb6+vkFrkop8QMOPw7A8NCfkn3/6sWbyUN5ObgOG1pxKyPraeN3ZdsDoR
 KGwv6P0dKgI8B4UuGEMe9GazXv+oOv8+bSUJnE+HZiUHzJKlX4HJbxDwUhvSSatY
 qYCZb/Uzqundh79TYULa7oI1/3F15A2J1zQPe4QgkToH9tsVB8PVfkH5uPJPp64M
 DTm5+qgwwsBULFaAuuo3FTs9f3pWJxn8GOuico1Sm+RnR53mhbUJggUfFzP0rwzZ
 Emevunje5r1rluFs+JWeNtflGH0gI4CLak7jvlOOBjrNb5XJgUSbzLXxkA==
 =Jwic
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:

 - PAE and PKU bugfixes for x86

 - selftests fix for new binutils

 - MMU notifier fix for arm64

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: arm64: Only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is not set
  KVM: Pass MMU notifier range flags to kvm_unmap_hva_range()
  kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
  kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
  KVM: x86: fix access code passed to gva_to_gpa
  selftests: kvm: Use a shorter encoding to clear RAX
2020-08-22 10:03:05 -07:00
Andrew Jones
004a01241c arm64/x86: KVM: Introduce steal-time cap
arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe
support for steal-time. However this is unnecessary, as only a KVM
fd is required, and it complicates userspace (userspace may prefer
delaying vcpu creation until after feature probing). Introduce a cap
that can be checked instead. While x86 can already probe steal-time
support with a kvm fd (KVM_GET_SUPPORTED_CPUID), we add the cap there
too for consistency.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20200804170604.42662-7-drjones@redhat.com
2020-08-21 14:05:19 +01:00
Jim Mattson
cb957adb4e kvm: x86: Toggling CR4.PKE does not load PDPTEs in PAE mode
See the SDM, volume 3, section 4.4.1:

If PAE paging would be in use following an execution of MOV to CR0 or
MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
the PDPTEs are loaded from the address in CR3.

Fixes: b9baba8614 ("KVM, pkeys: expose CPUID/CR4 to guest")
Cc: Huaitong Han <huaitong.han@intel.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Message-Id: <20200817181655.3716509-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-17 15:24:08 -04:00
Jim Mattson
427890aff8 kvm: x86: Toggling CR4.SMAP does not load PDPTEs in PAE mode
See the SDM, volume 3, section 4.4.1:

If PAE paging would be in use following an execution of MOV to CR0 or
MOV to CR4 (see Section 4.1.1) and the instruction is modifying any of
CR0.CD, CR0.NW, CR0.PG, CR4.PAE, CR4.PGE, CR4.PSE, or CR4.SMEP; then
the PDPTEs are loaded from the address in CR3.

Fixes: 0be0226f07 ("KVM: MMU: fix SMAP virtualization")
Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Message-Id: <20200817181655.3716509-2-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-17 15:23:50 -04:00
Paolo Bonzini
19cf4b7eef KVM: x86: fix access code passed to gva_to_gpa
The PK bit of the error code is computed dynamically in permission_fault
and therefore need not be passed to gva_to_gpa: only the access bits
(fetch, user, write) need to be passed down.

Not doing so causes a splat in the pku test:

   WARNING: CPU: 25 PID: 5465 at arch/x86/kvm/mmu.h:197 paging64_walk_addr_generic+0x594/0x750 [kvm]
   Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0014.D62.2001092233 01/09/2020
   RIP: 0010:paging64_walk_addr_generic+0x594/0x750 [kvm]
   Code: <0f> 0b e9 db fe ff ff 44 8b 43 04 4c 89 6c 24 30 8b 13 41 39 d0 89
   RSP: 0018:ff53778fc623fb60 EFLAGS: 00010202
   RAX: 0000000000000001 RBX: ff53778fc623fbf0 RCX: 0000000000000007
   RDX: 0000000000000001 RSI: 0000000000000002 RDI: ff4501efba818000
   RBP: 0000000000000020 R08: 0000000000000005 R09: 00000000004000e7
   R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000007
   R13: ff4501efba818388 R14: 10000000004000e7 R15: 0000000000000000
   FS:  00007f2dcf31a700(0000) GS:ff4501f1c8040000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   CR2: 0000000000000000 CR3: 0000001dea475005 CR4: 0000000000763ee0
   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   PKRU: 55555554
   Call Trace:
    paging64_gva_to_gpa+0x3f/0xb0 [kvm]
    kvm_fixup_and_inject_pf_error+0x48/0xa0 [kvm]
    handle_exception_nmi+0x4fc/0x5b0 [kvm_intel]
    kvm_arch_vcpu_ioctl_run+0x911/0x1c10 [kvm]
    kvm_vcpu_ioctl+0x23e/0x5d0 [kvm]
    ksys_ioctl+0x92/0xb0
    __x64_sys_ioctl+0x16/0x20
    do_syscall_64+0x3e/0xb0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
   ---[ end trace d17eb998aee991da ]---

Reported-by: Sean Christopherson <sean.j.christopherson@intel.com>
Fixes: 897861479c ("KVM: x86: Add helper functions for illegal GPA checking and page fault injection")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-17 15:21:43 -04:00
Linus Torvalds
8cd84b7096 PPC:
* Improvements and bugfixes for secure VM support, giving reduced startup
   time and memory hotplug support.
 * Locking fixes in nested KVM code
 * Increase number of guests supported by HV KVM to 4094
 * Preliminary POWER10 support
 
 ARM:
 * Split the VHE and nVHE hypervisor code bases, build the EL2 code
   separately, allowing for the VHE code to now be built with instrumentation
 * Level-based TLB invalidation support
 * Restructure of the vcpu register storage to accomodate the NV code
 * Pointer Authentication available for guests on nVHE hosts
 * Simplification of the system register table parsing
 * MMU cleanups and fixes
 * A number of post-32bit cleanups and other fixes
 
 MIPS:
 * compilation fixes
 
 x86:
 * bugfixes
 * support for the SERIALIZE instruction
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8yfuQUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNweQgAiEycRbpifAueihK3ScKwYcCFhbHg
 n6KLiFCY3sJRg+ORNb9EuFPJgGygV8DPKbEMvKaGDhNpX3rOpSIrpi5QQ5Hx+WOj
 WHg+aX8Eyy1ys7V84UbiMeZKUbKDDRr0/UOUtJEsF4hiD7s0FgobbQhC/3+awp5k
 sdSTMYlXelep+pjdFX7cNIgjrBNFtqH0ECeuDCcQzDg2zlH+poEPyLaC5+U4RF6r
 pfvcxd6xp50fobo8ro7kMuBeclG3JxLjqqdNrkkHcF1DxROMLLKN7CjHZchYC/BK
 c+S7JHLFnafxiTncMLhv3s4viey05mohW6SxeLw4qcWHfFlz+qyfZwMvZA==
 =d/GI
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more KVM updates from Paolo Bonzini:
 "PPC:
   - Improvements and bugfixes for secure VM support, giving reduced
     startup time and memory hotplug support.

   - Locking fixes in nested KVM code

   - Increase number of guests supported by HV KVM to 4094

   - Preliminary POWER10 support

  ARM:
   - Split the VHE and nVHE hypervisor code bases, build the EL2 code
     separately, allowing for the VHE code to now be built with
     instrumentation

   - Level-based TLB invalidation support

   - Restructure of the vcpu register storage to accomodate the NV code

   - Pointer Authentication available for guests on nVHE hosts

   - Simplification of the system register table parsing

   - MMU cleanups and fixes

   - A number of post-32bit cleanups and other fixes

  MIPS:
   - compilation fixes

  x86:
   - bugfixes

   - support for the SERIALIZE instruction"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits)
  KVM: MIPS/VZ: Fix build error caused by 'kvm_run' cleanup
  x86/kvm/hyper-v: Synic default SCONTROL MSR needs to be enabled
  MIPS: KVM: Convert a fallthrough comment to fallthrough
  MIPS: VZ: Only include loongson_regs.h for CPU_LOONGSON64
  x86: Expose SERIALIZE for supported cpuid
  KVM: x86: Don't attempt to load PDPTRs when 64-bit mode is enabled
  KVM: arm64: Move S1PTW S2 fault logic out of io_mem_abort()
  KVM: arm64: Don't skip cache maintenance for read-only memslots
  KVM: arm64: Handle data and instruction external aborts the same way
  KVM: arm64: Rename kvm_vcpu_dabt_isextabt()
  KVM: arm: Add trace name for ARM_NISV
  KVM: arm64: Ensure that all nVHE hyp code is in .hyp.text
  KVM: arm64: Substitute RANDOMIZE_BASE for HARDEN_EL2_VECTORS
  KVM: arm64: Make nVHE ASLR conditional on RANDOMIZE_BASE
  KVM: PPC: Book3S HV: Rework secure mem slot dropping
  KVM: PPC: Book3S HV: Move kvmppc_svm_page_out up
  KVM: PPC: Book3S HV: Migrate hot plugged memory
  KVM: PPC: Book3S HV: In H_SVM_INIT_DONE, migrate remaining normal-GFNs to secure-GFNs
  KVM: PPC: Book3S HV: Track the state GFNs associated with secure VMs
  KVM: PPC: Book3S HV: Disable page merging in H_SVM_INIT_START
  ...
2020-08-12 12:25:06 -07:00
Linus Torvalds
57b0779392 virtio: fixes, features
IRQ bypass support for vdpa and IFC
 MLX5 vdpa driver
 Endian-ness fixes for virtio drivers
 Misc other fixes
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl8yVEwPHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpNPEH/0Dtq1s1V4r/kxtLUoMophv9wuORpWCr98BQ
 2aOveTmwTOVdZVOiw2tzTgO9nbWx+cL2HvkU7Aajfpz5hh93Z2VOo2n4a7hBC79f
 rlc3GXiG+pMk5RfmqGofIHTU+D6ony4D5SXlUDurLdtEwunyuqZwABiWkZjdclZJ
 bv90IL8Upzbz0rxYr7k3z8UepdOCt7r4QS/o7STHZBjJRyylxmO/R2yTnh6PtpRK
 Q/z35wJBJ3SKc8X3Fi0VOOSeGNZOiypkkl9ZnLVY5lExNAU1+2MMn2UK119SlCDV
 MSxb7quYFF4cksXH1g77GMBNi1uADRh1dtFMZdkKhZGljGxKLxo=
 =6VTZ
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

 - IRQ bypass support for vdpa and IFC

 - MLX5 vdpa driver

 - Endianness fixes for virtio drivers

 - Misc other fixes

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits)
  vdpa/mlx5: fix up endian-ness for mtu
  vdpa: Fix pointer math bug in vdpasim_get_config()
  vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config()
  vdpa/mlx5: fix memory allocation failure checks
  vdpa/mlx5: Fix uninitialised variable in core/mr.c
  vdpa_sim: init iommu lock
  virtio_config: fix up warnings on parisc
  vdpa/mlx5: Add VDPA driver for supported mlx5 devices
  vdpa/mlx5: Add shared memory registration code
  vdpa/mlx5: Add support library for mlx5 VDPA implementation
  vdpa/mlx5: Add hardware descriptive header file
  vdpa: Modify get_vq_state() to return error code
  net/vdpa: Use struct for set/get vq state
  vdpa: remove hard coded virtq num
  vdpasim: support batch updating
  vhost-vdpa: support IOTLB batching hints
  vhost-vdpa: support get/set backend features
  vhost: generialize backend features setting/getting
  vhost-vdpa: refine ioctl pre-processing
  vDPA: dont change vq irq after DRIVER_OK
  ...
2020-08-11 14:34:17 -07:00
Sean Christopherson
05487215e6 KVM: x86: Don't attempt to load PDPTRs when 64-bit mode is enabled
Don't attempt to load PDPTRs if EFER.LME=1, i.e. if 64-bit mode is
enabled.  A recent change to reload the PDTPRs when CR0.CD or CR0.NW is
toggled botched the EFER.LME handling and sends KVM down the PDTPR path
when is_paging() is true, i.e. when the guest toggles CD/NW in 64-bit
mode.

Split the CR0 checks for 64-bit vs. 32-bit PAE into separate paths.  The
64-bit path is specifically checking state when paging is toggled on,
i.e. CR0.PG transititions from 0->1.  The PDPTR path now needs to run if
the new CR0 state has paging enabled, irrespective of whether paging was
already enabled.  Trying to shave a few cycles to make the PDPTR path an
"else if" case is a mess.

Fixes: d42e3fae6f ("kvm: x86: Read PDPTEs on CR0.CD and CR0.NW changes")
Cc: Jim Mattson <jmattson@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Peter Shier <pshier@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200714015732.32426-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-08-09 12:51:50 -04:00
Linus Torvalds
921d2597ab s390: implement diag318
x86:
 * Report last CPU for debugging
 * Emulate smaller MAXPHYADDR in the guest than in the host
 * .noinstr and tracing fixes from Thomas
 * nested SVM page table switching optimization and fixes
 
 Generic:
 * Unify shadow MMU cache data structures across architectures
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl8pC+oUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroNcOwgAjomqtEqQNlp7DdZT7VyyklzbxX1/
 ud7v+oOJ8K4sFlf64lSthjPo3N9rzZCcw+yOXmuyuITngXOGc3tzIwXpCzpLtuQ1
 WO1Ql3B/2dCi3lP5OMmsO1UAZqy9pKLg1dfeYUPk48P5+p7d/NPmk+Em5kIYzKm5
 JsaHfCp2EEXomwmljNJ8PQ1vTjIQSSzlgYUBZxmCkaaX7zbEUMtxAQCStHmt8B84
 33LczwXBm3viSWrzsoBV37I70+tseugiSGsCfUyupXOvq55d6D9FCqtCb45Hn4Vh
 Ik8ggKdalsk/reiGEwNw1/3nr6mRMkHSbl+Mhc4waOIFf9dn0urgQgOaDg==
 =YVx0
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "s390:
   - implement diag318

  x86:
   - Report last CPU for debugging
   - Emulate smaller MAXPHYADDR in the guest than in the host
   - .noinstr and tracing fixes from Thomas
   - nested SVM page table switching optimization and fixes

  Generic:
   - Unify shadow MMU cache data structures across architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  KVM: SVM: Fix sev_pin_memory() error handling
  KVM: LAPIC: Set the TDCR settable bits
  KVM: x86: Specify max TDP level via kvm_configure_mmu()
  KVM: x86/mmu: Rename max_page_level to max_huge_page_level
  KVM: x86: Dynamically calculate TDP level from max level and MAXPHYADDR
  KVM: VXM: Remove temporary WARN on expected vs. actual EPTP level mismatch
  KVM: x86: Pull the PGD's level from the MMU instead of recalculating it
  KVM: VMX: Make vmx_load_mmu_pgd() static
  KVM: x86/mmu: Add separate helper for shadow NPT root page role calc
  KVM: VMX: Drop a duplicate declaration of construct_eptp()
  KVM: nSVM: Correctly set the shadow NPT root level in its MMU role
  KVM: Using macros instead of magic values
  MIPS: KVM: Fix build error caused by 'kvm_run' cleanup
  KVM: nSVM: remove nonsensical EXITINFO1 adjustment on nested NPF
  KVM: x86: Add a capability for GUEST_MAXPHYADDR < HOST_MAXPHYADDR support
  KVM: VMX: optimize #PF injection when MAXPHYADDR does not match
  KVM: VMX: Add guest physical address check in EPT violation and misconfig
  KVM: VMX: introduce vmx_need_pf_intercept
  KVM: x86: update exception bitmap on CPUID changes
  KVM: x86: rename update_bp_intercept to update_exception_bitmap
  ...
2020-08-06 12:59:31 -07:00
Zhu Lingshan
2edd9cb79f kvm: detect assigned device via irqbypass manager
vDPA devices has dedicated backed hardware like
passthrough-ed devices. Then it is possible to setup irq
offloading to vCPU for vDPA devices. Thus this patch tries to
manipulated assigned device counters by
kvm_arch_start/end_assignment() in irqbypass manager, so that
assigned devices could be detected in update_pi_irte()

We will increase/decrease the assigned device counter in kvm/x86.
Both vDPA and VFIO would go through this code path.

Only X86 uses these counters and kvm_arch_start/end_assignment(),
so this code path only affect x86 for now.

Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20200731065533.4144-3-lingshan.zhu@intel.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-08-05 11:08:42 -04:00
Linus Torvalds
125cfa0d4d The conversion of X86 syscall, interrupt and exception entry/exit handling
to the generic code. Pretty much a straight forward 1:1 conversion plus the
 consolidation of the KVM handling of pending work before entering guest
 mode.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8pEFgTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYocEwD/474Eb7LzZ8yahyUBirWJP3k3qzgs9j
 dZUxqB6LNuDOstEyTGLPdx1dmQP2vHbFfjoM7YBOH37EGcHsqjGliLvn2Y05ZD7O
 6kYwjz6qVnJcm3IMtfSUn/8LkfO5pGUdKd3U5ngDmPLpkeaQ4nPKqiO0uIb0wzwa
 cO7l10tG4YjMCWQxPNIaOh8kncLieQBediJPFjkQjV+Fh33kSU3LWTl3fccz6b5+
 mgSUFL0qjQpp+Nl7lCaDQQiAop9GTUETfDtximRydZauiM2NpCfz+QBmQzq50Xv1
 G3DWZoBIZBjmWJUgfSmS/s4GOYkBTBnT/fUcZmIDcgdRwvtEvRzIhcP87/wn7P3N
 UKpLdHqmvA0BFDXZbNZgS362++29pj5Lnb+u3QbWSKQ9UqHN0NUlSY4wzfTLXsGp
 Mzpp4TW0u/8kyOlo7wK3lVDgNJaPG31aiNVuDPgLe4cEluO5cq7/7g2GcFBqF1Ly
 SqNGD1IccteNQTNvDopczPy7qUl5Lal+Ia06szNSPR48gLrvhSWdyYr2i1sD7vx4
 hAhR0Hsi9dacGv46TrRw1OdDzq9bOW68G8GIgLJgDXaayPXLnx6TQEUjzQtIkE/i
 ydTPUarp5QOFByt+RBjI90ZcW4RuLgMTOEVONPXtSn8IoCP2Kdg9u3gD9AmUW3Q2
 JFkKMiSiJPGxlw==
 =84y7
 -----END PGP SIGNATURE-----

Merge tag 'x86-entry-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 conversion to generic entry code from Thomas Gleixner:
 "The conversion of X86 syscall, interrupt and exception entry/exit
  handling to the generic code.

  Pretty much a straight-forward 1:1 conversion plus the consolidation
  of the KVM handling of pending work before entering guest mode"

* tag 'x86-entry-2020-08-04' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/kvm: Use __xfer_to_guest_mode_work_pending() in kvm_run_vcpu()
  x86/kvm: Use generic xfer to guest work function
  x86/entry: Cleanup idtentry_enter/exit
  x86/entry: Use generic interrupt entry/exit code
  x86/entry: Cleanup idtentry_entry/exit_user
  x86/entry: Use generic syscall exit functionality
  x86/entry: Use generic syscall entry function
  x86/ptrace: Provide pt_regs helper for entry/exit
  x86/entry: Move user return notifier out of loop
  x86/entry: Consolidate 32/64 bit syscall entry
  x86/entry: Consolidate check_user_regs()
  x86: Correct noinstr qualifiers
  x86/idtentry: Remove stale comment
2020-08-04 21:05:46 -07:00
Linus Torvalds
99ea1521a0 Remove uninitialized_var() macro for v5.9-rc1
- Clean up non-trivial uses of uninitialized_var()
 - Update documentation and checkpatch for uninitialized_var() removal
 - Treewide removal of uninitialized_var()
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oYLQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsfjEACvf0D3WL3H7sLHtZ2HeMwOgAzq
 il08t6vUscINQwiIIK3Be43ok3uQ1Q+bj8sr2gSYTwunV2IYHFferzgzhyMMno3o
 XBIGd1E+v1E4DGBOiRXJvacBivKrfvrdZ7AWiGlVBKfg2E0fL1aQbe9AYJ6eJSbp
 UGqkBkE207dugS5SQcwrlk1tWKUL089lhDAPd7iy/5RK76OsLRCJFzIerLHF2ZK2
 BwvA+NWXVQI6pNZ0aRtEtbbxwEU4X+2J/uaXH5kJDszMwRrgBT2qoedVu5LXFPi8
 +B84IzM2lii1HAFbrFlRyL/EMueVFzieN40EOB6O8wt60Y4iCy5wOUzAdZwFuSTI
 h0xT3JI8BWtpB3W+ryas9cl9GoOHHtPA8dShuV+Y+Q2bWe1Fs6kTl2Z4m4zKq56z
 63wQCdveFOkqiCLZb8s6FhnS11wKtAX4czvXRXaUPgdVQS1Ibyba851CRHIEY+9I
 AbtogoPN8FXzLsJn7pIxHR4ADz+eZ0dQ18f2hhQpP6/co65bYizNP5H3h+t9hGHG
 k3r2k8T+jpFPaddpZMvRvIVD8O2HvJZQTyY6Vvneuv6pnQWtr2DqPFn2YooRnzoa
 dbBMtpon+vYz6OWokC5QNWLqHWqvY9TmMfcVFUXE4AFse8vh4wJ8jJCNOFVp8On+
 drhmmImUr1YylrtVOw==
 =xHmk
 -----END PGP SIGNATURE-----

Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull uninitialized_var() macro removal from Kees Cook:
 "This is long overdue, and has hidden too many bugs over the years. The
  series has several "by hand" fixes, and then a trivial treewide
  replacement.

   - Clean up non-trivial uses of uninitialized_var()

   - Update documentation and checkpatch for uninitialized_var() removal

   - Treewide removal of uninitialized_var()"

* tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  compiler: Remove uninitialized_var() macro
  treewide: Remove uninitialized_var() usage
  checkpatch: Remove awareness of uninitialized_var() macro
  mm/debug_vm_pgtable: Remove uninitialized_var() usage
  f2fs: Eliminate usage of uninitialized_var() macro
  media: sur40: Remove uninitialized_var() usage
  KVM: PPC: Book3S PR: Remove uninitialized_var() usage
  clk: spear: Remove uninitialized_var() usage
  clk: st: Remove uninitialized_var() usage
  spi: davinci: Remove uninitialized_var() usage
  ide: Remove uninitialized_var() usage
  rtlwifi: rtl8192cu: Remove uninitialized_var() usage
  b43: Remove uninitialized_var() usage
  drbd: Remove uninitialized_var() usage
  x86/mm/numa: Remove uninitialized_var() usage
  docs: deprecated.rst: Add uninitialized_var()
2020-08-04 13:49:43 -07:00
Sean Christopherson
83013059bd KVM: x86: Specify max TDP level via kvm_configure_mmu()
Capture the max TDP level during kvm_configure_mmu() instead of using a
kvm_x86_ops hook to do it at every vCPU creation.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200716034122.5998-10-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-30 18:18:08 -04:00