linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Sean Christopherson	2f728d66e8	KVM: x86: Move kvm_emulate.h into KVM's private directory Now that the emulation context is dynamically allocated and not embedded in struct kvm_vcpu, move its header, kvm_emulate.h, out of the public asm directory and into KVM's private x86 directory. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:52 +01:00
Sean Christopherson	c9b8b07cde	KVM: x86: Dynamically allocate per-vCPU emulation context Allocate the emulation context instead of embedding it in struct kvm_vcpu_arch. Dynamic allocation provides several benefits: - Shrinks the size x86 vcpus by ~2.5k bytes, dropping them back below the PAGE_ALLOC_COSTLY_ORDER threshold. - Allows for dropping the include of kvm_emulate.h from asm/kvm_host.h and moving kvm_emulate.h into KVM's private directory. - Allows a reducing KVM's attack surface by shrinking the amount of vCPU data that is exposed to usercopy. - Allows a future patch to disable the emulator entirely, which may or may not be a realistic endeavor. Mark the entire struct as valid for usercopy to maintain existing behavior with respect to hardened usercopy. Future patches can shrink the usercopy range to cover only what is necessary. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:52 +01:00
Sean Christopherson	f0ed4760ed	KVM: x86: Move emulation-only helpers to emulate.c Move ctxt_virt_addr_bits() and emul_is_noncanonical_address() from x86.h to emulate.c. This eliminates all references to struct x86_emulate_ctxt from x86.h, and sets the stage for a future patch to stop including kvm_emulate.h in asm/kvm_host.h. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:51 +01:00
Sean Christopherson	21f1b8f29e	KVM: x86: Explicitly pass an exception struct to check_intercept Explicitly pass an exception struct when checking for intercept from the emulator, which eliminates the last reference to arch.emulate_ctxt in vendor specific code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:50 +01:00
Sean Christopherson	2e3bb4d886	KVM: x86: Refactor I/O emulation helpers to provide vcpu-only variant Add variants of the I/O helpers that take a vCPU instead of an emulation context. This will eventually allow KVM to limit use of the emulation context to the full emulation path. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:49 +01:00
Sean Christopherson	abbed4fa94	KVM: x86: Fix warning due to implicit truncation on 32-bit KVM Explicitly cast the integer literal to an unsigned long when stuffing a non-canonical value into the host virtual address during private memslot deletion. The explicit cast fixes a warning that gets promoted to an error when running with KVM's newfangled -Werror setting. arch/x86/kvm/x86.c:9739:9: error: large integer implicitly truncated to unsigned type [-Werror=overflow] Fixes: a3e967c0b87d3 ("KVM: Terminate memslot walks via used_slots" Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:48 +01:00
Sean Christopherson	96d4701049	KVM: nVMX: Drop unnecessary check on ept caps for execute-only Drop the call to cpu_has_vmx_ept_execute_only() when calculating which EPT capabilities will be exposed to L1 for nested EPT. The resulting configuration is immediately sanitized by the passed in @ept_caps, and except for the call from vmx_check_processor_compat(), @ept_caps is the capabilities that are queried by cpu_has_vmx_ept_execute_only(). For vmx_check_processor_compat(), KVM wants to ignore vmx_capability.ept so that a divergence in EPT capabilities between CPUs is detected. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:47 +01:00
Sean Christopherson	d8dd54e063	KVM: x86/mmu: Rename kvm_mmu->get_cr3() to ->get_guest_pgd() Rename kvm_mmu->get_cr3() to call out that it is retrieving a guest value, as opposed to kvm_mmu->set_cr3(), which sets a host value, and to note that it will return something other than CR3 when nested EPT is in use. Hopefully the new name will also make it more obvious that L1's nested_cr3 is returned in SVM's nested NPT case. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:46 +01:00
Sean Christopherson	ac6389ab2c	KVM: nVMX: Rename EPTP validity helper and associated variables Rename valid_ept_address() to nested_vmx_check_eptp() to follow the nVMX nomenclature and to reflect that the function now checks a lot more than just the address contained in the EPTP. Rename address to new_eptp in associated code. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:45 +01:00
Sean Christopherson	ac69dfaace	KVM: nVMX: Rename nested_ept_get_cr3() to nested_ept_get_eptp() Rename the accessor for vmcs12.EPTP to use "eptp" instead of "cr3". The accessor has no relation to cr3 whatsoever, other than it being assigned to the also poorly named kvm_mmu->get_cr3() hook. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:44 +01:00
Sean Christopherson	bb1fcc70d9	KVM: nVMX: Allow L1 to use 5-level page walks for nested EPT Add support for 5-level nested EPT, and advertise said support in the EPT capabilities MSR. KVM's MMU can already handle 5-level legacy page tables, there's no reason to force an L1 VMM to use shadow paging if it wants to employ 5-level page tables. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:44 +01:00
Sean Christopherson	8053f924ca	KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack Drop kvm_mmu_extended_role.cr4_la57 now that mmu_role doesn't mask off level, which already incorporates the guest's CR4.LA57 for a shadow MMU by querying is_la57_mode(). Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:43 +01:00
Sean Christopherson	a102a674e4	KVM: x86/mmu: Don't drop level/direct from MMU role calculation Use the calculated role as-is when propagating it to kvm_mmu.mmu_role, i.e. stop masking off meaningful fields. The concept of masking off fields came from kvm_mmu_pte_write(), which (correctly) ignores certain fields when comparing kvm_mmu_page.role against kvm_mmu.mmu_role, e.g. the current mmu's access and level have no relation to a shadow page's access and level. Masking off the level causes problems for 5-level paging, e.g. CR4.LA57 has its own redundant flag in the extended role, and nested EPT would need a similar hack to support 5-level paging for L2. Opportunistically rework the mask for kvm_mmu_pte_write() to define the fields that should be ignored as opposed to the fields that should be checked, i.e. make it opt-out instead of opt-in so that new fields are automatically picked up. While doing so, stop ignoring "direct". The field is effectively ignored anyways because kvm_mmu_pte_write() is only reached with an indirect mmu and the loop only walks indirect shadow pages, but double checking "direct" literally costs nothing. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:42 +01:00
Sean Christopherson	a1c77abb8d	KVM: nVMX: Properly handle userspace interrupt window request Return true for vmx_interrupt_allowed() if the vCPU is in L2 and L1 has external interrupt exiting enabled. IRQs are never blocked in hardware if the CPU is in the guest (L2 from L1's perspective) when IRQs trigger VM-Exit. The new check percolates up to kvm_vcpu_ready_for_interrupt_injection() and thus vcpu_run(), and so KVM will exit to userspace if userspace has requested an interrupt window (to inject an IRQ into L1). Remove the @external_intr param from vmx_check_nested_events(), which is actually an indicator that userspace wants an interrupt window, e.g. it's named @req_int_win further up the stack. Injecting a VM-Exit into L1 to try and bounce out to L0 userspace is all kinds of broken and is no longer necessary. Remove the hack in nested_vmx_vmexit() that attempted to workaround the breakage in vmx_check_nested_events() by only filling interrupt info if there's an actual interrupt pending. The hack actually made things worse because it caused KVM to _never_ fill interrupt info when the LAPIC resides in userspace (kvm_cpu_has_interrupt() queries interrupt.injected, which is always cleared by prepare_vmcs12() before reaching the hack in nested_vmx_vmexit()). Fixes: `6550c4df7e` ("KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"") Cc: stable@vger.kernel.org Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:40 +01:00
Wanpeng Li	b34de572a8	KVM: X86: trigger kvmclock sync request just once on VM creation In the progress of vCPUs creation, it queues a kvmclock sync worker to the global workqueue before each vCPU creation completes. The workqueue subsystem guarantees not to queue the already queued work; however, we can make the logic more clear by making just one leader to trigger this kvmclock sync request, and also save on cacheline bouncing caused by test_and_set_bit. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:40 +01:00
Wanpeng Li	4abaffce4d	KVM: LAPIC: Recalculate apic map in batch In the vCPU reset and set APIC_BASE MSR path, the apic map will be recalculated several times, each time it will consume 10+ us observed by ftrace in my non-overcommit environment since the expensive memory allocate/mutex/rcu etc operations. This patch optimizes it by recaluating apic map in batch, I hope this can benefit the serverless scenario which can frequently create/destroy VMs. Before patch: kvm_lapic_reset ~27us After patch: kvm_lapic_reset ~14us Observed by ftrace, improve ~48%. Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:39 +01:00
Miaohe Lin	49f933d445	KVM: Fix some obsolete comments Remove some obsolete comments, fix wrong function name and description. Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:38 +01:00
Jay Zhou	3c9bd4006b	KVM: x86: enable dirty log gradually in small chunks It could take kvm->mmu_lock for an extended period of time when enabling dirty log for the first time. The main cost is to clear all the D-bits of last level SPTEs. This situation can benefit from manual dirty log protect as well, which can reduce the mmu_lock time taken. The sequence is like this: 1. Initialize all the bits of the dirty bitmap to 1 when enabling dirty log for the first time 2. Only write protect the huge pages 3. KVM_GET_DIRTY_LOG returns the dirty bitmap info 4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level SPTEs gradually in small chunks Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment, I did some tests with a 128G windows VM and counted the time taken of memory_global_dirty_log_start, here is the numbers: VM Size Before After optimization 128G 460ms 10ms Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:37 +01:00
Sean Christopherson	0be4435207	KVM: x86/mmu: Reuse the current root if possible for fast switch Reuse the current root when possible instead of grabbing a different root from the array of cached roots. Doing so avoids unnecessary MMU switches and also fixes a quirk where KVM can't reuse roots without creating multiple roots since the cache is a victim cache, i.e. roots are added to the cache when they're "evicted", not when they are created. The quirk could be fixed by adding roots to the cache on creation, but that would reduce the effective size of the cache as one of its entries would be burned to track the current root. Reusing the current root is especially helpful for nested virt as the current root is almost always usable for the "new" MMU on nested VM-entry/VM-exit. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:37 +01:00
Sean Christopherson	3651c7fc2b	KVM: x86/mmu: Ignore guest CR3 on fast root switch for direct MMU Ignore the guest's CR3 when looking for a cached root for a direct MMU, the guest's CR3 has no impact on the direct MMU's shadow pages (the role check ensures compatibility with CR0.WP, etc...). Zero out root_cr3 when allocating the direct roots to make it clear that it's ignored. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:36 +01:00
Oliver Upton	cc7f5577ad	KVM: SVM: Inhibit APIC virtualization for X2APIC guest The AVIC does not support guest use of the x2APIC interface. Currently, KVM simply chooses to squash the x2APIC feature in the guest's CPUID If the AVIC is enabled. Doing so prevents KVM from running a guest with greater than 255 vCPUs, as such a guest necessitates the use of the x2APIC interface. Instead, inhibit AVIC enablement on a per-VM basis whenever the x2APIC feature is set in the guest's CPUID. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:35 +01:00
Peter Xu	4d39576259	KVM: Remove unnecessary asm/kvm_host.h includes Remove includes of asm/kvm_host.h from files that already include linux/kvm_host.h to make it more obvious that there is no ordering issue between the two headers. linux/kvm_host.h includes asm/kvm_host.h to pick up architecture specific settings, and this will never change, i.e. including asm/kvm_host.h after linux/kvm_host.h may seem problematic, but in practice is simply redundant. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:34 +01:00
Sean Christopherson	562b6b089d	KVM: x86: Consolidate VM allocation and free for VMX and SVM Move the VM allocation and free code to common x86 as the logic is more or less identical across SVM and VMX. Note, although hyperv.hv_pa_pg is part of the common kvm->arch, it's (currently) only allocated by VMX VMs. But, since kfree() plays nice when passed a NULL pointer, the superfluous call for SVM is harmless and avoids future churn if SVM gains support for HyperV's direct TLB flush. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> [Make vm_size a field instead of a function. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:33 +01:00
Sean Christopherson	1a625056cc	KVM: x86: Directly return __vmalloc() result in ->vm_alloc() Directly return the __vmalloc() result in {svm,vmx}_vm_alloc() to pave the way for handling VM alloc/free in common x86 code, and to obviate the need to check the result of __vmalloc() in vendor specific code. Add a build-time assertion to ensure each structs' "kvm" field stays at offset 0, which allows interpreting a "struct kvm_{svm,vmx}" as a "struct kvm". Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:32 +01:00
Sean Christopherson	d18b2f43b9	KVM: x86: Gracefully handle __vmalloc() failure during VM allocation Check the result of __vmalloc() to avoid dereferencing a NULL pointer in the event that allocation failres. Fixes: `d1e5b0e98e` ("kvm: Make VM ioctl do valloc for some archs") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:31 +01:00
Eric Hankland	168d918f26	KVM: x86: Adjust counter sample period after a wrmsr The sample_period of a counter tracks when that counter will overflow and set global status/trigger a PMI. However this currently only gets set when the initial counter is created or when a counter is resumed; this updates the sample period after a wrmsr so running counters will accurately reflect their new value. Signed-off-by: Eric Hankland <ehankland@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:30 +01:00
Sean Christopherson	7f42aa76d4	KVM: x86/mmu: Consolidate open coded variants of memslot TLB flushes Replace open coded instances of kvm_arch_flush_remote_tlbs_memslot()'s functionality with calls to the aforementioned function. Update the comment in kvm_arch_flush_remote_tlbs_memslot() to elaborate on how it is used and why it asserts that slots_lock is held. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:29 +01:00
Sean Christopherson	cec37648f4	KVM: x86/mmu: Use range-based TLB flush for dirty log memslot flush Use the with_address() variant when performing a TLB flush for a specific memslot via kvm_arch_flush_remote_tlbs_memslot(), i.e. when flushing after clearing dirty bits during KVM_{GET,CLEAR}_DIRTY_LOG. This aligns all dirty log memslot-specific TLB flushes to use the with_address() variant and paves the way for consolidating the relevant code. Note, moving to the with_address() variant only affects functionality when running as a HyperV guest. Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:29 +01:00
Sean Christopherson	b3594ffbf9	KVM: x86/mmu: Move kvm_arch_flush_remote_tlbs_memslot() to mmu.c Move kvm_arch_flush_remote_tlbs_memslot() from x86.c to mmu.c in preparation for calling kvm_flush_remote_tlbs_with_address() instead of kvm_flush_remote_tlbs(). The with_address() variant is statically defined in mmu.c, arguably kvm_arch_flush_remote_tlbs_memslot() belongs in mmu.c anyways, and defining kvm_arch_flush_remote_tlbs_memslot() in mmu.c will allow the compiler to inline said function when a future patch consolidates open coded variants of the function. No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:28 +01:00
Sean Christopherson	0577d1abe7	KVM: Terminate memslot walks via used_slots Refactor memslot handling to treat the number of used slots as the de facto size of the memslot array, e.g. return NULL from id_to_memslot() when an invalid index is provided instead of relying on npages==0 to detect an invalid memslot. Rework the sorting and walking of memslots in advance of dynamically sizing memslots to aid bisection and debug, e.g. with luck, a bug in the refactoring will bisect here and/or hit a WARN instead of randomly corrupting memory. Alternatively, a global null/invalid memslot could be returned, i.e. so callers of id_to_memslot() don't have to explicitly check for a NULL memslot, but that approach runs the risk of introducing difficult-to- debug issues, e.g. if the global null slot is modified. Constifying the return from id_to_memslot() to combat such issues is possible, but would require a massive refactoring of arch specific code and would still be susceptible to casting shenanigans. Add function comments to update_memslots() and search_memslots() to explicitly (and loudly) state how memslots are sorted. Opportunistically stuff @hva with a non-canonical value when deleting a private memslot on x86 to detect bogus usage of the freed slot. No functional change intended. Tested-by: Christoffer Dall <christoffer.dall@arm.com> Tested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:26 +01:00
Sean Christopherson	0dff084607	KVM: Provide common implementation for generic dirty log functions Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code. The arch specific implemenations are extremely similar, differing only in whether the dirty log needs to be sync'd from hardware (x86) and how the TLBs are flushed. Add new arch hooks to handle sync and TLB flush; the sync will also be used for non-generic dirty log support in a future patch (s390). The ulterior motive for providing a common implementation is to eliminate the dependency between arch and common code with respect to the memslot referenced by the dirty log, i.e. to make it obvious in the code that the validity of the memslot is guaranteed, as a future patch will rework memslot handling such that id_to_memslot() can return NULL. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:24 +01:00
Sean Christopherson	e96c81ee89	KVM: Simplify kvm_free_memslot() and all its descendents Now that all callers of kvm_free_memslot() pass NULL for @dont, remove the param from the top-level routine and all arch's implementations. No functional change intended. Tested-by: Christoffer Dall <christoffer.dall@arm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:22 +01:00
Sean Christopherson	21198846de	KVM: x86: Free arrays for old memslot when moving memslot's base gfn Explicitly free the metadata arrays (stored in slot->arch) in the old memslot structure when moving the memslot's base gfn is committed. This eliminates x86's dependency on kvm_free_memslot() being called when a memslot move is committed, and paves the way for removing the funky code in kvm_free_memslot() that conditionally frees structures based on its @dont param. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:21 +01:00
Sean Christopherson	9d4c197c0e	KVM: Drop "const" attribute from old memslot in commit_memory_region() Drop the "const" attribute from @old in kvm_arch_commit_memory_region() to allow arch specific code to free arch specific resources in the old memslot without having to cast away the attribute. Freeing resources in kvm_arch_commit_memory_region() paves the way for simplifying kvm_free_memslot() by eliminating the last usage of its @dont param. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:20 +01:00
Sean Christopherson	414de7abbf	KVM: Drop kvm_arch_create_memslot() Remove kvm_arch_create_memslot() now that all arch implementations are effectively nops. Removing kvm_arch_create_memslot() eliminates the possibility for arch specific code to allocate memory prior to setting a memslot, which sets the stage for simplifying kvm_free_memslot(). Cc: Janosch Frank <frankja@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:17 +01:00
Sean Christopherson	0dab98b7ad	KVM: x86: Allocate memslot resources during prepare_memory_region() Allocate the various metadata structures associated with a new memslot during kvm_arch_prepare_memory_region(), which paves the way for removing kvm_arch_create_memslot() altogether. Moving x86's memory allocation only changes the order of kernel memory allocations between x86 and common KVM code. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:16 +01:00
Sean Christopherson	edd4fa37ba	KVM: x86: Allocate new rmap and large page tracking when moving memslot Reallocate a rmap array and recalcuate large page compatibility when moving an existing memslot to correctly handle the alignment properties of the new memslot. The number of rmap entries required at each level is dependent on the alignment of the memslot's base gfn with respect to that level, e.g. moving a large-page aligned memslot so that it becomes unaligned will increase the number of rmap entries needed at the now unaligned level. Not updating the rmap array is the most obvious bug, as KVM accesses garbage data beyond the end of the rmap. KVM interprets the bad data as pointers, leading to non-canonical #GPs, unexpected #PFs, etc... general protection fault: 0000 [#1] SMP CPU: 0 PID: 1909 Comm: move_memory_reg Not tainted 5.4.0-rc7+ #139 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:rmap_get_first+0x37/0x50 [kvm] Code: <48> 8b 3b 48 85 ff 74 ec e8 6c f4 ff ff 85 c0 74 e3 48 89 d8 5b c3 RSP: 0018:ffffc9000021bbc8 EFLAGS: 00010246 RAX: ffff00617461642e RBX: ffff00617461642e RCX: 0000000000000012 RDX: ffff88827400f568 RSI: ffffc9000021bbe0 RDI: ffff88827400f570 RBP: 0010000000000000 R08: ffffc9000021bd00 R09: ffffc9000021bda8 R10: ffffc9000021bc48 R11: 0000000000000000 R12: 0030000000000000 R13: 0000000000000000 R14: ffff88827427d700 R15: ffffc9000021bce8 FS: 00007f7eda014700(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7ed9216ff8 CR3: 0000000274391003 CR4: 0000000000162eb0 Call Trace: kvm_mmu_slot_set_dirty+0xa1/0x150 [kvm] __kvm_set_memory_region.part.64+0x559/0x960 [kvm] kvm_set_memory_region+0x45/0x60 [kvm] kvm_vm_ioctl+0x30f/0x920 [kvm] do_vfs_ioctl+0xa1/0x620 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f7ed9911f47 Code: <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 21 6f 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffc00937498 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000001ab0010 RCX: 00007f7ed9911f47 RDX: 0000000001ab1350 RSI: 000000004020ae46 RDI: 0000000000000004 RBP: 000000000000000a R08: 0000000000000000 R09: 00007f7ed9214700 R10: 00007f7ed92149d0 R11: 0000000000000246 R12: 00000000bffff000 R13: 0000000000000003 R14: 00007f7ed9215000 R15: 0000000000000000 Modules linked in: kvm_intel kvm irqbypass ---[ end trace 0c5f570b3358ca89 ]--- The disallow_lpage tracking is more subtle. Failure to update results in KVM creating large pages when it shouldn't, either due to stale data or again due to indexing beyond the end of the metadata arrays, which can lead to memory corruption and/or leaking data to guest/userspace. Note, the arrays for the old memslot are freed by the unconditional call to kvm_free_memslot() in __kvm_set_memory_region(). Fixes: `05da45583d` ("KVM: MMU: large page support") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:13 +01:00
Sean Christopherson	744e699c7e	KVM: x86: Move gpa_val and gpa_available into the emulator context Move the GPA tracking into the emulator context now that the context is guaranteed to be initialized via __init_emulate_ctxt() prior to dereferencing gpa_{available,val}, i.e. now that seeing a stale gpa_available will also trigger a WARN due to an invalid context. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:12 +01:00
Sean Christopherson	92daa48b34	KVM: x86: Add EMULTYPE_PF when emulation is triggered by a page fault Add a new emulation type flag to explicitly mark emulation related to a page fault. Move the propation of the GPA into the emulator from the page fault handler into x86_emulate_instruction, using EMULTYPE_PF as an indicator that cr2 is valid. Similarly, don't propagate cr2 into the exception.address when it's not valid. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:12 +01:00
Miaohe Lin	999eabcc89	KVM: apic: remove unused function apic_lvt_vector() The function apic_lvt_vector() is unused now, remove it. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:11 +01:00
Miaohe Lin	d71f5e0325	KVM: VMX: Add 'else' to split mutually exclusive case Each if branch in handle_external_interrupt_irqoff() is mutually exclusive. Add 'else' to make it clear and also avoid some unnecessary check. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:10 +01:00
Miaohe Lin	e080e538e6	KVM: x86: eliminate some unreachable code These code are unreachable, remove them. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:09 +01:00
Miaohe Lin	e630269841	KVM: x86: Fix print format and coding style Use %u to print u32 var and correct some coding style. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:08 +01:00
Chia-I Wu	222f06e7cd	KVM: vmx: rewrite the comment in vmx_get_mt_mask Better reflect the structure of the code and metion why we could not always honor the guest. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Cc: Gurchetan Singh <gurchetansingh@chromium.org> Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-03-16 17:57:08 +01:00
Oliver Upton	35a571346a	KVM: nVMX: Check IO instruction VM-exit conditions Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution controls when checking instruction interception. If the 'use IO bitmaps' VM-execution control is 1, check the instruction access against the IO bitmaps to determine if the instruction causes a VM-exit. Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-02-23 10:16:32 +01:00
Oliver Upton	e71237d3ff	KVM: nVMX: Refactor IO bitmap checks into helper function Checks against the IO bitmap are useful for both instruction emulation and VM-exit reflection. Refactor the IO bitmap checks into a helper function. Signed-off-by: Oliver Upton <oupton@google.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-02-23 10:14:40 +01:00
Paolo Bonzini	07721feee4	KVM: nVMX: Don't emulate instructions in guest mode vmx_check_intercept is not yet fully implemented. To avoid emulating instructions disallowed by the L1 hypervisor, refuse to emulate instructions by default. Cc: stable@vger.kernel.org [Made commit, added commit msg - Oliver] Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-02-23 10:11:55 +01:00
Oliver Upton	5ef8acbdd6	KVM: nVMX: Emulate MTF when performing instruction emulation Since commit `5f3d45e7f2` ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution. However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2. Fixes: `5f3d45e7f2` ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-02-23 09:36:23 +01:00
Li RongQing	dd58f3c95c	KVM: fix error handling in svm_hardware_setup rename svm_hardware_unsetup as svm_hardware_teardown, move it before svm_hardware_setup, and call it to free all memory if fail to setup in svm_hardware_setup, otherwise memory will be leaked remove __exit attribute for it since it is called in __init function Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-02-23 09:34:26 +01:00
Miaohe Lin	d80b64ff29	KVM: SVM: Fix potential memory leak in svm_cpu_init() When kmalloc memory for sd->sev_vmcbs failed, we forget to free the page held by sd->save_area. Also get rid of the var r as '-ENOMEM' is actually the only possible outcome here. Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2020-02-21 18:06:04 +01:00

1 2 3 4 5 ...

5895 Commits