linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Avi Kivity	586f960796	KVM: Add instruction-set-specific exit qualifications to kvm_exit trace The exit reason alone is insufficient to understand exactly why an exit occured; add ISA-specific trace parameters for additional information. Because fetching these parameters is expensive on vmx, and because these parameters are fetched even if tracing is disabled, we fetch the parameters via a callback instead of as traditional trace arguments. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:41 +02:00
Avi Kivity	aa17911e3c	KVM: Record instruction set in kvm_exit tracepoint exit_reason's meaning depend on the instruction set; record it so a trace taken on one machine can be interpreted on another. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:40 +02:00
Avi Kivity	104f226bfd	KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run() cea15c2 ("KVM: Move KVM context switch into own function") split vmx_vcpu_run() to prevent multiple copies of the context switch from being generated (causing problems due to a label). This patch folds them back together again and adds the __noclone attribute to prevent the label from being duplicated. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:37 +02:00
Avi Kivity	30b31ab682	KVM: x86 emulator: do not perform address calculations on linear addresses Linear addresses are supposed to already have segment checks performed on them; if we play with these addresses the checks become invalid. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:36 +02:00
Avi Kivity	90de84f50b	KVM: x86 emulator: preserve an operand's segment identity Currently the x86 emulator converts the segment register associated with an operand into a segment base which is added into the operand address. This loss of information results in us not doing segment limit checks properly. Replace struct operand's addr.mem field by a segmented_address structure which holds both the effetive address and segment. This will allow us to do the limit check at the point of access. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:35 +02:00
Avi Kivity	d53db5efc2	KVM: x86 emulator: drop DPRINTF() Failed emulation is reported via a tracepoint; the cmps printk is pointless. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:33 +02:00
Avi Kivity	8a6bcaa6ef	KVM: x86 emulator: drop unused #ifndef __KERNEL__ Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:32 +02:00
Shane Wang	f9335afea5	KVM: VMX: Inform user about INTEL_TXT dependency Inform user to either disable TXT in the BIOS or do TXT launch with tboot before enabling KVM since some BIOSes do not set FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:31 +02:00
Xiao Guangrong	e730b63cc0	KVM: MMU: don't mark spte notrap if reserved bit set If reserved bit is set, we need inject the #PF with PFEC.RSVD=1, but shadow_notrap_nonpresent_pte injects #PF with PFEC.RSVD=0 only Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:27 +02:00
Avi Kivity	945ee35e07	KVM: Mask KVM_GET_SUPPORTED_CPUID data with Linux cpuid info This allows Linux to mask cpuid bits if, for example, nx is enabled on only some cpus. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:17 +02:00
Avi Kivity	2a6b20b83d	KVM: SVM: Replace svm_has() by standard Linux cpuid accessors Instead of querying cpuid directly, use the Linux accessors (boot_cpu_has, etc.). This allows the things like the clearcpuid kernel command line to work (when it's fixed wrt scattered cpuid bits). Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:16 +02:00
Xiao Guangrong	c4806acdce	KVM: MMU: fix apf prefault if nested guest is enabled If apf is generated in L2 guest and is completed in L1 guest, it will prefault this apf in L1 guest's mmu context. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:14 +02:00
Xiao Guangrong	060c2abe6c	KVM: MMU: support apf for nonpaing guest Let's support apf for nonpaing guest Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:13 +02:00
Xiao Guangrong	e5f3f02796	KVM: MMU: clear apfs if page state is changed If CR0.PG is changed, the page fault cann't be avoid when the prefault address is accessed later And it also fix a bug: it can retry a page enabled #PF in page disabled context if mmu is shadow page This idear is from Gleb Natapov Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:12 +02:00
Xiao Guangrong	5054c0de66	KVM: MMU: fix missing post sync audit Add AUDIT_POST_SYNC audit for long mode shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:11 +02:00
Jan Kiszka	d89f5eff70	KVM: Clean up vm creation and release IA64 support forces us to abstract the allocation of the kvm structure. But instead of mixing this up with arch-specific initialization and doing the same on destruction, split both steps. This allows to move generic destruction calls into generic code. It also fixes error clean-up on failures of kvm_create_vm for IA64. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:09 +02:00
Tracey Dent	9d893c6bc1	KVM: x86: Makefile clean up Changed makefile to use the ccflags-y option instead of EXTRA_CFLAGS. Signed-off-by: Tracey Dent <tdent48227@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:29:08 +02:00
Avi Kivity	30bd0c4c6c	KVM: VMX: Disallow NMI while blocked by STI While not mandated by the spec, Linux relies on NMI being blocked by an IF-enabling STI. VMX also refuses to enter a guest in this state, at least on some implementations. Disallow NMI while blocked by STI by checking for the condition, and requesting an interrupt window exit if it occurs. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:04 +02:00
Xiao Guangrong	e6d53e3b0d	KVM: avoid unnecessary wait for a async pf In current code, it checks async pf completion out of the wait context, like this: if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE && !vcpu->arch.apf.halted) r = vcpu_enter_guest(vcpu); else { ...... kvm_vcpu_block(vcpu) ^- waiting until 'async_pf.done' is not empty } kvm_check_async_pf_completion(vcpu) ^- delete list from async_pf.done So, if we check aysnc pf completion first, it can be blocked at kvm_vcpu_block Fixed by mark the vcpu is unhalted in kvm_check_async_pf_completion() path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:29:00 +02:00
Xiao Guangrong	c7d28c2404	KVM: fix searching async gfn in kvm_async_pf_gfn_slot Don't search later slots if the slot is empty Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:59 +02:00
Xiao Guangrong	c9b263d2be	KVM: fix tracing kvm_try_async_get_page Tracing 'async' and pfn is useless, since 'async' is always true, and 'pfn' is always "fault_pfn' We can trace 'gva' and 'gfn' instead, it can help us to see the life-cycle of an async_pf Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:56 +02:00
Gleb Natapov	ec25d5e66e	KVM: handle exit due to INVD in VMX Currently the exit is unhandled, so guest halts with error if it tries to execute INVD instruction. Call into emulator when INVD instruction is executed by a guest instead. This instruction is not needed by ordinary guests, but firmware (like OpenBIOS) use it and fail. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:53 +02:00
Jan Kiszka	2eec734374	KVM: x86: Avoid issuing wbinvd twice Micro optimization to avoid calling wbinvd twice on the CPU that has to emulate it. As we might be preempted between smp_call_function_many and the local wbinvd, the cache might be filled again so that real work could be done uselessly. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:52 +02:00
Takuya Yoshikawa	515a01279a	KVM: pre-allocate one more dirty bitmap to avoid vmalloc() Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by vmalloc() which will be used in the next logging and this has been causing bad effect to VGA and live-migration: vmalloc() consumes extra systime, triggers tlb flush, etc. This patch resolves this issue by pre-allocating one more bitmap and switching between two bitmaps during dirty logging. Performance improvement: I measured performance for the case of VGA update by trace-cmd. The result was 1.5 times faster than the original one. In the case of live migration, the improvement ratio depends on the workload and the guest memory size. In general, the larger the memory size is the more benefits we get. Note: This does not change other architectures's logic but the allocation size becomes twice. This will increase the actual memory consumption only when the new size changes the number of pages allocated by vmalloc(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:28:46 +02:00
Marcelo Tosatti	612819c3c6	KVM: propagate fault r/w information to gup(), allow read-only memory As suggested by Andrea, pass r/w error code to gup(), upgrading read fault to writable if host pte allows it. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:28:40 +02:00
Marcelo Tosatti	7905d9a5ad	KVM: MMU: flush TLBs on writable -> read-only spte overwrite This can happen in the following scenario: vcpu0 vcpu1 read fault gup(.write=0) gup(.write=1) reuse swap cache, no COW set writable spte use writable spte set read-only spte Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:23:39 +02:00
Marcelo Tosatti	982c25658c	KVM: MMU: remove kvm_mmu_set_base_ptes Unused. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:23:38 +02:00
Marcelo Tosatti	ff1fcb9ebd	KVM: VMX: remove setting of shadow_base_ptes for EPT The EPT present/writable bits use the same position as normal pagetable bits. Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte. Also pass present/writable error code information from EPT violation to generic pagefault handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:23:37 +02:00
Avi Kivity	83bcacb1a5	KVM: Avoid double interrupt injection with vapic After an interrupt injection, the PPR changes, and we have to reflect that into the vapic. This causes a KVM_REQ_EVENT to be set, which causes the whole interrupt injection routine to be run again (harmlessly). Optimize by only setting KVM_REQ_EVENT if the ppr was lowered; otherwise there is no chance that a new injection is needed. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-12 11:23:36 +02:00
Avi Kivity	82ca2d108b	KVM: SVM: Fold save_host_msrs() and load_host_msrs() into their callers This abstraction only serves to obfuscate. Remove. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:34 +02:00
Avi Kivity	dacccfdd6b	KVM: SVM: Move fs/gs/ldt save/restore to heavyweight exit path ldt is never used in the kernel context; same goes for fs (x86_64) and gs (i386). So save/restore them in the heavyweight exit path instead of the lightweight path. By itself, this doesn't buy us much, but it paves the way for moving vmload and vmsave to the heavyweight exit path, since they modify the same registers. [jan: fix copy/pase mistake on i386] Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:33 +02:00
Avi Kivity	afe9e66f82	KVM: SVM: Move svm->host_gs_base into a separate structure More members will join it soon. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:31 +02:00
Avi Kivity	13c34e073b	KVM: SVM: Move guest register save out of interrupts disabled section Saving guest registers is just a memory copy, and does not need to be in the critical section. Move outside the critical section to improve latency a bit. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:29 +02:00
Andi Kleen	f56f536956	KVM: Move KVM context switch into own function gcc 4.5 with some special options is able to duplicate the VMX context switch asm in vmx_vcpu_run(). This results in a compile error because the inline asm sequence uses an on local label. The non local label is needed because other code wants to set up the return address. This patch moves the asm code into an own function and marks that explicitely noinline to avoid this problem. Better would be probably to just move it into an .S file. The diff looks worse than the change really is, it's all just code movement and no logic change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:26 +02:00
Jan Kiszka	7e1fbeac6f	KVM: x86: Mark kvm_arch_setup_async_pf static It has no user outside mmu.c and also no prototype. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:25 +02:00
Gleb Natapov	fc5f06fac6	KVM: Send async PF when guest is not in userspace too. If guest indicates that it can handle async pf in kernel mode too send it, but only if interrupts are enabled. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:22 +02:00
Gleb Natapov	6adba52742	KVM: Let host know whether the guest can handle async PF in non-userspace context. If guest can detect that it runs in non-preemptable context it can handle async PFs at any time, so let host know that it can send async PF even if guest cpu is not in userspace. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:21 +02:00
Gleb Natapov	7c90705bf2	KVM: Inject asynchronous page fault into a PV guest if page is swapped out. Send async page fault to a PV guest if it accesses swapped out memory. Guest will choose another task to run upon receiving the fault. Allow async page fault injection only when guest is in user mode since otherwise guest may be in non-sleepable context and will not be able to reschedule. Vcpu will be halted if guest will fault on the same page again or if vcpu executes kernel code. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:17 +02:00
Gleb Natapov	631bc48782	KVM: Handle async PF in a guest. When async PF capability is detected hook up special page fault handler that will handle async page fault events and bypass other page faults to regular page fault handler. Also add async PF handling to nested SVM emulation. Async PF always generates exit to L1 where vcpu thread will be scheduled out until page is available. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:16 +02:00
Gleb Natapov	344d9588a9	KVM: Add PV MSR to enable asynchronous page faults delivery. Guest enables async PF vcpu functionality using this MSR. Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:12 +02:00
Gleb Natapov	49c7754ce5	KVM: Add memory slot versioning and use it to provide fast guest write interface Keep track of memslots changes by keeping generation number in memslots structure. Provide kvm_write_guest_cached() function that skips gfn_to_hva() translation if memslots was not changed since previous invocation. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:08 +02:00
Gleb Natapov	56028d0861	KVM: Retry fault before vmentry When page is swapped in it is mapped into guest memory only after guest tries to access it again and generate another fault. To save this fault we can map it immediately since we know that guest is going to access the page. Do it only when tdp is enabled for now. Shadow paging case is more complicated. CR[034] and EFER registers should be switched before doing mapping and then switched back. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:23:06 +02:00
Gleb Natapov	af585b921e	KVM: Halt vcpu if page it tries to access is swapped out If a guest accesses swapped out memory do not swap it in from vcpu thread context. Schedule work to do swapping and put vcpu into halted state instead. Interrupts will still be delivered to the guest and if interrupt will cause reschedule guest will continue to run another task. [avi: remove call to get_user_pages_noio(), nacked by Linus; this makes everything synchrnous again] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2011-01-12 11:21:39 +02:00
Linus Torvalds	72eb6a7914	Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits) gameport: use this_cpu_read instead of lookup x86: udelay: Use this_cpu_read to avoid address calculation x86: Use this_cpu_inc_return for nmi counter x86: Replace uses of current_cpu_data with this_cpu ops x86: Use this_cpu_ops to optimize code vmstat: User per cpu atomics to avoid interrupt disable / enable irq_work: Use per cpu atomics instead of regular atomics cpuops: Use cmpxchg for xchg to avoid lock semantics x86: this_cpu_cmpxchg and this_cpu_xchg operations percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support percpu,x86: relocate this_cpu_add_return() and friends connector: Use this_cpu operations xen: Use this_cpu_inc_return taskstats: Use this_cpu_ops random: Use this_cpu_inc_return fs: Use this_cpu_inc_return in buffer.c highmem: Use this_cpu_xx_return() operations vmstat: Use this_cpu_inc_return for vm statistics x86: Support for this_cpu_add, sub, dec, inc_return percpu: Generic support for this_cpu_add, sub, dec, inc_return ... Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c} as per Tejun.	2011-01-07 17:02:58 -08:00
Avi Kivity	010c520e20	KVM: Don't reset mmu context unnecessarily when updating EFER The only bit of EFER that affects the mmu is NX, and this is already accounted for (LME only takes effect when changing cr0). Based on a patch by Hillf Danton. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-02 12:05:15 +02:00
Avi Kivity	d0dfc6b74a	KVM: i8259: initialize isr_ack isr_ack is never initialized. So, until the first PIC reset, interrupts may fail to be injected. This can cause Windows XP to fail to boot, as reported in the fallout from the fix to https://bugzilla.kernel.org/show_bug.cgi?id=21962. Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-02 11:52:48 +02:00
Tejun Heo	0a3aee0da4	x86: Use this_cpu_ops to optimize code Go through x86 code and replace __get_cpu_var and get_cpu_var instances that refer to a scalar and are not used for address determinations. Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-30 12:20:28 +01:00
Avi Kivity	649497d1a3	KVM: MMU: Fix incorrect direct gfn for unpaged mode shadow We use the physical address instead of the base gfn for the four PAE page directories we use in unpaged mode. When the guest accesses an address above 1GB that is backed by a large host page, a BUG_ON() in kvm_mmu_set_gfn() triggers. Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=21962 Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com> KVM-Stable-Tag. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-29 12:35:29 +02:00
Avi Kivity	3e26f23091	KVM: Fix preemption counter leak in kvm_timer_init() Based on a patch from Thomas Meyer. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-16 12:39:31 +02:00
Joerg Roedel	24d1b15f72	KVM: SVM: Do not report xsave in supported cpuid To support xsave properly for the guest the SVM module need software support for it. As long as this is not present do not report the xsave as supported feature in cpuid. As a side-effect this patch moves the bit() helper function into the x86.h file so that it can be used in svm.c too. KVM-Stable-Tag. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-08 17:28:37 +02:00
Sheng Yang	3ea3aa8cf6	KVM: Fix OSXSAVE after migration CPUID's OSXSAVE is a mirror of CR4.OSXSAVE bit. We need to update the CPUID after migration. KVM-Stable-Tag. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-08 17:28:31 +02:00
Avi Kivity	c8770e7ba6	KVM: VMX: Fix host userspace gsbase corruption We now use load_gs_index() to load gs safely; unfortunately this also changes MSR_KERNEL_GS_BASE, which we managed separately. This resulted in confusion and breakage running 32-bit host userspace on a 64-bit kernel. Fix by - saving guest MSR_KERNEL_GS_BASE before we we reload the host's gs - doing the host save/load unconditionally, instead of only when in guest long mode Things can be cleaned up further, but this is the minmal fix for now. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-17 19:48:05 -02:00
Avi Kivity	0a77fe4c18	KVM: Correct ordering of ldt reload wrt fs/gs reload If fs or gs refer to the ldt, they must be reloaded after the ldt. Reorder the code to that effect. Userspace code that uses the ldt with kvm is nonexistent, so this doesn't fix a user-visible bug. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-17 19:47:59 -02:00
Jan Kiszka	453d9c57e2	KVM: x86: Issue smp_call_function_many with preemption disabled smp_call_function_many is specified to be called only with preemption disabled. Fulfill this requirement. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-05 14:42:27 -02:00
Vasiliy Kulikov	97e69aa62f	KVM: x86: fix information leak to userland Structures kvm_vcpu_events, kvm_debugregs, kvm_pit_state2 and kvm_clock_data are copied to userland with some padding and reserved fields unitialized. It leads to leaking of contents of kernel stack memory. We have to initialize them to zero. In patch v1 Jan Kiszka suggested to fill reserved fields with zeros instead of memset'ting the whole struct. It makes sense as these fields are explicitly marked as padding. No more fields need zeroing. KVM-Stable-Tag. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-05 14:42:27 -02:00
Marcelo Tosatti	eb45fda45f	KVM: MMU: fix rmap_remove on non present sptes drop_spte should not attempt to rmap_remove a non present shadow pte. This fixes a BUG_ON seen on kvm-autotest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Lucas Meneghel Rodrigues <lmr@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-11-05 14:42:26 -02:00
Michael S. Tsirkin	edde99ce05	KVM: Write protect memory after slot swap I have observed the following bug trigger: 1. userspace calls GET_DIRTY_LOG 2. kvm_mmu_slot_remove_write_access is called and makes a page ro 3. page fault happens and makes the page writeable fault is logged in the bitmap appropriately 4. kvm_vm_ioctl_get_dirty_log swaps slot pointers a lot of time passes 5. guest writes into the page 6. userspace calls GET_DIRTY_LOG At point (5), bitmap is clean and page is writeable, thus, guest modification of memory is not logged and GET_DIRTY_LOG returns an empty bitmap. The rule is that all pages are either dirty in the current bitmap, or write-protected, which is violated here. It seems that just moving kvm_mmu_slot_remove_write_access down to after the slot pointer swap should fix this bug. KVM-Stable-Tag. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-11-05 14:42:25 -02:00
Linus Torvalds	1765a1fe5d	Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits) KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages KVM: Fix signature of kvm_iommu_map_pages stub KVM: MCE: Send SRAR SIGBUS directly KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED KVM: fix typo in copyright notice KVM: Disable interrupts around get_kernel_ns() KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address KVM: MMU: move access code parsing to FNAME(walk_addr) function KVM: MMU: audit: check whether have unsync sps after root sync KVM: MMU: audit: introduce audit_printk to cleanup audit code KVM: MMU: audit: unregister audit tracepoints before module unloaded KVM: MMU: audit: fix vcpu's spte walking KVM: MMU: set access bit for direct mapping KVM: MMU: cleanup for error mask set while walk guest page table KVM: MMU: update 'root_hpa' out of loop in PAE shadow path KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn() KVM: x86: Fix constant type in kvm_get_time_scale KVM: VMX: Add AX to list of registers clobbered by guest switch KVM guest: Move a printk that's using the clock before it's ready KVM: x86: TSC catchup mode ...	2010-10-24 12:47:25 -07:00
Huang Ying	77db5cbd29	KVM: MCE: Send SRAR SIGBUS directly Originally, SRAR SIGBUS is sent to QEMU-KVM via touching the poisoned page. But commit `9605456919` prevents the signal from being sent. So now the signal is sent via force_sig_info_fault directly. [marcelo: use send_sig_info instead] Reported-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:15 +02:00
Huang Ying	5854dbca9b	KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED Now we have MCG_SER_P (and corresponding SRAO/SRAR MCE) support in kernel and QEMU-KVM, the MCG_SER_P should be added into KVM_MCE_CAP_SUPPORTED to make all these code really works. Reported-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:15 +02:00
Nicolas Kaiser	9611c18777	KVM: fix typo in copyright notice Fix typo in copyright notice. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:14 +02:00
Avi Kivity	395c6b0a9d	KVM: Disable interrupts around get_kernel_ns() get_kernel_ns() wants preemption disabled. It doesn't make a lot of sense during the get/set ioctls (no way to make them non-racy) but the callee wants it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Avi Kivity	7ebaf15eef	KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	3377078027	KVM: MMU: move access code parsing to FNAME(walk_addr) function Move access code parsing from caller site to FNAME(walk_addr) function Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	6903074c36	KVM: MMU: audit: check whether have unsync sps after root sync After root synced, all unsync sps are synced, this patch add a check to make sure it's no unsync sps in VCPU's page table Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	38904e1287	KVM: MMU: audit: introduce audit_printk to cleanup audit code Introduce audit_printk, and record audit point instead audit name Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:13 +02:00
Xiao Guangrong	c42fffe3a3	KVM: MMU: audit: unregister audit tracepoints before module unloaded fix: Call Trace: [<ffffffffa01e46ba>] ? kvm_mmu_pte_write+0x229/0x911 [kvm] [<ffffffffa01c6ba9>] ? gfn_to_memslot+0x39/0xa0 [kvm] [<ffffffffa01c6c26>] ? mark_page_dirty+0x16/0x2e [kvm] [<ffffffffa01c6d6f>] ? kvm_write_guest_page+0x67/0x7f [kvm] [<ffffffff81066fbd>] ? local_clock+0x2a/0x3b [<ffffffffa01d52ce>] emulator_write_phys+0x46/0x54 [kvm] ...... Code: Bad RIP value. RIP [<ffffffffa0172056>] 0xffffffffa0172056 RSP <ffff880134f69a70> CR2: ffffffffa0172056 Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:13 +02:00
Xiao Guangrong	98224bf1d1	KVM: MMU: audit: fix vcpu's spte walking After nested nested paging, it may using long mode to shadow 32/PAE paging guest, so this patch fix it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:12 +02:00
Xiao Guangrong	33f91edb92	KVM: MMU: set access bit for direct mapping Set access bit while setup up direct page table if it's nonpaing or npt enabled, it's good for CPU's speculate access Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:11 +02:00
Xiao Guangrong	20bd40dc64	KVM: MMU: cleanup for error mask set while walk guest page table Small cleanup for set page fault error code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:10 +02:00
Xiao Guangrong	6292757fb0	KVM: MMU: update 'root_hpa' out of loop in PAE shadow path The value of 'vcpu->arch.mmu.pae_root' is not modified, so we can update 'root_hpa' out of the loop. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:09 +02:00
Sheng Yang	7129eecac1	KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn() Eliminate: arch/x86/kvm/emulate.c:801: warning: ‘sv’ may be used uninitialized in this function on gcc 4.1.2 Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:09 +02:00
Jan Kiszka	50933623e5	KVM: x86: Fix constant type in kvm_get_time_scale Older gcc versions complain about the improper type (for x86-32), 4.5 seems to fix this silently. However, we should better use the right type initially. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:08 +02:00
Jan Kiszka	07d6f555d5	KVM: VMX: Add AX to list of registers clobbered by guest switch By chance this caused no harm so far. We overwrite AX during switch to/from guest context, so we must declare this. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:07 +02:00
Zachary Amsden	c285545f81	KVM: x86: TSC catchup mode Negate the effects of AN TYM spell while kvm thread is preempted by tracking conversion factor to the highest TSC rate and catching the TSC up when it has fallen behind the kernel view of time. Note that once triggered, we don't turn off catchup mode. A slightly more clever version of this is possible, which only does catchup when TSC rate drops, and which specifically targets only CPUs with broken TSC, but since these all are considered unstable_tsc(), this patch covers all necessary cases. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:05 +02:00
Zachary Amsden	34c238a1d1	KVM: x86: Rename timer function This just changes some names to better reflect the usage they will be given. Separated out to keep confusion to a minimum. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:05 +02:00
Zachary Amsden	5f4e3f8827	KVM: x86: Make math work for other scales The math in kvm_get_time_scale relies on the fact that NSEC_PER_SEC < 2^32. To use the same function to compute arbitrary time scales, we must extend the first reduction step to shrink the base rate to a 32-bit value, and possibly reduce the scaled rate into a 32-bit as well. Note we must take care to avoid an arithmetic overflow when scaling up the tps32 value (this could not happen with the fixed scaled value of NSEC_PER_SEC, but can happen with scaled rates above 2^31. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:04 +02:00
Avi Kivity	49e9d557f9	KVM: VMX: Respect interrupt window in big real mode If an interrupt is pending, we need to stop emulation so we can inject it. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:02 +02:00
Mohammed Gamal	a92601bb70	KVM: VMX: Emulated real mode interrupt injection Replace the inject-as-software-interrupt hack we currently have with emulated injection. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:01 +02:00
Mohammed Gamal	63995653ad	KVM: Add kvm_inject_realmode_interrupt() wrapper This adds a wrapper function kvm_inject_realmode_interrupt() around the emulator function emulate_int_real() to allow real mode interrupt injection. [avi: initialize operand and address sizes before emulating interrupts] [avi: initialize rip for real mode interrupt injection] [avi: clear interrupt pending flag after emulating interrupt injection] Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:01 +02:00
Hillf Danton	cb16a7b387	KVM: MMU: fix counting of rmap entries in rmap_add() It seems that rmap entries are under counted. Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:59 +02:00
Gleb Natapov	a0a07cd2c5	KVM: SVM: do not generate "external interrupt exit" if other exit is pending Nested SVM checks for external interrupt after injecting nested exception. In case there is external interrupt pending the code generates "external interrupt exit" and overwrites previous exit info. If previously injected exception already generated exit it will be lost. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:56 +02:00
Avi Kivity	f4f5105087	KVM: Convert PIC lock from raw spinlock to ordinary spinlock The PIC code used to be called from preempt_disable() context, which wasn't very good for PREEMPT_RT. That is no longer the case, so move back from raw_spinlock_t to spinlock_t. Signed-off-by: Avi Kivity <avi@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:56 +02:00
Zachary Amsden	28e4639adf	KVM: x86: Fix kvmclock bug If preempted after kvmclock values are updated, but before hardware virtualization is entered, the last tsc time as read by the guest is never set. It underflows the next time kvmclock is updated if there has not yet been a successful entry / exit into hardware virt. Fix this by simply setting last_tsc to the newly read tsc value so that any computed nsec advance of kvmclock is nulled. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:56 +02:00
Joerg Roedel	0959ffacf3	KVM: MMU: Don't track nested fault info in error-code This patch moves the detection whether a page-fault was nested or not out of the error code and moves it into a separate variable in the fault struct. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:55 +02:00
Avi Kivity	625831a3f4	KVM: VMX: Move fixup_rmode_irq() to avoid forward declaration No code changes. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:54 +02:00
Avi Kivity	b463a6f744	KVM: Non-atomic interrupt injection Change the interrupt injection code to work from preemptible, interrupts enabled context. This works by adding a ->cancel_injection() operation that undoes an injection in case we were not able to actually enter the guest (this condition could never happen with atomic injection). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:54 +02:00
Avi Kivity	83422e17c1	KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry Currently vmx_complete_interrupts() can decode event information from vmx exit fields into the generic kvm event queues. Make it able to decode the information from the entry fields as well by parametrizing it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:52 +02:00
Avi Kivity	537b37e267	KVM: VMX: Move real-mode interrupt injection fixup to vmx_complete_interrupts() This allows reuse of vmx_complete_interrupts() for cancelling injections. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:51 +02:00
Avi Kivity	51aa01d13d	KVM: VMX: Split up vmx_complete_interrupts() vmx_complete_interrupts() does too much, split it up: - vmx_vcpu_run() gets the "cache important vmcs fields" part - a new vmx_complete_atomic_exit() gets the parts that must be done atomically - a new vmx_recover_nmi_blocking() does what its name says - vmx_complete_interrupts() retains the event injection recovery code This helps in reducing the work done in atomic context. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:51 +02:00
Avi Kivity	3842d135ff	KVM: Check for pending events before attempting injection Instead of blindly attempting to inject an event before each guest entry, check for a possible event first in vcpu->requests. Sites that can trigger event injection are modified to set KVM_REQ_EVENT: - interrupt, nmi window opening - ppr updates - i8259 output changes - local apic irr changes - rflags updates - gif flag set - event set on exit This improves non-injecting entry performance, and sets the stage for non-atomic injection. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:50 +02:00
Avi Kivity	b0bc3ee2b5	KVM: MMU: Fix regression with ept memory types merged into non-ept page tables Commit "KVM: MMU: Make tdp_enabled a mmu-context parameter" made real-mode set ->direct_map, and changed the code that merges in the memory type depend on direct_map instead of tdp_enabled. However, in this case what really matters is tdp, not direct_map, since tdp changes the pte format regardless of whether the mapping is direct or not. As a result, real-mode shadow mappings got corrupted with ept memory types. The result was a huge slowdown, likely due to the cache being disabled. Change it back as the simplest fix for the regression (real fix is to move all that to vmx code, and not use tdp_enabled as a synonym for ept). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:49 +02:00
Joerg Roedel	4c62a2dc92	KVM: X86: Report SVM bit to userspace only when supported This patch fixes a bug in KVM where it _always_ reports the support of the SVM feature to userspace. But KVM only supports SVM on AMD hardware and only when it is enabled in the kernel module. This patch fixes the wrong reporting. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:48 +02:00
Joerg Roedel	3d4aeaad8b	KVM: SVM: Report Nested Paging support to userspace This patch implements the reporting of the nested paging feature support to userspace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:47 +02:00
Joerg Roedel	55c5e464fc	KVM: SVM: Expect two more candiates for exit_int_info This patch adds INTR and NMI intercepts to the list of expected intercepts with an exit_int_info set. While this can't happen on bare metal it is architectural legal and may happen with KVMs SVM emulation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:46 +02:00
Joerg Roedel	4b16184c1c	KVM: SVM: Initialize Nested Nested MMU context on VMRUN This patch adds code to initialize the Nested Nested Paging MMU context when the L1 guest executes a VMRUN instruction and has nested paging enabled in its VMCB. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:46 +02:00
Joerg Roedel	5bd2edc341	KVM: SVM: Implement MMU helper functions for Nested Nested Paging This patch adds the helper functions which will be used in the mmu context for handling nested nested page faults. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:45 +02:00
Joerg Roedel	2d48a985c7	KVM: MMU: Track NX state in struct kvm_mmu With Nested Paging emulation the NX state between the two MMU contexts may differ. To make sure that always the right fault error code is recorded this patch moves the NX state into struct kvm_mmu so that the code can distinguish between L1 and L2 NX state. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:44 +02:00
Joerg Roedel	81407ca553	KVM: MMU: Allow long mode shadows for legacy page tables Currently the KVM softmmu implementation can not shadow a 32 bit legacy or PAE page table with a long mode page table. This is a required feature for nested paging emulation because the nested page table must alway be in host format. So this patch implements the missing pieces to allow long mode page tables for page table types. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:43 +02:00
Joerg Roedel	651dd37a9c	KVM: MMU: Refactor mmu_alloc_roots function This patch factors out the direct-mapping paths of the mmu_alloc_roots function into a seperate function. This makes it a lot easier to avoid all the unnecessary checks done in the shadow path which may break when running direct. In fact, this patch already fixes a problem when running PAE guests on a PAE shadow page table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:42 +02:00
Joerg Roedel	d41d1895eb	KVM: MMU: Introduce kvm_pdptr_read_mmu This function is implemented to load the pdptr pointers of the currently running guest (l1 or l2 guest). Therefore it takes care about the current paging mode and can read pdptrs out of l2 guest physical memory. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:42 +02:00
Joerg Roedel	ff03a073e7	KVM: MMU: Add kvm_mmu parameter to load_pdptrs function This function need to be able to load the pdptrs from any mmu context currently in use. So change this function to take an kvm_mmu parameter to fit these needs. As a side effect this patch also moves the cached pdptrs from vcpu_arch into the kvm_mmu struct. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:41 +02:00
Joerg Roedel	d47f00a62b	KVM: X86: Propagate fetch faults KVM currently ignores fetch faults in the instruction emulator. With nested-npt we could have such faults. This patch adds the code to handle these. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:41 +02:00
Joerg Roedel	d4f8cf664e	KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa This patch implements logic to make sure that either a page-fault/page-fault-vmexit or a nested-page-fault-vmexit is propagated back to the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:40 +02:00
Joerg Roedel	02f59dc9f1	KVM: MMU: Introduce init_kvm_nested_mmu() This patch introduces the init_kvm_nested_mmu() function which is used to re-initialize the nested mmu when the l2 guest changes its paging mode. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:39 +02:00
Joerg Roedel	3d06b8bfd4	KVM: MMU: Introduce kvm_read_nested_guest_page() This patch introduces the kvm_read_guest_page_x86 function which reads from the physical memory of the guest. If the guest is running in guest-mode itself with nested paging enabled it will read from the guest's guest physical memory instead. The patch also changes changes the code to use this function where it is necessary. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:38 +02:00
Joerg Roedel	2329d46d21	KVM: MMU: Make walk_addr_generic capable for two-level walking This patch uses kvm_read_guest_page_tdp to make the walk_addr_generic functions suitable for two-level page table walking. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:38 +02:00
Joerg Roedel	ec92fe44e7	KVM: X86: Add kvm_read_guest_page_mmu function This patch adds a function which can read from the guests physical memory or from the guest's guest physical memory. This will be used in the two-dimensional page table walker. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:37 +02:00
Joerg Roedel	6539e738f6	KVM: MMU: Implement nested gva_to_gpa functions This patch adds the functions to do a nested l2_gva to l1_gpa page table walk. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:36 +02:00
Joerg Roedel	14dfe855f9	KVM: X86: Introduce pointer to mmu context used for gva_to_gpa This patch introduces the walk_mmu pointer which points to the mmu-context currently used for gva_to_gpa translations. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:35 +02:00
Joerg Roedel	c30a358d33	KVM: MMU: Add infrastructure for two-level page walker This patch introduces a mmu-callback to translate gpa addresses in the walk_addr code. This is later used to translate l2_gpa addresses into l1_gpa addresses. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:34 +02:00
Joerg Roedel	1e301feb07	KVM: MMU: Introduce generic walk_addr function This is the first patch in the series towards a generic walk_addr implementation which could walk two-dimensional page tables in the end. In this first step the walk_addr function is renamed into walk_addr_generic which takes a mmu context as an additional parameter. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:33 +02:00
Joerg Roedel	8df25a328a	KVM: MMU: Track page fault data in struct vcpu This patch introduces a struct with two new fields in vcpu_arch for x86: * fault.address * fault.error_code This will be used to correctly propagate page faults back into the guest when we could have either an ordinary page fault or a nested page fault. In the case of a nested page fault the fault-address is different from the original address that should be walked. So we need to keep track about the real fault-address. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:33 +02:00
Joerg Roedel	3241f22da8	KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu This patch changes is_rsvd_bits_set() function prototype to take only a kvm_mmu context instead of a full vcpu. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:32 +02:00
Joerg Roedel	52fde8df7d	KVM: MMU: Introduce kvm_init_shadow_mmu helper function Some logic of the init_kvm_softmmu function is required to build the Nested Nested Paging context. So factor the required logic into a seperate function and export it. Also make the whole init path suitable for more than one mmu context. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:32 +02:00
Joerg Roedel	cb659db8a7	KVM: MMU: Introduce inject_page_fault function pointer This patch introduces an inject_page_fault function pointer into struct kvm_mmu which will be used to inject a page fault. This will be used later when Nested Nested Paging is implemented. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:31 +02:00
Joerg Roedel	5777ed340d	KVM: MMU: Introduce get_cr3 function pointer This function pointer in the MMU context is required to implement Nested Nested Paging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:31 +02:00
Joerg Roedel	1c97f0a04c	KVM: X86: Introduce a tdp_set_cr3 function This patch introduces a special set_tdp_cr3 function pointer in kvm_x86_ops which is only used for tpd enabled mmu contexts. This allows to remove some hacks from svm code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:30 +02:00
Joerg Roedel	f43addd461	KVM: MMU: Make set_cr3 a function pointer in kvm_mmu This is necessary to implement Nested Nested Paging. As a side effect this allows some cleanups in the SVM nested paging code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:29 +02:00
Joerg Roedel	c5a78f2b64	KVM: MMU: Make tdp_enabled a mmu-context parameter This patch changes the tdp_enabled flag from its global meaning to the mmu-context and renames it to direct_map there. This is necessary for Nested SVM with emulation of Nested Paging where we need an extra MMU context to shadow the Nested Nested Page Table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:28 +02:00
Joerg Roedel	957446afce	KVM: MMU: Check for root_level instead of long mode The walk_addr function checks for !is_long_mode in its 64 bit version. But what is meant here is a check for pae paging. Change the condition to really check for pae paging so that it also works with nested nested paging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:27 +02:00
Jes Sorensen	7b91409822	KVM: x86: Emulate MSR_EBC_FREQUENCY_ID Some operating systems store data about the host processor at the time of installation, and when booted on a more uptodate cpu tries to read MSR_EBC_FREQUENCY_ID. This has been found with XP. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:27 +02:00
Roedel, Joerg	b75f4eb341	KVM: SVM: Clean up rip handling in vmrun emulation This patch changes the rip handling in the vmrun emulation path from using next_rip to the generic kvm register access functions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:25 +02:00
Joerg Roedel	cda0008299	KVM: SVM: Restore correct registers after sel_cr0 intercept emulation This patch implements restoring of the correct rip, rsp, and rax after the svm emulation in KVM injected a selective_cr0 write intercept into the guest hypervisor. The problem was that the vmexit is emulated in the instruction emulation which later commits the registers right after the write-cr0 instruction. So the l1 guest will continue to run with the l2 rip, rsp and rax resulting in unpredictable behavior. This patch is not the final word, it is just an easy patch to fix the issue. The real fix will be done when the instruction emulator is made aware of nested virtualization. Until this is done this patch fixes the issue and provides an easy way to fix this in -stable too. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:24 +02:00
Joerg Roedel	f87f928882	KVM: MMU: Fix 32 bit legacy paging with NPT This patch fixes 32 bit legacy paging with NPT enabled. The mmu_check_root call on the top-level of the loop causes root_gfn to take values (in the tdp_enabled path) which are outside of guest memory. So the mmu_check_root call fails at some point in the loop interation causing the guest to tiple-fault. This patch changes the mmu_check_root calls to the places where they are really necessary. As a side-effect it introduces a check for the root of a pae page table too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:23 +02:00
Xiao Guangrong	30644b902c	KVM: MMU: lower the aduit frequency The audit is very high overhead, so we need lower the frequency to assure the guest is running. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:59 +02:00
Xiao Guangrong	eb2591865a	KVM: MMU: improve spte audit Both audit_mappings() and audit_sptes_have_rmaps() need to walk vcpu's page table, so we can do these checking in a spte walking Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:58 +02:00
Xiao Guangrong	49edf87806	KVM: MMU: improve active sp audit Both audit_rmap() and audit_write_protection() need to walk all active sp, so we can do these checking in a sp walking Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:57 +02:00
Xiao Guangrong	2f4f337248	KVM: MMU: move audit to a separate file Move the audit code from arch/x86/kvm/mmu.c to arch/x86/kvm/mmu_audit.c Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:57 +02:00
Xiao Guangrong	8b1fe17cc7	KVM: MMU: support disable/enable mmu audit dynamicly Add a r/w module parameter named 'mmu_audit', it can control audit enable/disable: enable: echo 1 > /sys/module/kvm/parameters/mmu_audit disable: echo 0 > /sys/module/kvm/parameters/mmu_audit This patch not change the logic Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:56 +02:00
Jes Sorensen	84e0cefa8d	KVM: Fix guest kernel crash on MSR_K7_CLK_CTL MSR_K7_CLK_CTL is a no longer documented MSR, which is only relevant on said old AMD CPU models. This change returns the expected value, which the Linux kernel is expecting to avoid writing back the MSR, plus it ignores all writes to the MSR. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:55 +02:00
Avi Kivity	9ed049c3b6	KVM: i8259: Make ICW1 conform to spec ICW is not a full reset, instead it resets a limited number of registers in the PIC. Change ICW1 emulation to only reset those registers. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:54 +02:00
Avi Kivity	7d9ddaedd8	KVM: x86 emulator: clean up control flow in x86_emulate_insn() x86_emulate_insn() is full of things like if (rc != X86EMUL_CONTINUE) goto done; break; consolidate all of those at the end of the switch statement. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:54 +02:00
Avi Kivity	a4d4a7c188	KVM: x86 emulator: fix group 11 decoding for reg != 0 These are all undefined. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:53 +02:00
Avi Kivity	b9eac5f4d1	KVM: x86 emulator: use single stage decoding for mov instructions Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:52 +02:00
Avi Kivity	e90aa41e6c	KVM: Don't save/restore MSR_IA32_PERF_STATUS It is read/only; restoring it only results in annoying messages. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:51 +02:00
Marcelo Tosatti	eaa48512ba	KVM: SVM: init_vmcb should reset vcpu->efer Otherwise EFER_LMA bit is retained across a SIPI reset. Fixes guest cpu onlining. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:51 +02:00
Marcelo Tosatti	678041ad9d	KVM: SVM: reset mmu context in init_vmcb Since commit `aad827034e` no mmu reinitialization is performed via init_vmcb. Zero vcpu->arch.cr0 and pass the reset value as a parameter to kvm_set_cr0. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:50 +02:00
Avi Kivity	c41a15dd46	KVM: Fix pio trace direction out = write, in = read, not the other way round. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:49 +02:00
Xiao Guangrong	8e0e8afa82	KVM: MMU: remove count_rmaps() Nothing is checked in count_rmaps(), so remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:49 +02:00
Xiao Guangrong	365fb3fdf6	KVM: MMU: rewrite audit_mappings_page() function There is a bugs in this function, we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in atomic context(kvm_mmu_audit() is called under the spinlock(mmu_lock)'s protection). This patch fix it by: - introduce gfn_to_pfn_atomic instead of gfn_to_pfn - get the mapping gfn from kvm_mmu_page_get_gfn() And it adds 'notrap' ptes check in unsync/direct sps Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:48 +02:00
Xiao Guangrong	bc32ce2152	KVM: MMU: fix wrong not write protected sp report The audit code reports some sp not write protected in current code, it's just the bug in audit_write_protection(), since: - the invalid sp not need write protected - using uninitialize local variable('gfn') - call kvm_mmu_audit() out of mmu_lock's protection Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:47 +02:00
Xiao Guangrong	0beb8d6604	KVM: MMU: check rmap for every spte The read-only spte also has reverse mapping, so fix the code to check them, also modify the function name to fit its doing Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:46 +02:00
Xiao Guangrong	9ad17b1001	KVM: MMU: fix compile warning in audit code fix: arch/x86/kvm/mmu.c: In function ‘kvm_mmu_unprotect_page’: arch/x86/kvm/mmu.c:1741: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c:1745: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c: In function ‘mmu_unshadow’: arch/x86/kvm/mmu.c:1761: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c: In function ‘set_spte’: arch/x86/kvm/mmu.c:2005: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c: In function ‘mmu_set_spte’: arch/x86/kvm/mmu.c:2033: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 7 has type ‘gfn_t’ Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:46 +02:00
Jason Wang	23e7a7944f	KVM: pit: Do not check pending pit timer in vcpu thread Pit interrupt injection was done by workqueue, so no need to check pending pit timer in vcpu thread which could lead unnecessary unblocking of vcpu. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:45 +02:00
Avi Kivity	6230f7fc04	KVM: x86 emulator: simplify ALU opcode block decode further The ALU opcode block is very regular; introduce D6ALU() to define decode flags for 6 instructions at a time. Suggested by Paolo Bonzini. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:43 +02:00
Avi Kivity	217fc9cfca	KVM: Fix build error due to 64-bit division in nsec_to_cycles() Use do_div() instead. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:43 +02:00
Avi Kivity	34d1f4905e	KVM: x86 emulator: trap and propagate #DE from DIV and IDIV Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:42 +02:00
Avi Kivity	f6b3597bde	KVM: x86 emulator: add macros for executing instructions that may trap Like DIV and IDIV. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:41 +02:00
Avi Kivity	739ae40606	KVM: x86 emulator: simplify instruction decode flags for opcodes 0F 00-FF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:41 +02:00
Avi Kivity	d269e3961a	KVM: x86 emulator: simplify instruction decode flags for opcodes E0-FF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:40 +02:00
Avi Kivity	d2c6c7adb1	KVM: x86 emulator: simplify instruction decode flags for opcodes C0-DF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:39 +02:00
Avi Kivity	50748613d1	KVM: x86 emulator: simplify instruction decode flags for opcodes A0-AF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:38 +02:00
Avi Kivity	76e8e68d44	KVM: x86 emulator: simplify instruction decode flags for opcodes 80-8F Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:38 +02:00
Avi Kivity	48fe67b5f7	KVM: x86 emulator: simplify string instruction decode flags Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:37 +02:00
Avi Kivity	5315fbb223	KVM: x86 emulator: simplify ALU block (opcodes 00-3F) decode flags Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:36 +02:00
Avi Kivity	8d8f4e9f66	KVM: x86 emulator: support byte/word opcode pairs Many x86 instructions come in byte and word variants distinguished with bit 0 of the opcode. Add macros to aid in defining them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:35 +02:00
Avi Kivity	081bca0e6b	KVM: x86 emulator: refuse SrcMemFAddr (e.g. LDS) with register operand SrcMemFAddr is not defined with the modrm operand designating a register instead of a memory address. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:35 +02:00
Gleb Natapov	d2ddd1c483	KVM: x86 emulator: get rid of "restart" in emulation context. x86_emulate_insn() will return 1 if instruction can be restarted without re-entering a guest. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:34 +02:00
Gleb Natapov	3e2f65d57a	KVM: x86 emulator: move string instruction completion check into separate function Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:33 +02:00
Gleb Natapov	6e2fb2cadd	KVM: x86 emulator: Rename variable that shadows another local variable. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:32 +02:00
Wei Yongjun	cc4feed57f	KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:31 +02:00
Xiao Guangrong	189be38db3	KVM: MMU: combine guest pte read between fetch and pte prefetch Combine guest pte read between guest pte check in the fetch path and pte prefetch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:28 +02:00
Xiao Guangrong	957ed9effd	KVM: MMU: prefetch ptes when intercepted guest #PF Support prefetch ptes when intercept guest #PF, avoid to #PF by later access If we meet any failure in the prefetch path, we will exit it and not try other ptes to avoid become heavy path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:27 +02:00
Zachary Amsden	1d5f066e0b	KVM: x86: Fix a possible backwards warp of kvmclock Kernel time, which advances in discrete steps may progress much slower than TSC. As a result, when kvmclock is adjusted to a new base, the apparent time to the guest, which runs at a much higher, nsec scaled rate based on the current TSC, may have already been observed to have a larger value (kernel_ns + scaled tsc) than the value to which we are setting it (kernel_ns + 0). We must instead compute the clock as potentially observed by the guest for kernel_ns to make sure it does not go backwards. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Zachary Amsden	ca84d1a24c	KVM: x86: Add clock sync request to hardware enable If there are active VCPUs which are marked as belonging to a particular hardware CPU, request a clock sync for them when enabling hardware; the TSC could be desynchronized on a newly arriving CPU, and we need to recompute guests system time relative to boot after a suspend event. This covers both cases. Note that it is acceptable to take the spinlock, as either no other tasks will be running and no locks held (BSP after resume), or other tasks will be guaranteed to drop the lock relatively quickly (AP on CPU_STARTING). Noting we now get clock synchronization requests for VCPUs which are starting up (or restarting), it is tempting to attempt to remove the arch/x86/kvm/x86.c CPU hot-notifiers at this time, however it is not correct to do so; they are required for systems with non-constant TSC as the frequency may not be known immediately after the processor has started until the cpufreq driver has had a chance to run and query the chipset. Updated: implement better locking semantics for hardware_enable Removed the hack of dropping and retaking the lock by adding the semantic that we always hold kvm_lock when hardware_enable is called. The one place that doesn't need to worry about it is resume, as resuming a frozen CPU, the spinlock won't be taken. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Zachary Amsden	46543ba45f	KVM: x86: Robust TSC compensation Make the match of TSC find TSC writes that are close to each other instead of perfectly identical; this allows the compensator to also work in migration / suspend scenarios. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	759379dd68	KVM: x86: Add helper functions for time computation Add a helper function to compute the kernel time and convert nanoseconds back to CPU specific cycles. Note that these must not be called in preemptible context, as that would mean the kernel could enter software suspend state, which would cause non-atomic operation. Also, convert the KVM_SET_CLOCK / KVM_GET_CLOCK ioctls to use the kernel time helper, these should be bootbased as well. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	48434c20e1	KVM: x86: Fix deep C-state TSC desynchronization When CPUs with unstable TSCs enter deep C-state, TSC may stop running. This causes us to require resynchronization. Since we can't tell when this may potentially happen, we assume the worst by forcing re-compensation for it at every point the VCPU task is descheduled. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	e48672fa25	KVM: x86: Unify TSC logic Move the TSC control logic from the vendor backends into x86.c by adding adjust_tsc_offset to x86 ops. Now all TSC decisions can be done in one place. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	6755bae8e6	KVM: x86: Warn about unstable TSC If creating an SMP guest with unstable host TSC, issue a warning Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	8cfdc00085	KVM: x86: Make cpu_tsc_khz updates use local CPU This simplifies much of the init code; we can now simply always call tsc_khz_changed, optionally passing it a new value, or letting it figure out the existing value (while interrupts are disabled, and thus, by inference from the rule, not raceful against CPU hotplug or frequency updates, which will issue IPIs to the local CPU to perform this very same task). Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	f38e098ff3	KVM: x86: TSC reset compensation Attempt to synchronize TSCs which are reset to the same value. In the case of a reliable hardware TSC, we can just re-use the same offset, but on non-reliable hardware, we can get closer by adjusting the offset to match the elapsed time. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	99e3e30aee	KVM: x86: Move TSC offset writes to common code Also, ensure that the storing of the offset and the reading of the TSC are never preempted by taking a spinlock. While the lock is overkill now, it is useful later in this patch series. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	f4e1b3c8bd	KVM: x86: Convert TSC writes to TSC offset writes Change svm / vmx to be the same internally and write TSC offset instead of bare TSC in helper functions. Isolated as a single patch to contain code movement. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	ae38436b78	KVM: x86: Drop vm_init_tsc This is used only by the VMX code, and is not done properly; if the TSC is indeed backwards, it is out of sync, and will need proper handling in the logic at each and every CPU change. For now, drop this test during init as misguided. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:21 +02:00
Wei Yongjun	45bf21a8ce	KVM: MMU: fix missing percpu counter destroy commit ad05c88266b4cce1c820928ce8a0fb7690912ba1 (KVM: create aggregate kvm_total_used_mmu_pages value) introduce percpu counter kvm_total_used_mmu_pages but never destroy it, this may cause oops when rmmod & modprobe. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:21 +02:00
Xiaotian Feng	80b63faf02	KVM: MMU: fix regression from rework mmu_shrink() code Latest kvm mmu_shrink code rework makes kernel changes kvm->arch.n_used_mmu_pages/ kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page, which is called by kvm_mmu_commit_zap_page. So the kvm->arch.n_used_mmu_pages or kvm_mmu_available_pages(vcpu->kvm) is unchanged after kvm_mmu_prepare_zap_page(), This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages loops forever. Moving kvm_mmu_commit_zap_page would make the while loop performs as normal. Reported-by: Avi Kivity <avi@redhat.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Tested-by: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:21 +02:00
Wei Yongjun	e4abac67b7	KVM: x86 emulator: add JrCXZ instruction emulation Add JrCXZ instruction emulation (opcode 0xe3) Used by FreeBSD boot loader. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:20 +02:00
Wei Yongjun	09b5f4d3c4	KVM: x86 emulator: add LDS/LES/LFS/LGS/LSS instruction emulation Add LDS/LES/LFS/LGS/LSS instruction emulation. (opcode 0xc4, 0xc5, 0x0f 0xb2, 0x0f 0xb4~0xb5) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:20 +02:00
Dave Hansen	45221ab668	KVM: create aggregate kvm_total_used_mmu_pages value Of slab shrinkers, the VM code says: * Note that 'shrink' will be passed nr_to_scan == 0 when the VM is * querying the cache size, so a fastpath for that case is appropriate. and it means it. Look at how it calls the shrinkers: nr_before = (shrinker->shrink)(0, gfp_mask); shrink_ret = (shrinker->shrink)(this_scan, gfp_mask); So, if you do anything stupid in your shrinker, the VM will doubly punish you. The mmu_shrink() function takes the global kvm_lock, then acquires every VM's kvm->mmu_lock in sequence. If we have 100 VMs, then we're going to take 101 locks. We do it twice, so each call takes 202 locks. If we're under memory pressure, we can have each cpu trying to do this. It can get really hairy, and we've seen lock spinning in mmu_shrink() be the dominant entry in profiles. This is guaranteed to optimize at least half of those lock aquisitions away. It removes the need to take any of the locks when simply trying to count objects. A 'percpu_counter' can be a large object, but we only have one of these for the entire system. There are not any better alternatives at the moment, especially ones that handle CPU hotplug. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:19 +02:00
Dave Hansen	49d5ca2663	KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pages Doing this makes the code much more readable. That's borne out by the fact that this patch removes code. "used" also happens to be the number that we need to return back to the slab code when our shrinker gets called. Keeping this value as opposed to free makes the next patch simpler. So, 'struct kvm' is kzalloc()'d. 'struct kvm_arch' is a structure member (and not a pointer) of 'struct kvm'. That means they start out zeroed. I _think_ they get initialized properly by kvm_mmu_change_mmu_pages(). But, that only happens via kvm ioctls. Another benefit of storing 'used' intead of 'free' is that the values are consistent from the moment the structure is allocated: no negative "used" value. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:18 +02:00
Dave Hansen	39de71ec53	KVM: rename x86 kvm->arch.n_alloc_mmu_pages arch.n_alloc_mmu_pages is a poor choice of name. This value truly means, "the number of pages which _may_ be allocated". But, reading the name, "n_alloc_mmu_pages" implies "the number of allocated mmu pages", which is dead wrong. It's really the high watermark, so let's give it a name to match: nr_max_mmu_pages. This change will make the next few patches much more obvious and easy to read. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:18 +02:00
Dave Hansen	e0df7b9f6c	KVM: abstract kvm x86 mmu->n_free_mmu_pages "free" is a poor name for this value. In this context, it means, "the number of mmu pages which this kvm instance should be able to allocate." But "free" implies much more that the objects are there and ready for use. "available" is a much better description, especially when you see how it is calculated. In this patch, we abstract its use into a function. We'll soon replace the function's contents by calculating the value in a different way. All of the reads of n_free_mmu_pages are taken care of in this patch. The modification sites will be handled in a patch later in the series. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:17 +02:00
Avi Kivity	6142914280	KVM: x86 emulator: implement CWD (opcode 99) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:16 +02:00
Avi Kivity	d46164dbd9	KVM: x86 emulator: implement IMUL REG, R/M, IMM (opcode 69) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:16 +02:00
Avi Kivity	7db41eb762	KVM: x86 emulator: add Src2Imm decoding Needed for 3-operand IMUL. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:15 +02:00
Avi Kivity	39f21ee546	KVM: x86 emulator: consolidate immediate decode into a function Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:14 +02:00
Avi Kivity	48bb5d3c40	KVM: x86 emulator: implement RDTSC (opcode 0F 31) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:14 +02:00
Avi Kivity	7077aec0bc	KVM: x86 emulator: remove SrcImplicit Useless. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:13 +02:00
Avi Kivity	5c82aa2998	KVM: x86 emulator: implement IMUL REG, R/M (opcode 0F AF) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	f3a1b9f496	KVM: x86 emulator: implement IMUL REG, R/M, imm8 (opcode 6B) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	40ece7c729	KVM: x86 emulator: implement RET imm16 (opcode C2) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	b250e60589	KVM: x86 emulator: add SrcImmU16 operand type Used for RET NEAR instructions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	0ef753b8c3	KVM: x86 emulator: implement CALL FAR (FF /3) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	7af04fc05c	KVM: x86 emulator: implement DAS (opcode 2F) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Avi Kivity	fb2c264105	KVM: x86 emulator: Use a register for ____emulate_2op() destination Most x86 two operand instructions allow the destination to be a memory operand, but IMUL (for example) requires that the destination be a register. Change ____emulate_2op() to take a register for both source and destination so we can invoke IMUL. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Avi Kivity	b3b3d25a12	KVM: x86 emulator: pass destination type to ____emulate_2op() We'll need it later so we can use a register for the destination. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Wei Yongjun	f2f3184534	KVM: x86 emulator: add LOOP/LOOPcc instruction emulation Add LOOP/LOOPcc instruction emulation (opcode 0xe0~0xe2). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Wei Yongjun	e8b6fa70e3	KVM: x86 emulator: add CBW/CWDE/CDQE instruction emulation Add CBW/CWDE/CDQE instruction emulation.(opcode 0x98) Used by FreeBSD's boot loader. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Avi Kivity	0fa6ccbd28	KVM: x86 emulator: fix REPZ/REPNZ termination condition EFLAGS.ZF needs to be checked after each iteration, not before. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:10 +02:00
Avi Kivity	f6b33fc504	KVM: x86 emulator: implement SCAS (opcodes AE, AF) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:10 +02:00
Avi Kivity	5c56e1cf7a	KVM: x86 emulator: fix INTn emulation not pushing EFLAGS and CS emulate_push() only schedules a push; it doesn't actually push anything. Call writeback() to flush out the write. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	a13a63faa6	KVM: x86 emulator: remove dup code of in/out instruction Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	41167be544	KVM: x86 emulator: change OUT instruction to use dst instead of src Change OUT instruction to use dst instead of src, so we can reuse those code for all out instructions. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	943858e275	KVM: x86 emulator: introduce DstImmUByte for dst operand decode Introduce DstImmUByte for dst operand decode, which will be used for out instruction. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	c483c02ad3	KVM: x86 emulator: remove useless label from x86_emulate_insn() Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	ee45b58efe	KVM: x86 emulator: add setcc instruction emulation Add setcc instruction emulation (opcode 0x0f 0x90~0x9f) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:08 +02:00
Wei Yongjun	92f738a52b	KVM: x86 emulator: add XADD instruction emulation Add XADD instruction emulation (opcode 0x0f 0xc0~0xc1) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:06 +02:00
Wei Yongjun	31be40b398	KVM: x86 emulator: put register operand write back to a function Introduce function write_register_operand() to write back the register operand. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:06 +02:00
Mohammed Gamal	8ec4722dd2	KVM: Separate emulation context initialization in a separate function The code for initializing the emulation context is duplicated at two locations (emulate_instruction() and kvm_task_switch()). Separate it in a separate function and call it from there. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Wei Yongjun	d9574a25af	KVM: x86 emulator: add bsf/bsr instruction emulation Add bsf/bsr instruction emulation (opcode 0x0f 0xbc~0xbd) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Mohammed Gamal	8c5eee30a9	KVM: x86 emulator: Fix emulate_grp3 return values This patch lets emulate_grp3() return X86EMUL_* return codes instead of hardcoded ones. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Mohammed Gamal	3f9f53b0d5	KVM: x86 emulator: Add unary mul, imul, div, and idiv instructions This adds unary mul, imul, div, and idiv instructions (group 3 r/m 4-7). Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Wei Yongjun	ba7ff2b76d	KVM: x86 emulator: mask group 8 instruction as BitOp Mask group 8 instruction as BitOp, so we can share the code for adjust the source operand. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:03 +02:00
Wei Yongjun	3885f18fe3	KVM: x86 emulator: do not adjust the address for immediate source adjust the dst address for a register source but not adjust the address for an immediate source. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:02 +02:00
Wei Yongjun	35c843c485	KVM: x86 emulator: fix negative bit offset BitOp instruction emulation If bit offset operands is a negative number, BitOp instruction will return wrong value. This patch fix it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:01 +02:00
Mohammed Gamal	8744aa9aad	KVM: x86 emulator: Add stc instruction (opcode 0xf9) Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:01 +02:00
Wei Yongjun	c034da8b92	KVM: x86 emulator: using SrcOne for instruction d0/d1 decoding Using SrcOne for instruction d0/d1 decoding. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Wei Yongjun	36089fed70	KVM: x86 emulator: disable writeback when decode dest operand This patch change to disable writeback when decode dest operand if the dest type is ImplicitOps or not specified. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Wei Yongjun	06cb704611	KVM: x86 emulator: use SrcAcc to simplify stos decoding Use SrcAcc to simplify stos decoding. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Mohammed Gamal	6e154e56b4	KVM: x86 emulator: Add into, int, and int3 instructions (opcodes 0xcc-0xce) This adds support for int instructions to the emulator. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Mohammed Gamal	160ce1f1a8	KVM: x86 emulator: Allow accessing IDT via emulator ops The patch adds a new member get_idt() to x86_emulate_ops. It also adds a function to get the idt in order to be used by the emulator. This is needed for real mode interrupt injection and the emulation of int instructions. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:59 +02:00
Wei Yongjun	d3ad624329	KVM: x86 emulator: simplify two-byte opcode check Two-byte opcode always start with 0x0F and the decode flags of opcode 0xF0 is always 0, so remove dup check. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:59 +02:00
Mohammed Gamal	34698d8c61	KVM: x86 emulator: Fix nop emulation If a nop instruction is encountered, we jump directly to the done label. This skip updating rip. Break from the switch case instead Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:41 +02:00
Avi Kivity	2dbd0dd711	KVM: x86 emulator: Decode memory operands directly into a 'struct operand' Since modrm operand can be either register or memory, decoding it into a 'struct operand', which can represent both, is simpler. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:40 +02:00
Avi Kivity	1f6f05800e	KVM: x86 emulator: change invlpg emulation to use src.mem.addr Instead of using modrm_ea, which will soon be gone. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:39 +02:00
Avi Kivity	342fc63095	KVM: x86 emulator: switch LEA to use SrcMem decoding The NoAccess flag will prevent memory from being accessed. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:38 +02:00
Avi Kivity	5a506b125f	KVM: x86 emulator: add NoAccess flag for memory instructions that skip access Use for INVLPG, which accesses the tlb, not memory. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:37 +02:00
Avi Kivity	b27f38563d	KVM: x86 emulator: use struct operand for mov reg,dr and mov dr,reg for reg op This is an ordinary modrm source or destination; use the standard structure representing it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:36 +02:00
Avi Kivity	1a0c7d44e4	KVM: x86 emulator: use struct operand for mov reg,cr and mov cr,reg for reg op This is an ordinary modrm source or destination; use the standard structure representing it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:35 +02:00
Avi Kivity	cecc9e3916	KVM: x86 emulator: mark mov cr and mov dr as 64-bit instructions in long mode Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:35 +02:00
Avi Kivity	7f9b4b75be	KVM: x86 emulator: introduce Op3264 for mov cr and mov dr instructions The operands for these instructions are 32 bits or 64 bits, depending on long mode, and ignoring REX prefixes, or the operand size prefix. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:35 +02:00
Avi Kivity	1e87e3efe7	KVM: x86 emulator: simplify REX.W check (x && (x & y)) == (x & y) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:34 +02:00
Avi Kivity	d4709c78ee	KVM: x86 emulator: drop use_modrm_ea Unused (and has never been). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:34 +02:00
Avi Kivity	91ff3cb43c	KVM: x86 emulator: put register operand fetch into a function The code is repeated three times, put it into fetch_register_operand() Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	3d9e77dff8	KVM: x86 emulator: use SrcAcc to simplify xchg decoding Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	4515453964	KVM: x86 emulator: simplify xchg decode tables Use X8() to avoid repetition. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	1a6440aef6	KVM: x86 emulator: use correct type for memory address in operands Currently we use a void pointer for memory addresses. That's wrong since these are guest virtual addresses which are not directly dereferencable by the host. Use the correct type, unsigned long. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	09ee57cdae	KVM: x86 emulator: push segment override out of decode_modrm() Let it compute modrm_seg instead, and have the caller apply it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Joerg Roedel	dbe7758482	KVM: SVM: Check for asid != 0 on nested vmrun This patch lets a nested vmrun fail if the L1 hypervisor left the asid zero. This fixes the asid_zero unit test. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:32 +02:00
Joerg Roedel	52c65a30a5	KVM: SVM: Check for nested vmrun intercept before emulating vmrun This patch lets the nested vmrun fail if the L1 hypervisor has not intercepted vmrun. This fixes the "vmrun intercept check" unit test. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:32 +02:00
Xiao Guangrong	4132779b17	KVM: MMU: mark page dirty only when page is really written Mark page dirty only when this page is really written, it's more exacter, and also can fix dirty page marking in speculation path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:32 +02:00
Xiao Guangrong	8672b7217a	KVM: MMU: move bits lost judgement into a separate function Introduce spte_has_volatile_bits() function to judge whether spte bits will miss, it's more readable and can help us to cleanup code later Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:31 +02:00
Xiao Guangrong	251464c464	KVM: MMU: using kvm_set_pfn_accessed() instead of mark_page_accessed() It's a small cleanup that using using kvm_set_pfn_accessed() instead of mark_page_accessed() Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:30 +02:00
Gleb Natapov	4fc40f076f	KVM: x86 emulator: check io permissions only once for string pio Do not recheck io permission on every iteration. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:29 +02:00
Avi Kivity	9928ff608b	KVM: x86 emulator: fix LMSW able to clear cr0.pe LMSW is documented not to be able to clear cr0.pe; make it so. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:28 +02:00
Gleb Natapov	e85d28f8e8	KVM: x86 emulator: don't update vcpu state if instruction is restarted No need to update vcpu state since instruction is in the middle of the emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:27 +02:00
Avi Kivity	63540382cc	KVM: x86 emulator: convert some push instructions to direct decode Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:26 +02:00
Avi Kivity	d0e533255d	KVM: x86 emulator: allow repeat macro arguments to contain commas Needed for repeating instructions with execution functions. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:25 +02:00
Avi Kivity	73fba5f4fe	KVM: x86 emulator: move decode tables downwards So they can reference execution functions. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:25 +02:00
Avi Kivity	dde7e6d12a	KVM: x86 emulator: move x86_decode_insn() downwards No code changes. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:24 +02:00
Avi Kivity	ef65c88912	KVM: x86 emulator: allow storing emulator execution function in decode tables Instead of looking up the opcode twice (once for decode flags, once for the big execution switch) look up both flags and function in the decode tables. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:22 +02:00
Avi Kivity	9aabc88fc8	KVM: x86 emulator: store x86_emulate_ops in emulation context It doesn't ever change, so we don't need to pass it around everywhere. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:21 +02:00
Avi Kivity	ab85b12b1a	KVM: x86 emulator: move ByteOp and Dst back to bits 0:3 Now that the group index no longer exists, the space is free. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:20 +02:00
Avi Kivity	3885d530b0	KVM: x86 emulator: drop support for old-style groups Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:19 +02:00
Avi Kivity	9f5d3220e3	KVM: x86 emulator: convert group 9 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:18 +02:00
Avi Kivity	2cb20bc8af	KVM: x86 emulator: convert group 8 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:18 +02:00
Avi Kivity	2f3a9bc9eb	KVM: x86 emulator: convert group 7 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:16 +02:00
Avi Kivity	b67f9f0741	KVM: x86 emulator: convert group 5 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:15 +02:00
Avi Kivity	591c9d20a3	KVM: x86 emulator: convert group 4 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:14 +02:00
Avi Kivity	ee70ea30ee	KVM: x86 emulator: convert group 3 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:13 +02:00
Avi Kivity	99880c5cd5	KVM: x86 emulator: convert group 1A to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:12 +02:00
Avi Kivity	5b92b5faff	KVM: x86 emulator: convert group 1 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:11 +02:00
Avi Kivity	120df8902d	KVM: x86 emulator: allow specifying group directly in opcode Instead of having a group number, store the group table pointer directly in the opcode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:10 +02:00
Avi Kivity	793d5a8d6b	KVM: x86 emulator: reserve group code 0 We'll be using that to distinguish between new-style and old-style groups. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:09 +02:00
Avi Kivity	42a1c52095	KVM: x86 emulator: move group tables to top No code changes. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:08 +02:00
Avi Kivity	fd853310a1	KVM: x86 emulator: Add wrappers for easily defining opcodes Once 'struct opcode' grows, its initializer will become more complicated. Wrap the simple initializers in a D() macro, and replace the empty initializers with an even simpler N macro. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:08 +02:00
Avi Kivity	d65b1dee40	KVM: x86 emulator: introduce 'struct opcode' This will hold all the information known about the opcode. Currently, this is just the decode flags. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:07 +02:00
Avi Kivity	ea9ef04e19	KVM: x86 emulator: drop parentheses in repreat macros The parenthese make is impossible to use the macros with initializers that require braces. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:06 +02:00
Mohammed Gamal	62bd430e6d	KVM: x86 emulator: Add IRET instruction Ths patch adds IRET instruction (opcode 0xcf). Currently, only IRET in real mode is emulated. Protected mode support is to be added later if needed. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Reviewed-by: Avi Kivity <avi@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:05 +02:00
Joerg Roedel	7a190667bb	KVM: SVM: Emulate next_rip svm feature This patch implements the emulations of the svm next_rip feature in the nested svm implementation in kvm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:04 +02:00
Joerg Roedel	3f6a9d1693	KVM: SVM: Sync efer back into nested vmcb This patch fixes a bug in a nested hypervisor that heavily switches between real-mode and long-mode. The problem is fixed by syncing back efer into the guest vmcb on emulated vmexit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:03 +02:00
Xiao Guangrong	19ada5c4b6	KVM: MMU: remove valueless output message After commit 53383eaad08d, the '*spte' has updated before call rmap_remove()(in most case it's 'shadow_trap_nonpresent_pte'), so remove this information from error message Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:02 +02:00
Avi Kivity	d359192fea	KVM: VMX: Use host_gdt variable wherever we need the host gdt Now that we have the host gdt conveniently stored in a variable, make use of it instead of querying the cpu. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:01 +02:00
Avi Kivity	e071edd5ba	KVM: x86 emulator: unify the two Group 3 variants Use just one group table for byte (F6) and word (F7) opcodes. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:00 +02:00
Avi Kivity	dfe11481d8	KVM: x86 emulator: Allow LOCK prefix for NEG and NOT Opcodes F6/2, F6/3, F7/2, F7/3. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:59 +02:00
Avi Kivity	4968ec4e26	KVM: x86 emulator: simplify Group 1 decoding Move operand decoding to the opcode table, keep lock decoding in the group table. This allows us to get consolidate the four variants of Group 1 into one group. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:58 +02:00
Avi Kivity	52811d7de5	KVM: x86 emulator: mix decode bits from opcode and group decode tables Allow bits that are common to all members of a group to be specified in the opcode table instead of the group table. This allows some simplification of the decode tables. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:58 +02:00
Avi Kivity	047a481809	KVM: x86 emulator: add Undefined decode flag Add a decode flag to indicate the instruction is invalid. Will come in useful later, when we mix decode bits from the opcode and group table. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:57 +02:00
Avi Kivity	2ce495365f	KVM: x86 emulator: Make group storage bits separate from operand bits Currently group bits are stored in bits 0:7, where operand bits are stored. Make group bits be 0:3, and move the existing bits 0:3 to 16:19, so we can mix group and operand bits. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:55 +02:00
Avi Kivity	880a188378	KVM: x86 emulator: consolidate Jcc rel32 decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:55 +02:00
Avi Kivity	be8eacddbd	KVM: x86 emulator: consolidate CMOVcc decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:53 +02:00
Avi Kivity	b6e6153885	KVM: x86 emulator: consolidate MOV reg, imm decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:52 +02:00
Avi Kivity	b3ab3405fe	KVM: x86 emulator: consolidate Jcc rel8 decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:51 +02:00
Avi Kivity	3849186c38	KVM: x86 emulator: consolidate push/pop reg decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:49 +02:00
Avi Kivity	749358a6b4	KVM: x86 emulator: consolidate inc/dec reg decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:48 +02:00
Avi Kivity	83babbca46	KVM: x86 emulator: add macros for repetitive instructions Some instructions are repetitive in the opcode space, add macros for consolidating them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:48 +02:00
Avi Kivity	91269b8f94	KVM: x86 emulator: fix handling for unemulated instructions If an instruction is present in the decode tables but not in the execution switch, it will be emulated as a NOP. An example is IRET (0xcf). Fix by adding default: labels to the execution switches. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:47 +02:00
Linus Torvalds	d60a2793ba	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Remove stale pmtimer_64.c x86, cleanups: Use clear_page/copy_page rather than memset/memcpy x86: Remove unnecessary #ifdef ACPI/X86_IO_ACPI x86, cleanup: Remove obsolete boot_cpu_id variable	2010-10-21 13:18:06 -07:00
Linus Torvalds	2f0384e5fc	Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, amd_nb: Enable GART support for AMD family 0x15 CPUs x86, amd: Use compute unit information to determine thread siblings x86, amd: Extract compute unit information for AMD CPUs x86, amd: Add support for CPUID topology extension of AMD CPUs x86, nmi: Support NMI watchdog on newer AMD CPU families x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB x86, k8-gart: Decouple handling of garts and northbridges x86, cacheinfo: Fix dependency of AMD L3 CID x86, kvm: add new AMD SVM feature bits x86, cpu: Fix allowed CPUID bits for KVM guests x86, cpu: Update AMD CPUID feature bits x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit x86, AMD: Remove needless CPU family check (for L3 cache info) x86, tsc: Remove CPU frequency calibration on AMD	2010-10-21 13:01:08 -07:00
Avi Kivity	9581d442b9	KVM: Fix fs/gs reload oops with invalid ldt kvm reloads the host's fs and gs blindly, however the underlying segment descriptors may be invalid due to the user modifying the ldt after loading them. Fix by using the safe accessors (loadsegment() and load_gs_index()) instead of home grown unsafe versions. This is CVE-2010-3698. KVM-Stable-Tag. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-19 14:21:45 -02:00
Zachary Amsden	47008cd887	KVM: x86: Move TSC reset out of vmcb_init The VMCB is reset whenever we receive a startup IPI, so Linux is setting TSC back to zero happens very late in the boot process and destabilizing the TSC. Instead, just set TSC to zero once at VCPU creation time. Why the separate patch? So git-bisect is your friend. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-11 12:36:07 +02:00
Zachary Amsden	58877679fd	KVM: x86: Fix SVM VMCB reset On reset, VMCB TSC should be set to zero. Instead, code was setting tsc_offset to zero, which passes through the underlying TSC. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-11 12:36:07 +02:00
H. Peter Anvin	86ffb08519	Merge remote branch 'origin/x86/cpu' into x86/amd-nb	2010-10-01 16:18:11 -07:00
Jan Beulich	234bb549ee	x86, cleanups: Use clear_page/copy_page rather than memset/memcpy When operating on whole pages, use clear_page() and copy_page() in favor of memset() and memcpy(); after all that's what they are intended for. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4C7FB8CA0200007800013F51@vpn.id2.novell.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-22 15:36:49 -07:00
Andre Przywara	6d886fd042	x86, cpu: Fix allowed CPUID bits for KVM guests The AMD extensions to AVX (FMA4, XOP) work on the same YMM register set as AVX, so they are safe for guests to use, as long as AVX itself is allowed. Add F16C and AES on the way for the same reasons. Signed-off-by: Andre Przywara <andre.przywara@amd.com> LKML-Reference: <1283778860-26843-4-git-send-email-andre.przywara@amd.com> Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-08 13:34:15 -07:00
Andre Przywara	7ef8aa72ab	x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit The AMD SSE5 feature set as-it has been replaced by some extensions to the AVX instruction set. Thus the bit formerly advertised as SSE5 is re-used for one of these extensions (XOP). Although this changes the /proc/cpuinfo output, it is not user visible, as there are no CPUs (yet) having this feature. To avoid confusion this should be added to the stable series, too. Cc: stable@kernel.org [.32.x .34.x, .35.x] Signed-off-by: Andre Przywara <andre.przywara@amd.com> LKML-Reference: <1283778860-26843-2-git-send-email-andre.przywara@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-08 13:32:55 -07:00
Gleb Natapov	eebb5f31b8	KVM: i8259: fix migration Top of kvm_kpic_state structure should have the same memory layout as kvm_pic_state since it is copied by memcpy. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-09-08 14:50:58 -03:00
Avi Kivity	ae0635b358	KVM: fix i8259 oops when no vcpus are online If there are no vcpus, found will be NULL. Check before doing anything with it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-09-08 14:50:56 -03:00
Avi Kivity	16518d5ada	KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts operand::val and operand::orig_val are 32-bit on i386, whereas cmpxchg8b operands are 64-bit. Fix by adding val64 and orig_val64 union members to struct operand, and using them where needed. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-09-08 14:50:55 -03:00
Linus Torvalds	3dc8d7f07e	Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PIT: free irq source id in handling error path KVM: destroy workqueue on kvm_create_pit() failures KVM: fix poison overwritten caused by using wrong xstate size	2010-08-22 11:27:36 -07:00
Xiao Guangrong	6b5d7a9f6f	KVM: PIT: free irq source id in handling error path Free irq source id if create pit workqueue fail Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-17 12:04:23 +03:00
Xiaotian Feng	3185bf8c23	KVM: destroy workqueue on kvm_create_pit() failures kernel needs to destroy workqueue if kvm_create_pit() fails, otherwise after pit is freed, the workqueue is leaked. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-15 14:17:35 +03:00
Xiaotian Feng	f45755b834	KVM: fix poison overwritten caused by using wrong xstate size fpu.state is allocated from task_xstate_cachep, the size of task_xstate_cachep is xstate_size. xstate_size is set from cpuid instruction, which is often smaller than sizeof(struct xsave_struct). kvm is using sizeof(struct xsave_struct) to fill in/out fpu.state.xsave, as what we allocated for fpu.state is xstate_size, kernel will write out of memory and caused poison/redzone/padding overwritten warnings. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Reviewed-by: Sheng Yang <sheng@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Sheng Yang <sheng@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-15 14:10:15 +03:00
H. Peter Anvin	7645e43204	x86, kvm: Remove cast obsoleted by set_64bit() prototype cleanup KVM ended up having to put a pretty ugly wrapper around set_64bit() in order to get the type right. Now set_64bit() takes the expected u64 type, and this wrapper can be cleaned up. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Avi Kivity <avi@redhat.com> LKML-Reference: <4C5C4E7A.8040603@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-06 13:07:19 -07:00
Linus Torvalds	d9a73c0016	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: um, x86: Cast to (u64 *) inside set_64bit() x86-32, asm: Directly access per-cpu GDT x86-64, asm: Directly access per-cpu IST x86, asm: Merge cmpxchg_486_u64() and cmpxchg8b_emu() x86, asm: Move cmpxchg emulation code to arch/x86/lib x86, asm: Clean up and simplify <asm/cmpxchg.h> x86, asm: Clean up and simplify set_64bit() x86: Add memory modify constraints to xchg() and cmpxchg() x86-64: Simplify loading initial_gs x86: Use symbolic MSR names x86: Remove redundant K6 MSRs	2010-08-06 10:07:34 -07:00
Linus Torvalds	0f477dd085	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix keeping track of AMD C1E x86, cpu: Package Level Thermal Control, Power Limit Notification definitions x86, cpu: Export AMD errata definitions x86, cpu: Use AMD errata checking framework for erratum 383 x86, cpu: Clean up AMD erratum 400 workaround x86, cpu: AMD errata checking framework x86, cpu: Split addon_cpuid_features.c x86, cpu: Clean up formatting in cpufeature.h, remove override x86, cpu: Enumerate xsaveopt x86, cpu: Add xsaveopt cpufeature x86, cpu: Make init_scattered_cpuid_features() consider cpuid subleaves x86, cpu: Support the features flags in new CPUID leaf 7 x86, cpu: Add CPU flags for F16C and RDRND x86: Look for IA32_ENERGY_PERF_BIAS support x86, AMD: Extend support to future families x86, cacheinfo: Carve out L3 cache slot accessors x86, xsave: Cleanup return codes in check_for_xstate()	2010-08-06 10:02:36 -07:00
Avi Kivity	3444d7da18	KVM: VMX: Fix host GDT.LIMIT corruption vmx does not restore GDT.LIMIT to the host value, instead it sets it to 64KB. This means host userspace can learn a few bits of host memory. Fix by reloading GDTR when we load other host state. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 08:10:18 +03:00
Xiao Guangrong	9a3aad7057	KVM: MMU: using __xchg_spte more smarter Sometimes, atomically set spte is not needed, this patch call __xchg_spte() more smartly Note: if the old mapping's access bit is already set, we no need atomic operation since the access bit is not lost Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:41:01 +03:00
Xiao Guangrong	e4b502ead2	KVM: MMU: cleanup spte set and accssed/dirty tracking Introduce set_spte_track_bits() to cleanup current code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:41:00 +03:00
Xiao Guangrong	be233d49ea	KVM: MMU: don't atomicly set spte if it's not present If the old mapping is not present, the spte.a is not lost, so no need atomic operation to set it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:59 +03:00
Xiao Guangrong	9ed5520dd3	KVM: MMU: fix page dirty tracking lost while sync page In sync-page path, if spte.writable is changed, it will lose page dirty tracking, for example: assume spte.writable = 0 in a unsync-page, when it's synced, it map spte to writable(that is spte.writable = 1), later guest write spte.gfn, it means spte.gfn is dirty, then guest changed this mapping to read-only, after it's synced, spte.writable = 0 So, when host release the spte, it detect spte.writable = 0 and not mark page dirty Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:58 +03:00
Xiao Guangrong	daa3db693c	KVM: MMU: fix broken page accessed tracking with ept enabled In current code, if ept is enabled(shadow_accessed_mask = 0), the page accessed tracking is lost. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:57 +03:00
Xiao Guangrong	fa1de2bfc0	KVM: MMU: add missing reserved bits check in speculative path In the speculative path, we should check guest pte's reserved bits just as the real processor does Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:56 +03:00
Andrea Arcangeli	6e3e243c3b	KVM: MMU: fix mmu notifier invalidate handler for huge spte The index wasn't calculated correctly (off by one) for huge spte so KVM guest was unstable with transparent hugepages. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:54 +03:00
Wei Yongjun	c19b8bd60e	KVM: x86 emulator: fix xchg instruction emulation If the destination is a memory operand and the memory cannot map to a valid page, the xchg instruction emulation and locked instruction will not work on io regions and stuck in endless loop. We should emulate exchange as write to fix it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:53 +03:00
Gleb Natapov	9195c4da26	KVM: x86: Call mask notifiers from pic If pit delivers interrupt while pic is masking it OS will never do EOI and ack notifier will not be called so when pit will be unmasked no pit interrupts will be delivered any more. Calling mask notifiers solves this issue. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:52 +03:00
Gleb Natapov	68be080345	KVM: x86: never re-execute instruction with enabled tdp With tdp enabled we should get into emulator only when emulating io, so reexecution will always bring us back into emulator. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:51 +03:00
Gleb Natapov	c0e0608cb9	KVM: x86: emulator: inc/dec can have lock prefix Mark inc (0xfe/0 0xff/0) and dec (0xfe/1 0xff/1) as lock prefix capable. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:49 +03:00
Avi Kivity	24157aaf83	KVM: MMU: Eliminate redundant temporaries in FNAME(fetch) 'level' and 'sptep' are aliases for 'interator.level' and 'iterator.sptep', no need for them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:48 +03:00
Avi Kivity	5991b33237	KVM: MMU: Validate all gptes during fetch, not just those used for new pages Currently, when we fetch an spte, we only verify that gptes match those that the walker saw if we build new shadow pages for them. However, this misses the following race: vcpu1 vcpu2 walk change gpte walk instantiate sp fetch existing sp Fix by validating every gpte, regardless of whether it is used for building a new sp or not. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:47 +03:00
Avi Kivity	0b3c933302	KVM: MMU: Simplify spte fetch() function Partition the function into three sections: - fetching indirect shadow pages (host_level > guest_level) - fetching direct shadow pages (page_level < host_level <= guest_level) - the final spte (page_level == host_level) Instead of the current spaghetti. A slight change from the original code is that we call validate_direct_spte() more often: previously we called it only for gw->level, now we also call it for lower levels. The change should have no effect. [xiao: fix regression caused by validate_direct_spte() called too late] Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:45 +03:00
Avi Kivity	39c8c672a1	KVM: MMU: Add gpte_valid() helper Move the code to check whether a gpte has changed since we fetched it into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:44 +03:00
Avi Kivity	a357bd229c	KVM: MMU: Add validate_direct_spte() helper Add a helper to verify that a direct shadow page is valid wrt the required access permissions; drop the page if it is not valid. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:43 +03:00
Avi Kivity	a3aa51cfaa	KVM: MMU: Add drop_large_spte() helper To clarify spte fetching code, move large spte handling into a helper. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:42 +03:00
Avi Kivity	121eee97a7	KVM: MMU: Use __set_spte to link shadow pages To avoid split accesses to 64 bit sptes on i386, use __set_spte() to link shadow pages together. (not technically required since shadow pages are __GFP_KERNEL, so upper 32 bits are always clear) Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:41 +03:00
Avi Kivity	32ef26a359	KVM: MMU: Add link_shadow_page() helper To simplify the process of fetching an spte, add a helper that links a shadow page to an spte. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:40 +03:00
Avi Kivity	908e75f3e7	KVM: Expose MCE control MSRs to userspace Userspace needs to reset and save/restore these MSRs. The MCE banks are not exposed since their number varies from vcpu to vcpu. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:36 +03:00
Xiao Guangrong	aea924f606	KVM: PIT: stop vpit before freeing irq_routing Fix: general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ...... Call Trace: [<ffffffffa0159bd1>] ? kvm_set_irq+0xdd/0x24b [kvm] [<ffffffff8106ea8b>] ? trace_hardirqs_off_caller+0x1f/0x10e [<ffffffff813ad17f>] ? sub_preempt_count+0xe/0xb6 [<ffffffff8106d273>] ? put_lock_stats+0xe/0x27 ... RIP [<ffffffffa0159c72>] kvm_set_irq+0x17e/0x24b [kvm] This bug is triggered when guest is shutdown, is because we freed irq_routing before pit thread stopped Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:35 +03:00
Gleb Natapov	a6f177efaa	KVM: Reenter guest after emulation failure if due to access to non-mmio address When shadow pages are in use sometimes KVM try to emulate an instruction when it accesses a shadowed page. If emulation fails KVM un-shadows the page and reenter guest to allow vcpu to execute the instruction. If page is not in shadow page hash KVM assumes that this was attempt to do MMIO and reports emulation failure to userspace since there is no way to fix the situation. This logic has a race though. If two vcpus tries to write to the same shadowed page simultaneously both will enter emulator, but only one of them will find the page in shadow page hash since the one who founds it also removes it from there, so another cpu will report failure to userspace and will abort the guest. Fix this by checking (in addition to checking shadowed page hash) that page that caused the emulation belongs to valid memory slot. If it is then reenter the guest to allow vcpu to reexecute the instruction. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:34 +03:00
Gleb Natapov	edba23e515	KVM: Return EFAULT from kvm ioctl when guest accesses bad area Currently if guest access address that belongs to memory slot but is not backed up by page or page is read only KVM treats it like MMIO access. Remove that capability. It was never part of the interface and should not be relied upon. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:33 +03:00
Jiri Slaby	673813e81d	KVM: fix lock imbalance in kvm_create_pit() Stanse found that there is an omitted unlock in kvm_create_pit in one fail path. Add proper unlock there. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Gleb Natapov <gleb@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Gregory Haskins <ghaskins@novell.com> Cc: kvm@vger.kernel.org Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:31 +03:00
Avi Kivity	f59c1d2ded	KVM: MMU: Keep going on permission error Real hardware disregards permission errors when computing page fault error code bit 0 (page present). Do the same. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:30 +03:00
Avi Kivity	b0eeec29fe	KVM: MMU: Only indicate a fetch fault in page fault error code if nx is enabled Bit 4 of the page fault error code is set only if EFER.NX is set. Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:29 +03:00
Wei Yongjun	5d55f299f9	KVM: x86 emulator: re-implementing 'mov AL,moffs' instruction decoding This patch change to use DstAcc for decoding 'mov AL, moffs' and introduced SrcAcc for decoding 'mov moffs, AL'. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:27 +03:00
Wei Yongjun	07cbc6c185	KVM: x86 emulator: fix cli/sti instruction emulation If IOPL check fail, the cli/sti emulate GP and then we should skip writeback since the default write OP is OP_REG. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:26 +03:00
Wei Yongjun	b16b2b7bb5	KVM: x86 emulator: fix 'mov rm,sreg' instruction decoding The source operand of 'mov rm,sreg' is segment register, not general-purpose register, so remove SrcReg from decoding. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:25 +03:00
Wei Yongjun	e97e883f8b	KVM: x86 emulator: fix 'and AL,imm8' instruction decoding 'and AL,imm8' should be mask as ByteOp, otherwise the dest operand length will no correct and we may fill the full EAX when writeback. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:24 +03:00
Wei Yongjun	ce7a0ad3bd	KVM: x86 emulator: fix the comment of out instruction Fix the comment of out instruction, using the same style as the other instructions. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:23 +03:00
Wei Yongjun	a5046e6c7d	KVM: x86 emulator: fix 'mov sreg,rm16' instruction decoding Memory reads for 'mov sreg,rm16' should be 16 bits only. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:22 +03:00
Avi Kivity	b79b93f92c	KVM: MMU: Don't drop accessed bit while updating an spte __set_spte() will happily replace an spte with the accessed bit set with one that has the accessed bit clear. Add a helper update_spte() which checks for this condition and updates the page flag if needed. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:21 +03:00
Avi Kivity	a9221dd5ec	KVM: MMU: Atomically check for accessed bit when dropping an spte Currently, in the window between the check for the accessed bit, and actually dropping the spte, a vcpu can access the page through the spte and set the bit, which will be ignored by the mmu. Fix by using an exchange operation to atmoically fetch the spte and drop it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:20 +03:00
Avi Kivity	ce061867aa	KVM: MMU: Move accessed/dirty bit checks from rmap_remove() to drop_spte() Since we need to make the check atomic, move it to the place that will set the new spte. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:18 +03:00
Avi Kivity	be38d276b0	KVM: MMU: Introduce drop_spte() When we call rmap_remove(), we (almost) always immediately follow it by an __set_spte() to a nonpresent pte. Since we need to perform the two operations atomically, to avoid losing the dirty and accessed bits, introduce a helper drop_spte() and convert all call sites. The operation is still nonatomic at this point. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-02 06:40:17 +03:00
Xiao Guangrong	dd180b3e90	KVM: VMX: fix tlb flush with invalid root Commit 341d9b535b6c simplify reload logic while entry guest mode, it can avoid unnecessary sync-root if KVM_REQ_MMU_RELOAD and KVM_REQ_MMU_SYNC both set. But, it cause a issue that when we handle 'KVM_REQ_TLB_FLUSH', the root is invalid, it is triggered during my test: Kernel BUG at ffffffffa00212b8 [verbose debug info unavailable] ...... Fixed by directly return if the root is not ready. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-02 06:40:16 +03:00
Joerg Roedel	828554136b	KVM: Remove unnecessary divide operations This patch converts unnecessary divide and modulo operations in the KVM large page related code into logical operations. This allows to convert gfn_t to u64 while not breaking 32 bit builds. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:30 +03:00
Xiao Guangrong	84754cd8fc	KVM: MMU: cleanup FNAME(fetch)() functions Cleanup this function that we are already get the direct sp's access Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:26 +03:00
Xiao Guangrong	9e7b0e7fba	KVM: MMU: fix direct sp's access corrupted If the mapping is writable but the dirty flag is not set, we will find the read-only direct sp and setup the mapping, then if the write #PF occur, we will mark this mapping writable in the read-only direct sp, now, other real read-only mapping will happily write it without #PF. It may hurt guest's COW Fixed by re-install the mapping when write #PF occur. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:25 +03:00
Xiao Guangrong	5fd5387c89	KVM: MMU: fix conflict access permissions in direct sp In no-direct mapping, we mark sp is 'direct' when we mapping the guest's larger page, but its access is encoded form upper page-struct entire not include the last mapping, it will cause access conflict. For example, have this mapping: [W] / PDE1 -> \|---\| P[W] \| \| LPA \ PDE2 -> \|---\| [R] P have two children, PDE1 and PDE2, both PDE1 and PDE2 mapping the same lage page(LPA). The P's access is WR, PDE1's access is WR, PDE2's access is RO(just consider read-write permissions here) When guest access PDE1, we will create a direct sp for LPA, the sp's access is from P, is W, then we will mark the ptes is W in this sp. Then, guest access PDE2, we will find LPA's shadow page, is the same as PDE's, and mark the ptes is RO. So, if guest access PDE1, the incorrect #PF is occured. Fixed by encode the last mapping access into direct shadow page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:23 +03:00
Xiao Guangrong	36a2e6774b	KVM: MMU: fix writable sync sp mapping While we sync many unsync sp at one time(in mmu_sync_children()), we may mapping the spte writable, it's dangerous, if one unsync sp's mapping gfn is another unsync page's gfn. For example: SP1.pte[0] = P SP2.gfn's pfn = P [SP1.pte[0] = SP2.gfn's pfn] First, we write protected SP1 and SP2, but SP1 and SP2 are still the unsync sp. Then, sync SP1 first, it will detect SP1.pte[0].gfn only has one unsync-sp, that is SP2, so it will mapping it writable, but we plan to sync SP2 soon, at this point, the SP2->unsync is not reliable since later we sync SP2 but SP2->gfn is already writable. So the final result is: SP2 is the sync page but SP2.gfn is writable. This bug will corrupt guest's page table, fixed by mark read-only mapping if the mapped gfn has shadow pages. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:22 +03:00
Sheng Yang	f5f48ee15c	KVM: VMX: Execute WBINVD to keep data consistency with assigned devices Some guest device driver may leverage the "Non-Snoop" I/O, and explicitly WBINVD or CLFLUSH to a RAM space. Since migration may occur before WBINVD or CLFLUSH, we need to maintain data consistency either by: 1: flushing cache (wbinvd) when the guest is scheduled out if there is no wbinvd exit, or 2: execute wbinvd on all dirty physical CPUs when guest wbinvd exits. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:21 +03:00
Avi Kivity	3e00750947	KVM: Simplify vcpu_enter_guest() mmu reload logic slightly No need to reload the mmu in between two different vcpu->requests checks. kvm_mmu_reload() may trigger KVM_REQ_TRIPLE_FAULT, but that will be caught during atomic guest entry later. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:47:19 +03:00
Chris Lalancette	529df65e39	KVM: Search the LAPIC's for one that will accept a PIC interrupt Older versions of 32-bit linux have a "Checking 'hlt' instruction" test where they repeatedly call the 'hlt' instruction, and then expect a timer interrupt to kick the CPU out of halt. This happens before any LAPIC or IOAPIC setup happens, which means that all of the APIC's are in virtual wire mode at this point. Unfortunately, the current implementation of virtual wire mode is hardcoded to only kick the BSP, so if a crash+kexec occurs on a different vcpu, it will never get kicked. This patch makes pic_unlock() do the equivalent of kvm_irq_delivery_to_apic() for the IOAPIC code. That is, it runs through all of the vcpus looking for one that is in virtual wire mode. In the normal case where LAPICs and IOAPICs are configured, this won't be used at all. In the bootstrap phase of a modern OS, before the LAPICs and IOAPICs are configured, this will have exactly the same behavior as today; VCPU0 is always looked at first, so it will always get out of the loop after the first iteration. This will only go through the loop more than once during a kexec/kdump, in which case it will only do it a few times until the kexec'ed kernel programs the LAPIC and IOAPIC. Signed-off-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:17 +03:00
Sheng Yang	6c3f604117	KVM: x86: Enable AVX for guest Enable Intel(R) Advanced Vector Extension(AVX) for guest. The detection of AVX feature includes OSXSAVE bit testing. When OSXSAVE bit is not set, even if AVX is supported, the AVX instruction would result in UD as well. So we're safe to expose AVX bits to guest directly. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:10 +03:00
Avi Kivity	7ac77099ce	KVM: Prevent internal slots from being COWed If a process with a memory slot is COWed, the page will change its address (despite having an elevated reference count). This breaks internal memory slots which have their physical addresses loaded into vmcs registers (see the APIC access memory slot). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:08 +03:00
Avi Kivity	a8eeb04a44	KVM: Add mini-API for vcpu->requests Makes it a little more readable and hackable. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:05 +03:00
Avi Kivity	36633f32ba	KVM: i8259: simplify pic_irq_request() calling sequence Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:04 +03:00
Avi Kivity	073d46133a	KVM: i8259: reduce excessive abstraction for pic_irq_request() Part of the i8259 code pretends it isn't part of kvm, but we know better. Reduce excessive abstraction, eliminating callbacks and void pointers. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:03 +03:00
Avi Kivity	b74a07beed	KVM: Remove kernel-allocated memory regions Equivalent (and better) functionality is provided by user-allocated memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:01 +03:00
Avi Kivity	a1f4d39500	KVM: Remove memory alias support As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:47:00 +03:00
Avi Kivity	d1ac91d8a2	KVM: Consolidate load/save temporary buffer allocation and freeing Instead of three temporary variables and three free calls, have one temporary variable (with four names) and one free call. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:57 +03:00
Avi Kivity	a1a005f36e	KVM: Fix xsave and xcr save/restore memory leak We allocate temporary kernel buffers for these structures, but never free them. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:56 +03:00
Wei Yongjun	7d5993d63f	KVM: x86 emulator: fix group3 instruction decoding Group 3 instruction with ModRM reg field as 001 is defined as test instruction under AMD arch, and emulate_grp3() is ready for emulate it, so fix the decoding. static inline int emulate_grp3(...) { ... switch (c->modrm_reg) { case 0 ... 1: /* test */ emulate_2op_SrcV("test", c->src, c->dst, ctxt->eflags); ... } Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:55 +03:00
Chris Lalancette	e7dca5c0eb	KVM: x86: Allow any LAPIC to accept PIC interrupts If the guest wants to accept timer interrupts on a CPU other than the BSP, we need to remove this gate. Signed-off-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:50 +03:00
Chris Lalancette	33572ac0ad	KVM: x86: Introduce a workqueue to deliver PIT timer interrupts We really want to "kvm_set_irq" during the hrtimer callback, but that is risky because that is during interrupt context. Instead, offload the work to a workqueue, which is a bit safer and should provide most of the same functionality. Signed-off-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:49 +03:00
Wei Yongjun	c37eda1384	KVM: x86 emulator: fix pusha instruction emulation emulate pusha instruction only writeback the last EDI register, but the other registers which need to be writeback is ignored. This patch fixed it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:48 +03:00
Zachary Amsden	bd371396b3	KVM: x86: fix -DDEBUG oops Fix a slight error with assertion in local APIC code. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:46 +03:00
Xiao Guangrong	1047df1fb6	KVM: MMU: don't walk every parent pages while mark unsync While we mark the parent's unsync_child_bitmap, if the parent is already unsynced, it no need walk it's parent, it can reduce some unnecessary workload Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:45 +03:00
Xiao Guangrong	7a8f1a74e4	KVM: MMU: clear unsync_child_bitmap completely In current code, some page's unsync_child_bitmap is not cleared completely in mmu_sync_children(), for example, if two PDPEs shard one PDT, one of PDPE's unsync_child_bitmap is not cleared. Currently, it not harm anything just little overload, but it's the prepare work for the later patch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:44 +03:00
Xiao Guangrong	ebdea638df	KVM: MMU: cleanup for __mmu_unsync_walk() Decrease sp->unsync_children after clear unsync_child_bitmap bit Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:43 +03:00
Xiao Guangrong	be71e061d1	KVM: MMU: don't mark pte notrap if it's just sync transient If the sync-sp just sync transient, don't mark its pte notrap Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:42 +03:00
Xiao Guangrong	f918b44352	KVM: MMU: avoid double write protected in sync page path The sync page is already write protected in mmu_sync_children(), don't write protected it again Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:41 +03:00
Xiao Guangrong	cb83cad2e7	KVM: MMU: cleanup for dirty page judgment Using wrap function to cleanup page dirty judgment Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:39 +03:00
Xiao Guangrong	ac3cd03cca	KVM: MMU: rename 'page' and 'shadow_page' to 'sp' Rename 'page' and 'shadow_page' to 'sp' to better fit the context Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:38 +03:00
Sheng Yang	2d5b5a6655	KVM: x86: XSAVE/XRSTOR live migration support This patch enable save/restore of xsave state. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:37 +03:00
Avi Kivity	2390218b6a	KVM: Fix mov cr3 #GP at wrong instruction On Intel, we call skip_emulated_instruction() even if we injected a #GP, resulting in the #GP pointing at the wrong address. Fix by injecting the exception and skipping the instruction at the same place, so we can do just one or the other. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:35 +03:00
Avi Kivity	a83b29c6ad	KVM: Fix mov cr4 #GP at wrong instruction On Intel, we call skip_emulated_instruction() even if we injected a #GP, resulting in the #GP pointing at the wrong address. Fix by injecting the exception and skipping the instruction at the same place, so we can do just one or the other. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:34 +03:00
Avi Kivity	49a9b07edc	KVM: Fix mov cr0 #GP at wrong instruction On Intel, we call skip_emulated_instruction() even if we injected a #GP, resulting in the #GP pointing at the wrong address. Fix by injecting the exception and skipping the instruction at the same place, so we can do just one or the other. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:46:32 +03:00
Dexuan Cui	2acf923e38	KVM: VMX: Enable XSAVE/XRSTOR for guest This patch enable guest to use XSAVE/XRSTOR instructions. We assume that host_xcr0 would use all possible bits that OS supported. And we loaded xcr0 in the same way we handled fpu - do it as late as we can. Signed-off-by: Dexuan Cui <dexuan.cui@intel.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:31 +03:00
Avi Kivity	f495c6e5e8	KVM: VMX: Fix incorrect rcu deref in rmode_tss_base() Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:30 +03:00
Andi Kleen	a24e809902	KVM: Fix unused but set warnings No real bugs in this one. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:29 +03:00
Xiao Guangrong	3b5d132186	KVM: MMU: delay local tlb flush delay local tlb flush until enter guest moden, it can reduce vpid flush frequency and reduce remote tlb flush IPI(if KVM_REQ_TLB_FLUSH bit is already set, IPI is not sent) Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:26 +03:00
Xiao Guangrong	5304efde6a	KVM: MMU: use wrapper function to flush local tlb Use kvm_mmu_flush_tlb() function instead of calling kvm_x86_ops->tlb_flush(vcpu) directly. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:25 +03:00
Xiao Guangrong	4f78fd08e9	KVM: MMU: remove unnecessary remote tlb flush This remote tlb flush is no necessary since we have synced while sp is zapped Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:24 +03:00
Xiao Guangrong	4b9d3a0451	KVM: VMX: fix rcu usage warning in init_rmode() fix: [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- include/linux/kvm_host.h:258 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 1 lock held by qemu-system-x86/3796: #0: (&vcpu->mutex){+.+.+.}, at: [<ffffffffa0217fd8>] vcpu_load+0x1a/0x66 [kvm] stack backtrace: Pid: 3796, comm: qemu-system-x86 Not tainted 2.6.34 #25 Call Trace: [<ffffffff81070ed1>] lockdep_rcu_dereference+0x9d/0xa5 [<ffffffffa0214fdf>] gfn_to_memslot_unaliased+0x65/0xa0 [kvm] [<ffffffffa0216139>] gfn_to_hva+0x22/0x4c [kvm] [<ffffffffa0216217>] kvm_write_guest_page+0x2a/0x7f [kvm] [<ffffffffa0216286>] kvm_clear_guest_page+0x1a/0x1c [kvm] [<ffffffffa0278239>] init_rmode+0x3b/0x180 [kvm_intel] [<ffffffffa02786ce>] vmx_set_cr0+0x350/0x4d3 [kvm_intel] [<ffffffffa02274ff>] kvm_arch_vcpu_ioctl_set_sregs+0x122/0x31a [kvm] [<ffffffffa021859c>] kvm_vcpu_ioctl+0x578/0xa3d [kvm] [<ffffffff8106624c>] ? cpu_clock+0x2d/0x40 [<ffffffff810f7d86>] ? fget_light+0x244/0x28e [<ffffffff810709b9>] ? trace_hardirqs_off_caller+0x1f/0x10e [<ffffffff8110501b>] vfs_ioctl+0x32/0xa6 [<ffffffff81105597>] do_vfs_ioctl+0x47f/0x4b8 [<ffffffff813ae654>] ? sub_preempt_count+0xa3/0xb7 [<ffffffff810f7da8>] ? fget_light+0x266/0x28e [<ffffffff810f7c53>] ? fget_light+0x111/0x28e [<ffffffff81105617>] sys_ioctl+0x47/0x6a [<ffffffff81002c1b>] system_call_fastpath+0x16/0x1b Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:23 +03:00
Gui Jianfeng	1760dd4939	KVM: VMX: rename vpid_sync_vcpu_all() to vpid_sync_vcpu_single() The name "pid_sync_vcpu_all" isn't appropriate since it just affect a single vpid, so rename it to vpid_sync_vcpu_single(). Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:16 +03:00
Gui Jianfeng	b9d762fa79	KVM: VMX: Add all-context INVVPID type support Add all-context INVVPID type support. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:46:04 +03:00
Xiao Guangrong	0671a8e75d	KVM: MMU: reduce remote tlb flush in kvm_mmu_pte_write() collect remote tlb flush in kvm_mmu_pte_write() path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:28 +03:00
Xiao Guangrong	f41d335a02	KVM: MMU: traverse sp hlish safely Now, we can safely to traverse sp hlish Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:28 +03:00
Xiao Guangrong	d98ba05365	KVM: MMU: gather remote tlb flush which occurs during page zapped Using kvm_mmu_prepare_zap_page() and kvm_mmu_zap_page() instead of kvm_mmu_zap_page() that can reduce remote tlb flush IPI Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:27 +03:00
Xiao Guangrong	103ad25a86	KVM: MMU: don't get free page number in the loop In the later patch, we will modify sp's zapping way like below: kvm_mmu_prepare_zap_page A kvm_mmu_prepare_zap_page B kvm_mmu_prepare_zap_page C .... kvm_mmu_commit_zap_page [ zaped multiple sps only need to call kvm_mmu_commit_zap_page once ] In __kvm_mmu_free_some_pages() function, the free page number is getted form 'vcpu->kvm->arch.n_free_mmu_pages' in loop, it will hinders us to apply kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() since kvm_mmu_prepare_zap_page() not free sp. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:27 +03:00
Xiao Guangrong	7775834a23	KVM: MMU: split the operations of kvm_mmu_zap_page() Using kvm_mmu_prepare_zap_page() and kvm_mmu_commit_zap_page() to split kvm_mmu_zap_page() function, then we can: - traverse hlist safely - easily to gather remote tlb flush which occurs during page zapped Those feature can be used in the later patches Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:27 +03:00
Xiao Guangrong	7ae680eb2d	KVM: MMU: introduce some macros to cleanup hlist traverseing Introduce for_each_gfn_sp() and for_each_gfn_indirect_valid_sp() to cleanup hlist traverseing Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:27 +03:00
Xiao Guangrong	03116aa57e	KVM: MMU: skip invalid sp when unprotect page In kvm_mmu_unprotect_page(), the invalid sp can be skipped Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:26 +03:00
Gui Jianfeng	518c8aee5c	KVM: VMX: Make sure single type invvpid is supported before issuing invvpid instruction According to SDM, we need check whether single-context INVVPID type is supported before issuing invvpid instruction. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Reviewed-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:39:26 +03:00
Lai Jiangshan	7bee342a9e	KVM: x86: use linux/uaccess.h instead of asm/uaccess.h Should use linux/uaccess.h instead of asm/uaccess.h Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:39:25 +03:00
Sheng Yang	4bc9b98281	KVM: VMX: Enforce EPT pagetable level checking We only support 4 levels EPT pagetable now. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:39:25 +03:00
Mohammed Gamal	5120702e73	KVM: VMX: Properly return error to userspace on vmentry failure The vmexit handler returns KVM_EXIT_UNKNOWN since there is no handler for vmentry failures. This intercepts vmentry failures and returns KVM_FAIL_ENTRY to userspace instead. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:39:24 +03:00
Gui Jianfeng	b66d80006e	KVM: MMU: Don't calculate quadrant if tdp_enabled There's no need to calculate quadrant if tdp is enabled. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:39:24 +03:00
Avi Kivity	8184dd38e2	KVM: MMU: Allow spte.w=1 for gpte.w=0 and cr0.wp=0 only in shadow mode When tdp is enabled, the guest's cr0.wp shouldn't have any effect on spte permissions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:23 +03:00
Jan Kiszka	10ab25cd6b	KVM: x86: Propagate fpu_alloc errors Memory allocation may fail. Propagate such errors. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:22 +03:00
Zachary Amsden	6dc696d4dd	KVM: SVM: Fix EFER.LME being stripped Must set VCPU register to be the guest notion of EFER even if that setting is not valid on hardware. This was masked by the set in set_efer until 7657fd5ace88e8092f5f3a84117e093d7b893f26 broke that. Fix is simply to set the VCPU register before stripping bits. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:22 +03:00
Gui Jianfeng	01c168ac3d	KVM: MMU: don't check PT_WRITABLE_MASK directly Since we have is_writable_pte(), make use of it. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:22 +03:00
Lai Jiangshan	3af1817a0d	KVM: MMU: calculate correct gfn for small host pages backing large guest pages In Documentation/kvm/mmu.txt: gfn: Either the guest page table containing the translations shadowed by this page, or the base page frame for linear translations. See role.direct. But in function FNAME(fetch)(), sp->gfn is incorrect when one of following situations occurred: 1) guest is 32bit paging and the guest PDE maps a 4-MByte page (backed by 4k host pages), FNAME(fetch)() miss handling the quadrant. And if guest use pse-36, "table_gfn = gpte_to_gfn(gw->ptes[level - delta]);" is incorrect. 2) guest is long mode paging and the guest PDPTE maps a 1-GByte page (backed by 4k or 2M host pages). So we fix it to suit to the document and suit to the code which requires sp->gfn correct when sp->role.direct=1. We use the goal mapping gfn(gw->gfn) to calculate the base page frame for linear translations, it is simple and easy to be understood. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:39:21 +03:00
Lai Jiangshan	c9fa0b3bef	KVM: MMU: Calculate correct base gfn for direct non-DIR level In Document/kvm/mmu.txt: gfn: Either the guest page table containing the translations shadowed by this page, or the base page frame for linear translations. See role.direct. But in __direct_map(), the base gfn calculation is incorrect, it does not calculate correctly when level=3 or 4. Fix by using PT64_LVL_ADDR_MASK() which accounts for all levels correctly. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:53 +03:00
Lai Jiangshan	2032a93d66	KVM: MMU: Don't allocate gfns page for direct mmu pages When sp->role.direct is set, sp->gfns does not contain any essential information, leaf sptes reachable from this sp are for a continuous guest physical memory range (a linear range). So sp->gfns[i] (if it was set) equals to sp->gfn + i. (PT_PAGE_TABLE_LEVEL) Obviously, it is not essential information, we can calculate it when need. It means we don't need sp->gfns when sp->role.direct=1, Thus we can save one page usage for every kvm_mmu_page. Note: Access to sp->gfns must be wrapped by kvm_mmu_page_get_gfn() or kvm_mmu_page_set_gfn(). It is only exposed in FNAME(sync_page). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:52 +03:00
Xiao Guangrong	9f1a122f97	KVM: MMU: allow more page become unsync at getting sp time Allow more page become asynchronous at getting sp time, if need create new shadow page for gfn but it not allow unsync(level > 1), we should unsync all gfn's unsync page Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:52 +03:00
Xiao Guangrong	9cf5cf5ad4	KVM: MMU: allow more page become unsync at gfn mapping time In current code, shadow page can become asynchronous only if one shadow page for a gfn, this rule is too strict, in fact, we can let all last mapping page(i.e, it's the pte page) become unsync, and sync them at invlpg or flush tlb time. This patch allow more page become asynchronous at gfn mapping time Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:51 +03:00
Avi Kivity	221d059d15	KVM: Update Red Hat copyrights Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:51 +03:00
Gleb Natapov	9fb2d2b4ff	KVM: SVM: correctly trace irq injection On SVM interrupts are injected by svm_set_irq() not svm_inject_irq(). The later is used only to wait for irq window. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:51 +03:00
Xiao Guangrong	f78978aa3a	KVM: MMU: only update unsync page in invlpg path Only unsync pages need updated at invlpg time since other shadow pages are write-protected Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:50 +03:00
Xiao Guangrong	e02aa901b1	KVM: MMU: don't write-protect if have new mapping to unsync page Two cases maybe happen in kvm_mmu_get_page() function: - one case is, the goal sp is already in cache, if the sp is unsync, we only need update it to assure this mapping is valid, but not mark it sync and not write-protect sp->gfn since it not broke unsync rule(one shadow page for a gfn) - another case is, the goal sp not existed, we need create a new sp for gfn, i.e, gfn (may)has another shadow page, to keep unsync rule, we should sync(mark sync and write-protect) gfn's unsync shadow page. After enabling multiple unsync shadows, we sync those shadow pages only when the new sp not allow to become unsync(also for the unsyc rule, the new rule is: allow all pte page become unsync) Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:50 +03:00
Xiao Guangrong	1d9dc7e000	KVM: MMU: split kvm_sync_page() function Split kvm_sync_page() into kvm_sync_page() and kvm_sync_page_transient() to clarify the code address Avi's suggestion kvm_sync_page_transient() function only update shadow page but not mark it sync and not write protect sp->gfn. it will be used by later patch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:49 +03:00
Sheng Yang	98918833a3	KVM: x86: Use FPU API Convert KVM to use generic FPU API. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:49 +03:00
Sheng Yang	7cf30855e0	KVM: x86: Use unlazy_fpu() for host FPU We can avoid unnecessary fpu load when userspace process didn't use FPU frequently. Derived from Avi's idea. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:48 +03:00
Avi Kivity	9373662463	KVM: Consolidate arch specific vcpu ioctl locking Now that all arch specific ioctls have centralized locking, it is easy to move it to the central dispatcher. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:48 +03:00
Avi Kivity	526b78ad1a	KVM: x86: Lock arch specific vcpu ioctls centrally Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:47 +03:00
Avi Kivity	2122ff5eab	KVM: move vcpu locking to dispatcher for generic vcpu ioctls All vcpu ioctls need to be locked, so instead of locking each one specifically we lock at the generic dispatcher. This patch only updates generic ioctls and leaves arch specific ioctls alone. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:47 +03:00
Xiao Guangrong	1683b2416e	KVM: x86: cleanup unused local variable fix: arch/x86/kvm/x86.c: In function ‘handle_emulation_failure’: arch/x86/kvm/x86.c:3844: warning: unused variable ‘ctxt’ Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:46 +03:00
Xiao Guangrong	f55c3f419a	KVM: MMU: unalias gfn before sp->gfns[] comparison in sync_page sp->gfns[] contain unaliased gfns, but gpte might contain pointer to aliased region. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:46 +03:00
Xiao Guangrong	6d74229f01	KVM: MMU: remove rmap before clear spte Remove rmap before clear spte otherwise it will trigger BUG_ON() in some functions such as rmap_write_protect(). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:46 +03:00
Xiao Guangrong	e8ad9a7074	KVM: MMU: use proper cache object freeing function Use kmem_cache_free to free objects allocated by kmem_cache_alloc. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:46 +03:00
Sheng Yang	aad827034e	KVM: VMX: Only reset MMU when necessary Only modifying some bits of CR0/CR4 needs paging mode switch. Modify EFER.NXE bit would result in reserved bit updates. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:45 +03:00
Sheng Yang	62ad07551a	KVM: x86: Clean up duplicate assignment mmu.free() already set root_hpa to INVALID_PAGE, no need to do it again in the destory_kvm_mmu(). kvm_x86_ops->set_cr4() and set_efer() already assign cr4/efer to vcpu->arch.cr4/efer, no need to do it again later. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:44 +03:00
Mohammed Gamal	222b7c52c3	KVM: x86 emulator: Add missing decoder flags for xor instructions This adds missing decoder flags for xor instructions (opcodes 0x34 - 0x35) Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:44 +03:00
Mohammed Gamal	abc190830f	KVM: x86 emulator: Add missing decoder flags for sub instruction This adds missing decoder flags for sub instructions (opcodes 0x2c - 0x2d) Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:44 +03:00
Mohammed Gamal	dfb507c41d	KVM: x86 emulator: Add test acc, imm instruction (opcodes 0xA8 - 0xA9) This adds test acc, imm instruction to the x86 emulator Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:43 +03:00
Marcelo Tosatti	24955b6c90	KVM: pass correct parameter to kvm_mmu_free_some_pages Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:43 +03:00
Dongxiao Xu	4610c9cc6d	KVM: VMX: VMXON/VMXOFF usage changes SDM suggests VMXON should be called before VMPTRLD, and VMXOFF should be called after doing VMCLEAR. Therefore in vmm coexistence case, we should firstly call VMXON before any VMCS operation, and then call VMXOFF after the operation is done. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:43 +03:00
Dongxiao Xu	b923e62e4d	KVM: VMX: VMCLEAR/VMPTRLD usage changes Originally VMCLEAR/VMPTRLD is called on vcpu migration. To support hosted VMM coexistance, VMCLEAR is executed on vcpu schedule out, and VMPTRLD is executed on vcpu schedule in. This could also eliminate the IPI when doing VMCLEAR. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:42 +03:00
Dongxiao Xu	92fe13be74	KVM: VMX: Some minor changes to code structure Do some preparations for vmm coexistence support. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:42 +03:00
Dongxiao Xu	7725b89414	KVM: VMX: Define new functions to wrapper direct call of asm code Define vmcs_load() and kvm_cpu_vmxon() to avoid direct call of asm code. Also move VMXE bit operation out of kvm_cpu_vmxoff(). Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:41 +03:00
Avi Kivity	f0f5933a16	KVM: MMU: Fix free memory accounting race in mmu_alloc_roots() We drop the mmu lock between freeing memory and allocating the roots; this allows some other vcpu to sneak in and allocate memory. While the race is benign (resulting only in temporary overallocation, not oom) it is simple and easy to fix by moving the freeing close to the allocation. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:41 +03:00
Gleb Natapov	6d77dbfc88	KVM: inject #UD if instruction emulation fails and exit to userspace Do not kill VM when instruction emulation fails. Inject #UD and report failure to userspace instead. Userspace may choose to reenter guest if vcpu is in userspace (cpl == 3) in which case guest OS will kill offending process and continue running. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-08-01 10:35:40 +03:00
Gui Jianfeng	54a4f0239f	KVM: MMU: make kvm_mmu_zap_page() return the number of pages it actually freed Currently, kvm_mmu_zap_page() returning the number of freed children sp. This might confuse the caller, because caller don't know the actual freed number. Let's make kvm_mmu_zap_page() return the number of pages it actually freed. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:39 +03:00
Gui Jianfeng	518c5a05e8	KVM: MMU: Fix debug output error in walk_addr() Fix a debug output error in walk_addr Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:39 +03:00
Gui Jianfeng	f3b8c964a9	KVM: MMU: mark page table dirty when a pte is actually modified Sometime cmpxchg_gpte doesn't modify gpte, in such case, don't mark page table page as dirty. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:38 +03:00
Joerg Roedel	eec4b140c9	KVM: SVM: Allow EFER.LMSLE to be set with nested svm This patch enables setting of efer bit 13 which is allowed in all SVM capable processors. This is necessary for the SLES11 version of Xen 4.0 to boot with nested svm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:38 +03:00
Joerg Roedel	3f10c846f8	KVM: SVM: Dump vmcb contents on failed vmrun This patch adds a function to dump the vmcb into the kernel log and calls it after a failed vmrun to ease debugging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:38 +03:00
Avi Kivity	d94e1dc9af	KVM: Get rid of KVM_REQ_KICK KVM_REQ_KICK poisons vcpu->requests by having a bit set during normal operation. This causes the fast path check for a clear vcpu->requests to fail all the time, triggering tons of atomic operations. Fix by replacing KVM_REQ_KICK with a vcpu->guest_mode atomic. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:37 +03:00
Gleb Natapov	54b8486f46	KVM: x86 emulator: do not inject exception directly into vcpu Return exception as a result of instruction emulation and handle injection in KVM code. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:37 +03:00
Gleb Natapov	95cb229530	KVM: x86 emulator: move interruptibility state tracking out of emulator Emulator shouldn't access vcpu directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:36 +03:00
Gleb Natapov	4d2179e1e9	KVM: x86 emulator: handle shadowed registers outside emulator Emulator shouldn't access vcpu directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:36 +03:00
Gleb Natapov	bdb475a323	KVM: x86 emulator: use shadowed register in emulate_sysexit() emulate_sysexit() should use shadowed registers copy instead of looking into vcpu state directly. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:36 +03:00
Gleb Natapov	ef050dc039	KVM: x86 emulator: set RFLAGS outside x86 emulator code Removes the need for set_flags() callback. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:35 +03:00
Gleb Natapov	95c5588652	KVM: x86 emulator: advance RIP outside x86 emulator code Return new RIP as part of instruction emulation result instead of updating KVM's RIP from x86 emulator code. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:35 +03:00
Gleb Natapov	3457e4192e	KVM: handle emulation failure case first If emulation failed return immediately. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:34 +03:00
Gleb Natapov	8fe681e984	KVM: do not inject #PF in (read\|write)_emulated() callbacks Return error to x86 emulator instead of injection exception behind its back. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:34 +03:00
Gleb Natapov	f181b96d4c	KVM: remove export of emulator_write_emulated() It is not called directly outside of the file it's defined in anymore. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-01 10:35:34 +03:00

... 7 8 9 10 11 ...

2018 Commits