linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-22 14:46:51 +07:00

Author	SHA1	Message	Date
Linus Torvalds	44e98edcd1	A very small release for x86 and s390 KVM. s390: timekeeping changes, cleanups and fixes x86: support for Hyper-V MSRs to report crashes, and a bunch of cleanups. One interesting feature that was planned for 4.3 (emulating the local APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to be delayed because Intel complained about my reading of the manual. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJVznW4AAoJEL/70l94x66Dt+gH/3vydhh6kv+mKhnR+kADaGfM gaunw0CUpJLU6gkOkYOm5M32WGhsT9Hd3WtRTJO6PhSo7cQ88hMx24u4XAffoewo Os5tDwAaHeV2enVSTri6xX8e2F2mgPDghGcYJPUBwnmMjRzZ8tj2VHUcbxqVT6Pb pX3V8ZxOZ81+ACZU2tdNRzLUd2H1v4d74gtVS7ove1Vb0CvPOBdHf1KQuUCUa2Pi 73fvnaEuSaFYtSWZIP1PYxLnsQHpApH3Kco/5kHeqUPpYaGa/g2bnfncHRw20Svr gb3opwbfyiq91xfGbRVR3+E63Cw4G6aTl5MDNv9UFJ+xFKuj8WJ72xXXTSwzUi4= =HgT+ -----END PGP SIGNATURE----- Merge tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "A very small release for x86 and s390 KVM. - s390: timekeeping changes, cleanups and fixes - x86: support for Hyper-V MSRs to report crashes, and a bunch of cleanups. One interesting feature that was planned for 4.3 (emulating the local APIC in kernel while keeping the IOAPIC and 8254 in userspace) had to be delayed because Intel complained about my reading of the manual" * tag 'kvm-4.3-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) x86/kvm: Rename VMX's segment access rights defines KVM: x86/vPMU: Fix unnecessary signed extension for AMD PERFCTRn kvm: x86: Fix error handling in the function kvm_lapic_sync_from_vapic KVM: s390: Fix assumption that kvm_set_irq_routing is always run successfully KVM: VMX: drop ept misconfig check KVM: MMU: fully check zero bits for sptes KVM: MMU: introduce is_shadow_zero_bits_set() KVM: MMU: introduce the framework to check zero bits on sptes KVM: MMU: split reset_rsvds_bits_mask_ept KVM: MMU: split reset_rsvds_bits_mask KVM: MMU: introduce rsvd_bits_validate KVM: MMU: move FNAME(is_rsvd_bits_set) to mmu.c KVM: MMU: fix validation of mmio page fault KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON KVM: s390: host STP toleration for VMs KVM: x86: clean/fix memory barriers in irqchip_in_kernel KVM: document memory barriers for kvm->vcpus/kvm->online_vcpus KVM: x86: remove unnecessary memory barriers for shared MSRs KVM: move code related to KVM_SET_BOOT_CPU_ID to x86 KVM: s390: log capability enablement and vm attribute changes ...	2015-08-31 08:27:44 -07:00
Ingo Molnar	a5dd192496	Merge branch 'x86/urgent' into x86/asm to fix up conflicts and to pick up fixes Conflicts: arch/x86/entry/entry_64_compat.S arch/x86/math-emu/get_address.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-08-18 09:39:47 +02:00
Haozhong Zhang	d7add05458	KVM: x86: Use adjustment in guest cycles when handling MSR_IA32_TSC_ADJUST When kvm_set_msr_common() handles a guest's write to MSR_IA32_TSC_ADJUST, it will calcuate an adjustment based on the data written by guest and then use it to adjust TSC offset by calling a call-back adjust_tsc_offset(). The 3rd parameter of adjust_tsc_offset() indicates whether the adjustment is in host TSC cycles or in guest TSC cycles. If SVM TSC scaling is enabled, adjust_tsc_offset() [i.e. svm_adjust_tsc_offset()] will first scale the adjustment; otherwise, it will just use the unscaled one. As the MSR write here comes from the guest, the adjustment is in guest TSC cycles. However, the current kvm_set_msr_common() uses it as a value in host TSC cycles (by using true as the 3rd parameter of adjust_tsc_offset()), which can result in an incorrect adjustment of TSC offset if SVM TSC scaling is enabled. This patch fixes this problem. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Cc: stable@vger.linux.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-08-07 13:28:03 +02:00
Paolo Bonzini	18c3626e3d	KVM: x86: zero IDT limit on entry to SMM The recent BlackHat 2015 presentation "The Memory Sinkhole" mentions that the IDT limit is zeroed on entry to SMM. This is not documented, and must have changed some time after 2010 (see http://www.ssi.gouv.fr/uploads/IMG/pdf/IT_Defense_2010_final.pdf). KVM was not doing it, but the fix is easy. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-08-07 12:46:32 +02:00
Xiao Guangrong	a0a64f50aa	KVM: MMU: introduce rsvd_bits_validate These two fields, rsvd_bits_mask and bad_mt_xwr, in "struct kvm_mmu" are used to check if reserved bits set on guest ptes, move them to a data struct so that the approach can be applied to check host shadow page table entries as well Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-08-05 12:47:23 +02:00
Ingo Molnar	5b929bd11d	Merge branch 'x86/urgent' into x86/asm, before applying dependent patches Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-31 10:23:35 +02:00
Paolo Bonzini	71ba994c94	KVM: x86: clean/fix memory barriers in irqchip_in_kernel The memory barriers are trying to protect against concurrent RCU-based interrupt injection, but the IRQ routing table is not valid at the time kvm->arch.vpic is written. Fix this by writing kvm->arch.vpic last. kvm_destroy_pic then need not set kvm->arch.vpic to NULL; modify it to take a struct kvm_pic* and reuse it if the IOAPIC creation fails. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-30 16:02:56 +02:00
Paolo Bonzini	c847fe8895	KVM: x86: remove unnecessary memory barriers for shared MSRs There is no smp_rmb matching the smp_wmb. shared_msr_update is called from hardware_enable, which in turn is called via on_each_cpu. on_each_cpu and must imply a read memory barrier (on x86 the rmb is achieved simply through asm volatile in native_apic_mem_write). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-29 14:27:23 +02:00
Paolo Bonzini	d71ba78834	KVM: move code related to KVM_SET_BOOT_CPU_ID to x86 This is another remnant of ia64 support. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-29 14:27:21 +02:00
Andrey Smetanin	2ce7918990	kvm/x86: add sending hyper-v crash notification to user space Sending of notification is done by exiting vcpu to user space if KVM_REQ_HV_CRASH is enabled for vcpu. At exit to user space the kvm_run structure contains system_event with type KVM_SYSTEM_EVENT_CRASH to notify about guest crash occurred. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Peter Hornyack <peterhornyack@google.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-23 08:27:06 +02:00
Andrey Smetanin	e7d9513b60	kvm/x86: added hyper-v crash msrs into kvm hyperv context Added kvm Hyper-V context hv crash variables as storage of Hyper-V crash msrs. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Peter Hornyack <peterhornyack@google.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-23 08:27:06 +02:00
Andrey Smetanin	e83d58874b	kvm/x86: move Hyper-V MSR's/hypercall code into hyperv.c file This patch introduce Hyper-V related source code file - hyperv.c and per vm and per vcpu hyperv context structures. All Hyper-V MSR's and hypercall code moved into hyperv.c. All Hyper-V kvm/vcpu fields moved into appropriate hyperv context structures. Copyrights and authors information copied from x86.c to hyperv.c. Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Peter Hornyack <peterhornyack@google.com> CC: Paolo Bonzini <pbonzini@redhat.com> CC: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-23 08:27:06 +02:00
Wanpeng Li	ee4100da16	kvm: x86: fix load xsave feature warning [ 68.196974] WARNING: CPU: 1 PID: 2140 at arch/x86/kvm/x86.c:3161 kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm]() [ 68.196975] Modules linked in: snd_hda_codec_hdmi i915 rfcomm bnep bluetooth i2c_algo_bit rfkill nfsd drm_kms_helper nfs_acl nfs drm lockd grace sunrpc fscache snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_seq_dummy snd_seq_oss x86_pkg_temp_thermal snd_seq_midi kvm_intel snd_seq_midi_event snd_rawmidi kvm snd_seq ghash_clmulni_intel fuse snd_timer aesni_intel parport_pc ablk_helper snd_seq_device cryptd ppdev snd lp parport lrw dcdbas gf128mul i2c_core glue_helper lpc_ich video shpchp mfd_core soundcore serio_raw acpi_cpufreq ext4 mbcache jbd2 sd_mod crc32c_intel ahci libahci libata e1000e ptp pps_core [ 68.197005] CPU: 1 PID: 2140 Comm: qemu-system-x86 Not tainted 4.2.0-rc1+ #2 [ 68.197006] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015 [ 68.197007] ffffffffa03b0657 ffff8800d984bca8 ffffffff815915a2 0000000000000000 [ 68.197009] 0000000000000000 ffff8800d984bce8 ffffffff81057c0a 00007ff6d0001000 [ 68.197010] 0000000000000002 ffff880211c1a000 0000000000000004 ffff8800ce0288c0 [ 68.197012] Call Trace: [ 68.197017] [<ffffffff815915a2>] dump_stack+0x45/0x57 [ 68.197020] [<ffffffff81057c0a>] warn_slowpath_common+0x8a/0xc0 [ 68.197022] [<ffffffff81057cfa>] warn_slowpath_null+0x1a/0x20 [ 68.197029] [<ffffffffa037bed8>] kvm_arch_vcpu_ioctl+0xe88/0x1340 [kvm] [ 68.197035] [<ffffffffa037aede>] ? kvm_arch_vcpu_load+0x4e/0x1c0 [kvm] [ 68.197040] [<ffffffffa03696a6>] kvm_vcpu_ioctl+0xc6/0x5c0 [kvm] [ 68.197043] [<ffffffff811252d2>] ? perf_pmu_enable+0x22/0x30 [ 68.197044] [<ffffffff8112663e>] ? perf_event_context_sched_in+0x7e/0xb0 [ 68.197048] [<ffffffff811a6882>] do_vfs_ioctl+0x2c2/0x4a0 [ 68.197050] [<ffffffff8107bf33>] ? finish_task_switch+0x173/0x220 [ 68.197053] [<ffffffff8123307f>] ? selinux_file_ioctl+0x4f/0xd0 [ 68.197055] [<ffffffff8122cac3>] ? security_file_ioctl+0x43/0x60 [ 68.197057] [<ffffffff811a6ad9>] SyS_ioctl+0x79/0x90 [ 68.197060] [<ffffffff81597e57>] entry_SYSCALL_64_fastpath+0x12/0x6a [ 68.197061] ---[ end trace 558a5ebf9445fc80 ]--- After commit (`0c4109bec0` 'x86/fpu/xstate: Fix up bad get_xsave_addr() assumptions'), there is no assumption an xsave bit is present in the hardware (pcntxt_mask) that it is always present in a given xsave buffer. An enabled state to be present on 'pcntxt_mask', but not in 'xstate_bv' could happen when the last 'xsave' did not request that this feature be saved (unlikely) or because the "init optimization" caused it to not be saved. This patch kill the assumption. Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-10 13:26:45 +02:00
Paolo Bonzini	5544eb9b81	KVM: count number of assigned devices If there are no assigned devices, the guest PAT are not providing any useful information and can be overridden to writeback; VMX always does this because it has the "IPAT" bit in its extended page table entries, but SVM does not have anything similar. Hook into VFIO and legacy device assignment so that they provide this information to KVM. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-10 13:25:26 +02:00
Radim Krčmář	370777daab	KVM: VMX: fix vmwrite to invalid VMCS fpu_activate is called outside of vcpu_load(), which means it should not touch VMCS, but fpu_activate needs to. Avoid the call by moving it to a point where we know that the guest needs eager FPU and VMCS is loaded. This will get rid of the following trace vmwrite error: reg 6800 value 0 (err 1) [<ffffffff8162035b>] dump_stack+0x19/0x1b [<ffffffffa046c701>] vmwrite_error+0x2c/0x2e [kvm_intel] [<ffffffffa045f26f>] vmcs_writel+0x1f/0x30 [kvm_intel] [<ffffffffa04617e5>] vmx_fpu_activate.part.61+0x45/0xb0 [kvm_intel] [<ffffffffa0461865>] vmx_fpu_activate+0x15/0x20 [kvm_intel] [<ffffffffa0560b91>] kvm_arch_vcpu_create+0x51/0x70 [kvm] [<ffffffffa0548011>] kvm_vm_ioctl+0x1c1/0x760 [kvm] [<ffffffff8118b55a>] ? handle_mm_fault+0x49a/0xec0 [<ffffffff811e47d5>] do_vfs_ioctl+0x2e5/0x4c0 [<ffffffff8127abbe>] ? file_has_perm+0xae/0xc0 [<ffffffff811e4a51>] SyS_ioctl+0xa1/0xc0 [<ffffffff81630949>] system_call_fastpath+0x16/0x1b (Note: we also unconditionally activate FPU in vmx_vcpu_reset(), so the removed code added nothing.) Fixes: `c447e76b4c` ("kvm/fpu: Enable eager restore kvm FPU for MPX") Cc: <stable@vger.kernel.org> Reported-by: Vlastimil Holer <vlastimil.holer@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-10 13:25:25 +02:00
Andy Lutomirski	03b9730b76	x86/asm/tsc: Add rdtsc_ordered() and use it in trivial call sites rdtsc_barrier(); rdtsc() is an unnecessary mouthful and requires more thought than should be necessary. Add an rdtsc_ordered() helper and replace the trivial call sites with it. This should not change generated code. The duplication of the fence asm is temporary. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/dddbf98a2af53312e9aa73a5a2b1622fe5d6f52b.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:29 +02:00
Andy Lutomirski	4ea1636b04	x86/asm/tsc: Rename native_read_tsc() to rdtsc() Now that there is no paravirt TSC, the "native" is inappropriate. The function does RDTSC, so give it the obvious name: rdtsc(). Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/fd43e16281991f096c1e4d21574d9e1402c62d39.1434501121.git.luto@kernel.org [ Ported it to v4.2-rc1. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:28 +02:00
Andy Lutomirski	881d7bf843	x86/asm/tsc, kvm: Remove vget_cycles() The only caller was KVM's read_tsc(). The only difference between vget_cycles() and native_read_tsc() was that vget_cycles() returned zero instead of crashing on TSC-less systems. KVM already checks vclock_mode() before calling that function, so the extra check is unnecessary. Also, KVM (host-side) requires the TSC to exist. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Huang Rui <ray.huang@amd.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm ML <kvm@vger.kernel.org> Link: http://lkml.kernel.org/r/20615df14ae2eb713ea7a5f5123c1dc4c7ca993d.1434501121.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-07-06 15:23:25 +02:00
Nicolas Iooss	b0996ae482	KVM: x86: remove data variable from kvm_get_msr_common Commit `609e36d372` ("KVM: x86: pass host_initiated to functions that read MSRs") modified kvm_get_msr_common function to use msr_info->data instead of data but missed one occurrence. Replace it and remove the unused local variable. Fixes: `609e36d372` ("KVM: x86: pass host_initiated to functions that read MSRs") Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-07-03 18:55:19 +02:00
Linus Torvalds	4e241557fc	The bulk of the changes here is for x86. And for once it's not for silicon that no one owns: these are really new features for everyone. * ARM: several features are in progress but missed the 4.2 deadline. So here is just a smattering of bug fixes, plus enabling the VFIO integration. * s390: Some fixes/refactorings/optimizations, plus support for 2GB pages. * x86: 1) host and guest support for marking kvmclock as a stable scheduler clock. 2) support for write combining. 3) support for system management mode, needed for secure boot in guests. 4) a bunch of cleanups required for 2+3. 5) support for virtualized performance counters on AMD; 6) legacy PCI device assignment is deprecated and defaults to "n" in Kconfig; VFIO replaces it. On top of this there are also bug fixes and eager FPU context loading for FPU-heavy guests. * Common code: Support for multiple address spaces; for now it is used only for x86 SMM but the s390 folks also have plans. There are some x86 conflicts, one with the rc8 pull request and the rest with Ingo's FPU rework. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJViYzhAAoJEL/70l94x66Dda0H/1IepMbfEy+o849d5G71fNTs F8Y8qUP2GZuL7T53FyFUGSBw+AX7kimu9ia4gR/PmDK+QYsdosYeEjwlsolZfTBf sHuzNtPoJhi5o1o/ur4NGameo0WjGK8f1xyzr+U8z74QDQyQv/QYCdK/4isp4BJL ugHNHkuROX6Zng4i7jc9rfaSRg29I3GBxQUYpMkEnD3eMYMUBWGm6Rs8pHgGAMvL vqzntgW00WNxehTqcAkmD/Wv+txxhkvIadZnjgaxH49e9JeXeBKTIR5vtb7Hns3s SuapZUyw+c95DIipXq4EznxxaOrjbebOeFgLCJo8+XMXZum8RZf/ob24KroYad0= =YsAR -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull first batch of KVM updates from Paolo Bonzini: "The bulk of the changes here is for x86. And for once it's not for silicon that no one owns: these are really new features for everyone. Details: - ARM: several features are in progress but missed the 4.2 deadline. So here is just a smattering of bug fixes, plus enabling the VFIO integration. - s390: Some fixes/refactorings/optimizations, plus support for 2GB pages. - x86: * host and guest support for marking kvmclock as a stable scheduler clock. * support for write combining. * support for system management mode, needed for secure boot in guests. * a bunch of cleanups required for the above * support for virtualized performance counters on AMD * legacy PCI device assignment is deprecated and defaults to "n" in Kconfig; VFIO replaces it On top of this there are also bug fixes and eager FPU context loading for FPU-heavy guests. - Common code: Support for multiple address spaces; for now it is used only for x86 SMM but the s390 folks also have plans" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits) KVM: s390: clear floating interrupt bitmap and parameters KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs KVM: x86/vPMU: Implement AMD vPMU code for KVM KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch KVM: x86/vPMU: introduce kvm_pmu_msr_idx_to_pmc KVM: x86/vPMU: reorder PMU functions KVM: x86/vPMU: whitespace and stylistic adjustments in PMU code KVM: x86/vPMU: use the new macros to go between PMC, PMU and VCPU KVM: x86/vPMU: introduce pmu.h header KVM: x86/vPMU: rename a few PMU functions KVM: MTRR: do not map huge page for non-consistent range KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type KVM: MTRR: introduce mtrr_for_each_mem_type KVM: MTRR: introduce fixed_mtrr_addr_* functions KVM: MTRR: sort variable MTRRs KVM: MTRR: introduce var_mtrr_range KVM: MTRR: introduce fixed_mtrr_segment table KVM: MTRR: improve kvm_mtrr_get_guest_memory_type KVM: MTRR: do not split 64 bits MSR content KVM: MTRR: clean up mtrr default type ...	2015-06-24 09:36:49 -07:00
Wei Huang	6912ac326d	KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs This patch enables AMD guest VM to access (R/W) PMU related MSRs, which include PERFCTR[0..3] and EVNTSEL[0..3]. Reviewed-by: Joerg Roedel <jroedel@suse.de> Tested-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-23 14:12:15 +02:00
Wei Huang	474a5bb944	KVM: x86/vPMU: introduce pmu.h header This will be used for private function used by AMD- and Intel-specific PMU implementations. Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:29 +02:00
Wei Huang	c6702c9dcf	KVM: x86/vPMU: rename a few PMU functions Before introducing a pmu.h header for them, make the naming more consistent. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:29 +02:00
Xiao Guangrong	19efffa244	KVM: MTRR: sort variable MTRRs Sort all valid variable MTRRs based on its base address, it will help us to check a range to see if it's fully contained in variable MTRRs Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> [Fix list insertion sort, simplify var_mtrr_range_is_valid to just test the V bit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:28 +02:00
Xiao Guangrong	70109e7d9d	KVM: MTRR: remove mtrr_state.have_fixed vMTRR does not depend on any host MTRR feature and fixed MTRRs have always been implemented, so drop this field Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:27 +02:00
Xiao Guangrong	eb839917a7	KVM: MTRR: handle MSR_MTRRcap in kvm_mtrr_get_msr MSR_MTRRcap is a MTRR msr so move the handler to the common place, also add some comments to make the hard code more readable Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:27 +02:00
Xiao Guangrong	ff53604b40	KVM: x86: move MTRR related code to a separate file MTRR code locates in x86.c and mmu.c so that move them to a separate file to make the organization more clearer and it will be the place where we fully implement vMTRR Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:26 +02:00
Xiao Guangrong	b18d5431ac	KVM: x86: fix CR0.CD virtualization Currently, CR0.CD is not checked when we virtualize memory cache type for noncoherent_dma guests, this patch fixes it by : - setting UC for all memory if CR0.CD = 1 - zapping all the last sptes in MMU if CR0.CD is changed Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-19 17:16:26 +02:00
Paolo Bonzini	6d396b5520	KVM: x86: advertise KVM_CAP_X86_SMM ... and we're done. :) Because SMBASE is usually relocated above 1M on modern chipsets, and SMM handlers might indeed rely on 4G segment limits, we only expose it if KVM is able to run the guest in big real mode. This includes any of VMX+emulate_invalid_guest_state, VMX+unrestricted_guest, or SVM. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:38 +02:00
Paolo Bonzini	699023e239	KVM: x86: add SMM to the MMU role, support SMRAM address space This is now very simple to do. The only interesting part is a simple trick to find the right memslot in gfn_to_rmap, retrieving the address space from the spte role word. The same trick is used in the auditing code. The comment on top of union kvm_mmu_page_role has been stale forever, so remove it. Speaking of stale code, remove pad_for_nice_hex_output too: it was splitting the "access" bitfield across two bytes and thus had effectively turned into pad_for_ugly_hex_output. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:37 +02:00
Paolo Bonzini	9da0e4d5ac	KVM: x86: work on all available address spaces This patch has no semantic change, but it prepares for the introduction of a second address space for system management mode. A new function x86_set_memory_region (and the "slots_lock taken" counterpart __x86_set_memory_region) is introduced in order to operate on all address spaces when adding or deleting private memory slots. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:37 +02:00
Paolo Bonzini	54bf36aac5	KVM: x86: use vcpu-specific functions to read/write/translate GFNs We need to hide SMRAM from guests not running in SMM. Therefore, all uses of kvm_read_guest* and kvm_write_guest* must be changed to check whether the VCPU is in system management mode and use a different set of memslots. Switch from kvm_* to the newly-introduced kvm_vcpu_*, which call into kvm_arch_vcpu_memslots_id. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-05 17:26:36 +02:00
Paolo Bonzini	660a5d517a	KVM: x86: save/load state on SMM switch The big ugly one. This patch adds support for switching in and out of system management mode, respectively upon receiving KVM_REQ_SMI and upon executing a RSM instruction. Both 32- and 64-bit formats are supported for the SMM state save area. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:17:46 +02:00
Paolo Bonzini	cd7764fe9f	KVM: x86: latch INITs while in system management mode Do not process INITs immediately while in system management mode, keep it instead in apic->pending_events. Tell userspace if an INIT is pending when they issue GET_VCPU_EVENTS, and similarly handle the new field in SET_VCPU_EVENTS. Note that the same treatment should be done while in VMX non-root mode. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:51 +02:00
Paolo Bonzini	64d6067057	KVM: x86: stubs for SMM support This patch adds the interface between x86.c and the emulator: the SMBASE register, a new emulator flag, the RSM instruction. It also adds a new request bit that will be used by the KVM_SMI ioctl. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:45 +02:00
Paolo Bonzini	f077825a87	KVM: x86: API changes for SMM support This patch includes changes to the external API for SMM support. Userspace can predicate the availability of the new fields and ioctls on a new capability, KVM_CAP_X86_SMM, which is added at the end of the patch series. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:11 +02:00
Paolo Bonzini	a584539b24	KVM: x86: pass the whole hflags field to emulator and back The hflags field will contain information about system management mode and will be useful for the emulator. Pass the entire field rather than just the guest-mode information. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:05 +02:00
Paolo Bonzini	609e36d372	KVM: x86: pass host_initiated to functions that read MSRs SMBASE is only readable from SMM for the VCPU, but it must be always accessible if userspace is accessing it. Thus, all functions that read MSRs are changed to accept a struct msr_data; the host_initiated and index fields are pre-initialized, while the data field is filled on return. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:01:00 +02:00
Paolo Bonzini	62ef68bb4d	KVM: x86: introduce num_emulated_msrs We will want to filter away MSR_IA32_SMBASE from the emulated_msrs if the host CPU does not support SMM virtualization. Introduce the logic to do that, and also move paravirt MSRs to emulated_msrs for simplicity and to get rid of KVM_SAVE_MSRS_BEGIN. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 16:00:46 +02:00
Paolo Bonzini	e69fab5df4	KVM: x86: clear hidden CPU state at reset time This was noticed by Radim while reviewing the implementation of system management mode. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-06-04 10:44:44 +02:00
Marcelo Tosatti	b7e60c5aed	KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR Initialize kvmclock base, on kvmclock system MSR write time, so that the guest sees kvmclock counting from zero. This matches baremetal behaviour when kvmclock in guest sets sched clock stable. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> [Remove unnecessary comment. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-29 14:02:40 +02:00
Marcelo Tosatti	630994b3c7	KVM: x86: add module parameter to disable periodic kvmclock sync The periodic kvmclock sync can be an undesired source of latencies. When running cyclictest on a guest, a latency spike is visible. With kvmclock periodic sync disabled, the spike is gone. Guests should use ntp which means the propagations of ntp corrections from the host clock are unnecessary. v2: -> Make parameter read-only (Radim) -> Return early on kvmclock_sync_fn (Andrew) Reported-and-tested-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-28 10:43:35 +02:00
Paolo Bonzini	f36f3f2846	KVM: add "new" argument to kvm_arch_commit_memory_region This lets the function access the new memory slot without going through kvm_memslots and id_to_memslot. It will simplify the code when more than one address space will be supported. Unfortunately, the "const"ness of the new argument must be casted away in two places. Fixing KVM to accept const struct kvm_memory_slot pointers would require modifications in pretty much all architectures, and is left for later. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-28 10:42:58 +02:00
Ingo Molnar	003e2e8b57	x86/fpu: Standardize the parameter type of copy_kernel_to_fpregs() Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions in line with the parameter passing convention of other kernel-to-FPU-registers copying functions: pass around an in-memory FPU register state pointer, instead of struct fpu *. NOTE: This patch also changes the assembly constraint of the FXSAVE-leak workaround from 'fpu->fpregs_active' to 'fpstate' - but that is fine, as we only need a valid memory address there for the FILDL instruction. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:11:32 +02:00
Ingo Molnar	3e1bf47e5c	x86/fpu: Rename copy_fpstate_to_fpregs() to copy_kernel_to_fpregs() Bring the __copy_fpstate_to_fpregs() and copy_fpstate_to_fpregs() functions in line with the naming of other kernel-to-FPU-registers copying functions. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-27 14:11:29 +02:00
Paolo Bonzini	15f46015ee	KVM: add memslots argument to kvm_arch_memslots_updated Prepare for the case of multiple address spaces. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-26 12:40:17 +02:00
Paolo Bonzini	09170a4942	KVM: const-ify uses of struct kvm_userspace_memory_region Architecture-specific helpers are not supposed to muck with struct kvm_userspace_memory_region contents. Add const to enforce this. In order to eliminate the only write in __kvm_set_memory_region, the cleaning of deleted slots is pulled up from update_memslots to __kvm_set_memory_region. Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-26 12:40:13 +02:00
Paolo Bonzini	9f6b802978	KVM: use kvm_memslots whenever possible kvm_memslots provides lockdep checking. Use it consistently instead of explicit dereferencing of kvm->memslots. Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-26 12:40:08 +02:00
Ingo Molnar	3152657f10	Linux 4.1-rc5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVYnloAAoJEHm+PkMAQRiGCgkH/j3r2djOOm4h83FXrShaHORY p8TBI3FNj4fzLk2PfzqbmiDw2T2CwygB+pxb2Ac9CE99epw8qPk2SRvPXBpdKR7t lolhhwfzApLJMZbhzNLVywUCDUhFoiEWRhmPqIfA3WXFcIW3t5VNXAoIFjV5HFr6 sYUlaxSI1XiQ5tldVv8D6YSFHms41pisziBIZmzhIUg10P6Vv3D0FbE74fjAJwx0 +08zj3EO7yQMv7Aeeq8F8AJ3628142rcZf0NWF5ohlKLRK3gt0cl9jO5U4Co2dDt 29v03LP5EI6jDKkIbuWlqRMq9YxJz7N3wnkzV0EJiqXucoqPLFDqzbxB4gnS1pI= =7vbA -----END PGP SIGNATURE----- Merge branch 'linus' into x86/fpu Resolve semantic conflict in arch/x86/kvm/cpuid.c with: `c447e76b4c` ("kvm/fpu: Enable eager restore kvm FPU for MPX") By removing the FPU internal include files. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-25 09:39:19 +02:00
Paolo Bonzini	a9b4fb7e79	Merge branch 'kvm-master' into kvm-next Grab MPX bugfix, and fix conflicts against Rik's adaptive FPU deactivation patch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-20 12:31:37 +02:00
Liang Li	c447e76b4c	kvm/fpu: Enable eager restore kvm FPU for MPX The MPX feature requires eager KVM FPU restore support. We have verified that MPX cannot work correctly with the current lazy KVM FPU restore mechanism. Eager KVM FPU restore should be enabled if the MPX feature is exposed to VM. Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Liang Li <liang.z.li@intel.com> [Also activate the FPU on AMD processors. - Paolo] Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-20 12:30:26 +02:00
Andrea Arcangeli	e8fd5e9e99	kvm: fix crash in kvm_vcpu_reload_apic_access_page memslot->userfault_addr is set by the kernel with a mmap executed from the kernel but the userland can still munmap it and lead to the below oops after memslot->userfault_addr points to a host virtual address that has no vma or mapping. [ 327.538306] BUG: unable to handle kernel paging request at fffffffffffffffe [ 327.538407] IP: [<ffffffff811a7b55>] put_page+0x5/0x50 [ 327.538474] PGD 1a01067 PUD 1a03067 PMD 0 [ 327.538529] Oops: 0000 [#1] SMP [ 327.538574] Modules linked in: macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ipmi_si ipmi_msghandler acpi_pad wmi acpi_power_meter lpc_ich mfd_core mei_me [ 327.539488] mei shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc mlx4_ib ib_sa ib_mad ib_core mlx4_en vxlan ib_addr ip_tunnel xfs libcrc32c sd_mod crc_t10dif crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci i2c_core libahci mlx4_core libata tg3 ptp pps_core megaraid_sas ntb dm_mirror dm_region_hash dm_log dm_mod [ 327.539956] CPU: 3 PID: 3161 Comm: qemu-kvm Not tainted 3.10.0-240.el7.userfault19.4ca4011.x86_64.debug #1 [ 327.540045] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.1.2 01/20/2014 [ 327.540115] task: ffff8803280ccf00 ti: ffff880317c58000 task.ti: ffff880317c58000 [ 327.540184] RIP: 0010:[<ffffffff811a7b55>] [<ffffffff811a7b55>] put_page+0x5/0x50 [ 327.540261] RSP: 0018:ffff880317c5bcf8 EFLAGS: 00010246 [ 327.540313] RAX: 00057ffffffff000 RBX: ffff880616a20000 RCX: 0000000000000000 [ 327.540379] RDX: 0000000000002014 RSI: 00057ffffffff000 RDI: fffffffffffffffe [ 327.540445] RBP: ffff880317c5bd10 R08: 0000000000000103 R09: 0000000000000000 [ 327.540511] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffe [ 327.540576] R13: 0000000000000000 R14: ffff880317c5bd70 R15: ffff880317c5bd50 [ 327.540643] FS: 00007fd230b7f700(0000) GS:ffff880630800000(0000) knlGS:0000000000000000 [ 327.540717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 327.540771] CR2: fffffffffffffffe CR3: 000000062a2c3000 CR4: 00000000000427e0 [ 327.540837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 327.540904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 327.540974] Stack: [ 327.541008] ffffffffa05d6d0c ffff880616a20000 0000000000000000 ffff880317c5bdc0 [ 327.541093] ffffffffa05ddaa2 0000000000000000 00000000002191bf 00000042f3feab2d [ 327.541177] 00000042f3feab2d 0000000000000002 0000000000000001 0321000000000000 [ 327.541261] Call Trace: [ 327.541321] [<ffffffffa05d6d0c>] ? kvm_vcpu_reload_apic_access_page+0x6c/0x80 [kvm] [ 327.543615] [<ffffffffa05ddaa2>] vcpu_enter_guest+0x3f2/0x10f0 [kvm] [ 327.545918] [<ffffffffa05e2f10>] kvm_arch_vcpu_ioctl_run+0x2b0/0x5a0 [kvm] [ 327.548211] [<ffffffffa05e2d02>] ? kvm_arch_vcpu_ioctl_run+0xa2/0x5a0 [kvm] [ 327.550500] [<ffffffffa05ca845>] kvm_vcpu_ioctl+0x2b5/0x680 [kvm] [ 327.552768] [<ffffffff810b8d12>] ? creds_are_invalid.part.1+0x12/0x50 [ 327.555069] [<ffffffff810b8d71>] ? creds_are_invalid+0x21/0x30 [ 327.557373] [<ffffffff812d6066>] ? inode_has_perm.isra.49.constprop.65+0x26/0x80 [ 327.559663] [<ffffffff8122d985>] do_vfs_ioctl+0x305/0x530 [ 327.561917] [<ffffffff8122dc51>] SyS_ioctl+0xa1/0xc0 [ 327.564185] [<ffffffff816de829>] system_call_fastpath+0x16/0x1b [ 327.566480] Code: 0b 31 f6 4c 89 e7 e8 4b 7f ff ff 0f 0b e8 24 fd ff ff e9 a9 fd ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0 Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-20 12:30:06 +02:00
Xiao Guangrong	d81135a57a	KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed CR0.CD and CR0.NW are not used by shadow page table so that need not adjust mmu if these two bit are changed Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:43 +02:00
Xiao Guangrong	efdfe536d8	KVM: MMU: fix MTRR update Currently, whenever guest MTRR registers are changed kvm_mmu_reset_context is called to switch to the new root shadow page table, however, it's useless since: 1) the cache type is not cached into shadow page's attribute so that the original root shadow page will be reused 2) the cache type is set on the last spte, that means we should sync the last sptes when MTRR is changed This patch fixs this issue by drop all the spte in the gfn range which is being updated by MTRR Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:42 +02:00
Xiao Guangrong	edc90b7dc4	KVM: MMU: fix SMAP virtualization KVM may turn a user page to a kernel page when kernel writes a readonly user page if CR0.WP = 1. This shadow page entry will be reused after SMAP is enabled so that kernel is allowed to access this user page Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu once CR4.SMAP is updated Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-19 20:52:36 +02:00
Ingo Molnar	c47ada305d	x86/fpu: Harmonize FPU register state types Use these consistent names: struct fregs_state # was: i387_fsave_struct struct fxregs_state # was: i387_fxsave_struct struct swregs_state # was: i387_soft_struct struct xregs_state # was: xsave_struct union fpregs_state # was: thread_xstate Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:09 +02:00
Ingo Molnar	bf935b0b52	x86/fpu: Create 'union thread_xstate' helper for fpstate_init() fpstate_init() only uses fpu->state, so pass that in to it. This enables the cleanup we will do in the next patch. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:07 +02:00
Ingo Molnar	0e75c54f17	x86/fpu: Rename restore_fpu_checking() to copy_fpstate_to_fpregs() fpu_restore_checking() is a helper function of restore_fpu_checking(), but this is not apparent from the naming. Both copy fpstate contents to fpregs, while the fuller variant does a full copy without leaking information. So rename them to: copy_fpstate_to_fpregs() __copy_fpstate_to_fpregs() Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:03 +02:00
Ingo Molnar	befc61ad3c	x86/fpu: Move asm/xcr.h to asm/fpu/internal.h Now that all FPU internals using drivers are converted to public APIs, move xcr.h's definitions into fpu/internal.h and remove xcr.h. Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:48:00 +02:00
Ingo Molnar	c4d72e2db3	x86/fpu: Simplify fpstate_init_curr() usage Now that fpstate_init_curr() is not doing implicit allocations anymore, almost all uses of it involve a very simple pattern: if (!fpu->fpstate_active) fpstate_init_curr(fpu); which is basically activating the FPU fpstate if it was not active before. So propagate the check into the function itself, and rename the function according to its new purpose: fpu__activate_curr(fpu); Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:51 +02:00
Ingo Molnar	0ee6a51725	x86/fpu, kvm: Simplify fx_init() Now that fpstate_init() cannot fail the error return of fx_init() has lost its purpose. Eliminate the error return and propagate this change to all callers. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:51 +02:00
Ingo Molnar	e62bb3d894	x86/fpu: Rename fpstate_alloc_init() to fpstate_init_curr() Now that there are no FPU context allocations, rename fpstate_alloc_init() to fpstate_init_curr(), to signal that it initializes the fpstate and marks it active, for the current task. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:50 +02:00
Ingo Molnar	91d93d0e20	x86/fpu: Remove failure return from fpstate_alloc_init() Remove the failure code and propagate this down to callers. Note that this function still has an 'init' aspect, which must be called. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:50 +02:00
Ingo Molnar	c4d6ee6e2e	x86/fpu: Remove failure paths from fpstate-alloc low level functions Now that we always allocate the FPU context as part of task_struct there's no need for separate allocations - remove them and their primary failure handling code. ( Note that there's still secondary error codes that have become superfluous, those will be removed in separate patches. ) Move the somewhat misplaced setup_xstate_comp() call to the core. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:50 +02:00
Ingo Molnar	7366ed771f	x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again) So 6 years ago we made the FPU fpstate dynamically allocated: `aa283f4927` ("x86, fpu: lazy allocation of FPU area - v5") `61c4628b53` ("x86, fpu: split FPU state from task struct - v5") In hindsight this was a mistake: - it complicated context allocation failure handling, such as: /* kthread execs. TODO: cleanup this horror. / if (WARN_ON(fpstate_alloc_init(fpu))) force_sig(SIGKILL, tsk); - it caused us to enable irqs in fpu__restore(): local_irq_enable(); / * does a slab alloc which can sleep / if (fpstate_alloc_init(fpu)) { / * ran out of memory! */ do_group_exit(SIGKILL); return; } local_irq_disable(); - it (slightly) slowed down task creation/destruction by adding slab allocation/free pattens. - it made access to context contents (slightly) slower by adding one more pointer dereference. The motivation for the dynamic allocation was two-fold: - reduce memory consumption by non-FPU tasks - allocate and handle only the necessary amount of context for various XSAVE processors that have varying hardware frame sizes. These days, with glibc using SSE memcpy by default and GCC optimizing for SSE/AVX by default, the scope of FPU using apps on an x86 system is much larger than it was 6 years ago. For example on a freshly installed Fedora 21 desktop system, with a recent kernel, all non-kthread tasks have used the FPU shortly after bootup. Also, even modern embedded x86 CPUs try to support the latest vector instruction set - so they'll too often use the larger xstate frame sizes. So remove the dynamic allocation complication by embedding the FPU fpstate in task_struct again. This should make the FPU a lot more accessible to all sorts of atomic contexts. We could still optimize for the xstate frame size in the future, by moving the state structure to the last element of task_struct, and allocating only a part of that. This change is kept minimal by still keeping the ctx_alloc()/free() routines (that now do nothing substantial) - we'll remove them in the following patches. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:49 +02:00
Ingo Molnar	4f83634710	x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate() So fpu_save_init() is a historic name that got its name when the only way the FPU state was FNSAVE, which cleared (well, destroyed) the FPU state after saving it. Nowadays the name is misleading, because ever since the introduction of FXSAVE (and more modern FPU saving instructions) the 'we need to reload the FPU state' part is only true if there's a pending FPU exception [], which is almost never the case. So rename it to copy_fpregs_to_fpstate() to make it clear what's happening. Also add a few comments about why we cannot keep registers in certain cases. Also clean up the control flow a bit, to make it more apparent when we are dropping/keeping FP registers, and to optimize the common case (of keeping fpregs) some more. [] Probably not true anymore, modern instructions always leave the FPU state intact, even if exceptions are pending: because pending FP exceptions are posted on the next FP instruction, not asynchronously. They were truly asynchronous back in the IRQ13 case, and we had to synchronize with them, but that code is not working anymore: we don't have IRQ13 mapped in the IDT anymore. But a cleanup patch is obviously not the place to change subtle behavior. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:49 +02:00
Ingo Molnar	400e4b2091	x86/fpu: Rename xsave.header::xstate_bv to 'xfeatures' 'xsave.header::xstate_bv' is a misnomer - what does 'bv' stand for? It probably comes from the 'XGETBV' instruction name, but I could not find in the Intel documentation where that abbreviation comes from. It could mean 'bit vector' - or something else? But how about - instead of guessing about a weird name - we named the field in an obvious and descriptive way that tells us exactly what it does? So rename it to 'xfeatures', which is a bitmask of the xfeatures that are fpstate_active in that context structure. Eyesore like: fpu->state->xsave.xsave_hdr.xstate_bv \|= XSTATE_FP; is now much more readable: fpu->state->xsave.header.xfeatures \|= XSTATE_FP; Which form is not just infinitely more readable, but is also shorter as well. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:35 +02:00
Ingo Molnar	3a54450b5e	x86/fpu: Rename 'xsave_hdr' to 'header' Code like: fpu->state->xsave.xsave_hdr.xstate_bv \|= XSTATE_FP; is an eyesore, because not only is the words 'xsave' and 'state' are repeated twice times (!), but also because of the 'hdr' and 'bv' abbreviations that are pretty meaningless at a first glance. Start cleaning this up by renaming 'xsave_hdr' to 'header'. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:34 +02:00
Ingo Molnar	78f7f1e54b	x86/fpu: Rename fpu-internal.h to fpu/internal.h This unifies all the FPU related header files under a unified, hiearchical naming scheme: - asm/fpu/types.h: FPU related data types, needed for 'struct task_struct', widely included in almost all kernel code, and hence kept as small as possible. - asm/fpu/api.h: FPU related 'public' methods exported to other subsystems. - asm/fpu/internal.h: FPU subsystem internal methods - asm/fpu/xsave.h: XSAVE support internal methods (Also standardize the header guard in asm/fpu/internal.h.) Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:31 +02:00
Ingo Molnar	db2b1d3ad1	x86/fpu: Use 'struct fpu' in fpstate_alloc_init() Migrate this function to pure 'struct fpu' usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:29 +02:00
Ingo Molnar	c5bedc6847	x86/fpu: Get rid of PF_USED_MATH usage, convert it to fpu->fpstate_active Introduce a simple fpu->fpstate_active flag in the fpu context data structure and use that instead of PF_USED_MATH in task->flags. Testing for this flag byte should be slightly more efficient than testing a bit in a bitmask, but the main advantage is that most FPU functions can now be performed on a 'struct fpu' alone, they don't need access to 'struct task_struct' anymore. There's a slight linecount increase, mostly due to the 'fpu' local variables and due to extra comments. The local variables will go away once we move most of the FPU methods to pure 'struct fpu' parameters. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:25 +02:00
Ingo Molnar	4c1384100e	x86/fpu: Open code PF_USED_MATH usages PF_USED_MATH is used directly, but also in a handful of helper inlines. To ease the elimination of PF_USED_MATH, convert all inline helpers to open-coded PF_USED_MATH usage. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:24 +02:00
Ingo Molnar	f89e32e0a3	x86/fpu: Fix header file dependencies of fpu-internal.h Fix a minor header file dependency bug in asm/fpu-internal.h: it relies on i387.h but does not include it. All users of fpu-internal.h included it explicitly. Also remove unnecessary includes, to reduce compilation time. This also makes it easier to use it as a standalone header file for FPU internals, such as an upcoming C module in arch/x86/kernel/fpu/. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:16 +02:00
Ingo Molnar	c0ee2cf61b	x86/fpu: Rename fpu_finit() to fpstate_init() Make it clear that we are initializing the in-memory FPU context area, no the FPU registers. Also move it to the fpu__*() namespace. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:13 +02:00
Ingo Molnar	a7c2a83364	x86/fpu: Rename fpu_free() to fpstate_free() Use the fpu__*() namespace. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:13 +02:00
Ingo Molnar	ed97b08546	x86/fpu: Rename fpu_alloc() to fpstate_alloc() Use the fpu__*() namespace for fpstate_alloc() as well. Also add a comment about FPU state alignment. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:13 +02:00
Ingo Molnar	97185c95f7	x86/fpu: Split an fpstate_alloc_init() function out of init_fpu() Most init_fpu() users don't want the register-saving aspect of the function, they are calling it for 'current' and when FPU registers are not allocated and initialized yet. Split out a simplified API that does just that (and add debug-checks for these conditions): fpstate_alloc_init(). Use it where appropriate. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-05-19 15:47:10 +02:00
Xiao Guangrong	0be0226f07	KVM: MMU: fix SMAP virtualization KVM may turn a user page to a kernel page when kernel writes a readonly user page if CR0.WP = 1. This shadow page entry will be reused after SMAP is enabled so that kernel is allowed to access this user page Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu once CR4.SMAP is updated Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-11 17:17:50 +02:00
Radim Krčmář	74545705cb	KVM: x86: fix initial PAT value PAT should be 0007_0406_0007_0406h on RESET and not modified on INIT. VMX used a wrong value (host's PAT) and while SVM used the right one, it never got to arch.pat. This is not an issue with QEMU as it will force the correct value. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:46 +02:00
Rik van Riel	653f52c316	kvm,x86: load guest FPU context more eagerly Currently KVM will clear the FPU bits in CR0.TS in the VMCS, and trap to re-load them every time the guest accesses the FPU after a switch back into the guest from the host. This patch copies the x86 task switch semantics for FPU loading, with the FPU loaded eagerly after first use if the system uses eager fpu mode, or if the guest uses the FPU frequently. In the latter case, after loading the FPU for 255 times, the fpu_counter will roll over, and we will revert to loading the FPU on demand, until it has been established that the guest is still actively using the FPU. This mirrors the x86 task switch policy, which seems to work. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:45 +02:00
James Sullivan	93bbf0b8bc	kvm: x86: Extended struct kvm_lapic_irq with msi_redir_hint for MSI delivery Extended struct kvm_lapic_irq with bool msi_redir_hint, which will be used to determine if the delivery of the MSI should target only the lowest priority CPU in the logical group specified for delivery. (In physical dest mode, the RH bit is not relevant). Initialized the value of msi_redir_hint to true when RH=1 in kvm_set_msi_irq(), and initialized to false in all other cases. Added value of msi_redir_hint to a debug message dump of an IRQ in apic_send_ipi(). Signed-off-by: James Sullivan <sullivan.james.f@gmail.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:44 +02:00
Nadav Amit	d28bc9dd25	KVM: x86: INIT and reset sequences are different x86 architecture defines differences between the reset and INIT sequences. INIT does not initialize the FPU (including MMX, XMM, YMM, etc.), TSC, PMU, MSRs (in general), MTRRs machine-check, APIC ID, APIC arbitration ID and BSP. References (from Intel SDM): "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [8.4.2: MP Initialization Protocol Requirements and Restrictions] [Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT] "If the processor is reset by asserting the INIT# pin, the x87 FPU state is not changed." [9.2: X87 FPU INITIALIZATION] "The state of the local APIC following an INIT reset is the same as it is after a power-up or hardware reset, except that the APIC ID and arbitration ID registers are not affected." [10.4.7.3: Local APIC State After an INIT Reset ("Wait-for-SIPI" State)] Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428924848-28212-1-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:43 +02:00
Nadav Amit	90de4a1875	KVM: x86: Support for disabling quirks Introducing KVM_CAP_DISABLE_QUIRKS for disabling x86 quirks that were previous created in order to overcome QEMU issues. Those issue were mostly result of invalid VM BIOS. Currently there are two quirks that can be disabled: 1. KVM_QUIRK_LINT0_REENABLED - LINT0 was enabled after boot 2. KVM_QUIRK_CD_NW_CLEARED - CD and NW are cleared after boot These two issues are already resolved in recent releases of QEMU, and would therefore be disabled by QEMU. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428879221-29996-1-git-send-email-namit@cs.technion.ac.il> [Report capability from KVM_CHECK_EXTENSION too. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:29:42 +02:00
Christian Borntraeger	ccf73aaf5a	KVM: arm/mips/x86/power use __kvm_guest_{enter\|exit} Use __kvm_guest_{enter\|exit} instead of kvm_guest_{enter\|exit} where interrupts are disabled. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-05-07 11:28:22 +02:00
Radim Krčmář	5dca0d9147	kvm: x86: fix kvmclock update protocol The kvmclock spec says that the host will increment a version field to an odd number, then update stuff, then increment it to an even number. The host is buggy and doesn't do this, and the result is observable when one vcpu reads another vcpu's kvmclock data. There's no good way for a guest kernel to keep its vdso from reading a different vcpu's kvmclock data, but we don't need to care about changing VCPUs as long as we read a consistent data from kvmclock. (VCPU can change outside of this loop too, so it doesn't matter if we return a value not fit for this VCPU.) Based on a patch by Radim Krčmář. Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-27 15:48:59 +02:00
Nadav Amit	9e9c3fe40b	KVM: x86: Fix MSR_IA32_BNDCFGS in msrs_to_save kvm_init_msr_list is currently called before hardware_setup. As a result, vmx_mpx_supported always returns false when kvm_init_msr_list checks whether to save MSR_IA32_BNDCFGS. Move kvm_init_msr_list after vmx_hardware_setup is called to fix this issue. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1428864435-4732-1-git-send-email-namit@cs.technion.ac.il> Cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-14 18:09:50 +02:00
Linus Torvalds	7fd56474db	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Ingo Molnar: "The main changes in this cycle were: - clockevents state machine cleanups and enhancements (Viresh Kumar) - clockevents broadcast notifier horror to state machine conversion and related cleanups (Thomas Gleixner, Rafael J Wysocki) - clocksource and timekeeping core updates (John Stultz) - clocksource driver updates and fixes (Ben Dooks, Dmitry Osipenko, Hans de Goede, Laurent Pinchart, Maxime Ripard, Xunlei Pang) - y2038 fixes (Xunlei Pang, John Stultz) - NMI-safe ktime_get_raw_fast() and general refactoring of the clock code, in preparation to perf's per event clock ID support (Peter Zijlstra) - generic sched/clock fixes, optimizations and cleanups (Daniel Thompson) - clockevents cpu_down() race fix (Preeti U Murthy)" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (94 commits) timers/PM: Drop unnecessary braces from tick_freeze() timers/PM: Fix up tick_unfreeze() timekeeping: Get rid of stale comment clockevents: Cleanup dead cpu explicitely clockevents: Make tick handover explicit clockevents: Remove broadcast oneshot control leftovers sched/idle: Use explicit broadcast oneshot control function ARM: Tegra: Use explicit broadcast oneshot control function ARM: OMAP: Use explicit broadcast oneshot control function intel_idle: Use explicit broadcast oneshot control function ACPI/idle: Use explicit broadcast control function ACPI/PAD: Use explicit broadcast oneshot control function x86/amd/idle, clockevents: Use explicit broadcast oneshot control functions clockevents: Provide explicit broadcast oneshot control functions clockevents: Remove the broadcast control leftovers ARM: OMAP: Use explicit broadcast control function intel_idle: Use explicit broadcast control function cpuidle: Use explicit broadcast control function ACPI/processor: Use explicit broadcast control function ACPI/PAD: Use explicit broadcast control function ...	2015-04-13 11:08:28 -07:00
Linus Torvalds	9003601310	The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJVJ9vmAAoJEL/70l94x66DoMEH/R3rh8IMf4jTiWRkcqohOMPX k1+NaSY/lCKayaSgggJ2hcQenMbQoXEOdslvaA/H0oC+VfJGK+lmU6E63eMyyhjQ Y+Px6L85NENIzDzaVu/TIWWuhil5PvIRr3VO8cvntExRoCjuekTUmNdOgCvN2ObW wswN2qRdPIeEj2kkulbnye+9IV4G0Ne9bvsmUdOdfSSdi6ZcV43JcvrpOZT++mKj RrKB+3gTMZYGJXMMLBwMkdl8mK1ozriD+q0mbomT04LUyGlPwYLl4pVRDBqyksD7 KsSSybaK2E4i5R80WEljgDMkNqrCgNfg6VZe4n9Y+CfAAOToNnkMJaFEi+yuqbs= =yu2b -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.1 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. Summary: ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) KVM: use slowpath for cross page cached accesses kvm: mmu: lazy collapse small sptes into large sptes KVM: x86: Clear CR2 on VCPU reset KVM: x86: DR0-DR3 are not clear on reset KVM: x86: BSP in MSR_IA32_APICBASE is writable KVM: x86: simplify kvm_apic_map KVM: x86: avoid logical_map when it is invalid KVM: x86: fix mixed APIC mode broadcast KVM: x86: use MDA for interrupt matching kvm/ppc/mpic: drop unused IRQ_testbit KVM: nVMX: remove unnecessary double caching of MAXPHYADDR KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu KVM: vmx: pass error code with internal error #2 x86: vdso: fix pvclock races with task migration KVM: remove kvm_read_hva and kvm_read_hva_atomic KVM: x86: optimize delivery of TSC deadline timer interrupt KVM: x86: extract blocking logic from __vcpu_run kvm: x86: fix x86 eflags fixed bit KVM: s390: migrate vcpu interrupt state ...	2015-04-13 09:47:01 -07:00
Wanpeng Li	3ea3b7fa9a	kvm: mmu: lazy collapse small sptes into large sptes Dirty logging tracks sptes in 4k granularity, meaning that large sptes have to be split. If live migration is successful, the guest in the source machine will be destroyed and large sptes will be created in the destination. However, the guest continues to run in the source machine (for example if live migration fails), small sptes will remain around and cause bad performance. This patch introduce lazy collapsing of small sptes into large sptes. The rmap will be scanned in ioctl context when dirty logging is stopped, dropping those sptes which can be collapsed into a single large-page spte. Later page faults will create the large-page sptes. Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Message-Id: <1428046825-6905-1-git-send-email-wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:04 +02:00
Nadav Amit	1119022c71	KVM: x86: Clear CR2 on VCPU reset CR2 is not cleared as it should after reset. See Intel SDM table named "IA-32 Processor States Following Power-up, Reset, or INIT". Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-5-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:03 +02:00
Nadav Amit	ae561edeb4	KVM: x86: DR0-DR3 are not clear on reset DR0-DR3 are not cleared as they should during reset and when they are set from userspace. It appears to be caused by `c77fb5fe6f` ("KVM: x86: Allow the guest to run with dirty debug registers"). Force their reload on these situations. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-4-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:03 +02:00
Nadav Amit	58d269d8cc	KVM: x86: BSP in MSR_IA32_APICBASE is writable After reset, the CPU can change the BSP, which will be used upon INIT. Reset should return the BSP which QEMU asked for, and therefore handled accordingly. To quote: "If the MP protocol has completed and a BSP is chosen, subsequent INITs (either to a specific processor or system wide) do not cause the MP protocol to be repeated." [Intel SDM 8.4.2: MP Initialization Protocol Requirements and Restrictions] Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Message-Id: <1427933438-12782-3-git-send-email-namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:47:02 +02:00
Eugene Korenevsky	5a4f55cde8	KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu cpuid_maxphyaddr(), which performs lot of memory accesses is called extensively across KVM, especially in nVMX code. This patch adds a cached value of maxphyaddr to vcpu.arch to reduce the pressure onto CPU cache and simplify the code of cpuid_maxphyaddr() callers. The cached value is initialized in kvm_arch_vcpu_init() and reloaded every time CPUID is updated by usermode. It is obvious that these reloads occur infrequently. Signed-off-by: Eugene Korenevsky <ekorenevsky@gmail.com> Message-Id: <20150329205612.GA1223@gnote> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:56 +02:00
Paolo Bonzini	9c8fd1ba22	KVM: x86: optimize delivery of TSC deadline timer interrupt The newly-added tracepoint shows the following results on the tscdeadline_latency test: qemu-kvm-8387 [002] 6425.558974: kvm_vcpu_wakeup: poll time 10407 ns qemu-kvm-8387 [002] 6425.558984: kvm_vcpu_wakeup: poll time 0 ns qemu-kvm-8387 [002] 6425.561242: kvm_vcpu_wakeup: poll time 10477 ns qemu-kvm-8387 [002] 6425.561251: kvm_vcpu_wakeup: poll time 0 ns and so on. This is because we need to go through kvm_vcpu_block again after the timer IRQ is injected. Avoid it by polling once before entering kvm_vcpu_block. On my machine (Xeon E5 Sandy Bridge) this removes about 500 cycles (7%) from the latency of the TSC deadline timer. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:54 +02:00
Paolo Bonzini	362c698f82	KVM: x86: extract blocking logic from __vcpu_run Rename the old __vcpu_run to vcpu_run, and extract part of it to a new function vcpu_block. The next patch will add a new condition in vcpu_block, avoid extra indentation. Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-04-08 10:46:53 +02:00
Paolo Bonzini	bf0fb67cf9	KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVHQ0kAAoJECPQ0LrRPXpDHKQQALjw6STaZd7n20OFopNgHd4P qVeWYEKBxnsiSvL4p3IOSlZlEul+7x08aZqtyxWQRQcDT4ggTI+3FKKfc+8yeRpH WV6YJP0bGqz7039PyMLuIgs48xkSZtntePw69hPJfHZh4C1RBlP5T2SfE8mU8VZX fWToiU3W12QfKnmN7JFgxZopnGhrYCrG0EexdTDziAZu0GEMlDrO4wnyTR60WCvT 4TEF73R0kpAz4yplKuhcDHuxIG7VFhQ4z7b09M1JtR0gQ3wUvfbD3Wqqi49SwHkv NQOStcyLsIlDosSRcLXNCwb3IxjObXTBcAxnzgm2Aoc1xMMZX1ZPQNNs6zFZzycb 2c6QMiQ35zm7ellbvrG+bT+BP86JYWcAtHjWcaUFgqSJjb8MtqcMtsCea/DURhqx /kictqbPYBBwKW6SKbkNkisz59hPkuQnv35fuf992MRCbT9LAXLPRLbcirucCzkE p1MOotsWoO3ldJMZaVn0KYk3sQf6mCIfbYPEdOcw3fhJlvyy3NdjVkLOFbA5UUg1 rQ7Ru2rTemBc0ExVrymngNTMpMB4XcEeJzXfhcgMl3DWbDj60Ku/O26sDtZ6bsFv JuDYn8FVDHz9gpEQHgiUi1YMsBKXLhnILa1ppaa6AflykU3BRfYjAk1SXmX84nQK mJUJEdFuxi6pHN0UKxUI =avA4 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into 'kvm-next' KVM/ARM changes for v4.1: - fixes for live migration - irqfd support - kvm-io-bus & vgic rework to enable ioeventfd - page ageing for stage-2 translation - various cleanups	2015-04-07 18:09:20 +02:00
Ingo Molnar	c5e77f5216	Linux 4.0-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVGHwjAAoJEHm+PkMAQRiG8rcIAJ6cEJ6mbqLpyz5XrGf4yNp0 +wG/QlEpT8rgrxe9wSjB3lfW3kR2Pe69b9fVVCdiklygdkmva5vfmDrVGGzYfe3M QrFSSlMVBplvh6IiM/L1mVMtr3DSmCO23YZZ9R5b7FoEYatNHRpNWBCBpuXpd4aD sLuIvO3L/S7LqeOAFkkYWv6AuL9umicmjR8u+nsmCSRJom7At/aJ6R66WIp9vxho Rn7r6wcUk6B2Q/gYNjdSE8SIwdyKhuBGyvqQ9U9s6Btg9DQfM/b0vG5kw9hqeAq/ 9445jqVDP1whA2vz6GjnvltidxrqRvuDPBwzOnFmY5U+KZz4lS3x2mnWAAJ3xWs= =TqVJ -----END PGP SIGNATURE----- Merge tag 'v4.0-rc6' into timers/core, before applying new patches Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-31 09:08:13 +02:00
Peter Zijlstra	876e78818d	time: Rename timekeeper::tkr to timekeeper::tkr_mono In preparation of adding another tkr field, rename this one to tkr_mono. Also rename tk_read_base::base_mono to tk_read_base::base, since the structure is not specific to CLOCK_MONOTONIC and the mono name got added to the tk_read_base instance. Lots of trivial churn. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150319093400.344679419@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2015-03-27 09:45:06 +01:00
Nikolay Nikolaev	e32edf4fd0	KVM: Redesign kvm_io_bus_ API to pass VCPU structure to the callbacks. This is needed in e.g. ARM vGIC emulation, where the MMIO handling depends on the VCPU that does the access. Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>	2015-03-26 21:43:11 +00:00
Xiubo Li	52eb5a6d57	KVM: x86: For the symbols used locally only should be static type This patch fix the following sparse warnings: for arch/x86/kvm/x86.c: warning: symbol 'emulator_read_write' was not declared. Should it be static? warning: symbol 'emulator_write_emulated' was not declared. Should it be static? warning: symbol 'emulator_get_dr' was not declared. Should it be static? warning: symbol 'emulator_set_dr' was not declared. Should it be static? for arch/x86/kvm/pmu.c: warning: symbol 'fixed_pmc_events' was not declared. Should it be static? Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-17 22:38:28 -03:00
Xiubo Li	795a149e78	KVM: x86: Avoid using plain integer as NULL pointer warning This patch fix the following sparse warning: for file arch/x86/kvm/x86.c: warning: Using plain integer as NULL pointer Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-17 22:34:25 -03:00
Paolo Bonzini	dc9be0fac7	kvm: move advertising of KVM_CAP_IRQFD to common code POWER supports irqfds but forgot to advertise them. Some userspace does not check for the capability, but others check it---thus they work on x86 and s390 but not POWER. To avoid that other architectures in the future make the same mistake, let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE. Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: `297e21053a` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 21:18:59 -03:00
Joel Schopp	5cb56059c9	kvm: x86: make kvm_emulate_* consistant Currently kvm_emulate() skips the instruction but kvm_emulate_* sometimes don't. The end reult is the caller ends up doing the skip themselves. Let's make them consistant. Signed-off-by: Joel Schopp <joel.schopp@amd.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 20:29:15 -03:00
Thomas Huth	548ef28449	KVM: Get rid of kvm_kvfree() kvm_kvfree() provides exactly the same functionality as the new common kvfree() function - so let's simply replace the kvm function with the common function. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2015-03-10 10:37:43 -03:00
Paolo Bonzini	f781951299	kvm: add halt_poll_ns module parameter This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-02-06 13:08:37 +01:00
Kai Huang	843e433057	KVM: VMX: Add PML support in VMX This patch adds PML support in VMX. A new module parameter 'enable_pml' is added to allow user to enable/disable it manually. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-30 09:39:54 +01:00
Kai Huang	88178fd4f7	KVM: x86: Add new dirty logging kvm_x86_ops for PML This patch adds new kvm_x86_ops dirty logging hooks to enable/disable dirty logging for particular memory slot, and to flush potentially logged dirty GPAs before reporting slot->dirty_bitmap to userspace. kvm x86 common code calls these hooks when they are available so PML logic can be hidden to VMX specific. SVM won't be impacted as these hooks remain NULL there. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:41 +01:00
Kai Huang	1c91cad423	KVM: x86: Change parameter of kvm_mmu_slot_remove_write_access This patch changes the second parameter of kvm_mmu_slot_remove_write_access from 'slot id' to 'struct kvm_memory_slot ' to align with kvm_x86_ops dirty logging hooks, which will be introduced in further patch. Better way is to change second parameter of kvm_arch_commit_memory_region from 'struct kvm_userspace_memory_region ' to 'struct kvm_memory_slot * new', but it requires changes on other non-x86 ARCH too, so avoid it now. Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-29 15:31:37 +01:00
Nadav Amit	bac155310b	KVM: x86: 32-bit wraparound read/write not emulated correctly If we got a wraparound of 32-bit operand, and the limit is 0xffffffff, read and writes should be successful. It just needs to be done in two segments. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:15:18 +01:00
Nadav Amit	801806d956	KVM: x86: IRET emulation does not clear NMI masking The IRET instruction should clear NMI masking, but the current implementation does not do so. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-26 12:14:42 +01:00
Paolo Bonzini	8fff5e374a	KVM: s390: fixes and features for kvm/next (3.20) 1. Generic - sparse warning (make function static) - optimize locking - bugfixes for interrupt injection - fix MVPG addressing modes 2. hrtimer/wakeup fun A recent change can cause KVM hangs if adjtime is used in the host. The hrtimer might wake up too early or too late. Too early is fatal as vcpu_block will see that the wakeup condition is not met and sleep again. This CPU might never wake up again. This series addresses this problem. adjclock slowing down the host clock will result in too late wakeups. This will require more work. In addition to that we also change the hrtimer from REALTIME to MONOTONIC to avoid similar problems with timedatectl set-time. 3. sigp rework We will move all "slow" sigps to QEMU (protected with a capability that can be enabled) to avoid several races between concurrent SIGP orders. 4. Optimize the shadow page table Provide an interface to announce the maximum guest size. The kernel will use that to make the pagetable 2,3,4 (or theoretically) 5 levels. 5. Provide an interface to set the guest TOD We now use two vm attributes instead of two oneregs, as oneregs are vcpu ioctl and we don't want to call them from other threads. 6. Protected key functions The real HMC allows to enable/disable protected key CPACF functions. Lets provide an implementation + an interface for QEMU to activate this the protected key instructions. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iQIcBAABAgAGBQJUwj60AAoJEBF7vIC1phx8iV0QAKq1LZRTmgTLS2fd0oyWKZeN ShWUIUiB+7IUiuogYXZMfqOm61oogxwc95Ti+3tpSWYwkzUWagpS/RJQze7E1HOc 3pHpXwrR01ueUT6uVV4xc/vmVIlQAIl/ScRDDPahlAT2crCleWcKVC9l0zBs/Kut IrfzN9pJcrkmXD178CDP8/VwXsn02ptLQEpidGibGHCd03YVFjp3X0wfwNdQxMbU qOwNYCz3SLfDm5gsybO2DG+aVY3AbM2ZOJt/qLv2j4Phz4XB4t4W9iJnAefSz7JA W4677wbMQpfZlUQYhI78H/Cl9SfWAuLug1xk83O/+lbEiR5u+8zLxB69dkFTiBaH 442OY957T6TQZ/V9d0jDo2XxFrcaU9OONbVLsfBQ56Vwv5cAg9/7zqG8eqH7Nq9R gU3fQesgD4N0Kpa77T9k45TT/hBRnUEtsGixAPT6QYKyE6cK4AJATHKSjMSLbdfj ELbt0p2mVtKhuCcANfEx54U2CxOrg5ElBmPz8hRw0OkXdwpqh1sGKmt0govcHP1I BGSzE9G4mswwI1bQ7cqcyTk/lwL8g3+KQmRJoOcgCveQlnY12X5zGD5DhuPMPiIT VENqbcTzjlxdu+4t7Enml+rXl7ySsewT9L231SSrbLsTQVgCudD1B9m72WLu5ZUT 9/Z6znv6tkeKV5rM9DYE =zLjR -----END PGP SIGNATURE----- Merge tag 'kvm-s390-next-20150122' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next KVM: s390: fixes and features for kvm/next (3.20) 1. Generic - sparse warning (make function static) - optimize locking - bugfixes for interrupt injection - fix MVPG addressing modes 2. hrtimer/wakeup fun A recent change can cause KVM hangs if adjtime is used in the host. The hrtimer might wake up too early or too late. Too early is fatal as vcpu_block will see that the wakeup condition is not met and sleep again. This CPU might never wake up again. This series addresses this problem. adjclock slowing down the host clock will result in too late wakeups. This will require more work. In addition to that we also change the hrtimer from REALTIME to MONOTONIC to avoid similar problems with timedatectl set-time. 3. sigp rework We will move all "slow" sigps to QEMU (protected with a capability that can be enabled) to avoid several races between concurrent SIGP orders. 4. Optimize the shadow page table Provide an interface to announce the maximum guest size. The kernel will use that to make the pagetable 2,3,4 (or theoretically) 5 levels. 5. Provide an interface to set the guest TOD We now use two vm attributes instead of two oneregs, as oneregs are vcpu ioctl and we don't want to call them from other threads. 6. Protected key functions The real HMC allows to enable/disable protected key CPACF functions. Lets provide an implementation + an interface for QEMU to activate this the protected key instructions.	2015-01-23 14:33:36 +01:00
Paolo Bonzini	1c6007d59a	KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added trace symbols, and adding an explicit VGIC init device control IOCTL. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJUwhsKAAoJEEtpOizt6ddyuSEH/ia2uf07N0i+C1dPKYiqhKEd nFqBvgrhAMVztWLmy1Wq4SnO9YNd+CrPYATrfCiYsYQ9aKc09+qDq+uo06bVpZXz KsHjVGUsdyJ4qRqjDixkPvZviGIXa6C//+hcwg1XH2nit1uHmXVupzB9dDz3ZM2l GCwApdRdaaUVDt5Ud2ljqIWZa18Qf/5/HD8MdPXpmotDOKucL6pBr/1R1XWueCU/ ejRs/qy3EFyMWdEdfGFAMCa0ZvHbPmsJmvB/EgkyUnuJj77ptA0jNo1jtzSfEyis 53x4ffWnIsPl9yqhk0oKerIALVUvV4A7/me2ya6tsQ5fiBX7lJ3+qwggvCkWQzw= =fMS2 -----END PGP SIGNATURE----- Merge tag 'kvm-arm-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next KVM/ARM changes for v3.20 including GICv3 emulation, dirty page logging, added trace symbols, and adding an explicit VGIC init device control IOCTL. Conflicts: arch/arm64/include/asm/kvm_arm.h arch/arm64/kvm/handle_exit.c	2015-01-23 13:39:51 +01:00
Dominik Dingel	31928aa586	KVM: remove unneeded return value of vcpu_postcreate The return value of kvm_arch_vcpu_postcreate is not checked in its caller. This is okay, because only x86 provides vcpu_postcreate right now and it could only fail if vcpu_load failed. But that is not possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so just get rid of the unchecked return value. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>	2015-01-23 13:24:52 +01:00
Marcelo Tosatti	54750f2cf0	KVM: x86: workaround SuSE's 2.6.16 pvclock vs masterclock issue SuSE's 2.6.16 kernel fails to boot if the delta between tsc_timestamp and rdtsc is larger than a given threshold: * If we get more than the below threshold into the future, we rerequest * the real time from the host again which has only little offset then * that we need to adjust using the TSC. * * For now that threshold is 1/5th of a jiffie. That should be good * enough accuracy for completely broken systems, but also give us swing * to not call out to the host all the time. */ #define PVCLOCK_DELTA_MAX ((1000000000ULL / HZ) / 5) Disable masterclock support (which increases said delta) in case the boot vcpu does not use MSR_KVM_SYSTEM_TIME_NEW. Upstreams kernels which support pvclock vsyscalls (and therefore make use of PVCLOCK_STABLE_BIT) use MSR_KVM_SYSTEM_TIME_NEW. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-20 20:38:39 +01:00
Fengguang Wu	69b0049a89	KVM: fix "Should it be static?" warnings from sparse arch/x86/kvm/x86.c:495:5: sparse: symbol 'kvm_read_nested_guest_page' was not declared. Should it be static? arch/x86/kvm/x86.c:646:5: sparse: symbol '__kvm_set_xcr' was not declared. Should it be static? arch/x86/kvm/x86.c:1183:15: sparse: symbol 'max_tsc_khz' was not declared. Should it be static? arch/x86/kvm/x86.c:1237:6: sparse: symbol 'kvm_track_tsc_matching' was not declared. Should it be static? Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-20 20:38:35 +01:00
Paolo Bonzini	e108ff2f80	KVM: x86: switch to kvm_get_dirty_log_protect We now have a generic function that does most of the work of kvm_vm_ioctl_get_dirty_log, now use it. Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>	2015-01-16 14:40:14 +01:00
Radim Krčmář	defcf51fa9	KVM: x86: allow TSC deadline timer on all hosts Emulation does not utilize the feature. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-09 10:23:54 +01:00
Nicholas Krause	bab5bb3982	kvm: x86: Remove kvm_make_request from lapic.c Adds a function kvm_vcpu_set_pending_timer instead of calling kvm_make_request in lapic.c. Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:48:08 +01:00
Marcelo Tosatti	d0659d946b	KVM: x86: add option to advance tscdeadline hrtimer expiration For the hrtimer which emulates the tscdeadline timer in the guest, add an option to advance expiration, and busy spin on VM-entry waiting for the actual expiration time to elapse. This allows achieving low latencies in cyclictest (or any scenario which requires strict timing regarding timer expiration). Reduces average cyclictest latency from 12us to 8us on Core i5 desktop. Note: this option requires tuning to find the appropriate value for a particular hardware/guest combination. One method is to measure the average delay between apic_timer_fn and VM-entry. Another method is to start with 1000ns, and increase the value in say 500ns increments until avg cyclictest numbers stop decreasing. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:47:30 +01:00
Wincy Van	ff651cb613	KVM: nVMX: Add nested msr load/restore algorithm Several hypervisors need MSR auto load/restore feature. We read MSRs from VM-entry MSR load area which specified by L1, and load them via kvm_set_msr in the nested entry. When nested exit occurs, we get MSRs via kvm_get_msr, writing them to L1`s MSR store area. After this, we read MSRs from VM-exit MSR load area, and load them via kvm_set_msr. Signed-off-by: Wincy Van <fanwenyi0529@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2015-01-08 22:45:14 +01:00
Paolo Bonzini	df1daba7d1	KVM: x86: support XSAVES usage in the host Userspace is expecting non-compacted format for KVM_GET_XSAVE, but struct xsave_struct might be using the compacted format. Convert in order to preserve userspace ABI. Likewise, userspace is passing non-compacted format for KVM_SET_XSAVE but the kernel will pass it to XRSTORS, and we need to convert back. Fixes: `f31a9f7c71` Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: stable@vger.kernel.org Cc: H. Peter Anvin <hpa@linux.intel.com> Tested-by: Nadav Amit <namit@cs.technion.ac.il> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-12-05 13:57:05 +01:00
Paolo Bonzini	2b4a273b42	kvm: x86: avoid warning about potential shift wrapping bug cs.base is declared as a __u64 variable and vector is a u32 so this causes a static checker warning. The user indeed can set "sipi_vector" to any u32 value in kvm_vcpu_ioctl_x86_set_vcpu_events(), but the value should really have 8-bit precision only. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-24 16:53:50 +01:00
Paolo Bonzini	c9eab58f64	KVM: x86: move device assignment out of kvm_host.h Create a new header, and hide the device assignment functions there. Move struct kvm_assigned_dev_kernel to assigned-dev.c by modifying arch/x86/kvm/iommu.c to take a PCI device struct. Based on a patch by Radim Krcmar <rkrcmark@redhat.com>. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-24 16:53:50 +01:00
Radim Krčmář	c274e03af7	kvm: x86: move assigned-dev.c and iommu.c to arch/x86/ Now that ia64 is gone, we can hide deprecated device assignment in x86. Notable changes: - kvm_vm_ioctl_assigned_device() was moved to x86/kvm_arch_vm_ioctl() The easy parts were removed from generic kvm code, remaining - kvm_iommu_(un)map_pages() would require new code to be moved - struct kvm_assigned_dev_kernel depends on struct kvm_irq_ack_notifier Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-23 18:33:36 +01:00
Paolo Bonzini	6ef768fac9	kvm: x86: move ioapic.c and irq_comm.c back to arch/x86/ ia64 does not need them anymore. Ack notifiers become x86-specific too. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-21 18:02:37 +01:00
Chris J Arges	d913b90435	kvm: svm: move WARN_ON in svm_adjust_tsc_offset When running the tsc_adjust kvm-unit-test on an AMD processor with the IA32_TSC_ADJUST feature enabled, the WARN_ON in svm_adjust_tsc_offset can be triggered. This WARN_ON checks for a negative adjustment in case __scale_tsc is called; however it may trigger unnecessary warnings. This patch moves the WARN_ON to trigger only if __scale_tsc will actually be called from svm_adjust_tsc_offset. In addition make adj in kvm_set_msr_common s64 since this can have signed values. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-13 11:56:11 +01:00
Paolo Bonzini	ac146235d4	KVM: x86: fix warning on 32-bit compilation PCIDs are only supported in 64-bit mode. No need to clear bit 63 of CR3 unless the host is 64-bit. Reported by Fengguang Wu's autobuilder. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-10 13:53:25 +01:00
David Matlack	ce1a5e60a6	kvm: x86: add trace event for pvclock updates The new trace event records: * the id of vcpu being updated * the pvclock_vcpu_time_info struct being written to guest memory This is useful for debugging pvclock bugs, such as the bug fixed by "[PATCH] kvm: x86: Fix kvm clock versioning.". Signed-off-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:55 +01:00
Owen Hofmann	09a0c3f110	kvm: x86: Fix kvm clock versioning. kvm updates the version number for the guest paravirt clock structure by incrementing the version of its private copy. It does not read the guest version, so will write version = 2 in the first update for every new VM, including after restoring a saved state. If guest state is saved during reading the clock, it could read and accept struct fields and guest TSC from two different updates. This changes the code to increment the guest version and write it back. Signed-off-by: Owen Hofmann <osh@google.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:54 +01:00
Marcelo Tosatti	7f187922dd	KVM: x86: update masterclock values on TSC writes When the guest writes to the TSC, the masterclock TSC copy must be updated as well along with the TSC_OFFSET update, otherwise a negative tsc_timestamp is calculated at kvm_guest_time_update. Once "if (!vcpus_matched && ka->use_master_clock)" is simplified to "if (ka->use_master_clock)", the corresponding "if (!ka->use_master_clock)" becomes redundant, so remove the do_request boolean and collapse everything into a single condition. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-08 08:20:53 +01:00
Nadav Amit	38827dbd3f	KVM: x86: Do not update EFLAGS on faulting emulation If the emulation ends in fault, eflags should not be updated. However, several instruction emulations (actually all the fastops) currently update eflags, if the fault was detected afterwards (e.g., #PF during writeback). Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-07 15:44:08 +01:00
Nadav Amit	9d88fca71a	KVM: x86: MOV to CR3 can set bit 63 Although Intel SDM mentions bit 63 is reserved, MOV to CR3 can have bit 63 set. As Intel SDM states in section 4.10.4 "Invalidation of TLBs and Paging-Structure Caches": " MOV to CR3. ... If CR4.PCIDE = 1 and bit 63 of the instructionâ€™s source operand is 0 ..." In other words, bit 63 is not reserved. KVM emulator currently consider bit 63 as reserved. Fix it. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-07 15:44:07 +01:00
Nadav Amit	82b32774c2	KVM: x86: Breakpoints do not consider CS.base x86 debug registers hold a linear address. Therefore, breakpoints detection should consider CS.base, and check whether instruction linear address equals (CS.base + RIP). This patch introduces a function to evaluate RIP linear address and uses it for breakpoints detection. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-07 15:44:04 +01:00
Nadav Amit	3ffb24681c	KVM: x86: No error-code on real-mode exceptions Real-mode exceptions do not deliver error code. As can be seen in Intel SDM volume 2, real-mode exceptions do not have parentheses, which indicate error-code. To avoid significant changes of the code, the error code is "removed" during exception queueing. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-07 15:44:02 +01:00
Chao Peng	612263b30c	KVM: x86: Enable Intel AVX-512 for guest Expose Intel AVX-512 feature bits to guest. Also add checks for xcr0 AVX512 related bits according to spec: http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:30 +01:00
Nadav Amit	16f8a6f979	KVM: vmx: Unavailable DR4/5 is checked before CPL If DR4/5 is accessed when it is unavailable (since CR4.DE is set), then #UD should be generated even if CPL>0. This is according to Intel SDM Table 6-2: "Priority Among Simultaneous Exceptions and Interrupts". Note, that this may happen on the first DR access, even if the host does not sets debug breakpoints. Obviously, it occurs when the host debugs the guest. This patch moves the DR4/5 checks from __kvm_set_dr/_kvm_get_dr to handle_dr. The emulator already checks DR4/5 availability in check_dr_read. Nested virutalization related calls to kvm_set_dr/kvm_get_dr would not like to inject exceptions to the guest. As for SVM, the patch follows the previous logic as much as possible. Anyhow, it appears the DR interception code might be buggy - even if the DR access may cause an exception, the instruction is skipped. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:26 +01:00
Nadav Amit	6bdf06625d	KVM: x86: DR7.GD should be cleared upon any #DB exception Intel SDM 17.2.4 (Debug Control Register (DR7)) says: "The processor clears the GD flag upon entering to the debug exception handler." This sentence may be misunderstood as if it happens only on #DB due to debug-register protection, but it happens regardless to the cause of the #DB. Fix the behavior to match both real hardware and Bochs. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-11-03 12:07:23 +01:00
Andy Honig	8b3c3104c3	KVM: x86: Prevent host from panicking on shared MSR writes. The previous patch blocked invalid writes directly when the MSR is written. As a precaution, prevent future similar mistakes by gracefulling handle GPs caused by writes to shared MSRs. Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> [Remove parts obsoleted by Nadav's patch. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:21:08 +02:00
Nadav Amit	854e8bb1aa	KVM: x86: Check non-canonical addresses upon WRMSR Upon WRMSR, the CPU should inject #GP if a non-canonical value (address) is written to certain MSRs. The behavior is "almost" identical for AMD and Intel (ignoring MSRs that are not implemented in either architecture since they would anyhow #GP). However, IA32_SYSENTER_ESP and IA32_SYSENTER_EIP cause #GP if non-canonical address is written on Intel but not on AMD (which ignores the top 32-bits). Accordingly, this patch injects a #GP on the MSRs which behave identically on Intel and AMD. To eliminate the differences between the architecutres, the value which is written to IA32_SYSENTER_ESP and IA32_SYSENTER_EIP is turned to canonical value before writing instead of injecting a #GP. Some references from Intel and AMD manuals: According to Intel SDM description of WRMSR instruction #GP is expected on WRMSR "If the source register contains a non-canonical address and ECX specifies one of the following MSRs: IA32_DS_AREA, IA32_FS_BASE, IA32_GS_BASE, IA32_KERNEL_GS_BASE, IA32_LSTAR, IA32_SYSENTER_EIP, IA32_SYSENTER_ESP." According to AMD manual instruction manual: LSTAR/CSTAR (SYSCALL): "The WRMSR instruction loads the target RIP into the LSTAR and CSTAR registers. If an RIP written by WRMSR is not in canonical form, a general-protection exception (#GP) occurs." IA32_GS_BASE and IA32_FS_BASE (WRFSBASE/WRGSBASE): "The address written to the base field must be in canonical form or a #GP fault will occur." IA32_KERNEL_GS_BASE (SWAPGS): "The address stored in the KernelGSbase MSR must be in canonical form." This patch fixes CVE-2014-3610. Cc: stable@vger.kernel.org Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-24 13:21:08 +02:00
Linus Torvalds	0429fbc0bd	Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...	2014-10-15 07:48:18 +02:00
Paolo Bonzini	f439ed27f8	kvm: do not handle APIC access page if in-kernel irqchip is not in use This fixes the following OOPS: loaded kvm module (v3.17-rc1-168-gcec26bc) BUG: unable to handle kernel paging request at fffffffffffffffe IP: [<ffffffff81168449>] put_page+0x9/0x30 PGD 1e15067 PUD 1e17067 PMD 0 Oops: 0000 [#1] PREEMPT SMP [<ffffffffa063271d>] ? kvm_vcpu_reload_apic_access_page+0x5d/0x70 [kvm] [<ffffffffa013b6db>] vmx_vcpu_reset+0x21b/0x470 [kvm_intel] [<ffffffffa0658816>] ? kvm_pmu_reset+0x76/0xb0 [kvm] [<ffffffffa064032a>] kvm_vcpu_reset+0x15a/0x1b0 [kvm] [<ffffffffa06403ac>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm] [<ffffffffa062e540>] kvm_vm_ioctl+0x200/0x780 [kvm] [<ffffffff81212170>] do_vfs_ioctl+0x2d0/0x4b0 [<ffffffff8108bd99>] ? __mmdrop+0x69/0xb0 [<ffffffff812123d1>] SyS_ioctl+0x81/0xa0 [<ffffffff8112a6f6>] ? __audit_syscall_exit+0x1f6/0x2a0 [<ffffffff817229e9>] system_call_fastpath+0x16/0x1b Code: c6 78 ce a3 81 4c 89 e7 e8 d9 80 ff ff 0f 0b 4c 89 e7 e8 8f f6 ff ff e9 fa fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 1e 8b 47 1c 85 c0 74 27 f0 RIP [<ffffffff81193045>] put_page+0x5/0x50 when not using the in-kernel irqchip ("-machine kernel_irqchip=off" with QEMU). The fix is to make the same check in kvm_vcpu_reload_apic_access_page that we already have in vmx.c's vm_need_virtualize_apic_accesses(). Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Fixes: `4256f43f9f` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-10-02 19:26:11 +02:00
Tang Chen	c24ae0dcd3	kvm: x86: Unpin and remove kvm_arch->apic_access_page In order to make the APIC access page migratable, stop pinning it in memory. And because the APIC access page is not pinned in memory, we can remove kvm_arch->apic_access_page. When we need to write its physical address into vmcs, we use gfn_to_page() to get its page struct, which is needed to call page_to_phys(); the page is then immediately unpinned. Suggested-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:08:01 +02:00
Tang Chen	4256f43f9f	kvm: x86: Add request bit to reload APIC access page address Currently, the APIC access page is pinned by KVM for the entire life of the guest. We want to make it migratable in order to make memory hot-unplug available for machines that run KVM. This patch prepares to handle this in generic code, through a new request bit (that will be set by the MMU notifier) and a new hook that is called whenever the request bit is processed. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:08:00 +02:00
Tang Chen	fe71557afb	kvm: Add arch specific mmu notifier for page invalidation This will be used to let the guest run while the APIC access page is not pinned. Because subsequent patches will fill in the function for x86, place the (still empty) x86 implementation in the x86.c file instead of adding an inline function in kvm_host.h. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:59 +02:00
Chen Yucong	81760dccf8	kvm: x86: use macros to compute bank MSRs Avoid open coded calculations for bank MSRs by using well-defined macros that hide the index of higher bank MSRs. No semantic changes. Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:56 +02:00
Nadav Amit	4566654bb9	KVM: vmx: Inject #GP on invalid PAT CR Guest which sets the PAT CR to invalid value should get a #GP. Currently, if vmx supports loading PAT CR during entry, then the value is not checked. This patch makes the required check in that case. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:52 +02:00
Liang Chen	77c3913b74	KVM: x86: directly use kvm_make_request again A one-line wrapper around kvm_make_request is not particularly useful. Replace kvm_mmu_flush_tlb() with kvm_make_request(). Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:51 +02:00
Radim Krčmář	a70656b63a	KVM: x86: count actual tlb flushes - we count KVM_REQ_TLB_FLUSH requests, not actual flushes (KVM can have multiple requests for one flush) - flushes from kvm_flush_remote_tlbs aren't counted - it's easy to make a direct request by mistake Solve these by postponing the counting to kvm_check_request(). Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:50 +02:00
Nadav Amit	a2b9e6c1a3	KVM: x86: Don't report guest userspace emulation error to userspace Commit `fc3a9157d3` ("KVM: X86: Don't report L2 emulation failures to user-space") disabled the reporting of L2 (nested guest) emulation failures to userspace due to race-condition between a vmexit and the instruction emulator. The same rational applies also to userspace applications that are permitted by the guest OS to access MMIO area or perform PIO. This patch extends the current behavior - of injecting a #UD instead of reporting it to userspace - also for guest userspace code. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-24 14:07:49 +02:00
Tang Chen	a255d4795f	kvm: Remove ept_identity_pagetable from struct kvm_arch. kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But it is never used to refer to the page at all. In vcpu initialization, it indicates two things: 1. indicates if ept page is allocated 2. indicates if a memory slot for identity page is initialized Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept identity pagetable is initialized. So we can remove ept_identity_pagetable. NOTE: In the original code, ept identity pagetable page is pinned in memroy. As a result, it cannot be migrated/hot-removed. After this patch, since kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page is no longer pinned in memory. And it can be migrated/hot-removed. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-17 13:10:11 +02:00
Guo Hui Liu	105b21bbf6	KVM: x86: Use kvm_make_request when applicable This patch replace the set_bit method by kvm_make_request to make code more readable and consistent. Signed-off-by: Guo Hui Liu <liuguohui@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-16 14:44:20 +02:00
Paolo Bonzini	54987b7afa	KVM: x86: propagate exception from permission checks on the nested page fault Currently, if a permission error happens during the translation of the final GPA to HPA, walk_addr_generic returns 0 but does not fill in walker->fault. To avoid this, add an x86_exception* argument to the translate_gpa function, and let it fill in walker->fault. The nested_page_fault field will be true, since the walk_mmu is the nested_mmu and translate_gpu instead operates on the "outer" (NPT) instance. Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-05 12:01:13 +02:00
Paolo Bonzini	ef54bcfeea	KVM: x86: skip writeback on injection of nested exception If a nested page fault happens during emulation, we will inject a vmexit, not a page fault. However because writeback happens after the injection, we will write ctxt->eip from L2 into the L1 EIP. We do not write back if an instruction caused an interception vmexit---do the same for page faults. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-05 12:01:06 +02:00
Tiejun Chen	d143148383	KVM: mmio: cleanup kvm_set_mmio_spte_mask Just reuse rsvd_bits() inside kvm_set_mmio_spte_mask() for slightly better code. Signed-off-by: Tiejun Chen <tiejun.chen@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-09-03 10:04:10 +02:00
Paolo Bonzini	fd2752352b	KVM: x86: use guest maxphyaddr to check MTRR values The check introduced in commit `d7a2a246a1` (KVM: x86: #GP when attempts to write reserved bits of Variable Range MTRRs, 2014-08-19) will break if the guest maxphyaddr is higher than the host's (which sometimes happens depending on your hardware and how QEMU is configured). To fix this, use cpuid_maxphyaddr similar to how the APIC_BASE MSR does already. Reported-by: Jan Kiszka <jan.kiszka@siemens.com> Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-29 18:56:24 +02:00
Radim Krčmář	13a34e067e	KVM: remove garbage arg to *hardware_{en,dis}able In the beggining was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: Radim KrÄmÃ¡Å™ <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-29 16:35:55 +02:00
Christoph Lameter	89cbc76768	x86: Replace __get_cpu_var uses __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int x = &__get_cpu_var(y); Converts to int x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2014-08-26 13:45:49 -04:00
Radim Krčmář	7b46268d29	KVM: trace kvm_ple_window grow/shrink Tracepoint for dynamic PLE window, fired on every potential change. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-21 18:45:23 +02:00
Radim Krčmář	ae97a3b818	KVM: x86: introduce sched_in to kvm_x86_ops sched_in preempt notifier is available for x86, allow its use in specific virtualization technlogies as well. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-21 18:45:22 +02:00
Radim Krčmář	e790d9ef64	KVM: add kvm_arch_sched_in Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-21 18:45:21 +02:00
Paolo Bonzini	e0ad0b477c	KVM: emulate: warn on invalid or uninitialized exception numbers These were reported when running Jailhouse on AMD processors. Initialize ctxt->exception.vector with an invalid exception number, and warn if it remained invalid even though the emulator got an X86EMUL_PROPAGATE_FAULT return code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-20 13:01:26 +02:00
Wanpeng Li	d7a2a246a1	KVM: x86: #GP when attempts to write reserved bits of Variable Range MTRRs Section 11.11.2.3 of the SDM mentions "All other bits in the IA32_MTRR_PHYSBASEn and IA32_MTRR_PHYSMASKn registers are reserved; the processor generates a general-protection exception(#GP) if software attempts to write to them". This patch do it in kvm. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-19 15:12:29 +02:00
Wanpeng Li	adfb5d2746	KVM: x86: fix check legal type of Variable Range MTRRs The first entry in each pair(IA32_MTRR_PHYSBASEn) defines the base address and memory type for the range; the second entry(IA32_MTRR_PHYSMASKn) contains a mask used to determine the address range. The legal values for the type field of IA32_MTRR_PHYSBASEn are 0,1,4,5, and 6. However, IA32_MTRR_PHYSMASKn don't have type field. This patch avoid check if the type field is legal for IA32_MTRR_PHYSMASKn. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-19 15:12:29 +02:00
Wei Huang	dc9b2d933a	KVM: SVM: add rdmsr support for AMD event registers Current KVM only supports RDMSR for K7_EVNTSEL0 and K7_PERFCTR0 MSRs. Reading the rest MSRs will trigger KVM to inject #GP into guest VM. This causes a warning message "Failed to access perfctr msr (MSR c0010001 is ffffffffffffffff)" on AMD host. This patch adds RDMSR support for all K7_EVNTSELn and K7_PERFCTRn registers and thus supresses the warning message. Signed-off-by: Wei Huang <wehuang@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-08-19 15:12:28 +02:00
Linus Torvalds	66bb0aa077	Here are the PPC and ARM changes for KVM, which I separated because they had small conflicts (respectively within KVM documentation, and with 3.16-rc changes). Since they were all within the subsystem, I took care of them. Stephen Rothwell reported some snags in PPC builds, but they are all fixed now; the latest linux-next report was clean. New features for ARM include: - KVM VGIC v2 emulation on GICv3 hardware - Big-Endian support for arm/arm64 (guest and host) - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list) And for PPC: - Book3S: Good number of LE host fixes, enable HV on LE - Book3S HV: Add in-guest debug support This release drops support for KVM on the PPC440. As a result, the PPC merge removes more lines than it adds. :) I also included an x86 change, since Davidlohr tied it to an independent bug report and the reporter quickly provided a Tested-by; there was no reason to wait for -rc2. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJT4iIJAAoJEBvWZb6bTYbyZqoP/3Wxy8NWPFJ8HGt81NHlGnDS a9UbL7EibcOEG+aaKqmtBglTD5YDiGBDNCxxiSJaDHt+grLN4fsWIliJob1nJFoO 90f89EWN2XjeCrJXA5nUoeg5tpc5OoYKsiP6pTgzIwkP8vvs/H1+zpcTS/UmYsr/ qipVMMsM+zZeHWZcSbqjW88z7YqIn1sr5282wJ85cbyv4KGizb/G4dyPuDqLb6np hkAD8Ah6VV2suQ2FSy7G2fg20R0vglUi60hkEHLoCBPVqJCl7SmC8MvxNbjBnP8S J36R0R0u1wHYKzAGooLJGVOZ/o/gSiVqKX+++L2EvJBN+kuA6u/7fxLyBT+LwDAE IF/Aln5rpg1fe+eywvhz86WljTVEQ8bO1zVsIQUPY+/ZOPedZHMwyvXft8ogbjSp 2m9OJ/3e8Aggh0OeHpCDoeow+QDUXvX0YdCw+2Yh0p+7VMXqkyp0QEiBu38jrusC rB3VNifJbDSWLKdG9LfCAPHnxZD2XYEwv2WFBo6KQOGMGHfx0GXpCOL/jQihrhA6 HtEG5Bs3lvnHQemdpUZ58xojiABbMaUPdcnPXQQEp23WhZzrfLMLzqVG0VYnhSsC 9pi7MJj8c31rqx5WU2oRM28i/BvNxN0NCtkDpineO5s3f89Ws1xnwxqlm38AKP0J irJQTYFEqec+GM9JK1rG =hyQP -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull second round of KVM changes from Paolo Bonzini: "Here are the PPC and ARM changes for KVM, which I separated because they had small conflicts (respectively within KVM documentation, and with 3.16-rc changes). Since they were all within the subsystem, I took care of them. Stephen Rothwell reported some snags in PPC builds, but they are all fixed now; the latest linux-next report was clean. New features for ARM include: - KVM VGIC v2 emulation on GICv3 hardware - Big-Endian support for arm/arm64 (guest and host) - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list) And for PPC: - Book3S: Good number of LE host fixes, enable HV on LE - Book3S HV: Add in-guest debug support This release drops support for KVM on the PPC440. As a result, the PPC merge removes more lines than it adds. :) I also included an x86 change, since Davidlohr tied it to an independent bug report and the reporter quickly provided a Tested-by; there was no reason to wait for -rc2" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits) KVM: Move more code under CONFIG_HAVE_KVM_IRQFD KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use KVM: nVMX: Fix nested vmexit ack intr before load vmcs01 KVM: PPC: Enable IRQFD support for the XICS interrupt controller KVM: Give IRQFD its own separate enabling Kconfig option KVM: Move irq notifier implementation into eventfd.c KVM: Move all accesses to kvm::irq_routing into irqchip.c KVM: irqchip: Provide and use accessors for irq routing table KVM: Don't keep reference to irq routing table in irqfd struct KVM: PPC: drop duplicate tracepoint arm64: KVM: fix 64bit CP15 VM access for 32bit guests KVM: arm64: GICv3: mandate page-aligned GICV region arm64: KVM: GICv3: move system register access to msr_s/mrs_s KVM: PPC: PR: Handle FSCR feature deselects KVM: PPC: HV: Remove generic instruction emulation KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr KVM: PPC: Remove DCR handling KVM: PPC: Expose helper functions for data/inst faults KVM: PPC: Separate loadstore emulation from priv emulation KVM: PPC: Handle magic page in kvmppc_ld/st ...	2014-08-07 11:35:30 -07:00
Linus Torvalds	e7fda6c4c3	Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...	2014-08-05 17:46:42 -07:00
Paolo Bonzini	cc568ead3c	Patch queue for ppc - 2014-08-01 Highlights in this release include: - BookE: Rework instruction fetch, not racy anymore now - BookE HV: Fix ONE_REG accessors for some in-hardware registers - Book3S: Good number of LE host fixes, enable HV on LE - Book3S: Some misc bug fixes - Book3S HV: Add in-guest debug support - Book3S HV: Preload cache lines on context switch - Remove 440 support Alexander Graf (31): KVM: PPC: Book3s PR: Disable AIL mode with OPAL KVM: PPC: Book3s HV: Fix tlbie compile error KVM: PPC: Book3S PR: Handle hyp doorbell exits KVM: PPC: Book3S PR: Fix ABIv2 on LE KVM: PPC: Book3S PR: Fix sparse endian checks PPC: Add asm helpers for BE 32bit load/store KVM: PPC: Book3S HV: Make HTAB code LE host aware KVM: PPC: Book3S HV: Access guest VPA in BE KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE KVM: PPC: Book3S HV: Access XICS in BE KVM: PPC: Book3S HV: Fix ABIv2 on LE KVM: PPC: Book3S HV: Enable for little endian hosts KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct KVM: PPC: Deflect page write faults properly in kvmppc_st KVM: PPC: Book3S: Stop PTE lookup on write errors KVM: PPC: Book3S: Add hack for split real mode KVM: PPC: Book3S: Make magic page properly 4k mappable KVM: PPC: Remove 440 support KVM: Rename and add argument to check_extension KVM: Allow KVM_CHECK_EXTENSION on the vm fd KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode KVM: PPC: Implement kvmppc_xlate for all targets KVM: PPC: Move kvmppc_ld/st to common code KVM: PPC: Remove kvmppc_bad_hva() KVM: PPC: Use kvm_read_guest in kvmppc_ld KVM: PPC: Handle magic page in kvmppc_ld/st KVM: PPC: Separate loadstore emulation from priv emulation KVM: PPC: Expose helper functions for data/inst faults KVM: PPC: Remove DCR handling KVM: PPC: HV: Remove generic instruction emulation KVM: PPC: PR: Handle FSCR feature deselects Alexey Kardashevskiy (1): KVM: PPC: Book3S: Fix LPCR one_reg interface Aneesh Kumar K.V (4): KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation KVM: PPC: BOOK3S: PR: Emulate virtual timebase register KVM: PPC: BOOK3S: PR: Emulate instruction counter KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page Anton Blanchard (2): KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC() Bharat Bhushan (10): kvm: ppc: bookehv: Added wrapper macros for shadow registers kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1 kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR kvm: ppc: booke: Add shared struct helpers of SPRN_ESR kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7 kvm: ppc: Add SPRN_EPR get helper function kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit KVM: PPC: Booke-hv: Add one reg interface for SPRG9 KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr Michael Neuling (1): KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling Mihai Caraman (8): KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule KVM: PPC: e500: Fix default tlb for victim hint KVM: PPC: e500: Emulate power management control SPR KVM: PPC: e500mc: Revert "add load inst fixup" KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 KVM: PPC: Book3s: Remove kvmppc_read_inst() function KVM: PPC: Allow kvmppc_get_last_inst() to fail KVM: PPC: Bookehv: Get vcpu's last instruction for emulation Paul Mackerras (4): KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication Stewart Smith (2): Split out struct kvmppc_vcore creation to separate function Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJT21skAAoJECszeR4D/txgeFEP/AzJopN7s//W33CfyBqURHXp XALCyAw+S67gtcaTZbxomcG1xuT8Lj9WEw28iz3rCtAnJwIxsY63xrI1nXMzTaI2 p1rC0ai5Qy+nlEbd6L78spZy/Nzh8DFYGWx78iUSO1mYD8xywJwtoiBA539pwp8j 8N+mgn61Hwhv31bKtsZlmzXymVr/jbTp5LVuxsBLJwD2lgT49g+4uBnX2cG/iXkg Rzbh7LxoNNXrSPI8sYmTWu/81aeXteeX70ja6DHuV5dWLNTuAXJrh5EUfeAZqBrV aYcLWUYmIyB87txNmt6ZGVar2p3jr2Xhb9mKx+EN4dbehblanLc1PUqlHd0q3dKc Nt60ByqpZn+qDAK86dShSZLEe+GT3lovvE76CqVXD4Er+OUEkc9JoxhN1cof/Gb0 o6uwZ2isXHRdGoZx5vb4s3UTOlwZGtoL/CyY/HD/ujYDSURkCGbxLj3kkecSY8ut QdDAWsC15BwsHtKLr5Zwjp2w+0eGq2QJgfvO0zqWFiz9k33SCBCUpwluFeqh27Hi aR5Wir3j+MIw9G8XlYlDJWYfi0h/SZ4G7hh7jSu26NBNBzQsDa8ow/cLzdMhdUwH OYSaeqVk5wiRb9to1uq1NQWPA0uRAx3BSjjvr9MCGRqmvn+FV5nj637YWUT+53Hi aSvg/U2npghLPPG2cihu =JuLr -----END PGP SIGNATURE----- Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvm Patch queue for ppc - 2014-08-01 Highlights in this release include: - BookE: Rework instruction fetch, not racy anymore now - BookE HV: Fix ONE_REG accessors for some in-hardware registers - Book3S: Good number of LE host fixes, enable HV on LE - Book3S: Some misc bug fixes - Book3S HV: Add in-guest debug support - Book3S HV: Preload cache lines on context switch - Remove 440 support Alexander Graf (31): KVM: PPC: Book3s PR: Disable AIL mode with OPAL KVM: PPC: Book3s HV: Fix tlbie compile error KVM: PPC: Book3S PR: Handle hyp doorbell exits KVM: PPC: Book3S PR: Fix ABIv2 on LE KVM: PPC: Book3S PR: Fix sparse endian checks PPC: Add asm helpers for BE 32bit load/store KVM: PPC: Book3S HV: Make HTAB code LE host aware KVM: PPC: Book3S HV: Access guest VPA in BE KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE KVM: PPC: Book3S HV: Access XICS in BE KVM: PPC: Book3S HV: Fix ABIv2 on LE KVM: PPC: Book3S HV: Enable for little endian hosts KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct KVM: PPC: Deflect page write faults properly in kvmppc_st KVM: PPC: Book3S: Stop PTE lookup on write errors KVM: PPC: Book3S: Add hack for split real mode KVM: PPC: Book3S: Make magic page properly 4k mappable KVM: PPC: Remove 440 support KVM: Rename and add argument to check_extension KVM: Allow KVM_CHECK_EXTENSION on the vm fd KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode KVM: PPC: Implement kvmppc_xlate for all targets KVM: PPC: Move kvmppc_ld/st to common code KVM: PPC: Remove kvmppc_bad_hva() KVM: PPC: Use kvm_read_guest in kvmppc_ld KVM: PPC: Handle magic page in kvmppc_ld/st KVM: PPC: Separate loadstore emulation from priv emulation KVM: PPC: Expose helper functions for data/inst faults KVM: PPC: Remove DCR handling KVM: PPC: HV: Remove generic instruction emulation KVM: PPC: PR: Handle FSCR feature deselects Alexey Kardashevskiy (1): KVM: PPC: Book3S: Fix LPCR one_reg interface Aneesh Kumar K.V (4): KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation KVM: PPC: BOOK3S: PR: Emulate virtual timebase register KVM: PPC: BOOK3S: PR: Emulate instruction counter KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page Anton Blanchard (2): KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC() Bharat Bhushan (10): kvm: ppc: bookehv: Added wrapper macros for shadow registers kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1 kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR kvm: ppc: booke: Add shared struct helpers of SPRN_ESR kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7 kvm: ppc: Add SPRN_EPR get helper function kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit KVM: PPC: Booke-hv: Add one reg interface for SPRG9 KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr Michael Neuling (1): KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling Mihai Caraman (8): KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule KVM: PPC: e500: Fix default tlb for victim hint KVM: PPC: e500: Emulate power management control SPR KVM: PPC: e500mc: Revert "add load inst fixup" KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 KVM: PPC: Book3s: Remove kvmppc_read_inst() function KVM: PPC: Allow kvmppc_get_last_inst() to fail KVM: PPC: Bookehv: Get vcpu's last instruction for emulation Paul Mackerras (4): KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication Stewart Smith (2): Split out struct kvmppc_vcore creation to separate function Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 Conflicts: Documentation/virtual/kvm/api.txt	2014-08-05 09:58:11 +02:00
Linus Torvalds	8533ce7271	These are the x86, MIPS and s390 changes; PPC and ARM will come in a few days. MIPS and s390 have little going on this release; just bugfixes, some small, some larger. The highlights for x86 are nested VMX improvements (Jan Kiszka), optimizations for old processor (up to Nehalem, by me and Bandan Das), and a lot of x86 emulator bugfixes (Nadav Amit). Stephen Rothwell reported a trivial conflict with the tracing branch. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABAgAGBQJT300XAAoJEBvWZb6bTYby3V8QAJz+XyajnhJ8wH55Vxczz22L i2gtUGmBLhEXsBcaVKO4BBfek88lLzg0SGLjfW5wCMQmKtxVlrwTCXNkBoPGjapd NwHtWkMKym44PDhRovn7zkSumkxC43uFIBR/ebrhP6Bvhh9s+MnkQUxfw9ILB+YV EeKyEG8sSgxFCciuHbp3mIXpDcO6r/ldy6I7009OdyhLoMY+Kvmk7kRe9wtAivdg CGJi60QvGOn2RGRPOCEtF6UWr8Ae8fe1t84o0hkXPv/j3jtabzAatXKJa4dYNbIs 7Mp4NQpxaGV6rq3WCYVeZRxGs+UReGDAS3Il4Z8C9eTOTooSfxdVr8acpM8PY6I8 UmLT6ECLGycc4ELXrETtR+QLmiXACyJqyVxz4aiLV3kWSWfamKD3hBeQK9NizNcE VoPDl+PyISvR1tW4KstBuzfUWAEXi+gO78cqqFr/VW6cl7HKpA1DFQaPfGkYKDae 2CPwcLwI5/M6RtSgkyXTkEqNZLc2BjldqSeM1lmWjhZVW56X2iqePUL46Vab3Yvt U+sELtwEE560NLN3hbaHUsLR1tcUix5w8vTzcXPxgoHQBszHCcAZTWd1XHulr64F rp/cangqtkPKcu5j1mNhQs38oLjHI1MUsbQrqFoD4tmHjQ75iXHRFzYGoIVKXyHG AnGbQzJzBcdAANhm3LW0 =UXxV -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM changes from Paolo Bonzini: "These are the x86, MIPS and s390 changes; PPC and ARM will come in a few days. MIPS and s390 have little going on this release; just bugfixes, some small, some larger. The highlights for x86 are nested VMX improvements (Jan Kiszka), optimizations for old processor (up to Nehalem, by me and Bandan Das), and a lot of x86 emulator bugfixes (Nadav Amit). Stephen Rothwell reported a trivial conflict with the tracing branch" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (104 commits) x86/kvm: Resolve shadow warnings in macro expansion KVM: s390: rework broken SIGP STOP interrupt handling KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table KVM: vmx: remove duplicate vmx_mpx_supported() prototype KVM: s390: Fix memory leak on busy SIGP stop x86/kvm: Resolve shadow warning from min macro kvm: Resolve missing-field-initializers warnings Replace NR_VMX_MSR with its definition KVM: x86: Assertions to check no overrun in MSR lists KVM: x86: set rflags.rf during fault injection KVM: x86: Setting rflags.rf during rep-string emulation KVM: x86: DR6/7.RTM cannot be written KVM: nVMX: clean up nested_release_vmcs12 and code around it KVM: nVMX: fix lifetime issues for vmcs02 KVM: x86: Defining missing x86 vectors KVM: x86: emulator injects #DB when RFLAGS.RF is set KVM: x86: Cleanup of rflags.rf cleaning KVM: x86: Clear rflags.rf on emulated instructions KVM: x86: popf emulation should not change RF KVM: x86: Clearing rflags.rf upon skipped emulated instruction ...	2014-08-04 12:16:46 -07:00
Alexander Graf	784aa3d7fb	KVM: Rename and add argument to check_extension In preparation to make the check_extension function available to VM scope we add a struct kvm * argument to the function header and rename the function accordingly. It will still be called from the /dev/kvm fd, but with a NULL argument for struct kvm *. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-28 15:23:17 +02:00
Nadav Amit	0123be429f	KVM: x86: Assertions to check no overrun in MSR lists Currently there is no check whether shared MSRs list overrun the allocated size which can results in bugs. In addition there is no check that vmx->guest_msrs has sufficient space to accommodate all the VMX msrs. This patch adds the assertions. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-24 14:16:57 +02:00
Nadav Amit	d6e8c85456	KVM: x86: set rflags.rf during fault injection x86 does not automatically set rflags.rf during event injection. This patch does partial job, setting rflags.rf upon fault injection. It does not handle the setting of RF upon interrupt injection on rep-string instruction. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-24 14:04:01 +02:00
Thomas Gleixner	d28ede8379	timekeeping: Create struct tk_read_base and use it in struct timekeeper The members of the new struct are the required ones for the new NMI safe accessor to clcok monotonic. In order to reuse the existing timekeeping code and to make the update of the fast NMI safe timekeepers a simple memcpy use the struct for the timekeeper as well and convert all users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:53 -07:00
Thomas Gleixner	4a0e637738	clocksource: Get rid of cycle_last cycle_last was added to the clocksource to support the TSC validation. We moved that to the core code, so we can get rid of the extra copy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:52 -07:00
Thomas Gleixner	cbcf2dd3b3	x86: kvm: Make kvm_get_time_and_clockread() nanoseconds based Convert the relevant base data right away to nanoseconds instead of doing the conversion on every readout. Reduces text size by 160 bytes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:46 -07:00
Thomas Gleixner	bb0b58127c	x86: kvm: Use ktime_get_boot_ns() Use the new nanoseconds based interface and get rid of the timespec conversion dance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>	2014-07-23 15:01:45 -07:00
Nadav Amit	6f43ed01e8	KVM: x86: DR6/7.RTM cannot be written Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the current KVM implementation. This bug is apparent only if the MOV-DR instruction is emulated or the host also debugs the guest. This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and set respectively. It also sets DR6.RTM upon every debug exception. Obviously, it is not a complete fix, as debugging of RTM is still unsupported. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-21 17:17:52 +02:00
Nadav Amit	4161a56906	KVM: x86: emulator injects #DB when RFLAGS.RF is set If the RFLAGS.RF is set, then no #DB should occur on instruction breakpoints. However, the KVM emulator injects #DB regardless to RFLAGS.RF. This patch fixes this behavior. KVM, however, still appears not to update RFLAGS.RF correctly, regardless of this patch. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-21 13:43:09 +02:00
Nadav Amit	bb663c7ada	KVM: x86: Clearing rflags.rf upon skipped emulated instruction When skipping an emulated instruction, rflags.rf should be cleared as it would be on real x86 CPU. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-21 13:41:32 +02:00
Paolo Bonzini	44583cba91	KVM: x86: use kvm_read_guest_page for emulator accesses Emulator accesses are always done a page at a time, either by the emulator itself (for fetches) or because we need to query the MMU for address translations. Speed up these accesses by using kvm_read_guest_page and, in the case of fetches, by inlining kvm_read_guest_virt_helper and dropping the loop around kvm_read_guest_page. This final tweak saves 30-100 more clock cycles (4-10%), bringing the count (as measured by kvm-unit-tests) down to 720-1100 clock cycles on a Sandy Bridge Xeon host, compared to 2300-3200 before the whole series and 925-1700 after the first two low-hanging fruit changes. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:14:04 +02:00
Bandan Das	1498507a47	KVM: emulate: move init_decode_cache to emulate.c Core emulator functions all belong in emulator.c, x86 should have no knowledge of emulator internals Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:13:59 +02:00
Paolo Bonzini	6addfc4299	KVM: x86: avoid useless set of KVM_REQ_EVENT after emulation Despite the provisions to emulate up to 130 consecutive instructions, in practice KVM will emulate just one before exiting handle_invalid_guest_state, because x86_emulate_instruction always sets KVM_REQ_EVENT. However, we only need to do this if an interrupt could be injected, which happens a) if an interrupt shadow bit (STI or MOV SS) has gone away; b) if the interrupt flag has just been set (other instructions than STI can set it without enabling an interrupt shadow). This cuts another 700-900 cycles from the cost of emulating an instruction (measured on a Sandy Bridge Xeon: 1650-2600 cycles before the patch on kvm-unit-tests, 925-1700 afterwards). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:13:57 +02:00
Paolo Bonzini	37ccdcbe07	KVM: x86: return all bits from get_interrupt_shadow For the next patch we will need to know the full state of the interrupt shadow; we will then set KVM_REQ_EVENT when one bit is cleared. However, right now get_interrupt_shadow only returns the one corresponding to the emulated instruction, or an unconditional 0 if the emulated instruction does not have an interrupt shadow. This is confusing and does not allow us to check for cleared bits as mentioned above. Clean the callback up, and modify toggle_interruptibility to match the comment above the call. As a small result, the call to set_interrupt_shadow will be skipped in the common case where int_shadow == 0 && mask == 0. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:13:56 +02:00
Matthias Lange	22d48b2d2a	KVM: svm: writes to MSR_K7_HWCR generates GPE in guest Since commit 575203 the MCE subsystem in the Linux kernel for AMD sets bit 18 in MSR_K7_HWCR. Running such a kernel as a guest in KVM on an AMD host results in a GPE injected into the guest because kvm_set_msr_common returns 1. This patch fixes this by masking bit 18 from the MSR value desired by the guest. Signed-off-by: Matthias Lange <matthias.lange@kernkonzept.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:11:59 +02:00
Nadav Amit	5f7552d4a5	KVM: x86: Pending interrupt may be delivered after INIT We encountered a scenario in which after an INIT is delivered, a pending interrupt is delivered, although it was sent before the INIT. As the SDM states in section 10.4.7.1, the ISR and the IRR should be cleared after INIT as KVM does. This also means that pending interrupts should be cleared. This patch clears upon reset (and INIT) the pending interrupts; and at the same occassion clears the pending exceptions, since they may cause a similar issue. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-11 09:11:58 +02:00
Tomasz Grabiec	0d3da0d26e	KVM: x86: fix TSC matching I've observed kvmclock being marked as unstable on a modern single-socket system with a stable TSC and qemu-1.6.2 or qemu-2.0.0. The culprit was failure in TSC matching because of overflow of kvm_arch::nr_vcpus_matched_tsc in case there were multiple TSC writes in a single synchronization cycle. Turns out that qemu does multiple TSC writes during init, below is the evidence of that (qemu-2.0.0): The first one: 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] 0xffffffffa04cfd6b : kvm_arch_vcpu_postcreate+0x4b/0x80 [kvm] 0xffffffffa04b8188 : kvm_vm_ioctl+0x418/0x750 [kvm] The second one: 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel] 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm] 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm] 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm] 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm] #0 kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780 #1 kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270 #2 kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909 #3 kvm_cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1641 #4 cpu_synchronize_post_init at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:330 #5 cpu_synchronize_all_post_init () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:521 #6 main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4390 The third one: 0xffffffffa08ff2b4 : vmx_write_tsc_offset+0xa4/0xb0 [kvm_intel] 0xffffffffa04c9c05 : kvm_write_tsc+0x1a5/0x360 [kvm] 0xffffffffa090610d : vmx_set_msr+0x29d/0x350 [kvm_intel] 0xffffffffa04be83b : do_set_msr+0x3b/0x60 [kvm] 0xffffffffa04c10a8 : msr_io+0xc8/0x160 [kvm] 0xffffffffa04caeb6 : kvm_arch_vcpu_ioctl+0xc86/0x1060 [kvm] 0xffffffffa04b6797 : kvm_vcpu_ioctl+0xc7/0x5a0 [kvm] #0 kvm_vcpu_ioctl at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1780 #1 kvm_put_msrs at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1270 #2 kvm_arch_put_registers at /build/buildd/qemu-2.0.0+dfsg/target-i386/kvm.c:1909 #3 kvm_cpu_synchronize_post_reset at /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:1635 #4 cpu_synchronize_post_reset at /build/buildd/qemu-2.0.0+dfsg/include/sysemu/kvm.h:323 #5 cpu_synchronize_all_post_reset () at /build/buildd/qemu-2.0.0+dfsg/cpus.c:512 #6 main at /build/buildd/qemu-2.0.0+dfsg/vl.c:4482 The fix is to count each vCPU only once when matched, so that nr_vcpus_matched_tsc holds the size of the matched set. This is achieved by reusing generation counters. Every vCPU with this_tsc_generation == cur_tsc_generation is in the matched set. The match set is cleared by setting cur_tsc_generation to a value which no other vCPU is set to (by incrementing it). I needed to bump up the counter size form u8 to u64 to ensure it never overflows. Otherwise in cases TSC is not written the same number of times on each vCPU the counter could overflow and incorrectly indicate some vCPUs as being in the matched set. This scenario seems unlikely but I'm not sure if it can be disregarded. Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-09 18:09:57 +02:00
Bandan Das	9242b5b60d	KVM: x86: Check for nested events if there is an injectable interrupt With commit `b6b8a1451f` that introduced vmx_check_nested_events, checks for injectable interrupts happen at different points in time for L1 and L2 that could potentially cause a race. The regression occurs because KVM_REQ_EVENT is always set when nested_run_pending is set even if there's no pending interrupt. Consequently, there could be a small window when check_nested_events returns without exiting to L1, but an interrupt comes through soon after and it incorrectly, gets injected to L2 by inject_pending_event Fix this by adding a call to check for nested events too when a check for injectable interrupt returns true Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-07-08 10:06:42 +02:00
Rickard Strandqvist	9f6226a762	arch: x86: kvm: x86.c: Cleaning up variable is set more than once A struct member variable is set to the same value more than once This was found using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-30 16:52:04 +02:00
Xiaoming Gao	e1fa108d24	kvm: fix wrong address when writing Hyper-V tsc page When kvm_write_guest writes the tsc_ref structure to the guest, or it will lead the low HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT bits of the TSC page address must be cleared, or the guest can see a non-zero sequence number. Otherwise Windows guests would not be able to get a correct clocksource (QueryPerformanceCounter will always return 0) which causes serious chaos. Signed-off-by: Xiaoming Gao <newtongao@tencnet.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-19 13:43:43 +02:00
Nadav Amit	a449c7aa51	KVM: x86: Hypercall handling does not considers opsize correctly Currently, the hypercall handling routine only considers LME as an indication to whether the guest uses 32/64-bit mode. This is incosistent with hyperv hypercalls handling and against the common sense of considering cs.l as well. This patch uses is_64_bit_mode instead of is_long_mode for that matter. In addition, the result is masked in respect to the guest execution mode. Last, it changes kvm_hv_hypercall to use is_64_bit_mode as well to simplify the code. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-19 12:52:14 +02:00
Nadav Amit	67f4d4288c	KVM: x86: rdpmc emulation checks the counter incorrectly The rdpmc emulation checks that the counter (ECX) is not higher than 2, without taking into considerations bits 30:31 role (e.g., bit 30 marks whether the counter is fixed). The fix uses the pmu information for checking the validity of the pmu counter. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-06-18 17:46:18 +02:00
Linus Torvalds	b05d59dfce	At over 200 commits, covering almost all supported architectures, this was a pretty active cycle for KVM. Changes include: - a lot of s390 changes: optimizations, support for migration, GDB support and more - ARM changes are pretty small: support for the PSCI 0.2 hypercall interface on both the guest and the host (the latter acked by Catalin) - initial POWER8 and little-endian host support - support for running u-boot on embedded POWER targets - pretty large changes to MIPS too, completing the userspace interface and improving the handling of virtualized timer hardware - for x86, a larger set of changes is scheduled for 3.17. Still, we have a few emulator bugfixes and support for running nested fully-virtualized Xen guests (para-virtualized Xen guests have always worked). And some optimizations too. The only missing architecture here is ia64. It's not a coincidence that support for KVM on ia64 is scheduled for removal in 3.17. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJTjtlBAAoJEBvWZb6bTYbyMOUP/2NAePghE3IjG99ikHFdn+BX BfrURsuR6GD0AhYQnBidBmpFbAmN/LwSJxv/M7sV7OBRWLu3qbt69DrPTU2e/FK1 j9q25peu8jRyHzJ1q9rBroo74nD9lQYuVr3uXNxxcg0DRnw14JHGlM3y8LDEknO8 W+gpWTeAQ+2AuOX98MpRbCRMuzziCSv5bP5FhBVnsWHiZfvMbcUrbeJt+zYSiDAZ 0tHm/5dFKzfj/vVrrnjD4EZcRr688Bs5rztG96hY6aoVJryjZGLtLp92wCWkRRmH CCvZwd245NmNthuKHzcs27/duSWfU0uOlu7AMrD44QYhzeDGyB/2nbCxbGqLLoBA nnOviXH4cC65/CnisZ79zfo979HbZcX+Lzg747EjBgCSxJmLlwgiG8yXtDvk5otB TH6GUeGDiEEPj//JD3XtgSz0sF2NvjREWRyemjDMvhz6JC/bLytXKb3sn+NXSj8m ujzF9eQoa4qKDcBL4IQYGTJ4z5nY3Pd68dHFIPHB7n82OxFLSQUBKxXw8/1fb5og VVb8PL4GOcmakQlAKtTMlFPmuy4bbL2r/2iV5xJiOZKmXIu8Hs1JezBE3SFAltbl 3cAGwSM9/dDkKxUbTFblyOE9bkKbg4WYmq0LkdzsPEomb3IZWntOT25rYnX+LrBz bAknaZpPiOrW11Et1htY =j5Od -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next Pull KVM updates from Paolo Bonzini: "At over 200 commits, covering almost all supported architectures, this was a pretty active cycle for KVM. Changes include: - a lot of s390 changes: optimizations, support for migration, GDB support and more - ARM changes are pretty small: support for the PSCI 0.2 hypercall interface on both the guest and the host (the latter acked by Catalin) - initial POWER8 and little-endian host support - support for running u-boot on embedded POWER targets - pretty large changes to MIPS too, completing the userspace interface and improving the handling of virtualized timer hardware - for x86, a larger set of changes is scheduled for 3.17. Still, we have a few emulator bugfixes and support for running nested fully-virtualized Xen guests (para-virtualized Xen guests have always worked). And some optimizations too. The only missing architecture here is ia64. It's not a coincidence that support for KVM on ia64 is scheduled for removal in 3.17" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits) KVM: add missing cleanup_srcu_struct KVM: PPC: Book3S PR: Rework SLB switching code KVM: PPC: Book3S PR: Use SLB entry 0 KVM: PPC: Book3S HV: Fix machine check delivery to guest KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs KVM: PPC: Book3S HV: Make sure we don't miss dirty pages KVM: PPC: Book3S HV: Fix dirty map for hugepages KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates() KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number KVM: PPC: Book3S: Add ONE_REG register names that were missed KVM: PPC: Add CAP to indicate hcall fixes KVM: PPC: MPIC: Reset IRQ source private members KVM: PPC: Graciously fail broken LE hypercalls PPC: ePAPR: Fix hypercall on LE guest KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler KVM: PPC: BOOK3S: Always use the saved DAR value PPC: KVM: Make NX bit available with magic page KVM: PPC: Disable NX for old magic page using guests KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest ...	2014-06-04 08:47:12 -07:00
Linus Torvalds	4efdedca93	Small fixes for x86, slightly larger fixes for PPC, and a forgotten s390 patch. The PPC fixes are important because they fix breakage that is new in 3.15. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJTdivEAAoJEBvWZb6bTYbyw3YQAIILnflhHNtklj1mfPnnibQf c3BLCkJ0gtK6A0FO2aAHgSja0kpgbEEnSphE/A/cb0vkLon3n5O0pQoSKjGUUbBO Mo0ndjzBYNmCP4MGxhkrg49VdqD40NaR0BjJAZudb4vUOw892WLFIJMIVmIqs9eG 8V/y6S7mPLmrooAKHZxXql9y30UC77T1VZ3r4pXwYgKtUT51BQfTyWiSfjQBa8yI oGOSb8uqEC7YiOYPJYUNIMsyVqW4E6Qqs46rqtP4XZmSxzWXDzzgP4nQHHyJJCdZ aBYkeG+sJZG7ZwleJLejAncjWUY9Oq9GkMYNj0cTAoP/zA6jBGAll96KGKRbes9z bZUtCNL3ifLcgbIGeAxgjmYOq0XLGahHbqm9QISYW2XdRkBI+8EJs5FCP4YEHzZn FSm3zcCQ+wtbqjBbZZcqqLa6A/CGzjyO26qz+BCxrZ0BQkQX/2am3UykQ0JWam3H vX5ZM2ewJhs6SjFisPcswd20AN+SHjPyzPvErBLDfrqnAVbwj2ehgqyN2slVsqrj UyGzeKCfJgA0TiEH/4K6j6hvQWynUU+/2JglIfGE6AXmWddazCzl/qx4LvuGKFoB b8JSQ7YaHSsq/tHc8WhHkvcP0FSDZEiHcJN2iY1pwLKTSQp9JN3aPNruPKiO8dsW N+LoHL5fFcDi6Uu6wS7w =E2fU -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm fixes from Paolo Bonzini: "Small fixes for x86, slightly larger fixes for PPC, and a forgotten s390 patch. The PPC fixes are important because they fix breakage that is new in 3.15" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: announce irqfd capability KVM: x86: disable master clock if TSC is reset during suspend KVM: vmx: disable APIC virtualization in nested guests KVM guest: Make pv trampoline code executable KVM: PPC: Book3S: ifdef on CONFIG_KVM_BOOK3S_32_HANDLER for 32bit KVM: PPC: Book3S HV: Add missing code for transaction reclaim on guest exit KVM: PPC: Book3S: HV: make _PAGE_NUMA take effect	2014-05-28 08:08:03 -07:00
Paolo Bonzini	fb5e336b97	KVM: x86: drop set_rflags callback Not needed anymore now that the CPL is computed directly during task switch. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-22 17:47:16 +02:00
Marcelo Tosatti	16a9602158	KVM: x86: disable master clock if TSC is reset during suspend Updating system_time from the kernel clock once master clock has been enabled can result in time backwards event, in case kernel clock frequency is lower than TSC frequency. Disable master clock in case it is necessary to update it from the resume path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-14 17:59:21 +02:00
Jan Kiszka	d9f89b88f5	KVM: x86: Fix CR3 reserved bits check in long mode Regression of `346874c9`: PAE is set in long mode, but that does not mean we have valid PDPTRs. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-12 20:04:01 +02:00
Michael S. Tsirkin	b63cf42fd1	kvm/x86: implement hv EOI assist It seems that it's easy to implement the EOI assist on top of the PV EOI feature: simply convert the page address to the format expected by PV EOI. Notes: -"No EOI required" is set only if interrupt injected is edge triggered; this is true because level interrupts are going through IOAPIC which disables PV EOI. In any case, if guest triggers EOI the bit will get cleared on exit. -For migration, set of HV_X64_MSR_APIC_ASSIST_PAGE sets KVM_PV_EOI_EN internally, so restoring HV_X64_MSR_APIC_ASSIST_PAGE seems sufficient In any case, bit is cleared on exit so worst case it's never re-enabled -no handling of PV EOI data is performed at HV_X64_MSR_EOI write; HV_X64_MSR_EOI is a separate optimization - it's an X2APIC replacement that lets you do EOI with an MSR and not IO. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-07 18:00:49 +02:00
Andi Kleen	2605fc216f	asmlinkage, x86: Add explicit __visible to arch/x86/* As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-05-05 16:07:44 -07:00
Ulrich Obergfell	1171903d89	KVM: x86: improve the usability of the 'kvm_pio' tracepoint This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated() and emulator_pio_out_emulated(), and it adds an argument (a pointer to the 'pio_data'). A single 8-bit or 16-bit or 32-bit data item is fetched from 'pio_data' (depending on 'size'), and the value is included in the trace record ('val'). If 'count' is greater than one, this is indicated by the string "(...)" in the trace output. Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-05-05 22:42:05 +02:00
Xiao Guangrong	198c74f43f	KVM: MMU: flush tlb out of mmu lock when write-protect the sptes Now we can flush all the TLBs out of the mmu lock without TLB corruption when write-proect the sptes, it is because: - we have marked large sptes readonly instead of dropping them that means we just change the spte from writable to readonly so that we only need to care the case of changing spte from present to present (changing the spte from present to nonpresent will flush all the TLBs immediately), in other words, the only case we need to care is mmu_spte_update() - in mmu_spte_update(), we haved checked SPTE_HOST_WRITEABLE \| PTE_MMU_WRITEABLE instead of PT_WRITABLE_MASK, that means it does not depend on PT_WRITABLE_MASK anymore Acked-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:49:52 -03:00
Xiao Guangrong	c126d94f2c	KVM: MMU: lazily drop large spte Currently, kvm zaps the large spte if write-protected is needed, the later read can fault on that spte. Actually, we can make the large spte readonly instead of making them un-present, the page fault caused by read access can be avoided The idea is from Avi: \| As I mentioned before, write-protecting a large spte is a good idea, \| since it moves some work from protect-time to fault-time, so it reduces \| jitter. This removes the need for the return value. This version has fixed the issue reported in `6b73a9606`, the reason of that issue is that fast_page_fault() directly sets the readonly large spte to writable but only dirty the first page into the dirty-bitmap that means other pages are missed. Fixed it by only the normal sptes (on the PT_PAGE_TABLE_LEVEL level) can be fast fixed Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2014-04-23 17:49:50 -03:00

... 2 3 4 5 6 ...

1251 Commits