linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-21 07:09:15 +07:00

Author	SHA1	Message	Date
Luwei Kang	6c0f0bba85	KVM: x86: Introduce a function to initialize the PT configuration Initialize the Intel PT configuration when cpuid update. Include cpuid inforamtion, rtit_ctl bit mask and the number of address ranges. Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-21 11:28:36 +01:00
Chao Peng	2ef444f160	KVM: x86: Add Intel PT context switch for each vcpu Load/Store Intel Processor Trace register in context switch. MSR IA32_RTIT_CTL is loaded/stored automatically from VMCS. In Host-Guest mode, we need load/resore PT MSRs only when PT is enabled in guest. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-21 11:28:35 +01:00
Chao Peng	86f5201df0	KVM: x86: Add Intel Processor Trace cpuid emulation Expose Intel Processor Trace to guest only when the PT works in Host-Guest mode. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-21 11:28:35 +01:00
Chao Peng	f99e3daf94	KVM: x86: Add Intel PT virtualization work mode Intel Processor Trace virtualization can be work in one of 2 possible modes: a. System-Wide mode (default): When the host configures Intel PT to collect trace packets of the entire system, it can leave the relevant VMX controls clear to allow VMX-specific packets to provide information across VMX transitions. KVM guest will not aware this feature in this mode and both host and KVM guest trace will output to host buffer. b. Host-Guest mode: Host can configure trace-packet generation while in VMX non-root operation for guests and root operation for native executing normally. Intel PT will be exposed to KVM guest in this mode, and the trace output to respective buffer of host and guest. In this mode, tht status of PT will be saved and disabled before VM-entry and restored after VM-exit if trace a virtual machine. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-21 11:28:34 +01:00
Luwei Kang	e0018afec5	perf/x86/intel/pt: add new capability for Intel PT This adds support for "output to Trace Transport subsystem" capability of Intel PT. It means that PT can output its trace to an MMIO address range rather than system memory buffer. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-21 11:28:33 +01:00
Luwei Kang	69843a913f	perf/x86/intel/pt: Add new bit definitions for PT MSRs Add bit definitions for Intel PT MSRs to support trace output directed to the memeory subsystem and holds a count if packet bytes that have been sent out. These are required by the upcoming PT support in KVM guests for MSRs read/write emulation. Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-21 11:28:33 +01:00
Luwei Kang	61be2998ca	perf/x86/intel/pt: Introduce intel_pt_validate_cap() intel_pt_validate_hw_cap() validates whether a given PT capability is supported by the hardware. It checks the PT capability array which reflects the capabilities of the hardware on which the code is executed. For setting up PT for KVM guests this is not correct as the capability array for the guest can be different from the host array. Provide a new function to check against a given capability array. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-21 11:28:32 +01:00
Chao Peng	f6d079ce86	perf/x86/intel/pt: Export pt_cap_get() pt_cap_get() is required by the upcoming PT support in KVM guests. Export it and move the capabilites enum to a global header. As a global functions, "pt_" is already used for ptrace and other things, so it makes sense to use "intel_pt_" as a prefix. Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-21 11:28:32 +01:00
Chao Peng	887eda13b5	perf/x86/intel/pt: Move Intel PT MSRs bit defines to global header The Intel Processor Trace (PT) MSR bit defines are in a private header. The upcoming support for PT virtualization requires these defines to be accessible from KVM code. Move them to the global MSR header file. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Signed-off-by: Luwei Kang <luwei.kang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-21 11:28:31 +01:00
Wei Yang	bdd303cb1b	KVM: fix some typos Signed-off-by: Wei Yang <richard.weiyang@gmail.com> [Preserved the iff and a probably intentional weird bracket notation. Also dropped the style change to make a single-purpose patch. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-12-21 11:28:26 +01:00
Peng Hao	649472a169	x86/kvmclock: convert to SPDX identifiers Update the verbose license text with the matching SPDX license identifier. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> [Changed deprecated GPL-2.0+ to GPL-2.0-or-later. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-12-21 11:28:25 +01:00
Sean Christopherson	9b7ebff23c	KVM: x86: Remove KF() macro placeholder Although well-intentioned, keeping the KF() definition as a hint for handling scattered CPUID features may be counter-productive. Simply redefining the bit position only works for directly manipulating the guest's CPUID leafs, e.g. it doesn't make guest_cpuid_has() magically work. Taking an alternative approach, e.g. ensuring the bit position is identical between the Linux-defined and hardware-defined features, may be a simpler and/or more effective method of exposing scattered features to the guest. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-12-21 11:28:25 +01:00
Jim Mattson	788fc1e9ad	kvm: vmx: Allow guest read access to IA32_TSC Let the guest read the IA32_TSC MSR with the generic RDMSR instruction as well as the specific RDTSC(P) instructions. Note that the hardware applies the TSC multiplier and offset (when applicable) to the result of RDMSR(IA32_TSC), just as it does to the result of RDTSC(P). Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-12-21 11:28:24 +01:00
Jim Mattson	9ebdfe5230	kvm: nVMX: NMI-window and interrupt-window exiting should wake L2 from HLT According to the SDM, "NMI-window exiting" VM-exits wake a logical processor from the same inactive states as would an NMI and "interrupt-window exiting" VM-exits wake a logical processor from the same inactive states as would an external interrupt. Specifically, they wake a logical processor from the shutdown state and from the states entered using the HLT and MWAIT instructions. Fixes: `6dfacadd58` ("KVM: nVMX: Add support for activity state HLT") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> [Squashed comments of two Jim's patches and used the simplified code hunk provided by Sean. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-12-21 11:28:24 +01:00
Tambe, William	e081354d6a	KVM: nSVM: Fix nested guest support for PAUSE filtering. Currently, the nested guest's PAUSE intercept intentions are not being honored. Instead, since the L0 hypervisor's pause_filter_count and pause_filter_thresh values are still in place, these values are used instead of those programmed in the VMCB by the L1 hypervisor. To honor the desired PAUSE intercept support of the L1 hypervisor, the L0 hypervisor must use the PAUSE filtering fields of the L1 hypervisor. This requires saving and restoring of both the L0 and L1 hypervisor's PAUSE filtering fields. Signed-off-by: William Tambe <william.tambe@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-12-21 11:28:23 +01:00
YueHaibing	ba7424b200	KVM: VMX: Remove duplicated include from vmx.c Remove duplicated include. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-12-21 11:28:21 +01:00
Vitaly Kuznetsov	e87555e550	KVM: x86: svm: report MSR_IA32_MCG_EXT_CTL as unsupported AMD doesn't seem to implement MSR_IA32_MCG_EXT_CTL and svm code in kvm knows nothing about it, however, this MSR is among emulated_msrs and thus returned with KVM_GET_MSR_INDEX_LIST. The consequent KVM_GET_MSRS, of course, fails. Report the MSR as unsupported to not confuse userspace. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-12-21 11:28:20 +01:00
Paolo Bonzini	ed8e481227	KVM: x86: fix size of x86_fpu_cache objects The memory allocation in `b666a4b697` ("kvm: x86: Dynamically allocate guest_fpu", 2018-11-06) is wrong, there are other members in struct fpu before the fpregs_state union and the patch should be doing something similar to the code in fpu__init_task_struct_size. It's enough to run a guest and then rmmod kvm to see slub errors which are actually caused by memory corruption. For now let's revert it to sizeof(struct fpu), which is conservative. I have plans to move fsave/fxsave/xsave directly in KVM, without using the kernel FPU helpers, and once it's done, the size of the object in the cache will be something like kvm_xstate_size. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-21 11:28:19 +01:00
David S. Miller	2be09de7d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Lots of conflicts, by happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 11:53:36 -08:00
David Howells	e262e32d6b	vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled Only the mount namespace code that implements mount(2) should be using the MS_* flags. Suppress them inside the kernel unless uapi/linux/mount.h is included. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: David Howells <dhowells@redhat.com>	2018-12-20 16:32:56 +00:00
Sinan Kaya	5d32a66541	PCI/ACPI: Allow ACPI to be built without CONFIG_PCI set We are compiling PCI code today for systems with ACPI and no PCI device present. Remove the useless code and reduce the tight dependency. Signed-off-by: Sinan Kaya <okaya@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # PCI parts Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2018-12-20 10:19:49 +01:00
Joerg Roedel	03ebe48e23	Merge branches 'iommu/fixes', 'arm/renesas', 'arm/mediatek', 'arm/tegra', 'arm/omap', 'arm/smmu', 'x86/vt-d', 'x86/amd' and 'core' into next	2018-12-20 10:05:20 +01:00
Steven Rostedt (VMware)	d2a68c4eff	x86/ftrace: Do not call function graph from dynamic trampolines Since commit `79922b8009` ("ftrace: Optimize function graph to be called directly"), dynamic trampolines should not be calling the function graph tracer at the end. If they do, it could cause the function graph tracer to trace functions that it filtered out. Right now it does not cause a problem because there's a test to check if the function graph tracer is attached to the same function as the function tracer, which for now is true. But the function graph tracer is undergoing changes that can make this no longer true which will cause the function graph tracer to trace other functions. For example: # cd /sys/kernel/tracing/ # echo do_IRQ > set_ftrace_filter # mkdir instances/foo # echo ip_rcv > instances/foo/set_ftrace_filter # echo function_graph > current_tracer # echo function > instances/foo/current_tracer Would cause the function graph tracer to trace both do_IRQ and ip_rcv, if the current tests change. As the current tests prevent this from being a problem, this code does not need to be backported. But it does make the code cleaner. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2018-12-19 22:43:37 -05:00
Vitaly Kuznetsov	3cf85f9f6b	KVM: x86: nSVM: fix switch to guest mmu Recent optimizations in MMU code broke nested SVM with NPT in L1 completely: when we do nested_svm_{,un}init_mmu_context() we want to switch from TDP MMU to shadow MMU, both init_kvm_tdp_mmu() and kvm_init_shadow_mmu() check if re-configuration is needed by looking at cache source data. The data, however, doesn't change - it's only the type of the MMU which changes. We end up not re-initializing guest MMU as shadow and everything goes off the rails. The issue could have been fixed by putting MMU type into extended MMU role but this is not really needed. We can just split root and guest MMUs the exact same way we did for nVMX, their types never change in the lifetime of a vCPU. There is still room for improvement: currently, we reset all MMU roots when switching from L1 to L2 and back and this is not needed. Fixes: `7dcd575520` ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-19 22:19:22 +01:00
Ingo Molnar	6ac389346e	Revert "kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs" This reverts commit `77b0bf55bc`. See this commit for details about the revert: `e769742d35` ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Conflicts: arch/x86/Makefile Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-19 12:00:28 +01:00
Ingo Molnar	96af6cd02a	Revert "x86/objtool: Use asm macros to work around GCC inlining bugs" This reverts commit `c06c4d8090`. See this commit for details about the revert: `e769742d35` ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-19 12:00:23 +01:00
Ingo Molnar	ac180540b0	Revert "x86/refcount: Work around GCC inlining bug" This reverts commit `9e1725b410`. See this commit for details about the revert: `e769742d35` ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") The conflict resolution for interaction with: `288e4521f0`: ("x86/asm: 'Simplify' GEN_*_RMWcc() macros") was provided by Masahiro Yamada. Conflicts: arch/x86/include/asm/refcount.h Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-19 12:00:09 +01:00
Ingo Molnar	851a4cd7cc	Revert "x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs" This reverts commit `77f48ec28e`. See this commit for details about the revert: `e769742d35` ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-19 12:00:04 +01:00
Ingo Molnar	ffb61c6346	Revert "x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs" This reverts commit `f81f8ad56f`. See this commit for details about the revert: `e769742d35` ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-19 12:00:00 +01:00
Ingo Molnar	a4da3d86a2	Revert "x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops" This reverts commit `494b5168f2`. See this commit for details about the revert: `e769742d35` ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-19 11:59:55 +01:00
Ingo Molnar	81a68455e7	Revert "x86/extable: Macrofy inline assembly code to work around GCC inlining bugs" This reverts commit `0474d5d9d2`. See this commit for details about the revert: `e769742d35` ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-19 11:59:47 +01:00
Ingo Molnar	c3462ba986	Revert "x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs" This reverts commit `d5a581d84a`. See this commit for details about the revert: `e769742d35` ("Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs"") Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-19 11:59:21 +01:00
Ingo Molnar	e769742d35	Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs" This reverts commit `5bdcd510c2`. The macro based workarounds for GCC's inlining bugs caused regressions: distcc and other distro build setups broke, and the fixes are not easy nor will they solve regressions on already existing installations. So we are reverting this patch and the 8 followup patches. What makes this revert easier is that GCC9 will likely include the new 'asm inline' syntax that makes inlining of assembly blocks a lot more robust. This is a superior method to any macro based hackeries - and might even be backported to GCC8, which would make all modern distros get the inlining fixes as well. Many thanks to Masahiro Yamada and others for helping sort out these problems. Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: Borislav Petkov <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Juergen Gross <jgross@suse.com> Cc: Richard Biener <rguenther@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nadav Amit <namit@vmware.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-19 11:58:10 +01:00
Borislav Petkov	72a8f089c3	x86/mce: Restore MCE injector's module name It was mce-inject.ko but it turned into inject.ko since the containing source file got renamed. Restore it. Fixes: `21afaf1813` ("x86/mce: Streamline MCE subsystem's naming") Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Tony Luck <tony.luck@intel.com> Link: https://lkml.kernel.org/r/20181218182546.GA21386@zn.tnic	2018-12-19 00:04:36 +01:00
Colin Ian King	32043fa065	x86/mtrr: Don't copy uninitialized gentry fields back to userspace Currently the copy_to_user of data in the gentry struct is copying uninitiaized data in field _pad from the stack to userspace. Fix this by explicitly memset'ing gentry to zero, this also will zero any compiler added padding fields that may be in struct (currently there are none). Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable") Fixes: `b263b31e8a` ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: security@kernel.org Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com	2018-12-19 00:00:16 +01:00
Eduardo Habkost	0e1b869fff	kvm: x86: Add AMD's EX_CFG to the list of ignored MSRs Some guests OSes (including Windows 10) write to MSR 0xc001102c on some cases (possibly while trying to apply a CPU errata). Make KVM ignore reads and writes to that MSR, so the guest won't crash. The MSR is documented as "Execution Unit Configuration (EX_CFG)", at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors". Cc: stable@vger.kernel.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-18 22:15:44 +01:00
Wanpeng Li	dcbd3e49c2	KVM: X86: Fix NULL deref in vcpu_scan_ioapic Reported by syzkaller: CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline] RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline] RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline] RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline] RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074 Call Trace: kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT14 msr and triggers scan ioapic logic to load synic vectors into EOI exit bitmap. However, irqchip is not initialized by this simple testcase, ioapic/apic objects should not be accessed. This patch fixes it by also considering whether or not apic is present. Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-18 22:15:44 +01:00
Cfir Cohen	c2dd5146e9	KVM: Fix UAF in nested posted interrupt processing nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It caches the kmap()ed page object and pointer, however, it doesn't handle errors correctly: it's possible to cache a valid pointer, then release the page and later dereference the dangling pointer. I was able to reproduce with the following steps: 1. Call vmlaunch with valid posted_intr_desc_addr but an invalid MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed pi_desc_page and pi_desc. Later the invalid EFER value fails check_vmentry_postreqs() which fails the first vmlaunch. 2. Call vmlanuch with a valid EFER but an invalid posted_intr_desc_addr (I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages pi_desc_page is unmapped and released and pi_desc_page is set to NULL (the "shouldn't happen" clause). Due to the invalid posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and nested_get_vmcs12_pages() returns. It doesn't return an error value so vmlaunch proceeds. Note that at this time we have a dangling pointer in vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs. 3. Issue an IPI in L2 guest code. This triggers a call to vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on() which dereferences the dangling pointer. Vulnerable code requires nested and enable_apicv variables to be set to true. The host CPU must also support posted interrupts. Fixes: `5e2f30b756` "KVM: nVMX: get rid of nested_get_page()" Cc: stable@vger.kernel.org Reviewed-by: Andy Honig <ahonig@google.com> Signed-off-by: Cfir Cohen <cfir@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-18 22:15:34 +01:00
Chang S. Bae	87ab4689ca	x86/fsgsbase/64: Fix the base write helper functions Andy spotted a regression in the fs/gs base helpers after the patch series was committed. The helper functions which write fs/gs base are not just writing the base, they are also changing the index. That's wrong and needs to be separated because writing the base has not to modify the index. While the regression is not causing any harm right now because the only caller depends on that behaviour, it's a guarantee for subtle breakage down the road. Make the index explicitly changed from the caller, instead of including the code in the helpers. Subsequently, the task write helpers do not handle for the current task anymore. The range check for a base value is also factored out, to minimize code redundancy from the caller. Fixes: `b1378a561f` ("x86/fsgsbase/64: Introduce FS/GS base helper functions") Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Lutomirski <luto@kernel.org> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lkml.kernel.org/r/20181126195524.32179-1-chang.seok.bae@intel.com	2018-12-18 14:26:09 +01:00
Thomas Lendacky	20c3a2c33e	x86/speculation: Add support for STIBP always-on preferred mode Different AMD processors may have different implementations of STIBP. When STIBP is conditionally enabled, some implementations would benefit from having STIBP always on instead of toggling the STIBP bit through MSR writes. This preference is advertised through a CPUID feature bit. When conditional STIBP support is requested at boot and the CPU advertises STIBP always-on mode as preferred, switch to STIBP "on" support. To show that this transition has occurred, create a new spectre_v2_user_mitigation value and a new spectre_v2_user_strings message. The new mitigation value is used in spectre_v2_user_select_mitigation() to print the new mitigation message as well as to return a new string from stibp_state(). Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@alien8.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/20181213230352.6937.74943.stgit@tlendack-t1.amdoffice.net	2018-12-18 14:13:33 +01:00
Hui Wang	aa02ef099c	x86/topology: Use total_cpus for max logical packages calculation nr_cpu_ids can be limited on the command line via nr_cpus=. This can break the logical package management because it results in a smaller number of packages while in kdump kernel. Check below case: There is a two sockets system, each socket has 8 cores, which has 16 logical cpus while HT was turn on. 0 1 2 3 4 5 6 7 \| 16 17 18 19 20 21 22 23 cores on socket 0 threads on socket 0 8 9 10 11 12 13 14 15 \| 24 25 26 27 28 29 30 31 cores on socket 1 threads on socket 1 While starting the kdump kernel with command line option nr_cpus=16 panic was triggered on one of the cpus 24-31 eg. 26, then online cpu will be 1-15, 26(cpu 0 was disabled in kdump), ncpus will be 16 and __max_logical_packages will be 1, but actually two packages were booted on. This issue can reproduced by set kdump option nr_cpus=<real physical core numbers>, and then trigger panic on last socket's thread, for example: taskset -c 26 echo c > /proc/sysrq-trigger Use total_cpus which will not be limited by nr_cpus command line to calculate the value of __max_logical_packages. Signed-off-by: Hui Wang <john.wanghui@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <guijianfeng@huawei.com> Cc: <wencongyang2@huawei.com> Cc: <douliyang1@huawei.com> Cc: <qiaonuohan@huawei.com> Link: https://lkml.kernel.org/r/20181107023643.22174-1-john.wanghui@huawei.com	2018-12-18 13:38:37 +01:00
Yangtao Li	6848ac7ca3	x86/mm/dump_pagetables: Use DEFINE_SHOW_ATTRIBUTE() Use DEFINE_SHOW_ATTRIBUTE() instead of open coding it. Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: keescook@chromium.org Cc: luto@kernel.org Cc: peterz@infradead.org Cc: bp@alien8.de Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20181119154334.18265-1-tiny.windzz@gmail.com	2018-12-18 13:05:54 +01:00
James Morris	5580b4a1a8	Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into next-integrity From Mimi: In Linux 4.19, a new LSM hook named security_kernel_load_data was upstreamed, allowing LSMs and IMA to prevent the kexec_load syscall. Different signature verification methods exist for verifying the kexec'ed kernel image. This pull request adds additional support in IMA to prevent loading unsigned kernel images via the kexec_load syscall, independently of the IMA policy rules, based on the runtime "secure boot" flag. An initial IMA kselftest is included. In addition, this pull request defines a new, separate keyring named ".platform" for storing the preboot/firmware keys needed for verifying the kexec'ed kernel image's signature and includes the associated IMA kexec usage of the ".platform" keyring. (David Howell's and Josh Boyer's patches for reading the preboot/firmware keys, which were previously posted for a different use case scenario, are included here.)	2018-12-17 11:26:46 -08:00
Peter Zijlstra	3c567356db	x86/mm/cpa: Rename @addrinarray to @numpages The CPA_ARRAY interface works in single pages, and everything, except in these 'few' locations is this variable called 'numpages'. Remove this 'addrinarray' abberation and use 'numpages' consistently. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.695039210@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:54:30 +01:00
Peter Zijlstra	c38116bb94	x86/mm/cpa: Better use CLFLUSHOPT Currently we issue an MFENCE before and after flushing a range. This means that if we flush a bunch of single page ranges -- like with the cpa array, we issue a whole bunch of superfluous MFENCEs. Reorgainze the code a little to avoid this. [ mingo: capitalize instructions, tweak changelog and comments. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.626999883@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:54:29 +01:00
Peter Zijlstra	fe0937b24f	x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single cpa_flush() function Note that the cache flush loop in cpa_flush_*() is identical when we use __cpa_addr(); further observe that flush_tlb_kernel_range() is a special case of to the cpa_flush_array() TLB invalidation code. This then means the two functions are virtually identical. Fold these two functions into a single cpa_flush() call. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.559855600@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:54:28 +01:00
Peter Zijlstra	83b4e39146	x86/mm/cpa: Make cpa_data::numpages invariant Make sure __change_page_attr_set_clr() doesn't modify cpa->numpages. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.493000228@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:54:27 +01:00
Peter Zijlstra	935f583982	x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation Instead of punting and doing tlb_flush_all(), do the same as flush_tlb_kernel_range() does and use single page invalidations. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.430001980@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:54:26 +01:00
Peter Zijlstra	5fe26b7a8f	x86/mm/cpa: Simplify the code after making cpa->vaddr invariant Since cpa->vaddr is invariant, this means we can remove all workarounds that deal with it changing. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.366619025@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:54:25 +01:00
Peter Zijlstra	98bfc9b038	x86/mm/cpa: Make cpa_data::vaddr invariant Currently __change_page_attr_set_clr() will modify cpa->vaddr when !(CPA_ARRAY \| CPA_PAGES_ARRAY), whereas in the array cases it will increment cpa->curpage. Change __cpa_addr() such that its @idx argument also works in the !array case and use cpa->curpage increments for all cases. NOTE: since cpa_data::numpages is 'unsigned long' so should cpa_data::curpage be. NOTE: after this only cpa->numpages is still modified. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.295174892@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:54:24 +01:00
Peter Zijlstra	16ebf031e8	x86/mm/cpa: Add __cpa_addr() helper The code to compute the virtual address of a cpa_data is duplicated; introduce a helper before more copies happen. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.229119497@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:54:23 +01:00
Peter Zijlstra	ecc729f1f4	x86/mm/cpa: Add ARRAY and PAGES_ARRAY selftests The current pageattr-test code only uses the regular range interface, add code that also tests the array and pages interface. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom.StDenis@amd.com Cc: dave.hansen@intel.com Link: http://lkml.kernel.org/r/20181203171043.162771364@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:54:22 +01:00
Ingo Molnar	02117e42db	Merge branch 'x86/urgent' into x86/mm, to pick up dependent fix Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:48:25 +01:00
Peter Zijlstra	721066dfd4	x86/mm/cpa: Fix cpa_flush_array() TLB invalidation In commit: `a7295fd53c` ("x86/mm/cpa: Use flush_tlb_kernel_range()") I misread the CAP array code and incorrectly used tlb_flush_kernel_range(), resulting in missing TLB flushes and consequent failures. Instead do a full invalidate in this case -- for now. Reported-by: StDenis, Tom <Tom.StDenis@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Fixes: `a7295fd53c` ("x86/mm/cpa: Use flush_tlb_kernel_range()") Link: http://lkml.kernel.org/r/20181203171043.089868285@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 18:48:09 +01:00
Masami Hiramatsu	8162b3d1a7	kprobes/x86: Remove unneeded arch_within_kprobe_blacklist from x86 Remove x86 specific arch_within_kprobe_blacklist(). Since we have already added all blacklisted symbols to the kprobe blacklist by arch_populate_kprobe_blacklist(), we don't need arch_within_kprobe_blacklist() on x86 anymore. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/154503491354.26176.13903264647254766066.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 17:48:40 +01:00
Masami Hiramatsu	fe6e656154	kprobes/x86: Show x86-64 specific blacklisted symbols correctly Show x86-64 specific blacklisted symbols in debugfs. Since x86-64 prohibits probing on symbols which are in entry text, those should be shown. Tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/154503488425.26176.17136784384033608516.stgit@devbox Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-12-17 17:48:39 +01:00
Andrea Righi	bf9445a33a	kprobes/x86/xen: blacklist non-attachable xen interrupt functions Blacklist symbols in Xen probe-prohibited areas, so that user can see these prohibited symbols in debugfs. See also: `a50480cb6d`. Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2018-12-17 10:27:59 -05:00
Alistair Strachan	cd01544a26	x86/vdso: Pass --eh-frame-hdr to the linker Commit `379d98ddf4` ("x86: vdso: Use $LD instead of $CC to link") accidentally broke unwinding from userspace, because ld would strip the .eh_frame sections when linking. Originally, the compiler would implicitly add --eh-frame-hdr when invoking the linker, but when this Makefile was converted from invoking ld via the compiler, to invoking it directly (like vmlinux does), the flag was missed. (The EH_FRAME section is important for the VDSO shared libraries, but not for vmlinux.) Fix the problem by explicitly specifying --eh-frame-hdr, which restores parity with the old method. See relevant bug reports for additional info: https://bugzilla.kernel.org/show_bug.cgi?id=201741 https://bugzilla.redhat.com/show_bug.cgi?id=1659295 Fixes: `379d98ddf4` ("x86: vdso: Use $LD instead of $CC to link") Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: Carlos O'Donell <carlos@redhat.com> Reported-by: "H. J. Lu" <hjl.tools@gmail.com> Signed-off-by: Alistair Strachan <astrachan@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Carlos O'Donell <carlos@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: kernel-team@android.com Cc: Laura Abbott <labbott@redhat.com> Cc: stable <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: https://lkml.kernel.org/r/20181214223637.35954-1-astrachan@google.com	2018-12-15 11:37:51 +01:00
Marc Orr	b666a4b697	kvm: x86: Dynamically allocate guest_fpu Previously, the guest_fpu field was embedded in the kvm_vcpu_arch struct. Unfortunately, the field is quite large, (e.g., 4352 bytes on my current setup). This bloats the kvm_vcpu_arch struct for x86 into an order 3 memory allocation, which can become a problem on overcommitted machines. Thus, this patch moves the fpu state outside of the kvm_vcpu_arch struct. With this patch applied, the kvm_vcpu_arch struct is reduced to 15168 bytes for vmx on my setup when building the kernel with kvmconfig. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:08 +01:00
Marc Orr	240c35a378	kvm: x86: Use task structs fpu field for user Previously, x86's instantiation of 'struct kvm_vcpu_arch' added an fpu field to save/restore fpu-related architectural state, which will differ from kvm's fpu state. However, this is redundant to the 'struct fpu' field, called fpu, embedded in the task struct, via the thread field. Thus, this patch removes the user_fpu field from the kvm_vcpu_arch struct and replaces it with the task struct's fpu field. This change is significant because the fpu struct is actually quite large. For example, on the system used to develop this patch, this change reduces the size of the vcpu_vmx struct from 23680 bytes down to 19520 bytes, when building the kernel with kvmconfig. This reduction in the size of the vcpu_vmx struct moves us closer to being able to allocate the struct at order 2, rather than order 3. Suggested-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:07 +01:00
Krish Sadhukhan	4e445aee96	KVM: nVMX: Move the checks for Guest Non-Register States to a separate helper function .. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:06 +01:00
Krish Sadhukhan	254b2f3b0f	KVM: nVMX: Move the checks for Host Control Registers and MSRs to a separate helper function .. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:06 +01:00
Krish Sadhukhan	5fbf963400	KVM: nVMX: Move the checks for VM-Entry Control Fields to a separate helper function .. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:05 +01:00
Krish Sadhukhan	61446ba75e	KVM: nVMX: Move the checks for VM-Exit Control Fields to a separate helper function .. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:04 +01:00
Sean Christopherson	f9b245e182	KVM: nVMX: Remove param indirection from nested_vmx_check_msr_switch() Passing the enum and doing an indirect lookup is silly when we can simply pass the field directly. Remove the "fast path" code in nested_vmx_check_msr_switch_controls() as it's now nothing more than a redundant check. Remove the debug message rather than continue passing the enum for the address field. Having debug messages for the MSRs themselves is useful as MSR legality is a huge space, whereas messing up a physical address means the VMM is fundamentally broken. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:04 +01:00
Krish Sadhukhan	461b4ba4c7	KVM: nVMX: Move the checks for VM-Execution Control Fields to a separate helper function .. to improve readability and maintainability, and to align the code as per the layout of the checks in chapter "VM Entries" in Intel SDM vol 3C. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:03 +01:00
Krish Sadhukhan	16322a3b5e	KVM: nVMX: Prepend "nested_vmx_" to check_vmentry_{pre,post}reqs() .. as they are used only in nested vmx context. Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:02 +01:00
Lan Tianyu	53963a70ac	KVM/VMX: Check ept_pointer before flushing ept tlb This patch is to initialize ept_pointer to INVALID_PAGE and check it before flushing ept tlb. If ept_pointer is invalid, bypass the flush request. Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:02 +01:00
Krish Sadhukhan	a0d4f80344	KVM nVMX: MSRs should not be stored if VM-entry fails during or after loading guest state According to section "VM-entry Failures During or After Loading Guest State" in Intel SDM vol 3C, "No MSRs are saved into the VM-exit MSR-store area." when bit 31 of the exit reason is set. Reported-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Suggested-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:01 +01:00
Jim Mattson	e53d88af63	kvm: x86: Don't modify MSR_PLATFORM_INFO on vCPU reset If userspace has provided a different value for this MSR (e.g with the turbo bits set), the userspace-provided value should survive a vCPU reset. For backwards compatibility, MSR_PLATFORM_INFO is initialized in kvm_arch_vcpu_setup. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Drew Schmitt <dasch@google.com> Cc: Abhiroop Dabral <adabral@paloaltonetworks.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:01 +01:00
Wei Huang	3d82c565a7	kvm: vmx: add cpu into VMX preemption timer bug list This patch adds Intel "Xeon CPU E3-1220 V2", with CPUID.01H.EAX=0x000306A8, into the list of known broken CPUs which fail to support VMX preemption timer. This bug was found while running the APIC timer test of kvm-unit-test on this specific CPU, even though the errata info can't be located in the public domain for this CPU. Signed-off-by: Wei Huang <wei@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 18:00:00 +01:00
Eduardo Habkost	d7b09c827a	kvm: x86: Report STIBP on GET_SUPPORTED_CPUID Months ago, we have added code to allow direct access to MSR_IA32_SPEC_CTRL to the guest, which makes STIBP available to guests. This was implemented by commits `d28b387fb7` ("KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL") and `b2ac58f905` ("KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL"). However, we never updated GET_SUPPORTED_CPUID to let userspace know that STIBP can be enabled in CPUID. Fix that by updating kvm_cpuid_8000_0008_ebx_x86_features and kvm_cpuid_7_0_edx_x86_features. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:59 +01:00
Vitaly Kuznetsov	87a8d795b2	x86/hyper-v: Stop caring about EOI for direct stimers Turns out we over-engineered Direct Mode for stimers a bit: unlike traditional stimers where we may want to try to re-inject the message upon EOI, Direct Mode stimers just set the irq in APIC and kvm_apic_set_irq() fails only when APIC is disabled (see APIC_DM_FIXED case in __apic_accept_irq()). Remove the redundant part. Suggested-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:59 +01:00
Vitaly Kuznetsov	08a800ac25	x86/kvm/hyper-v: avoid open-coding stimer_mark_pending() in kvm_hv_notify_acked_sint() stimers_pending optimization only helps us to avoid multiple kvm_make_request() calls. This doesn't happen very often and these calls are very cheap in the first place, remove open-coded version of stimer_mark_pending() from kvm_hv_notify_acked_sint(). Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:58 +01:00
Vitaly Kuznetsov	8644f771e0	x86/kvm/hyper-v: direct mode for synthetic timers Turns out Hyper-V on KVM (as of 2016) will only use synthetic timers if direct mode is available. With direct mode we notify the guest by asserting APIC irq instead of sending a SynIC message. The implementation uses existing vec_bitmap for letting lapic code know that we're interested in the particular IRQ's EOI request. We assume that the same APIC irq won't be used by the guest for both direct mode stimer and as sint source (especially with AutoEOI semantics). It is unclear how things should be handled if that's not true. Direct mode is also somewhat less expensive; in my testing stimer_send_msg() takes not less than 1500 cpu cycles and stimer_notify_direct() can usually be done in 300-400. WS2016 without Hyper-V, however, always sticks to non-direct version. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:57 +01:00
Vitaly Kuznetsov	6a058a1ead	x86/kvm/hyper-v: use stimer config definition from hyperv-tlfs.h As a preparation to implementing Direct Mode for Hyper-V synthetic timers switch to using stimer config definition from hyperv-tlfs.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:57 +01:00
Vitaly Kuznetsov	0aa67255f5	x86/hyper-v: move synic/stimer control structures definitions to hyperv-tlfs.h We implement Hyper-V SynIC and synthetic timers in KVM too so there's some room for code sharing. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:56 +01:00
Vitaly Kuznetsov	2bc39970e9	x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID With every new Hyper-V Enlightenment we implement we're forced to add a KVM_CAP_HYPERV_* capability. While this approach works it is fairly inconvenient: the majority of the enlightenments we do have corresponding CPUID feature bit(s) and userspace has to know this anyways to be able to expose the feature to the guest. Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one cap to rule them all!") returning all Hyper-V CPUID feature leaves. Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible: Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000, 0x40000001) and we would probably confuse userspace in case we decide to return these twice. KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:54 +01:00
Vitaly Kuznetsov	e2e871ab2f	x86/kvm/hyper-v: Introduce nested_get_evmcs_version() helper The upcoming KVM_GET_SUPPORTED_HV_CPUID ioctl will need to return Enlightened VMCS version in HYPERV_CPUID_NESTED_FEATURES.EAX when it was enabled. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:54 +01:00
Vitaly Kuznetsov	220d6586ec	x86/hyper-v: Drop HV_X64_CONFIGURE_PROFILER definition BIT(13) in HYPERV_CPUID_FEATURES.EBX is described as "ConfigureProfiler" in TLFS v4.0 but starting 5.0 it is replaced with 'Reserved'. As we don't currently us it in kernel it can just be dropped. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:53 +01:00
Vitaly Kuznetsov	a4987defc1	x86/hyper-v: Do some housekeeping in hyperv-tlfs.h hyperv-tlfs.h is a bit messy: CPUID feature bits are not always sorted, it's hard to get which CPUID they belong to, some items are duplicated (e.g. HV_X64_MSR_CRASH_CTL_NOTIFY/HV_CRASH_CTL_CRASH_NOTIFY). Do some housekeeping work. While on it, replace all (1 << X) with BIT(X) macro. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:53 +01:00
Vitaly Kuznetsov	ec08449172	x86/hyper-v: Mark TLFS structures packed The TLFS structures are used for hypervisor-guest communication and must exactly meet the specification. Compilers can add alignment padding to structures or reorder struct members for randomization and optimization, which would break the hypervisor ABI. Mark the structures as packed to prevent this. 'struct hv_vp_assist_page' and 'struct hv_enlightened_vmcs' need to be properly padded to support the change. Suggested-by: Nadav Amit <nadav.amit@gmail.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Nadav Amit <nadav.amit@gmail.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:52 +01:00
Roman Kagan	7deec5e0df	x86: kvm: hyperv: don't retry message delivery for periodic timers The SynIC message delivery protocol allows the message originator to request, should the message slot be busy, to be notified when it's free. However, this is unnecessary and even undesirable for messages generated by SynIC timers in periodic mode: if the period is short enough compared to the time the guest spends in the timer interrupt handler, so the timer ticks start piling up, the excessive interactions due to this notification and retried message delivery only makes the things worse. [This was observed, in particular, with Windows L2 guests setting (temporarily) the periodic timer to 2 kHz, and spending hundreds of microseconds in the timer interrupt handler due to several L2->L1 exits; under some load in L0 this could exceed 500 us so the timer ticks started to pile up and the guest livelocked.] Relieve the situation somewhat by not retrying message delivery for periodic SynIC timers. This appears to remain within the "lazy" lost ticks policy for SynIC timers as implemented in KVM. Note that it doesn't solve the fundamental problem of livelocking the guest with a periodic timer whose period is smaller than the time needed to process a tick, but it makes it a bit less likely to be triggered. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:51 +01:00
Roman Kagan	3a0e773172	x86: kvm: hyperv: simplify SynIC message delivery SynIC message delivery is somewhat overengineered: it pretends to follow the ordering rules when grabbing the message slot, using atomic operations and all that, but does it incorrectly and unnecessarily. The correct order would be to first set .msg_pending, then atomically replace .message_type if it was zero, and then clear .msg_pending if the previous step was successful. But this all is done in vcpu context so the whole update looks atomic to the guest (it's assumed to only access the message page from this cpu), and therefore can be done in whatever order is most convenient (and is also the reason why the incorrect order didn't trigger any bugs so far). While at this, also switch to kvm_vcpu_{read,write}_guest_page, and drop the no longer needed synic_clear_sint_msg_pending. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:51 +01:00
Peng Hao	eb1ff0a913	kvm: x86: remove unnecessary recalculate_apic_map In the previous code, the variable apic_sw_disabled influences recalculate_apic_map. But in "KVM: x86: simplify kvm_apic_map" (commit: `3b5a5ffa92`), the access to apic_sw_disabled in recalculate_apic_map has been deleted. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:50 +01:00
Peng Hao	b2227ddec1	kvm: svm: remove unused struct definition structure svm_init_data is never used. So remove it. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:50 +01:00
Jim Mattson	84c8c5b8f8	kvm: vmx: Skip all SYSCALL MSRs in setup_msrs() when !EFER.SCE Like IA32_STAR, IA32_LSTAR and IA32_FMASK only need to contain guest values on VM-entry when the guest is in long mode and EFER.SCE is set. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:49 +01:00
Jim Mattson	db31c8f5af	kvm: vmx: Don't set hardware IA32_CSTAR MSR on VM-entry SYSCALL raises #UD in compatibility mode on Intel CPUs, so it's pointless to load the guest's IA32_CSTAR value into the hardware MSR. IA32_CSTAR still provides 48 bits of storage on Intel CPUs that have CPUID.80000001:EDX.LM[bit 29] set, so we cannot remove it from the vmx_msr_index[] array. Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:48 +01:00
Jim Mattson	898a811f14	kvm: vmx: Document the need for MSR_STAR in i386 builds Add a comment explaining why MSR_STAR must be included in vmx_msr_index[] even for i386 builds. The elided comment has not been relevant since move_msr_up() was introduced in commit `a75beee6e4` ("KVM: VMX: Avoid saving and restoring msrs on lightweight vmexit"). Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:48 +01:00
Jim Mattson	0023ef39dc	kvm: vmx: Set IA32_TSC_AUX for legacy mode guests RDTSCP is supported in legacy mode as well as long mode. The IA32_TSC_AUX MSR should be set to the correct guest value before entering any guest that supports RDTSCP. Fixes: `4e47c7a6d7` ("KVM: VMX: Add instruction rdtscp support for guest") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:47 +01:00
Sean Christopherson	55d2375e58	KVM: nVMX: Move nested code to dedicated files From a functional perspective, this is (supposed to be) a straight copy-paste of code. Code was moved piecemeal to nested.c as not all code that could/should be moved was obviously nested-only. The nested code was then re-ordered as needed to compile, i.e. stats may not show this is being a "pure" move despite there not being any intended changes in functionality. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:46 +01:00
Sean Christopherson	7c97fcb3b6	KVM: VMX: Expose nested_vmx_allowed() to nested VMX as a non-inline Exposing only the function allows @nested, i.e. the module param, to be statically defined in vmx.c, ensuring we aren't unnecessarily checking said variable in the nested code. nested_vmx_allowed() is exposed due to the need to verify nested support in vmx_{get,set}_nested_state(). The downside is that nested_vmx_allowed() likely won't be inlined in vmx_{get,set}_nested_state(), but that should be a non-issue as they're not a hot path. Keeping vmx_{get,set}_nested_state() in vmx.c isn't a viable option as they need access to several nested-only functions. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:59:45 +01:00
Sean Christopherson	97b7ead392	KVM: VMX: Expose various getters and setters to nested VMX ...as they're used directly by the nested code. This will allow moving the bulk of the nested code out of vmx.c without concurrent changes to vmx.h. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:18:01 +01:00
Sean Christopherson	cf3646eb3a	KVM: VMX: Expose misc variables needed for nested VMX Exposed vmx_msr_index, vmx_return and host_efer via vmx.h so that the nested code can be moved out of vmx.c. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:18:01 +01:00
Sean Christopherson	ff241486ac	KVM: nVMX: Move "vmcs12 to shadow/evmcs sync" to helper function ...so that the function doesn't need to be created when moving the nested code out of vmx.c. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:18:00 +01:00
Sean Christopherson	3e8eacccae	KVM: nVMX: Call nested_vmx_setup_ctls_msrs() iff @nested is true ...so that it doesn't need access to @nested. The only case where the provided struct isn't already zeroed is the call from vmx_create_vcpu() as setup_vmcs_config() zeroes the struct in the other use cases. This will allow @nested to be statically defined in vmx.c, i.e. this removes the last direct reference from nested code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:17:59 +01:00
Sean Christopherson	e4027cfafd	KVM: nVMX: Set callbacks for nested functions during hardware setup ...in nested-specific code so that they can eventually be moved out of vmx.c, e.g. into nested.c. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:17:58 +01:00
Sean Christopherson	a3203381ca	KVM: VMX: Move the hardware {un}setup functions to the bottom ...so that future patches can reference e.g. @kvm_vmx_exit_handlers without having to simultaneously move a big chunk of code. Speaking from experience, resolving merge conflicts is an absolute nightmare without pre-moving the code. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:17:58 +01:00
Sean Christopherson	5158917c7b	KVM: x86: nVMX: Allow nested_enable_evmcs to be NULL ...so that it can conditionally set by the VMX code, i.e. iff @nested is true. This will in turn allow it to be moved out of vmx.c and into a nested-specified file. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:17:57 +01:00
Sean Christopherson	944c346453	KVM: VMX: Move nested hardware/vcpu {un}setup to helper functions Eventually this will allow us to move the nested VMX code out of vmx.c. Note that this also effectively wraps @enable_shadow_vmcs with @nested so that it too can be moved out of vmx.c. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:17:56 +01:00
Sean Christopherson	89b0c9f583	KVM: VMX: Move VMX instruction wrappers to a dedicated header file VMX has a few hundred lines of code just to wrap various VMX specific instructions, e.g. VMWREAD, INVVPID, etc... Move them to a dedicated header so it's easier to find/isolate the boilerplate. With this change, more inlines can be moved from vmx.c to vmx.h. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 17:17:27 +01:00
Sean Christopherson	75edce8a45	KVM: VMX: Move eVMCS code to dedicated files The header, evmcs.h, already exists and contains a fair amount of code, but there are a few pieces in vmx.c that can be moved verbatim. In addition, move an array definition to evmcs.c to prepare for multiple consumers of evmcs.h. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 14:00:06 +01:00
Sean Christopherson	8373d25d25	KVM: VMX: Add vmx.h to hold VMX definitions Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 14:00:01 +01:00
Sean Christopherson	609363cf81	KVM: nVMX: Move vmcs12 code to dedicated files vmcs12 is the KVM-defined struct used to track a nested VMCS, e.g. a VMCS created by L1 for L2. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:30 +01:00
Sean Christopherson	cb1d474b32	KVM: VMX: Move VMCS definitions to dedicated file This isn't intended to be a pure reflection of hardware, e.g. struct loaded_vmcs and struct vmcs_host_state are KVM-defined constructs. Similar to capabilities.h, this is a standalone file to avoid circular dependencies between yet-to-be-created vmx.h and nested.h files. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:29 +01:00
Sean Christopherson	2c4fd91d26	KVM: VMX: Expose various module param vars via capabilities.h Expose the variables associated with various module params that are needed by the nested VMX code. There is no ulterior logic for what variables are/aren't exposed, this is purely "what's needed by the nested code". Note that @nested is intentionally not exposed. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:29 +01:00
Sean Christopherson	3077c19108	KVM: VMX: Move capabilities structs and helpers to dedicated file Defining a separate capabilities.h as opposed to putting this code in e.g. vmx.h avoids circular dependencies between (the yet-to-be-added) vmx.h and nested.h. The aforementioned circular dependencies are why struct nested_vmx_msrs also resides in capabilities instead of e.g. nested.h. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:28 +01:00
Sean Christopherson	7caaa71108	KVM: VMX: Pass vmx_capability struct to setup_vmcs_config() ...instead of referencing the global struct. This will allow moving setup_vmcs_config() to a separate file that may not have access to the global variable. Modify nested_vmx_setup_ctls_msrs() appropriately since vmx_capability.ept may not be accurate when called by vmx_check_processor_compat(). No functional change intended. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:27 +01:00
Sean Christopherson	c73da3fcab	KVM: VMX: Properly handle dynamic VM Entry/Exit controls EFER and PERF_GLOBAL_CTRL MSRs have dedicated VM Entry/Exit controls that KVM dynamically toggles based on whether or not the guest's value for each MSRs differs from the host. Handle the dynamic behavior by adding a helper that clears the dynamic bits so the bits aren't set when initializing the VMCS field outside of the dynamic toggling flow. This makes the handling consistent with similar behavior for other controls, e.g. pin, exec and sec_exec. More importantly, it eliminates two global bools that are stealthily modified by setup_vmcs_config. Opportunistically clean up a comment and print related to errata for IA32_PERF_GLOBAL_CTRL. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:26 +01:00
Sean Christopherson	71d9409e20	KVM: VMX: Move caching of MSR_IA32_XSS to hardware_setup() MSR_IA32_XSS has no relation to the VMCS whatsoever, it doesn't belong in setup_vmcs_config() and its reference to host_xss prevents moving setup_vmcs_config() to a dedicated file. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:26 +01:00
Sean Christopherson	4cebd747d7	KVM: VMX: Drop the "vmx" prefix from vmx_evmcs.h VMX specific files now reside in a dedicated subdirectory, i.e. the file name prefix is redundant. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:25 +01:00
Sean Christopherson	e0123119a5	KVM: VMX: rename vmx_shadow_fields.h to vmcs_shadow_fields.h VMX specific files now reside in a dedicated subdirectory. Drop the "vmx" prefix, which is redundant, and add a "vmcs" prefix to clarify that the file is referring to VMCS shadow fields. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:24 +01:00
Sean Christopherson	a821bab2d1	KVM: VMX: Move VMX specific files to a "vmx" subdirectory ...to prepare for shattering vmx.c into multiple files without having to prepend "vmx_" to all new files. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:24 +01:00
Sean Christopherson	3592cda6bc	KVM: x86: Add requisite includes to hyperv.h Until this point vmx.c has been the only consumer and included the file after many others. Prepare for multiple consumers, i.e. the shattering of vmx.c Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:23 +01:00
Sean Christopherson	8ba2e525ec	KVM: x86: Add requisite includes to kvm_cache_regs.h Until this point vmx.c has been the only consumer and included the file after many others. Prepare for multiple consumers, i.e. the shattering of vmx.c Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:22 +01:00
Sean Christopherson	199b118ab3	KVM: VMX: Alphabetize the includes in vmx.c ...to prepare for the creation of a "vmx" subdirectory that will contain a variety of headers. Clean things up now to avoid making a bigger mess in the future. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:21 +01:00
Sean Christopherson	dfae3c03b8	KVM: nVMX: Allocate and configure VM{READ,WRITE} bitmaps iff enable_shadow_vmcs ...and make enable_shadow_vmcs depend on nested. Aside from the obvious memory savings, this will allow moving the relevant code out of vmx.c in the future, e.g. to a nested specific file. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:21 +01:00
Sean Christopherson	1b3ab5ad1b	KVM: nVMX: Free the VMREAD/VMWRITE bitmaps if alloc_kvm_area() fails Fixes: `34a1cd60d1` ("kvm: x86: vmx: move some vmx setting from vmx_init() to hardware_setup()") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:20 +01:00
Paolo Bonzini	2a31b9db15	kvm: introduce manual dirty log reprotect There are two problems with KVM_GET_DIRTY_LOG. First, and less important, it can take kvm->mmu_lock for an extended period of time. Second, its user can actually see many false positives in some cases. The latter is due to a benign race like this: 1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects them. 2. The guest modifies the pages, causing them to be marked ditry. 3. Userspace actually copies the pages. 4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though they were not written to since (3). This is especially a problem for large guests, where the time between (1) and (3) can be substantial. This patch introduces a new capability which, when enabled, makes KVM_GET_DIRTY_LOG not write-protect the pages it returns. Instead, userspace has to explicitly clear the dirty log bits just before using the content of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a 64-page granularity rather than requiring to sync a full memslot; this way, the mmu_lock is taken for small amounts of time, and only a small amount of time will pass between write protection of pages and the sending of their content. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:19 +01:00
Paolo Bonzini	8fe65a8299	kvm: rename last argument to kvm_get_dirty_log_protect When manual dirty log reprotect will be enabled, kvm_get_dirty_log_protect's pointer argument will always be false on exit, because no TLB flush is needed until the manual re-protection operation. Rename it from "is_dirty" to "flush", which more accurately tells the caller what they have to do with it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:18 +01:00
Paolo Bonzini	e5d83c74a5	kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic The first such capability to be handled in virt/kvm/ will be manual dirty page reprotection. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-12-14 12:34:18 +01:00
Paolo Bonzini	bb22dc14a2	Merge branch 'khdr_fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest into HEAD Merge topic branch from Shuah.	2018-12-14 12:33:31 +01:00
Christoph Hellwig	356da6d0cd	dma-mapping: bypass indirect calls for dma-direct Avoid expensive indirect calls in the fast path DMA mapping operations by directly calling the dma_direct_* ops if we are using the directly mapped DMA operations. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com>	2018-12-13 21:06:18 +01:00
Christoph Hellwig	55897af630	dma-direct: merge swiotlb_dma_ops into the dma_direct code While the dma-direct code is (relatively) clean and simple we actually have to use the swiotlb ops for the mapping on many architectures due to devices with addressing limits. Instead of keeping two implementations around this commit allows the dma-direct implementation to call the swiotlb bounce buffering functions and thus share the guts of the mapping implementation. This also simplified the dma-mapping setup on a few architectures where we don't have to differenciate which implementation to use. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com>	2018-12-13 21:06:17 +01:00
Christoph Hellwig	3731c3d477	dma-mapping: always build the direct mapping code All architectures except for sparc64 use the dma-direct code in some form, and even for sparc64 we had the discussion of a direct mapping mode a while ago. In preparation for directly calling the direct mapping code don't bother having it optionally but always build the code in. This is a minor hardship for some powerpc and arm configs that don't pull it in yet (although they should in a relase ot two), and sparc64 which currently doesn't need it at all, but it will reduce the ifdef mess we'd otherwise need significantly. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Tony Luck <tony.luck@intel.com>	2018-12-13 21:06:11 +01:00
Maran Wilson	716ff017a3	KVM: x86: Allow Qemu/KVM to use PVH entry point For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, Qemu should be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html This patch enables Qemu to use that same entry point for booting KVM guests. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2018-12-13 13:41:49 -05:00
Maran Wilson	a43fb7da53	xen/pvh: Move Xen code for getting mem map via hcall out of common file We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The original design for PVH entry in Xen guests relies on being able to obtain the memory map from the hypervisor using a hypercall. When we extend the PVH entry ABI to support other hypervisors like Qemu/KVM, a new mechanism will be added that allows the guest to get the memory map without needing to use hypercalls. For Xen guests, the hypercall approach will still be supported. In preparation for adding support for other hypervisors, we can move the code that uses hypercalls into the Xen specific file. This will allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2018-12-13 13:41:49 -05:00
Maran Wilson	8cee3974b3	xen/pvh: Move Xen specific PVH VM initialization out of common file We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. This patch moves the small block of code used for initializing Xen PVH virtual machines into the Xen specific file. This initialization is not going to be needed for Qemu/KVM guests. Moving it out of the common file is going to allow us to compile kernels in the future without CONFIG_XEN that are still capable of being booted as a Qemu/KVM guest via the PVH entry point. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2018-12-13 13:41:49 -05:00
Maran Wilson	4df7363e52	xen/pvh: Create a new file for Xen specific PVH code We need to refactor PVH entry code so that support for other hypervisors like Qemu/KVM can be added more easily. The first step in that direction is to create a new file that will eventually hold the Xen specific routines. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2018-12-13 13:41:49 -05:00
Maran Wilson	fcd4747698	xen/pvh: Move PVH entry code out of Xen specific tree Once hypervisors other than Xen start using the PVH entry point for starting VMs, we would like the option of being able to compile PVH entry capable kernels without enabling CONFIG_XEN and all the code that comes along with that. To allow that, we are moving the PVH code out of Xen and into files sitting at a higher level in the tree. This patch is not introducing any code or functional changes, just moving files from one location to another. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2018-12-13 13:41:49 -05:00
Maran Wilson	7733607fb3	xen/pvh: Split CONFIG_XEN_PVH into CONFIG_PVH and CONFIG_XEN_PVH In order to pave the way for hypervisors other than Xen to use the PVH entry point for VMs, we need to factor the PVH entry code into Xen specific and hypervisor agnostic components. The first step in doing that, is to create a new config option for PVH entry that can be enabled independently from CONFIG_XEN. Signed-off-by: Maran Wilson <maran.wilson@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2018-12-13 13:41:49 -05:00
Eric Biggers	a033aed5a8	crypto: x86/chacha - yield the FPU occasionally To improve responsiveness, yield the FPU (temporarily re-enabling preemption) every 4 KiB encrypted/decrypted, rather than keeping preemption disabled during the entire encryption/decryption operation. Alternatively we could do this for every skcipher_walk step, but steps may be small in some cases, and yielding the FPU is expensive on x86. Suggested-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-12-13 18:24:58 +08:00
Eric Biggers	7a507d6225	crypto: x86/chacha - add XChaCha12 support Now that the x86_64 SIMD implementations of ChaCha20 and XChaCha20 have been refactored to support varying the number of rounds, add support for XChaCha12. This is identical to XChaCha20 except for the number of rounds, which is 12 instead of 20. This can be used by Adiantum. Reviewed-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-12-13 18:24:58 +08:00
Eric Biggers	8b65f34c58	crypto: x86/chacha20 - refactor to allow varying number of rounds In preparation for adding XChaCha12 support, rename/refactor the x86_64 SIMD implementations of ChaCha20 to support different numbers of rounds. Reviewed-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-12-13 18:24:58 +08:00
Eric Biggers	4af7826187	crypto: x86/chacha20 - add XChaCha20 support Add an XChaCha20 implementation that is hooked up to the x86_64 SIMD implementations of ChaCha20. This can be used by Adiantum. An SSSE3 implementation of single-block HChaCha20 is also added so that XChaCha20 can use it rather than the generic implementation. This required refactoring the ChaCha permutation into its own function. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-12-13 18:24:57 +08:00
Eric Biggers	0f961f9f67	crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305 Add a 64-bit AVX2 implementation of NHPoly1305, an ε-almost-∆-universal hash function used in the Adiantum encryption mode. For now, only the NH portion is actually AVX2-accelerated; the Poly1305 part is less performance-critical so is just implemented in C. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-12-13 18:24:57 +08:00
Eric Biggers	012c82388c	crypto: x86/nhpoly1305 - add SSE2 accelerated NHPoly1305 Add a 64-bit SSE2 implementation of NHPoly1305, an ε-almost-∆-universal hash function used in the Adiantum encryption mode. For now, only the NH portion is actually SSE2-accelerated; the Poly1305 part is less performance-critical so is just implemented in C. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-12-13 18:24:57 +08:00
Dan Williams	51c3fbd89d	x86/mm: Fix decoy address handling vs 32-bit builds A decoy address is used by set_mce_nospec() to update the cache attributes for a page that may contain poison (multi-bit ECC error) while attempting to minimize the possibility of triggering a speculative access to that page. When reserve_memtype() is handling a decoy address it needs to convert it to its real physical alias. The conversion, AND'ing with __PHYSICAL_MASK, is broken for a 32-bit physical mask and reserve_memtype() is passed the last physical page. Gert reports triggering the: BUG_ON(start >= end); ...assertion when running a 32-bit non-PAE build on a platform that has a driver resource at the top of physical memory: BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved Given that the decoy address scheme is only targeted at 64-bit builds and assumes that the top of physical address space is free for use as a decoy address range, simply bypass address sanitization in the 32-bit case. Lastly, there was no need to crash the system when this failure occurred, and no need to crash future systems if the assumptions of decoy addresses are ever violated. Change the BUG_ON() to a WARN() with an error return. Fixes: `510ee090ab` ("x86/mm/pat: Prepare {reserve, free}_memtype() for...") Reported-by: Gert Robben <t2@gert.gr> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Gert Robben <t2@gert.gr> Cc: stable@vger.kernel.org Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: platform-driver-x86@vger.kernel.org Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/154454337985.789277.12133288391664677775.stgit@dwillia2-desk3.amr.corp.intel.com	2018-12-11 18:28:20 -08:00
Reinette Chatre	52eb74339a	x86/resctrl: Fix rdt_find_domain() return value and checks rdt_find_domain() returns an ERR_PTR() that is generated from a provided domain id when the value is negative. Care needs to be taken when creating an ERR_PTR() from this value because a subsequent check using IS_ERR() expects the error to be within the MAX_ERRNO range. Using an invalid domain id as an ERR_PTR() does work at this time since this is currently always -1. Using this undocumented assumption is fragile since future users of rdt_find_domain() may not be aware of thus assumption. Two related issues are addressed: - Ensure that rdt_find_domain() always returns a valid error value by forcing the error to be -ENODEV when a negative domain id is provided. - In a few instances the return value of rdt_find_domain() is just checked for NULL - fix these to include a check of ERR_PTR. Fixes: `d89b737901` ("x86/intel_rdt/cqm: Add mon_data") Fixes: `521348b011` ("x86/intel_rdt: Introduce utility to obtain CDP peer") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: fenghua.yu@intel.com Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/b88cd4ff6a75995bf8db9b0ea546908fe50f69f3.1544479852.git.reinette.chatre@intel.com	2018-12-11 22:09:28 +01:00
Reinette Chatre	80b71c340f	x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking sequence The user triggers the creation of a pseudo-locked region when writing the requested schemata to the schemata resctrl file. The pseudo-locking of a region is required to be done on a CPU that is associated with the cache on which the pseudo-locked region will reside. In order to run the locking code on a specific CPU, the needed CPU has to be selected and ensured to remain online during the entire locking sequence. At this time, the cpu_hotplug_lock is not taken during the pseudo-lock region creation and it is thus possible for a CPU to be selected to run the pseudo-locking code and then that CPU to go offline before the thread is able to run on it. Fix this by ensuring that the cpu_hotplug_lock is taken while the CPU on which code has to run needs to be controlled. Since the cpu_hotplug_lock is always taken before rdtgroup_mutex the lock order is maintained. Fixes: `e0bdfe8e36` ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: stable <stable@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.com	2018-12-11 21:59:01 +01:00
Robin Murphy	a8a4c98fc9	x86/dma/amd-gart: Stop resizing dma_debug_entry pool dma-debug is now capable of adding new entries to its pool on-demand if the initial preallocation was insufficient, so the IOMMU_LEAK logic no longer needs to explicitly change the pool size. This does lose it the ability to save a couple of megabytes of RAM by reducing the pool size below its default, but it seems unlikely that that is a realistic concern these days (or indeed that anyone is actively debugging AGP drivers' DMA usage any more). Getting rid of dma_debug_resize_entries() will make room for further streamlining in the dma-debug code itself. Removing the call reveals quite a lot of cruft which has been useless for nearly a decade since commit `19c1a6f576` ("x86 gart: reimplement IOMMU_LEAK feature by using DMA_API_DEBUG"), including the entire 'iommu=leak' parameter, which controlled nothing except whether dma_debug_resize_entries() was called or not. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Qian Cai <cai@lca.pw> Signed-off-by: Christoph Hellwig <hch@lst.de>	2018-12-11 14:32:12 +01:00
Nick Desaulniers	e4f752dda0	x86/um/vdso: Drop implicit common-page-size linker flag GNU linker's -z common-page-size's default value is based on the target architecture. arch/x86/um/vdso/Makefile sets it to the architecture default, which is implicit and redundant. Drop it so that one more LLVM build issue gets addressed. Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Richard Weinberger <richard@nod.at> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-um@lists.infradead.org Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20181206191231.192355-1-ndesaulniers@google.com	2018-12-11 14:19:42 +01:00
Mimi Zohar	399574c64e	x86/ima: retry detecting secure boot mode The secure boot mode may not be detected on boot for some reason (eg. buggy firmware). This patch attempts one more time to detect the secure boot mode. Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>	2018-12-11 07:19:45 -05:00
Eric Richter	d958083a8f	x86/ima: define arch_get_ima_policy() for x86 On x86, there are two methods of verifying a kexec'ed kernel image signature being loaded via the kexec_file_load syscall - an architecture specific implementaton or a IMA KEXEC_KERNEL_CHECK appraisal rule. Neither of these methods verify the kexec'ed kernel image signature being loaded via the kexec_load syscall. Secure boot enabled systems require kexec images to be signed. Therefore, this patch loads an IMA KEXEC_KERNEL_CHECK policy rule on secure boot enabled systems not configured with CONFIG_KEXEC_VERIFY_SIG enabled. When IMA_APPRAISE_BOOTPARAM is configured, different IMA appraise modes (eg. fix, log) can be specified on the boot command line, allowing unsigned or invalidly signed kernel images to be kexec'ed. This patch permits enabling IMA_APPRAISE_BOOTPARAM or IMA_ARCH_POLICY, but not both. Signed-off-by: Eric Richter <erichte@linux.ibm.com> Signed-off-by: Nayna Jain <nayna@linux.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Jones <pjones@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>	2018-12-11 07:13:41 -05:00
Michal Hocko	5b5e4d623e	x86/speculation/l1tf: Drop the swap storage limit restriction when l1tf=off Swap storage is restricted to max_swapfile_size (~16TB on x86_64) whenever the system is deemed affected by L1TF vulnerability. Even though the limit is quite high for most deployments it seems to be too restrictive for deployments which are willing to live with the mitigation disabled. We have a customer to deploy 8x 6,4TB PCIe/NVMe SSD swap devices which is clearly out of the limit. Drop the swap restriction when l1tf=off is specified. It also doesn't make much sense to warn about too much memory for the l1tf mitigation when it is forcefully disabled by the administrator. [ tglx: Folded the documentation delta change ] Fixes: `377eeaa8e1` ("x86/speculation/l1tf: Limit swap file size to MAX_PA/2") Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: <linux-mm@kvack.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181113184910.26697-1-mhocko@kernel.org	2018-12-11 11:46:13 +01:00
Kirill A. Shutemov	254eb5505c	x86/dump_pagetables: Fix LDT remap address marker The LDT remap placement has been changed. It's now placed before the direct mapping in the kernel virtual address space for both paging modes. Change address markers order accordingly. Fixes: `d52888aa27` ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: jgross@suse.com Cc: bhe@redhat.com Cc: hans.van.kranenburg@mendix.com Cc: linux-mm@kvack.org Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20181130202328.65359-3-kirill.shutemov@linux.intel.com	2018-12-11 11:19:24 +01:00
Kirill A. Shutemov	16877a5570	x86/mm: Fix guard hole handling There is a guard hole at the beginning of the kernel address space, also used by hypervisors. It occupies 16 PGD entries. This reserved range is not defined explicitely, it is calculated relative to other entities: direct mapping and user space ranges. The calculation got broken by recent changes of the kernel memory layout: LDT remap range is now mapped before direct mapping and makes the calculation invalid. The breakage leads to crash on Xen dom0 boot[1]. Define the reserved range explicitely. It's part of kernel ABI (hypervisors expect it to be stable) and must not depend on changes in the rest of kernel memory layout. [1] https://lists.xenproject.org/archives/html/xen-devel/2018-11/msg03313.html Fixes: `d52888aa27` ("x86/mm: Move LDT remap out of KASLR region on 5-level paging") Reported-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: bp@alien8.de Cc: hpa@zytor.com Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: peterz@infradead.org Cc: boris.ostrovsky@oracle.com Cc: bhe@redhat.com Cc: linux-mm@kvack.org Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20181130202328.65359-2-kirill.shutemov@linux.intel.com	2018-12-11 11:19:24 +01:00
David S. Miller	addb067983	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2018-12-11 The following pull-request contains BPF updates for your net-next tree. It has three minor merge conflicts, resolutions: 1) tools/testing/selftests/bpf/test_verifier.c Take first chunk with alignment_prevented_execution. 2) net/core/filter.c [...] case bpf_ctx_range_ptr(struct __sk_buff, flow_keys): case bpf_ctx_range(struct __sk_buff, wire_len): return false; [...] 3) include/uapi/linux/bpf.h Take the second chunk for the two cases each. The main changes are: 1) Add support for BPF line info via BTF and extend libbpf as well as bpftool's program dump to annotate output with BPF C code to facilitate debugging and introspection, from Martin. 2) Add support for BPF_ALU \| BPF_ARSH \| BPF_{K,X} in interpreter and all JIT backends, from Jiong. 3) Improve BPF test coverage on archs with no efficient unaligned access by adding an "any alignment" flag to the BPF program load to forcefully disable verifier alignment checks, from David. 4) Add a new bpf_prog_test_run_xattr() API to libbpf which allows for proper use of BPF_PROG_TEST_RUN with data_out, from Lorenz. 5) Extend tc BPF programs to use a new __sk_buff field called wire_len for more accurate accounting of packets going to wire, from Petar. 6) Improve bpftool to allow dumping the trace pipe from it and add several improvements in bash completion and map/prog dump, from Quentin. 7) Optimize arm64 BPF JIT to always emit movn/movk/movk sequence for kernel addresses and add a dedicated BPF JIT backend allocator, from Ard. 8) Add a BPF helper function for IR remotes to report mouse movements, from Sean. 9) Various cleanups in BPF prog dump e.g. to make UAPI bpf_prog_info member naming consistent with existing conventions, from Yonghong and Song. 10) Misc cleanups and improvements in allowing to pass interface name via cmdline for xdp1 BPF example, from Matteo. 11) Fix a potential segfault in BPF sample loader's kprobes handling, from Daniel T. 12) Fix SPDX license in libbpf's README.rst, from Andrey. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-10 18:00:43 -08:00
Linus Torvalds	8586ca8a21	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Three fixes: a boot parameter re-(re-)fix, a retpoline build artifact fix and an LLVM workaround" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Drop implicit common-page-size linker flag x86/build: Fix compiler support check for CONFIG_RETPOLINE x86/boot: Clear RSDP address in boot_params for broken loaders	2018-12-09 15:09:55 -08:00
Linus Torvalds	ebbd30004d	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull kprobes fixes from Ingo Molnar: "Two kprobes fixes: a blacklist fix and an instruction patching related corruption fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes/x86: Blacklist non-attachable interrupt functions kprobes/x86: Fix instruction patching corruption when copying more than one RIP-relative instruction	2018-12-09 14:21:33 -08:00

1 2 3 4 5 ...

31475 Commits