linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-01-15 05:06:17 +07:00

Author	SHA1	Message	Date
Heiko Carstens	932602e238	fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types Some fs compat system calls have unsigned long parameters instead of compat_ulong_t. In order to allow the COMPAT_SYSCALL_DEFINE macro generate code that performs proper zero and sign extension convert all 64 bit parameters their corresponding 32 bit counterparts. compat_sys_io_getevents() is a bit different: the non-compat version has signed parameters for the "min_nr" and "nr" parameters while the compat version has unsigned parameters. So change this as well. For all practical purposes this shouldn't make any difference (doesn't fix a real bug). Also introduce a generic compat_aio_context_t type which can be used everywhere. The access_ok() check within compat_sys_io_getevents() got also removed since the non-compat sys_io_getevents() should be able to handle everything anyway. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-06 16:30:44 +01:00
Heiko Carstens	378a10f3ae	fs/compat: optional preadv64/pwrite64 compat system calls The preadv64/pwrite64 have been implemented for the x32 ABI, in order to allow passing 64 bit arguments from user space without splitting them into two 32 bit parameters, like it would be necessary for usual compat tasks. Howevert these two system calls are only being used for the x32 ABI, so add __ARCH_WANT_COMPAT defines for these two compat syscalls and make these two only visible for x86. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-06 15:35:09 +01:00
Heiko Carstens	c2e7c3d0ef	s390/compat: partial parameter conversion within syscall wrappers Parameter conversion within the system call wrappers is only needed for parameters which differ in size and have a size of eight bytes on 64 bit. For system call parameters with a size of less than eight byte the called system call itself will perform parameter conversion anyway. So we can save the double conversion of e.g. int parameters. The only types which need to be converted are therefore pointer and (unsigned) long parameters. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:12:25 +01:00
Heiko Carstens	ab4f8bba19	s390/compat: automatic zero, sign and pointer conversion of syscalls Instead of explicitly changing compat system call parameters from e.g. unsigned long to compat_ulong_t let the COMPAT_SYSCALL_WRAP macros automatically detect (unsigned) long parameters and zero and sign extend them automatically. The resulting binary is completely identical. In addition add a sys_[system call name] prototype for each system call wrapper. This will cause compile errors if the prototype does not match the prototype in include/linux/syscall.h. Therefore we should now always get the correct zero and sign extension of system call parameters. Pointers are handled like before. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:12:24 +01:00
Heiko Carstens	2c81fc4fb4	s390/compat: add sync_file_range and fallocate compat syscalls The compat syscall wrappers for sync_file_range and fallocate merged 32 bit parameters into 64 bit parameters. Therefore they did more than just the usual zero and/or sign extension of system call parameters. So convert these two wrappers to full s390 specific compat sytem calls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:47 +01:00
Heiko Carstens	00fcb1494f	s390/compat: convert system call wrappers to C part 15 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:46 +01:00
Heiko Carstens	7f6afe87a0	s390/compat: convert system call wrappers to C part 14 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:46 +01:00
Heiko Carstens	28798abc9c	s390/compat: convert system call wrappers to C part 13 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:45 +01:00
Heiko Carstens	20f7835c0e	s390/compat: convert system call wrappers to C part 12 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:45 +01:00
Heiko Carstens	9c4d62fab4	s390/compat: convert system call wrappers to C part 11 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:44 +01:00
Heiko Carstens	18421166e8	s390/compat: convert system call wrappers to C part 10 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:43 +01:00
Heiko Carstens	24e4c2aaef	s390/compat: convert system call wrappers to C part 09 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:43 +01:00
Heiko Carstens	47b3ae9b8c	s390/compat: convert system call wrappers to C part 08 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:42 +01:00
Heiko Carstens	0ebe3eec1e	s390/compat: convert system call wrappers to C part 07 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:42 +01:00
Heiko Carstens	ce5cef7ede	s390/compat: convert system call wrappers to C part 06 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:41 +01:00
Heiko Carstens	86d295e1cc	s390/compat: convert system call wrappers to C part 05 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:40 +01:00
Heiko Carstens	c355ce182a	s390/compat: convert system call wrappers to C part 04 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:40 +01:00
Heiko Carstens	be06fbf816	s390/compat: convert system call wrappers to C part 03 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:39 +01:00
Heiko Carstens	473a06572f	s390/compat: convert system call wrappers to C part 02 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:39 +01:00
Heiko Carstens	b07edab23c	s390/compat: convert system call wrappers to C part 01 Introduce a new compat_wrap.c file which contains the s390 specific compat system call wrappers. The s390 specific system call wrappers only perform sign, zero and pointer conversion of system call arguments before actually calling the non-compat system call. Therefore introduce COMPAT_SYSCALL_WRAPx macros which generate C code that is nearly identical to the assembly code. This has the advantage that the compile will generate correct code, and we avoid the frequent copy-paste errors seen in the compat_wrapper.S file. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:38 +01:00
Heiko Carstens	5383d2c8b3	s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 7 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:38 +01:00
Heiko Carstens	a0f8c6da8f	s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 6 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:37 +01:00
Heiko Carstens	52a0b536a3	s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 5 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:36 +01:00
Heiko Carstens	e723e0cc17	s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 4 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:36 +01:00
Heiko Carstens	4ca2ea58c8	s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 3 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:35 +01:00
Heiko Carstens	208096eee2	s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 2 Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:35 +01:00
Heiko Carstens	c6c0f58f90	s390/compat: convert to COMPAT_SYSCALL_DEFINEx part 1 Convert s390 specific system calls to to the new COMPAT_SYSCALL_DEFINE macro. This allows us to get rid of the assembly compat wrappers. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:34 +01:00
Heiko Carstens	0473c9b5f0	compat: let architectures define __ARCH_WANT_COMPAT_SYS_GETDENTS64 For architecture dependent compat syscalls in common code an architecture must define something like __ARCH_WANT_<WHATEVER> if it wants to use the code. This however is not true for compat_sys_getdents64 for which architectures must define __ARCH_OMIT_COMPAT_SYS_GETDENTS64 if they do not want the code. This leads to the situation where all architectures, except mips, get the compat code but only x86_64, arm64 and the generic syscall architectures actually use it. So invert the logic, so that architectures actively must do something to get the compat code. This way a couple of architectures get rid of otherwise dead code. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>	2014-03-04 09:05:33 +01:00
Linus Torvalds	3154da34be	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc fixes, most of them on the tooling side" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Fix strict alias issue for find_first_bit perf tools: fix BFD detection on opensuse perf: Fix hotplug splat perf/x86: Fix event scheduling perf symbols: Destroy unused symsrcs perf annotate: Check availability of annotate when processing samples	2014-03-02 11:37:07 -06:00
Linus Torvalds	55de1ed2f5	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "The VMCOREINFO patch I'll pushing for this release to avoid having a release with kASLR and but without that information. I was hoping to include the FPU patches from Suresh, but ran into a problem (see other thread); will try to make them happen next week" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, kaslr: add missed "static" declarations x86, kaslr: export offset in VMCOREINFO ELF notes	2014-03-01 22:48:14 -06:00
Linus Torvalds	d8efcf38b1	Three x86 fixes and one for ARM/ARM64. In particular, nested virtualization on Intel is broken in 3.13 and fixed by this pull request. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJTEIYBAAoJEBvWZb6bTYbyv7AP/iOE8ybTr7MSfZPTF4Ip13o+ rvzlnzUtDMsZAN5dZAXhsR3lPeXygjTmeI6FWdiVwdalp9wpg2FLAZ0BDj8KIg0r cAINYOmJ0jhC1mTOfMJghsrE9b1aaXwWVSlXkivzyIrPJCZFlqDKOXleHJXyqNTY g259tPI8VWS7Efell9NclUNXCdD4g6wG/RdEjumjv9JLiXlDVviXvzvIEl/S3Ud1 D2xxX7vvGHikyTwuls/bWzJzzRRlsb1VVOOQtBuC9NyaAY7bpQjGQvo6XzxowtIZ h4F4iU/umln5WcDiJU8XXiV/TOCVzqgdLk3Pr5Kgv3yO8/XbE/CcnyJmeaSgMoJB i7vJ6tUX5mfGsLNxfshXw0RsY/y9KMLnbt62eiPImWBxgpDZNxKpAqCA7GsOb87g Vjzl3poEwe+5eN6Usbpd78rRgfgbbZF+Pf2qsphtQhFQGaogz1Ltz0B0hY3MYxx3 y9OJMJyt1MI4+hvvdjhSnmIo6APwuGSr+hhdKCPSlMiWJun2XRHXTHBNAS+dsjgs Sx2Bzao/lki5l7y9Ea1fR4yerigbFJF4L1iV04sSbsoh0I/nN5qjXFrc22Ju0i3i uIrVwfSSdX4HQwQYdBGKQQRGq/W0wOjEDoA5qZmxg3s4j8KSd7ooBtRk/VepVH7E kaUrekJ+KWs/sVNW2MtU =zQTn -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "Three x86 fixes and one for ARM/ARM64. In particular, nested virtualization on Intel is broken in 3.13 and fixed by this pull request" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: kvm, vmx: Really fix lazy FPU on nested guest kvm: x86: fix emulator buffer overflow (CVE-2014-0049) arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT KVM: MMU: drop read-only large sptes when creating lower level sptes	2014-02-28 11:45:03 -08:00
Linus Torvalds	78d9e93440	- !CONFIG_SMP build fix - pte bit testing macros conversion fix (int truncates top bits of long) - stack unwinding PC calculation fix -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) iQIcBAABAgAGBQJTEMWxAAoJEGvWsS0AyF7xHWYP/R2Q8fpOqbT5j0RWghNMq97Y 9TnvQk7LZ2kztWxm3//iXIKkhyq+68ggzofNLYqIg9TPPVTeFUzatMO0iRMwLmL3 uvgDkD4UwgVAZQutP9q/NQhcHO5L6tH91oXGZPw0jAnK40VI8jBllZaJus8+FaGR LHhZ0gRU0V5r69956UKLFY7sHTdnskr9W4lsDAWpOJDaDHuriVK+BkvxHl+hHO0I 3YlYQ9FTMgUdq0nLL0h0LpT3WZVXdA3HzP2vReL9pk7uqghrYFUrrTlKu0c6kfrk 0inHoJESvw4+q1ehzJHkOxHpTeUFcuUyPLooAmS58wzKLKkoY3S9s2VGcwsS7Wp5 pRyq8RJmnY1dAh4El70xLu73NtKg0/z2+bo0UPCV6CLALa7daw9MFnb1P2zeXkHK QacOCgfVwNyli/o/IeWblbyyykA3J7e1J3UK5N5E4qOGZGOMIThKKUfKXvhJTVKq bhQKhe0sHP5V9D54EzZcGhWTTfsKXrUcG53SGgkMXGSFdZtJMa4HHzEe7dRwHc+R h5P4Zb53v9HEGYQjIiUVB3I0vrzAJmBE3rTR4/M8oAB2FRvNRSQQfnqQR4MZtA2i Oi2EMftNB/r+xzFabxCup/lnMa22PttTToAHDBcE6EHeOHveuuiEfaFbv16hoLub 05SeCU7CjMSMlt6ufB8d =kzwj -----END PGP SIGNATURE----- Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull ARM64 fixes from Catalin Marinas: - !CONFIG_SMP build fix - pte bit testing macros conversion fix (int truncates top bits of long) - stack unwinding PC calculation fix * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: Fix !CONFIG_SMP kernel build arm64: mm: Add double logical invert to pte accessors ARM64: unwind: Fix PC calculation	2014-02-28 11:43:42 -08:00
Linus Torvalds	f94def7602	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc fixes from Ben Herrenschmidt: "Here are a few more powerpc fixes for 3.14. Most of these are also CC'ed to stable and fix bugs in new functionality introduced in the last 2 or 3 versions" * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/powernv: Fix indirect XSCOM unmangling powerpc/powernv: Fix opal_xscom_{read,write} prototype powerpc/powernv: Refactor PHB diag-data dump powerpc/powernv: Dump PHB diag-data immediately powerpc: Increase stack redzone for 64-bit userspace to 512 bytes powerpc/ftrace: bugfix for test_24bit_addr powerpc/crashdump : Fix page frame number check in copy_oldmem_page powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly	2014-02-28 11:42:33 -08:00
Catalin Marinas	b57fc9e806	arm64: Fix !CONFIG_SMP kernel build Commit `fb4a96029c` (arm64: kernel: fix per-cpu offset restore on resume) uses per_cpu_offset() unconditionally during CPU wakeup, however, this is only defined for the SMP case. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Reported-by: Dave P Martin <Dave.Martin@arm.com>	2014-02-28 16:12:25 +00:00
Steve Capper	84fe6826c2	arm64: mm: Add double logical invert to pte accessors Page table entries on ARM64 are 64 bits, and some pte functions such as pte_dirty return a bitwise-and of a flag with the pte value. If the flag to be tested resides in the upper 32 bits of the pte, then we run into the danger of the result being dropped if downcast. For example: gather_stats(page, md, pte_dirty(pte), 1); where pte_dirty(pte) is downcast to an int. This patch adds a double logical invert to all the pte_ accessors to ensure predictable downcasting. Signed-off-by: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>	2014-02-28 15:44:19 +00:00
Benjamin Herrenschmidt	e0cf957614	powerpc/powernv: Fix indirect XSCOM unmangling We need to unmangle the full address, not just the register number, and we also need to support the real indirect bit being set for in-kernel uses. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.13]	2014-02-28 19:15:49 +11:00
Benjamin Herrenschmidt	2f3f38e4d3	powerpc/powernv: Fix opal_xscom_{read,write} prototype The OPAL firmware functions opal_xscom_read and opal_xscom_write take a 64-bit argument for the XSCOM (PCB) address in order to support the indirect mode on P8. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.13]	2014-02-28 19:15:48 +11:00
Gavin Shan	af87d2fe95	powerpc/powernv: Refactor PHB diag-data dump As Ben suggested, the patch prints PHB diag-data with multiple fields in one line and omits the line if the fields of that line are all zero. With the patch applied, the PHB3 diag-data dump looks like: PHB3 PHB#3 Diag-data (Version: 1) brdgCtl: 00000002 RootSts: 0000000f 00400000 b0830008 00100147 00002000 nFir: 0000000000000000 0030006e00000000 0000000000000000 PhbSts: 0000001c00000000 0000000000000000 Lem: 0000000000100000 42498e327f502eae 0000000000000000 InAErr: 8000000000000000 8000000000000000 0402030000000000 0000000000000000 PE[ 8] A/B: 8480002b00000000 8000000000000000 [ The current diag data is so big that it overflows the printk buffer pretty quickly in cases when we get a handful of errors at once which can happen. --BenH ] Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> CC: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-28 18:43:19 +11:00
Gavin Shan	9471660437	powerpc/powernv: Dump PHB diag-data immediately The PHB diag-data is important to help locating the root cause for EEH errors such as frozen PE or fenced PHB. However, the EEH core enables IO path by clearing part of HW registers before collecting this data causing it to be corrupted. This patch fixes this by dumping the PHB diag-data immediately when frozen/fenced state on PE or PHB is detected for the first time in eeh_ops::get_state() or next_error() backend. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> CC: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-28 18:43:10 +11:00
Paul Mackerras	573ebfa660	powerpc: Increase stack redzone for 64-bit userspace to 512 bytes The new ELFv2 little-endian ABI increases the stack redzone -- the area below the stack pointer that can be used for storing data -- from 288 bytes to 512 bytes. This means that we need to allow more space on the user stack when delivering a signal to a 64-bit process. To make the code a bit clearer, we define new USER_REDZONE_SIZE and KERNEL_REDZONE_SIZE symbols in ptrace.h. For now, we leave the kernel redzone size at 288 bytes, since increasing it to 512 bytes would increase the size of interrupt stack frames correspondingly. Gcc currently only makes use of 288 bytes of redzone even when compiling for the new little-endian ABI, and the kernel cannot currently be compiled with the new ABI anyway. In the future, hopefully gcc will provide an option to control the amount of redzone used, and then we could reduce it even more. This also changes the code in arch_compat_alloc_user_space() to preserve the expanded redzone. It is not clear why this function would ever be used on a 64-bit process, though. Signed-off-by: Paul Mackerras <paulus@samba.org> CC: <stable@vger.kernel.org> [v3.13] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-28 18:06:26 +11:00
Liu Ping Fan	a95fc58549	powerpc/ftrace: bugfix for test_24bit_addr The branch target should be the func addr, not the addr of func_descr_t. So using ppc_function_entry() to generate the right target addr. Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-28 18:06:25 +11:00
Laurent Dufour	f5295bd8ea	powerpc/crashdump : Fix page frame number check in copy_oldmem_page In copy_oldmem_page, the current check using max_pfn and min_low_pfn to decide if the page is backed or not, is not valid when the memory layout is not continuous. This happens when running as a QEMU/KVM guest, where RTAS is mapped higher in the memory. In that case max_pfn points to the end of RTAS, and a hole between the end of the kdump kernel and RTAS is not backed by PTEs. As a consequence, the kdump kernel is crashing in copy_oldmem_page when accessing in a direct way the pages in that hole. This fix relies on the memblock's service memblock_is_region_memory to check if the read page is part or not of the directly accessible memory. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> CC: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-28 18:06:25 +11:00
Tony Breeds	41dd03a94c	powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly Currently we're storing a host endian RTAS token in rtas_stop_self_args.token. We then pass that directly to rtas. This is fine on big endian however on little endian the token is not what we expect. This will typically result in hitting: panic("Alas, I survived.\n"); To fix this we always use the stop-self token in host order and always convert it to be32 before passing this to rtas. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-02-28 18:06:24 +11:00
Paolo Bonzini	1b385cbdd7	kvm, vmx: Really fix lazy FPU on nested guest Commit `e504c9098e` (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13) highlighted a real problem, but the fix was subtly wrong. nested_read_cr0 is the CR0 as read by L2, but here we want to look at the CR0 value reflecting L1's setup. In other words, L2 might think that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually running it with TS=1, we should inject the fault into L1. The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use it. Fixes: `e504c9098e` Reported-by: Kashyap Chamarty <kchamart@redhat.com> Reported-by: Stefan Bader <stefan.bader@canonical.com> Tested-by: Kashyap Chamarty <kchamart@redhat.com> Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-27 22:54:11 +01:00
Andrew Honig	a08d3b3b99	kvm: x86: fix emulator buffer overflow (CVE-2014-0049) The problem occurs when the guest performs a pusha with the stack address pointing to an mmio address (or an invalid guest physical address) to start with, but then extending into an ordinary guest physical address. When doing repeated emulated pushes emulator_read_write sets mmio_needed to 1 on the first one. On a later push when the stack points to regular memory, mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0. As a result, KVM exits to userspace, and then returns to complete_emulated_mmio. In complete_emulated_mmio vcpu->mmio_cur_fragment is incremented. The termination condition of vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved. The code bounces back and fourth to userspace incrementing mmio_cur_fragment past it's buffer. If the guest does nothing else it eventually leads to a a crash on a memcpy from invalid memory address. However if a guest code can cause the vm to be destroyed in another vcpu with excellent timing, then kvm_clear_async_pf_completion_queue can be used by the guest to control the data that's pointed to by the call to cancel_work_item, which can be used to gain execution. Fixes: `f78146b0f9` Signed-off-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org (3.5+) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-27 19:35:22 +01:00
Marc Zyngier	b20c9f29c5	arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT Commit `1fcf7ce0c6` (arm: kvm: implement CPU PM notifier) added support for CPU power-management, using a cpu_notifier to re-init KVM on a CPU that entered CPU idle. The code assumed that a CPU entering idle would actually be powered off, loosing its state entierely, and would then need to be reinitialized. It turns out that this is not always the case, and some HW performs CPU PM without actually killing the core. In this case, we try to reinitialize KVM while it is still live. It ends up badly, as reported by Andre Przywara (using a Calxeda Midway): [ 3.663897] Kernel panic - not syncing: unexpected prefetch abort in Hyp mode at: 0x685760 [ 3.663897] unexpected data abort in Hyp mode at: 0xc067d150 [ 3.663897] unexpected HVC/SVC trap in Hyp mode at: 0xc0901dd0 The trick here is to detect if we've been through a full re-init or not by looking at HVBAR (VBAR_EL2 on arm64). This involves implementing the backend for __hyp_get_vectors in the main KVM HYP code (rather small), and checking the return value against the default one when the CPU notifier is called on CPU_PM_EXIT. Reported-by: Andre Przywara <osp@andrep.de> Tested-by: Andre Przywara <osp@andrep.de> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Rob Herring <rob.herring@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-27 19:27:10 +01:00
Peter Zijlstra	26e61e8939	perf/x86: Fix event scheduling Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures, with perf WARN_ON()s triggering. He also provided traces of the failures. This is I think the relevant bit: > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_disable: x86_pmu_disable > pec_1076_warn-2804 [000] d... 147.926153: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926156: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926158: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926159: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1 > pec_1076_warn-2804 [000] d... 147.926161: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926162: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926163: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926166: collect_events: Adding event: 1 (ffff880119ec8800) So we add the insn:p event (fd[23]). At this point we should have: n_events = 2, n_added = 1, n_txn = 1 > pec_1076_warn-2804 [000] d... 147.926170: collect_events: Adding event: 0 (ffff8800c9e01800) > pec_1076_warn-2804 [000] d... 147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00) We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]). These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so that's not visible. group_sched_in() pmu->start_txn() /* nop - BP pmu */ event_sched_in() event->pmu->add() So here we should end up with: 0: n_events = 3, n_added = 2, n_txn = 2 4: n_events = 4, n_added = 3, n_txn = 3 But seeing the below state on x86_pmu_enable(), the must have failed, because the 0 and 4 events aren't there anymore. Looking at group_sched_in(), since the BP is the leader, its event_sched_in() must have succeeded, for otherwise we would not have seen the sibling adds. But since neither 0 or 4 are in the below state; their event_sched_in() must have failed; but I don't see why, the complete state: 0,0,1:p,4 fits perfectly fine on a core2. However, since we try and schedule 4 it means the 0 event must have succeeded! Therefore the 4 event must have failed, its failure will have put group_sched_in() into the fail path, which will call: event_sched_out() event->pmu->del() on 0 and the BP event. Now x86_pmu_del() will reduce n_events; but it will not reduce n_added; giving what we see below: n_event = 2, n_added = 2, n_txn = 2 > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_enable: x86_pmu_enable > pec_1076_warn-2804 [000] d... 147.926177: x86_pmu_state: Events: { > pec_1076_warn-2804 [000] d... 147.926179: x86_pmu_state: 0: state: .R config: ffffffffffffffff ( (null)) > pec_1076_warn-2804 [000] d... 147.926181: x86_pmu_state: 33: state: AR config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926182: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2 > pec_1076_warn-2804 [000] d... 147.926184: x86_pmu_state: Assignment: { > pec_1076_warn-2804 [000] d... 147.926186: x86_pmu_state: 0->33 tag: 1 config: 0 (ffff88011ac99800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: 1->0 tag: 1 config: 1 (ffff880119ec8800) > pec_1076_warn-2804 [000] d... 147.926188: x86_pmu_state: } > pec_1076_warn-2804 [000] d... 147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0 So the problem is that x86_pmu_del(), when called from a group_sched_in() that fails (for whatever reason), and without x86_pmu TXN support (because the leader is !x86_pmu), will corrupt the n_added state. Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Stephane Eranian <eranian@google.com> Cc: Dave Jones <davej@redhat.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-02-27 12:38:02 +01:00
Marcelo Tosatti	404381c583	KVM: MMU: drop read-only large sptes when creating lower level sptes Read-only large sptes can be created due to read-only faults as follows: - QEMU pagetable entry that maps guest memory is read-only due to COW. - Guest read faults such memory, COW is not broken, because it is a read-only fault. - Enable dirty logging, large spte not nuked because it is read-only. - Write-fault on such memory causes guest to loop endlessly (which must go down to level 1 because dirty logging is enabled). Fix by dropping large spte when necessary. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2014-02-26 17:23:32 +01:00
Kees Cook	e290e8c59d	x86, kaslr: add missed "static" declarations This silences build warnings about unexported variables and functions. Signed-off-by: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20140209215644.GA30339@www.outflux.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-02-25 16:59:29 -08:00
Eugene Surovegin	b6085a8657	x86, kaslr: export offset in VMCOREINFO ELF notes Include kASLR offset in VMCOREINFO ELF notes to assist in debugging. [ hpa: pushing this for v3.14 to avoid having a kernel version with kASLR where we can't debug output. ] Signed-off-by: Eugene Surovegin <surovegin@google.com> Link: http://lkml.kernel.org/r/20140123173120.GA25474@www.outflux.net Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-02-25 16:57:47 -08:00

1 2 3 4 5 ...

92605 Commits