linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-28 11:18:45 +07:00

Author	SHA1	Message	Date
Philipp Rudo	8da0b72495	kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load The current code uses the sh_offset field in purgatory_info->sechdrs to store a pointer to the current load address of the section. Depending whether the section will be loaded or not this is either a pointer into purgatory_info->purgatory_buf or kexec_purgatory. This is not only a violation of the ELF standard but also makes the code very hard to understand as you cannot tell if the memory you are using is read-only or not. Remove this misuse and store the offset of the section in pugaroty_info->purgatory_buf in sh_offset. Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-13 17:10:28 -07:00
Philipp Rudo	8aec395b84	kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations* When the relocations are applied to the purgatory only the section the relocations are applied to is writable. The other sections, i.e. the symtab and .rel/.rela, are in read-only kexec_purgatory. Highlight this by marking the corresponding variables as 'const'. While at it also change the signatures of arch_kexec_apply_relocations* to take section pointers instead of just the index of the relocation section. This removes the second lookup and sanity check of the sections in arch code. Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.com Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: Dave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-13 17:10:28 -07:00
AKASHI Takahiro	babac4a84a	kexec_file, x86: move re-factored code to generic side In the previous patches, commonly-used routines, exclude_mem_range() and prepare_elf64_headers(), were carved out. Now place them in kexec common code. A prefix "crash_" is given to each of their names to avoid possible name collisions. Link: http://lkml.kernel.org/r/20180306102303.9063-8-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-13 17:10:27 -07:00
AKASHI Takahiro	eb7dae947e	x86: kexec_file: clean up prepare_elf64_headers() Removing bufp variable in prepare_elf64_headers() makes the code simpler and more understandable. Link: http://lkml.kernel.org/r/20180306102303.9063-7-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-13 17:10:27 -07:00
AKASHI Takahiro	8d5f894a31	x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer While CRASH_MAX_RANGES (== 16) seems to be good enough, fixed-number array is not a good idea in general. In this patch, size of crash_mem buffer is calculated as before and the buffer is now dynamically allocated. This change also allows removing crash_elf_data structure. Link: http://lkml.kernel.org/r/20180306102303.9063-6-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-13 17:10:27 -07:00
AKASHI Takahiro	c72c7e6709	x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers() The code guarded by CONFIG_X86_64 is necessary on some architectures which have a dedicated kernel mapping outside of linear memory mapping. (arm64 is among those.) In this patch, an additional argument, kernel_map, is added to enable/ disable the code removing #ifdef. Link: http://lkml.kernel.org/r/20180306102303.9063-5-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-13 17:10:27 -07:00
AKASHI Takahiro	cbe6601617	x86: kexec_file: purge system-ram walking from prepare_elf64_headers() While prepare_elf64_headers() in x86 looks pretty generic for other architectures' use, it contains some code which tries to list crash memory regions by walking through system resources, which is not always architecture agnostic. To make this function more generic, the related code should be purged. In this patch, prepare_elf64_headers() simply scans crash_mem buffer passed and add all the listed regions to elf header as a PT_LOAD segment. So walk_system_ram_res(prepare_elf64_headers_callback) have been moved forward before prepare_elf64_headers() where the callback, prepare_elf64_headers_callback(), is now responsible for filling up crash_mem buffer. Meanwhile exclude_elf_header_ranges() used to be called every time in this callback it is rather redundant and now called only once in prepare_elf_headers() as well. Link: http://lkml.kernel.org/r/20180306102303.9063-4-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-13 17:10:27 -07:00
AKASHI Takahiro	9ec4ecef0a	kexec_file,x86,powerpc: factor out kexec_file_ops functions As arch_kexec_kernel_image_{probe,load}(), arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig() are almost duplicated among architectures, they can be commonalized with an architecture-defined kexec_file_ops array. So let's factor them out. Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-13 17:10:27 -07:00
AKASHI Takahiro	b799a09f63	kexec_file: make use of purgatory optional Patch series "kexec_file, x86, powerpc: refactoring for other architecutres", v2. This is a preparatory patchset for adding kexec_file support on arm64. It was originally included in a arm64 patch set[1], but Philipp is also working on their kexec_file support on s390[2] and some changes are now conflicting. So these common parts were extracted and put into a separate patch set for better integration. What's more, my original patch#4 was split into a few small chunks for easier review after Dave's comment. As such, the resulting code is basically identical with my original, and the only visible differences are: - renaming of _kexec_kernel_image_probe() and _kimage_file_post_load_cleanup() - change one of types of arguments at prepare_elf64_headers() Those, unfortunately, require a couple of trivial changes on the rest (#1, #6 to #13) of my arm64 kexec_file patch set[1]. Patch #1 allows making a use of purgatory optional, particularly useful for arm64. Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load, verify_sig}() and arch_kimage_file_post_load_cleanup() across architectures. Patches #3-#7 are also intended to generalize parse_elf64_headers(), along with exclude_mem_range(), to be made best re-use of. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html [2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html This patch (of 7): On arm64, crash dump kernel's usable memory is protected by unmapping it from kernel virtual space unlike other architectures where the region is just made read-only. It is highly unlikely that the region is accidentally corrupted and this observation rationalizes that digest check code can also be dropped from purgatory. The resulting code is so simple as it doesn't require a bit ugly re-linking/relocation stuff, i.e. arch_kexec_apply_relocations_add(). Please see: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html All that the purgatory does is to shuffle arguments and jump into a new kernel, while we still need to have some space for a hash value (purgatory_sha256_digest) which is never checked against. As such, it doesn't make sense to have trampline code between old kernel and new kernel on arm64. This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and allows related code to be compiled in only if necessary. [takahiro.akashi@linaro.org: fix trivial screwup] Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: Dave Young <dyoung@redhat.com> Tested-by: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-13 17:10:27 -07:00
Linus Torvalds	681857ef0d	Merge branch 'parisc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - fix panic when halting system via "shutdown -h now" - drop own coding in favour of generic CONFIG_COMPAT_BINFMT_ELF implementation - add FPE_CONDTRAP constant: last outstanding parisc-specific cleanup for Eric Biedermans siginfo patches - move some functions to .init and some to .text.hot linker sections * 'parisc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Prevent panic at system halt parisc: Switch to generic COMPAT_BINFMT_ELF parisc: Move cache flush functions into .text.hot section parisc/signal: Add FPE_CONDTRAP for conditional trap handling	2018-04-12 17:07:04 -07:00
Linus Torvalds	67a7a8fff8	xen: fixes for 4.17-rc1 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEhRJncuj2BJSl0Jf3sN6d1ii/Ey8FAlrPnM8ACgkQsN6d1ii/ Ey9Kzwf/eQVb6zzn7FDHAb6pLaZ5i2xi2xohsKmhAVQIEa94rZ3mLoRegtnIfyjO RcjjSAzHSZO9NQgNA2ALdu6bBdzu4/ywQEQCnY2Gqxp0ocG/+k3p/FqLHZGdcqPo e3gpcVxHSFWUCCGm1t3umI25driqrUq4xa6UFi2IB4djDvTrK/JsSygKx6GiVujL 2eV7v7rgqaaVZQyo8iOd+LlWuKZewKLfnALUDC21X5J2HmvfoyTdn85kldzbiIsG YR7mcfgAtAVTyCfgXI3eqAGpRFEyqR4ga87oahdV3/iW+4wreh4hm2Xd/IETXklv Epxyet8IlMB9886PuZhZqgnW6o1RDA== =z3bP -----END PGP SIGNATURE----- Merge tag 'for-linus-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "A few fixes of Xen related core code and drivers" * tag 'for-linus-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvh: Indicate XENFEAT_linux_rsdp_unrestricted to Xen xen/acpi: off by one in read_acpi_id() xen/acpi: upload _PSD info for non Dom0 CPUs too x86/xen: Delay get_cpu_cap until stack canary is established xen: xenbus_dev_frontend: Verify body of XS_TRANSACTION_END xen: xenbus: Catch closing of non existent transactions xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling	2018-04-12 11:04:35 -07:00
Krish Sadhukhan	f0f4cf5b30	x86: Add check for APIC access address for vmentry of L2 guests According to the sub-section titled 'VM-Execution Control Fields' in the section titled 'Basic VM-Entry Checks' in Intel SDM vol. 3C, the following vmentry check must be enforced: If the 'virtualize APIC-accesses' VM-execution control is 1, the APIC-access address must satisfy the following checks: - Bits 11:0 of the address must be 0. - The address should not set any bits beyond the processor's physical-address width. This patch adds the necessary check to conform to this rule. If the check fails, we cause the L2 VMENTRY to fail which is what the associated unit test (following patch) expects. Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-12 18:36:28 +02:00
Arnaldo Carvalho de Melo	fd97d39b0a	Revert "x86/asm: Allow again using asm.h when building for the 'bpf' clang target" This reverts commit `ca26cffa4e`. Newer clang versions accept that asm(_ASM_SP) construct, and now that the bpf-script-test-kbuild.c script, used in one of the 'perf test LLVM' subtests doesn't include ptrace.h, which ended up including arch/x86/include/asm/asm.h, we can revert this patch. Suggested-by: Yonghong Song <yhs@fb.com> Link: https://lkml.kernel.org/r/613f0a0d-c433-8f4d-dcc1-c9889deae39e@fb.com Acked-by: Yonghong Song <yhs@fb.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Ahern <dsahern@gmail.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wang Nan <wangnan0@huawei.com> Link: https://lkml.kernel.org/n/tip-nqozcv8loq40tkqpfw997993@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2018-04-12 10:33:27 -03:00
Ingo Molnar	ef389b7346	Merge branch 'WIP.x86/asm' into x86/urgent, because the topic is ready Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-12 09:42:34 +02:00
Joerg Roedel	e3e2881214	x86/pgtable: Don't set huge PUD/PMD on non-leaf entries The pmd_set_huge() and pud_set_huge() functions are used from the generic ioremap() code to establish large mappings where this is possible. But the generic ioremap() code does not check whether the PMD/PUD entries are already populated with a non-leaf entry, so that any page-table pages these entries point to will be lost. Further, on x86-32 with SHARED_KERNEL_PMD=0, this causes a BUG_ON() in vmalloc_sync_one() when PMD entries are synced from swapper_pg_dir to the current page-table. This happens because the PMD entry from swapper_pg_dir was promoted to a huge-page entry while the current PGD still contains the non-leaf entry. Because both entries are present and point to a different page, the BUG_ON() triggers. This was actually triggered with pti-x32 enabled in a KVM virtual machine by the graphics driver. A real and better fix for that would be to improve the page-table handling in the generic ioremap() code. But that is out-of-scope for this patch-set and left for later work. Reported-by: David H. Gutteridge <dhgutteridge@sympatico.ca> Signed-off-by: Joerg Roedel <jroedel@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <llong@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180411152437.GC15462@8bytes.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-12 09:41:41 +02:00
Dave Hansen	8c06c7740d	x86/pti: Leave kernel text global for !PCID Global pages are bad for hardening because they potentially let an exploit read the kernel image via a Meltdown-style attack which makes it easier to find gadgets. But, global pages are good for performance because they reduce TLB misses when making user/kernel transitions, especially when PCIDs are not available, such as on older hardware, or where a hypervisor has disabled them for some reason. This patch implements a basic, sane policy: If you have PCIDs, you only map a minimal amount of kernel text global. If you do not have PCIDs, you map all kernel text global. This policy effectively makes PCIDs something that not only adds performance but a little bit of hardening as well. I ran a simple "lseek" microbenchmark[1] to test the benefit on a modern Atom microserver. Most of the benefit comes from applying the series before this patch ("entry only"), but there is still a signifiant benefit from this patch. No Global Lines (baseline ): 6077741 lseeks/sec 88 Global Lines (entry only): 7528609 lseeks/sec (+23.9%) 94 Global Lines (this patch): 8433111 lseeks/sec (+38.8%) [1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205518.E3D989EB@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-12 09:06:00 +02:00
Dave Hansen	39114b7a74	x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image Summary: In current kernels, with PTI enabled, no pages are marked Global. This potentially increases TLB misses. But, the mechanism by which the Global bit is set and cleared is rather haphazard. This patch makes the process more explicit. In the end, it leaves us with Global entries in the page tables for the areas truly shared by userspace and kernel and increases TLB hit rates. The place this patch really shines in on systems without PCIDs. In this case, we are using an lseek microbenchmark[1] to see how a reasonably non-trivial syscall behaves. Higher is better: No Global pages (baseline): 6077741 lseeks/sec 88 Global Pages (this set): 7528609 lseeks/sec (+23.9%) On a modern Skylake desktop with PCIDs, the benefits are tangible, but not huge for a kernel compile (lower is better): No Global pages (baseline): 186.951 seconds time elapsed ( +- 0.35% ) 28 Global pages (this set): 185.756 seconds time elapsed ( +- 0.09% ) -1.195 seconds (-0.64%) I also re-checked everything using the lseek1 test[1]: No Global pages (baseline): 15783951 lseeks/sec 28 Global pages (this set): 16054688 lseeks/sec +270737 lseeks/sec (+1.71%) The effect is more visible, but still modest. Details: The kernel page tables are inherited from head_64.S which rudely marks them as _PAGE_GLOBAL. For PTI, we have been relying on the grace of $DEITY and some insane behavior in pageattr.c to clear _PAGE_GLOBAL. This patch tries to do better. First, stop filtering out "unsupported" bits from being cleared in the pageattr code. It's fine to filter out setting these bits but it is insane to keep us from clearing them. Then, explicitly go clear _PAGE_GLOBAL from the kernel identity map. Do not rely on pageattr to do it magically. After this patch, we can see that "GLB" shows up in each copy of the page tables, that we have the same number of global entries in each and that they are the same entries. /sys/kernel/debug/page_tables/current_kernel:11 /sys/kernel/debug/page_tables/current_user:11 /sys/kernel/debug/page_tables/kernel:11 9caae8ad6a1fb53aca2407ec037f612d current_kernel.GLB 9caae8ad6a1fb53aca2407ec037f612d current_user.GLB 9caae8ad6a1fb53aca2407ec037f612d kernel.GLB A quick visual audit also shows that all the entries make sense. 0xfffffe0000000000 is the cpu_entry_area and 0xffffffff81c00000 is the entry/exit text: 0xfffffe0000000000-0xfffffe0000002000 8K ro GLB NX pte 0xfffffe0000002000-0xfffffe0000003000 4K RW GLB NX pte 0xfffffe0000003000-0xfffffe0000006000 12K ro GLB NX pte 0xfffffe0000006000-0xfffffe0000007000 4K ro GLB x pte 0xfffffe0000007000-0xfffffe000000d000 24K RW GLB NX pte 0xfffffe000002d000-0xfffffe000002e000 4K ro GLB NX pte 0xfffffe000002e000-0xfffffe000002f000 4K RW GLB NX pte 0xfffffe000002f000-0xfffffe0000032000 12K ro GLB NX pte 0xfffffe0000032000-0xfffffe0000033000 4K ro GLB x pte 0xfffffe0000033000-0xfffffe0000039000 24K RW GLB NX pte 0xffffffff81c00000-0xffffffff81e00000 2M ro PSE GLB x pmd [1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205517.C80FBE05@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-12 09:06:00 +02:00
Dave Hansen	0f561fce4d	x86/pti: Enable global pages for shared areas The entry/exit text and cpu_entry_area are mapped into userspace and the kernel. But, they are not _PAGE_GLOBAL. This creates unnecessary TLB misses. Add the _PAGE_GLOBAL flag for these areas. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205515.2977EE7D@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-12 09:05:59 +02:00
Dave Hansen	639d6aafe4	x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init __ro_after_init data gets stuck in the .rodata section. That's normally fine because the kernel itself manages the R/W properties. But, if we run __change_page_attr() on an area which is __ro_after_init, the .rodata checks will trigger and force the area to be immediately read-only, even if it is early-ish in boot. This caused problems when trying to clear the _PAGE_GLOBAL bit for these area in the PTI code: it cleared _PAGE_GLOBAL like I asked, but also took it up on itself to clear _PAGE_RW. The kernel then oopses the next time it wrote to a __ro_after_init data structure. To fix this, add the kernel_set_to_readonly check, just like we have for kernel text, just a few lines below in this function. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205514.8D898241@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-12 09:05:59 +02:00
Dave Hansen	430d4005b8	x86/mm: Comment _PAGE_GLOBAL mystery I was mystified as to where the _PAGE_GLOBAL in the kernel page tables for kernel text came from. I audited all the places I could find, but I missed one: head_64.S. The page tables that we create in here live for a long time, and they also have _PAGE_GLOBAL set, despite whether the processor supports it or not. It's harmless, and we got lucky that the pageattr code accidentally clears it when we wipe it out of __supported_pte_mask and then later try to mark kernel text read-only. Comment some of these properties to make it easier to find and understand in the future. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205513.079BB265@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-12 09:05:58 +02:00
Dave Hansen	1a54420aeb	x86/mm: Remove extra filtering in pageattr code The pageattr code has a mode where it can set or clear PTE bits in existing PTEs, so the page protections of the new PTEs come from one of two places: 1. The set/clear masks: cpa->mask_clr / cpa->mask_set 2. The existing PTE We filter ->mask_set/clr for supported PTE bits at entry to __change_page_attr() so we never need to filter them again. The only other place permissions can come from is an existing PTE and those already presumably have good bits. We do not need to filter them again. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205511.BC072352@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-12 09:05:58 +02:00
Dave Hansen	fb43d6cb91	x86/mm: Do not auto-massage page protections A PTE is constructed from a physical address and a pgprotval_t. __PAGE_KERNEL, for instance, is a pgprot_t and must be converted into a pgprotval_t before it can be used to create a PTE. This is done implicitly within functions like pfn_pte() by massage_pgprot(). However, this makes it very challenging to set bits (and keep them set) if your bit is being filtered out by massage_pgprot(). This moves the bit filtering out of pfn_pte() and friends. For users of PAGE_KERNEL, filtering will be done automatically inside those macros but for users of __PAGE_KERNEL, they need to do their own filtering now. Note that we also just move pfn_pte/pmd/pud() over to check_pgprot() instead of massage_pgprot(). This way, we still look for unsupported bits and properly warn about them if we find them. This might happen if an unfiltered __PAGE_KERNEL* value was passed in, for instance. - printk format warning fix from: Arnd Bergmann <arnd@arndb.de> - boot crash fix from: Tom Lendacky <thomas.lendacky@amd.com> - crash bisected by: Mike Galbraith <efault@gmx.de> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de> Fixed-by: Tom Lendacky <thomas.lendacky@amd.com> Bisected-by: Mike Galbraith <efault@gmx.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-12 09:04:22 +02:00
Linus Torvalds	1fe43114ea	More power management updates for 4.17-rc1 - Rework the idle loop in order to prevent CPUs from spending too much time in shallow idle states by making it stop the scheduler tick before putting the CPU into an idle state only if the idle duration predicted by the idle governor is long enough. That required the code to be reordered to invoke the idle governor before stopping the tick, among other things (Rafael Wysocki, Frederic Weisbecker, Arnd Bergmann). - Add the missing description of the residency sysfs attribute to the cpuidle documentation (Prashanth Prakash). - Finalize the cpufreq cleanup moving frequency table validation from drivers to the core (Viresh Kumar). - Fix a clock leak regression in the armada-37xx cpufreq driver (Gregory Clement). - Fix the initialization of the CPU performance data structures for shared policies in the CPPC cpufreq driver (Shunyong Yang). - Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers a bit (Viresh Kumar, Rafael Wysocki). - Mark the expected switch fall-throughs in the PM QoS core (Gustavo Silva). -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJazfv7AAoJEILEb/54YlRx/kYP+gPOX5O5cFF22Y2xvDHPMWjm D/3Nc2aRo+5DuHHECSIJ3ZVQzVoamN5zQ1KbsBRV0bJgwim4fw4M199Jr/0I2nES 1pkByuxLrAtwb83uX3uBIQnwgKOAwRftOTeVaFaMoXgIbyUqK7ZFkGq0xQTnKqor 6+J+78O7wMaIZ0YXQP98BC6g96vs/f+ICrh7qqY85r4NtO/thTA1IKevBmlFeIWR yVhEYgwSFBaWehKK8KgbshmBBEk3qzDOYfwZF/JprPhiN/6madgHgYjHC8Seok5c QUUTRlyO1ULTQe4JulyJUKobx7HE9u/FXC0RjbBiKPnYR4tb9Hd8OpajPRZo96AT 8IQCdzL2Iw/ZyQsmQZsWeO1HwPTwVlF/TO2gf6VdQtH221izuHG025p8/RcZe6zb fTTFhh6/tmBvmOlbKMwxaLbGbwcj/5W5GvQXlXAtaElLobwwNEcEyVfF4jo4Zx/U DQc7agaAps67lcgFAqNDy0PoU6bxV7yoiAIlTJHO9uyPkDNyIfb0ZPlmdIi3xYZd tUD7C+VBezrNCkw7JWL1xXLFfJ5X7K6x5bi9I7TBj1l928Hak0dwzs7KlcNBtF1Y SwnJsNa3kxunGsPajya8dy5gdO0aFeB9Bse0G429+ugk2IJO/Q9M9nQUArJiC9Xl Gw1bw5Ynv6lx+r5EqxHa =Pnk4 -----END PGP SIGNATURE----- Merge tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These include one big-ticket item which is the rework of the idle loop in order to prevent CPUs from spending too much time in shallow idle states. It reduces idle power on some systems by 10% or more and may improve performance of workloads in which the idle loop overhead matters. This has been in the works for several weeks and it has been tested and reviewed quite thoroughly. Also included are changes that finalize the cpufreq cleanup moving frequency table validation from drivers to the core, a few fixes and cleanups of cpufreq drivers, a cpuidle documentation update and a PM QoS core update to mark the expected switch fall-throughs in it. Specifics: - Rework the idle loop in order to prevent CPUs from spending too much time in shallow idle states by making it stop the scheduler tick before putting the CPU into an idle state only if the idle duration predicted by the idle governor is long enough. That required the code to be reordered to invoke the idle governor before stopping the tick, among other things (Rafael Wysocki, Frederic Weisbecker, Arnd Bergmann). - Add the missing description of the residency sysfs attribute to the cpuidle documentation (Prashanth Prakash). - Finalize the cpufreq cleanup moving frequency table validation from drivers to the core (Viresh Kumar). - Fix a clock leak regression in the armada-37xx cpufreq driver (Gregory Clement). - Fix the initialization of the CPU performance data structures for shared policies in the CPPC cpufreq driver (Shunyong Yang). - Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers a bit (Viresh Kumar, Rafael Wysocki). - Mark the expected switch fall-throughs in the PM QoS core (Gustavo Silva)" * tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits) tick-sched: avoid a maybe-uninitialized warning cpufreq: Drop cpufreq_table_validate_and_show() cpufreq: SCMI: Don't validate the frequency table twice cpufreq: CPPC: Initialize shared perf capabilities of CPUs cpufreq: armada-37xx: Fix clock leak cpufreq: CPPC: Don't set transition_latency cpufreq: ti-cpufreq: Use builtin_platform_driver() cpufreq: intel_pstate: Do not include debugfs.h PM / QoS: mark expected switch fall-throughs cpuidle: Add definition of residency to sysfs documentation time: hrtimer: Use timerqueue_iterate_next() to get to the next timer nohz: Avoid duplication of code related to got_idle_tick nohz: Gather tick_sched booleans under a common flag field cpuidle: menu: Avoid selecting shallow states with stopped tick cpuidle: menu: Refine idle state selection for running tick sched: idle: Select idle state before stopping the tick time: hrtimer: Introduce hrtimer_next_event_without() time: tick-sched: Split tick_nohz_stop_sched_tick() cpuidle: Return nohz hint from cpuidle_select() jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC ...	2018-04-11 17:03:20 -07:00
Linus Torvalds	375479c386	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - a new and faster epoll based IRQ controller and NIC driver - misc fixes and janitorial updates * git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: Fix vector raw inintialization logic Migrate vector timers to new timer API um: Compile with modern headers um: vector: Fix an error handling path in 'vector_parse()' um: vector: Fix a memory allocation check um: vector: fix missing unlock on error in vector_net_open() um: Add missing EXPORT for free_irq_by_fd() High Performance UML Vector Network Driver Epoll based IRQ controller um: Use POSIX ucontext_t instead of struct ucontext um: time: Use timespec64 for persistent clock um: Restore symbol versions for __memcpy and memcpy	2018-04-11 16:36:47 -07:00
Pavel Tatashin	6f84f8d158	xen, mm: allow deferred page initialization for xen pv domains Juergen Gross noticed that commit `f7f99100d8` ("mm: stop zeroing memory during allocation in vmemmap") broke XEN PV domains when deferred struct page initialization is enabled. This is because the xen's PagePinned() flag is getting erased from struct pages when they are initialized later in boot. Juergen fixed this problem by disabling deferred pages on xen pv domains. It is desirable, however, to have this feature available as it reduces boot time. This fix re-enables the feature for pv-dmains, and fixes the problem the following way: The fix is to delay setting PagePinned flag until struct pages for all allocated memory are initialized, i.e. until after free_all_bootmem(). A new x86_init.hyper op init_after_bootmem() is called to let xen know that boot allocator is done, and hence struct pages for all the allocated memory are now initialized. If deferred page initialization is enabled, the rest of struct pages are going to be initialized later in boot once page_alloc_init_late() is called. xen_after_bootmem() walks page table's pages and marks them pinned. Link: http://lkml.kernel.org/r/20180226160112.24724-2-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Tested-by: Juergen Gross <jgross@suse.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Alok Kataria <akataria@vmware.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andy Lutomirski <luto@kernel.org> Cc: Laura Abbott <labbott@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Mathias Krause <minipli@googlemail.com> Cc: Jinbum Park <jinb.park7@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: Jia Zhang <zhang.jia@linux.alibaba.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:38 -07:00
Kees Cook	8f2af155b5	exec: pass stack rlimit into mm layout functions Patch series "exec: Pin stack limit during exec". Attempts to solve problems with the stack limit changing during exec continue to be frustrated[1][2]. In addition to the specific issues around the Stack Clash family of flaws, Andy Lutomirski pointed out[3] other places during exec where the stack limit is used and is assumed to be unchanging. Given the many places it gets used and the fact that it can be manipulated/raced via setrlimit() and prlimit(), I think the only way to handle this is to move away from the "current" view of the stack limit and instead attach it to the bprm, and plumb this down into the functions that need to know the stack limits. This series implements the approach. [1] `04e35f4495` ("exec: avoid RLIMIT_STACK races with prlimit()") [2] `779f4e1c6c` ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"") [3] to security@kernel.org, "Subject: existing rlimit races?" This patch (of 3): Since it is possible that the stack rlimit can change externally during exec (either via another thread calling setrlimit() or another process calling prlimit()), provide a way to pass the rlimit down into the per-architecture mm layout functions so that the rlimit can stay in the bprm structure instead of sitting in the signal structure until exec is finalized. Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Willy Tarreau <w@1wt.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Rik van Riel <riel@redhat.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Greg KH <greg@kroah.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-11 10:28:37 -07:00
hu huajun	2698d82e51	KVM: X86: fix incorrect reference of trace_kvm_pi_irte_update In arch/x86/kvm/trace.h, this function is declared as host_irq the first input, and vcpu_id the second, instead of otherwise. Signed-off-by: hu huajun <huhuajun@linux.alibaba.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-11 13:34:48 +02:00
Rafael J. Wysocki	51798deaff	Merge branches 'pm-cpuidle' and 'pm-qos' * pm-cpuidle: tick-sched: avoid a maybe-uninitialized warning cpuidle: Add definition of residency to sysfs documentation time: hrtimer: Use timerqueue_iterate_next() to get to the next timer nohz: Avoid duplication of code related to got_idle_tick nohz: Gather tick_sched booleans under a common flag field cpuidle: menu: Avoid selecting shallow states with stopped tick cpuidle: menu: Refine idle state selection for running tick sched: idle: Select idle state before stopping the tick time: hrtimer: Introduce hrtimer_next_event_without() time: tick-sched: Split tick_nohz_stop_sched_tick() cpuidle: Return nohz hint from cpuidle_select() jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC sched: idle: Do not stop the tick before cpuidle_idle_call() sched: idle: Do not stop the tick upfront in the idle loop time: tick-sched: Reorganize idle tick management code * pm-qos: PM / QoS: mark expected switch fall-throughs	2018-04-11 13:22:46 +02:00
Helge Deller	75abf64287	parisc/signal: Add FPE_CONDTRAP for conditional trap handling Posix and common sense requires that SI_USER not be a signal specific si_code. Thus add a new FPE_CONDTRAP si_code for conditional traps. Signed-off-by: Helge Deller <deller@gmx.de> Cc: Stephen Rothwell <sfr@canb.auug.org.au>	2018-04-11 11:40:35 +02:00
KarimAllah Ahmed	8e9b29b618	X86/KVM: Do not allow DISABLE_EXITS_MWAIT when LAPIC ARAT is not available If the processor does not have an "Always Running APIC Timer" (aka ARAT), we should not give guests direct access to MWAIT. The LAPIC timer would stop ticking in deep C-states, so any host deadlines would not wakeup the host kernel. The host kernel intel_idle driver handles this by switching to broadcast mode when ARAT is not available and MWAIT is issued with a deep C-state that would stop the LAPIC timer. When MWAIT is passed through, we can not tell when MWAIT is issued. So just disable this capability when LAPIC ARAT is not available. I am not even sure if there are any CPUs with VMX support but no LAPIC ARAT or not. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Reported-by: Wanpeng Li <kernellwp@gmail.com> Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-11 11:34:16 +02:00
KarimAllah Ahmed	386c6ddbda	X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted The VMX-preemption timer is used by KVM as a way to set deadlines for the guest (i.e. timer emulation). That was safe till very recently when capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was introduced. According to Intel SDM 25.5.1: """ The VMX-preemption timer operates in the C-states C0, C1, and C2; it also operates in the shutdown and wait-for-SIPI states. If the timer counts down to zero in any state other than the wait-for SIPI state, the logical processor transitions to the C0 C-state and causes a VM exit; the timer does not cause a VM exit if it counts down to zero in the wait-for-SIPI state. The timer is not decremented in C-states deeper than C2. """ Now once the guest issues the MWAIT with a c-state deeper than C2 the preemption timer will never wake it up again since it stopped ticking! Usually this is compensated by other activities in the system that would wake the core from the deep C-state (and cause a VMExit). For example, if the host itself is ticking or it received interrupts, etc! So disable the VMX-preemption timer if MWAIT is exposed to the guest! Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Fixes: `4d5422cea3` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-10 17:19:44 +02:00
Li RongQing	a774635db5	x86/apic: Fix signedness bug in APIC ID validity checks The APIC ID as parsed from ACPI MADT is validity checked with the apic->apic_id_valid() callback, which depends on the selected APIC type. For non X2APIC types APIC IDs >= 0xFF are invalid, but values > 0x7FFFFFFF are detected as valid. This happens because the 'apicid' argument of the apic_id_valid() callback is type 'int'. So the resulting comparison apicid < 0xFF evaluates to true for all unsigned int values > 0x7FFFFFFF which are handed to default_apic_id_valid(). As a consequence, invalid APIC IDs in !X2APIC mode are considered valid and accounted as possible CPUs. Change the apicid argument type of the apic_id_valid() callback to u32 so the evaluation is unsigned and returns the correct result. [ tglx: Massaged changelog ] Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Cc: jgross@suse.com Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/1523322966-10296-1-git-send-email-lirongqing@baidu.com	2018-04-10 16:46:39 +02:00
Kirill A. Shutemov	d94a155c59	x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption Some features (Intel MKTME, AMD SME) reduce the number of effectively available physical address bits. cpuinfo_x86::x86_phys_bits is adjusted accordingly during the early cpu feature detection. Though if get_cpu_cap() is called later again then this adjustement is overwritten. That happens in setup_pku(), which is called after detect_tme(). To address this, extract the address sizes enumeration into a separate function, which is only called only from early_identify_cpu() and from generic_identify(). This makes get_cpu_cap() safe to be called later during boot proccess without overwriting cpuinfo_x86::x86_phys_bits. [ tglx: Massaged changelog ] Fixes: `cb06d8e3d0` ("x86/tme: Detect if TME and MKTME is activated by BIOS") Reported-by: Kai Huang <kai.huang@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: linux-mm@kvack.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180410092704.41106-1-kirill.shutemov@linux.intel.com	2018-04-10 16:33:21 +02:00
Boris Ostrovsky	a5a18ae73b	xen/pvh: Indicate XENFEAT_linux_rsdp_unrestricted to Xen Pre-4.17 kernels ignored start_info's rsdp_paddr pointer and instead relied on finding RSDP in standard location in BIOS RO memory. This has worked since that's where Xen used to place it. However, with recent Xen change (commit 4a5733771e6f ("libxl: put RSDP for PVH guest near 4GB")) it prefers to keep RSDP at a "non-standard" address. Even though as of commit `b17d9d1df3` ("x86/xen: Add pvh specific rsdp address retrieval function") Linux is able to find RSDP, for back-compatibility reasons we need to indicate to Xen that we can handle this, an we do so by setting XENFEAT_linux_rsdp_unrestricted flag in ELF notes. (Also take this opportunity and sync features.h header file with Xen) Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>	2018-04-10 09:22:22 -04:00
Linus Torvalds	d8312a3f61	ARM: - VHE optimizations - EL2 address space randomization - speculative execution mitigations ("variant 3a", aka execution past invalid privilege register access) - bugfixes and cleanups PPC: - improvements for the radix page fault handler for HV KVM on POWER9 s390: - more kvm stat counters - virtio gpu plumbing - documentation - facilities improvements x86: - support for VMware magic I/O port and pseudo-PMCs - AMD pause loop exiting - support for AMD core performance extensions - support for synchronous register access - expose nVMX capabilities to userspace - support for Hyper-V signaling via eventfd - use Enlightened VMCS when running on Hyper-V - allow userspace to disable MWAIT/HLT/PAUSE vmexits - usual roundup of optimizations and nested virtualization bugfixes Generic: - API selftest infrastructure (though the only tests are for x86 as of now) -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853 kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2 UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM= =bPlD -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - VHE optimizations - EL2 address space randomization - speculative execution mitigations ("variant 3a", aka execution past invalid privilege register access) - bugfixes and cleanups PPC: - improvements for the radix page fault handler for HV KVM on POWER9 s390: - more kvm stat counters - virtio gpu plumbing - documentation - facilities improvements x86: - support for VMware magic I/O port and pseudo-PMCs - AMD pause loop exiting - support for AMD core performance extensions - support for synchronous register access - expose nVMX capabilities to userspace - support for Hyper-V signaling via eventfd - use Enlightened VMCS when running on Hyper-V - allow userspace to disable MWAIT/HLT/PAUSE vmexits - usual roundup of optimizations and nested virtualization bugfixes Generic: - API selftest infrastructure (though the only tests are for x86 as of now)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits) kvm: x86: fix a prototype warning kvm: selftests: add sync_regs_test kvm: selftests: add API testing infrastructure kvm: x86: fix a compile warning KVM: X86: Add Force Emulation Prefix for "emulate the next instruction" KVM: X86: Introduce handle_ud() KVM: vmx: unify adjacent #ifdefs x86: kvm: hide the unused 'cpu' variable KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" kvm: Add emulation for movups/movupd KVM: VMX: raise internal error for exception during invalid protected mode state KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending KVM: x86: Fix misleading comments on handling pending exceptions KVM: x86: Rename interrupt.pending to interrupt.injected KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt x86/kvm: use Enlightened VMCS when running on Hyper-V x86/hyper-v: detect nested features x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits ...	2018-04-09 11:42:31 -07:00
Dave Hansen	6baf4bec02	x86/espfix: Document use of _PAGE_GLOBAL The "normal" kernel page table creation mechanisms using PAGE_KERNEL_* page protections will never set _PAGE_GLOBAL with PTI. The few places in the kernel that always want _PAGE_GLOBAL must avoid using PAGE_KERNEL_*. Document that we want it here and its use is not accidental. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205507.BCF4D4F0@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 18:27:33 +02:00
Dave Hansen	8a57f4849f	x86/mm: Introduce "default" kernel PTE mask The __PAGE_KERNEL_* page permissions are "raw". They contain bits that may or may not be supported on the current processor. They need to be filtered by a mask (currently __supported_pte_mask) to turn them into a value that we can actually set in a PTE. These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL. But, with PTI, we want to be able to support _PAGE_GLOBAL (have the bit set in __supported_pte_mask) but not have it appear in any of these masks by default. This patch creates a new mask, __default_kernel_pte_mask, and applies it when creating all of the PAGE_KERNEL_* masks. This makes PAGE_KERNEL_* safe to use anywhere (they only contain supported bits). It also ensures that PAGE_KERNEL_* contains _PAGE_GLOBAL on PTI=n kernels but clears _PAGE_GLOBAL when PTI=y. We also make __default_kernel_pte_mask a non-GPL exported symbol because there are plenty of driver-available interfaces that take PAGE_KERNEL_* permissions. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205506.030DB6B6@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 18:27:32 +02:00
Dave Hansen	606c7193d5	x86/mm: Undo double _PAGE_PSE clearing When clearing _PAGE_PRESENT on a huge page, we need to be careful to also clear _PAGE_PSE, otherwise it might still get confused for a valid large page table entry. We do that near the spot where we set _PAGE_PSE. That's fine, but it's unnecessary. pgprot_large_2_4k() already did it. BTW, I also noticed that pgprot_large_2_4k() and pgprot_4k_2_large() are not symmetric. pgprot_large_2_4k() clears _PAGE_PSE (because it is aliased to _PAGE_PAT) but pgprot_4k_2_large() does not put _PAGE_PSE back. Bummer. Also, add some comments and change "promote" to "move". "Promote" seems an odd word to move when we are logically moving a bit to a lower bit position. Also add an extra line return to make it clear to which line the comment applies. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205504.9B0F44A9@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 18:27:32 +02:00
Dave Hansen	d1440b23c9	x86/mm: Factor out pageattr _PAGE_GLOBAL setting The pageattr code has a pattern repeated where it sets _PAGE_GLOBAL for present PTEs but clears it for non-present PTEs. The intention is to keep _PAGE_GLOBAL from getting confused with _PAGE_PROTNONE since _PAGE_GLOBAL is for present PTEs and _PAGE_PROTNONE is for non-present But, this pattern makes no sense. Effectively, it says, if you use the pageattr code, always set _PAGE_GLOBAL when _PAGE_PRESENT. canon_pgprot() will clear it if unsupported (because it masks the value with __supported_pte_mask) but we always set it. Even if canon_pgprot() did not filter _PAGE_GLOBAL, it would be OK. _PAGE_GLOBAL is ignored when CR4.PGE=0 by the hardware. This unconditional setting of _PAGE_GLOBAL is a problem when we have PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some not. This updated version of the code says: 1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT 2. Never set _PAGE_GLOBAL implicitly 3. Allow _PAGE_GLOBAL to be in cpa.set_mask 4. Allow _PAGE_GLOBAL to be inherited from previous PTE Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205502.86E199DA@viggo.jf.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 18:27:32 +02:00
Ingo Molnar	ee1400dda3	Merge branch 'linus' into x86/pti to pick up upstream changes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 18:24:58 +02:00
Andy Lutomirski	071ccc966b	x86/entry/64: Drop idtentry's manual stack switch for user entries For non-paranoid entries, idtentry knows how to switch from the kernel stack to the user stack, as does error_entry. This results in pointless duplication and code bloat. Make idtentry stop thinking about stacks for non-paranoid entries. This reduces text size by 5377 bytes. This goes back to the following commit: `7f2590a110` ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries") Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/90aab80c1f906e70742eaa4512e3c9b5e62d59d4.1522794757.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 18:23:50 +02:00
Arnd Bergmann	92e830f25f	x86/olpc: Fix inconsistent MFD_CS5535 configuration This Kconfig warning appeared after a fix to the Kconfig validation. The GPIO_CS5535 driver depends on the MFD_CS5535 driver, but the former is selected in places where the latter is not: WARNING: unmet direct dependencies detected for GPIO_CS5535 Depends on [m]: GPIOLIB [=y] && (X86 [=y] \|\| MIPS \|\| COMPILE_TEST [=y]) && MFD_CS5535 [=m] Selected by [y]: - OLPC_XO1_SCI [=y] && X86_32 [=y] && OLPC [=y] && OLPC_XO1_PM [=y] && INPUT [=y]=y The warning does seem appropriate, since the GPIO_CS5535 driver won't work unless MFD_CS5535 is also present. However, there is no link time dependency between the two, so this caused no problems during randconfig testing before. This changes the 'select GPIO_CS5535' to 'depends on GPIO_CS5535' to avoid the issue, at the expense of making it harder to configure the driver (one now has to select the dependencies first). The 'select MFD_CORE' part is completely redundant, since we already depend on MFD_CS5535 here, so I'm removing that as well. Ideally, the private symbols exported by that cs5535 gpio driver would just be converted to gpiolib interfaces so we could expletely avoid this dependency. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-kbuild@vger.kernel.org Fixes: `f622f82795` ("kconfig: warn unmet direct dependency of tristate symbols selected by y") Link: http://lkml.kernel.org/r/20180404124539.3817101-1-arnd@arndb.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 18:22:34 +02:00
Dominik Brodowski	c76fc98260	syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention Make the code in syscall_wrapper.h more readable by naming the stub macros similar to the stub they provide. While at it, fix a stray newline at the end of the __IA32_COMPAT_SYS_STUBx macro. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180409105145.5364-5-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 16:47:28 +02:00
Dominik Brodowski	d5a00528b5	syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_() to __x64_sys_() This rename allows us to have a coherent syscall stub naming convention on 64-bit x86 (0xffffffff prefix removed): 810f0af0 t kernel_waitid # common (32/64) kernel helper <inline> __do_sys_waitid # inlined helper doing actual work 810f0be0 t __se_sys_waitid # C func calling inlined helper <inline> __do_compat_sys_waitid # inlined helper doing actual work 810f0d80 t __se_compat_sys_waitid # compat C func calling inlined helper 810f2080 T __x64_sys_waitid # x64 64-bit-ptregs -> C stub 810f20b0 T __ia32_sys_waitid # ia32 32-bit-ptregs -> C stub[] 810f2470 T __ia32_compat_sys_waitid # ia32 32-bit-ptregs -> compat C stub 810f2490 T __x32_compat_sys_waitid # x32 64-bit-ptregs -> compat C stub [] This stub is unused, as the syscall table links __ia32_compat_sys_waitid instead of __ia32_sys_waitid as we need a compat variant here. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180409105145.5364-4-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 16:47:28 +02:00
Dominik Brodowski	5ac9efa3c5	syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention Tidy the naming convention for compat syscall subs. Hints which describe the purpose of the stub go in front and receive a double underscore to denote that they are generated on-the-fly by the COMPAT_SYSCALL_DEFINEx() macro. For the generic case, this means: t kernel_waitid # common C function (see kernel/exit.c) __do_compat_sys_waitid # inlined helper doing the actual work # (takes original parameters as declared) T __se_compat_sys_waitid # sign-extending C function calling inlined # helper (takes parameters of type long, # casts them to unsigned long and then to # the declared type) T compat_sys_waitid # alias to __se_compat_sys_waitid() # (taking parameters as declared), to # be included in syscall table For x86, the naming is as follows: t kernel_waitid # common C function (see kernel/exit.c) __do_compat_sys_waitid # inlined helper doing the actual work # (takes original parameters as declared) t __se_compat_sys_waitid # sign-extending C function calling inlined # helper (takes parameters of type long, # casts them to unsigned long and then to # the declared type) T __ia32_compat_sys_waitid # IA32_EMULATION 32-bit-ptregs -> C stub, # calls __se_compat_sys_waitid(); to be # included in syscall table T __x32_compat_sys_waitid # x32 64-bit-ptregs -> C stub, calls # __se_compat_sys_waitid(); to be included # in syscall table If only one of IA32_EMULATION and x32 is enabled, __se_compat_sys_waitid() may be inlined into the stub __{ia32,x32}_compat_sys_waitid(). Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180409105145.5364-3-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 16:47:28 +02:00
Dominik Brodowski	e145242ea0	syscalls/core, syscalls/x86: Clean up syscall stub naming convention Tidy the naming convention for compat syscall subs. Hints which describe the purpose of the stub go in front and receive a double underscore to denote that they are generated on-the-fly by the SYSCALL_DEFINEx() macro. For the generic case, this means (0xffffffff prefix removed): 810f08d0 t kernel_waitid # common C function (see kernel/exit.c) <inline> __do_sys_waitid # inlined helper doing the actual work # (takes original parameters as declared) 810f1aa0 T __se_sys_waitid # sign-extending C function calling inlined # helper (takes parameters of type long; # casts them to the declared type) 810f1aa0 T sys_waitid # alias to __se_sys_waitid() (taking # parameters as declared), to be included # in syscall table For x86, the naming is as follows: 810efc70 t kernel_waitid # common C function (see kernel/exit.c) <inline> __do_sys_waitid # inlined helper doing the actual work # (takes original parameters as declared) 810efd60 t __se_sys_waitid # sign-extending C function calling inlined # helper (takes parameters of type long; # casts them to the declared type) 810f1140 T __ia32_sys_waitid # IA32_EMULATION 32-bit-ptregs -> C stub, # calls __se_sys_waitid(); to be included # in syscall table 810f1110 T sys_waitid # x86 64-bit-ptregs -> C stub, calls # __se_sys_waitid(); to be included in # syscall table For x86, sys_waitid() will be re-named to __x64_sys_waitid in a follow-up patch. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180409105145.5364-2-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-09 16:47:27 +02:00
Masahiro Yamada	54a702f705	kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers GNU Make automatically deletes intermediate files that are updated in a chain of pattern rules. Example 1) %.dtb.o <- %.dtb.S <- %.dtb <- %.dts Example 2) %.o <- %.c <- %.c_shipped A couple of makefiles mark such targets as .PRECIOUS to prevent Make from deleting them, but the correct way is to use .SECONDARY. .SECONDARY Prerequisites of this special target are treated as intermediate files but are never automatically deleted. .PRECIOUS When make is interrupted during execution, it may delete the target file it is updating if the file was modified since make started. If you mark the file as precious, make will never delete the file if interrupted. Both can avoid deletion of intermediate files, but the difference is the behavior when Make is interrupted; .SECONDARY deletes the target, but .PRECIOUS does not. The use of .PRECIOUS is relatively rare since we do not want to keep partially constructed (possibly corrupted) targets. Another difference is that .PRECIOUS works with pattern rules whereas .SECONDARY does not. .PRECIOUS: $(obj)/%.lex.c works, but .SECONDARY: $(obj)/%.lex.c has no effect. However, for the reason above, I do not want to use .PRECIOUS which could cause obscure build breakage. The targets specified as .SECONDARY must be explicit. $(targets) contains all targets that need to include ..cmd files. So, the intermediates you want to keep are mostly in there. Therefore, mark $(targets) as .SECONDARY. It means primary targets are also marked as .SECONDARY, but I do not see any drawback for this. I replaced some .SECONDARY / .PRECIOUS markers with 'targets'. This will make Kbuild search for non-existing ..cmd files, but this is not a noticeable performance issue. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Frank Rowand <frowand.list@gmail.com> Acked-by: Ingo Molnar <mingo@kernel.org>	2018-04-07 19:04:02 +09:00
Linus Torvalds	3b54765cca	Merge branch 'akpm' (patches from Andrew) Merge updates from Andrew Morton: - a few misc things - ocfs2 updates - the v9fs maintainers have been missing for a long time. I've taken over v9fs patch slinging. - most of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (116 commits) mm,oom_reaper: check for MMF_OOM_SKIP before complaining mm/ksm: fix interaction with THP mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t headers: untangle kmemleak.h from mm.h include/linux/mmdebug.h: make VM_WARN* non-rvals mm/page_isolation.c: make start_isolate_page_range() fail if already isolated mm: change return type to vm_fault_t mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory kernel/fork.c: detect early free of a live mm mm: make counting of list_lru_one::nr_items lockless mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static block_invalidatepage(): only release page if the full page was invalidated mm: kernel-doc: add missing parameter descriptions mm/swap.c: remove @cold parameter description for release_pages() mm/nommu: remove description of alloc_vm_area zram: drop max_zpage_size and use zs_huge_class_size() zsmalloc: introduce zs_huge_class_size() mm: fix races between swapoff and flush dcache fs/direct-io.c: minor cleanups in do_blockdev_direct_IO ...	2018-04-06 14:19:26 -07:00
Peng Hao	e01bca2fc6	kvm: x86: fix a prototype warning Make the function static to avoid a warning: no previous prototype for ‘vmx_enable_tdp’ Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-06 18:20:31 +02:00
Randy Dunlap	514c603249	headers: untangle kmemleak.h from mm.h Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious reason. It looks like it's only a convenience, so remove kmemleak.h from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that don't already #include it. Also remove <linux/kmemleak.h> from source files that do not use it. This is tested on i386 allmodconfig and x86_64 allmodconfig. It would be good to run it through the 0day bot for other $ARCHes. I have neither the horsepower nor the storage space for the other $ARCHes. Update: This patch has been extensively build-tested by both the 0day bot & kisskb/ozlabs build farms. Both of them reported 2 build failures for which patches are included here (in v2). [ slab.h is the second most used header file after module.h; kernel.h is right there with slab.h. There could be some minor error in the counting due to some #includes having comments after them and I didn't combine all of those. ] [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr] Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org Link: http://kisskb.ellerman.id.au/kisskb/head/13396/ Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Reported-by: Michael Ellerman <mpe@ellerman.id.au> [2 build failures] Reported-by: Fengguang Wu <fengguang.wu@intel.com> [2 build failures] Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Wei Yongjun <weiyongjun1@huawei.com> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-05 21:36:27 -07:00
Pavel Tatashin	078eb6aa50	x86/mm/memory_hotplug: determine block size based on the end of boot memory Memory sections are combined into "memory block" chunks. These chunks are the units upon which memory can be added and removed. On x86, the new memory may be added after the end of the boot memory, therefore, if block size does not align with end of boot memory, memory hot-plugging/hot-removing can be broken. Memory sections are combined into "memory block" chunks. These chunks are the units upon which memory can be added and removed. On x86 the new memory may be added after the end of the boot memory, therefore, if block size does not align with end of boot memory, memory hotplugging/hotremoving can be broken. Currently, whenever machine is booted with more than 64G the block size is unconditionally increased to 2G from the base 128M. This is done in order to reduce number of memory device files in sysfs: /sys/devices/system/memory/memoryXXX We must use the largest allowed block size that aligns to the next address to be able to hotplug the next block of memory. So, when memory is larger or equal to 64G, we check the end address and find the largest block size that is still power of two but smaller or equal to 2G. Before, the fix: Run qemu with: -m 64G,slots=2,maxmem=66G -object memory-backend-ram,id=mem1,size=2G (qemu) device_add pc-dimm,id=dimm1,memdev=mem1 Block size [0x80000000] unaligned hotplug range: start 0x1040000000, size 0x80000000 acpi PNP0C80:00: add_memory failed acpi PNP0C80:00: acpi_memory_enable_device() error acpi PNP0C80:00: Enumeration failure With the fix memory is added successfully as the block size is set to 1G, and therefore aligns with start address 0x1040000000. [pasha.tatashin@oracle.com: v4] Link: http://lkml.kernel.org/r/20180215165920.8570-3-pasha.tatashin@oracle.com Link: http://lkml.kernel.org/r/20180213193159.14606-3-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-04-05 21:36:25 -07:00
Linus Torvalds	672a9c1069	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: kfifo: fix inaccurate comment tools/thermal: tmon: fix for segfault net: Spelling s/stucture/structure/ edd: don't spam log if no EDD information is present Documentation: Fix early-microcode.txt references after file rename tracing: Block comments should align the * on each line treewide: Fix typos in printk GenWQE: Fix a typo in two comments treewide: Align function definition open/close braces	2018-04-05 11:56:35 -07:00
Linus Torvalds	e02d37bf55	sound updates for 4.17-rc1 This became a large update. The changes are scattered widely, and majority of them are attributed to ASoC componentization. The gitk output made me dizzy, but it's slightly better than London tube. OK, below are some highlights: - Continued hardening works in ALSA PCM core; most of the existing syzkaller reports should have been covered. - USB-audio got the initial USB Audio Class 3 support, as well as UAC2 jack detection support and more DSD-device support. - ASoC componentization: finally each individual driver was converted to components framework, which is more future-proof for further works. Most of conversations were systematic. - Lots of fixes for Intel Baytrail / Cherrytrail devices with Realtek codecs, typically tablets and small PCs. - Fixes / cleanups for Samsung Odroid systems - Cleanups in Freescale SSI driver - New ASoC drivers: * AKM AK4458 and AK5558 codecs * A few AMD based machine drivers * Intel Kabylake machine drivers * Maxim MAX9759 codec * Motorola CPCAP codec * Socionext Uniphier SoCs * TI PCM1789 and TDA7419 codecs - Retirement of Blackfin drivers along with architecture removal. -----BEGIN PGP SIGNATURE----- iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlrF2gUOHHRpd2FpQHN1 c2UuZGUACgkQLtJE4w1nLE/ZLhAAvUgpOkpHRmvyXoqhWdG/FWWFWtoFrQaDZE5y NPcGHy/ZLuCXGL3Zpsm9lZqXd1sxRdsxF3hiWT0JqqC7oxs/oSOhSzf7w6P9ppW7 nxZKo4SCSQpmy0Y58QhwpXUkuGzRAOXcug39BNiAqxjtWPPNT8bUj/br3ApH9+90 Dtittl26Z1Eek1KwNJDMdJt8l5P4P5Ls44g/9Xwhgxk/P0nHmErNuUftlNc/65/b HdVgLSXVJbfJ9dLRjQC0yg7jPzSgSp5xssAkWGfPv8AMnM6ql7LWGO+6zOdVcOUo 0ipKJpZHUI/k1Uv4yBxI32GueOl/gH78M3iGv1CVe/jaC8g8XXA5GScnG41U1ZUO p9f78q8jk+O4uCDvbCvigw+iqb7Lm7ME0jNaQ6gZzZX2sDDBUBIYMS6W658pQfgT w00c73gm7J+MPv4FsVyyzZsmqyO/xE/1x9F2eGut67DbCKVcfQnyheYJq3Gt96qo tzvJ+cy3JxCfGn7Ngl2/i8jtHD6sGf1Pl3gOPk5DEN2qfuBy/vQ4W4TlJ1pOqGFG JjpUhEpvYhP/XPrFo970g2yYQq5VsjumQiHGxbD56qu4hrkPU3w92gYKNc0F689j QQRc8gyAvUp78ZletF4WYLf6H1yNmkP3ufhsuP1MQWuxRmTcxVtIRDU1PLAq5J8w 10mGs6s= =F3q1 -----END PGP SIGNATURE----- Merge tag 'sound-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "This became a large update. The changes are scattered widely, and the majority of them are attributed to ASoC componentization. The gitk output made me dizzy, but it's slightly better than London tube. OK, below are some highlights: - Continued hardening works in ALSA PCM core; most of the existing syzkaller reports should have been covered. - USB-audio got the initial USB Audio Class 3 support, as well as UAC2 jack detection support and more DSD-device support. - ASoC componentization: finally each individual driver was converted to components framework, which is more future-proof for further works. Most of conversations were systematic. - Lots of fixes for Intel Baytrail / Cherrytrail devices with Realtek codecs, typically tablets and small PCs. - Fixes / cleanups for Samsung Odroid systems - Cleanups in Freescale SSI driver - New ASoC drivers: * AKM AK4458 and AK5558 codecs * A few AMD based machine drivers * Intel Kabylake machine drivers * Maxim MAX9759 codec * Motorola CPCAP codec * Socionext Uniphier SoCs * TI PCM1789 and TDA7419 codecs - Retirement of Blackfin drivers along with architecture removal" * tag 'sound-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (497 commits) ALSA: pcm: Fix UAF at PCM release via PCM timer access ALSA: usb-audio: silence a static checker warning ASoC: tscs42xx: Remove owner assignment from i2c_driver ASoC: mediatek: remove "simple-mfd" in the example ASoC: cpcap: replace codec to component ASoC: Intel: bytcr_rt5651: don't use codec anymore ASoC: amd: don't use codec anymore ALSA: usb-audio: fix memory leak on cval ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls ASoC: topology: Fix kcontrol name string handling ALSA: aloop: Mark paused device as inactive ALSA: usb-audio: update clock valid control ALSA: usb-audio: UAC2 jack detection ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams ALSA: pcm: Avoid potential races between OSS ioctls and read/write ALSA: usb-audio: Integrate native DSD support for ITF-USB based DACs. ALSA: usb-audio: FIX native DSD support for TEAC UD-501 DAC ALSA: usb-audio: Add native DSD support for Luxman DA-06 ALSA: usb-audio: fix uac control query argument ASoC: nau8824: recover system clock when device changes ...	2018-04-05 10:42:07 -07:00
Rafael J. Wysocki	0e7767687f	time: tick-sched: Reorganize idle tick management code Prepare the scheduler tick code for reworking the idle loop to avoid stopping the tick in some cases. The idea is to split the nohz idle entry call to decouple the idle time stats accounting and preparatory work from the actual tick stop code, in order to later be able to delay the tick stop once we reach more power-knowledgeable callers. Move away the tick_nohz_start_idle() invocation from __tick_nohz_idle_enter(), rename the latter to __tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick() as a wrapper around it for calling it from the outside. Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead of calling the entire __tick_nohz_idle_enter(), add another wrapper disabling and enabling interrupts around tick_nohz_idle_stop_tick() and make the current callers of tick_nohz_idle_enter() call it too to retain their current functionality. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2018-04-05 18:58:47 +02:00
Linus Torvalds	1b2951dd99	This is the bulk of GPIO changes for the v4.17 kernel cycle: New drivers: - Nintendo Wii GameCube GPIO, known as "Hollywood" - Raspberry Pi mailbox service GPIO expander - Spreadtrum main SC9860 SoC and IEC GPIO controllers. Improvements: - Implemented .get_multiple() callback for most of the high-performance industrial GPIO cards for the ISA bus. - ISA GPIO drivers now select the ISA_BUS_API instead of depending on it. This is merged with the same pattern for all the ISA drivers and some other Kconfig cleanups related to this. Cleanup: - Delete the TZ1090 GPIO drivers following the deletion of this SoC from the ARM tree. - Move the documentation over to driver-api to conform with the rest of the kernel documentation build. - Continue to make the GPIO drivers include only <linux/gpio/driver.h> and not the too broad <linux/gpio.h> that we want to get rid of. - Managed to remove VLA allocation from two drivers pending more fixes in this area for the next merge window. - Misc janitorial fixes. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaxIehAAoJEEEQszewGV1zlEAP/3p3E6J8vPqJNV/C39c40krC ajo0ndiTC7cotmCXNQOl9xfMCTgkjBtx3WEKwTDfCsuWW+2YB0DRmMd0Bkf2RWjQ nM4rB64FzAu+rdD9jdGtfn24ofylSRFaHNQ/V8Prc2JVAXJt4DS97h+6kwzIAqCm A/xXQAx67k5qoTXLvR2n/8LX8TphSe2kwH/f/3/lJpNLfLCRRJ3GqJfpa72jw2eL 4VIPc6KmttkqzJ1GFtzLPfhkhRr0p4sSzUNydlj5BKhmOSVu6Afv5ylgpK/p38dQ mGvNqFnU0lpwelsoZK75YikDFbqQjn4XkXJGvmIRMw4qM7crcw5oSkeMwCrcGqJW 7Uo7NoQU94wcQSZTppFQdaJs7NHdcnpW7jcfRYYetZL/6eDGBtfxoym90Lyjvaqs y+ykofbadI0X/9omO5j+qozvIneLam/CF7iDRUb/5t1LJbNwtXUsVYhz3FuwPDt1 ZHb6w+np9ZHN6H9jz3b/F9B/uQt54pshm7NorSXrJvZfKrv8kV14MoHgYsuQDDjV khbveygB8DwaPeV4XjpLeYhJB1L/Wjf46CVD6tyaCRDByGQmdoJEQF9QB2CxrF2J ouaaaS8tSC0IK/mKMMgJxC1Vr2gh0NMlQ3AL9EJDJvX+9RoIA2gwtBAiGnlEcdq3 GyFAZ0szb5P4BaNnX9qc =C5t5 -----END PGP SIGNATURE----- Merge tag 'gpio-v4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO updates from Linus Walleij: "This is the bulk of GPIO changes for the v4.17 kernel cycle: New drivers: - Nintendo Wii GameCube GPIO, known as "Hollywood" - Raspberry Pi mailbox service GPIO expander - Spreadtrum main SC9860 SoC and IEC GPIO controllers. Improvements: - Implemented .get_multiple() callback for most of the high-performance industrial GPIO cards for the ISA bus. - ISA GPIO drivers now select the ISA_BUS_API instead of depending on it. This is merged with the same pattern for all the ISA drivers and some other Kconfig cleanups related to this. Cleanup: - Delete the TZ1090 GPIO drivers following the deletion of this SoC from the ARM tree. - Move the documentation over to driver-api to conform with the rest of the kernel documentation build. - Continue to make the GPIO drivers include only <linux/gpio/driver.h> and not the too broad <linux/gpio.h> that we want to get rid of. - Managed to remove VLA allocation from two drivers pending more fixes in this area for the next merge window. - Misc janitorial fixes" * tag 'gpio-v4.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (77 commits) gpio: Add Spreadtrum PMIC EIC driver support gpio: Add Spreadtrum EIC driver support dt-bindings: gpio: Add Spreadtrum EIC controller documentation gpio: ath79: Fix potential NULL dereference in ath79_gpio_probe() pinctrl: qcom: Don't allow protected pins to be requested gpiolib: Support 'gpio-reserved-ranges' property gpiolib: Change bitmap allocation to kmalloc_array gpiolib: Extract mask allocation into subroutine dt-bindings: gpio: Add a gpio-reserved-ranges property gpio: mockup: fix a potential crash when creating debugfs entries gpio: pca953x: add compatibility for pcal6524 and pcal9555a gpio: dwapb: Add support for a bus clock gpio: Remove VLA from xra1403 driver gpio: Remove VLA from MAX3191X driver gpio: ws16c48: Implement get_multiple callback gpio: gpio-mm: Implement get_multiple callback gpio: 104-idi-48: Implement get_multiple callback gpio: 104-dio-48e: Implement get_multiple callback gpio: pcie-idio-24: Implement get_multiple/set_multiple callbacks gpio: pci-idio-16: Implement get_multiple callback ...	2018-04-05 09:51:41 -07:00
Dominik Brodowski	6dc936f175	syscalls/x86: Extend register clearing on syscall entry to lower registers To reduce the chance that random user space content leaks down the call chain in registers, also clear lower registers on syscall entry: For 64-bit syscalls, extend the register clearing in PUSH_AND_CLEAR_REGS to %dx and %cx. This should not hurt at all, also on the other callers of that macro. We do not need to clear %rdi and %rsi for syscall entry, as those registers are used to pass the parameters to do_syscall_64(). For the 32-bit compat syscalls, do_int80_syscall_32() and do_fast_syscall_32() each only take one parameter. Therefore, extend the register clearing to %dx, %cx, and %si in entry_SYSCALL_compat and entry_INT80_compat. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180405095307.3730-8-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-05 16:59:39 +02:00
Dominik Brodowski	f8781c4a22	syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64 Removing CONFIG_SYSCALL_PTREGS from arch/x86/Kconfig and simply selecting ARCH_HAS_SYSCALL_WRAPPER unconditionally on x86-64 allows us to simplify several codepaths. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180405095307.3730-7-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-05 16:59:38 +02:00
Dominik Brodowski	ebeb8c82ff	syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32 Extend ARCH_HAS_SYSCALL_WRAPPER for i386 emulation and for x32 on 64-bit x86. For x32, all we need to do is to create an additional stub for each compat syscall which decodes the parameters in x86-64 ordering, e.g.: asmlinkage long __compat_sys_x32_xyzzy(struct pt_regs regs) { return c_SyS_xyzzy(regs->di, regs->si, regs->dx); } For i386 emulation, we need to teach compat_sys_() to take struct pt_regs as its only argument, e.g.: asmlinkage long __compat_sys_ia32_xyzzy(struct pt_regs regs) { return c_SyS_xyzzy(regs->bx, regs->cx, regs->dx); } In addition, we need to create additional stubs for common syscalls (that is, for syscalls which have the same parameters on 32-bit and 64-bit), e.g.: asmlinkage long __sys_ia32_xyzzy(struct pt_regs regs) { return c_sys_xyzzy(regs->bx, regs->cx, regs->dx); } This approach avoids leaking random user-provided register content down the call chain. This patch is based on an original proof-of-concept \| From: Linus Torvalds <torvalds@linux-foundation.org> \| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> and was split up and heavily modified by me, in particular to base it on ARCH_HAS_SYSCALL_WRAPPER. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180405095307.3730-6-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-05 16:59:38 +02:00
Dominik Brodowski	fa697140f9	syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls Let's make use of ARCH_HAS_SYSCALL_WRAPPER=y on pure 64-bit x86-64 systems: Each syscall defines a stub which takes struct pt_regs as its only argument. It decodes just those parameters it needs, e.g: asmlinkage long sys_xyzzy(const struct pt_regs regs) { return SyS_xyzzy(regs->di, regs->si, regs->dx); } This approach avoids leaking random user-provided register content down the call chain. For example, for sys_recv() which is a 4-parameter syscall, the assembly now is (in slightly reordered fashion): <sys_recv>: callq <__fentry__> / decode regs->di, ->si, ->dx and ->r10 / mov 0x70(%rdi),%rdi mov 0x68(%rdi),%rsi mov 0x60(%rdi),%rdx mov 0x38(%rdi),%rcx [ SyS_recv() is automatically inlined by the compiler, as it is not [yet] used anywhere else ] / clear %r9 and %r8, the 5th and 6th args / xor %r9d,%r9d xor %r8d,%r8d / do the actual work / callq __sys_recvfrom / cleanup and return / cltq retq The only valid place in an x86-64 kernel which rightfully calls a syscall function on its own -- vsyscall -- needs to be modified to pass struct pt_regs onwards as well. To keep the syscall table generation working independent of SYSCALL_PTREGS being enabled, the stubs are named the same as the "original" syscall stubs, i.e. sys_(). This patch is based on an original proof-of-concept \| From: Linus Torvalds <torvalds@linux-foundation.org> \| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> and was split up and heavily modified by me, in particular to base it on ARCH_HAS_SYSCALL_WRAPPER, to limit it to 64-bit-only for the time being, and to update the vsyscall to the new calling convention. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180405095307.3730-4-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-05 16:59:26 +02:00
Linus Torvalds	dfe64506c0	x86/syscalls: Don't pointlessly reload the system call number We have it in a register in the low-level asm, just pass it in as an argument rather than have do_syscall_64() load it back in from the ptregs pointer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180405095307.3730-2-linux@dominikbrodowski.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-05 16:59:24 +02:00
Dmitry V. Levin	9820e1c337	x86/uapi: Fix asm/bootparam.h userspace compilation errors Consistently use types provided by <linux/types.h> to fix the following asm/bootparam.h userspace compilation errors: /usr/include/asm/bootparam.h:140:2: error: unknown type name 'u16' u16 version; /usr/include/asm/bootparam.h:141:2: error: unknown type name 'u16' u16 compatible_version; /usr/include/asm/bootparam.h:142:2: error: unknown type name 'u16' u16 pm_timer_address; /usr/include/asm/bootparam.h:143:2: error: unknown type name 'u16' u16 num_cpus; /usr/include/asm/bootparam.h:144:2: error: unknown type name 'u64' u64 pci_mmconfig_base; /usr/include/asm/bootparam.h:145:2: error: unknown type name 'u32' u32 tsc_khz; /usr/include/asm/bootparam.h:146:2: error: unknown type name 'u32' u32 apic_khz; /usr/include/asm/bootparam.h:147:2: error: unknown type name 'u8' u8 standard_ioapic; /usr/include/asm/bootparam.h:148:2: error: unknown type name 'u8' u8 cpu_ids[255]; Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: <stable@vger.kernel.org> # v4.16 Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `4a362601ba` ("x86/jailhouse: Add infrastructure for running in non-root cell") Link: http://lkml.kernel.org/r/20180405043210.GA13254@altlinux.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-05 10:05:21 +02:00
Stephane Eranian	d1e7e602cd	perf/x86/intel: Move regs->flags EXACT bit init This patch removes a redundant store on regs->flags introduced by commit: `71eb9ee959` ("perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs") We were clearing the PERF_EFLAGS_EXACT but it was overwritten by regs->flags = pebs->flags later on. The PERF_EFLAGS_EXACT is a software flag using bit 3 of regs->flags. X86 marks this bit as Reserved. To make sure this bit is zero before we do any IP processing, we clear it explicitly. Patch also removes the following assignment: regs->flags = pebs->flags \| (regs->flags & PERF_EFLAGS_VM); Because there is no regs->flags to preserve anymore because set_linear_ip() is not called until later. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1522909791-32498-1-git-send-email-eranian@google.com [ Improve capitalization, punctuation and clarity of comments. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-05 09:28:40 +02:00
Linus Torvalds	06dd3dfeea	Char/Misc patches for 4.17-rc1 Here is the big set of char/misc driver patches for 4.17-rc1. There are a lot of little things in here, nothing huge, but all important to the different hardware types involved: - thunderbolt driver updates - parport updates (people still care...) - nvmem driver updates - mei updates (as always) - hwtracing driver updates - hyperv driver updates - extcon driver updates - and a handfull of even smaller driver subsystem and individual driver updates All of these have been in linux-next with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWsShSQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykNqwCfUbfvopswb1PesHCLABDBsFQChgoAniDa6pS9 kI8TN5MdLN85UU27Mkb6 =BzFR -----END PGP SIGNATURE----- Merge tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc updates from Greg KH: "Here is the big set of char/misc driver patches for 4.17-rc1. There are a lot of little things in here, nothing huge, but all important to the different hardware types involved: - thunderbolt driver updates - parport updates (people still care...) - nvmem driver updates - mei updates (as always) - hwtracing driver updates - hyperv driver updates - extcon driver updates - ... and a handful of even smaller driver subsystem and individual driver updates All of these have been in linux-next with no reported issues" * tag 'char-misc-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (149 commits) hwtracing: Add HW tracing support menu intel_th: Add ACPI glue layer intel_th: Allow forcing host mode through drvdata intel_th: Pick up irq number from resources intel_th: Don't touch switch routing in host mode intel_th: Use correct method of finding hub intel_th: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate stm class: Make dummy's master/channel ranges configurable stm class: Add SPDX GPL-2.0 header to replace GPLv2 boilerplate MAINTAINERS: Bestow upon myself the care for drivers/hwtracing hv: add SPDX license id to Kconfig hv: add SPDX license to trace Drivers: hv: vmbus: do not mark HV_PCIE as perf_device Drivers: hv: vmbus: respect what we get from hv_get_synint_state() /dev/mem: Avoid overwriting "err" in read_mem() eeprom: at24: use SPDX identifier instead of GPL boiler-plate eeprom: at24: simplify the i2c functionality checking eeprom: at24: fix a line break eeprom: at24: tweak newlines eeprom: at24: refactor at24_probe() ...	2018-04-04 20:07:20 -07:00
Linus Torvalds	9eb31227cb	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - add AEAD support to crypto engine - allow batch registration in simd Algorithms: - add CFB mode - add speck block cipher - add sm4 block cipher - new test case for crct10dif - improve scheduling latency on ARM - scatter/gather support to gcm in aesni - convert x86 crypto algorithms to skcihper Drivers: - hmac(sha224/sha256) support in inside-secure - aes gcm/ccm support in stm32 - stm32mp1 support in stm32 - ccree driver from staging tree - gcm support over QI in caam - add ks-sa hwrng driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (212 commits) crypto: ccree - remove unused enums crypto: ahash - Fix early termination in hash walk crypto: brcm - explicitly cast cipher to hash type crypto: talitos - don't leak pointers to authenc keys crypto: qat - don't leak pointers to authenc keys crypto: picoxcell - don't leak pointers to authenc keys crypto: ixp4xx - don't leak pointers to authenc keys crypto: chelsio - don't leak pointers to authenc keys crypto: caam/qi - don't leak pointers to authenc keys crypto: caam - don't leak pointers to authenc keys crypto: lrw - Free rctx->ext with kzfree crypto: talitos - fix IPsec cipher in length crypto: Deduplicate le32_to_cpu_array() and cpu_to_le32_array() crypto: doc - clarify hash callbacks state machine crypto: api - Keep failed instances alive crypto: api - Make crypto_alg_lookup static crypto: api - Remove unused crypto_type lookup function crypto: chelsio - Remove declaration of static function from header crypto: inside-secure - hmac(sha224) support crypto: inside-secure - hmac(sha256) support ..	2018-04-04 17:11:08 -07:00
Sai Praneeth	162ee5a8ab	x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush() Linus reported the following boot warning: WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/tlbflush.h:134 load_new_mm_cr3+0x114/0x170 [...] Call Trace: switch_mm_irqs_off+0x267/0x590 switch_mm+0xe/0x20 efi_switch_mm+0x3e/0x50 efi_enter_virtual_mode+0x43f/0x4da start_kernel+0x3bf/0x458 secondary_startup_64+0xa5/0xb0 ... after merging: `03781e4089`: x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3 When the platform supports PCID and if CONFIG_DEBUG_VM=y is enabled, build_cr3_noflush() (called via switch_mm()) does a sanity check to see if X86_FEATURE_PCID is set. Presently, build_cr3_noflush() uses "this_cpu_has(X86_FEATURE_PCID)" to perform the check but this_cpu_has() works only after SMP is initialized (i.e. per cpu cpu_info's should be populated) and this happens to be very late in the boot process (during rest_init()). As efi_runtime_services() are called during (early) kernel boot time and run time, modify build_cr3_noflush() to use boot_cpu_has() all the time. As suggested by Dave Hansen, this should be OK because all CPU's have same capabilities on x86. With this change the warning is fixed. ( Dave also suggested that we put a warning in this_cpu_has() if it's used early in the boot process. This is still work in progress as it affects MCE. ) Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Lee Chun-Yi <jlee@suse.com> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Shankar <ravi.v.shankar@intel.com> Cc: Ricardo Neri <ricardo.neri@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1522870459-7432-1-git-send-email-sai.praneeth.prakhya@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-04-05 01:27:49 +02:00
Linus Torvalds	23221d997b	arm64 updates for 4.17 Nothing particularly stands out here, probably because people were tied up with spectre/meltdown stuff last time around. Still, the main pieces are: - Rework of our CPU features framework so that we can whitelist CPUs that don't require kpti even in a heterogeneous system - Support for the IDC/DIC architecture extensions, which allow us to elide instruction and data cache maintenance when writing out instructions - Removal of the large memory model which resulted in suboptimal codegen by the compiler and increased the use of literal pools, which could potentially be used as ROP gadgets since they are mapped as executable - Rework of forced signal delivery so that the siginfo_t is well-formed and handling of show_unhandled_signals is consolidated and made consistent between different fault types - More siginfo cleanup based on the initial patches from Eric Biederman - Workaround for Cortex-A55 erratum #1024718 - Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi - Misc cleanups and non-critical fixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABCgAGBQJaw1TCAAoJELescNyEwWM0gyQIAJVMK4QveBW+LwF96NYdZo16 p90Aa+nqKelh/s93govQArDMv1gxyuXdFlQZVOGPQHfqpz6RhJWmBA2tFsUbQrUc OBcioPrRihqTmKBe+1r1XORwZxkVX6GGmCn0LYpPR7I3TjxXZpvxqaxGxiUvHkci yVxWlDTyN/7eL3akhCpCDagN3Fxwk3QnJLqE3fxOFMlY7NvQcmUxcITiUl/s469q xK6SWH9SRH1JK8jTHPitwUBiU//3FfCqSI9HLEdDIDoTuPcVM8UetWvi4QzrzJL1 UYg8lmU0CXNmflDzZJDaMf+qFApOrGxR0YVPpBzlQvxe0JIY69g48f+JzDPz8nc= =+gNa -----END PGP SIGNATURE----- Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "Nothing particularly stands out here, probably because people were tied up with spectre/meltdown stuff last time around. Still, the main pieces are: - Rework of our CPU features framework so that we can whitelist CPUs that don't require kpti even in a heterogeneous system - Support for the IDC/DIC architecture extensions, which allow us to elide instruction and data cache maintenance when writing out instructions - Removal of the large memory model which resulted in suboptimal codegen by the compiler and increased the use of literal pools, which could potentially be used as ROP gadgets since they are mapped as executable - Rework of forced signal delivery so that the siginfo_t is well-formed and handling of show_unhandled_signals is consolidated and made consistent between different fault types - More siginfo cleanup based on the initial patches from Eric Biederman - Workaround for Cortex-A55 erratum #1024718 - Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi - Misc cleanups and non-critical fixes" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits) arm64: uaccess: Fix omissions from usercopy whitelist arm64: fpsimd: Split cpu field out from struct fpsimd_state arm64: tlbflush: avoid writing RES0 bits arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG arm64: lse: Include compiler_types.h and export.h for out-of-line LL/SC arm64: fpsimd: include <linux/init.h> in fpsimd.h drivers/perf: arm_pmu_platform: do not warn about affinity on uniprocessor perf: arm_spe: include linux/vmalloc.h for vmap() Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)" arm64: cpufeature: Avoid warnings due to unused symbols arm64: Add work around for Arm Cortex-A55 Erratum 1024718 arm64: Delay enabling hardware DBM feature arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35 arm64: capabilities: Handle shared entries arm64: capabilities: Add support for checks based on a list of MIDRs arm64: Add helpers for checking CPU MIDR against a range arm64: capabilities: Clean up midr range helpers arm64: capabilities: Change scope of VHE to Boot CPU feature ...	2018-04-04 16:01:43 -07:00
Peng Hao	3140c156e9	kvm: x86: fix a compile warning fix a "warning: no previous prototype". Cc: stable@vger.kernel.org Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-04 19:10:29 +02:00
Wanpeng Li	6c86eedc20	KVM: X86: Add Force Emulation Prefix for "emulate the next instruction" There is no easy way to force KVM to run an instruction through the emulator (by design as that will expose the x86 emulator as a significant attack-surface). However, we do wish to expose the x86 emulator in case we are testing it (e.g. via kvm-unit-tests). Therefore, this patch adds a "force emulation prefix" that is designed to raise #UD which KVM will trap and it's #UD exit-handler will match "force emulation prefix" to run instruction after prefix by the x86 emulator. To not expose the x86 emulator by default, we add a module parameter that should be off by default. A simple testcase here: #include <stdio.h> #include <string.h> #define HYPERVISOR_INFO 0x40000000 #define CPUID(idx, eax, ebx, ecx, edx) \ asm volatile (\ "ud2a; .ascii \"kvm\"; cpuid" \ :"=b" (ebx), "=a" (eax), "=c" (ecx), "=d" (edx) \ :"0"(idx) ); void main() { unsigned int eax, ebx, ecx, edx; char string[13]; CPUID(HYPERVISOR_INFO, &eax, &ebx, &ecx, &edx); (unsigned int )(string + 0) = ebx; (unsigned int )(string + 4) = ecx; (unsigned int )(string + 8) = edx; string[12] = 0; if (strncmp(string, "KVMKVMKVM\0\0\0", 12) == 0) printf("kvm guest\n"); else printf("bare hardware\n"); } Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> [Correctly handle usermode exits. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-04 19:09:40 +02:00
Wanpeng Li	082d06edab	KVM: X86: Introduce handle_ud() Introduce handle_ud() to handle invalid opcode, this function will be used by later patches. Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmÃ¡Å™ <rkrcmar@redhat.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-04 19:03:58 +02:00
Paolo Bonzini	4fde8d57cf	KVM: vmx: unify adjacent #ifdefs vmx_save_host_state has multiple ifdefs for CONFIG_X86_64 that have no other code between them. Simplify by reducing them to a single conditional. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-04 18:58:59 +02:00
Arnd Bergmann	51e8a8cc2f	x86: kvm: hide the unused 'cpu' variable The local variable was newly introduced but is only accessed in one place on x86_64, but not on 32-bit: arch/x86/kvm/vmx.c: In function 'vmx_save_host_state': arch/x86/kvm/vmx.c:2175:6: error: unused variable 'cpu' [-Werror=unused-variable] This puts it into another #ifdef. Fixes: `35060ed6a1` ("x86/kvm/vmx: avoid expensive rdmsr for MSR_GS_BASE") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-04 18:57:40 +02:00
Sean Christopherson	c75d0edc8e	KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig Remove the WARN_ON in handle_ept_misconfig() as it is unnecessary and causes false positives. Return the unmodified result of kvm_mmu_page_fault() instead of converting a system error code to KVM_EXIT_UNKNOWN so that userspace sees the error code of the actual failure, not a generic "we don't know what went wrong". * kvm_mmu_page_fault() will WARN if reserved bits are set in the SPTEs, i.e. it covers the case where an EPT misconfig occurred because of a KVM bug. * The WARN_ON will fire on any system error code that is hit while handling the fault, e.g. -ENOMEM from mmu_topup_memory_caches() while handling a legitmate MMIO EPT misconfig or -EFAULT from kvm_handle_bad_page() if the corresponding HVA is invalid. In either case, userspace should receive the original error code and firing a warning is incorrect behavior as KVM is operating as designed. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-04 18:00:40 +02:00
Sean Christopherson	2c151b2544	Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" The bug that led to commit `95e057e258` was a benign warning (no adverse affects other than the warning itself) that was detected by syzkaller. Further inspection shows that the WARN_ON in question, in handle_ept_misconfig(), is unnecessary and flawed (this was also briefly discussed in the original patch: https://patchwork.kernel.org/patch/10204649). * The WARN_ON is unnecessary as kvm_mmu_page_fault() will WARN if reserved bits are set in the SPTEs, i.e. it covers the case where an EPT misconfig occurred because of a KVM bug. * The WARN_ON is flawed because it will fire on any system error code that is hit while handling the fault, e.g. -ENOMEM can be returned by mmu_topup_memory_caches() while handling a legitmate MMIO EPT misconfig. The original behavior of returning -EFAULT when userspace munmaps an HVA without first removing the memslot is correct and desirable, i.e. KVM is letting userspace know it has generated a bad address. Returning RET_PF_EMULATE masks the WARN_ON in the EPT misconfig path, but does not fix the underlying bug, i.e. the WARN_ON is bogus. Furthermore, returning RET_PF_EMULATE has the unwanted side effect of causing KVM to attempt to emulate an instruction on any page fault with an invalid HVA translation, e.g. a not-present EPT violation on a VM_PFNMAP VMA whose fault handler failed to insert a PFN. * There is no guarantee that the fault is directly related to the instruction, i.e. the fault could have been triggered by a side effect memory access in the guest, e.g. while vectoring a #DB or writing a tracing record. This could cause KVM to effectively mask the fault if KVM doesn't model the behavior leading to the fault, i.e. emulation could succeed and resume the guest. * If emulation does fail, KVM will return EMULATION_FAILED instead of -EFAULT, which is a red herring as the user will either debug a bogus emulation attempt or scratch their head wondering why we were attempting emulation in the first place. TL;DR: revert to returning -EFAULT and remove the bogus WARN_ON in handle_ept_misconfig in a future patch. This reverts commit `95e057e258`. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-04 18:00:36 +02:00
Stefan Fritsch	29916968c4	kvm: Add emulation for movups/movupd This is very similar to the aligned versions movaps/movapd. We have seen the corresponding emulation failures with openbsd as guest and with Windows 10 with intel HD graphics pass through. Signed-off-by: Christian Ehrhardt <christian_ehrhardt@genua.de> Signed-off-by: Stefan Fritsch <sf@sfritsch.de> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-04-04 17:52:46 +02:00
Sean Christopherson	add5ff7a21	KVM: VMX: raise internal error for exception during invalid protected mode state Exit to userspace with KVM_INTERNAL_ERROR_EMULATION if we encounter an exception in Protected Mode while emulating guest due to invalid guest state. Unlike Big RM, KVM doesn't support emulating exceptions in PM, i.e. PM exceptions are always injected via the VMCS. Because we will never do VMRESUME due to emulation_required, the exception is never realized and we'll keep emulating the faulting instruction over and over until we receive a signal. Exit to userspace iff there is a pending exception, i.e. don't exit simply on a requested event. The purpose of this check and exit is to aid in debugging a guest that is in all likelihood already doomed. Invalid guest state in PM is extremely limited in normal operation, e.g. it generally only occurs for a few instructions early in BIOS, and any exception at this time is all but guaranteed to be fatal. Non-vectored interrupts, e.g. INIT, SIPI and SMI, can be cleanly handled/emulated, while checking for vectored interrupts, e.g. INTR and NMI, without hitting false positives would add a fair amount of complexity for almost no benefit (getting hit by lightning seems more likely than encountering this specific scenario). Add a WARN_ON_ONCE to vmx_queue_exception() if we try to inject an exception via the VMCS and emulation_required is true. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-04-04 17:51:55 +02:00
Linus Torvalds	4608f06453	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next Pull sparc updates from David Miller: 1) Add support for ADI (Application Data Integrity) found in more recent sparc64 cpus. Essentially this is keyed based access to virtual memory, and if the key encoded in the virual address is wrong you get a trap. The mm changes were reviewed by Andrew Morton and others. Work by Khalid Aziz. 2) Validate DAX completion index range properly, from Rob Gardner. 3) Add proper Kconfig deps for DAX driver. From Guenter Roeck. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: sparc64: Make atomic_xchg() an inline function rather than a macro. sparc64: Properly range check DAX completion index sparc: Make auxiliary vectors for ADI available on 32-bit as well sparc64: Oracle DAX driver depends on SPARC64 sparc64: Update signal delivery to use new helper functions sparc64: Add support for ADI (Application Data Integrity) mm: Allow arch code to override copy_highpage() mm: Clear arch specific VM flags on protection change mm: Add address parameter to arch_validate_prot() sparc64: Add auxiliary vectors to report platform ADI properties sparc64: Add handler for "Memory Corruption Detected" trap sparc64: Add HV fault type handlers for ADI related faults sparc64: Add support for ADI register fields, ASIs and traps mm, swap: Add infrastructure for saving page metadata on swap signals, sparc: Add signal codes for ADI violations	2018-04-03 14:08:58 -07:00
Linus Torvalds	5bb053bef8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Support offloading wireless authentication to userspace via NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari. 2) A lot of work on network namespace setup/teardown from Kirill Tkhai. Setup and cleanup of namespaces now all run asynchronously and thus performance is significantly increased. 3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon Streiff. 4) Support zerocopy on RDS sockets, from Sowmini Varadhan. 5) Use denser instruction encoding in x86 eBPF JIT, from Daniel Borkmann. 6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime Chevallier. 7) Support grafting of child qdiscs in mlxsw driver, from Nogah Frankel. 8) Add packet forwarding tests to selftests, from Ido Schimmel. 9) Deal with sub-optimal GSO packets better in BBR congestion control, from Eric Dumazet. 10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern. 11) Add path MTU tests to selftests, from Stefano Brivio. 12) Various bits of IPSEC offloading support for mlx5, from Aviad Yehezkel, Yossi Kuperman, and Saeed Mahameed. 13) Support RSS spreading on ntuple filters in SFC driver, from Edward Cree. 14) Lots of sockmap work from John Fastabend. Applications can use eBPF to filter sendmsg and sendpage operations. 15) In-kernel receive TLS support, from Dave Watson. 16) Add XDP support to ixgbevf, this is significant because it should allow optimized XDP usage in various cloud environments. From Tony Nguyen. 17) Add new Intel E800 series "ice" ethernet driver, from Anirudh Venkataramanan et al. 18) IP fragmentation match offload support in nfp driver, from Pieter Jansen van Vuuren. 19) Support XDP redirect in i40e driver, from Björn Töpel. 20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of tracepoints in their raw form, from Alexei Starovoitov. 21) Lots of striding RQ improvements to mlx5 driver with many performance improvements, from Tariq Toukan. 22) Use rhashtable for inet frag reassembly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits) net: mvneta: improve suspend/resume net: mvneta: split rxq/txq init and txq deinit into SW and HW parts ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh net: bgmac: Fix endian access in bgmac_dma_tx_ring_free() net: bgmac: Correctly annotate register space route: check sysctl_fib_multipath_use_neigh earlier than hash fix typo in command value in drivers/net/phy/mdio-bitbang. sky2: Increase D3 delay to sky2 stops working after suspend net/mlx5e: Set EQE based as default TX interrupt moderation mode ibmvnic: Disable irqs before exiting reset from closed state net: sched: do not emit messages while holding spinlock vlan: also check phy_driver ts_info for vlan's real device Bluetooth: Mark expected switch fall-throughs Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME Bluetooth: btrsi: remove unused including <linux/version.h> Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4 sh_eth: kill useless check in __sh_eth_get_regs() sh_eth: add sh_eth_cpu_data::no_xdfar flag ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data() ipv4: factorize sk_wmem_alloc updates done by __ip_append_data() ...	2018-04-03 14:04:18 -07:00
Linus Torvalds	642e7fd233	Merge branch 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux Pull removal of in-kernel calls to syscalls from Dominik Brodowski: "System calls are interaction points between userspace and the kernel. Therefore, system call functions such as sys_xyzzy() or compat_sys_xyzzy() should only be called from userspace via the syscall table, but not from elsewhere in the kernel. At least on 64-bit x86, it will likely be a hard requirement from v4.17 onwards to not call system call functions in the kernel: It is better to use use a different calling convention for system calls there, where struct pt_regs is decoded on-the-fly in a syscall wrapper which then hands processing over to the actual syscall function. This means that only those parameters which are actually needed for a specific syscall are passed on during syscall entry, instead of filling in six CPU registers with random user space content all the time (which may cause serious trouble down the call chain). Those x86-specific patches will be pushed through the x86 tree in the near future. Moreover, rules on how data may be accessed may differ between kernel data and user data. This is another reason why calling sys_xyzzy() is generally a bad idea, and -- at most -- acceptable in arch-specific code. This patchset removes all in-kernel calls to syscall functions in the kernel with the exception of arch/. On top of this, it cleans up the three places where many syscalls are referenced or prototyped, namely kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h" * 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits) bpf: whitelist all syscalls for error injection kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions kernel/sys_ni: sort cond_syscall() entries syscalls/x86: auto-create compat_sys_() prototypes syscalls: sort syscall prototypes in include/linux/compat.h net: remove compat_sys_() prototypes from net/compat.h syscalls: sort syscall prototypes in include/linux/syscalls.h kexec: move sys_kexec_load() prototype to syscalls.h x86/sigreturn: use SYSCALL_DEFINE0 x86: fix sys_sigreturn() return type to be long, not unsigned long x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate() fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate() fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid() kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare() ...	2018-04-02 21:22:12 -07:00
Linus Torvalds	bc16d4052f	Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main EFI changes in this cycle were: - Fix the apple-properties code (Andy Shevchenko) - Add WARN() on arm64 if UEFI Runtime Services corrupt the reserved x18 register (Ard Biesheuvel) - Use efi_switch_mm() on x86 instead of manipulating %cr3 directly (Sai Praneeth) - Fix early memremap leak in ESRT code (Ard Biesheuvel) - Switch to L"xxx" notation for wide string literals (Ard Biesheuvel) - ... plus misc other cleanups and bugfixes" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Use efi_switch_mm() rather than manually twiddling with %cr3 x86/efi: Replace efi_pgd with efi_mm.pgd efi: Use string literals for efi_char16_t variable initializers efi/esrt: Fix handling of early ESRT table mapping efi: Use efi_mm in x86 as well as ARM efi: Make const array 'apple' static efi/apple-properties: Use memremap() instead of ioremap() efi: Reorder pr_notice() with add_device_randomness() call x86/efi: Replace GFP_ATOMIC with GFP_KERNEL in efi_query_variable_store() efi/arm64: Check whether x18 is preserved by runtime services calls efi/arm: Stop printing addresses of virtual mappings efi/apple-properties: Remove redundant attribute initialization from unmarshal_key_value_pairs() efi/arm: Only register page tables when they exist	2018-04-02 17:46:37 -07:00
Linus Torvalds	2fcd2b306a	Merge branch 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 dma mapping updates from Ingo Molnar: "This tree, by Christoph Hellwig, switches over the x86 architecture to the generic dma-direct and swiotlb code, and also unifies more of the dma-direct code between architectures. The now unused x86-only primitives are removed" * 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: dma-mapping: Don't clear GFP_ZERO in dma_alloc_attrs swiotlb: Make swiotlb_{alloc,free}_buffer depend on CONFIG_DMA_DIRECT_OPS dma/swiotlb: Remove swiotlb_{alloc,free}_coherent() dma/direct: Handle force decryption for DMA coherent buffers in common code dma/direct: Handle the memory encryption bit in common code dma/swiotlb: Remove swiotlb_set_mem_attributes() set_memory.h: Provide set_memory_{en,de}crypted() stubs x86/dma: Remove dma_alloc_coherent_gfp_flags() iommu/intel-iommu: Enable CONFIG_DMA_DIRECT_OPS=y and clean up intel_{alloc,free}_coherent() iommu/amd_iommu: Use CONFIG_DMA_DIRECT_OPS=y and dma_direct_{alloc,free}() x86/dma/amd_gart: Use dma_direct_{alloc,free}() x86/dma/amd_gart: Look at dev->coherent_dma_mask instead of GFP_DMA x86/dma: Use generic swiotlb_ops x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y) x86/dma: Remove dma_alloc_coherent_mask()	2018-04-02 17:18:45 -07:00
Linus Torvalds	a5532439eb	Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 timer updates from Ingo Molnar: "Two changes: add the new convert_art_ns_to_tsc() API for upcoming Intel Goldmont+ drivers, and remove the obsolete rdtscll() API" * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tsc: Get rid of rdtscll() x86/tsc: Convert ART in nanoseconds to TSC	2018-04-02 16:18:31 -07:00
Linus Torvalds	cea061e455	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "The main changes in this cycle were: - Add "Jailhouse" hypervisor support (Jan Kiszka) - Update DeviceTree support (Ivan Gorinov) - Improve DMI date handling (Andy Shevchenko)" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/PCI: Fix a potential regression when using dmi_get_bios_year() firmware/dmi_scan: Uninline dmi_get_bios_year() helper x86/devicetree: Use CPU description from Device Tree of/Documentation: Specify local APIC ID in "reg" MAINTAINERS: Add entry for Jailhouse x86/jailhouse: Allow to use PCI_MMCONFIG without ACPI x86: Consolidate PCI_MMCONFIG configs x86: Align x86_64 PCI_MMCONFIG with 32-bit variant x86/jailhouse: Enable PCI mmconfig access in inmates PCI: Scan all functions when running over Jailhouse jailhouse: Provide detection for non-x86 systems x86/devicetree: Fix device IRQ settings in DT x86/devicetree: Initialize device tree before using it pci: Simplify code by using the new dmi_get_bios_year() helper ACPI/sleep: Simplify code by using the new dmi_get_bios_year() helper x86/pci: Simplify code by using the new dmi_get_bios_year() helper dmi: Introduce the dmi_get_bios_year() helper function x86/platform/quark: Re-use DEFINE_SHOW_ATTRIBUTE() macro x86/platform/atom: Re-use DEFINE_SHOW_ATTRIBUTE() macro	2018-04-02 16:15:32 -07:00
Linus Torvalds	d22fff8141	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: - Extend the memmap= boot parameter syntax to allow the redeclaration and dropping of existing ranges, and to support all e820 range types (Jan H. Schönherr) - Improve the W+X boot time security checks to remove false positive warnings on Xen (Jan Beulich) - Support booting as Xen PVH guest (Juergen Gross) - Improved 5-level paging (LA57) support, in particular it's possible now to have a single kernel image for both 4-level and 5-level hardware (Kirill A. Shutemov) - AMD hardware RAM encryption support (SME/SEV) fixes (Tom Lendacky) - Preparatory commits for hardware-encrypted RAM support on Intel CPUs. (Kirill A. Shutemov) - Improved Intel-MID support (Andy Shevchenko) - Show EFI page tables in page_tables debug files (Andy Lutomirski) - ... plus misc fixes and smaller cleanups * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits) x86/cpu/tme: Fix spelling: "configuation" -> "configuration" x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFT x86/mm: Update comment in detect_tme() regarding x86_phys_bits x86/mm/32: Remove unused node_memmap_size_bytes() & CONFIG_NEED_NODE_MEMMAP_SIZE logic x86/mm: Remove pointless checks in vmalloc_fault x86/platform/intel-mid: Add special handling for ACPI HW reduced platforms ACPI, x86/boot: Introduce the ->reduced_hw_early_init() ACPI callback ACPI, x86/boot: Split out acpi_generic_reduce_hw_init() and export x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf x86/pconfig: Detect PCONFIG targets x86/tme: Detect if TME and MKTME is activated by BIOS x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G x86/boot/compressed/64: Use page table in trampoline memory x86/boot/compressed/64: Use stack from trampoline memory x86/boot/compressed/64: Make sure we have a 32-bit code segment x86/mm: Do not use paravirtualized calls in native_set_p4d() kdump, vmcoreinfo: Export pgtable_l5_enabled value x86/boot/compressed/64: Prepare new top-level page table for trampoline x86/boot/compressed/64: Set up trampoline memory x86/boot/compressed/64: Save and restore trampoline memory ...	2018-04-02 15:45:30 -07:00
Linus Torvalds	986b37c0ae	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups and msr updates from Ingo Molnar: "The main change is a performance/latency improvement to /dev/msr access. The rest are misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as well x86/cpuid: Allow cpuid_read() to schedule x86/msr: Allow rdmsr_safe_on_cpu() to schedule x86/rtc: Stop using deprecated functions x86/dumpstack: Unify show_regs() x86/fault: Do not print IP in show_fault_oops() x86/MSR: Move native_* variants to msr.h	2018-04-02 15:16:43 -07:00
Linus Torvalds	e68b4bad71	Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 build updates from Ingo Molnar: "The biggest change is the forcing of asm-goto support on x86, which effectively increases the GCC minimum supported version to gcc-4.5 (on x86)" * 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Don't pass in -D__KERNEL__ multiple times x86: Remove FAST_FEATURE_TESTS x86: Force asm-goto x86/build: Drop superfluous ALIGN from the linker script	2018-04-02 14:37:03 -07:00
Linus Torvalds	5e46caf62d	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm fixlets from Ingo Molnar: "A clobber list fix and cleanups" * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm: Trim clear_page.S includes x86/asm: Clobber flags in clear_page()	2018-04-02 14:06:47 -07:00
Linus Torvalds	2451d1e59d	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 apic updates from Ingo Molnar: "The main x86 APIC/IOAPIC changes in this cycle were: - Robustify kexec support to more carefully restore IRQ hardware state before calling into kexec/kdump kernels. (Baoquan He) - Clean up the local APIC code a bit (Dou Liyang) - Remove unused callbacks (David Rientjes)" * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Finish removing unused callbacks x86/apic: Drop logical_smp_processor_id() inline x86/apic: Modernize the pending interrupt code x86/apic: Move pending interrupt check code into it's own function x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified x86/apic: Rename variables and functions related to x86_io_apic_ops x86/apic: Remove the (now) unused disable_IO_APIC() function x86/apic: Fix restoring boot IRQ mode in reboot and kexec/kdump x86/apic: Split disable_IO_APIC() into two functions to fix CONFIG_KEXEC_JUMP=y x86/apic: Split out restore_boot_irq_mode() from disable_IO_APIC() x86/apic: Make setup_local_APIC() static x86/apic: Simplify init_bsp_APIC() usage x86/x2apic: Mark set_x2apic_phys_mode() as __init	2018-04-02 13:38:43 -07:00
Linus Torvalds	86bbbebac1	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Ingo Molnar: "The main changes in this cycle were: - AMD MCE support/decoding improvements (Yazen Ghannam) - general MCE header cleanups and reorganization (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "x86/mce/AMD: Collect error info even if valid bits are not set" x86/MCE: Cleanup and complete struct mce fields definitions x86/mce/AMD: Carve out SMCA get_block_address() code x86/mce/AMD: Get address from already initialized block x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type x86/mce/AMD: Pass the bank number to smca_get_bank_type() x86/mce/AMD: Collect error info even if valid bits are not set x86/mce: Issue the 'mcelog --ascii' message only on !AMD x86/mce: Convert 'struct mca_config' bools to a bitfield x86/mce: Put private structures and definitions into the internal header	2018-04-02 11:47:07 -07:00
Dominik Brodowski	3e2052e5dd	syscalls/x86: auto-create compat_sys_() prototypes compat_sys_() functions are no longer called from within the kernel on x86 except from the system call table. Linking the system call does not require compat_sys_() function prototypes at least on x86. Therefore, generate compat_sys_() prototypes on-the-fly within the COMPAT_SYSCALL_DEFINEx() macro, and remove x86-specific prototypes from various header files. Suggested-by: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: x86@kernel.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:18 +02:00
Tautschnig, Michael	4c8ca51af7	x86/sigreturn: use SYSCALL_DEFINE0 All definitions of syscalls in x86 except for those patched here have already been using the appropriate SYSCALL_DEFINE*. Signed-off-by: Michael Tautschnig <tautschn@amazon.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jaswinder Singh <jaswinder@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: x86@kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:14 +02:00
Dominik Brodowski	025bd3905a	x86: fix sys_sigreturn() return type to be long, not unsigned long Same as with other system calls, sys_sigreturn() should return a value of type long, not unsigned long. This also matches the behaviour for IA32_EMULATION, see sys32_sigreturn() in arch/x86/ia32/ia32_signal.c . Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: x86@kernel.org Cc: Michael Tautschnig <tautschn@amazon.co.uk> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:13 +02:00
Dominik Brodowski	66f4e88cc6	x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() Using this helper allows us to avoid the in-kernel calls to the sys_ioperm() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_ioperm(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: x86@kernel.org Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:12 +02:00
Dominik Brodowski	c7b95d5156	mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead() Using this helper allows us to avoid the in-kernel calls to the sys_readahead() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_readahead(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:12 +02:00
Dominik Brodowski	a90f590a1b	mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff() Using this helper allows us to avoid the in-kernel calls to the sys_mmap_pgoff() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_mmap_pgoff(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:11 +02:00
Dominik Brodowski	9d5b7c956b	mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64() Using the ksys_fadvise64_64() helper allows us to avoid the in-kernel calls to the sys_fadvise64_64() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as ksys_fadvise64_64(). Some compat stubs called sys_fadvise64(), which then just passed through the arguments to sys_fadvise64_64(). Get rid of this indirection, and call ksys_fadvise64_64() directly. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:10 +02:00
Dominik Brodowski	edf292c76b	fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate() Using the ksys_fallocate() wrapper allows us to get rid of in-kernel calls to the sys_fallocate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_fallocate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:09 +02:00
Dominik Brodowski	36028d5dd7	fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls Using the ksys_p{read,write}64() wrappers allows us to get rid of in-kernel calls to the sys_pread64() and sys_pwrite64() syscalls. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_p{read,write}64(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:09 +02:00
Dominik Brodowski	df260e21e6	fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate() Using the ksys_truncate() wrapper allows us to get rid of in-kernel calls to the sys_truncate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_truncate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:08 +02:00
Dominik Brodowski	806cbae122	fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall Using this helper allows us to avoid the in-kernel calls to the sys_sync_file_range() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_sync_file_range(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:07 +02:00
Dominik Brodowski	411d9475cf	fs: add ksys_ftruncate() wrapper; remove in-kernel calls to sys_ftruncate() Using the ksys_ftruncate() wrapper allows us to get rid of in-kernel calls to the sys_ftruncate() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_ftruncate(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:16:00 +02:00
Dominik Brodowski	ab0d1e85bf	fs/quota: use COMPAT_SYSCALL_DEFINE for sys32_quotactl() While sys32_quotactl() is only needed on x86, it can use the recommended COMPAT_SYSCALL_DEFINEx() machinery for its setup. Acked-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:15:47 +02:00
Dominik Brodowski	b51d3cdf44	x86: remove compat_sys_x86_waitpid() compat_sys_x86_waitpid() is not needed, as it takes the same parameters (int, *int, int) as the native syscall. Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: x86@kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:15:03 +02:00
Dominik Brodowski	37db219ef9	x86: use _do_fork() in compat_sys_x86_clone() It is trivial to directly call _do_fork() instead of the sys_clone() syscall in compat_sys_x86_clone(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: x86@kernel.org Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>	2018-04-02 20:15:03 +02:00
Linus Torvalds	486adcea4a	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main kernel side changes were: - Modernize the kprobe and uprobe creation/destruction tooling ABIs: The existing text based APIs (kprobe_events and uprobe_events in tracefs), are naive, limited ABIs in that they require user-space to clean up after themselves, which is both difficult and fragile if the tool is buggy or exits unexpectedly. In other words they are not really suited for modern, robust tooling. So introduce a modern, file descriptor based ABI that does not have these limitations: introduce the 'perf_kprobe' and 'perf_uprobe' PMUs and extend the perf_event_open() syscall to create events with a kprobe/uprobe attached to them. These [k,u]probe are associated with this file descriptor, so they are not available in tracefs. (Song Liu) - Intel Cannon Lake CPU support (Harry Pan) - Intel PT cleanups (Alexander Shishkin) - Improve the performance of pinned/flexible event groups by using RB trees (Alexey Budankov) - Add PERF_EVENT_IOC_MODIFY_ATTRIBUTES which allows the modification of hardware breakpoints, which new ABI variant massively speeds up existing tooling that uses hardware breakpoints to instrument (and debug) memory usage. (Milind Chabbi, Jiri Olsa) - Various Intel PEBS handling fixes and improvements, and other Intel PMU improvements (Kan Liang) - Various perf core improvements and optimizations (Peter Zijlstra) - ... misc cleanups, fixes and updates. There's over 200 tooling commits, here's an (imperfect) list of highlights: - 'perf annotate' improvements: * Recognize and handle jumps to other functions as calls, which improves the navigation along jumps and back. (Arnaldo Carvalho de Melo) * Add the 'P' hotkey in TUI annotation to dump annotation output into a file, to ease e-mail reporting of annotation details. (Arnaldo Carvalho de Melo) * Add an IPC/cycles column to the TUI (Jin Yao) * Improve s390 assembly annotation (Thomas Richter) * Refactor the output formatting logic to better separate it into interactive and non-interactive features and add the --stdio2 output variant to demonstrate this. (Arnaldo Carvalho de Melo) - 'perf script' improvements: * Add Python 3 support (Jaroslav Škarvada) * Add --show-round-event (Jiri Olsa) - 'perf c2c' improvements: * Add NUMA analysis support (Jiri Olsa) - 'perf trace' improvements: * Improve PowerPC support (Ravi Bangoria) - 'perf inject' improvements: * Integrate ARM CoreSight traces (Robert Walker) - 'perf stat' improvements: * Add the --interval-count option (yuzhoujian) * Add the --timeout option (yuzhoujian) - 'perf sched' improvements (Changbin Du) - Vendor events improvements : * Add IBM s390 vendor events (Thomas Richter) * Add and improve arm64 vendor events (John Garry, Ganapatrao Kulkarni) * Update POWER9 vendor events (Sukadev Bhattiprolu) - Intel PT tooling improvements (Adrian Hunter) - PMU handling improvements (Agustin Vega-Frias) - Record machine topology in perf.data (Jiri Olsa) - Various overwrite related cleanups (Kan Liang) - Add arm64 dwarf post unwind support (Kim Phillips, Jean Pihet) - ... and lots of other changes, cleanups and fixes, see the shortlog and Git history for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (262 commits) perf/x86/intel: Enable C-state residency events for Cannon Lake perf/x86/intel: Add Cannon Lake support for RAPL profiling perf/x86/pt, coresight: Clean up address filter structure perf vendor events s390: Add JSON files for IBM z14 perf vendor events s390: Add JSON files for IBM z13 perf vendor events s390: Add JSON files for IBM zEC12 zBC12 perf vendor events s390: Add JSON files for IBM z196 perf vendor events s390: Add JSON files for IBM z10EC z10BC perf mmap: Be consistent when checking for an unmaped ring buffer perf mmap: Fix accessing unmapped mmap in perf_mmap__read_done() perf build: Fix check-headers.sh opts assignment perf/x86: Update rdpmc_always_available static key to the modern API perf annotate: Use absolute addresses to calculate jump target offsets perf annotate: Defer searching for comma in raw line till it is needed perf annotate: Support jumping from one function to another perf annotate: Add "_local" to jump/offset validation routines perf python: Reference Py_None before returning it perf annotate: Mark jumps to outher functions with the call arrow perf annotate: Pass function descriptor to its instruction parsing routines perf annotate: No need to calculate notes->start twice ...	2018-04-02 11:06:34 -07:00
Linus Torvalds	701f3b3149	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in the locking subsystem in this cycle were: - Add the Linux Kernel Memory Consistency Model (LKMM) subsystem, which is an an array of tools in tools/memory-model/ that formally describe the Linux memory coherency model (a.k.a. Documentation/memory-barriers.txt), and also produce 'litmus tests' in form of kernel code which can be directly executed and tested. Here's a high level background article about an earlier version of this work on LWN.net: https://lwn.net/Articles/718628/ The design principles: "There is reason to believe that Documentation/memory-barriers.txt could use some help, and a major purpose of this patch is to provide that help in the form of a design-time tool that can produce all valid executions of a small fragment of concurrent Linux-kernel code, which is called a "litmus test". This tool's functionality is roughly similar to a full state-space search. Please note that this is a design-time tool, not useful for regression testing. However, we hope that the underlying Linux-kernel memory model will be incorporated into other tools capable of analyzing large bodies of code for regression-testing purposes." [...] "A second tool is klitmus7, which converts litmus tests to loadable kernel modules for direct testing. As with herd7, the klitmus7 code is freely available from http://diy.inria.fr/sources/index.html (and via "git" at https://github.com/herd/herdtools7)" [...] Credits go to: "This patch was the result of a most excellent collaboration founded by Jade Alglave and also including Alan Stern, Andrea Parri, and Luc Maranget." ... and to the gents listed in the MAINTAINERS entry: LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM) M: Alan Stern <stern@rowland.harvard.edu> M: Andrea Parri <parri.andrea@gmail.com> M: Will Deacon <will.deacon@arm.com> M: Peter Zijlstra <peterz@infradead.org> M: Boqun Feng <boqun.feng@gmail.com> M: Nicholas Piggin <npiggin@gmail.com> M: David Howells <dhowells@redhat.com> M: Jade Alglave <j.alglave@ucl.ac.uk> M: Luc Maranget <luc.maranget@inria.fr> M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> The LKMM project already found several bugs in Linux locking primitives and improved the understanding and the documentation of the Linux memory model all around. - Add KASAN instrumentation to atomic APIs (Dmitry Vyukov) - Add RWSEM API debugging and reorganize the lock debugging Kconfig (Waiman Long) - ... misc cleanups and other smaller changes" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) locking/Kconfig: Restructure the lock debugging menu locking/Kconfig: Add LOCK_DEBUGGING_SUPPORT to make it more readable locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches lockdep: Make the lock debug output more useful locking/rtmutex: Handle non enqueued waiters gracefully in remove_waiter() locking/atomic, asm-generic, x86: Add comments for atomic instrumentation locking/atomic, asm-generic: Add KASAN instrumentation to atomic operations locking/atomic/x86: Switch atomic.h to use atomic-instrumented.h locking/atomic, asm-generic: Add asm-generic/atomic-instrumented.h locking/xchg/alpha: Remove superfluous memory barriers from the _local() variants tools/memory-model: Finish the removal of rb-dep, smp_read_barrier_depends(), and lockless_dereference() tools/memory-model: Add documentation of new litmus test tools/memory-model: Remove mention of docker/gentoo image locking/memory-barriers: De-emphasize smp_read_barrier_depends() some more locking/lockdep: Show unadorned pointers mutex: Drop linkage.h from mutex.h tools/memory-model: Remove rb-dep, smp_read_barrier_depends, and lockless_dereference tools/memory-model: Convert underscores to hyphens tools/memory-model: Add a S lock-based external-view litmus test tools/memory-model: Add required herd7 version to README file ...	2018-04-02 10:27:16 -07:00
Linus Torvalds	320b164abb	main drm pull request for v4.17 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJavDxYAAoJEAx081l5xIa+pCoP/iwjuxkSTdJpZUx5g0daGkCK O18moGqGPChb7qJovfHqCKZ1f9PGulQt7SxwFzzJXNbv0PbfMA/Og0EhMLBImb+Q VfYgq2vJLpmkikgcI5fBrzs9DRMQKKobGIzw24VS7IkPYA7d8KgAyywBwG0+LUFR G3sobClgapsfaUcleb3ZOeDwymGkGCuuYRpYE4giHtuMDIxCWLePKJKOaOIq8o6P A1557EvSbKuLQGI9X50jzJOoBE3TKRQYkzuM1GthdOF8RHaMNcFy44lDNO030HwZ hzwAIg5Izhu16PqZGyEdIQ6SJTv3isRJWEciPnOsijvjl1li3ehMdQfhGISa/jZO ivEGd32kaactiT0jJ5OyexergEViCPVKCIORksSIk46L84luDva9L22A3yu0mf3F ixB63bAiLH7Py77kH3DmeJdqhMxlVZXCbdBVFDvzZvY4O3Mx0Dv9mmN/nw1FVCFH scSYnXea9/o4IY5yGASU6FAUJEEGu20HAN12oHJw7/taqV/gbbEos3F7AGmjJE0f qe6Rt/8fwi7Lhm2va6EoOo6yltH/gL4/AgnsN76VzppNGbaIv7W8Qa4Y/ES1lAE1 SATAEUJfU8kiLrVOolIElPbgfdJwv8TzoxiKB5wK/eoH20wf4BTmOuBMviaL2qXK Sz6wihq+IlMXW7Y7pIl/ =DrA+ -----END PGP SIGNATURE----- Merge tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "Cannonlake and Vega12 support are probably the two major things. This pull lacks nouveau, Ben had some unforseen leave and a few other blockers so we'll see how things look or maybe leave it for this merge window. core: - Device links to handle sound/gpu pm dependency - Color encoding/range properties - Plane clipping into plane check helper - Backlight helpers - DP TP4 + HBR3 helper support amdgpu: - Vega12 support - Enable DC by default on all supported GPUs - Powerplay restructuring and cleanup - DC bandwidth calc updates - DC backlight on pre-DCE11 - TTM backing store dropping support - SR-IOV fixes - Adding "wattman" like functionality - DC crc support - Improved DC dual-link handling amdkfd: - GPUVM support for dGPU - KFD events for dGPU - Enable PCIe atomics for dGPUs - HSA process eviction support - Live-lock fixes for process eviction - VM page table allocation fix for large-bar systems panel: - Raydium RM68200 - AUO G104SN02 V2 - KEO TX31D200VM0BAA - ARM Versatile panels i915: - Cannonlake support enabled - AUX-F port support added - Icelake base enabling until internal milestone of forcewake support - Query uAPI interface (used for GPU topology information currently) - Compressed framebuffer support for sprites - kmem cache shrinking when GPU is idle - Avoid boosting GPU when waited item is being processed already - Avoid retraining LSPCON link unnecessarily - Decrease request signaling latency - Deprecation of I915_SET_COLORKEY_NONE - Kerneldoc and compiler warning cleanup for upcoming CI enforcements - Full range ycbcr toggling - HDCP support i915/gvt: - Big refactor for shadow ppgtt - KBL context save/restore via LRI cmd (Weinan) - Properly unmap dma for guest page (Changbin) vmwgfx: - Lots of various improvements etnaviv: - Use the drm gpu scheduler - prep work for GC7000L support vc4: - fix alpha blending - Expose perf counters to userspace pl111: - Bandwidth checking/limiting - Versatile panel support sun4i: - A83T HDMI support - A80 support - YUV plane support - H3/H5 HDMI support omapdrm: - HPD support for DVI connector - remove lots of static variables msm: - DSI updates from 10nm / SDM845 - fix for race condition with a3xx/a4xx fence completion irq - some refactoring/prep work for eventual a6xx support (ie. when we have a userspace) - a5xx debugfs enhancements - some mdp5 fixes/cleanups to prepare for eventually merging writeback - support (ie. when we have a userspace) tegra: - mmap() fixes for fbdev devices - Overlay plane for hw cursor fix - dma-buf cache maintenance support mali-dp: - YUV->RGB conversion support rockchip: - rk3399/chromebook fixes and improvements rcar-du: - LVDS support move to drm bridge - DT bindings for R8A77995 - Driver/DT support for R8A77970 tilcdc: - DRM panel support" * tag 'drm-for-v4.17' of git://people.freedesktop.org/~airlied/linux: (1646 commits) drm/i915: Fix hibernation with ACPI S0 target state drm/i915/execlists: Use a locked clear_bit() for synchronisation with interrupt drm/i915: Specify which engines to reset following semaphore/event lockups drm/i915/dp: Write to SET_POWER dpcd to enable MST hub. drm/amdkfd: Use ordered workqueue to restore processes drm/amdgpu: Fix acquiring VM on large-BAR systems drm/amd/pp: clean header file hwmgr.h drm/amd/pp: use mlck_table.count for array loop index limit drm: Fix uabi regression by allowing garbage mode->type from userspace drm/amdgpu: Add an ATPX quirk for hybrid laptop drm/amdgpu: fix spelling mistake: "asssert" -> "assert" drm/amd/pp: Add new asic support in pp_psm.c drm/amd/pp: Clean up powerplay code on Vega12 drm/amd/pp: Add smu irq handlers for legacy asics drm/amd/pp: Fix set wrong temperature range on smu7 drm/amdgpu: Don't change preferred domian when fallback GTT v5 drm/vmwgfx: Bump version patchlevel and date drm/vmwgfx: use monotonic event timestamps drm/vmwgfx: Unpin the screen object backup buffer when not used drm/vmwgfx: Stricter count of legacy surface device resources ...	2018-04-02 07:59:23 -07:00
David S. Miller	c0b458a946	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor conflicts in drivers/net/ethernet/mellanox/mlx5/core/en_rep.c, we had some overlapping changes: 1) In 'net' MLX5E_PARAMS_LOG_{SQ,RQ}_SIZE --> MLX5E_REP_PARAMS_LOG_{SQ,RQ}_SIZE 2) In 'net-next' params->log_rq_size is renamed to be params->log_rq_mtu_frames. 3) In 'net-next' params->hard_mtu is added. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-04-01 19:49:34 -04:00
Linus Torvalds	10b84daddb	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Two fixlets" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/hwbp: Simplify the perf-hwbp code, fix documentation perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs	2018-03-31 07:59:00 -10:00
Linus Torvalds	ad0500ca87	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two UV platform fixes, and a kbuild fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/UV: Fix critical UV MMR address error x86/platform/uv/BAU: Add APIC idt entry x86/purgatory: Avoid creating stray .<pid>.d files, remove -MD from KBUILD_CFLAGS	2018-03-31 07:50:30 -10:00
Linus Torvalds	93e04d4ad7	Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 PTI fixes from Ingo Molnar: "Two fixes: a relatively simple objtool fix that makes Clang built kernels work with ORC debug info, plus an alternatives macro fix" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternatives: Fixup alternative_call_2 objtool: Add Clang support	2018-03-31 07:26:48 -10:00
Harry Pan	1159e09476	perf/x86/intel: Enable C-state residency events for Cannon Lake Cannon Lake supports C1/C3/C6/C7, PC2/PC3/PC6/PC7/PC8/PC9/PC10 state residency counters, this patch enables those counters. ( The MSR information is based on Intel Software Developers' Manual, Vol. 4, Order No. 335592. ) Tested-by: Puthikorn Voravootivat <puthik@chromium.org> Signed-off-by: Harry Pan <harry.pan@intel.com> Reviewed-by: Benson Leung <bleung@chromium.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan.liang@intel.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: gs0622@gmail.com Link: http://lkml.kernel.org/r/20180309121549.630-3-harry.pan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-31 11:28:36 +02:00
Harry Pan	490d03e83d	perf/x86/intel: Add Cannon Lake support for RAPL profiling This patch enables RAPL counters (energy consumption counters) support for Cannon Lake processors. ( ESU and power domains refer to Intel Software Developers' Manual, Vol. 4, Order No. 335592. ) Usage example: $ perf list $ perf stat -a -e power/energy-cores/,power/energy-pkg/ sleep 10 Tested-by: Puthikorn Voravootivat <puthik@chromium.org> Signed-off-by: Harry Pan <harry.pan@intel.com> Reviewed-by: Benson Leung <bleung@chromium.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: colin.king@canonical.com Cc: gs0622@gmail.com Cc: kan.liang@linux.intel.com Link: http://lkml.kernel.org/r/20180309121549.630-2-harry.pan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-31 11:28:36 +02:00
Colin Ian King	eaeb8e76cd	x86/cpu/tme: Fix spelling: "configuation" -> "configuration" Trivial fix to spelling mistake in the pr_err_once() error message text. Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: http://lkml.kernel.org/r/20180313154709.1015-1-colin.king@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-31 10:46:40 +02:00
Cao jin	d6289f36aa	x86/build: Don't pass in -D__KERNEL__ multiple times Some .<target>.cmd files under arch/x86 are showing two instances of -D__KERNEL__, like arch/x86/boot/ and arch/x86/realmode/rm/. __KERNEL__ is already defined in KBUILD_CPPFLAGS in the top Makefile, so it can be dropped safely. Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kbuild@vger.kernel.org Link: http://lkml.kernel.org/r/20180316084944.3997-1-caoj.fnst@cn.fujitsu.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-31 08:02:07 +02:00
Ingo Molnar	169310f71f	Merge branch 'linus' into locking/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-31 07:30:17 +02:00
Linus Torvalds	72573481eb	KVM fixes for v4.16-rc8 PPC: - Fix a bug causing occasional machine check exceptions on POWER8 hosts (introduced in 4.16-rc1) x86: - Fix a guest crashing regression with nested VMX and restricted guest (introduced in 4.16-rc1) - Fix dependency check for pv tlb flush (The wrong dependency that effectively disabled the feature was added in 4.16-rc4, the original feature in 4.16-rc1, so it got decent testing.) -----BEGIN PGP SIGNATURE----- iQEcBAABCAAGBQJavUt5AAoJEED/6hsPKofo8uQH/RuijrsAIUnymkYY+6BYFXlh Ri8qhG8VB+C3SpWEtsqcqNVkjJTepCD2Ej5BJTL4Gc9BSTWy7Ht6kqskEgwcnzu2 xRfkg0q0vTj1+GDd+UiTZfxiinoHtB9x3fiXali5UNTCd1fweLxdidETfO+GqMMq KDhTR+S8dXE5VG7r+iJ80LZPtHQJ94f0fh9XpQk3X2ExTG5RBxag1U2nCfiKRAZk xRv1CNAxNaBxS38CgYfHzg31NJx38fnq/qREsIdOx0Ju9WQkglBFkhLAGUb4vL0I nn8YX/oV9cW2G8tyPWjC245AouABOLbzu0xyj5KgCY/z1leA9tdLFX/ET6Zye+E= =++uZ -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Radim Krčmář: "PPC: - Fix a bug causing occasional machine check exceptions on POWER8 hosts (introduced in 4.16-rc1) x86: - Fix a guest crashing regression with nested VMX and restricted guest (introduced in 4.16-rc1) - Fix dependency check for pv tlb flush (the wrong dependency that effectively disabled the feature was added in 4.16-rc4, the original feature in 4.16-rc1, so it got decent testing)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Fix pv tlb flush dependencies KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0 KVM: PPC: Book3S HV: Fix duplication of host SLB entries	2018-03-30 07:24:14 -10:00
Jason A. Donenfeld	530ba6c7cb	um: Compile with modern headers Recent libcs have gotten a bit more strict, so we actually need to include the right headers and use the right types. This enables UML to compile again. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>	2018-03-29 22:15:12 +02:00
Alexander Shishkin	6ed70cf342	perf/x86/pt, coresight: Clean up address filter structure This is a cosmetic patch that deals with the address filter structure's ambiguous fields 'filter' and 'range'. The former stands to mean that the filter's action should be to filter the traces to its address range if it's set or stop tracing if it's unset. This is confusing and hard on the eyes, so this patch replaces it with 'action' enum. The 'range' field is completely redundant (meaning that the filter is an address range as opposed to a single address trigger), as we can use zero size to mean the same thing. Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20180329120648.11902-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-29 16:07:22 +02:00
Ingo Molnar	2d074918fb	Merge branch 'perf/urgent' into perf/core Conflicts: kernel/events/hw_breakpoint.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-29 16:03:48 +02:00
Liran Alon	f497b6c25d	KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending When vCPU runs L2 and there is a pending event that requires to exit from L2 to L1 and nested_run_pending=1, vcpu_enter_guest() will request an immediate-exit from L2 (See req_immediate_exit). Since now handling of req_immediate_exit also makes sure to set KVM_REQ_EVENT, there is no need to also set it on vmx_vcpu_run() when nested_run_pending=1. This optimizes cases where VMRESUME was executed by L1 to enter L2 and there is no pending events that require exit from L2 to L1. Previously, this would have set KVM_REQ_EVENT unnecessarly. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Liran Alon	1a680e355c	KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending In case L2 VMExit to L0 during event-delivery, VMCS02 is filled with IDT-vectoring-info which vmx_complete_interrupts() makes sure to reinject before next resume of L2. While handling the VMExit in L0, an IPI could be sent by another L1 vCPU to the L1 vCPU which currently runs L2 and exited to L0. When L0 will reach vcpu_enter_guest() and call inject_pending_event(), it will note that a previous event was re-injected to L2 (by IDT-vectoring-info) and therefore won't check if there are pending L1 events which require exit from L2 to L1. Thus, L0 enters L2 without immediate VMExit even though there are pending L1 events! This commit fixes the issue by making sure to check for L1 pending events even if a previous event was reinjected to L2 and bailing out from inject_pending_event() before evaluating a new pending event in case an event was already reinjected. The bug was observed by the following setup: * L0 is a 64CPU machine which runs KVM. * L1 is a 16CPU machine which runs KVM. * L0 & L1 runs with APICv disabled. (Also reproduced with APICv enabled but easier to analyze below info with APICv disabled) * L1 runs a 16CPU L2 Windows Server 2012 R2 guest. During L2 boot, L1 hangs completely and analyzing the hang reveals that one L1 vCPU is holding KVM's mmu_lock and is waiting forever on an IPI that he has sent for another L1 vCPU. And all other L1 vCPUs are currently attempting to grab mmu_lock. Therefore, all L1 vCPUs are stuck forever (as L1 runs with kernel-preemption disabled). Observing /sys/kernel/debug/tracing/trace_pipe reveals the following series of events: (1) qemu-system-x86-19066 [030] kvm_nested_vmexit: rip: 0xfffff802c5dca82f reason: EPT_VIOLATION ext_inf1: 0x0000000000000182 ext_inf2: 0x00000000800000d2 ext_int: 0x00000000 ext_int_err: 0x00000000 (2) qemu-system-x86-19054 [028] kvm_apic_accept_irq: apicid f vec 252 (Fixed\|edge) (3) qemu-system-x86-19066 [030] kvm_inj_virq: irq 210 (4) qemu-system-x86-19066 [030] kvm_entry: vcpu 15 (5) qemu-system-x86-19066 [030] kvm_exit: reason EPT_VIOLATION rip 0xffffe00069202690 info 83 0 (6) qemu-system-x86-19066 [030] kvm_nested_vmexit: rip: 0xffffe00069202690 reason: EPT_VIOLATION ext_inf1: 0x0000000000000083 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000 (7) qemu-system-x86-19066 [030] kvm_nested_vmexit_inject: reason: EPT_VIOLATION ext_inf1: 0x0000000000000083 ext_inf2: 0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000 (8) qemu-system-x86-19066 [030] kvm_entry: vcpu 15 Which can be analyzed as follows: (1) L2 VMExit to L0 on EPT_VIOLATION during delivery of vector 0xd2. Therefore, vmx_complete_interrupts() will set KVM_REQ_EVENT and reinject a pending-interrupt of 0xd2. (2) L1 sends an IPI of vector 0xfc (CALL_FUNCTION_VECTOR) to destination vCPU 15. This will set relevant bit in LAPIC's IRR and set KVM_REQ_EVENT. (3) L0 reach vcpu_enter_guest() which calls inject_pending_event() which notes that interrupt 0xd2 was reinjected and therefore calls vmx_inject_irq() and returns. Without checking for pending L1 events! Note that at this point, KVM_REQ_EVENT was cleared by vcpu_enter_guest() before calling inject_pending_event(). (4) L0 resumes L2 without immediate-exit even though there is a pending L1 event (The IPI pending in LAPIC's IRR). We have already reached the buggy scenario but events could be furthered analyzed: (5+6) L2 VMExit to L0 on EPT_VIOLATION. This time not during event-delivery. (7) L0 decides to forward the VMExit to L1 for further handling. (8) L0 resumes into L1. Note that because KVM_REQ_EVENT is cleared, the LAPIC's IRR is not examined and therefore the IPI is still not delivered into L1! Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Liran Alon	a042c26fd8	KVM: x86: Fix misleading comments on handling pending exceptions The reason that exception.pending should block re-injection of NMI/interrupt is not described correctly in comment in code. Instead, it describes why a pending exception should be injected before a pending NMI/interrupt. Therefore, move currently present comment to code-block evaluating a new pending event which explains why exception.pending is evaluated first. In addition, create a new comment describing that exception.pending blocks re-injection of NMI/interrupt because the exception was queued by handling vmexit which was due to NMI/interrupt delivery. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@orcle.com> [Used a comment from Sean J <sean.j.christopherson@intel.com>. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Liran Alon	04140b4144	KVM: x86: Rename interrupt.pending to interrupt.injected For exceptions & NMIs events, KVM code use the following coding convention: ) "pending" represents an event that should be injected to guest at some point but it's side-effects have not yet occurred. ) "injected" represents an event that it's side-effects have already occurred. However, interrupts don't conform to this coding convention. All current code flows mark interrupt.pending when it's side-effects have already taken place (For example, bit moved from LAPIC IRR to ISR). Therefore, it makes sense to just rename interrupt.pending to interrupt.injected. This change follows logic of previous commit `664f8e26b0` ("KVM: X86: Fix loss of exception which has not yet been injected") which changed exception to follow this coding convention as well. It is important to note that in case !lapic_in_kernel(vcpu), interrupt.pending usage was and still incorrect. In this case, interrrupt.pending can only be set using one of the following ioctls: KVM_INTERRUPT, KVM_SET_VCPU_EVENTS and KVM_SET_SREGS. Looking at how QEMU uses these ioctls, one can see that QEMU uses them either to re-set an "interrupt.pending" state it has received from KVM (via KVM_GET_VCPU_EVENTS interrupt.pending or via KVM_GET_SREGS interrupt_bitmap) or by dispatching a new interrupt from QEMU's emulated LAPIC which reset bit in IRR and set bit in ISR before sending ioctl to KVM. So it seems that indeed "interrupt.pending" in this case is also suppose to represent "interrupt.injected". However, kvm_cpu_has_interrupt() & kvm_cpu_has_injectable_intr() is misusing (now named) interrupt.injected in order to return if there is a pending interrupt. This leads to nVMX/nSVM not be able to distinguish if it should exit from L2 to L1 on EXTERNAL_INTERRUPT on pending interrupt or should re-inject an injected interrupt. Therefore, add a FIXME at these functions for handling this issue. This patch introduce no semantics change. Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Liran Alon	7c5a6a5970	KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt kvm_inject_realmode_interrupt() is called from one of the injection functions which writes event-injection to VMCS: vmx_queue_exception(), vmx_inject_irq() and vmx_inject_nmi(). All these functions are called just to cause an event-injection to guest. They are not responsible of manipulating the event-pending flag. The only purpose of kvm_inject_realmode_interrupt() should be to emulate real-mode interrupt-injection. This was also incorrect when called from vmx_queue_exception(). Signed-off-by: Liran Alon <liran.alon@oracle.com> Reviewed-by: Nikita Leshenko <nikita.leshchenko@oracle.com> Reviewed-by: Jim Mattson <jmattson@google.com> Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Vitaly Kuznetsov	773e8a0425	x86/kvm: use Enlightened VMCS when running on Hyper-V Enlightened VMCS is just a structure in memory, the main benefit besides avoiding somewhat slower VMREAD/VMWRITE is using clean field mask: we tell the underlying hypervisor which fields were modified since VMEXIT so there's no need to inspect them all. Tight CPUID loop test shows significant speedup: Before: 18890 cycles After: 8304 cycles Static key is being used to avoid performance penalty for non-Hyper-V deployments. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Vitaly Kuznetsov	5431390b30	x86/hyper-v: detect nested features TLFS 5.0 says: "Support for an enlightened VMCS interface is reported with CPUID leaf 0x40000004. If an enlightened VMCS interface is supported, additional nested enlightenments may be discovered by reading the CPUID leaf 0x4000000A (see 2.4.11)." Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Vitaly Kuznetsov	68d1eb72ee	x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits The definitions are according to the Hyper-V TLFS v5.0. KVM on Hyper-V will use these. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Vitaly Kuznetsov	a46d15cc1a	x86/hyper-v: allocate and use Virtual Processor Assist Pages Virtual Processor Assist Pages usage allows us to do optimized EOI processing for APIC, enable Enlightened VMCS support in KVM and more. struct hv_vp_assist_page is defined according to the Hyper-V TLFS v5.0b. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Ladi Prosek	d4abc577bb	x86/kvm: rename HV_X64_MSR_APIC_ASSIST_PAGE to HV_X64_MSR_VP_ASSIST_PAGE The assist page has been used only for the paravirtual EOI so far, hence the "APIC" in the MSR name. Renaming to match the Hyper-V TLFS where it's called "Virtual VP Assist MSR". Signed-off-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Vitaly Kuznetsov	415bd1cd3a	x86/hyper-v: move definitions from TLFS to hyperv-tlfs.h mshyperv.h now only contains fucntions/variables we define in kernel, all definitions from TLFS should go to hyperv-tlfs.h. 'enum hv_cpuid_function' is removed as we already have this info in hyperv-tlfs.h, code in mshyperv.c is adjusted accordingly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Vitaly Kuznetsov	5a48580322	x86/hyper-v: move hyperv.h out of uapi hyperv.h is not part of uapi, there are no (known) users outside of kernel. We are making changes to this file to match current Hyper-V Hypervisor Top-Level Functional Specification (TLFS, see: https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs) and we don't want to maintain backwards compatibility. Move the file renaming to hyperv-tlfs.h to avoid confusing it with mshyperv.h. In future, all definitions from TLFS should go to it and all kernel objects should go to mshyperv.h or include/linux/hyperv.h. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Wanpeng Li	34226b6b70	KVM: X86: Fix setup the virt_spin_lock_key before static key get initialized static_key_disable_cpuslocked(): static key 'virt_spin_lock_key+0x0/0x20' used before call to jump_label_init() WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:161 static_key_disable_cpuslocked+0x61/0x80 RIP: 0010:static_key_disable_cpuslocked+0x61/0x80 Call Trace: static_key_disable+0x16/0x20 start_kernel+0x192/0x4b3 secondary_startup_64+0xa5/0xb0 Qspinlock will be choosed when dedicated pCPUs are available, however, the static virt_spin_lock_key is set in kvm_spinlock_init() before jump_label_init() has been called, which will result in a WARN(). This patch fixes it by delaying the virt_spin_lock_key setup to .smp_prepare_cpus(). Reported-by: Davidlohr Bueso <dbueso@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Fixes: `b2798ba0b8` ("KVM: X86: Choose qspinlock when dedicated physical CPUs are available") Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Babu Moger	8566ac8b8e	KVM: SVM: Implement pause loop exit logic in SVM Bring the PLE(pause loop exit) logic to AMD svm driver. While testing, we found this helping in situations where numerous pauses are generated. Without these patches we could see continuos VMEXITS due to pause interceptions. Tested it on AMD EPYC server with boot parameter idle=poll on a VM with 32 vcpus to simulate extensive pause behaviour. Here are VMEXITS in 10 seconds interval. Pauses 810199 504 Total 882184 325415 Signed-off-by: Babu Moger <babu.moger@amd.com> [Prevented the window from dropping below the initial value. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Babu Moger	1d8fb44a72	KVM: SVM: Add pause filter threshold This patch adds the support for pause filtering threshold. This feature support is indicated by CPUID Fn8000_000A_EDX. See AMD APM Vol 2 Section 15.14.4 Pause Intercept Filtering for more details. In this mode, a 16-bit pause filter threshold field is added in VMCB. The threshold value is a cycle count that is used to reset the pause counter. As with simple pause filtering, VMRUN loads the pause count value from VMCB into an internal counter. Then, on each pause instruction the hardware checks the elapsed number of cycles since the most recent pause instruction against the pause Filter Threshold. If the elapsed cycle count is greater than the pause filter threshold, then the internal pause count is reloaded from VMCB and execution continues. If the elapsed cycle count is less than the pause filter threshold, then the internal pause count is decremented. If the count value is less than zero and pause intercept is enabled, a #VMEXIT is triggered. If advanced pause filtering is supported and pause filter threshold field is set to zero, the filter will operate in the simpler, count only mode. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Babu Moger	c8e88717cf	KVM: VMX: Bring the common code to header file This patch brings some of the code from vmx to x86.h header file. Now, we can share this code between vmx and svm. Modified couple functions to make it common. Signed-off-by: Babu Moger <babu.moger@amd.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Babu Moger	18abdc3425	KVM: VMX: Remove ple_window_actual_max Get rid of ple_window_actual_max, because its benefits are really minuscule and the logic is complicated. The overflows(and underflow) are controlled in __ple_window_grow and _ple_window_shrink respectively. Suggested-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Babu Moger <babu.moger@amd.com> [Fixed potential wraparound and change the max to UINT_MAX. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Babu Moger	7fbc85a5fb	KVM: VMX: Fix the module parameters for vmx The vmx module parameters are supposed to be unsigned variants. Also fixed the checkpatch errors like the one below. WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. +module_param(ple_gap, uint, S_IRUGO); Signed-off-by: Babu Moger <babu.moger@amd.com> [Expanded uint to unsigned int in code. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 22:47:06 +02:00
Yazen Ghannam	e2efacb6a5	Revert "x86/mce/AMD: Collect error info even if valid bits are not set" This reverts commit `4b1e84276a`. Software uses the valid bits to decide if the values can be used for further processing or other actions. So setting the valid bits will have software act on values that it shouldn't be acting on. The recommendation to save all the register values does not mean that the values are always valid. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: tony.luck@intel.com Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: bp@suse.de Cc: linux-edac@vger.kernel.org Link: https://lkml.kernel.org/r/20180326191526.64314-1-Yazen.Ghannam@amd.com	2018-03-28 20:34:59 +02:00
mike.travis@hpe.com	bd47a85acd	x86/platform/UV: Fix critical UV MMR address error A critical error was found testing the fixed UV4 HUB in that an MMR address was found to be incorrect. This causes the virtual address space for accessing the MMIOH1 region to be allocated with the incorrect size. Fixes: `673aa20c55` ("x86/platform/UV: Update uv_mmrs.h to prepare for UV4A fixes") Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Russ Anderson <russ.anderson@hpe.com> Cc: Andrew Banman <andrew.banman@hpe.com> Link: https://lkml.kernel.org/r/20180328174011.041801248@stormcage.americas.sgi.com	2018-03-28 20:19:45 +02:00
Andi Kleen	dd60d21706	KVM: x86: Fix perf timer mode IP reporting KVM and perf have a special backdoor mechanism to report the IP for interrupts re-executed after vm exit. This works for the NMIs that perf normally uses. However when perf is in timer mode it doesn't work because the timer interrupt doesn't get this special treatment. This is common when KVM is running nested in another hypervisor which may not implement the PMU, so only timer mode is available. Call the functions to set up the backdoor IP also for non NMI interrupts. I renamed the functions to set up the backdoor IP reporting to be more appropiate for their new use. The SVM change is only compile tested. v2: Moved the functions inline. For the normal interrupt case the before/after functions are now called from x86.c, not arch specific code. For the NMI case we still need to call it in the architecture specific code, because it's already needed in the low level *_run functions. Signed-off-by: Andi Kleen <ak@linux.intel.com> [Removed unnecessary calls from arch handle_external_intr. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 16:12:59 +02:00
Wanpeng Li	17a1079d9c	KVM: x86: Fix pv tlb flush dependencies PV TLB FLUSH can only be turned on when steal time is enabled. The condition got reversed during conflict resolution. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Fixes: `4f2f61fc50` ("KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is disabled") [Rebased on top of kvm/master and reworded the commit message. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>	2018-03-28 15:53:34 +02:00
Greg Kroah-Hartman	b24d0d5b12	Merge 4.16-rc7 into char-misc-next We want the hyperv fix in here for merging and testing. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2018-03-28 12:27:35 +02:00
Tom Lendacky	07344b15a9	x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFT In arch/x86/boot/compressed/kaslr_64.c, CONFIG_AMD_MEM_ENCRYPT support was initially #undef'd to support SME with minimal effort. When support for SEV was added, the #undef remained and some minimal support for setting the encryption bit was added for building identity mapped pagetable entries. Commit `b83ce5ee91` ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52") changed __PHYSICAL_MASK_SHIFT from 46 to 52 in support of 5-level paging. This change resulted in SEV guests failing to boot because the encryption bit was no longer being automatically masked out. The compressed boot path now requires sme_me_mask to be defined in order for the pagetable functions, such as pud_present(), to properly mask out the encryption bit (currently bit 47) when evaluating pagetable entries. Add an sme_me_mask variable in arch/x86/boot/compressed/mem_encrypt.S, which is set when SEV is active, delete the #undef CONFIG_AMD_MEM_ENCRYPT from arch/x86/boot/compressed/kaslr_64.c and use sme_me_mask when building the identify mapped pagetable entries. Fixes: `b83ce5ee91` ("x86/mm/64: Make __PHYSICAL_MASK_SHIFT always 52") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180327220711.8702.55842.stgit@tlendack-t1.amdoffice.net	2018-03-28 10:42:57 +02:00
Andrew Banman	151ad17fbe	x86/platform/uv/BAU: Add APIC idt entry BAU uses the old alloc_initr_gate90 method to setup its interrupt. This fails silently as the BAU vector is in the range of APIC vectors that are registered to the spurious interrupt handler. As a consequence BAU broadcasts are not handled, and the broadcast source CPU hangs. Update BAU to use new idt structure. Fixes: `dc20b2d526` ("x86/idt: Move interrupt gate initialization to IDT code") Signed-off-by: Andrew Banman <abanman@hpe.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Mike Travis <mike.travis@hpe.com> Cc: Dimitri Sivanich <sivanich@hpe.com> Cc: Russ Anderson <rja@hpe.com> Cc: stable@vger.kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/1522188546-196177-1-git-send-email-abanman@hpe.com	2018-03-28 10:40:55 +02:00
Eric Dumazet	9b9a51354c	x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as well When changing rdmsr_safe_on_cpu() to schedule, it was missed that __rdmsr_safe_on_cpu() was also used by rdmsrl_safe_on_cpu() Make rdmsrl_safe_on_cpu() a wrapper instead of copy/pasting the code which was added for the completion handling. Fixes: `07cde313b2` ("x86/msr: Allow rdmsr_safe_on_cpu() to schedule") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lkml.kernel.org/r/20180328032233.153055-1-edumazet@google.com	2018-03-28 10:34:13 +02:00
Dave Airlie	2b4f44eec2	Linux 4.16-rc7 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJauCZfAAoJEHm+PkMAQRiGWGUH/2rhdQDkoJpYWnjaQkolECG8 MxpGE7nmIIHxQcbSDdHTGJ8IhVm6Z5wZ7ym/PwCDTT043Y1y341sJrIwL2/nTG6d HVidk8hFvgN6QzlzVAHT3ZZMII/V9Zt+VV5SUYLGnPAVuJNHo/6uzWlTU5g+NTFo IquFDdQUaGBlkKqby+NoAFnkV1UAIkW0g22cfvPnlO5GMer0gusGyVNvVp7TNj3C sqj4Hvt3RMDLMNe9RZ2pFTiOD096n8FWpYftZneUTxFImhRV3Jg5MaaYZm9SI3HW tXrv/LChT/F1mi5Pkx6tkT5Hr8WvcrwDMJ4It1kom10RqWAgjxIR3CMm448ileY= =YKUG -----END PGP SIGNATURE----- Backmerge tag 'v4.16-rc7' into drm-next Linux 4.16-rc7 This was requested by Daniel, and things were getting a bit hard to reconcile, most of the conflicts were trivial though.	2018-03-28 14:30:41 +10:00
Mark Brown	5b6d7104f6	Merge remote-tracking branch 'asoc/topic/intel' into asoc-next	2018-03-28 10:26:09 +08:00
Eric Dumazet	67bbd7a8d6	x86/cpuid: Allow cpuid_read() to schedule High latencies can be observed caused by a daemon periodically reading CPUID on all cpus. On KASAN enabled kernels ~10ms latencies can be observed. Even without KASAN, sending an IPI to a CPU, which is in a deep sleep state or in a long hard IRQ disabled section, waiting for the answer can consume hundreds of microseconds. cpuid_read() is invoked in preemptible context, so it can be converted to sleep instead of busy wait. Switching to smp_call_function_single_async() and a completion allows to reschedule and reduces CPU usage and latencies. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Link: https://lkml.kernel.org/r/20180323215818.127774-2-edumazet@google.com	2018-03-27 12:01:48 +02:00
Eric Dumazet	07cde313b2	x86/msr: Allow rdmsr_safe_on_cpu() to schedule High latencies can be observed caused by a daemon periodically reading various MSR on all cpus. On KASAN enabled kernels ~10ms latencies can be observed simply reading one MSR. Even without KASAN, sending an IPI to a CPU, which is in a deep sleep state or in a long hard IRQ disabled section, waiting for the answer can consume hundreds of microseconds. All usage sites are in preemptible context, convert rdmsr_safe_on_cpu() to use a completion instead of busy polling. Overall daemon cpu usage was reduced by 35 %, and latencies caused by msr_read() disappeared. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Link: https://lkml.kernel.org/r/20180323215818.127774-1-edumazet@google.com	2018-03-27 12:01:47 +02:00
Kirill A. Shutemov	547edaca24	x86/mm: Update comment in detect_tme() regarding x86_phys_bits As Kai pointed out, the primary reason for adjusting x86_phys_bits is to reflect that the the address space is reduced and not the ability to communicate the available physical address space to virtual machines. Suggested-by: Kai Huang <kai.huang@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: linux-mm@kvack.org Link: https://lkml.kernel.org/r/20180315134907.9311-2-kirill.shutemov@linux.intel.com	2018-03-27 11:49:58 +02:00
Andy Shevchenko	47a9973d3e	x86/PCI: Fix a potential regression when using dmi_get_bios_year() dmi_get_bios_year() may return 0 when it cannot parse the BIOS date string. Previously this has been checked in pci_acpi_crs_quirks(). Update the code to restore old behaviour. Reported-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Lukas Wunner <lukas@wunner.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rjw@rjwysocki.net> Cc: linux-acpi@vger.kernel.org Cc: linux-pci@vger.kernel.org Fixes: `69c42d493d` ("x86/pci: Simplify code by using the new dmi_get_bios_year() helper") Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-27 11:33:16 +02:00
Jaak Ristioja	1897a9691e	Documentation: Fix early-microcode.txt references after file rename The file Documentation/x86/early-microcode.txt was renamed to Documentation/x86/microcode.txt in `0e3258753f`, but it was still referenced by its old name in a three places: * Documentation/x86/00-INDEX * arch/x86/Kconfig * arch/x86/kernel/cpu/microcode/amd.c This commit updates these references accordingly. Fixes: `0e3258753f` ("x86/microcode: Document the three loading methods") Signed-off-by: Jaak Ristioja <jaak@ristioja.ee> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2018-03-27 09:51:23 +02:00
Alexey Dobriyan	bd6271039e	x86/alternatives: Fixup alternative_call_2 The following pattern fails to compile while the same pattern with alternative_call() does: if (...) alternative_call_2(...); else alternative_call_2(...); as it expands into if (...) { }; <=== else { }; Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20180114120504.GA11368@avx2	2018-03-27 09:47:53 +02:00
David Rientjes	fc5d1073ca	x86/mm/32: Remove unused node_memmap_size_bytes() & CONFIG_NEED_NODE_MEMMAP_SIZE logic node_memmap_size_bytes() has been unused since the v3.9 kernel, so remove it. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Fixes: `f03574f2d5` ("x86-32, mm: Rip out x86_32 NUMA remapping code") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803262325540.256524@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-27 08:45:02 +02:00
Ingo Molnar	0bc91d4ba7	Linux 4.16-rc7 -----BEGIN PGP SIGNATURE----- iQEcBAABAgAGBQJauCZfAAoJEHm+PkMAQRiGWGUH/2rhdQDkoJpYWnjaQkolECG8 MxpGE7nmIIHxQcbSDdHTGJ8IhVm6Z5wZ7ym/PwCDTT043Y1y341sJrIwL2/nTG6d HVidk8hFvgN6QzlzVAHT3ZZMII/V9Zt+VV5SUYLGnPAVuJNHo/6uzWlTU5g+NTFo IquFDdQUaGBlkKqby+NoAFnkV1UAIkW0g22cfvPnlO5GMer0gusGyVNvVp7TNj3C sqj4Hvt3RMDLMNe9RZ2pFTiOD096n8FWpYftZneUTxFImhRV3Jg5MaaYZm9SI3HW tXrv/LChT/F1mi5Pkx6tkT5Hr8WvcrwDMJ4It1kom10RqWAgjxIR3CMm448ileY= =YKUG -----END PGP SIGNATURE----- Merge tag 'v4.16-rc7' into x86/mm, to fix up conflict Conflicts: arch/x86/mm/init_64.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-27 08:43:39 +02:00
Stephane Eranian	71eb9ee959	perf/x86/intel: Fix linear IP of PEBS real_ip on Haswell and later CPUs this patch fix a bug in how the pebs->real_ip is handled in the PEBS handler. real_ip only exists in Haswell and later processor. It is actually the eventing IP, i.e., where the event occurred. As opposed to the pebs->ip which is the PEBS interrupt IP which is always off by one. The problem is that the real_ip just like the IP needs to be fixed up because PEBS does not record all the machine state registers, and in particular the code segement (cs). This is why we have the set_linear_ip() function. The problem was that set_linear_ip() was only used on the pebs->ip and not the pebs->real_ip. We have profiles which ran into invalid callstacks because of this. Here is an example: ..... 0: ffffffffffffff80 recent entry, marker kernel v ..... 1: 000000000040044d <= user address in kernel space! ..... 2: fffffffffffffe00 marker enter user v ..... 3: 000000000040044d ..... 4: 00000000004004b6 oldest entry Debugging output in get_perf_callchain(): [ 857.769909] CALLCHAIN: CPU8 ip=40044d regs->cs=10 user_mode(regs)=0 The problem is that the kernel entry in 1: points to a user level address. How can that be? The reason is that with PEBS sampling the instruction that caused the event to occur and the instruction where the CPU was when the interrupt was posted may be far apart. And sometime during that time window, the privilege level may change. This happens, for instance, when the PEBS sample is taken close to a kernel entry point. Here PEBS, eventing IP (real_ip) captured a user level instruction. But by the time the PMU interrupt fired, the processor had already entered kernel space. This is why the debug output shows a user address with user_mode() false. The problem comes from PEBS not recording the code segment (cs) register. The register is used in x86_64 to determine if executing in kernel vs user space. This is okay because the kernel has a software workaround called set_linear_ip(). But the issue in setup_pebs_sample_data() is that set_linear_ip() is never called on the real_ip value when it is available (Haswell and later) and precise_ip > 1. This patch fixes this problem and eliminates the callchain discrepancy. The patch restructures the code around set_linear_ip() to minimize the number of times the IP has to be set. Signed-off-by: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1521788507-10231-1-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-27 08:27:27 +02:00
Davidlohr Bueso	631fe154ed	perf/x86: Update rdpmc_always_available static key to the modern API No changes in refcount semantics -- use DEFINE_STATIC_KEY_FALSE() for initialization and replace: static_key_slow_inc\|dec() => static_branch_inc\|dec() static_key_false() => static_branch_unlikely() Added a '_key' suffix to rdpmc_always_available, for better self-documentation. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Link: http://lkml.kernel.org/r/20180326210929.5244-5-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-27 07:53:00 +02:00
Ivan Gorinov	4e07db9c8d	x86/devicetree: Use CPU description from Device Tree Current x86 Device Tree implementation does not support multiprocessing. Use new DT bindings to describe the processors. Signed-off-by: Ivan Gorinov <ivan.gorinov@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Frank Rowand <frowand.list@gmail.com> Link: https://lkml.kernel.org/r/c291fb2cef51b730b59916d7745be0eaa4378c6c.1521753738.git.ivan.gorinov@intel.com	2018-03-26 15:13:32 +02:00
Joe Perches	447a5647c9	treewide: Align function definition open/close braces Some functions definitions have either the initial open brace and/or the closing brace outside of column 1. Move those braces to column 1. This allows various function analyzers like gnu complexity to work properly for these modified functions. Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Nicolin Chen <nicoleotsuka@gmail.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2018-03-26 11:13:09 +02:00
David Rientjes	e25283bf83	x86/apic: Finish removing unused callbacks The ->cpu_mask_to_apicid() and ->vector_allocation_domain() callbacks are now unused, so remove them. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: `baab1e84b1` ("x86/apic: Remove unused callbacks") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803251403540.80485@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-26 08:49:38 +02:00
Linus Torvalds	d2862360bf	Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 and PTI fixes from Ingo Molnar: "Misc fixes: - fix EFI pagetables freeing - fix vsyscall pagetable setting on Xen PV guests - remove ancient CONFIG_X86_PPRO_FENCE=y - x86 is TSO again - fix two binutils (ld) development version related incompatibilities - clean up breakpoint handling - fix an x86 self-test" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry/64: Don't use IST entry for #BP stack x86/efi: Free efi_pgd with free_pages() x86/vsyscall/64: Use proper accessor to update P4D entry x86/cpu: Remove the CONFIG_X86_PPRO_FENCE=y quirk x86/boot/64: Verify alignment of the LOAD segment x86/build/64: Force the linker to use 2MB page size selftests/x86/ptrace_syscall: Fix for yet more glibc interference	2018-03-25 07:36:02 -10:00
Linus Torvalds	eaf67993f5	Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "Misc kernel side fixes. Generic: - cgroup events counting fix x86: - Intel PMU truncated-parameter fix - RDPMC fix - API naming fix/rename - uncore driver big-hardware PCI enumeration fix - uncore driver filter constraint fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/cgroup: Fix child event counting bug perf/x86/intel/uncore: Fix multi-domain PCI CHA enumeration bug on Skylake servers perf/x86/intel: Rename confusing 'freerunning PEBS' API and implementation to 'large PEBS' perf/x86/intel/uncore: Add missing filter constraint for SKX CHA event perf/x86/intel: Don't accidentally clear high bits in bdw_limit_period() perf/x86/intel: Disable userspace RDPMC usage for large PEBS	2018-03-25 07:27:32 -10:00
Sven Wegener	e847f6aaf6	x86/purgatory: Avoid creating stray .<pid>.d files, remove -MD from KBUILD_CFLAGS The kernel build system already takes care of generating the dependency files. Having the additional -MD in KBUILD_CFLAGS leads to stray .<pid>.d files in the build directory when we call the cc-option macro. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthias Kaehlcke <mka@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vivek Goyal <vgoyal@redhat.com> Link: http://lkml.kernel.org/r/alpine.LNX.2.21.1803242219380.30139@titan.int.lan.stealer.net Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-25 11:04:02 +02:00
Ingo Molnar	ea2301b622	Merge branch 'linus' into x86/dma, to resolve a conflict with upstream Conflicts: arch/x86/mm/init_64.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-24 09:25:26 +01:00
Ingo Molnar	7054e4e0b1	Merge branch 'perf/urgent' into perf/core, to pick up fixes With the cherry-picked perf/urgent commit merged separately we can now merge all the fixes without conflicts. Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-24 09:21:47 +01:00
Andy Lutomirski	d8ba61ba58	x86/entry/64: Don't use IST entry for #BP stack There's nothing IST-worthy about #BP/int3. We don't allow kprobes in the small handful of places in the kernel that run at CPL0 with an invalid stack, and 32-bit kernels have used normal interrupt gates for #BP forever. Furthermore, we don't allow kprobes in places that have usergs while in kernel mode, so "paranoid" is also unnecessary. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org	2018-03-23 21:10:36 +01:00
Waiman Long	06ace26f4e	x86/efi: Free efi_pgd with free_pages() The efi_pgd is allocated as PGD_ALLOCATION_ORDER pages and therefore must also be freed as PGD_ALLOCATION_ORDER pages with free_pages(). Fixes: `d9e9a64180` ("x86/mm/pti: Allocate a separate user PGD") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1521746333-19593-1-git-send-email-longman@redhat.com	2018-03-23 20:18:31 +01:00
Dan Carpenter	d32ef547fd	kvm: x86: hyperv: delete dead code in kvm_hv_hypercall() "rep_done" is always zero so the "(((u64)rep_done & 0xfff) << 32)" expression is just zero. We can remove the "res" temporary variable as well and just use "ret" directly. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-23 20:11:01 +01:00
Thomas Gleixner	ea89c06548	x86/tsc: Get rid of rdtscll() Commit `99770737ca` ("x86/asm/tsc: Add rdtscll() merge helper") added rdtscll() in August 2015 along with the comment: /* Deprecated, keep it for a cycle for easier merging: */ 12 cycles later it's really overdue for removal. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2018-03-23 20:07:54 +01:00
Sean Christopherson	81811c162d	KVM: SVM: add struct kvm_svm to hold SVM specific KVM vars Add struct kvm_svm, which is analagous to struct vcpu_svm, along with a helper to_kvm_svm() to retrieve kvm_svm from a struct kvm *. Move the SVM specific variables and struct definitions out of kvm_arch and into kvm_svm. Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-23 18:32:19 +01:00
Sean Christopherson	40bbb9d03f	KVM: VMX: add struct kvm_vmx to hold VMX specific KVM vars Add struct kvm_vmx, which wraps struct kvm, and a helper to_kvm_vmx() that retrieves 'struct kvm_vmx ' from 'struct kvm '. Move the VMX specific variables out of kvm_arch and into kvm_vmx. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-23 18:32:03 +01:00
Sean Christopherson	2ac52ab861	KVM: x86: move setting of ept_identity_map_addr to vmx.c Add kvm_x86_ops->set_identity_map_addr and set ept_identity_map_addr in VMX specific code so that ept_identity_map_addr can be moved out of 'struct kvm_arch' in a future patch. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-23 18:30:47 +01:00
Sean Christopherson	434a1e9446	KVM: x86: define SVM/VMX specific kvm_arch_[alloc\|free]_vm Define kvm_arch_[alloc\|free]_vm in x86 as pass through functions to new kvm_x86_ops vm_alloc and vm_free, and move the current allocation logic as-is to SVM and VMX. Vendor specific alloc/free functions set the stage for SVM/VMX wrappers of 'struct kvm', which will allow us to move the growing number of SVM/VMX specific member variables out of 'struct kvm_arch'. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-23 18:30:44 +01:00
Sean Christopherson	9d1887ef32	KVM: nVMX: sync vmcs02 segment regs prior to vmx_set_cr0 Segment registers must be synchronized prior to any code that may trigger a call to emulation_required()/guest_state_valid(), e.g. vmx_set_cr0(). Because preparing vmcs02 writes segmentation fields directly, i.e. doesn't use vmx_set_segment(), emulation_required will not be re-evaluated when synchronizing the segment registers, which can result in L0 incorrectly starting emulation of L2. Fixes: `8665c3f973` ("KVM: nVMX: initialize descriptor cache fields in prepare_vmcs02_full") Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> [Move all of prepare_vmcs02_full earlier, not just segment registers. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-23 18:26:21 +01:00
David S. Miller	03fe2debbb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Fun set of conflict resolutions here... For the mac80211 stuff, these were fortunately just parallel adds. Trivially resolved. In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the function phy_disable_interrupts() earlier in the file, whilst in 'net-next' the phy_error() call from this function was removed. In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the 'rt_table_id' member of rtable collided with a bug fix in 'net' that added a new struct member "rt_mtu_locked" which needs to be copied over here. The mlxsw driver conflict consisted of net-next separating the span code and definitions into separate files, whilst a 'net' bug fix made some changes to that moved code. The mlx5 infiniband conflict resolution was quite non-trivial, the RDMA tree's merge commit was used as a guide here, and here are their notes: ==================== Due to bug fixes found by the syzkaller bot and taken into the for-rc branch after development for the 4.17 merge window had already started being taken into the for-next branch, there were fairly non-trivial merge issues that would need to be resolved between the for-rc branch and the for-next branch. This merge resolves those conflicts and provides a unified base upon which ongoing development for 4.17 can be based. Conflicts: drivers/infiniband/hw/mlx5/main.c - Commit `42cea83f95` (IB/mlx5: Fix cleanup order on unload) added to for-rc and commit `b5ca15ad7e` (IB/mlx5: Add proper representors support) add as part of the devel cycle both needed to modify the init/de-init functions used by mlx5. To support the new representors, the new functions added by the cleanup patch needed to be made non-static, and the init/de-init list added by the representors patch needed to be modified to match the init/de-init list changes made by the cleanup patch. Updates: drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function prototypes added by representors patch to reflect new function names as changed by cleanup patch drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init stage list to match new order from cleanup patch ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2018-03-23 11:31:58 -04:00
Linus Torvalds	f36b7534b8	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm, thp: do not cause memcg oom for thp mm/vmscan: wake up flushers for legacy cgroups too Revert "mm: page_alloc: skip over regions of invalid pfns where possible" mm/shmem: do not wait for lock_page() in shmem_unused_huge_shrink() mm/thp: do not wait for lock_page() in deferred_split_scan() mm/khugepaged.c: convert VM_BUG_ON() to collapse fail x86/mm: implement free pmd/pte page interfaces mm/vmalloc: add interfaces to free unmapped page table h8300: remove extraneous __BIG_ENDIAN definition hugetlbfs: check for pgoff value overflow lockdep: fix fs_reclaim warning MAINTAINERS: update Mark Fasheh's e-mail mm/mempolicy.c: avoid use uninitialized preferred_node	2018-03-22 18:48:43 -07:00
Linus Torvalds	8401c72c59	Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "Two regression fixes, two bug fixes for older issues, two fixes for new functionality added this cycle that have userspace ABI concerns, and a small cleanup. These have appeared in a linux-next release and have a build success report from the 0day robot. * The 4.16 rework of altmap handling led to some configurations leaking page table allocations due to freeing from the altmap reservation rather than the page allocator. The impact without the fix is leaked memory and a WARN() message when tearing down libnvdimm namespaces. The rework also missed a place where error handling code needed to be removed that can lead to a crash if devm_memremap_pages() fails. * acpi_map_pxm_to_node() had a latent bug whereby it could misidentify the closest online node to a given proximity domain. * Block integrity handling was reworked several kernels back to allow calling add_disk() after setting up the integrity profile. The nd_btt and nd_blk drivers are just now catching up to fix automatic partition detection at driver load time. * The new peristence_domain attribute, a platform indicator of whether cpu caches are powerfail protected for example, is meant to be a single value enum and not a set of flags. This oversight was caught while reviewing new userspace code in libndctl to communicate the attribute. Fix this new enabling up so that we are not stuck with an unwanted userspace ABI" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm, nfit: fix persistence domain reporting libnvdimm, region: hide persistence_domain when unknown acpi, numa: fix pxm to online numa node associations x86, memremap: fix altmap accounting at free libnvdimm: remove redundant assignment to pointer 'dev' libnvdimm, {btt, blk}: do integrity setup before add_disk() kernel/memremap: Remove stale devres_free() call	2018-03-22 18:37:49 -07:00
Toshi Kani	28ee90fe60	x86/mm: implement free pmd/pte page interfaces Implement pud_free_pmd_page() and pmd_free_pte_page() on x86, which clear a given pud/pmd entry and free up lower level page table(s). The address range associated with the pud/pmd entry must have been purged by INVLPG. Link: http://lkml.kernel.org/r/20180314180155.19492-3-toshi.kani@hpe.com Fixes: `e61ce6ade4` ("mm: change ioremap to set up huge I/O mappings") Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Reported-by: Lei Li <lious.lilei@hisilicon.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-03-22 17:07:01 -07:00
Toshi Kani	b6bdb7517c	mm/vmalloc: add interfaces to free unmapped page table On architectures with CONFIG_HAVE_ARCH_HUGE_VMAP set, ioremap() may create pud/pmd mappings. A kernel panic was observed on arm64 systems with Cortex-A75 in the following steps as described by Hanjun Guo. 1. ioremap a 4K size, valid page table will build, 2. iounmap it, pte0 will set to 0; 3. ioremap the same address with 2M size, pgd/pmd is unchanged, then set the a new value for pmd; 4. pte0 is leaked; 5. CPU may meet exception because the old pmd is still in TLB, which will lead to kernel panic. This panic is not reproducible on x86. INVLPG, called from iounmap, purges all levels of entries associated with purged address on x86. x86 still has memory leak. The patch changes the ioremap path to free unmapped page table(s) since doing so in the unmap path has the following issues: - The iounmap() path is shared with vunmap(). Since vmap() only supports pte mappings, making vunmap() to free a pte page is an overhead for regular vmap users as they do not need a pte page freed up. - Checking if all entries in a pte page are cleared in the unmap path is racy, and serializing this check is expensive. - The unmap path calls free_vmap_area_noflush() to do lazy TLB purges. Clearing a pud/pmd entry before the lazy TLB purges needs extra TLB purge. Add two interfaces, pud_free_pmd_page() and pmd_free_pte_page(), which clear a given pud/pmd entry and free up a page for the lower level entries. This patch implements their stub functions on x86 and arm64, which work as workaround. [akpm@linux-foundation.org: fix typo in pmd_free_pte_page() stub] Link: http://lkml.kernel.org/r/20180314180155.19492-2-toshi.kani@hpe.com Fixes: `e61ce6ade4` ("mm: change ioremap to set up huge I/O mappings") Reported-by: Lei Li <lious.lilei@hisilicon.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Wang Xuefeng <wxf.wang@hisilicon.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: Matthew Wilcox <willy@infradead.org> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-03-22 17:07:01 -07:00
Linus Torvalds	c4f4d2f917	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Always validate XFRM esn replay attribute, from Florian Westphal. 2) Fix RCU read lock imbalance in xfrm_get_tos(), from Xin Long. 3) Don't try to get firmware dump if not loaded in iwlwifi, from Shaul Triebitz. 4) Fix BPF helpers to deal with SCTP GSO SKBs properly, from Daniel Axtens. 5) Fix some interrupt handling issues in e1000e driver, from Benjamin Poitier. 6) Use strlcpy() in several ethtool get_strings methods, from Florian Fainelli. 7) Fix rhlist dup insertion, from Paul Blakey. 8) Fix SKB leak in netem packet scheduler, from Alexey Kodanev. 9) Fix driver unload crash when link is up in smsc911x, from Jeremy Linton. 10) Purge out invalid socket types in l2tp_tunnel_create(), from Eric Dumazet. 11) Need to purge the write queue when TCP connections are aborted, otherwise userspace using MSG_ZEROCOPY can't close the fd. From Soheil Hassas Yeganeh. 12) Fix double free in error path of team driver, from Arkadi Sharshevsky. 13) Filter fixes for hv_netvsc driver, from Stephen Hemminger. 14) Fix non-linear packet access in ipv6 ndisc code, from Lorenzo Bianconi. 15) Properly filter out unsupported feature flags in macvlan driver, from Shannon Nelson. 16) Don't request loading the diag module for a protocol if the protocol itself is not even registered. From Xin Long. 17) If datagram connect fails in ipv6, make sure the socket state is consistent afterwards. From Paolo Abeni. 18) Use after free in qed driver, from Dan Carpenter. 19) If received ipv4 PMTU is less than the min pmtu, lock the mtu in the entry. From Sabrina Dubroca. 20) Fix sleep in atomic in tg3 driver, from Jonathan Toppins. 21) Fix vlan in vlan untagging in some situations, from Toshiaki Makita. 22) Fix double SKB free in genlmsg_mcast(). From Nicolas Dichtel. 23) Fix NULL derefs in error paths of tcf__init(), from Davide Caratti. 24) Unbalanced PM runtime calls in FEC driver, from Florian Fainelli. 25) Memory leak in gemini driver, from Igor Pylypiv. 26) IDR leaks in error paths of tcf__init() functions, from Davide Caratti. 27) Need to use GFP_ATOMIC in seg6_build_state(), from David Lebrun. 28) Missing dev_put() in error path of macsec_newlink(), from Dan Carpenter. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (201 commits) macsec: missing dev_put() on error in macsec_newlink() net: dsa: Fix functional dsa-loop dependency on FIXED_PHY hv_netvsc: common detach logic hv_netvsc: change GPAD teardown order on older versions hv_netvsc: use RCU to fix concurrent rx and queue changes hv_netvsc: disable NAPI before channel close net/ipv6: Handle onlink flag with multipath routes ppp: avoid loop in xmit recursion detection code ipv6: sr: fix NULL pointer dereference when setting encap source address ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state net: aquantia: driver version bump net: aquantia: Implement pci shutdown callback net: aquantia: Allow live mac address changes net: aquantia: Add tx clean budget and valid budget handling logic net: aquantia: Change inefficient wait loop on fw data reads net: aquantia: Fix a regression with reset on old firmware net: aquantia: Fix hardware reset when SPI may rarely hangup s390/qeth: on channel error, reject further cmd requests s390/qeth: lock read device while queueing next buffer s390/qeth: when thread completes, wake up all waiters ...	2018-03-22 14:10:29 -07:00
Paolo Bonzini	3184a995f7	KVM: nVMX: fix vmentry failure code when L2 state would require emulation Commit `2bb8cafea8` ("KVM: vVMX: signal failure for nested VMEntry if emulation_required", 2018-03-12) introduces a new error path which does not set *entry_failure_code. Fix that to avoid a leak of L0 stack to L1. Reported-by: Radim Krčmář <rkrcmar@redhat.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-21 14:20:33 +01:00
Liran Alon	e40ff1d660	KVM: nVMX: Do not load EOI-exitmap while running L2 When L1 IOAPIC redirection-table is written, a request of KVM_REQ_SCAN_IOAPIC is set on all vCPUs. This is done such that all vCPUs will now recalc their IOAPIC handled vectors and load it to their EOI-exitmap. However, it could be that one of the vCPUs is currently running L2. In this case, load_eoi_exitmap() will be called which would write to vmcs02->eoi_exit_bitmap, which is wrong because vmcs02->eoi_exit_bitmap should always be equal to vmcs12->eoi_exit_bitmap. Furthermore, at this point KVM_REQ_SCAN_IOAPIC was already consumed and therefore we will never update vmcs01->eoi_exit_bitmap. This could lead to remote_irr of some IOAPIC level-triggered entry to remain set forever. Fix this issue by delaying the load of EOI-exitmap to when vCPU is running L1. One may wonder why not just delay entire KVM_REQ_SCAN_IOAPIC processing to when vCPU is running L1. This is done in order to handle correctly the case where LAPIC & IO-APIC of L1 is pass-throughed into L2. In this case, vmcs12->virtual_interrupt_delivery should be 0. In current nVMX implementation, that results in vmcs02->virtual_interrupt_delivery to also be 0. Thus, vmcs02->eoi_exit_bitmap is not used. Therefore, every L2 EOI cause a #VMExit into L0 (either on MSR_WRITE to x2APIC MSR or APIC_ACCESS/APIC_WRITE/EPT_MISCONFIG to APIC MMIO page). In order for such L2 EOI to be broadcasted, if needed, from LAPIC to IO-APIC, vcpu->arch.ioapic_handled_vectors must be updated while L2 is running. Therefore, patch makes sure to delay only the loading of EOI-exitmap but not the update of vcpu->arch.ioapic_handled_vectors. Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2018-03-21 14:16:44 +01:00
Jason Andryuk	36104cb901	x86/xen: Delay get_cpu_cap until stack canary is established Commit `2cc42bac1c` ("x86-64/Xen: eliminate W+X mappings") introduced a call to get_cpu_cap, which is fstack-protected. This is works on x86-64 as commit `4f277295e5` ("x86/xen: init %gs very early to avoid page faults with stack protector") ensures the stack protector is configured, but it it did not cover x86-32. Delay calling get_cpu_cap until after xen_setup_gdt has initialized the stack canary. Without this, a 32bit PV machine crashes early in boot. (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.6.6-xc x86_64 debug=n Tainted: C ]---- (XEN) CPU: 0 (XEN) RIP: e019:[<00000000c10362f8>] And the PV kernel IP corresponds to init_scattered_cpuid_features 0xc10362f8 <+24>: mov %gs:0x14,%eax Fixes `2cc42bac1c` ("x86-64/Xen: eliminate W+X mappings") Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>	2018-03-21 08:29:04 -04:00
Linus Torvalds	32d43cd391	kvm/x86: fix icebp instruction handling The undocumented 'icebp' instruction (aka 'int1') works pretty much like 'int3' in the absense of in-circuit probing equipment (except, obviously, that it raises #DB instead of raising #BP), and is used by some validation test-suites as such. But Andy Lutomirski noticed that his test suite acted differently in kvm than on bare hardware. The reason is that kvm used an inexact test for the icebp instruction: it just assumed that an all-zero VM exit qualification value meant that the VM exit was due to icebp. That is not unlike the guess that do_debug() does for the actual exception handling case, but it's purely a heuristic, not an absolute rule. do_debug() does it because it wants to ascribe _some_ reasons to the #DB that happened, and an empty %dr6 value means that 'icebp' is the most likely casue and we have no better information. But kvm can just do it right, because unlike the do_debug() case, kvm actually sees the real reason for the #DB in the VM-exit interruption information field. So instead of relying on an inexact heuristic, just use the actual VM exit information that says "it was 'icebp'". Right now the 'icebp' instruction isn't technically documented by Intel, but that will hopefully change. The special "privileged software exception" information _is_ actually mentioned in the Intel SDM, even though the cause of it isn't enumerated. Reported-by: Andy Lutomirski <luto@kernel.org> Tested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-03-20 14:58:34 -07:00
Boris Ostrovsky	31ad7f8e7d	x86/vsyscall/64: Use proper accessor to update P4D entry Writing to it directly does not work for Xen PV guests. Fixes: `49275fef98` ("x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy") Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180319143154.3742-1-boris.ostrovsky@oracle.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 12:00:53 +01:00
Peter Zijlstra	d0266046ad	x86: Remove FAST_FEATURE_TESTS Since we want to rely on static branches to avoid speculation, remove any possible fallback code for static_cpu_has. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: torvalds@linux-foundation.org Link: https://lkml.kernel.org/r/20180319154717.705383007@infradead.org	2018-03-20 10:58:03 +01:00
Peter Zijlstra	e501ce957a	x86: Force asm-goto We want to start using asm-goto to guarantee the absence of dynamic branches (and thus speculation). A primary prerequisite for this is of course that the compiler supports asm-goto. This effecively lifts the minimum GCC version to build an x86 kernel to gcc-4.5. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: torvalds@linux-foundation.org Link: https://lkml.kernel.org/r/20180319201327.GJ4043@hirez.programming.kicks-ass.net	2018-03-20 10:58:02 +01:00
Will Deacon	4c0ca49e6d	Merge branch 'siginfo-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace into aarch64/for-next/core Pull in pending siginfo changes from Eric Biederman as we depend on the definition of FPE_FLTUNK for cleaning up our floating-point exception signal delivery (which is currently broken and using FPE_FIXME).	2018-03-20 09:57:15 +00:00
Thomas Gleixner	edc39c9b45	Merge branch 'linus' into x86/build to pick up dependencies	2018-03-20 10:56:18 +01:00
Christoph Hellwig	c10f07aa27	dma/direct: Handle force decryption for DMA coherent buffers in common code With that in place the generic DMA-direct routines can be used to allocate non-encrypted bounce buffers, and the x86 SEV case can use the generic swiotlb ops including nice features such as using CMA allocations. Note that I'm not too happy about using sev_active() in DMA-direct, but I couldn't come up with a good enough name for a wrapper to make it worth adding. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-14-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:59 +01:00
Christoph Hellwig	b6e05477c1	dma/direct: Handle the memory encryption bit in common code Give the basic phys_to_dma() and dma_to_phys() helpers a __-prefix and add the memory encryption mask to the non-prefixed versions. Use the __-prefixed versions directly instead of clearing the mask again in various places. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-13-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:59 +01:00
Christoph Hellwig	e7de6c7cc2	dma/swiotlb: Remove swiotlb_set_mem_attributes() Now that set_memory_decrypted() is always available we can just call it directly. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-12-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:58 +01:00
Christoph Hellwig	178c568244	x86/dma: Remove dma_alloc_coherent_gfp_flags() All dma_ops implementations used on x86 now take care of setting their own required GFP_ masks for the allocation. And given that the common code now clears harmful flags itself that means we can stop the flags in all the IOMMU implementations as well. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-10-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:58 +01:00
Christoph Hellwig	51c7eeba79	x86/dma/amd_gart: Use dma_direct_{alloc,free}() This gains support for CMA allocations for the force_iommu case, and cleans up the code a bit. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-7-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:57 +01:00
Christoph Hellwig	f3c39d5104	x86/dma/amd_gart: Look at dev->coherent_dma_mask instead of GFP_DMA We want to phase out looking at the magic GFP_DMA flag in the DMA mapping routines, so switch the gart driver to use the dev->coherent_dma_mask instead, which is used to select the GFP_DMA flag in the caller. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-6-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:57 +01:00
Christoph Hellwig	6e4bf58677	x86/dma: Use generic swiotlb_ops The generic swiotlb DMA ops were based on the x86 ones and provide equivalent functionality, so use them. Also fix the sta2x11 case. For that SOC the DMA map ops need an additional physical to DMA address translations. For swiotlb buffers that is done throught the phys_to_dma helper, but the sta2x11_dma_ops also added an additional translation on the return value from x86_swiotlb_alloc_coherent, which is only correct if that functions returns a direct allocation and not a swiotlb buffer. With the generic swiotlb and DMA-direct code phys_to_dma is not always used and the separate sta2x11_dma_ops can be replaced with a simple bit that marks if the additional physical to DMA address translation is needed. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-5-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:57 +01:00
Christoph Hellwig	fec777c385	x86/dma: Use DMA-direct (CONFIG_DMA_DIRECT_OPS=y) The generic DMA-direct (CONFIG_DMA_DIRECT_OPS=y) implementation is now functionally equivalent to the x86 nommu dma_map implementation, so switch over to using it. That includes switching from using x86_dma_supported in various IOMMU drivers to use dma_direct_supported instead, which provides the same functionality. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-4-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:56 +01:00
Christoph Hellwig	038d07a283	x86/dma: Remove dma_alloc_coherent_mask() These days all devices (including the ISA fallback device) have a coherent DMA mask set, so remove the workaround. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-3-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:56 +01:00
Ingo Molnar	3eb93ea327	Merge branch 'x86/mm' into x86/dma, to pick up dependencies Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:37 +01:00
Christoph Hellwig	5927145efd	x86/cpu: Remove the CONFIG_X86_PPRO_FENCE=y quirk There were only a few Pentium Pro multiprocessors systems where this errata applied. They are more than 20 years old now, and we've slowly dropped places which put the workarounds in and discouraged anyone from enabling the workaround. Get rid of it for good. Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jon Mason <jdmason@kudzu.us> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <mulix@mulix.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: iommu@lists.linux-foundation.org Link: http://lkml.kernel.org/r/20180319103826.12853-2-hch@lst.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-03-20 10:01:05 +01:00

... 2 3 4 5 6 ...

29654 Commits