linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-13 22:46:40 +07:00

Author	SHA1	Message	Date
Peter Gonda	e558b38b5e	x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handling [ Upstream commit a8f7e08a81708920a928664a865208fdf451c49f ] The IN and OUT instructions with port address as an immediate operand only use an 8-bit immediate (imm8). The current VC handler uses the entire 32-bit immediate value but these instructions only set the first bytes. Cast the operand to an u8 for that. [ bp: Massage commit message. ] Fixes: `25189d08e5` ("x86/sev-es: Add support for handling IOIO exceptions") Signed-off-by: Peter Gonda <pgonda@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: David Rientjes <rientjes@google.com> Link: https://lkml.kernel.org/r/20210105163311.221490-1-pgonda@google.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-19 18:27:28 +01:00
Arnd Bergmann	e30f6e1ac3	ARM: picoxcell: fix missing interrupt-parent properties [ Upstream commit bac717171971176b78c72d15a8b6961764ab197f ] dtc points out that the interrupts for some devices are not parsable: picoxcell-pc3x2.dtsi:45.19-49.5: Warning (interrupts_property): /paxi/gem@30000: Missing interrupt-parent picoxcell-pc3x2.dtsi:51.21-55.5: Warning (interrupts_property): /paxi/dmac@40000: Missing interrupt-parent picoxcell-pc3x2.dtsi:57.21-61.5: Warning (interrupts_property): /paxi/dmac@50000: Missing interrupt-parent picoxcell-pc3x2.dtsi:233.21-237.5: Warning (interrupts_property): /rwid-axi/axi2pico@c0000000: Missing interrupt-parent There are two VIC instances, so it's not clear which one needs to be used. I found the BSP sources that reference VIC0, so use that: https://github.com/r1mikey/meta-picoxcell/blob/master/recipes-kernel/linux/linux-picochip-3.0/0001-picoxcell-support-for-Picochip-picoXcell-SoC.patch Acked-by: Jamie Iles <jamie@jamieiles.com> Link: https://lore.kernel.org/r/20201230152010.3914962-1-arnd@kernel.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-19 18:27:27 +01:00
Randy Dunlap	bb3700925c	arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC [ Upstream commit 8a48c0a3360bf2bf4f40c980d0ec216e770e58ee ] fs/dax.c uses copy_user_page() but ARC does not provide that interface, resulting in a build error. Provide copy_user_page() in <asm/page.h>. ../fs/dax.c: In function 'copy_cow_page_dax': ../fs/dax.c:702:2: error: implicit declaration of function 'copy_user_page'; did you mean 'copy_to_user_page'? [-Werror=implicit-function-declaration] Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: linux-snps-arc@lists.infradead.org Cc: Dan Williams <dan.j.williams@intel.com> #Acked-by: Vineet Gupta <vgupta@synopsys.com> # v1 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: linux-fsdevel@vger.kernel.org Cc: linux-nvdimm@lists.01.org #Reviewed-by: Ira Weiny <ira.weiny@intel.com> # v2 Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-19 18:27:26 +01:00
Linus Walleij	06b0d83b33	ARM: dts: ux500/golden: Set display max brightness [ Upstream commit 7887cc89d5851cbdec49219e9614beec776af150 ] A too high brightness by default (default is max) makes the screen go blank. Set this to 15 as in the Vendor tree. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Stephan Gerhold <stephan@gerhold.net> Link: https://lore.kernel.org/r/20201214223413.253893-1-linus.walleij@linaro.org' Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-19 18:27:24 +01:00
Carl Philipp Klemm	54cfdd6507	ARM: omap2: pmic-cpcap: fix maximum voltage to be consistent with defaults on xt875 [ Upstream commit c0bc969c176b10598b31d5d1a5edf9a5261f0a9f ] xt875 comes up with a iva voltage of 1375000 and android runs at this too. fix maximum voltage to be consistent with this. Signed-off-by: Carl Philipp Klemm <philipp@uvos.xyz> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-19 18:27:23 +01:00
Masahiro Yamada	6169a5cfaa	ARC: build: move symlink creation to arch/arc/Makefile to avoid race [ Upstream commit c5e6ae563c802c4d828d42e134af64004db2e58c ] If you run 'make uImage uImage.gz' with the parallel option, uImage.gz will be created by two threads simultaneously. This is because arch/arc/Makefile does not specify the dependency between uImage and uImage.gz. Hence, GNU Make assumes they can be built in parallel. One thread descends into arch/arc/boot/ to create uImage, and another to create uImage.gz. Please notice the same log is displayed twice in the following steps: $ export CROSS_COMPILE=<your-arc-compiler-prefix> $ make -s ARCH=arc defconfig $ make -j$(nproc) ARCH=arc uImage uImage.gz [ snip ] LD vmlinux SORTTAB vmlinux SYSMAP System.map OBJCOPY arch/arc/boot/vmlinux.bin OBJCOPY arch/arc/boot/vmlinux.bin GZIP arch/arc/boot/vmlinux.bin.gz GZIP arch/arc/boot/vmlinux.bin.gz UIMAGE arch/arc/boot/uImage.gz UIMAGE arch/arc/boot/uImage.gz Image Name: Linux-5.10.0-rc4-00003-g62f23044 Created: Sun Nov 22 02:52:26 2020 Image Type: ARC Linux Kernel Image (gzip compressed) Data Size: 2109376 Bytes = 2059.94 KiB = 2.01 MiB Load Address: 80000000 Entry Point: 80004000 Image arch/arc/boot/uImage is ready Image Name: Linux-5.10.0-rc4-00003-g62f23044 Created: Sun Nov 22 02:52:26 2020 Image Type: ARC Linux Kernel Image (gzip compressed) Data Size: 2815455 Bytes = 2749.47 KiB = 2.69 MiB Load Address: 80000000 Entry Point: 80004000 This is a race between the two threads trying to write to the same file arch/arc/boot/uImage.gz. This is a potential problem that can generate a broken file. I fixed a similar problem for ARM by commit `3939f33450` ("ARM: 8418/1: add boot image dependencies to not generate invalid images"). I highly recommend to avoid such build rules that cause a race condition. Move the uImage rule to arch/arc/Makefile. Another strangeness is that arch/arc/boot/Makefile compares the timestamps between $(obj)/uImage and $(obj)/uImage.*: $(obj)/uImage: $(obj)/uImage.$(suffix-y) @ln -sf $(notdir $<) $@ @echo ' Image $@ is ready' This does not work as expected since $(obj)/uImage is a symlink. The symlink should be created in a phony target rule. I used $(kecho) instead of echo to suppress the message 'Image arch/arc/boot/uImage is ready' when the -s option is given. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-19 18:27:23 +01:00
Masahiro Yamada	443fb88d6d	ARC: build: add boot_targets to PHONY [ Upstream commit 0cfccb3c04934cdef42ae26042139f16e805b5f7 ] The top-level boot_targets (uImage and uImage.*) should be phony targets. They just let Kbuild descend into arch/arc/boot/ and create files there. If a file exists in the top directory with the same name, the boot image will not be created. You can confirm it by the following steps: $ export CROSS_COMPILE=<your-arc-compiler-prefix> $ make -s ARCH=arc defconfig all # vmlinux will be built $ touch uImage.gz $ make ARCH=arc uImage.gz CALL scripts/atomic/check-atomics.sh CALL scripts/checksyscalls.sh CHK include/generated/compile.h # arch/arc/boot/uImage.gz is not created Specify the targets as PHONY to fix this. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-19 18:27:23 +01:00
Masahiro Yamada	e1c4b5ff96	ARC: build: add uImage.lzma to the top-level target [ Upstream commit f2712ec76a5433e5ec9def2bd52a95df1f96d050 ] arch/arc/boot/Makefile supports uImage.lzma, but you cannot do 'make uImage.lzma' because the corresponding target is missing in arch/arc/Makefile. Add it. I also changed the assignment operator '+=' to ':=' since this is the only place where we expect this variable to be set. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-19 18:27:23 +01:00
Masahiro Yamada	cf4592a2d7	ARC: build: remove non-existing bootpImage from KBUILD_IMAGE [ Upstream commit 9836720911cfec25d3fbdead1c438bf87e0f2841 ] The deb-pkg builds for ARCH=arc fail. $ export CROSS_COMPILE=<your-arc-compiler-prefix> $ make -s ARCH=arc defconfig $ make ARCH=arc bindeb-pkg SORTTAB vmlinux SYSMAP System.map MODPOST Module.symvers make KERNELRELEASE=5.10.0-rc4 ARCH=arc KBUILD_BUILD_VERSION=2 -f ./Makefile intdeb-pkg sh ./scripts/package/builddeb cp: cannot stat 'arch/arc/boot/bootpImage': No such file or directory make[4]: * [scripts/Makefile.package:87: intdeb-pkg] Error 1 make[3]: * [Makefile:1527: intdeb-pkg] Error 2 make[2]: * [debian/rules:13: binary-arch] Error 2 dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2 make[1]: * [scripts/Makefile.package:83: bindeb-pkg] Error 2 make: *** [Makefile:1527: bindeb-pkg] Error 2 The reason is obvious; arch/arc/Makefile sets $(boot)/bootpImage as the default image, but there is no rule to build it. Remove the meaningless KBUILD_IMAGE assignment so it will fallback to the default vmlinux. With this change, you can build the deb package. I removed the 'bootpImage' target as well. At best, it provides 'make bootpImage' as an alias of 'make vmlinux', but I do not see much sense in doing so. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-19 18:27:23 +01:00
Alexander Lobakin	c8c2b27ab3	MIPS: relocatable: fix possible boot hangup with KASLR enabled commit 69e976831cd53f9ba304fd20305b2025ecc78eab upstream. LLVM-built Linux triggered a boot hangup with KASLR enabled. arch/mips/kernel/relocate.c:get_random_boot() uses linux_banner, which is a string constant, as a random seed, but accesses it as an array of unsigned long (in rotate_xor()). When the address of linux_banner is not aligned to sizeof(long), such access emits unaligned access exception and hangs the kernel. Use PTR_ALIGN() to align input address to sizeof(long) and also align down the input length to prevent possible access-beyond-end. Fixes: `405bc8fd12` ("MIPS: Kernel: Implement KASLR using CONFIG_RELOCATABLE") Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Alexander Lobakin <alobakin@pm.me> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-19 18:27:20 +01:00
Al Viro	652daca07f	MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps commit 698222457465ce343443be81c5512edda86e5914 upstream. Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012 had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c) coredumps; unfortunately, compat on mips (which does not go through the usual compat_binfmt_elf.c) had not been noticed. As the result, both N32 and O32 coredumps on 64bit mips kernels have those sections malformed enough to confuse the living hell out of all gdb and readelf versions (up to and including the tip of binutils-gdb.git). Longer term solution is to make both O32 and N32 compat use the regular compat_binfmt_elf.c, but that's too much for backports. The minimal solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing those patches have done in fs/compat_binfmt_elf.c Cc: stable@kernel.org # v3.7+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-19 18:27:20 +01:00
Paul Cercueil	9e2413f41a	MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB commit 4d4f9c1a17a3480f8fe523673f7232b254d724b7 upstream. The compressed payload is not necesarily 4-byte aligned, at least when compiling with Clang. In that case, the 4-byte value appended to the compressed payload that corresponds to the uncompressed kernel image size must be read using get_unaligned_le32(). This fixes Clang-built kernels not booting on MIPS (tested on a Ingenic JZ4770 board). Fixes: `b8f54f2cde` ("MIPS: ZBOOT: copy appended dtb to the end of the kernel") Cc: <stable@vger.kernel.org> # v4.7 Signed-off-by: Paul Cercueil <paul@crapouillou.net> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-19 18:27:20 +01:00
Anders Roxell	974f19621f	mips: lib: uncached: fix non-standard usage of variable 'sp' commit 5b058973d3205578aa6c9a71392e072a11ca44ef upstream. When building mips tinyconfig with clang the following warning show up: arch/mips/lib/uncached.c:45:6: warning: variable 'sp' is uninitialized when used here [-Wuninitialized] if (sp >= (long)CKSEG0 && sp < (long)CKSEG2) ^~ arch/mips/lib/uncached.c:40:18: note: initialize the variable 'sp' to silence this warning register long sp __asm__("$sp"); ^ = 0 1 warning generated. Rework to make an explicit inline move, instead of the non-standard use of specifying registers for local variables. This is what's written from the gcc-10 manual [1] about specifying registers for local variables: "6.47.5.2 Specifying Registers for Local Variables ................................................. [...] "The only supported use for this feature is to specify registers for input and output operands when calling Extended 'asm' (*note Extended Asm::). [...]". [1] https://docs.w3cub.com/gcc~10/local-register-variables Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-19 18:27:20 +01:00
Anders Roxell	5ca873f92b	mips: fix Section mismatch in reference commit ad4fddef5f2345aa9214e979febe2f47639c10d9 upstream. When building mips tinyconfig with clang the following error show up: WARNING: modpost: vmlinux.o(.text+0x1940c): Section mismatch in reference from the function r4k_cache_init() to the function .init.text:loongson3_sc_init() The function r4k_cache_init() references the function __init loongson3_sc_init(). This is often because r4k_cache_init lacks a __init annotation or the annotation of loongson3_sc_init is wrong. Remove marked __init from function loongson3_sc_init(), mips_sc_probe_cm3(), and mips_sc_probe(). Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-19 18:27:20 +01:00
Nick Hu	4b0a0655da	riscv: Fix KASAN memory mapping. commit c25a053e15778f6b4d6553708673736e27a6c2cf upstream. Use virtual address instead of physical address when translating the address to shadow memory by kasan_mem_to_shadow(). Signed-off-by: Nick Hu <nickhu@andestech.com> Signed-off-by: Nylon Chen <nylon7@andestech.com> Fixes: `b10d6bca87` ("arch, drivers: replace for_each_membock() with for_each_mem_range()") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-19 18:27:20 +01:00
Guo Ren	ab7594f639	riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL commit 0aa2ec8a475fb505fd98d93bbcf4e03beeeebcb6 upstream. The patch fix commit: `ad5d112` ("riscv: use vDSO common flow to reduce the latency of the time-related functions"). The GENERIC_TIME_VSYSCALL should be CONFIG_GENERIC_TIME_VSYSCALL or vgettimeofday won't work. Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Fixes: `ad5d1122b8` ("riscv: use vDSO common flow to reduce the latency of the time-related functions") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-19 18:27:20 +01:00
Andreas Schwab	7c4ced3682	riscv: return -ENOSYS for syscall -1 commit cf7b2ae4d70432fa94ebba3fbaab825481ae7189 upstream. Properly return -ENOSYS for syscall -1 instead of leaving the return value uninitialized. This fixes the strace teststuite. Fixes: `5340627e3f` ("riscv: add support for SECCOMP and SECCOMP_FILTER") Cc: stable@vger.kernel.org Signed-off-by: Andreas Schwab <schwab@suse.de> Reviewed-by: Tycho Andersen <tycho@tycho.pizza> Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-19 18:27:20 +01:00
Kefeng Wang	eae7b19b32	riscv: Drop a duplicated PAGE_KERNEL_EXEC commit 0ea02c73775277001c651ad4a0e83781a9acf406 upstream. commit `b91540d52a` ("RISC-V: Add EFI runtime services") add a duplicated PAGE_KERNEL_EXEC, kill it. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Atish Patra <atish.patra@wdc.com> Fixes: `b91540d52a` ("RISC-V: Add EFI runtime services") Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-19 18:27:20 +01:00
Wei Liu	ad8ca24ba8	x86/hyperv: check cpu mask after interrupt has been disabled commit ad0a6bad44758afa3b440c254a24999a0c7e35d5 upstream. We've observed crashes due to an empty cpu mask in hyperv_flush_tlb_others. Obviously the cpu mask in question is changed between the cpumask_empty call at the beginning of the function and when it is actually used later. One theory is that an interrupt comes in between and a code path ends up changing the mask. Move the check after interrupt has been disabled to see if it fixes the issue. Signed-off-by: Wei Liu <wei.liu@kernel.org> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20210105175043.28325-1-wei.liu@kernel.org Reviewed-by: Michael Kelley <mikelley@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-19 18:27:18 +01:00
Marc Zyngier	610e2c5699	KVM: arm64: Don't access PMCR_EL0 when no PMU is available commit 2a5f1b67ec577fb1544b563086e0377f095f88e2 upstream. We reset the guest's view of PMCR_EL0 unconditionally, based on the host's view of this register. It is however legal for an implementation not to provide any PMU, resulting in an UNDEF. The obvious fix is to skip the reset of this shadow register when no PMU is available, sidestepping the issue entirely. If no PMU is available, the guest is not able to request a virtual PMU anyway, so not doing nothing is the right thing to do! It is unlikely that this bug can hit any HW implementation though, as they all provide a PMU. It has been found using nested virt with the host KVM not implementing the PMU itself. Fixes: `ab9468340d` ("arm64: KVM: Add access handler for PMCR register") Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20201210083059.1277162-1-maz@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-17 14:17:05 +01:00
Shannon Zhao	ae4db0bc5c	arm64: cpufeature: remove non-exist CONFIG_KVM_ARM_HOST commit 45ba7b195a369f35cb39094fdb32efe5908b34ad upstream. Commit `d82755b2e7` ("KVM: arm64: Kill off CONFIG_KVM_ARM_HOST") deletes CONFIG_KVM_ARM_HOST option, it should use CONFIG_KVM instead. Just remove CONFIG_KVM_ARM_HOST here. Fixes: `d82755b2e7` ("KVM: arm64: Kill off CONFIG_KVM_ARM_HOST") Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1609760324-92271-1-git-send-email-shannon.zhao@linux.alibaba.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-17 14:17:04 +01:00
Nicolas Saenz Julienne	41dcfc0cb9	arm64: mm: Fix ARCH_LOW_ADDRESS_LIMIT when !CONFIG_ZONE_DMA commit 095507dc1350b3a2b8b39fdc05edba0c10859eca upstream. Systems configured with CONFIG_ZONE_DMA32, CONFIG_ZONE_NORMAL and !CONFIG_ZONE_DMA will fail to properly setup ARCH_LOW_ADDRESS_LIMIT. The limit will default to ~0ULL, effectively spanning the whole memory, which is too high for a configuration that expects low memory to be capped at 4GB. Fix ARCH_LOW_ADDRESS_LIMIT by falling back to arm64_dma32_phys_limit when arm64_dma_phys_limit isn't set. arm64_dma32_phys_limit will honour CONFIG_ZONE_DMA32, or span the entire memory when not enabled. Fixes: `1a8e1cef76` ("arm64: use both ZONE_DMA and ZONE_DMA32") Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Link: https://lore.kernel.org/r/20201218163307.10150-1-nsaenzjulienne@suse.de Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-17 14:17:02 +01:00
Andreas Kemnade	0a27398d89	ARM: OMAP2+: omap_device: fix idling of devices during probe commit ec76c2eea903947202098090bbe07a739b5246e9 upstream. On the GTA04A5 od->_driver_status was not set to BUS_NOTIFY_BIND_DRIVER during probe of the second mmc used for wifi. Therefore omap_device_late_idle idled the device during probing causing oopses when accessing the registers. It was not set because od->_state was set to OMAP_DEVICE_STATE_IDLE in the notifier callback. Therefore set od->_driver_status also in that case. This came apparent after commit `21b2cec61c` ("mmc: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4") causing this oops: omap_hsmmc 480b4000.mmc: omap_device_late_idle: enabled but no driver. Idling 8<--- cut here --- Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0b402c ... (omap_hsmmc_set_bus_width) from [<c07996bc>] (omap_hsmmc_set_ios+0x11c/0x258) (omap_hsmmc_set_ios) from [<c077b2b0>] (mmc_power_up.part.8+0x3c/0xd0) (mmc_power_up.part.8) from [<c077c14c>] (mmc_start_host+0x88/0x9c) (mmc_start_host) from [<c077d284>] (mmc_add_host+0x58/0x84) (mmc_add_host) from [<c0799190>] (omap_hsmmc_probe+0x5fc/0x8c0) (omap_hsmmc_probe) from [<c0666728>] (platform_drv_probe+0x48/0x98) (platform_drv_probe) from [<c066457c>] (really_probe+0x1dc/0x3b4) Fixes: `04abaf07f6` ("ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer") Fixes: `21b2cec61c` ("mmc: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4") Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Andreas Kemnade <andreas@kemnade.info> [tony@atomide.com: left out extra parens, trimmed description stack trace] Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-17 14:16:59 +01:00
Brian Gerst	797335659e	fanotify: Fix sys_fanotify_mark() on native x86-32 commit 2ca408d9c749c32288bc28725f9f12ba30299e8f upstream. Commit `121b32a58a` ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments") converted native x86-32 which take 64-bit arguments to use the compat handlers to allow conversion to passing args via pt_regs. sys_fanotify_mark() was however missed, as it has a general compat handler. Add a config option that will use the syscall wrapper that takes the split args for native 32-bit. [ bp: Fix typo in Kconfig help text. ] Fixes: `121b32a58a` ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments") Reported-by: Paweł Jasiak <pawel@jasiak.xyz> Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jan Kara <jack@suse.cz> Acked-by: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-17 14:16:59 +01:00
Christophe Leroy	bca9ca5a60	powerpc/32s: Fix RTAS machine check with VMAP stack [ Upstream commit 98bf2d3f4970179c702ef64db658e0553bc6ef3a ] When we have VMAP stack, exception prolog 1 sets r1, not r11. When it is not an RTAS machine check, don't trash r1 because it is needed by prolog 1. Fixes: `da7bb43ab9` ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU") Fixes: d2e006036082 ("powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Squash in fixup for RTAS machine check from Christophe] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.1608621048.git.christophe.leroy@csgroup.eu Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-17 14:16:52 +01:00
Paolo Bonzini	563135ec66	KVM: x86: fix shift out of bounds reported by UBSAN commit 2f80d502d627f30257ba7e3655e71c373b7d1a5a upstream. Since we know that e >= s, we can reassociate the left shift, changing the shifted number from 1 to 2 in exchange for decreasing the right hand side by 1. Reported-by: syzbot+e87846c48bf72bc85311@syzkaller.appspotmail.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:26 +01:00
Ying-Tsun Huang	02ccda90ef	x86/mtrr: Correct the range check before performing MTRR type lookups commit cb7f4a8b1fb426a175d1708f05581939c61329d4 upstream. In mtrr_type_lookup(), if the input memory address region is not in the MTRR, over 4GB, and not over the top of memory, a write-back attribute is returned. These condition checks are for ensuring the input memory address region is actually mapped to the physical memory. However, if the end address is just aligned with the top of memory, the condition check treats the address is over the top of memory, and write-back attribute is not returned. And this hits in a real use case with NVDIMM: the nd_pmem module tries to map NVDIMMs as cacheable memories when NVDIMMs are connected. If a NVDIMM is the last of the DIMMs, the performance of this NVDIMM becomes very low since it is aligned with the top of memory and its memory type is uncached-minus. Move the input end address change to inclusive up into mtrr_type_lookup(), before checking for the top of memory in either mtrr_type_lookup_{variable,fixed}() helpers. [ bp: Massage commit message. ] Fixes: `0cc705f56e` ("x86/mm/mtrr: Clean up mtrr_type_lookup()") Signed-off-by: Ying-Tsun Huang <ying-tsun.huang@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201215070721.4349-1-ying-tsun.huang@amd.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:26 +01:00
Aaro Koskinen	56429ddfd5	ARM: dts: OMAP3: disable AES on N950/N9 commit f1dc15cd7fc146107cad2a926d9c1d005f69002a upstream. AES needs to be disabled on Nokia N950/N9 as well (HS devices), otherwise kernel fails to boot. Fixes: `c312f06631` ("ARM: dts: omap3: Migrate AES from hwmods to sysc-omap2") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:25 +01:00
Nick Desaulniers	1cd7e30a6d	arm64: link with -z norelro for LLD or aarch64-elf commit 311bea3cb9ee20ef150ca76fc60a592bf6b159f5 upstream. With GNU binutils 2.35+, linking with BFD produces warnings for vmlinux: aarch64-linux-gnu-ld: warning: -z norelro ignored BFD can produce this warning when the target emulation mode does not support RELRO program headers, and -z relro or -z norelro is passed. Alan Modra clarifies: The default linker emulation for an aarch64-linux ld.bfd is -maarch64linux, the default for an aarch64-elf linker is -maarch64elf. They are not equivalent. If you choose -maarch64elf you get an emulation that doesn't support -z relro. The ARCH=arm64 kernel prefers -maarch64elf, but may fall back to -maarch64linux based on the toolchain configuration. LLD will always create RELRO program header regardless of target emulation. To avoid the above warning when linking with BFD, pass -z norelro only when linking with LLD or with -maarch64linux. Fixes: `3b92fa7485` ("arm64: link with -z norelro regardless of CONFIG_RELOCATABLE") Fixes: `3bbd3db864` ("arm64: relocatable: fix inconsistencies in linker script and options") Cc: <stable@vger.kernel.org> # 5.0.x- Reported-by: kernelci.org bot <bot@kernelci.org> Reported-by: Quentin Perret <qperret@google.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Cc: Alan Modra <amodra@gmail.com> Cc: Fāng-ruì Sòng <maskray@google.com> Link: https://lore.kernel.org/r/20201218002432.788499-1-ndesaulniers@google.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:24 +01:00
Fenghua Yu	397e352ca9	x86/resctrl: Don't move a task to the same resource group commit a0195f314a25582b38993bf30db11c300f4f4611 upstream. Shakeel Butt reported in [1] that a user can request a task to be moved to a resource group even if the task is already in the group. It just wastes time to do the move operation which could be costly to send IPI to a different CPU. Add a sanity check to ensure that the move operation only happens when the task is not already in the resource group. [1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/ Fixes: `e02737d5b8` ("x86/intel_rdt: Add tasks files") Reported-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/962ede65d8e95be793cb61102cca37f7bb018e66.1608243147.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:23 +01:00
Fenghua Yu	34e4ae4dca	x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR commit ae28d1aae48a1258bd09a6f707ebb4231d79a761 upstream. Currently, when moving a task to a resource group the PQR_ASSOC MSR is updated with the new closid and rmid in an added task callback. If the task is running, the work is run as soon as possible. If the task is not running, the work is executed later in the kernel exit path when the kernel returns to the task again. Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task is running is the right thing to do. Queueing work for a task that is not running is unnecessary (the PQR_ASSOC MSR is already updated when the task is scheduled in) and causing system resource waste with the way in which it is implemented: Work to update the PQR_ASSOC register is queued every time the user writes a task id to the "tasks" file, even if the task already belongs to the resource group. This could result in multiple pending work items associated with a single task even if they are all identical and even though only a single update with most recent values is needed. Specifically, even if a task is moved between different resource groups while it is sleeping then it is only the last move that is relevant but yet a work item is queued during each move. This unnecessary queueing of work items could result in significant system resource waste, especially on tasks sleeping for a long time. For example, as demonstrated by Shakeel Butt in [1] writing the same task id to the "tasks" file can quickly consume significant memory. The same problem (wasted system resources) occurs when moving a task between different resource groups. As pointed out by Valentin Schneider in [2] there is an additional issue with the way in which the queueing of work is done in that the task_struct update is currently done after the work is queued, resulting in a race with the register update possibly done before the data needed by the update is available. To solve these issues, update the PQR_ASSOC MSR in a synchronous way right after the new closid and rmid are ready during the task movement, only if the task is running. If a moved task is not running nothing is done since the PQR_ASSOC MSR will be updated next time the task is scheduled. This is the same way used to update the register when tasks are moved as part of resource group removal. [1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/ [2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.com [ bp: Massage commit message and drop the two update_task_closid_rmid() variants. ] Fixes: `e02737d5b8` ("x86/intel_rdt: Add tasks files") Reported-by: Shakeel Butt <shakeelb@google.com> Reported-by: Valentin Schneider <valentin.schneider@arm.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: James Morse <james.morse@arm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.1608243147.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:23 +01:00
Ben Gardon	c3cf9ffe8d	KVM: x86/mmu: Ensure TDP MMU roots are freed after yield commit a889ea54b3daa63ee1463dc19ed699407d61458b upstream. Many TDP MMU functions which need to perform some action on all TDP MMU roots hold a reference on that root so that they can safely drop the MMU lock in order to yield to other threads. However, when releasing the reference on the root, there is a bug: the root will not be freed even if its reference count (root_count) is reduced to 0. To simplify acquiring and releasing references on TDP MMU root pages, and to ensure that these roots are properly freed, move the get/put operations into another TDP MMU root iterator macro. Moving the get/put operations into an iterator macro also helps simplify control flow when a root does need to be freed. Note that using the list_for_each_entry_safe macro would not have been appropriate in this situation because it could keep a pointer to the next root across an MMU lock release + reacquire, during which time that root could be freed. Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Fixes: `faaf05b00a` ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU") Fixes: `063afacd87` ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU") Fixes: `a6a0b05da9` ("kvm: x86/mmu: Support dirty logging for the TDP MMU") Fixes: `1488199856` ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU") Signed-off-by: Ben Gardon <bgardon@google.com> Message-Id: <20210107001935.3732070-1-bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:22 +01:00
Sean Christopherson	f4064ef40c	KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE commit 39b4d43e6003cee51cd119596d3c33d0449eb44c upstream. Get the so called "root" level from the low level shadow page table walkers instead of manually attempting to calculate it higher up the stack, e.g. in get_mmio_spte(). When KVM is using PAE shadow paging, the starting level of the walk, from the callers perspective, is not the CR3 root but rather the PDPTR "root". Checking for reserved bits from the CR3 root causes get_mmio_spte() to consume uninitialized stack data due to indexing into sptes[] for a level that was not filled by get_walk(). This can result in false positives and/or negatives depending on what garbage happens to be on the stack. Opportunistically nuke a few extra newlines. Fixes: `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") Reported-by: Richard Herbert <rherbert@sympatico.ca> Cc: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:22 +01:00
Sean Christopherson	afd621673f	KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte() commit 2aa078932ff6c66bf10cc5b3144440dbfa7d813d upstream. Return -1 from the get_walk() helpers if the shadow walk doesn't fill at least one spte, which can theoretically happen if the walk hits a not-present PDPTR. Returning the root level in such a case will cause get_mmio_spte() to return garbage (uninitialized stack data). In practice, such a scenario should be impossible as KVM shouldn't get a reserved-bit page fault with a not-present PDPTR. Note, using mmu->root_level in get_walk() is wrong for other reasons, too, but that's now a moot point. Fixes: `95fb5b0258` ("kvm: x86/mmu: Support MMIO in the TDP MMU") Cc: Ben Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20201218003139.2167891-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:22 +01:00
Dan Williams	23220e87c9	x86/mm: Fix leak of pmd ptlock commit d1c5246e08eb64991001d97a3bd119c93edbc79a upstream. Commit `28ee90fe60` ("x86/mm: implement free pmd/pte page interfaces") introduced a new location where a pmd was released, but neglected to run the pmd page destructor. In fact, this happened previously for a different pmd release path and was fixed by commit: `c283610e44` ("x86, mm: do not leak page->ptl for pmd page tables"). This issue was hidden until recently because the failure mode is silent, but commit: `b2b29d6d01` ("mm: account PMD tables like PTE tables") turns the failure mode into this signature: BUG: Bad page state in process lt-pmem-ns pfn:15943d page:000000007262ed7b refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x15943d flags: 0xaffff800000000() raw: 00affff800000000 dead000000000100 0000000000000000 0000000000000000 raw: 0000000000000000 ffff913a029bcc08 00000000fffffbff 0000000000000000 page dumped because: nonzero mapcount [..] dump_stack+0x8b/0xb0 bad_page.cold+0x63/0x94 free_pcp_prepare+0x224/0x270 free_unref_page+0x18/0xd0 pud_free_pmd_page+0x146/0x160 ioremap_pud_range+0xe3/0x350 ioremap_page_range+0x108/0x160 __ioremap_caller.constprop.0+0x174/0x2b0 ? memremap+0x7a/0x110 memremap+0x7a/0x110 devm_memremap+0x53/0xa0 pmem_attach_disk+0x4ed/0x530 [nd_pmem] ? __devm_release_region+0x52/0x80 nvdimm_bus_probe+0x85/0x210 [libnvdimm] Given this is a repeat occurrence it seemed prudent to look for other places where this destructor might be missing and whether a better helper is needed. try_to_free_pmd_page() looks like a candidate, but testing with setting up and tearing down pmd mappings via the dax unit tests is thus far not triggering the failure. As for a better helper pmd_free() is close, but it is a messy fit due to requiring an @mm arg. Also, ___pmd_free_tlb() wants to call paravirt_tlb_remove_table() instead of free_page(), so open-coded pgtable_pmd_page_dtor() seems the best way forward for now. Debugged together with Matthew Wilcox <willy@infradead.org>. Fixes: `28ee90fe60` ("x86/mm: implement free pmd/pte page interfaces") Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Yi Zhang <yi.zhang@redhat.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/160697689204.605323.17629854984697045602.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:22 +01:00
Nathan Chancellor	cb5a170e97	powerpc: Handle .text.{hot,unlikely}.* in linker script commit 3ce47d95b7346dcafd9bed3556a8d072cb2b8571 upstream. Commit `eff8728fe6` ("vmlinux.lds.h: Add PGO and AutoFDO input sections") added ".text.unlikely." and ".text.hot." due to an LLVM change [1]. After another LLVM change [2], these sections are seen in some PowerPC builds, where there is a orphan section warning then build failure: $ make -skj"$(nproc)" \ ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 O=out \ distclean powernv_defconfig zImage.epapr ld.lld: warning: kernel/built-in.a(panic.o):(.text.unlikely.) is being placed in '.text.unlikely.' ... ld.lld: warning: address (0xc000000000009314) of section .text is not a multiple of alignment (256) ... ERROR: start_text address is c000000000009400, should be c000000000008000 ERROR: try to enable LD_HEAD_STUB_CATCH config option ERROR: see comments in arch/powerpc/tools/head_check.sh ... Explicitly handle these sections like in the main linker script so there is no more build failure. [1]: https://reviews.llvm.org/D79600 [2]: https://reviews.llvm.org/D92493 Fixes: `83a092cf95` ("powerpc: Link warning for orphan sections") Cc: stable@vger.kernel.org Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/1218 Link: https://lore.kernel.org/r/20210104205952.1399409-1-natechancellor@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-12 20:18:17 +01:00
Randy Dunlap	2179bae04b	local64.h: make <asm/local64.h> mandatory [ Upstream commit 87dbc209ea04645fd2351981f09eff5d23f8e2e9 ] Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and remove all arch/*/include/asm/local64.h arch-specific files since they only #include <asm-generic/local64.h>. This fixes build errors on arch/c6x/ and arch/nios2/ for block/blk-iocost.c. Build-tested on 21 of 25 arch-es. (tools problems on the others) Yes, we could even rename <asm-generic/local64.h> to <linux/local64.h> and change all #includes to use <linux/local64.h> instead. Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Masahiro Yamada <masahiroy@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-12 20:18:16 +01:00
Heiko Carstens	13f9eec229	s390: always clear kernel stack backchain before calling functions [ Upstream commit 9365965db0c7ca7fc81eee27c21d8522d7102c32 ] Clear the kernel stack backchain before potentially calling the lockdep trace_hardirqs_off/on functions. Without this walking the kernel backchain, e.g. during a panic, might stop too early. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-06 14:56:55 +01:00
Gabriel Krisman Bertazi	ef3b9ad967	um: ubd: Submit all data segments atomically [ Upstream commit fc6b6a872dcd48c6f39c7975836d75113db67d37 ] Internally, UBD treats each physical IO segment as a separate command to be submitted in the execution pipe. If the pipe returns a transient error after a few segments have already been written, UBD will tell the block layer to requeue the request, but there is no way to reclaim the segments already submitted. When a new attempt to dispatch the request is done, those segments already submitted will get duplicated, causing the WARN_ON below in the best case, and potentially data corruption. In my system, running a UML instance with 2GB of RAM and a 50M UBD disk, I can reproduce the WARN_ON by simply running mkfs.fvat against the disk on a freshly booted system. There are a few ways to around this, like reducing the pressure on the pipe by reducing the queue depth, which almost eliminates the occurrence of the problem, increasing the pipe buffer size on the host system, or by limiting the request to one physical segment, which causes the block layer to submit way more requests to resolve a single operation. Instead, this patch modifies the format of a UBD command, such that all segments are sent through a single element in the communication pipe, turning the command submission atomic from the point of view of the block layer. The new format has a variable size, depending on the number of elements, and looks like this: +------------+-----------+-----------+------------ \| cmd_header \| segment 0 \| segment 1 \| segment ... +------------+-----------+-----------+------------ With this format, we push a pointer to cmd_header in the submission pipe. This has the advantage of reducing the memory footprint of executing a single request, since it allow us to merge some fields in the header. It is possible to reduce even further each segment memory footprint, by merging bitmap_words and cow_offset, for instance, but this is not the focus of this patch and is left as future work. One issue with the patch is that for a big number of segments, we now perform one big memory allocation instead of multiple small ones, but I wasn't able to trigger any real issues or -ENOMEM because of this change, that wouldn't be reproduced otherwise. This was tested using fio with the verify-crc32 option, and by running an ext4 filesystem over this UBD device. The original WARN_ON was: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141 refcount_t: underflow; use-after-free. Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346 Stack: 6084eed0 6063dc77 00000009 6084ef60 00000000 604b8d9f 6084eee0 6063dcbc 6084ef40 6006ab8d e013d780 1c00000000 Call Trace: [<600a0c1c>] ? printk+0x0/0x94 [<6004a888>] show_stack+0x13b/0x155 [<6063dc77>] ? dump_stack_print_info+0xdf/0xe8 [<604b8d9f>] ? refcount_warn_saturate+0x13f/0x141 [<6063dcbc>] dump_stack+0x2a/0x2c [<6006ab8d>] __warn+0x107/0x134 [<6008da6c>] ? wake_up_process+0x17/0x19 [<60487628>] ? blk_queue_max_discard_sectors+0x0/0xd [<6006b05f>] warn_slowpath_fmt+0xd1/0xdf [<6006af8e>] ? warn_slowpath_fmt+0x0/0xdf [<600acc14>] ? raw_read_seqcount_begin.constprop.0+0x0/0x15 [<600619ae>] ? os_nsecs+0x1d/0x2b [<604b8d9f>] refcount_warn_saturate+0x13f/0x141 [<6048bc8f>] refcount_sub_and_test.constprop.0+0x2f/0x37 [<6048c8de>] blk_mq_free_request+0xf1/0x10d [<6048ca06>] __blk_mq_end_request+0x10c/0x114 [<6005ac0f>] ubd_intr+0xb5/0x169 [<600a1a37>] __handle_irq_event_percpu+0x6b/0x17e [<600a1b70>] handle_irq_event_percpu+0x26/0x69 [<600a1bd9>] handle_irq_event+0x26/0x34 [<600a1bb3>] ? handle_irq_event+0x0/0x34 [<600a5186>] ? unmask_irq+0x0/0x37 [<600a57e6>] handle_edge_irq+0xbc/0xd6 [<600a131a>] generic_handle_irq+0x21/0x29 [<60048f6e>] do_IRQ+0x39/0x54 [...] ---[ end trace c6e7444e55386c0f ]--- Cc: Christopher Obbard <chris.obbard@collabora.com> Reported-by: Martyn Welch <martyn@collabora.com> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Tested-by: Christopher Obbard <chris.obbard@collabora.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-06 14:56:55 +01:00
Christopher Obbard	a8b49c4bdf	um: random: Register random as hwrng-core device [ Upstream commit 72d3e093afae79611fa38f8f2cfab9a888fe66f2 ] The UML random driver creates a dummy device under the guest, /dev/hw_random. When this file is read from the guest, the driver reads from the host machine's /dev/random, in-turn reading from the host kernel's entropy pool. This entropy pool could have been filled by a hardware random number generator or just the host kernel's internal software entropy generator. Currently the driver does not fill the guests kernel entropy pool, this requires a userspace tool running inside the guest (like rng-tools) to read from the dummy device provided by this driver, which then would fill the guest's internal entropy pool. This all seems quite pointless when we are already reading from an entropy pool, so this patch aims to register the device as a hwrng device using the hwrng-core framework. This not only improves and cleans up the driver, but also fills the guest's entropy pool without having to resort to using extra userspace tools in the guest. This is typically a nuisance when booting a guest: the random pool takes a long time (~200s) to build up enough entropy since the dummy hwrng is not used to fill the guest's pool. This port was originally attempted by Alexander Neville "dark" (in CC, discussion in Link), but the conversation there stalled since the handling of -EAGAIN errors were no removed and longer handled by the driver. This patch attempts to use the existing method of error handling but utilises the new hwrng core. The issue can be noticed when booting a UML guest: [ 2.560000] random: fast init done [ 214.000000] random: crng init done With the patch applied, filling the pool becomes a lot quicker: [ 2.560000] random: fast init done [ 12.000000] random: crng init done Cc: Alexander Neville <dark@volatile.bz> Link: https://lore.kernel.org/lkml/20190828204609.02a7ff70@TheDarkness/ Link: https://lore.kernel.org/lkml/20190829135001.6a5ff940@TheDarkness.local/ Cc: Sjoerd Simons <sjoerd.simons@collabora.co.uk> Signed-off-by: Christopher Obbard <chris.obbard@collabora.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-06 14:56:55 +01:00
Nicholas Piggin	b1e155ccc8	powerpc/64: irq replay remove decrementer overflow check [ Upstream commit 59d512e4374b2d8a6ad341475dc94c4a4bdec7d3 ] This is way to catch some cases of decrementer overflow, when the decrementer has underflowed an odd number of times, while MSR[EE] was disabled. With a typical small decrementer, a timer that fires when MSR[EE] is disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and 8.6 seconds after the timer expires. In any case, the decrementer interrupt would be taken at 8.6 seconds and the timer would be found at that point. So this check is for catching extreme latency events, and it prevents those latencies from being a further few seconds long. It's not obvious this is a good tradeoff. This is already a watchdog magnitude event and that situation is not improved a significantly with this check. For large decrementers, it's useless. Therefore remove this check, which avoids a mftb when enabling hard disabled interrupts (e.g., when enabling after coming from hardware interrupt handlers). Perhaps more importantly, it also removes the clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay which simplifies the code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-06 14:56:54 +01:00
Qinglang Miao	498d90690f	powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe() [ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ] I noticed that iounmap() of msgr_block_addr before return from mpic_msgr_probe() in the error handling case is missing. So use devm_ioremap() instead of just ioremap() when remapping the message register block, so the mapping will be automatically released on probe failure. Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-01-06 14:56:53 +01:00
Baoquan He	98b57685c2	mm: memmap defer init doesn't work as expected commit dc2da7b45ffe954a0090f5d0310ed7b0b37d2bd2 upstream. VMware observed a performance regression during memmap init on their platform, and bisected to commit `73a6e474cb` ("mm: memmap_init: iterate over memblock regions rather that check each PFN") causing it. Before the commit: [0.033176] Normal zone: 1445888 pages used for memmap [0.033176] Normal zone: 89391104 pages, LIFO batch:63 [0.035851] ACPI: PM-Timer IO Port: 0x448 With commit [0.026874] Normal zone: 1445888 pages used for memmap [0.026875] Normal zone: 89391104 pages, LIFO batch:63 [2.028450] ACPI: PM-Timer IO Port: 0x448 The root cause is the current memmap defer init doesn't work as expected. Before, memmap_init_zone() was used to do memmap init of one whole zone, to initialize all low zones of one numa node, but defer memmap init of the last zone in that numa node. However, since commit `73a6e474cb`, function memmap_init() is adapted to iterater over memblock regions inside one zone, then call memmap_init_zone() to do memmap init for each region. E.g, on VMware's system, the memory layout is as below, there are two memory regions in node 2. The current code will mistakenly initialize the whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to iniatialize only one memmory section on the 2nd region [mem 0x10000000000-0x1033fffffff]. In fact, we only expect to see that there's only one memory section's memmap initialized. That's why more time is costed at the time. [ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff] [ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff] [ 0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff] [ 0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff] [ 0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff] [ 0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff] Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass down the real zone end pfn so that defer_init() can use it to judge whether defer need be taken in zone wide. Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com Fixes: commit `73a6e474cb` ("mm: memmap_init: iterate over memblock regions rather that check each PFN") Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: Rahul Gopakumar <gopakumarr@vmware.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: David Hildenbrand <david@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-01-06 14:56:50 +01:00
Yazen Ghannam	700d098ace	x86/CPU/AMD: Save AMD NodeId as cpu_die_id [ Upstream commit 028c221ed1904af9ac3c5162ee98f48966de6b3d ] AMD systems provide a "NodeId" value that represents a global ID indicating to which "Node" a logical CPU belongs. The "Node" is a physical structure equivalent to a Die, and it should not be confused with logical structures like NUMA nodes. Logical nodes can be adjusted based on firmware or other settings whereas the physical nodes/dies are fixed based on hardware topology. The NodeId value can be used when a physical ID is needed by software. Save the AMD NodeId to struct cpuinfo_x86.cpu_die_id. Use the value from CPUID or MSR as appropriate. Default to phys_proc_id otherwise. Do so for both AMD and Hygon systems. Drop the node_id parameter from cacheinfo_*_init_llc_id() as it is no longer needed. Update the x86 topology documentation. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20201109210659.754018-2-Yazen.Ghannam@amd.com Signed-off-by: Sasha Levin <sashal@kernel.org>	2020-12-30 11:54:29 +01:00
Steven Rostedt (VMware)	2bbb320656	Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS" commit adab66b71abfe206a020f11e561f4df41f0b2aba upstream. It was believed that metag was the only architecture that required the ring buffer to keep 8 byte words aligned on 8 byte architectures, and with its removal, it was assumed that the ring buffer code did not need to handle this case. It appears that sparc64 also requires this. The following was reported on a sparc64 boot up: kernel: futex hash table entries: 65536 (order: 9, 4194304 bytes, linear) kernel: Running postponed tracer tests: kernel: Testing tracer function: kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140 kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140 kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140 kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140 kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140 kernel: PASSED Need to put back the 64BIT aligned code for the ring buffer. Link: https://lore.kernel.org/r/CADxRZqzXQRYgKc=y-KV=S_yHL+Y8Ay2mh5ezeZUnpRvg+syWKw@mail.gmail.com Cc: stable@vger.kernel.org Fixes: `86b3de60a0` ("ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS") Reported-by: Anatoly Pugachev <matorola@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-12-30 11:54:29 +01:00
Johannes Berg	ef82413937	um: Fix time-travel mode commit ff9632d2a66512436d616ef4c380a0e73f748db1 upstream. Since the time-travel rework, basic time-travel mode hasn't worked properly, but there's no longer a need for this WARN_ON() so just remove it and thereby fix things. Cc: stable@vger.kernel.org Fixes: `4b786e24ca` ("um: time-travel: Rewrite as an event scheduler") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-12-30 11:54:17 +01:00
Anton Ivanov	c4b4253221	um: Remove use of asprinf in umid.c commit 97be7ceaf7fea68104824b6aa874cff235333ac1 upstream. asprintf is not compatible with the existing uml memory allocation mechanism. Its use on the "user" side of UML results in a corrupt slab state. Fixes: `0d4e5ac7e7` ("um: remove uses of variable length arrays") Cc: stable@vger.kernel.org Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-12-30 11:54:17 +01:00
David Hildenbrand	cd2eda58ea	powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently commit d6718941a2767fb383e105d257d2105fe4f15f0e upstream. It's very easy to crash the kernel right now by simply trying to enable memtrace concurrently, hammering on the "enable" interface loop.sh: #!/bin/bash dmesg --console-off while true; do echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable done [root@localhost ~]# loop.sh & [root@localhost ~]# loop.sh & Resulting quickly in a kernel crash. Let's properly protect using a mutex. Fixes: `9d5171a8f2` ("powerpc/powernv: Enable removal of memory for in memory tracing") Cc: stable@vger.kernel.org# v4.14+ Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-3-david@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-12-30 11:54:16 +01:00
David Hildenbrand	4b8dcb006e	powerpc/powernv/memtrace: Don't leak kernel memory to user space commit c74cf7a3d59a21b290fe0468f5b470d0b8ee37df upstream. We currently leak kernel memory to user space, because memory offlining doesn't do any implicit clearing of memory and we are missing explicit clearing of memory. Let's keep it simple and clear pages before removing the linear mapping. Reproduced in QEMU/TCG with 10 GiB of main memory: [root@localhost ~]# dd obs=9G if=/dev/urandom of=/dev/null [... wait until "free -m" used counter no longer changes and cancel] 19665802+0 records in 1+0 records out 9663676416 bytes (9.7 GB, 9.0 GiB) copied, 135.548 s, 71.3 MB/s [root@localhost ~]# cat /sys/devices/system/memory/block_size_bytes 40000000 [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable [ 402.978663][ T1086] page:000000001bc4bc74 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24900 [ 402.980063][ T1086] flags: 0x7ffff000001000(reserved) [ 402.980415][ T1086] raw: 007ffff000001000 c00c000000924008 c00c000000924008 0000000000000000 [ 402.980627][ T1086] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 [ 402.980845][ T1086] page dumped because: unmovable page [ 402.989608][ T1086] Offlined Pages 16384 [ 403.324155][ T1086] memtrace: Allocated trace memory on node 0 at 0x0000000200000000 Before this patch: [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace \| head 00000000 c8 25 72 51 4d 26 36 c5 5c c2 56 15 d5 1a cd 10 \|.%rQM&6.\.V.....\| 00000010 19 b9 50 b2 cb e3 60 b8 ec 0a f3 ec 4b 3c 39 f0 \|..P...`.....K<9.\|$ 00000020 4e 5a 4c cf bd 26 19 ff 37 79 13 67 24 b7 b8 57 \|NZL..&..7y.g$..W\|$ 00000030 98 3e f5 be 6f 14 6a bd a4 52 bc 6e e9 e0 c1 5d \|.>..o.j..R.n...]\|$ 00000040 76 b3 ae b5 88 d7 da e3 64 23 85 2c 10 88 07 b6 \|v.......d#.,....\|$ 00000050 9a d8 91 de f7 50 27 69 2e 64 9c 6f d3 19 45 79 \|.....P'i.d.o..Ey\|$ 00000060 6a 6f 8a 61 71 19 1f c7 f1 df 28 26 ca 0f 84 55 \|jo.aq.....(&...U\|$ 00000070 01 3f be e4 e2 e1 da ff 7b 8c 8e 32 37 b4 24 53 \|.?......{..27.$S\|$ 00000080 1b 70 30 45 56 e6 8c c4 0e b5 4c fb 9f dd 88 06 \|.p0EV.....L.....\|$ 00000090 ef c4 18 79 f1 60 b1 5c 79 59 4d f4 36 d7 4a 5c \|...y.`.\yYM.6.J\\|$ After this patch: [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace \| head 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \|................\| * 40000000 Fixes: `9d5171a8f2` ("powerpc/powernv: Enable removal of memory for in memory tracing") Cc: stable@vger.kernel.org # v4.14+ Reported-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201111145322.15793-2-david@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-12-30 11:54:16 +01:00
Alexey Kardashevskiy	8fe4bee4c0	powerpc/powernv/npu: Do not attempt NPU2 setup on POWER8NVL NPU commit b1198a88230f2ce50c271e22b82a8b8610b2eea9 upstream. We execute certain NPU2 setup code (such as mapping an LPID to a device in NPU2) unconditionally if an Nvlink bridge is detected. However this cannot succeed on POWER8NVL machines and errors appear in dmesg. This is harmless as skiboot returns an error and the only place we check it is vfio-pci but that code does not get called on P8+ either. This adds a check if pnv_npu2_xxx helpers are called on a machine with NPU2 which initializes pnv_phb::npu in pnv_npu2_init(); pnv_phb::npu==NULL on POWER8/NVL (Naples). While at this, fix NULL derefencing in pnv_npu_peers_take_ownership/ pnv_npu_peers_release_ownership which occurs when GPUs on mentioned P8s cause EEH which happens if "vfio-pci" disables devices using the D3 power state; the vfio-pci's disable_idle_d3 module parameter controls this and must be set on Naples. The EEH handling clears the entire pnv_ioda_pe struct in pnv_ioda_free_pe() hence the NULL derefencing. We cannot recover from that but at least we stop crashing. Tested on - POWER9 pvr=004e1201, Ubuntu 19.04 host, Ubuntu 18.04 vm, NVIDIA GV100 10de:1db1 driver 418.39 - POWER8 pvr=004c0100, RHEL 7.6 host, Ubuntu 16.10 vm, NVIDIA P100 10de:15f9 driver 396.47 Fixes: `1b785611e1` ("powerpc/powernv/npu: Add release_ownership hook") Cc: stable@vger.kernel.org # 5.0 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201122073828.15446-1-aik@ozlabs.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2020-12-30 11:54:16 +01:00

1 2 3 4 5 ...

178354 Commits