[ Upstream commit a8f7e08a81708920a928664a865208fdf451c49f ]
The IN and OUT instructions with port address as an immediate operand
only use an 8-bit immediate (imm8). The current VC handler uses the
entire 32-bit immediate value but these instructions only set the first
bytes.
Cast the operand to an u8 for that.
[ bp: Massage commit message. ]
Fixes: 25189d08e5 ("x86/sev-es: Add support for handling IOIO exceptions")
Signed-off-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Link: https://lkml.kernel.org/r/20210105163311.221490-1-pgonda@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bac717171971176b78c72d15a8b6961764ab197f ]
dtc points out that the interrupts for some devices are not parsable:
picoxcell-pc3x2.dtsi:45.19-49.5: Warning (interrupts_property): /paxi/gem@30000: Missing interrupt-parent
picoxcell-pc3x2.dtsi:51.21-55.5: Warning (interrupts_property): /paxi/dmac@40000: Missing interrupt-parent
picoxcell-pc3x2.dtsi:57.21-61.5: Warning (interrupts_property): /paxi/dmac@50000: Missing interrupt-parent
picoxcell-pc3x2.dtsi:233.21-237.5: Warning (interrupts_property): /rwid-axi/axi2pico@c0000000: Missing interrupt-parent
There are two VIC instances, so it's not clear which one needs to be
used. I found the BSP sources that reference VIC0, so use that:
https://github.com/r1mikey/meta-picoxcell/blob/master/recipes-kernel/linux/linux-picochip-3.0/0001-picoxcell-support-for-Picochip-picoXcell-SoC.patch
Acked-by: Jamie Iles <jamie@jamieiles.com>
Link: https://lore.kernel.org/r/20201230152010.3914962-1-arnd@kernel.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8a48c0a3360bf2bf4f40c980d0ec216e770e58ee ]
fs/dax.c uses copy_user_page() but ARC does not provide that interface,
resulting in a build error.
Provide copy_user_page() in <asm/page.h>.
../fs/dax.c: In function 'copy_cow_page_dax':
../fs/dax.c:702:2: error: implicit declaration of function 'copy_user_page'; did you mean 'copy_to_user_page'? [-Werror=implicit-function-declaration]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Dan Williams <dan.j.williams@intel.com>
#Acked-by: Vineet Gupta <vgupta@synopsys.com> # v1
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
#Reviewed-by: Ira Weiny <ira.weiny@intel.com> # v2
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7887cc89d5851cbdec49219e9614beec776af150 ]
A too high brightness by default (default is max) makes the
screen go blank. Set this to 15 as in the Vendor tree.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20201214223413.253893-1-linus.walleij@linaro.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c0bc969c176b10598b31d5d1a5edf9a5261f0a9f ]
xt875 comes up with a iva voltage of 1375000 and android runs at this too. fix
maximum voltage to be consistent with this.
Signed-off-by: Carl Philipp Klemm <philipp@uvos.xyz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c5e6ae563c802c4d828d42e134af64004db2e58c ]
If you run 'make uImage uImage.gz' with the parallel option, uImage.gz
will be created by two threads simultaneously.
This is because arch/arc/Makefile does not specify the dependency
between uImage and uImage.gz. Hence, GNU Make assumes they can be
built in parallel. One thread descends into arch/arc/boot/ to create
uImage, and another to create uImage.gz.
Please notice the same log is displayed twice in the following steps:
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig
$ make -j$(nproc) ARCH=arc uImage uImage.gz
[ snip ]
LD vmlinux
SORTTAB vmlinux
SYSMAP System.map
OBJCOPY arch/arc/boot/vmlinux.bin
OBJCOPY arch/arc/boot/vmlinux.bin
GZIP arch/arc/boot/vmlinux.bin.gz
GZIP arch/arc/boot/vmlinux.bin.gz
UIMAGE arch/arc/boot/uImage.gz
UIMAGE arch/arc/boot/uImage.gz
Image Name: Linux-5.10.0-rc4-00003-g62f23044
Created: Sun Nov 22 02:52:26 2020
Image Type: ARC Linux Kernel Image (gzip compressed)
Data Size: 2109376 Bytes = 2059.94 KiB = 2.01 MiB
Load Address: 80000000
Entry Point: 80004000
Image arch/arc/boot/uImage is ready
Image Name: Linux-5.10.0-rc4-00003-g62f23044
Created: Sun Nov 22 02:52:26 2020
Image Type: ARC Linux Kernel Image (gzip compressed)
Data Size: 2815455 Bytes = 2749.47 KiB = 2.69 MiB
Load Address: 80000000
Entry Point: 80004000
This is a race between the two threads trying to write to the same file
arch/arc/boot/uImage.gz. This is a potential problem that can generate
a broken file.
I fixed a similar problem for ARM by commit 3939f33450 ("ARM: 8418/1:
add boot image dependencies to not generate invalid images").
I highly recommend to avoid such build rules that cause a race condition.
Move the uImage rule to arch/arc/Makefile.
Another strangeness is that arch/arc/boot/Makefile compares the
timestamps between $(obj)/uImage and $(obj)/uImage.*:
$(obj)/uImage: $(obj)/uImage.$(suffix-y)
@ln -sf $(notdir $<) $@
@echo ' Image $@ is ready'
This does not work as expected since $(obj)/uImage is a symlink.
The symlink should be created in a phony target rule.
I used $(kecho) instead of echo to suppress the message
'Image arch/arc/boot/uImage is ready' when the -s option is given.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0cfccb3c04934cdef42ae26042139f16e805b5f7 ]
The top-level boot_targets (uImage and uImage.*) should be phony
targets. They just let Kbuild descend into arch/arc/boot/ and create
files there.
If a file exists in the top directory with the same name, the boot
image will not be created.
You can confirm it by the following steps:
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig all # vmlinux will be built
$ touch uImage.gz
$ make ARCH=arc uImage.gz
CALL scripts/atomic/check-atomics.sh
CALL scripts/checksyscalls.sh
CHK include/generated/compile.h
# arch/arc/boot/uImage.gz is not created
Specify the targets as PHONY to fix this.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2712ec76a5433e5ec9def2bd52a95df1f96d050 ]
arch/arc/boot/Makefile supports uImage.lzma, but you cannot do
'make uImage.lzma' because the corresponding target is missing
in arch/arc/Makefile. Add it.
I also changed the assignment operator '+=' to ':=' since this is the
only place where we expect this variable to be set.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9836720911cfec25d3fbdead1c438bf87e0f2841 ]
The deb-pkg builds for ARCH=arc fail.
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig
$ make ARCH=arc bindeb-pkg
SORTTAB vmlinux
SYSMAP System.map
MODPOST Module.symvers
make KERNELRELEASE=5.10.0-rc4 ARCH=arc KBUILD_BUILD_VERSION=2 -f ./Makefile intdeb-pkg
sh ./scripts/package/builddeb
cp: cannot stat 'arch/arc/boot/bootpImage': No such file or directory
make[4]: *** [scripts/Makefile.package:87: intdeb-pkg] Error 1
make[3]: *** [Makefile:1527: intdeb-pkg] Error 2
make[2]: *** [debian/rules:13: binary-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
make[1]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2
make: *** [Makefile:1527: bindeb-pkg] Error 2
The reason is obvious; arch/arc/Makefile sets $(boot)/bootpImage as
the default image, but there is no rule to build it.
Remove the meaningless KBUILD_IMAGE assignment so it will fallback
to the default vmlinux. With this change, you can build the deb package.
I removed the 'bootpImage' target as well. At best, it provides
'make bootpImage' as an alias of 'make vmlinux', but I do not see
much sense in doing so.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 69e976831cd53f9ba304fd20305b2025ecc78eab upstream.
LLVM-built Linux triggered a boot hangup with KASLR enabled.
arch/mips/kernel/relocate.c:get_random_boot() uses linux_banner,
which is a string constant, as a random seed, but accesses it
as an array of unsigned long (in rotate_xor()).
When the address of linux_banner is not aligned to sizeof(long),
such access emits unaligned access exception and hangs the kernel.
Use PTR_ALIGN() to align input address to sizeof(long) and also
align down the input length to prevent possible access-beyond-end.
Fixes: 405bc8fd12 ("MIPS: Kernel: Implement KASLR using CONFIG_RELOCATABLE")
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 698222457465ce343443be81c5512edda86e5914 upstream.
Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012
had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c)
coredumps; unfortunately, compat on mips (which does not go through the
usual compat_binfmt_elf.c) had not been noticed.
As the result, both N32 and O32 coredumps on 64bit mips kernels
have those sections malformed enough to confuse the living hell out of
all gdb and readelf versions (up to and including the tip of binutils-gdb.git).
Longer term solution is to make both O32 and N32 compat use the
regular compat_binfmt_elf.c, but that's too much for backports. The minimal
solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing
those patches have done in fs/compat_binfmt_elf.c
Cc: stable@kernel.org # v3.7+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d4f9c1a17a3480f8fe523673f7232b254d724b7 upstream.
The compressed payload is not necesarily 4-byte aligned, at least when
compiling with Clang. In that case, the 4-byte value appended to the
compressed payload that corresponds to the uncompressed kernel image
size must be read using get_unaligned_le32().
This fixes Clang-built kernels not booting on MIPS (tested on a Ingenic
JZ4770 board).
Fixes: b8f54f2cde ("MIPS: ZBOOT: copy appended dtb to the end of the kernel")
Cc: <stable@vger.kernel.org> # v4.7
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5b058973d3205578aa6c9a71392e072a11ca44ef upstream.
When building mips tinyconfig with clang the following warning show up:
arch/mips/lib/uncached.c:45:6: warning: variable 'sp' is uninitialized when used here [-Wuninitialized]
if (sp >= (long)CKSEG0 && sp < (long)CKSEG2)
^~
arch/mips/lib/uncached.c:40:18: note: initialize the variable 'sp' to silence this warning
register long sp __asm__("$sp");
^
= 0
1 warning generated.
Rework to make an explicit inline move, instead of the non-standard use
of specifying registers for local variables. This is what's written
from the gcc-10 manual [1] about specifying registers for local
variables:
"6.47.5.2 Specifying Registers for Local Variables
.................................................
[...]
"The only supported use for this feature is to specify registers for
input and output operands when calling Extended 'asm' (*note Extended
Asm::). [...]".
[1] https://docs.w3cub.com/gcc~10/local-register-variables
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad4fddef5f2345aa9214e979febe2f47639c10d9 upstream.
When building mips tinyconfig with clang the following error show up:
WARNING: modpost: vmlinux.o(.text+0x1940c): Section mismatch in reference from the function r4k_cache_init() to the function .init.text:loongson3_sc_init()
The function r4k_cache_init() references
the function __init loongson3_sc_init().
This is often because r4k_cache_init lacks a __init
annotation or the annotation of loongson3_sc_init is wrong.
Remove marked __init from function loongson3_sc_init(),
mips_sc_probe_cm3(), and mips_sc_probe().
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c25a053e15778f6b4d6553708673736e27a6c2cf upstream.
Use virtual address instead of physical address when translating
the address to shadow memory by kasan_mem_to_shadow().
Signed-off-by: Nick Hu <nickhu@andestech.com>
Signed-off-by: Nylon Chen <nylon7@andestech.com>
Fixes: b10d6bca87 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0aa2ec8a475fb505fd98d93bbcf4e03beeeebcb6 upstream.
The patch fix commit: ad5d112 ("riscv: use vDSO common flow to
reduce the latency of the time-related functions").
The GENERIC_TIME_VSYSCALL should be CONFIG_GENERIC_TIME_VSYSCALL
or vgettimeofday won't work.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Fixes: ad5d1122b8 ("riscv: use vDSO common flow to reduce the latency of the time-related functions")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cf7b2ae4d70432fa94ebba3fbaab825481ae7189 upstream.
Properly return -ENOSYS for syscall -1 instead of leaving the return value
uninitialized. This fixes the strace teststuite.
Fixes: 5340627e3f ("riscv: add support for SECCOMP and SECCOMP_FILTER")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ad0a6bad44758afa3b440c254a24999a0c7e35d5 upstream.
We've observed crashes due to an empty cpu mask in
hyperv_flush_tlb_others. Obviously the cpu mask in question is changed
between the cpumask_empty call at the beginning of the function and when
it is actually used later.
One theory is that an interrupt comes in between and a code path ends up
changing the mask. Move the check after interrupt has been disabled to
see if it fixes the issue.
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20210105175043.28325-1-wei.liu@kernel.org
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2a5f1b67ec577fb1544b563086e0377f095f88e2 upstream.
We reset the guest's view of PMCR_EL0 unconditionally, based on
the host's view of this register. It is however legal for an
implementation not to provide any PMU, resulting in an UNDEF.
The obvious fix is to skip the reset of this shadow register
when no PMU is available, sidestepping the issue entirely.
If no PMU is available, the guest is not able to request
a virtual PMU anyway, so not doing nothing is the right thing
to do!
It is unlikely that this bug can hit any HW implementation
though, as they all provide a PMU. It has been found using nested
virt with the host KVM not implementing the PMU itself.
Fixes: ab9468340d ("arm64: KVM: Add access handler for PMCR register")
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201210083059.1277162-1-maz@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 095507dc1350b3a2b8b39fdc05edba0c10859eca upstream.
Systems configured with CONFIG_ZONE_DMA32, CONFIG_ZONE_NORMAL and
!CONFIG_ZONE_DMA will fail to properly setup ARCH_LOW_ADDRESS_LIMIT. The
limit will default to ~0ULL, effectively spanning the whole memory,
which is too high for a configuration that expects low memory to be
capped at 4GB.
Fix ARCH_LOW_ADDRESS_LIMIT by falling back to arm64_dma32_phys_limit
when arm64_dma_phys_limit isn't set. arm64_dma32_phys_limit will honour
CONFIG_ZONE_DMA32, or span the entire memory when not enabled.
Fixes: 1a8e1cef76 ("arm64: use both ZONE_DMA and ZONE_DMA32")
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Link: https://lore.kernel.org/r/20201218163307.10150-1-nsaenzjulienne@suse.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ec76c2eea903947202098090bbe07a739b5246e9 upstream.
On the GTA04A5 od->_driver_status was not set to BUS_NOTIFY_BIND_DRIVER
during probe of the second mmc used for wifi. Therefore
omap_device_late_idle idled the device during probing causing oopses when
accessing the registers.
It was not set because od->_state was set to OMAP_DEVICE_STATE_IDLE
in the notifier callback. Therefore set od->_driver_status also in that
case.
This came apparent after commit 21b2cec61c ("mmc: Set
PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4") causing this
oops:
omap_hsmmc 480b4000.mmc: omap_device_late_idle: enabled but no driver. Idling
8<--- cut here ---
Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0b402c
...
(omap_hsmmc_set_bus_width) from [<c07996bc>] (omap_hsmmc_set_ios+0x11c/0x258)
(omap_hsmmc_set_ios) from [<c077b2b0>] (mmc_power_up.part.8+0x3c/0xd0)
(mmc_power_up.part.8) from [<c077c14c>] (mmc_start_host+0x88/0x9c)
(mmc_start_host) from [<c077d284>] (mmc_add_host+0x58/0x84)
(mmc_add_host) from [<c0799190>] (omap_hsmmc_probe+0x5fc/0x8c0)
(omap_hsmmc_probe) from [<c0666728>] (platform_drv_probe+0x48/0x98)
(platform_drv_probe) from [<c066457c>] (really_probe+0x1dc/0x3b4)
Fixes: 04abaf07f6 ("ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer")
Fixes: 21b2cec61c ("mmc: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4")
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
[tony@atomide.com: left out extra parens, trimmed description stack trace]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ca408d9c749c32288bc28725f9f12ba30299e8f upstream.
Commit
121b32a58a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")
converted native x86-32 which take 64-bit arguments to use the
compat handlers to allow conversion to passing args via pt_regs.
sys_fanotify_mark() was however missed, as it has a general compat
handler. Add a config option that will use the syscall wrapper that
takes the split args for native 32-bit.
[ bp: Fix typo in Kconfig help text. ]
Fixes: 121b32a58a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")
Reported-by: Paweł Jasiak <pawel@jasiak.xyz>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 98bf2d3f4970179c702ef64db658e0553bc6ef3a ]
When we have VMAP stack, exception prolog 1 sets r1, not r11.
When it is not an RTAS machine check, don't trash r1 because it is
needed by prolog 1.
Fixes: da7bb43ab9 ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
Fixes: d2e006036082 ("powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Squash in fixup for RTAS machine check from Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.1608621048.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 2f80d502d627f30257ba7e3655e71c373b7d1a5a upstream.
Since we know that e >= s, we can reassociate the left shift,
changing the shifted number from 1 to 2 in exchange for
decreasing the right hand side by 1.
Reported-by: syzbot+e87846c48bf72bc85311@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cb7f4a8b1fb426a175d1708f05581939c61329d4 upstream.
In mtrr_type_lookup(), if the input memory address region is not in the
MTRR, over 4GB, and not over the top of memory, a write-back attribute
is returned. These condition checks are for ensuring the input memory
address region is actually mapped to the physical memory.
However, if the end address is just aligned with the top of memory,
the condition check treats the address is over the top of memory, and
write-back attribute is not returned.
And this hits in a real use case with NVDIMM: the nd_pmem module tries
to map NVDIMMs as cacheable memories when NVDIMMs are connected. If a
NVDIMM is the last of the DIMMs, the performance of this NVDIMM becomes
very low since it is aligned with the top of memory and its memory type
is uncached-minus.
Move the input end address change to inclusive up into
mtrr_type_lookup(), before checking for the top of memory in either
mtrr_type_lookup_{variable,fixed}() helpers.
[ bp: Massage commit message. ]
Fixes: 0cc705f56e ("x86/mm/mtrr: Clean up mtrr_type_lookup()")
Signed-off-by: Ying-Tsun Huang <ying-tsun.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201215070721.4349-1-ying-tsun.huang@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f1dc15cd7fc146107cad2a926d9c1d005f69002a upstream.
AES needs to be disabled on Nokia N950/N9 as well (HS devices), otherwise
kernel fails to boot.
Fixes: c312f06631 ("ARM: dts: omap3: Migrate AES from hwmods to sysc-omap2")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 311bea3cb9ee20ef150ca76fc60a592bf6b159f5 upstream.
With GNU binutils 2.35+, linking with BFD produces warnings for vmlinux:
aarch64-linux-gnu-ld: warning: -z norelro ignored
BFD can produce this warning when the target emulation mode does not
support RELRO program headers, and -z relro or -z norelro is passed.
Alan Modra clarifies:
The default linker emulation for an aarch64-linux ld.bfd is
-maarch64linux, the default for an aarch64-elf linker is
-maarch64elf. They are not equivalent. If you choose -maarch64elf
you get an emulation that doesn't support -z relro.
The ARCH=arm64 kernel prefers -maarch64elf, but may fall back to
-maarch64linux based on the toolchain configuration.
LLD will always create RELRO program header regardless of target
emulation.
To avoid the above warning when linking with BFD, pass -z norelro only
when linking with LLD or with -maarch64linux.
Fixes: 3b92fa7485 ("arm64: link with -z norelro regardless of CONFIG_RELOCATABLE")
Fixes: 3bbd3db864 ("arm64: relocatable: fix inconsistencies in linker script and options")
Cc: <stable@vger.kernel.org> # 5.0.x-
Reported-by: kernelci.org bot <bot@kernelci.org>
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alan Modra <amodra@gmail.com>
Cc: Fāng-ruì Sòng <maskray@google.com>
Link: https://lore.kernel.org/r/20201218002432.788499-1-ndesaulniers@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a0195f314a25582b38993bf30db11c300f4f4611 upstream.
Shakeel Butt reported in [1] that a user can request a task to be moved
to a resource group even if the task is already in the group. It just
wastes time to do the move operation which could be costly to send IPI
to a different CPU.
Add a sanity check to ensure that the move operation only happens when
the task is not already in the resource group.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
Fixes: e02737d5b8 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/962ede65d8e95be793cb61102cca37f7bb018e66.1608243147.git.reinette.chatre@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae28d1aae48a1258bd09a6f707ebb4231d79a761 upstream.
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.
This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.
This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.com
[ bp: Massage commit message and drop the two update_task_closid_rmid()
variants. ]
Fixes: e02737d5b8 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.1608243147.git.reinette.chatre@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a889ea54b3daa63ee1463dc19ed699407d61458b upstream.
Many TDP MMU functions which need to perform some action on all TDP MMU
roots hold a reference on that root so that they can safely drop the MMU
lock in order to yield to other threads. However, when releasing the
reference on the root, there is a bug: the root will not be freed even
if its reference count (root_count) is reduced to 0.
To simplify acquiring and releasing references on TDP MMU root pages, and
to ensure that these roots are properly freed, move the get/put operations
into another TDP MMU root iterator macro.
Moving the get/put operations into an iterator macro also helps
simplify control flow when a root does need to be freed. Note that using
the list_for_each_entry_safe macro would not have been appropriate in
this situation because it could keep a pointer to the next root across
an MMU lock release + reacquire, during which time that root could be
freed.
Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: faaf05b00a ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 063afacd87 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
Fixes: a6a0b05da9 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
Fixes: 1488199856 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-1-bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 39b4d43e6003cee51cd119596d3c33d0449eb44c upstream.
Get the so called "root" level from the low level shadow page table
walkers instead of manually attempting to calculate it higher up the
stack, e.g. in get_mmio_spte(). When KVM is using PAE shadow paging,
the starting level of the walk, from the callers perspective, is not
the CR3 root but rather the PDPTR "root". Checking for reserved bits
from the CR3 root causes get_mmio_spte() to consume uninitialized stack
data due to indexing into sptes[] for a level that was not filled by
get_walk(). This can result in false positives and/or negatives
depending on what garbage happens to be on the stack.
Opportunistically nuke a few extra newlines.
Fixes: 95fb5b0258 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Reported-by: Richard Herbert <rherbert@sympatico.ca>
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2aa078932ff6c66bf10cc5b3144440dbfa7d813d upstream.
Return -1 from the get_walk() helpers if the shadow walk doesn't fill at
least one spte, which can theoretically happen if the walk hits a
not-present PDPTR. Returning the root level in such a case will cause
get_mmio_spte() to return garbage (uninitialized stack data). In
practice, such a scenario should be impossible as KVM shouldn't get a
reserved-bit page fault with a not-present PDPTR.
Note, using mmu->root_level in get_walk() is wrong for other reasons,
too, but that's now a moot point.
Fixes: 95fb5b0258 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d1c5246e08eb64991001d97a3bd119c93edbc79a upstream.
Commit
28ee90fe60 ("x86/mm: implement free pmd/pte page interfaces")
introduced a new location where a pmd was released, but neglected to
run the pmd page destructor. In fact, this happened previously for a
different pmd release path and was fixed by commit:
c283610e44 ("x86, mm: do not leak page->ptl for pmd page tables").
This issue was hidden until recently because the failure mode is silent,
but commit:
b2b29d6d01 ("mm: account PMD tables like PTE tables")
turns the failure mode into this signature:
BUG: Bad page state in process lt-pmem-ns pfn:15943d
page:000000007262ed7b refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x15943d
flags: 0xaffff800000000()
raw: 00affff800000000 dead000000000100 0000000000000000 0000000000000000
raw: 0000000000000000 ffff913a029bcc08 00000000fffffbff 0000000000000000
page dumped because: nonzero mapcount
[..]
dump_stack+0x8b/0xb0
bad_page.cold+0x63/0x94
free_pcp_prepare+0x224/0x270
free_unref_page+0x18/0xd0
pud_free_pmd_page+0x146/0x160
ioremap_pud_range+0xe3/0x350
ioremap_page_range+0x108/0x160
__ioremap_caller.constprop.0+0x174/0x2b0
? memremap+0x7a/0x110
memremap+0x7a/0x110
devm_memremap+0x53/0xa0
pmem_attach_disk+0x4ed/0x530 [nd_pmem]
? __devm_release_region+0x52/0x80
nvdimm_bus_probe+0x85/0x210 [libnvdimm]
Given this is a repeat occurrence it seemed prudent to look for other
places where this destructor might be missing and whether a better
helper is needed. try_to_free_pmd_page() looks like a candidate, but
testing with setting up and tearing down pmd mappings via the dax unit
tests is thus far not triggering the failure.
As for a better helper pmd_free() is close, but it is a messy fit
due to requiring an @mm arg. Also, ___pmd_free_tlb() wants to call
paravirt_tlb_remove_table() instead of free_page(), so open-coded
pgtable_pmd_page_dtor() seems the best way forward for now.
Debugged together with Matthew Wilcox <willy@infradead.org>.
Fixes: 28ee90fe60 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/160697689204.605323.17629854984697045602.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ce47d95b7346dcafd9bed3556a8d072cb2b8571 upstream.
Commit eff8728fe6 ("vmlinux.lds.h: Add PGO and AutoFDO input
sections") added ".text.unlikely.*" and ".text.hot.*" due to an LLVM
change [1].
After another LLVM change [2], these sections are seen in some PowerPC
builds, where there is a orphan section warning then build failure:
$ make -skj"$(nproc)" \
ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 O=out \
distclean powernv_defconfig zImage.epapr
ld.lld: warning: kernel/built-in.a(panic.o):(.text.unlikely.) is being placed in '.text.unlikely.'
...
ld.lld: warning: address (0xc000000000009314) of section .text is not a multiple of alignment (256)
...
ERROR: start_text address is c000000000009400, should be c000000000008000
ERROR: try to enable LD_HEAD_STUB_CATCH config option
ERROR: see comments in arch/powerpc/tools/head_check.sh
...
Explicitly handle these sections like in the main linker script so
there is no more build failure.
[1]: https://reviews.llvm.org/D79600
[2]: https://reviews.llvm.org/D92493
Fixes: 83a092cf95 ("powerpc: Link warning for orphan sections")
Cc: stable@vger.kernel.org
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1218
Link: https://lore.kernel.org/r/20210104205952.1399409-1-natechancellor@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 87dbc209ea04645fd2351981f09eff5d23f8e2e9 ]
Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and
remove all arch/*/include/asm/local64.h arch-specific files since they
only #include <asm-generic/local64.h>.
This fixes build errors on arch/c6x/ and arch/nios2/ for
block/blk-iocost.c.
Build-tested on 21 of 25 arch-es. (tools problems on the others)
Yes, we could even rename <asm-generic/local64.h> to
<linux/local64.h> and change all #includes to use
<linux/local64.h> instead.
Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9365965db0c7ca7fc81eee27c21d8522d7102c32 ]
Clear the kernel stack backchain before potentially calling the
lockdep trace_hardirqs_off/on functions. Without this walking the
kernel backchain, e.g. during a panic, might stop too early.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fc6b6a872dcd48c6f39c7975836d75113db67d37 ]
Internally, UBD treats each physical IO segment as a separate command to
be submitted in the execution pipe. If the pipe returns a transient
error after a few segments have already been written, UBD will tell the
block layer to requeue the request, but there is no way to reclaim the
segments already submitted. When a new attempt to dispatch the request
is done, those segments already submitted will get duplicated, causing
the WARN_ON below in the best case, and potentially data corruption.
In my system, running a UML instance with 2GB of RAM and a 50M UBD disk,
I can reproduce the WARN_ON by simply running mkfs.fvat against the
disk on a freshly booted system.
There are a few ways to around this, like reducing the pressure on
the pipe by reducing the queue depth, which almost eliminates the
occurrence of the problem, increasing the pipe buffer size on the host
system, or by limiting the request to one physical segment, which causes
the block layer to submit way more requests to resolve a single
operation.
Instead, this patch modifies the format of a UBD command, such that all
segments are sent through a single element in the communication pipe,
turning the command submission atomic from the point of view of the
block layer. The new format has a variable size, depending on the
number of elements, and looks like this:
+------------+-----------+-----------+------------
| cmd_header | segment 0 | segment 1 | segment ...
+------------+-----------+-----------+------------
With this format, we push a pointer to cmd_header in the submission
pipe.
This has the advantage of reducing the memory footprint of executing a
single request, since it allow us to merge some fields in the header.
It is possible to reduce even further each segment memory footprint, by
merging bitmap_words and cow_offset, for instance, but this is not the
focus of this patch and is left as future work. One issue with the
patch is that for a big number of segments, we now perform one big
memory allocation instead of multiple small ones, but I wasn't able to
trigger any real issues or -ENOMEM because of this change, that wouldn't
be reproduced otherwise.
This was tested using fio with the verify-crc32 option, and by running
an ext4 filesystem over this UBD device.
The original WARN_ON was:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141
refcount_t: underflow; use-after-free.
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346
Stack:
6084eed0 6063dc77 00000009 6084ef60
00000000 604b8d9f 6084eee0 6063dcbc
6084ef40 6006ab8d e013d780 1c00000000
Call Trace:
[<600a0c1c>] ? printk+0x0/0x94
[<6004a888>] show_stack+0x13b/0x155
[<6063dc77>] ? dump_stack_print_info+0xdf/0xe8
[<604b8d9f>] ? refcount_warn_saturate+0x13f/0x141
[<6063dcbc>] dump_stack+0x2a/0x2c
[<6006ab8d>] __warn+0x107/0x134
[<6008da6c>] ? wake_up_process+0x17/0x19
[<60487628>] ? blk_queue_max_discard_sectors+0x0/0xd
[<6006b05f>] warn_slowpath_fmt+0xd1/0xdf
[<6006af8e>] ? warn_slowpath_fmt+0x0/0xdf
[<600acc14>] ? raw_read_seqcount_begin.constprop.0+0x0/0x15
[<600619ae>] ? os_nsecs+0x1d/0x2b
[<604b8d9f>] refcount_warn_saturate+0x13f/0x141
[<6048bc8f>] refcount_sub_and_test.constprop.0+0x2f/0x37
[<6048c8de>] blk_mq_free_request+0xf1/0x10d
[<6048ca06>] __blk_mq_end_request+0x10c/0x114
[<6005ac0f>] ubd_intr+0xb5/0x169
[<600a1a37>] __handle_irq_event_percpu+0x6b/0x17e
[<600a1b70>] handle_irq_event_percpu+0x26/0x69
[<600a1bd9>] handle_irq_event+0x26/0x34
[<600a1bb3>] ? handle_irq_event+0x0/0x34
[<600a5186>] ? unmask_irq+0x0/0x37
[<600a57e6>] handle_edge_irq+0xbc/0xd6
[<600a131a>] generic_handle_irq+0x21/0x29
[<60048f6e>] do_IRQ+0x39/0x54
[...]
---[ end trace c6e7444e55386c0f ]---
Cc: Christopher Obbard <chris.obbard@collabora.com>
Reported-by: Martyn Welch <martyn@collabora.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Tested-by: Christopher Obbard <chris.obbard@collabora.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 72d3e093afae79611fa38f8f2cfab9a888fe66f2 ]
The UML random driver creates a dummy device under the guest,
/dev/hw_random. When this file is read from the guest, the driver
reads from the host machine's /dev/random, in-turn reading from
the host kernel's entropy pool. This entropy pool could have been
filled by a hardware random number generator or just the host
kernel's internal software entropy generator.
Currently the driver does not fill the guests kernel entropy pool,
this requires a userspace tool running inside the guest (like
rng-tools) to read from the dummy device provided by this driver,
which then would fill the guest's internal entropy pool.
This all seems quite pointless when we are already reading from an
entropy pool, so this patch aims to register the device as a hwrng
device using the hwrng-core framework. This not only improves and
cleans up the driver, but also fills the guest's entropy pool
without having to resort to using extra userspace tools in the guest.
This is typically a nuisance when booting a guest: the random pool
takes a long time (~200s) to build up enough entropy since the dummy
hwrng is not used to fill the guest's pool.
This port was originally attempted by Alexander Neville "dark" (in CC,
discussion in Link), but the conversation there stalled since the
handling of -EAGAIN errors were no removed and longer handled by the
driver. This patch attempts to use the existing method of error
handling but utilises the new hwrng core.
The issue can be noticed when booting a UML guest:
[ 2.560000] random: fast init done
[ 214.000000] random: crng init done
With the patch applied, filling the pool becomes a lot quicker:
[ 2.560000] random: fast init done
[ 12.000000] random: crng init done
Cc: Alexander Neville <dark@volatile.bz>
Link: https://lore.kernel.org/lkml/20190828204609.02a7ff70@TheDarkness/
Link: https://lore.kernel.org/lkml/20190829135001.6a5ff940@TheDarkness.local/
Cc: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
Signed-off-by: Christopher Obbard <chris.obbard@collabora.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59d512e4374b2d8a6ad341475dc94c4a4bdec7d3 ]
This is way to catch some cases of decrementer overflow, when the
decrementer has underflowed an odd number of times, while MSR[EE] was
disabled.
With a typical small decrementer, a timer that fires when MSR[EE] is
disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and
8.6 seconds after the timer expires. In any case, the decrementer
interrupt would be taken at 8.6 seconds and the timer would be found at
that point.
So this check is for catching extreme latency events, and it prevents
those latencies from being a further few seconds long. It's not obvious
this is a good tradeoff. This is already a watchdog magnitude event and
that situation is not improved a significantly with this check. For
large decrementers, it's useless.
Therefore remove this check, which avoids a mftb when enabling hard
disabled interrupts (e.g., when enabling after coming from hardware
interrupt handlers). Perhaps more importantly, it also removes the
clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay
which simplifies the code.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ]
I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit dc2da7b45ffe954a0090f5d0310ed7b0b37d2bd2 upstream.
VMware observed a performance regression during memmap init on their
platform, and bisected to commit 73a6e474cb ("mm: memmap_init:
iterate over memblock regions rather that check each PFN") causing it.
Before the commit:
[0.033176] Normal zone: 1445888 pages used for memmap
[0.033176] Normal zone: 89391104 pages, LIFO batch:63
[0.035851] ACPI: PM-Timer IO Port: 0x448
With commit
[0.026874] Normal zone: 1445888 pages used for memmap
[0.026875] Normal zone: 89391104 pages, LIFO batch:63
[2.028450] ACPI: PM-Timer IO Port: 0x448
The root cause is the current memmap defer init doesn't work as expected.
Before, memmap_init_zone() was used to do memmap init of one whole zone,
to initialize all low zones of one numa node, but defer memmap init of
the last zone in that numa node. However, since commit 73a6e474cb,
function memmap_init() is adapted to iterater over memblock regions
inside one zone, then call memmap_init_zone() to do memmap init for each
region.
E.g, on VMware's system, the memory layout is as below, there are two
memory regions in node 2. The current code will mistakenly initialize the
whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to
iniatialize only one memmory section on the 2nd region [mem
0x10000000000-0x1033fffffff]. In fact, we only expect to see that there's
only one memory section's memmap initialized. That's why more time is
costed at the time.
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[ 0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[ 0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[ 0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]
Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to judge
whether defer need be taken in zone wide.
Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
Fixes: commit 73a6e474cb ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 028c221ed1904af9ac3c5162ee98f48966de6b3d ]
AMD systems provide a "NodeId" value that represents a global ID
indicating to which "Node" a logical CPU belongs. The "Node" is a
physical structure equivalent to a Die, and it should not be confused
with logical structures like NUMA nodes. Logical nodes can be adjusted
based on firmware or other settings whereas the physical nodes/dies are
fixed based on hardware topology.
The NodeId value can be used when a physical ID is needed by software.
Save the AMD NodeId to struct cpuinfo_x86.cpu_die_id. Use the value
from CPUID or MSR as appropriate. Default to phys_proc_id otherwise.
Do so for both AMD and Hygon systems.
Drop the node_id parameter from cacheinfo_*_init_llc_id() as it is no
longer needed.
Update the x86 topology documentation.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-2-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit adab66b71abfe206a020f11e561f4df41f0b2aba upstream.
It was believed that metag was the only architecture that required the ring
buffer to keep 8 byte words aligned on 8 byte architectures, and with its
removal, it was assumed that the ring buffer code did not need to handle
this case. It appears that sparc64 also requires this.
The following was reported on a sparc64 boot up:
kernel: futex hash table entries: 65536 (order: 9, 4194304 bytes, linear)
kernel: Running postponed tracer tests:
kernel: Testing tracer function:
kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140
kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140
kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
kernel: PASSED
Need to put back the 64BIT aligned code for the ring buffer.
Link: https://lore.kernel.org/r/CADxRZqzXQRYgKc=y-KV=S_yHL+Y8Ay2mh5ezeZUnpRvg+syWKw@mail.gmail.com
Cc: stable@vger.kernel.org
Fixes: 86b3de60a0 ("ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS")
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff9632d2a66512436d616ef4c380a0e73f748db1 upstream.
Since the time-travel rework, basic time-travel mode hasn't worked
properly, but there's no longer a need for this WARN_ON() so just
remove it and thereby fix things.
Cc: stable@vger.kernel.org
Fixes: 4b786e24ca ("um: time-travel: Rewrite as an event scheduler")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 97be7ceaf7fea68104824b6aa874cff235333ac1 upstream.
asprintf is not compatible with the existing uml memory allocation
mechanism. Its use on the "user" side of UML results in a corrupt slab
state.
Fixes: 0d4e5ac7e7 ("um: remove uses of variable length arrays")
Cc: stable@vger.kernel.org
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d6718941a2767fb383e105d257d2105fe4f15f0e upstream.
It's very easy to crash the kernel right now by simply trying to
enable memtrace concurrently, hammering on the "enable" interface
loop.sh:
#!/bin/bash
dmesg --console-off
while true; do
echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
done
[root@localhost ~]# loop.sh &
[root@localhost ~]# loop.sh &
Resulting quickly in a kernel crash. Let's properly protect using a
mutex.
Fixes: 9d5171a8f2 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org# v4.14+
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-3-david@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b1198a88230f2ce50c271e22b82a8b8610b2eea9 upstream.
We execute certain NPU2 setup code (such as mapping an LPID to a device
in NPU2) unconditionally if an Nvlink bridge is detected. However this
cannot succeed on POWER8NVL machines and errors appear in dmesg. This is
harmless as skiboot returns an error and the only place we check it is
vfio-pci but that code does not get called on P8+ either.
This adds a check if pnv_npu2_xxx helpers are called on a machine with
NPU2 which initializes pnv_phb::npu in pnv_npu2_init();
pnv_phb::npu==NULL on POWER8/NVL (Naples).
While at this, fix NULL derefencing in pnv_npu_peers_take_ownership/
pnv_npu_peers_release_ownership which occurs when GPUs on mentioned P8s
cause EEH which happens if "vfio-pci" disables devices using
the D3 power state; the vfio-pci's disable_idle_d3 module parameter
controls this and must be set on Naples. The EEH handling clears
the entire pnv_ioda_pe struct in pnv_ioda_free_pe() hence
the NULL derefencing. We cannot recover from that but at least we stop
crashing.
Tested on
- POWER9 pvr=004e1201, Ubuntu 19.04 host, Ubuntu 18.04 vm,
NVIDIA GV100 10de:1db1 driver 418.39
- POWER8 pvr=004c0100, RHEL 7.6 host, Ubuntu 16.10 vm,
NVIDIA P100 10de:15f9 driver 396.47
Fixes: 1b785611e1 ("powerpc/powernv/npu: Add release_ownership hook")
Cc: stable@vger.kernel.org # 5.0
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201122073828.15446-1-aik@ozlabs.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>