Commit Graph

178354 Commits

Author SHA1 Message Date
Peter Gonda
e558b38b5e x86/sev-es: Fix SEV-ES OUT/IN immediate opcode vc handling
[ Upstream commit a8f7e08a81708920a928664a865208fdf451c49f ]

The IN and OUT instructions with port address as an immediate operand
only use an 8-bit immediate (imm8). The current VC handler uses the
entire 32-bit immediate value but these instructions only set the first
bytes.

Cast the operand to an u8 for that.

 [ bp: Massage commit message. ]

Fixes: 25189d08e5 ("x86/sev-es: Add support for handling IOIO exceptions")
Signed-off-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Link: https://lkml.kernel.org/r/20210105163311.221490-1-pgonda@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:28 +01:00
Arnd Bergmann
e30f6e1ac3 ARM: picoxcell: fix missing interrupt-parent properties
[ Upstream commit bac717171971176b78c72d15a8b6961764ab197f ]

dtc points out that the interrupts for some devices are not parsable:

picoxcell-pc3x2.dtsi:45.19-49.5: Warning (interrupts_property): /paxi/gem@30000: Missing interrupt-parent
picoxcell-pc3x2.dtsi:51.21-55.5: Warning (interrupts_property): /paxi/dmac@40000: Missing interrupt-parent
picoxcell-pc3x2.dtsi:57.21-61.5: Warning (interrupts_property): /paxi/dmac@50000: Missing interrupt-parent
picoxcell-pc3x2.dtsi:233.21-237.5: Warning (interrupts_property): /rwid-axi/axi2pico@c0000000: Missing interrupt-parent

There are two VIC instances, so it's not clear which one needs to be
used. I found the BSP sources that reference VIC0, so use that:

https://github.com/r1mikey/meta-picoxcell/blob/master/recipes-kernel/linux/linux-picochip-3.0/0001-picoxcell-support-for-Picochip-picoXcell-SoC.patch

Acked-by: Jamie Iles <jamie@jamieiles.com>
Link: https://lore.kernel.org/r/20201230152010.3914962-1-arnd@kernel.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:27 +01:00
Randy Dunlap
bb3700925c arch/arc: add copy_user_page() to <asm/page.h> to fix build error on ARC
[ Upstream commit 8a48c0a3360bf2bf4f40c980d0ec216e770e58ee ]

fs/dax.c uses copy_user_page() but ARC does not provide that interface,
resulting in a build error.

Provide copy_user_page() in <asm/page.h>.

../fs/dax.c: In function 'copy_cow_page_dax':
../fs/dax.c:702:2: error: implicit declaration of function 'copy_user_page'; did you mean 'copy_to_user_page'? [-Werror=implicit-function-declaration]

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Dan Williams <dan.j.williams@intel.com>
#Acked-by: Vineet Gupta <vgupta@synopsys.com> # v1
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-nvdimm@lists.01.org
#Reviewed-by: Ira Weiny <ira.weiny@intel.com> # v2
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:26 +01:00
Linus Walleij
06b0d83b33 ARM: dts: ux500/golden: Set display max brightness
[ Upstream commit 7887cc89d5851cbdec49219e9614beec776af150 ]

A too high brightness by default (default is max) makes the
screen go blank. Set this to 15 as in the Vendor tree.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20201214223413.253893-1-linus.walleij@linaro.org'
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:24 +01:00
Carl Philipp Klemm
54cfdd6507 ARM: omap2: pmic-cpcap: fix maximum voltage to be consistent with defaults on xt875
[ Upstream commit c0bc969c176b10598b31d5d1a5edf9a5261f0a9f ]

xt875 comes up with a iva voltage of 1375000 and android runs at this too. fix
maximum voltage to be consistent with this.

Signed-off-by: Carl Philipp Klemm <philipp@uvos.xyz>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:23 +01:00
Masahiro Yamada
6169a5cfaa ARC: build: move symlink creation to arch/arc/Makefile to avoid race
[ Upstream commit c5e6ae563c802c4d828d42e134af64004db2e58c ]

If you run 'make uImage uImage.gz' with the parallel option, uImage.gz
will be created by two threads simultaneously.

This is because arch/arc/Makefile does not specify the dependency
between uImage and uImage.gz. Hence, GNU Make assumes they can be
built in parallel. One thread descends into arch/arc/boot/ to create
uImage, and another to create uImage.gz.

Please notice the same log is displayed twice in the following steps:

  $ export CROSS_COMPILE=<your-arc-compiler-prefix>
  $ make -s ARCH=arc defconfig
  $ make -j$(nproc) ARCH=arc uImage uImage.gz
  [ snip ]
    LD      vmlinux
    SORTTAB vmlinux
    SYSMAP  System.map
    OBJCOPY arch/arc/boot/vmlinux.bin
    OBJCOPY arch/arc/boot/vmlinux.bin
    GZIP    arch/arc/boot/vmlinux.bin.gz
    GZIP    arch/arc/boot/vmlinux.bin.gz
    UIMAGE  arch/arc/boot/uImage.gz
    UIMAGE  arch/arc/boot/uImage.gz
  Image Name:   Linux-5.10.0-rc4-00003-g62f23044
  Created:      Sun Nov 22 02:52:26 2020
  Image Type:   ARC Linux Kernel Image (gzip compressed)
  Data Size:    2109376 Bytes = 2059.94 KiB = 2.01 MiB
  Load Address: 80000000
  Entry Point:  80004000
    Image arch/arc/boot/uImage is ready
  Image Name:   Linux-5.10.0-rc4-00003-g62f23044
  Created:      Sun Nov 22 02:52:26 2020
  Image Type:   ARC Linux Kernel Image (gzip compressed)
  Data Size:    2815455 Bytes = 2749.47 KiB = 2.69 MiB
  Load Address: 80000000
  Entry Point:  80004000

This is a race between the two threads trying to write to the same file
arch/arc/boot/uImage.gz. This is a potential problem that can generate
a broken file.

I fixed a similar problem for ARM by commit 3939f33450 ("ARM: 8418/1:
add boot image dependencies to not generate invalid images").

I highly recommend to avoid such build rules that cause a race condition.

Move the uImage rule to arch/arc/Makefile.

Another strangeness is that arch/arc/boot/Makefile compares the
timestamps between $(obj)/uImage and $(obj)/uImage.*:

  $(obj)/uImage: $(obj)/uImage.$(suffix-y)
          @ln -sf $(notdir $<) $@
          @echo '  Image $@ is ready'

This does not work as expected since $(obj)/uImage is a symlink.
The symlink should be created in a phony target rule.

I used $(kecho) instead of echo to suppress the message
'Image arch/arc/boot/uImage is ready' when the -s option is given.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:23 +01:00
Masahiro Yamada
443fb88d6d ARC: build: add boot_targets to PHONY
[ Upstream commit 0cfccb3c04934cdef42ae26042139f16e805b5f7 ]

The top-level boot_targets (uImage and uImage.*) should be phony
targets. They just let Kbuild descend into arch/arc/boot/ and create
files there.

If a file exists in the top directory with the same name, the boot
image will not be created.

You can confirm it by the following steps:

  $ export CROSS_COMPILE=<your-arc-compiler-prefix>
  $ make -s ARCH=arc defconfig all   # vmlinux will be built
  $ touch uImage.gz
  $ make ARCH=arc uImage.gz
  CALL    scripts/atomic/check-atomics.sh
  CALL    scripts/checksyscalls.sh
  CHK     include/generated/compile.h
  # arch/arc/boot/uImage.gz is not created

Specify the targets as PHONY to fix this.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:23 +01:00
Masahiro Yamada
e1c4b5ff96 ARC: build: add uImage.lzma to the top-level target
[ Upstream commit f2712ec76a5433e5ec9def2bd52a95df1f96d050 ]

arch/arc/boot/Makefile supports uImage.lzma, but you cannot do
'make uImage.lzma' because the corresponding target is missing
in arch/arc/Makefile. Add it.

I also changed the assignment operator '+=' to ':=' since this is the
only place where we expect this variable to be set.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:23 +01:00
Masahiro Yamada
cf4592a2d7 ARC: build: remove non-existing bootpImage from KBUILD_IMAGE
[ Upstream commit 9836720911cfec25d3fbdead1c438bf87e0f2841 ]

The deb-pkg builds for ARCH=arc fail.

  $ export CROSS_COMPILE=<your-arc-compiler-prefix>
  $ make -s ARCH=arc defconfig
  $ make ARCH=arc bindeb-pkg
  SORTTAB vmlinux
  SYSMAP  System.map
  MODPOST Module.symvers
  make KERNELRELEASE=5.10.0-rc4 ARCH=arc KBUILD_BUILD_VERSION=2 -f ./Makefile intdeb-pkg
  sh ./scripts/package/builddeb
  cp: cannot stat 'arch/arc/boot/bootpImage': No such file or directory
  make[4]: *** [scripts/Makefile.package:87: intdeb-pkg] Error 1
  make[3]: *** [Makefile:1527: intdeb-pkg] Error 2
  make[2]: *** [debian/rules:13: binary-arch] Error 2
  dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
  make[1]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2
  make: *** [Makefile:1527: bindeb-pkg] Error 2

The reason is obvious; arch/arc/Makefile sets $(boot)/bootpImage as
the default image, but there is no rule to build it.

Remove the meaningless KBUILD_IMAGE assignment so it will fallback
to the default vmlinux. With this change, you can build the deb package.

I removed the 'bootpImage' target as well. At best, it provides
'make bootpImage' as an alias of 'make vmlinux', but I do not see
much sense in doing so.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-19 18:27:23 +01:00
Alexander Lobakin
c8c2b27ab3 MIPS: relocatable: fix possible boot hangup with KASLR enabled
commit 69e976831cd53f9ba304fd20305b2025ecc78eab upstream.

LLVM-built Linux triggered a boot hangup with KASLR enabled.

arch/mips/kernel/relocate.c:get_random_boot() uses linux_banner,
which is a string constant, as a random seed, but accesses it
as an array of unsigned long (in rotate_xor()).
When the address of linux_banner is not aligned to sizeof(long),
such access emits unaligned access exception and hangs the kernel.

Use PTR_ALIGN() to align input address to sizeof(long) and also
align down the input length to prevent possible access-beyond-end.

Fixes: 405bc8fd12 ("MIPS: Kernel: Implement KASLR using CONFIG_RELOCATABLE")
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:20 +01:00
Al Viro
652daca07f MIPS: Fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
commit 698222457465ce343443be81c5512edda86e5914 upstream.

Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012
had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c)
coredumps; unfortunately, compat on mips (which does not go through the
usual compat_binfmt_elf.c) had not been noticed.

As the result, both N32 and O32 coredumps on 64bit mips kernels
have those sections malformed enough to confuse the living hell out of
all gdb and readelf versions (up to and including the tip of binutils-gdb.git).

Longer term solution is to make both O32 and N32 compat use the
regular compat_binfmt_elf.c, but that's too much for backports.  The minimal
solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing
those patches have done in fs/compat_binfmt_elf.c

Cc: stable@kernel.org # v3.7+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:20 +01:00
Paul Cercueil
9e2413f41a MIPS: boot: Fix unaligned access with CONFIG_MIPS_RAW_APPENDED_DTB
commit 4d4f9c1a17a3480f8fe523673f7232b254d724b7 upstream.

The compressed payload is not necesarily 4-byte aligned, at least when
compiling with Clang. In that case, the 4-byte value appended to the
compressed payload that corresponds to the uncompressed kernel image
size must be read using get_unaligned_le32().

This fixes Clang-built kernels not booting on MIPS (tested on a Ingenic
JZ4770 board).

Fixes: b8f54f2cde ("MIPS: ZBOOT: copy appended dtb to the end of the kernel")
Cc: <stable@vger.kernel.org> # v4.7
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:20 +01:00
Anders Roxell
974f19621f mips: lib: uncached: fix non-standard usage of variable 'sp'
commit 5b058973d3205578aa6c9a71392e072a11ca44ef upstream.

When building mips tinyconfig with clang the following warning show up:

arch/mips/lib/uncached.c:45:6: warning: variable 'sp' is uninitialized when used here [-Wuninitialized]
        if (sp >= (long)CKSEG0 && sp < (long)CKSEG2)
            ^~
arch/mips/lib/uncached.c:40:18: note: initialize the variable 'sp' to silence this warning
        register long sp __asm__("$sp");
                        ^
                         = 0
1 warning generated.

Rework to make an explicit inline move, instead of the non-standard use
of specifying registers for local variables. This is what's written
from the gcc-10 manual [1] about specifying registers for local
variables:

"6.47.5.2 Specifying Registers for Local Variables
.................................................
[...]

"The only supported use for this feature is to specify registers for
input and output operands when calling Extended 'asm' (*note Extended
Asm::).  [...]".

[1] https://docs.w3cub.com/gcc~10/local-register-variables
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:20 +01:00
Anders Roxell
5ca873f92b mips: fix Section mismatch in reference
commit ad4fddef5f2345aa9214e979febe2f47639c10d9 upstream.

When building mips tinyconfig with clang the following error show up:

WARNING: modpost: vmlinux.o(.text+0x1940c): Section mismatch in reference from the function r4k_cache_init() to the function .init.text:loongson3_sc_init()
The function r4k_cache_init() references
the function __init loongson3_sc_init().
This is often because r4k_cache_init lacks a __init
annotation or the annotation of loongson3_sc_init is wrong.

Remove marked __init from function loongson3_sc_init(),
mips_sc_probe_cm3(), and mips_sc_probe().

Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:20 +01:00
Nick Hu
4b0a0655da riscv: Fix KASAN memory mapping.
commit c25a053e15778f6b4d6553708673736e27a6c2cf upstream.

Use virtual address instead of physical address when translating
the address to shadow memory by kasan_mem_to_shadow().

Signed-off-by: Nick Hu <nickhu@andestech.com>
Signed-off-by: Nylon Chen <nylon7@andestech.com>
Fixes: b10d6bca87 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:20 +01:00
Guo Ren
ab7594f639 riscv: Fixup CONFIG_GENERIC_TIME_VSYSCALL
commit 0aa2ec8a475fb505fd98d93bbcf4e03beeeebcb6 upstream.

The patch fix commit: ad5d112 ("riscv: use vDSO common flow to
reduce the latency of the time-related functions").

The GENERIC_TIME_VSYSCALL should be CONFIG_GENERIC_TIME_VSYSCALL
or vgettimeofday won't work.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Fixes: ad5d1122b8 ("riscv: use vDSO common flow to reduce the latency of the time-related functions")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:20 +01:00
Andreas Schwab
7c4ced3682 riscv: return -ENOSYS for syscall -1
commit cf7b2ae4d70432fa94ebba3fbaab825481ae7189 upstream.

Properly return -ENOSYS for syscall -1 instead of leaving the return value
uninitialized.  This fixes the strace teststuite.

Fixes: 5340627e3f ("riscv: add support for SECCOMP and SECCOMP_FILTER")
Cc: stable@vger.kernel.org
Signed-off-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:20 +01:00
Kefeng Wang
eae7b19b32 riscv: Drop a duplicated PAGE_KERNEL_EXEC
commit 0ea02c73775277001c651ad4a0e83781a9acf406 upstream.

commit b91540d52a ("RISC-V: Add EFI runtime services") add
a duplicated PAGE_KERNEL_EXEC, kill it.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Fixes: b91540d52a ("RISC-V: Add EFI runtime services")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:20 +01:00
Wei Liu
ad8ca24ba8 x86/hyperv: check cpu mask after interrupt has been disabled
commit ad0a6bad44758afa3b440c254a24999a0c7e35d5 upstream.

We've observed crashes due to an empty cpu mask in
hyperv_flush_tlb_others.  Obviously the cpu mask in question is changed
between the cpumask_empty call at the beginning of the function and when
it is actually used later.

One theory is that an interrupt comes in between and a code path ends up
changing the mask. Move the check after interrupt has been disabled to
see if it fixes the issue.

Signed-off-by: Wei Liu <wei.liu@kernel.org>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20210105175043.28325-1-wei.liu@kernel.org
Reviewed-by:  Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-19 18:27:18 +01:00
Marc Zyngier
610e2c5699 KVM: arm64: Don't access PMCR_EL0 when no PMU is available
commit 2a5f1b67ec577fb1544b563086e0377f095f88e2 upstream.

We reset the guest's view of PMCR_EL0 unconditionally, based on
the host's view of this register. It is however legal for an
implementation not to provide any PMU, resulting in an UNDEF.

The obvious fix is to skip the reset of this shadow register
when no PMU is available, sidestepping the issue entirely.
If no PMU is available, the guest is not able to request
a virtual PMU anyway, so not doing nothing is the right thing
to do!

It is unlikely that this bug can hit any HW implementation
though, as they all provide a PMU. It has been found using nested
virt with the host KVM not implementing the PMU itself.

Fixes: ab9468340d ("arm64: KVM: Add access handler for PMCR register")
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201210083059.1277162-1-maz@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:17:05 +01:00
Shannon Zhao
ae4db0bc5c arm64: cpufeature: remove non-exist CONFIG_KVM_ARM_HOST
commit 45ba7b195a369f35cb39094fdb32efe5908b34ad upstream.

Commit d82755b2e7 ("KVM: arm64: Kill off CONFIG_KVM_ARM_HOST") deletes
CONFIG_KVM_ARM_HOST option, it should use CONFIG_KVM instead.

Just remove CONFIG_KVM_ARM_HOST here.

Fixes: d82755b2e7 ("KVM: arm64: Kill off CONFIG_KVM_ARM_HOST")
Signed-off-by: Shannon Zhao <shannon.zhao@linux.alibaba.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1609760324-92271-1-git-send-email-shannon.zhao@linux.alibaba.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:17:04 +01:00
Nicolas Saenz Julienne
41dcfc0cb9 arm64: mm: Fix ARCH_LOW_ADDRESS_LIMIT when !CONFIG_ZONE_DMA
commit 095507dc1350b3a2b8b39fdc05edba0c10859eca upstream.

Systems configured with CONFIG_ZONE_DMA32, CONFIG_ZONE_NORMAL and
!CONFIG_ZONE_DMA will fail to properly setup ARCH_LOW_ADDRESS_LIMIT. The
limit will default to ~0ULL, effectively spanning the whole memory,
which is too high for a configuration that expects low memory to be
capped at 4GB.

Fix ARCH_LOW_ADDRESS_LIMIT by falling back to arm64_dma32_phys_limit
when arm64_dma_phys_limit isn't set. arm64_dma32_phys_limit will honour
CONFIG_ZONE_DMA32, or span the entire memory when not enabled.

Fixes: 1a8e1cef76 ("arm64: use both ZONE_DMA and ZONE_DMA32")
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Link: https://lore.kernel.org/r/20201218163307.10150-1-nsaenzjulienne@suse.de
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:17:02 +01:00
Andreas Kemnade
0a27398d89 ARM: OMAP2+: omap_device: fix idling of devices during probe
commit ec76c2eea903947202098090bbe07a739b5246e9 upstream.

On the GTA04A5 od->_driver_status was not set to BUS_NOTIFY_BIND_DRIVER
during probe of the second mmc used for wifi. Therefore
omap_device_late_idle idled the device during probing causing oopses when
accessing the registers.

It was not set because od->_state was set to OMAP_DEVICE_STATE_IDLE
in the notifier callback. Therefore set od->_driver_status also in that
case.

This came apparent after commit 21b2cec61c ("mmc: Set
PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4") causing this
oops:

omap_hsmmc 480b4000.mmc: omap_device_late_idle: enabled but no driver.  Idling
8<--- cut here ---
Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0b402c
...
(omap_hsmmc_set_bus_width) from [<c07996bc>] (omap_hsmmc_set_ios+0x11c/0x258)
(omap_hsmmc_set_ios) from [<c077b2b0>] (mmc_power_up.part.8+0x3c/0xd0)
(mmc_power_up.part.8) from [<c077c14c>] (mmc_start_host+0x88/0x9c)
(mmc_start_host) from [<c077d284>] (mmc_add_host+0x58/0x84)
(mmc_add_host) from [<c0799190>] (omap_hsmmc_probe+0x5fc/0x8c0)
(omap_hsmmc_probe) from [<c0666728>] (platform_drv_probe+0x48/0x98)
(platform_drv_probe) from [<c066457c>] (really_probe+0x1dc/0x3b4)

Fixes: 04abaf07f6 ("ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer")
Fixes: 21b2cec61c ("mmc: Set PROBE_PREFER_ASYNCHRONOUS for drivers that existed in v4.4")
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
[tony@atomide.com: left out extra parens, trimmed description stack trace]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:16:59 +01:00
Brian Gerst
797335659e fanotify: Fix sys_fanotify_mark() on native x86-32
commit 2ca408d9c749c32288bc28725f9f12ba30299e8f upstream.

Commit

  121b32a58a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")

converted native x86-32 which take 64-bit arguments to use the
compat handlers to allow conversion to passing args via pt_regs.
sys_fanotify_mark() was however missed, as it has a general compat
handler. Add a config option that will use the syscall wrapper that
takes the split args for native 32-bit.

 [ bp: Fix typo in Kconfig help text. ]

Fixes: 121b32a58a ("x86/entry/32: Use IA32-specific wrappers for syscalls taking 64-bit arguments")
Reported-by: Paweł Jasiak <pawel@jasiak.xyz>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Jan Kara <jack@suse.cz>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20201130223059.101286-1-brgerst@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-17 14:16:59 +01:00
Christophe Leroy
bca9ca5a60 powerpc/32s: Fix RTAS machine check with VMAP stack
[ Upstream commit 98bf2d3f4970179c702ef64db658e0553bc6ef3a ]

When we have VMAP stack, exception prolog 1 sets r1, not r11.

When it is not an RTAS machine check, don't trash r1 because it is
needed by prolog 1.

Fixes: da7bb43ab9 ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
Fixes: d2e006036082 ("powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Squash in fixup for RTAS machine check from Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.1608621048.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-17 14:16:52 +01:00
Paolo Bonzini
563135ec66 KVM: x86: fix shift out of bounds reported by UBSAN
commit 2f80d502d627f30257ba7e3655e71c373b7d1a5a upstream.

Since we know that e >= s, we can reassociate the left shift,
changing the shifted number from 1 to 2 in exchange for
decreasing the right hand side by 1.

Reported-by: syzbot+e87846c48bf72bc85311@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:26 +01:00
Ying-Tsun Huang
02ccda90ef x86/mtrr: Correct the range check before performing MTRR type lookups
commit cb7f4a8b1fb426a175d1708f05581939c61329d4 upstream.

In mtrr_type_lookup(), if the input memory address region is not in the
MTRR, over 4GB, and not over the top of memory, a write-back attribute
is returned. These condition checks are for ensuring the input memory
address region is actually mapped to the physical memory.

However, if the end address is just aligned with the top of memory,
the condition check treats the address is over the top of memory, and
write-back attribute is not returned.

And this hits in a real use case with NVDIMM: the nd_pmem module tries
to map NVDIMMs as cacheable memories when NVDIMMs are connected. If a
NVDIMM is the last of the DIMMs, the performance of this NVDIMM becomes
very low since it is aligned with the top of memory and its memory type
is uncached-minus.

Move the input end address change to inclusive up into
mtrr_type_lookup(), before checking for the top of memory in either
mtrr_type_lookup_{variable,fixed}() helpers.

 [ bp: Massage commit message. ]

Fixes: 0cc705f56e ("x86/mm/mtrr: Clean up mtrr_type_lookup()")
Signed-off-by: Ying-Tsun Huang <ying-tsun.huang@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201215070721.4349-1-ying-tsun.huang@amd.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:26 +01:00
Aaro Koskinen
56429ddfd5 ARM: dts: OMAP3: disable AES on N950/N9
commit f1dc15cd7fc146107cad2a926d9c1d005f69002a upstream.

AES needs to be disabled on Nokia N950/N9 as well (HS devices), otherwise
kernel fails to boot.

Fixes: c312f06631 ("ARM: dts: omap3: Migrate AES from hwmods to sysc-omap2")
Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:25 +01:00
Nick Desaulniers
1cd7e30a6d arm64: link with -z norelro for LLD or aarch64-elf
commit 311bea3cb9ee20ef150ca76fc60a592bf6b159f5 upstream.

With GNU binutils 2.35+, linking with BFD produces warnings for vmlinux:
aarch64-linux-gnu-ld: warning: -z norelro ignored

BFD can produce this warning when the target emulation mode does not
support RELRO program headers, and -z relro or -z norelro is passed.

Alan Modra clarifies:
  The default linker emulation for an aarch64-linux ld.bfd is
  -maarch64linux, the default for an aarch64-elf linker is
  -maarch64elf.  They are not equivalent.  If you choose -maarch64elf
  you get an emulation that doesn't support -z relro.

The ARCH=arm64 kernel prefers -maarch64elf, but may fall back to
-maarch64linux based on the toolchain configuration.

LLD will always create RELRO program header regardless of target
emulation.

To avoid the above warning when linking with BFD, pass -z norelro only
when linking with LLD or with -maarch64linux.

Fixes: 3b92fa7485 ("arm64: link with -z norelro regardless of CONFIG_RELOCATABLE")
Fixes: 3bbd3db864 ("arm64: relocatable: fix inconsistencies in linker script and options")
Cc: <stable@vger.kernel.org> # 5.0.x-
Reported-by: kernelci.org bot <bot@kernelci.org>
Reported-by: Quentin Perret <qperret@google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Alan Modra <amodra@gmail.com>
Cc: Fāng-ruì Sòng <maskray@google.com>
Link: https://lore.kernel.org/r/20201218002432.788499-1-ndesaulniers@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:24 +01:00
Fenghua Yu
397e352ca9 x86/resctrl: Don't move a task to the same resource group
commit a0195f314a25582b38993bf30db11c300f4f4611 upstream.

Shakeel Butt reported in [1] that a user can request a task to be moved
to a resource group even if the task is already in the group. It just
wastes time to do the move operation which could be costly to send IPI
to a different CPU.

Add a sanity check to ensure that the move operation only happens when
the task is not already in the resource group.

[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/

Fixes: e02737d5b8 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/962ede65d8e95be793cb61102cca37f7bb018e66.1608243147.git.reinette.chatre@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:23 +01:00
Fenghua Yu
34e4ae4dca x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR
commit ae28d1aae48a1258bd09a6f707ebb4231d79a761 upstream.

Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.

Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.

This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.

This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.

As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.

To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.

[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3SN5Pw@mail.gmail.com/
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.com

 [ bp: Massage commit message and drop the two update_task_closid_rmid()
   variants. ]

Fixes: e02737d5b8 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb@google.com>
Reported-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.1608243147.git.reinette.chatre@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:23 +01:00
Ben Gardon
c3cf9ffe8d KVM: x86/mmu: Ensure TDP MMU roots are freed after yield
commit a889ea54b3daa63ee1463dc19ed699407d61458b upstream.

Many TDP MMU functions which need to perform some action on all TDP MMU
roots hold a reference on that root so that they can safely drop the MMU
lock in order to yield to other threads. However, when releasing the
reference on the root, there is a bug: the root will not be freed even
if its reference count (root_count) is reduced to 0.

To simplify acquiring and releasing references on TDP MMU root pages, and
to ensure that these roots are properly freed, move the get/put operations
into another TDP MMU root iterator macro.

Moving the get/put operations into an iterator macro also helps
simplify control flow when a root does need to be freed. Note that using
the list_for_each_entry_safe macro would not have been appropriate in
this situation because it could keep a pointer to the next root across
an MMU lock release + reacquire, during which time that root could be
freed.

Reported-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: faaf05b00a ("kvm: x86/mmu: Support zapping SPTEs in the TDP MMU")
Fixes: 063afacd87 ("kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU")
Fixes: a6a0b05da9 ("kvm: x86/mmu: Support dirty logging for the TDP MMU")
Fixes: 1488199856 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20210107001935.3732070-1-bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:22 +01:00
Sean Christopherson
f4064ef40c KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE
commit 39b4d43e6003cee51cd119596d3c33d0449eb44c upstream.

Get the so called "root" level from the low level shadow page table
walkers instead of manually attempting to calculate it higher up the
stack, e.g. in get_mmio_spte().  When KVM is using PAE shadow paging,
the starting level of the walk, from the callers perspective, is not
the CR3 root but rather the PDPTR "root".  Checking for reserved bits
from the CR3 root causes get_mmio_spte() to consume uninitialized stack
data due to indexing into sptes[] for a level that was not filled by
get_walk().  This can result in false positives and/or negatives
depending on what garbage happens to be on the stack.

Opportunistically nuke a few extra newlines.

Fixes: 95fb5b0258 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Reported-by: Richard Herbert <rherbert@sympatico.ca>
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:22 +01:00
Sean Christopherson
afd621673f KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()
commit 2aa078932ff6c66bf10cc5b3144440dbfa7d813d upstream.

Return -1 from the get_walk() helpers if the shadow walk doesn't fill at
least one spte, which can theoretically happen if the walk hits a
not-present PDPTR.  Returning the root level in such a case will cause
get_mmio_spte() to return garbage (uninitialized stack data).  In
practice, such a scenario should be impossible as KVM shouldn't get a
reserved-bit page fault with a not-present PDPTR.

Note, using mmu->root_level in get_walk() is wrong for other reasons,
too, but that's now a moot point.

Fixes: 95fb5b0258 ("kvm: x86/mmu: Support MMIO in the TDP MMU")
Cc: Ben Gardon <bgardon@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20201218003139.2167891-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:22 +01:00
Dan Williams
23220e87c9 x86/mm: Fix leak of pmd ptlock
commit d1c5246e08eb64991001d97a3bd119c93edbc79a upstream.

Commit

  28ee90fe60 ("x86/mm: implement free pmd/pte page interfaces")

introduced a new location where a pmd was released, but neglected to
run the pmd page destructor. In fact, this happened previously for a
different pmd release path and was fixed by commit:

  c283610e44 ("x86, mm: do not leak page->ptl for pmd page tables").

This issue was hidden until recently because the failure mode is silent,
but commit:

  b2b29d6d01 ("mm: account PMD tables like PTE tables")

turns the failure mode into this signature:

 BUG: Bad page state in process lt-pmem-ns  pfn:15943d
 page:000000007262ed7b refcount:0 mapcount:-1024 mapping:0000000000000000 index:0x0 pfn:0x15943d
 flags: 0xaffff800000000()
 raw: 00affff800000000 dead000000000100 0000000000000000 0000000000000000
 raw: 0000000000000000 ffff913a029bcc08 00000000fffffbff 0000000000000000
 page dumped because: nonzero mapcount
 [..]
  dump_stack+0x8b/0xb0
  bad_page.cold+0x63/0x94
  free_pcp_prepare+0x224/0x270
  free_unref_page+0x18/0xd0
  pud_free_pmd_page+0x146/0x160
  ioremap_pud_range+0xe3/0x350
  ioremap_page_range+0x108/0x160
  __ioremap_caller.constprop.0+0x174/0x2b0
  ? memremap+0x7a/0x110
  memremap+0x7a/0x110
  devm_memremap+0x53/0xa0
  pmem_attach_disk+0x4ed/0x530 [nd_pmem]
  ? __devm_release_region+0x52/0x80
  nvdimm_bus_probe+0x85/0x210 [libnvdimm]

Given this is a repeat occurrence it seemed prudent to look for other
places where this destructor might be missing and whether a better
helper is needed. try_to_free_pmd_page() looks like a candidate, but
testing with setting up and tearing down pmd mappings via the dax unit
tests is thus far not triggering the failure.

As for a better helper pmd_free() is close, but it is a messy fit
due to requiring an @mm arg. Also, ___pmd_free_tlb() wants to call
paravirt_tlb_remove_table() instead of free_page(), so open-coded
pgtable_pmd_page_dtor() seems the best way forward for now.

Debugged together with Matthew Wilcox <willy@infradead.org>.

Fixes: 28ee90fe60 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/160697689204.605323.17629854984697045602.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:22 +01:00
Nathan Chancellor
cb5a170e97 powerpc: Handle .text.{hot,unlikely}.* in linker script
commit 3ce47d95b7346dcafd9bed3556a8d072cb2b8571 upstream.

Commit eff8728fe6 ("vmlinux.lds.h: Add PGO and AutoFDO input
sections") added ".text.unlikely.*" and ".text.hot.*" due to an LLVM
change [1].

After another LLVM change [2], these sections are seen in some PowerPC
builds, where there is a orphan section warning then build failure:

$ make -skj"$(nproc)" \
       ARCH=powerpc CROSS_COMPILE=powerpc64le-linux-gnu- LLVM=1 O=out \
       distclean powernv_defconfig zImage.epapr
ld.lld: warning: kernel/built-in.a(panic.o):(.text.unlikely.) is being placed in '.text.unlikely.'
...
ld.lld: warning: address (0xc000000000009314) of section .text is not a multiple of alignment (256)
...
ERROR: start_text address is c000000000009400, should be c000000000008000
ERROR: try to enable LD_HEAD_STUB_CATCH config option
ERROR: see comments in arch/powerpc/tools/head_check.sh
...

Explicitly handle these sections like in the main linker script so
there is no more build failure.

[1]: https://reviews.llvm.org/D79600
[2]: https://reviews.llvm.org/D92493

Fixes: 83a092cf95 ("powerpc: Link warning for orphan sections")
Cc: stable@vger.kernel.org
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1218
Link: https://lore.kernel.org/r/20210104205952.1399409-1-natechancellor@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:17 +01:00
Randy Dunlap
2179bae04b local64.h: make <asm/local64.h> mandatory
[ Upstream commit 87dbc209ea04645fd2351981f09eff5d23f8e2e9 ]

Make <asm-generic/local64.h> mandatory in include/asm-generic/Kbuild and
remove all arch/*/include/asm/local64.h arch-specific files since they
only #include <asm-generic/local64.h>.

This fixes build errors on arch/c6x/ and arch/nios2/ for
block/blk-iocost.c.

Build-tested on 21 of 25 arch-es.  (tools problems on the others)

Yes, we could even rename <asm-generic/local64.h> to
<linux/local64.h> and change all #includes to use
<linux/local64.h> instead.

Link: https://lkml.kernel.org/r/20201227024446.17018-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-12 20:18:16 +01:00
Heiko Carstens
13f9eec229 s390: always clear kernel stack backchain before calling functions
[ Upstream commit 9365965db0c7ca7fc81eee27c21d8522d7102c32 ]

Clear the kernel stack backchain before potentially calling the
lockdep trace_hardirqs_off/on functions. Without this walking the
kernel backchain, e.g. during a panic, might stop too early.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-06 14:56:55 +01:00
Gabriel Krisman Bertazi
ef3b9ad967 um: ubd: Submit all data segments atomically
[ Upstream commit fc6b6a872dcd48c6f39c7975836d75113db67d37 ]

Internally, UBD treats each physical IO segment as a separate command to
be submitted in the execution pipe.  If the pipe returns a transient
error after a few segments have already been written, UBD will tell the
block layer to requeue the request, but there is no way to reclaim the
segments already submitted.  When a new attempt to dispatch the request
is done, those segments already submitted will get duplicated, causing
the WARN_ON below in the best case, and potentially data corruption.

In my system, running a UML instance with 2GB of RAM and a 50M UBD disk,
I can reproduce the WARN_ON by simply running mkfs.fvat against the
disk on a freshly booted system.

There are a few ways to around this, like reducing the pressure on
the pipe by reducing the queue depth, which almost eliminates the
occurrence of the problem, increasing the pipe buffer size on the host
system, or by limiting the request to one physical segment, which causes
the block layer to submit way more requests to resolve a single
operation.

Instead, this patch modifies the format of a UBD command, such that all
segments are sent through a single element in the communication pipe,
turning the command submission atomic from the point of view of the
block layer.  The new format has a variable size, depending on the
number of elements, and looks like this:

+------------+-----------+-----------+------------
| cmd_header | segment 0 | segment 1 | segment ...
+------------+-----------+-----------+------------

With this format, we push a pointer to cmd_header in the submission
pipe.

This has the advantage of reducing the memory footprint of executing a
single request, since it allow us to merge some fields in the header.
It is possible to reduce even further each segment memory footprint, by
merging bitmap_words and cow_offset, for instance, but this is not the
focus of this patch and is left as future work.  One issue with the
patch is that for a big number of segments, we now perform one big
memory allocation instead of multiple small ones, but I wasn't able to
trigger any real issues or -ENOMEM because of this change, that wouldn't
be reproduced otherwise.

This was tested using fio with the verify-crc32 option, and by running
an ext4 filesystem over this UBD device.

The original WARN_ON was:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at lib/refcount.c:28 refcount_warn_saturate+0x13f/0x141
refcount_t: underflow; use-after-free.
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.5.0-rc6-00002-g2a5bb2cf75c8 #346
Stack:
 6084eed0 6063dc77 00000009 6084ef60
 00000000 604b8d9f 6084eee0 6063dcbc
 6084ef40 6006ab8d e013d780 1c00000000
Call Trace:
 [<600a0c1c>] ? printk+0x0/0x94
 [<6004a888>] show_stack+0x13b/0x155
 [<6063dc77>] ? dump_stack_print_info+0xdf/0xe8
 [<604b8d9f>] ? refcount_warn_saturate+0x13f/0x141
 [<6063dcbc>] dump_stack+0x2a/0x2c
 [<6006ab8d>] __warn+0x107/0x134
 [<6008da6c>] ? wake_up_process+0x17/0x19
 [<60487628>] ? blk_queue_max_discard_sectors+0x0/0xd
 [<6006b05f>] warn_slowpath_fmt+0xd1/0xdf
 [<6006af8e>] ? warn_slowpath_fmt+0x0/0xdf
 [<600acc14>] ? raw_read_seqcount_begin.constprop.0+0x0/0x15
 [<600619ae>] ? os_nsecs+0x1d/0x2b
 [<604b8d9f>] refcount_warn_saturate+0x13f/0x141
 [<6048bc8f>] refcount_sub_and_test.constprop.0+0x2f/0x37
 [<6048c8de>] blk_mq_free_request+0xf1/0x10d
 [<6048ca06>] __blk_mq_end_request+0x10c/0x114
 [<6005ac0f>] ubd_intr+0xb5/0x169
 [<600a1a37>] __handle_irq_event_percpu+0x6b/0x17e
 [<600a1b70>] handle_irq_event_percpu+0x26/0x69
 [<600a1bd9>] handle_irq_event+0x26/0x34
 [<600a1bb3>] ? handle_irq_event+0x0/0x34
 [<600a5186>] ? unmask_irq+0x0/0x37
 [<600a57e6>] handle_edge_irq+0xbc/0xd6
 [<600a131a>] generic_handle_irq+0x21/0x29
 [<60048f6e>] do_IRQ+0x39/0x54
 [...]
---[ end trace c6e7444e55386c0f ]---

Cc: Christopher Obbard <chris.obbard@collabora.com>
Reported-by: Martyn Welch <martyn@collabora.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Tested-by: Christopher Obbard <chris.obbard@collabora.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-06 14:56:55 +01:00
Christopher Obbard
a8b49c4bdf um: random: Register random as hwrng-core device
[ Upstream commit 72d3e093afae79611fa38f8f2cfab9a888fe66f2 ]

The UML random driver creates a dummy device under the guest,
/dev/hw_random. When this file is read from the guest, the driver
reads from the host machine's /dev/random, in-turn reading from
the host kernel's entropy pool. This entropy pool could have been
filled by a hardware random number generator or just the host
kernel's internal software entropy generator.

Currently the driver does not fill the guests kernel entropy pool,
this requires a userspace tool running inside the guest (like
rng-tools) to read from the dummy device provided by this driver,
which then would fill the guest's internal entropy pool.

This all seems quite pointless when we are already reading from an
entropy pool, so this patch aims to register the device as a hwrng
device using the hwrng-core framework. This not only improves and
cleans up the driver, but also fills the guest's entropy pool
without having to resort to using extra userspace tools in the guest.

This is typically a nuisance when booting a guest: the random pool
takes a long time (~200s) to build up enough entropy since the dummy
hwrng is not used to fill the guest's pool.

This port was originally attempted by Alexander Neville "dark" (in CC,
discussion in Link), but the conversation there stalled since the
handling of -EAGAIN errors were no removed and longer handled by the
driver. This patch attempts to use the existing method of error
handling but utilises the new hwrng core.

The issue can be noticed when booting a UML guest:

    [    2.560000] random: fast init done
    [  214.000000] random: crng init done

With the patch applied, filling the pool becomes a lot quicker:

    [    2.560000] random: fast init done
    [   12.000000] random: crng init done

Cc: Alexander Neville <dark@volatile.bz>
Link: https://lore.kernel.org/lkml/20190828204609.02a7ff70@TheDarkness/
Link: https://lore.kernel.org/lkml/20190829135001.6a5ff940@TheDarkness.local/
Cc: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
Signed-off-by: Christopher Obbard <chris.obbard@collabora.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-06 14:56:55 +01:00
Nicholas Piggin
b1e155ccc8 powerpc/64: irq replay remove decrementer overflow check
[ Upstream commit 59d512e4374b2d8a6ad341475dc94c4a4bdec7d3 ]

This is way to catch some cases of decrementer overflow, when the
decrementer has underflowed an odd number of times, while MSR[EE] was
disabled.

With a typical small decrementer, a timer that fires when MSR[EE] is
disabled will be "lost" if MSR[EE] remains disabled for between 4.3 and
8.6 seconds after the timer expires. In any case, the decrementer
interrupt would be taken at 8.6 seconds and the timer would be found at
that point.

So this check is for catching extreme latency events, and it prevents
those latencies from being a further few seconds long.  It's not obvious
this is a good tradeoff. This is already a watchdog magnitude event and
that situation is not improved a significantly with this check. For
large decrementers, it's useless.

Therefore remove this check, which avoids a mftb when enabling hard
disabled interrupts (e.g., when enabling after coming from hardware
interrupt handlers). Perhaps more importantly, it also removes the
clunky MSR[EE] vs PACA_IRQ_HARD_DIS incoherency in soft-interrupt replay
which simplifies the code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201107014336.2337337-1-npiggin@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-06 14:56:54 +01:00
Qinglang Miao
498d90690f powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
[ Upstream commit ffa1797040c5da391859a9556be7b735acbe1242 ]

I noticed that iounmap() of msgr_block_addr before return from
mpic_msgr_probe() in the error handling case is missing. So use
devm_ioremap() instead of just ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.

Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028091551.136400-1-miaoqinglang@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-01-06 14:56:53 +01:00
Baoquan He
98b57685c2 mm: memmap defer init doesn't work as expected
commit dc2da7b45ffe954a0090f5d0310ed7b0b37d2bd2 upstream.

VMware observed a performance regression during memmap init on their
platform, and bisected to commit 73a6e474cb ("mm: memmap_init:
iterate over memblock regions rather that check each PFN") causing it.

Before the commit:

  [0.033176] Normal zone: 1445888 pages used for memmap
  [0.033176] Normal zone: 89391104 pages, LIFO batch:63
  [0.035851] ACPI: PM-Timer IO Port: 0x448

With commit

  [0.026874] Normal zone: 1445888 pages used for memmap
  [0.026875] Normal zone: 89391104 pages, LIFO batch:63
  [2.028450] ACPI: PM-Timer IO Port: 0x448

The root cause is the current memmap defer init doesn't work as expected.

Before, memmap_init_zone() was used to do memmap init of one whole zone,
to initialize all low zones of one numa node, but defer memmap init of
the last zone in that numa node.  However, since commit 73a6e474cb,
function memmap_init() is adapted to iterater over memblock regions
inside one zone, then call memmap_init_zone() to do memmap init for each
region.

E.g, on VMware's system, the memory layout is as below, there are two
memory regions in node 2.  The current code will mistakenly initialize the
whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to
iniatialize only one memmory section on the 2nd region [mem
0x10000000000-0x1033fffffff].  In fact, we only expect to see that there's
only one memory section's memmap initialized.  That's why more time is
costed at the time.

[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[    0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[    0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[    0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[    0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[    0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]

Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to judge
whether defer need be taken in zone wide.

Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
Fixes: commit 73a6e474cb ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Rahul Gopakumar <gopakumarr@vmware.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-06 14:56:50 +01:00
Yazen Ghannam
700d098ace x86/CPU/AMD: Save AMD NodeId as cpu_die_id
[ Upstream commit 028c221ed1904af9ac3c5162ee98f48966de6b3d ]

AMD systems provide a "NodeId" value that represents a global ID
indicating to which "Node" a logical CPU belongs. The "Node" is a
physical structure equivalent to a Die, and it should not be confused
with logical structures like NUMA nodes. Logical nodes can be adjusted
based on firmware or other settings whereas the physical nodes/dies are
fixed based on hardware topology.

The NodeId value can be used when a physical ID is needed by software.

Save the AMD NodeId to struct cpuinfo_x86.cpu_die_id. Use the value
from CPUID or MSR as appropriate. Default to phys_proc_id otherwise.
Do so for both AMD and Hygon systems.

Drop the node_id parameter from cacheinfo_*_init_llc_id() as it is no
longer needed.

Update the x86 topology documentation.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-2-Yazen.Ghannam@amd.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-12-30 11:54:29 +01:00
Steven Rostedt (VMware)
2bbb320656 Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
commit adab66b71abfe206a020f11e561f4df41f0b2aba upstream.

It was believed that metag was the only architecture that required the ring
buffer to keep 8 byte words aligned on 8 byte architectures, and with its
removal, it was assumed that the ring buffer code did not need to handle
this case. It appears that sparc64 also requires this.

The following was reported on a sparc64 boot up:

   kernel: futex hash table entries: 65536 (order: 9, 4194304 bytes, linear)
   kernel: Running postponed tracer tests:
   kernel: Testing tracer function:
   kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
   kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140
   kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
   kernel: Kernel unaligned access at TPC[552a24] trace_function+0x44/0x140
   kernel: Kernel unaligned access at TPC[552a20] trace_function+0x40/0x140
   kernel: PASSED

Need to put back the 64BIT aligned code for the ring buffer.

Link: https://lore.kernel.org/r/CADxRZqzXQRYgKc=y-KV=S_yHL+Y8Ay2mh5ezeZUnpRvg+syWKw@mail.gmail.com

Cc: stable@vger.kernel.org
Fixes: 86b3de60a0 ("ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS")
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:54:29 +01:00
Johannes Berg
ef82413937 um: Fix time-travel mode
commit ff9632d2a66512436d616ef4c380a0e73f748db1 upstream.

Since the time-travel rework, basic time-travel mode hasn't worked
properly, but there's no longer a need for this WARN_ON() so just
remove it and thereby fix things.

Cc: stable@vger.kernel.org
Fixes: 4b786e24ca ("um: time-travel: Rewrite as an event scheduler")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:54:17 +01:00
Anton Ivanov
c4b4253221 um: Remove use of asprinf in umid.c
commit 97be7ceaf7fea68104824b6aa874cff235333ac1 upstream.

asprintf is not compatible with the existing uml memory allocation
mechanism. Its use on the "user" side of UML results in a corrupt slab
state.

Fixes: 0d4e5ac7e7 ("um: remove uses of variable length arrays")
Cc: stable@vger.kernel.org
Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:54:17 +01:00
David Hildenbrand
cd2eda58ea powerpc/powernv/memtrace: Fix crashing the kernel when enabling concurrently
commit d6718941a2767fb383e105d257d2105fe4f15f0e upstream.

It's very easy to crash the kernel right now by simply trying to
enable memtrace concurrently, hammering on the "enable" interface

loop.sh:
  #!/bin/bash

  dmesg --console-off

  while true; do
          echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  done

[root@localhost ~]# loop.sh &
[root@localhost ~]# loop.sh &

Resulting quickly in a kernel crash. Let's properly protect using a
mutex.

Fixes: 9d5171a8f2 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org# v4.14+
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-3-david@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:54:16 +01:00
David Hildenbrand
4b8dcb006e powerpc/powernv/memtrace: Don't leak kernel memory to user space
commit c74cf7a3d59a21b290fe0468f5b470d0b8ee37df upstream.

We currently leak kernel memory to user space, because memory
offlining doesn't do any implicit clearing of memory and we are
missing explicit clearing of memory.

Let's keep it simple and clear pages before removing the linear
mapping.

Reproduced in QEMU/TCG with 10 GiB of main memory:
  [root@localhost ~]# dd obs=9G if=/dev/urandom of=/dev/null
  [... wait until "free -m" used counter no longer changes and cancel]
  19665802+0 records in
  1+0 records out
  9663676416 bytes (9.7 GB, 9.0 GiB) copied, 135.548 s, 71.3 MB/s
  [root@localhost ~]# cat /sys/devices/system/memory/block_size_bytes
  40000000
  [root@localhost ~]# echo 0x40000000 > /sys/kernel/debug/powerpc/memtrace/enable
  [  402.978663][ T1086] page:000000001bc4bc74 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x24900
  [  402.980063][ T1086] flags: 0x7ffff000001000(reserved)
  [  402.980415][ T1086] raw: 007ffff000001000 c00c000000924008 c00c000000924008 0000000000000000
  [  402.980627][ T1086] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
  [  402.980845][ T1086] page dumped because: unmovable page
  [  402.989608][ T1086] Offlined Pages 16384
  [  403.324155][ T1086] memtrace: Allocated trace memory on node 0 at 0x0000000200000000

Before this patch:
  [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace  | head
  00000000  c8 25 72 51 4d 26 36 c5  5c c2 56 15 d5 1a cd 10  |.%rQM&6.\.V.....|
  00000010  19 b9 50 b2 cb e3 60 b8  ec 0a f3 ec 4b 3c 39 f0  |..P...`.....K<9.|$
  00000020  4e 5a 4c cf bd 26 19 ff  37 79 13 67 24 b7 b8 57  |NZL..&..7y.g$..W|$
  00000030  98 3e f5 be 6f 14 6a bd  a4 52 bc 6e e9 e0 c1 5d  |.>..o.j..R.n...]|$
  00000040  76 b3 ae b5 88 d7 da e3  64 23 85 2c 10 88 07 b6  |v.......d#.,....|$
  00000050  9a d8 91 de f7 50 27 69  2e 64 9c 6f d3 19 45 79  |.....P'i.d.o..Ey|$
  00000060  6a 6f 8a 61 71 19 1f c7  f1 df 28 26 ca 0f 84 55  |jo.aq.....(&...U|$
  00000070  01 3f be e4 e2 e1 da ff  7b 8c 8e 32 37 b4 24 53  |.?......{..27.$S|$
  00000080  1b 70 30 45 56 e6 8c c4  0e b5 4c fb 9f dd 88 06  |.p0EV.....L.....|$
  00000090  ef c4 18 79 f1 60 b1 5c  79 59 4d f4 36 d7 4a 5c  |...y.`.\yYM.6.J\|$

After this patch:
  [root@localhost ~]# hexdump -C /sys/kernel/debug/powerpc/memtrace/00000000/trace  | head
  00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
  *
  40000000

Fixes: 9d5171a8f2 ("powerpc/powernv: Enable removal of memory for in memory tracing")
Cc: stable@vger.kernel.org # v4.14+
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201111145322.15793-2-david@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:54:16 +01:00
Alexey Kardashevskiy
8fe4bee4c0 powerpc/powernv/npu: Do not attempt NPU2 setup on POWER8NVL NPU
commit b1198a88230f2ce50c271e22b82a8b8610b2eea9 upstream.

We execute certain NPU2 setup code (such as mapping an LPID to a device
in NPU2) unconditionally if an Nvlink bridge is detected. However this
cannot succeed on POWER8NVL machines and errors appear in dmesg. This is
harmless as skiboot returns an error and the only place we check it is
vfio-pci but that code does not get called on P8+ either.

This adds a check if pnv_npu2_xxx helpers are called on a machine with
NPU2 which initializes pnv_phb::npu in pnv_npu2_init();
pnv_phb::npu==NULL on POWER8/NVL (Naples).

While at this, fix NULL derefencing in pnv_npu_peers_take_ownership/
pnv_npu_peers_release_ownership which occurs when GPUs on mentioned P8s
cause EEH which happens if "vfio-pci" disables devices using
the D3 power state; the vfio-pci's disable_idle_d3 module parameter
controls this and must be set on Naples. The EEH handling clears
the entire pnv_ioda_pe struct in pnv_ioda_free_pe() hence
the NULL derefencing. We cannot recover from that but at least we stop
crashing.

Tested on
- POWER9 pvr=004e1201, Ubuntu 19.04 host, Ubuntu 18.04 vm,
  NVIDIA GV100 10de:1db1 driver 418.39
- POWER8 pvr=004c0100, RHEL 7.6 host, Ubuntu 16.10 vm,
  NVIDIA P100 10de:15f9 driver 396.47

Fixes: 1b785611e1 ("powerpc/powernv/npu: Add release_ownership hook")
Cc: stable@vger.kernel.org # 5.0
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201122073828.15446-1-aik@ozlabs.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30 11:54:16 +01:00