linux_dsm_epyc7002/arch/powerpc
Mahesh Salgaonkar 481fee8295 powerpc/eeh: Fix EEH handling for hugepages in ioremap space.
commit 5ae5bc12d0728db60a0aa9b62160ffc038875f1a upstream.

During the EEH MMIO error checking, the current implementation fails to map
the (virtual) MMIO address back to the pci device on radix with hugepage
mappings for I/O. This results into failure to dispatch EEH event with no
recovery even when EEH capability has been enabled on the device.

eeh_check_failure(token)		# token = virtual MMIO address
  addr = eeh_token_to_phys(token);
  edev = eeh_addr_cache_get_dev(addr);
  if (!edev)
	return 0;
  eeh_dev_check_failure(edev);	<= Dispatch the EEH event

In case of hugepage mappings, eeh_token_to_phys() has a bug in virt -> phys
translation that results in wrong physical address, which is then passed to
eeh_addr_cache_get_dev() to match it against cached pci I/O address ranges
to get to a PCI device. Hence, it fails to find a match and the EEH event
never gets dispatched leaving the device in failed state.

The commit 3343962068 ("powerpc/eeh: Handle hugepages in ioremap space")
introduced following logic to translate virt to phys for hugepage mappings:

eeh_token_to_phys():
+	pa = pte_pfn(*ptep);
+
+	/* On radix we can do hugepage mappings for io, so handle that */
+       if (hugepage_shift) {
+               pa <<= hugepage_shift;			<= This is wrong
+               pa |= token & ((1ul << hugepage_shift) - 1);
+       }

This patch fixes the virt -> phys translation in eeh_token_to_phys()
function.

  $ cat /sys/kernel/debug/powerpc/eeh_address_cache
  mem addr range [0x0000040080000000-0x00000400807fffff]: 0030:01:00.1
  mem addr range [0x0000040080800000-0x0000040080ffffff]: 0030:01:00.1
  mem addr range [0x0000040081000000-0x00000400817fffff]: 0030:01:00.0
  mem addr range [0x0000040081800000-0x0000040081ffffff]: 0030:01:00.0
  mem addr range [0x0000040082000000-0x000004008207ffff]: 0030:01:00.1
  mem addr range [0x0000040082080000-0x00000400820fffff]: 0030:01:00.0
  mem addr range [0x0000040082100000-0x000004008210ffff]: 0030:01:00.1
  mem addr range [0x0000040082110000-0x000004008211ffff]: 0030:01:00.0

Above is the list of cached io address ranges of pci 0030:01:00.<fn>.

Before this patch:

Tracing 'arg1' of function eeh_addr_cache_get_dev() during error injection
clearly shows that 'addr=' contains wrong physical address:

   kworker/u16:0-7       [001] ....   108.883775: eeh_addr_cache_get_dev:
	   (eeh_addr_cache_get_dev+0xc/0xf0) addr=0x80103000a510

dmesg shows no EEH recovery messages:

  [  108.563768] bnx2x: [bnx2x_timer:5801(eth2)]MFW seems hanged: drv_pulse (0x9ae) != mcp_pulse (0x7fff)
  [  108.563788] bnx2x: [bnx2x_hw_stats_update:870(eth2)]NIG timer max (4294967295)
  [  108.883788] bnx2x: [bnx2x_acquire_hw_lock:2013(eth1)]lock_status 0xffffffff  resource_bit 0x1
  [  108.884407] bnx2x 0030:01:00.0 eth1: MDC/MDIO access timeout
  [  108.884976] bnx2x 0030:01:00.0 eth1: MDC/MDIO access timeout
  <..>

After this patch:

eeh_addr_cache_get_dev() trace shows correct physical address:

  <idle>-0       [001] ..s.  1043.123828: eeh_addr_cache_get_dev:
	  (eeh_addr_cache_get_dev+0xc/0xf0) addr=0x40080bc7cd8

dmesg logs shows EEH recovery getting triggerred:

  [  964.323980] bnx2x: [bnx2x_timer:5801(eth2)]MFW seems hanged: drv_pulse (0x746f) != mcp_pulse (0x7fff)
  [  964.323991] EEH: Recovering PHB#30-PE#10000
  [  964.324002] EEH: PE location: N/A, PHB location: N/A
  [  964.324006] EEH: Frozen PHB#30-PE#10000 detected
  <..>

Fixes: 3343962068 ("powerpc/eeh: Handle hugepages in ioremap space")
Cc: stable@vger.kernel.org # v5.3+
Reported-by: Dominic DeMarco <ddemarc@us.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/161821396263.48361.2796709239866588652.stgit@jupiter
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:47:35 +02:00
..
boot powerpc/boot: Fix build of dts/fsl 2020-12-30 11:54:01 +01:00
configs powerpc updates for 5.10 2020-10-16 12:21:15 -07:00
crypto crypto: powerpc/crc-vpmsum_test - Fix sparse endianness warning 2020-09-04 17:57:15 +10:00
include powerpc/powernv: Enable HAIL (HV AIL) for ISA v3.1 processors 2021-05-11 14:47:35 +02:00
kernel powerpc/eeh: Fix EEH handling for hugepages in ioremap space. 2021-05-11 14:47:35 +02:00
kexec powerpc/kexec_file: fix FDT size estimation for kdump kernel 2021-03-04 11:38:39 +01:00
kvm KVM: PPC: Make the VMX instruction emulation routines static 2021-03-04 11:38:01 +01:00
lib powerpc/sstep: Fix darn emulation 2021-03-25 09:04:12 +01:00
math-emu treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
mm powerpc/mm: Fix verification of MMU_FTR_TYPE_44x 2020-12-30 11:54:16 +01:00
net bpf, powerpc: Fix misuse of fallthrough in bpf_jit_comp() 2020-09-29 16:39:11 +02:00
oprofile powerpc/oprofile: fix spelling mistake "contex" -> "context" 2020-08-25 01:31:33 +10:00
perf powerpc/perf: Record counter overflow always if SAMPLE_IP is unset 2021-03-17 17:06:23 +01:00
platforms powerpc/pseries: Don't enforce MSI affinity with kdump 2021-03-17 17:06:10 +01:00
purgatory powerpc/kexec_file: Enable early kernel OPAL calls 2020-07-29 23:47:55 +10:00
sysdev powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe() 2021-01-06 14:56:53 +01:00
tools powerpc/tools: Remove 90 line limit in checkpatch script 2020-09-08 22:57:11 +10:00
xmon powerpc/xmon: Change printk() to pr_cont() 2020-12-30 11:54:16 +01:00
Kbuild powerpc/kexec: Move kexec files into a dedicated subdir. 2019-11-21 15:41:34 +11:00
Kconfig powerpc/47x: Disable 256k page size 2021-03-04 11:38:01 +01:00
Kconfig.debug powerpc: Remove Xilinx PPC405/PPC440 support 2020-05-28 23:24:34 +10:00
Makefile Kbuild fixes for v5.10 (2nd) 2020-12-06 10:31:39 -08:00
Makefile.postlink powerpc: unrel_branch_check.sh: use nm to find symbol value 2020-09-02 11:00:22 +10:00