Load the PowerNV platform pci controller ops into pci controllers
after all the operations are loaded into the platform ops struct, not
before.
Otherwise we aren't actually setting the ops properly which can break
IO for some devices.
Fixes: 65ebf4b63 ("powerpc/powernv: Move controller ops from ppc_md to controller_ops")
Reported-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Merge Richard's work to support SR-IOV on PowerNV. All generic PCI
patches acked by Bjorn.
Some minor conflicts with Daniel's pci_controller_ops work.
Conflicts:
arch/powerpc/include/asm/machdep.h
arch/powerpc/platforms/powernv/pci-ioda.c
This moves the PowerNV platform to use the pci_controller_ops
structure rather than ppc_md for PCI controller operations.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
pcibios_enable_device_hook returned an int. Every implementation
returned either -EINVAL or 0. The return value wasn't propagated by
the caller: any non-zero return value caused pcibios_enable_device
to return -EINVAL itself. Therefore, make the hook return a bool.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
If M64 has been supported, the prefetchable 64-bits memory resources
shouldn't be mapped to the corresponding PE# via M32DT. Unfortunately,
we're doing that in pnv_ioda_setup_pe_seg() wrongly. The issue was
introduced by commit 262af55 ("powerpc/powernv: Enable M64 aperatus
for PHB3"). The patch fixes the issue by simply skipping M64 resources
when updating to M32DT.
Cc: <stable@vger.kernel.org> # v3.17+
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In struct pci_dn, the pcidev field is assigned but not used, so remove it.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When IOV BAR is big, each is covered by 4 M64 windows. This leads to
several VF PE sits in one PE in terms of M64.
Group VF PEs according to the M64 allocation.
[bhelgaas: use dev_printk() when possible]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
M64 aperture size is limited on PHB3. When the IOV BAR is too big, this
will exceed the limitation and failed to be assigned.
Introduce a different mechanism based on the IOV BAR size:
- if IOV BAR size is smaller than 64MB, expand to total_pe
- if IOV BAR size is bigger than 64MB, roundup power2
[bhelgaas: make dev_printk() output more consistent, use PCI_SRIOV_NUM_BARS]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On PowerNV platform, resource position in M64 BAR implies the PE# the
resource belongs to. In some cases, adjustment of a resource is necessary
to locate it to a correct position in M64 BAR .
This patch adds pnv_pci_vf_resource_shift() to shift the 'real' PF IOV BAR
address according to an offset.
Note:
After doing so, there would be a "hole" in the /proc/iomem when offset
is a positive value. It looks like the device return some mmio back to
the system, which actually no one could use it.
[bhelgaas: rework loops, rework overlap check, index resource[]
conventionally, remove pci_regs.h include, squashed with next patch]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Implement pcibios_iov_resource_alignment() on powernv platform.
On PowerNV platform, there are 3 cases for the IOV BAR:
1. initial state, the IOV BAR size is multiple times of VF BAR size
2. after expanded, the IOV BAR size is expanded to meet the M64 segment size
3. sizing stage, the IOV BAR is truncated to 0
pnv_pci_iov_resource_alignment() handle these three cases respectively.
[bhelgaas: adjust to drop "align" parameter, return pci_iov_resource_size()
if no ppc_md machdep_call version]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On PHB3, PF IOV BAR will be covered by M64 BAR to have better PE isolation.
M64 BAR is a type of hardware resource in PHB3, which could map a range of
MMIO to PE numbers on powernv platform. And this range is divided equally
by the number of total_pe with each divided range mapping to a PE number.
Also, the M64 BAR must map a MMIO range with power-of-two size.
The total_pe number is usually different from total_VFs, which can lead to
a conflict between MMIO space and the PE number.
For example, if total_VFs is 128 and total_pe is 256, the second half of
M64 BAR will be part of other PCI device, which may already belong to other
PEs.
This patch prevents the conflict by reserving additional space for the PF
IOV BAR, which is total_pe number of VF's BAR size.
[bhelgaas: make dev_printk() output more consistent, index resource[]
conventionally]
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Previously the iommu_table had the same lifetime as a struct pnv_ioda_pe
and was embedded in it. The pnv_ioda_pe was assigned to a PE on the bootup
stage. Since PEs are based on the hardware layout which is static in the
system, they will never get released. This means the iommu_table in the
pnv_ioda_pe will never get released either.
This no longer works for VF PE. VF PEs are created and released dynamically
when VFs are created and released. So we need to assign pnv_ioda_pe to VF
PEs respectively when VFs are enabled and clean up those resources for VF
PE when VFs are disabled. And iommu_table is one of the resources we need
to handle dynamically.
Current iommu_table is a static field in pnv_ioda_pe, which will face a
problem when freeing it. During the disabling of a VF,
pnv_pci_ioda2_release_dma_pe will call iommu_free_table to release the
iommu_table for this PE. A static iommu_table will fail in
iommu_free_table.
According to these requirement, this patch allocates iommu_table
dynamically.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
pci_dn is the extension of PCI device node and is created from device node.
Unfortunately, VFs are enabled dynamically by PF's driver and they don't
have corresponding device nodes and pci_dn, which is required to access
VFs' config spaces.
The patch creates pci_dn for VFs in pcibios_sriov_enable() on their PF,
and removes pci_dn for VFs in pcibios_sriov_disable() on their PF. When
VF's pci_dn is created, it's put to the child list of the pci_dn of PF's
upstream bridge. The pci_dn is linked to pci_dev during early fixup time
to setup the fast path.
[bhelgaas: add ifdef around add_one_dev_pci_info(), use dev_printk()]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch removes unused file eeh-ioda.c and updates makefile
accordingly. Besides, the definition of "struct pnv_eeh_ops" and
the instances are all removed. Until now, the chip layer of EEH
implementation for PowerNV platform is removed completely.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch drops PHB EEH operation reset() and merges its logic to
eeh_ops::reset().
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When unbinding and rebinding the driver on a system with a card in PHB0, this
error condition is reached after a few attempts:
ERROR: Bad of_node_put() on /pciex@3fffe40000000
CPU: 0 PID: 3040 Comm: bash Not tainted 3.18.0-rc3-12545-g3627ffe #152
Call Trace:
[c000000721acb5c0] [c00000000086ef94] .dump_stack+0x84/0xb0 (unreliable)
[c000000721acb640] [c00000000073a0a8] .of_node_release+0xd8/0xe0
[c000000721acb6d0] [c00000000044bc44] .kobject_release+0x74/0xe0
[c000000721acb760] [c0000000007394fc] .of_node_put+0x1c/0x30
[c000000721acb7d0] [c000000000545cd8] .cxl_probe+0x1a98/0x1d50
[c000000721acb900] [c0000000004845a0] .local_pci_probe+0x40/0xc0
[c000000721acb980] [c000000000484998] .pci_device_probe+0x128/0x170
[c000000721acba30] [c00000000052400c] .driver_probe_device+0xac/0x2a0
[c000000721acbad0] [c000000000522468] .bind_store+0x108/0x160
[c000000721acbb70] [c000000000521448] .drv_attr_store+0x38/0x60
[c000000721acbbe0] [c000000000293840] .sysfs_kf_write+0x60/0xa0
[c000000721acbc50] [c000000000292500] .kernfs_fop_write+0x140/0x1d0
[c000000721acbcf0] [c000000000208648] .vfs_write+0xd8/0x260
[c000000721acbd90] [c000000000208b18] .SyS_write+0x58/0x100
[c000000721acbe30] [c000000000009258] syscall_exit+0x0/0x98
We are missing a call to of_node_get(). pnv_pci_to_phb_node() should
call of_node_get() otherwise np's reference count isn't incremented and
it might go away. Rename pnv_pci_to_phb_node() to pnv_pci_get_phb_node()
so it's clear it calls of_node_get().
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When IOMMU bypass is enabled, a PCI device can read and write memory
that was not mapped by the driver without causing an EEH. That might
cause memory corruption, for example.
When we disable bypass, DMA reads and writes to addresses not mapped by
the IOMMU will cause an EEH, allowing us to debug such issues.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The M64 range information is missed in dmesg, which would be helpful in debug.
This patch prints the M64 range information in the same format as M32.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Reviewed-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Turning snoops on is the last step in CAPP recovery. Sapphire is expected to
have reinitialized the PHB and done the previous recovery steps.
Add mode argument to opal call to do this. Driver can turn snoops off although
it does not currently.
Signed-off-by: Ryan Grimm <grimm@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some nice cleanups like removing bootmem, and removal of __get_cpu_var().
There is one patch to mm/gup.c. This is the generic GUP implementation, but is
only used by us and arm(64). We have an ack from Steve Capper, and although we
didn't get an ack from Andrew he told us to take the patch through the powerpc
tree.
There's one cxl patch. This is in drivers/misc, but Greg said he was happy for
us to manage fixes for it.
There is an infrastructure patch to support an IPMI driver for OPAL. That patch
also appears in Corey Minyard's IPMI tree, you may see a conflict there.
There is also an RTC driver for OPAL. We weren't able to get any response from
the RTC maintainer, Alessandro Zummo, so in the end we just merged the driver.
The usual batch of Freescale updates from Scott.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUiSTSAAoJEFHr6jzI4aWAirQP/3rIEng0LzLu5kW2zkGylIaM
SNDum1vze3mHiTFl+CFcSIGpC1UEULoB49HA+2oE/ExKpIceG6lpL2LP+wNh2FW5
mozjMjS6mZt4w1Fu1D2ZtgQc3O1T1pxkqsnZmPa8gVf5k5d5IQNPY6yB0pgVWwbV
gwBKxe4VwPAzJjppE9i9MDhNTJwmHZq0lI8XuoTXOOU/f+4G1WxmjrbyveQ7cRP5
i/sq2cKjxpWA+KDeIXo0GR0DpXR7qMeAvFX5xXY7oKuUJIFDM4kSHfmMYP6qLf5c
2vlsJqHVqfOgQdve41z1ooaPzNtg7ezVo+VqqguSgtSgwy2JUo/uHpnzz3gD1Olo
AP5+6xj8LZac0rTPxF4n4Hoyrp7AaaFjEFt1zqT9PWniZW4B41wtia0QORBNUf1S
UEmKAC9T3WZJ47mH7WMSadtOPF9E3Yd/zuiPD4udtptCNKPbr6/k1MpJPIW2D4Rn
BJ0QZTRd7V0yRofXxZtHxaMxq8pWd/Tip7J/zr/ghz+ulnH8BuFamuhCCLuJlESU
+A2PMfuseyTMpH9sMAmmTwSGPDKjaUFWvmFvY/n88NZL7r2LlomNrDWFSSQOIHUP
FxjYmjUMpZeexsfyRdgFV/INhYC3o3cso2fRGO45YK6nkxNnjNFEBS6WhQLvNLBu
sknd1WjXkuJtoMC15SrQ
=jvyT
-----END PGP SIGNATURE-----
Merge tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc updates from Michael Ellerman:
"Some nice cleanups like removing bootmem, and removal of
__get_cpu_var().
There is one patch to mm/gup.c. This is the generic GUP
implementation, but is only used by us and arm(64). We have an ack
from Steve Capper, and although we didn't get an ack from Andrew he
told us to take the patch through the powerpc tree.
There's one cxl patch. This is in drivers/misc, but Greg said he was
happy for us to manage fixes for it.
There is an infrastructure patch to support an IPMI driver for OPAL.
There is also an RTC driver for OPAL. We weren't able to get any
response from the RTC maintainer, Alessandro Zummo, so in the end we
just merged the driver.
The usual batch of Freescale updates from Scott"
* tag 'powerpc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux: (101 commits)
powerpc/powernv: Return to cpu offline loop when finished in KVM guest
powerpc/book3s: Fix partial invalidation of TLBs in MCE code.
powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault
powerpc/xmon: Cleanup the breakpoint flags
powerpc/xmon: Enable HW instruction breakpoint on POWER8
powerpc/mm/thp: Use tlbiel if possible
powerpc/mm/thp: Remove code duplication
powerpc/mm/hugetlb: Sanity check gigantic hugepage count
powerpc/oprofile: Disable pagefaults during user stack read
powerpc/mm: Check for matching hpte without taking hpte lock
powerpc: Drop useless warning in eeh_init()
powerpc/powernv: Cleanup unused MCE definitions/declarations.
powerpc/eeh: Dump PHB diag-data early
powerpc/eeh: Recover EEH error on ownership change for BCM5719
powerpc/eeh: Set EEH_PE_RESET on PE reset
powerpc/eeh: Refactor eeh_reset_pe()
powerpc: Remove more traces of bootmem
powerpc/pseries: Initialise nvram_pstore_info's buf_lock
cxl: Name interrupts in /proc/interrupt
cxl: Return error to PSL if IRQ demultiplexing fails & print clearer warning
...
Pull powerpc fixes from Michael Ellerman:
"Here are five fixes for you to pull please.
They're all CC'ed to stable except the "Fix PE state format" one which
went in this release"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc: 32 bit getcpu VDSO function uses 64 bit instructions
powerpc/powernv: Replace OPAL_DEASSERT_RESET with EEH_RESET_DEACTIVATE
powerpc/eeh: Fix PE state format
powerpc/pseries: Fix endiannes issue in RTAS call from xmon
powerpc/powernv: Fix the hmi event version check.
The flag passed to ioda_eeh_phb_reset() should be EEH_RESET_DEACTIVATE,
which is translated to OPAL_DEASSERT_RESET or something else by the
EEH backend accordingly.
The patch replaces OPAL_DEASSERT_RESET with EEH_RESET_DEACTIVATE for
ioda_eeh_phb_reset().
Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Instead of the arch specific quirk which we are deprecating
and that drivers don't understand.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: <stable@vger.kernel.org>
Although we are now selecting NO_BOOTMEM, we still have some traces of
bootmem lying around. That is because even with NO_BOOTMEM there is
still a shim that converts bootmem calls into memblock calls, but
ultimately we want to remove all traces of bootmem.
Most of the patch is conversions from alloc_bootmem() to
memblock_virt_alloc(). In general a call such as:
p = (struct foo *)alloc_bootmem(x);
Becomes:
p = memblock_virt_alloc(x, 0);
We don't need the cast because memblock_virt_alloc() returns a void *.
The alignment value of zero tells memblock to use the default alignment,
which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.
We remove a number of NULL checks on the result of
memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
if it can't allocate, in exactly the same way as alloc_bootmem(), so the
NULL checks are and always have been redundant.
The memory returned by memblock_virt_alloc() is already zeroed, so we
remove several memsets of the result of memblock_virt_alloc().
Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
to that is pointless, 16XB ought to be enough for anyone.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When freezing compound PEs in pnv_ioda_freeze_pe(), we should bail
upon illegal master PE. We needn't freeze slave PE because it should
have been put into frozen state by hardware.
Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Nested if statements are always bad and the patch avoids one by
checking PHB type and bail in advance if necessary.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 262af55 ("powerpc/powernv: Enable M64 aperatus for PHB3")
introduced compound PEs in order to support M64 aperatus on PHB3.
However, we never configured PELTV for compound PEs. The patch
fixes that by: parent PE can freeze all child compound PEs. Any
compound PE affects the group.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch initializes PE instance when reserving PE number to
keep consistent things as we did before. Also, it replaces the
iteration on bridge's windows with the prefered way.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch renames alloc_m64_pe() to reserve_m64_pe() to reflect
its real usage: We reserve PE numbers for M64 segments in advance
and then pick up the reserved PE numbers when building the mapping
between PE numbers and M64 segments.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The M64 resource should be removed if we don't have hook to
initialize it, or (not and) fail to do that.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The patch checks PHB type a bit early to save a bit cycles
for P7 because we don't support M64 for P7IOC no matter what
OPAL firmware we have.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
This adds a number of functions for allocating IRQs under powernv PCIe for cxl.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Some of the MSI IRQ code in pnv_pci_ioda_msi_setup() is generically useful so
split it out.
This will be used by some of the cxl PCIe code later.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The dma_get_required_mask() function is used by some drivers to
query the platform about what DMA mask is needed to cover all of
memory. This is a bit of a strange semantic when we have to choose
between IOMMU translation or bypass, but essentially what it means
is "what DMA mask will give best performances".
Currently, our IOMMU backend always returns a 32-bit mask here, we
don't do anything special to it when we have bypass available. This
causes some drivers to choose a 32-bit mask, thus losing the ability
to use the bypass window, thinking this is more efficient. The problem
was reported from the driver of following device:
0004:03:00.0 0107: 1000:0087 (rev 05)
0004:03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios \
Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
This patch adds an override of that function in order to, instead,
return a 64-bit mask whenever a bypass window is available in order
for drivers to prefer this configuration.
Reported-by: Murali N. Iyer <mniyer@us.ibm.com>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The names of PCI reset scopes aren't sychronized with firmware.
The patch fixes it.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
No need for 3 functions when a single one will do.
Modify the function declaring macros to call the single function.
Reduces object code size a little:
$ size arch/powerpc/platforms/powernv/pci-ioda.o*
text data bss dec hex filename
22303 1073 6680 30056 7568 arch/powerpc/platforms/powernv/pci-ioda.o.new
22840 1121 6776 30737 7811 arch/powerpc/platforms/powernv/pci-ioda.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The return value is unnecessary and unused, so make the functions
void instead of int.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When we take full hotplug to recover from EEH errors, PCI buses
could be involved. For the case, the child devices of involved
PCI buses can't be attached to IOMMU group properly, which is
caused by commit 3f28c5a ("powerpc/powernv: Reduce multi-hit of
iommu_add_device()").
When adding the PCI devices of the newly created PCI buses to
the system, the IOMMU group is expected to be added in (C).
(A) fails to bind the IOMMU group because bus->is_added is
false. (B) fails because the device doesn't have binding IOMMU
table yet. bus->is_added is set to true at end of (C) and
pdev->is_added is set to true at (D).
pcibios_add_pci_devices()
pci_scan_bridge()
pci_scan_child_bus()
pci_scan_slot()
pci_scan_single_device()
pci_scan_device()
pci_device_add()
pcibios_add_device() A: Ignore
device_add() B: Ignore
pcibios_fixup_bus()
pcibios_setup_bus_devices()
pcibios_setup_device() C: Hit
pcibios_finish_adding_to_bus()
pci_bus_add_devices()
pci_bus_add_device() D: Add device
If the parent PCI bus isn't involved in hotplug, the IOMMU
group is expected to be bound in (B). (A) should fail as the
sysfs entries aren't populated.
The patch fixes the issue by reverting commit 3f28c5a and remove
WARN_ON() in iommu_add_device() to allow calling the function
even the specified device already has associated IOMMU group.
Cc: <stable@vger.kernel.org> # 3.16+
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The patch introduces 3 PHB callbacks: compound PE state retrieval,
force freezing and unfreezing compound PE. The PCI config accessors
and PowerNV EEH backend can use them in subsequent patches.
We don't export the capability of compound PE to EEH core, which
helps avoiding more complexity to EEH core.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch enables M64 aperatus for PHB3.
We already had platform hook (ppc_md.pcibios_window_alignment) to affect
the PCI resource assignment done in PCI core so that each PE's M32 resource
was built on basis of M32 segment size. Similarly, we're using that for
M64 assignment on basis of M64 segment size.
* We're using last M64 BAR to cover M64 aperatus, and it's shared by all
256 PEs.
* We don't support P7IOC yet. However, some function callbacks are added
to (struct pnv_phb) so that we can reuse them on P7IOC in future.
* PE, corresponding to PCI bus with large M64 BAR device attached, might
span multiple M64 segments. We introduce "compound" PE to cover the case.
The compound PE is a list of PEs and the master PE is used as before.
The slave PEs are just for MMIO isolation.
Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
There are multiple global EEH flags. Almost each flag has its own
accessor, which doesn't make sense. The patch refactors EEH flag
accessors so that they look unified:
eeh_add_flag(): Add EEH flag
eeh_clear_flag(): Clear EEH flag
eeh_has_flag(): Check if one specific flag has been set
eeh_enabled(): Check if EEH functionality has been enabled
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On PHB3, PCI devices can bypass IOMMU for DMA access. If we pass
through one PCI device, whose hose driver ever enable the bypass
mode, pdev->dev.archdata.dma_data.iommu_table_base isn't IOMMU
table. However, EEH needs access the IOMMU table when the device
is owned by guest.
The patch fixes pdev->dev.archdata.dma_data.iommu_table when
passing through the device to guest in pnv_pci_ioda2_set_bypass().
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc defines various machine-specific routines for handling
pci_set_dma_mask(). The routines for machine "PowerNV" may neglect
to set dev->dma_mask. This could confuse anyone (e.g. drivers) that
consult dev->dma_mask to find the current mask. Set the dma_mask in
the PowerNV leaf routine.
Signed-off-by: Brian W. Hart <hartb@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The sysfs entries are lost because of commit 2213fb1 ("powerpc/eeh:
Skip eeh sysfs when eeh is disabled"). That commit added condition
to create sysfs entries with EEH_ENABLED, which isn't populated
when trying to create sysfs entries on PowerNV platform during system
boot time. The patch fixes the issue by:
* Reoder EEH initialization functions so that they're same on
PowerNV/pSeries.
* Cache PE's primary bus by PowerNV platform instead of EEH core
to avoid kernel crash caused by the function reorder. Another
benefit with this is to avoid one eeh_probe_mode_dev() in EEH
core.
Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Since a TCE page size can be other than 4K, make it configurable for
P5IOC2 and IODA PHBs.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This fixes IODA1/2 to use it_page_shift as it may be bigger than 4K.
This changes involved constant values to use "ull" modifier.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
If the PE contains single PCI function, "pe->pbus" would be NULL.
It's not reliable to be used by pci_domain_nr(). We just grab the
PCI domain number from the PCI host controller (struct pci_controller)
instance.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In function pnv_pci_ioda2_setup_dma_pe(), the IOMMU table type is
set to (TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE) unconditionally.
It was just set to TCE_PCI by pnv_pci_setup_iommu_table(). So the
primary IOMMU table type (TCE_PCI) is lost. The patch fixes it.
Also, pnv_pci_setup_iommu_table() already set "tbl->it_busno" to
zero and we needn't do it again. The patch removes the redundant
assignment.
The patch also fixes similar issues in pnv_pci_ioda_setup_dma_pe().
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>