Pass-through devices to VM guest can get updated IRQ affinity
information via irq_set_affinity() when not running in guest mode.
Currently, AMD IOMMU driver in GA mode ignores the updated information
if the pass-through device is setup to use vAPIC regardless of guest_mode.
This could cause invalid interrupt remapping.
Also, the guest_mode bit should be set and cleared only when
SVM updates posted-interrupt interrupt remapping information.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Joerg Roedel <jroedel@suse.de>
Fixes: d98de49a53 ('iommu/amd: Enable vAPIC interrupt remapping mode by default')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add the missing name, so debugging will work proper.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Joerg Roedel <joro@8bytes.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: iommu@lists.linux-foundation.org
Cc: Christoph Hellwig <hch@lst.de>
Link: http://lkml.kernel.org/r/20170619235443.343236995@linutronix.de
To benefit from IOTLB flushes on other CPUs we have to free
the already flushed IOVAs from the ring-buffer before we do
the queue_ring_full() check.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When booting into a kdump kernel, suppress IO_PAGE_FAULTs by
default for all devices. But allow the faults again when a
domain is assigned to a device.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Add a timer to each dma_ops domain so that we flush unused
IOTLB entries regularily, even if the queues don't get full
all the time.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The counters are increased every time the TLB for a given
domain is flushed. We also store the current value of that
counter into newly added entries of the flush-queue, so that
we can tell whether this entry is already flushed.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The queue flushing is pretty inefficient when it flushes the
queues for all cpus at once. Further it flushes all domains
from all IOMMUs for all CPUs, which is overkill as well.
Rip it out to make room for something more efficient.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently if there is no room to add a command to the command buffer, the
driver performs a "completion wait" which only returns when all commands
on the queue have been processed. There is no need to wait for the entire
command queue to be executed before adding the next command.
Update the driver to perform the same udelay() loop that the "completion
wait" performs, but instead re-read the head pointer to determine if
sufficient space is available. The very first time it is found that there
is no space available, the udelay() will be skipped to immediately perform
the opportunistic read of the head pointer. If it is still found that
there is not sufficient space, then the udelay() will be performed.
Signed-off-by: Leo Duran <leo.duran@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
As newer, higher speed devices are developed, perf data shows that the
amount of MMIO that is performed when submitting commands to the IOMMU
causes performance issues. Currently, the command submission path reads
the command buffer head and tail pointers and then writes the tail
pointer once the command is ready.
The tail pointer is only ever updated by the driver so it can be tracked
by the driver without having to read it from the hardware.
The head pointer is updated by the hardware, but can be read
opportunistically. Reading the head pointer only when it appears that
there might not be room in the command buffer and then re-checking the
available space reduces the number of times the head pointer has to be
read.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
struct irq_domain_ops is not modified, so it can be made const.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Misbehaving devices can cause an endless chain of
io-page-faults, flooding dmesg and making the system-log
unusable or even prevent the system from booting.
So ratelimit the error messages about io-page-faults on a
per-device basis.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Introduce amd_iommu_get_num_iommus(), which returns the value of
amd_iommus_present. The function is used to replace direct access to the
variable, which is now declared as static.
This function will also be used by AMD IOMMU perf driver.
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Jörg Rödel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: iommu@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/1487926102-13073-6-git-send-email-Suravee.Suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The introduction of reserved regions has left a couple of rough edges
which we could do with sorting out sooner rather than later. Since we
are not yet addressing the potential dynamic aspect of software-managed
reservations and presenting them at arbitrary fixed addresses, it is
incongruous that we end up displaying hardware vs. software-managed MSI
regions to userspace differently, especially since ARM-based systems may
actually require one or the other, or even potentially both at once,
(which iommu-dma currently has no hope of dealing with at all). Let's
resolve the former user-visible inconsistency ASAP before the ABI has
been baked into a kernel release, in a way that also lays the groundwork
for the latter shortcoming to be addressed by follow-up patches.
For clarity, rename the software-managed type to IOMMU_RESV_SW_MSI, use
IOMMU_RESV_MSI to describe the hardware type, and document everything a
little bit. Since the x86 MSI remapping hardware falls squarely under
this meaning of IOMMU_RESV_MSI, apply that type to their regions as well,
so that we tell the same story to userspace across all platforms.
Secondly, as the various region types require quite different handling,
and it really makes little sense to ever try combining them, convert the
bitfield-esque #defines to a plain enum in the process before anyone
gets the wrong impression.
Fixes: d30ddcaa7b ("iommu: Add a new type field in iommu_resv_region")
Reviewed-by: Eric Auger <eric.auger@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: David Woodhouse <dwmw2@infradead.org>
CC: kvm@vger.kernel.org
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Bart Van Assche noted that the ib DMA mapping code was significantly
similar enough to the core DMA mapping code that with a few changes
it was possible to remove the IB DMA mapping code entirely and
switch the RDMA stack to use the core DMA mapping code. This resulted
in a nice set of cleanups, but touched the entire tree. This branch
will be submitted separately to Linus at the end of the merge window
as per normal practice for tree wide changes like this.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9
2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV
CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD
Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ
xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX
0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy
1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw
ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI
IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG
ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC
XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS
3e7mgpcwC+3XfA/6vU3F
=e0Si
-----END PGP SIGNATURE-----
Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma DMA mapping updates from Doug Ledford:
"Drop IB DMA mapping code and use core DMA code instead.
Bart Van Assche noted that the ib DMA mapping code was significantly
similar enough to the core DMA mapping code that with a few changes it
was possible to remove the IB DMA mapping code entirely and switch the
RDMA stack to use the core DMA mapping code.
This resulted in a nice set of cleanups, but touched the entire tree
and has been kept separate for that reason."
* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
IB/core: Remove ib_device.dma_device
nvme-rdma: Switch from dma_device to dev.parent
RDS: net: Switch from dma_device to dev.parent
IB/srpt: Modify a debug statement
IB/srp: Switch from dma_device to dev.parent
IB/iser: Switch from dma_device to dev.parent
IB/IPoIB: Switch from dma_device to dev.parent
IB/rxe: Switch from dma_device to dev.parent
IB/vmw_pvrdma: Switch from dma_device to dev.parent
IB/usnic: Switch from dma_device to dev.parent
IB/qib: Switch from dma_device to dev.parent
IB/qedr: Switch from dma_device to dev.parent
IB/ocrdma: Switch from dma_device to dev.parent
IB/nes: Remove a superfluous assignment statement
IB/mthca: Switch from dma_device to dev.parent
IB/mlx5: Switch from dma_device to dev.parent
IB/mlx4: Switch from dma_device to dev.parent
IB/i40iw: Remove a superfluous assignment statement
IB/hns: Switch from dma_device to dev.parent
...
The callers of the DMA alloc functions already provide the proper
context GFP flags. Make sure to pass them through to the CMA allocator,
to make the CMA compaction context aware.
Link: http://lkml.kernel.org/r/20170127172328.18574-3-l.stach@pengutronix.de
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alexander Graf <agraf@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is currently support for iommu sysfs bindings, but
those need to be implemented in the IOMMU drivers. Add a
more generic version of this by adding a struct device to
struct iommu_device and use that for the sysfs bindings.
Also convert the AMD and Intel IOMMU driver to make use of
it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This struct represents one hardware iommu in the iommu core
code. For now it only has the iommu-ops associated with it,
but that will be extended soon.
The register/unregister interface is also added, as well as
making use of it in the Intel and AMD IOMMU drivers.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Some but not all architectures provide set_dma_ops(). Move dma_ops
from struct dev_archdata into struct device such that it becomes
possible on all architectures to configure dma_ops per device.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: x86@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch registers the MSI and HT regions as non mappable
reserved regions. They will be exposed in the iommu-group sysfs.
For direct-mapped regions let's also use iommu_alloc_resv_region().
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We introduce a new field to differentiate the reserved region
types and specialize the apply_resv_region implementation.
Legacy direct mapped regions have IOMMU_RESV_DIRECT type.
We introduce 2 new reserved memory types:
- IOMMU_RESV_MSI will characterize MSI regions that are mapped
- IOMMU_RESV_RESERVED characterize regions that cannot by mapped.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We want to extend the callbacks used for dm regions and
use them for reserved regions. Reserved regions can be
- directly mapped regions
- regions that cannot be iommu mapped (PCI host bridge windows, ...)
- MSI regions (because they belong to another address space or because
they are not translated by the IOMMU and need special handling)
So let's rename the struct and also the callbacks.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Tomasz Nowicki <tomasz.nowicki@caviumnetworks.com>
Tested-by: Bharat Bhushan <bharat.bhushan@nxp.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The generic command buffer entry is 128 bits (16 bytes), so the offset
of tail and head pointer should be 16 bytes aligned and increased with
0x10 per command.
When cmd buf is full, head = (tail + 0x10) % CMD_BUFFER_SIZE.
So when left space of cmd buf should be able to store only two
command, we should be issued one COMPLETE_WAIT additionally to wait
all older commands completed. Then the left space should be increased
after IOMMU fetching from cmd buf.
So left check value should be left <= 0x20 (two commands).
Signed-off-by: Huang Rui <ray.huang@amd.com>
Fixes: ac0ea6e92b ('x86/amd-iommu: Improve handling of full command buffer')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If acpihid_device_group() finds an existing group for the relevant
devid, it should be taking an additional reference on that group.
Otherwise, the caller of iommu_group_get_for_dev() will inadvertently
remove the reference taken by iommu_group_add_device(), and the group
will be freed prematurely if any device is removed.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Including:
* Support for interrupt virtualization in the AMD IOMMU driver.
These patches were shared with the KVM tree and are already
merged through that tree.
* Generic DT-binding support for the ARM-SMMU driver. With this
the driver now makes use of the generic DMA-API code. This
also required some changes outside of the IOMMU code, but
these are acked by the respective maintainers.
* More cleanups and fixes all over the place.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJX/PmUAAoJECvwRC2XARrjyloP/1hymxXC2yXZ4EIBTHSO5X+c
jSJaGTIbQAdQDpllscSNJ0Au43L3vGtJcHo4JqwEERNlwLsU82LH7QJhq+q1La/b
5cPaY5gI3E++qxQt8umuZJAIUQthFYrfGoS5lJc5t5r/d8iVsLWbW4VkR19/1o7A
4/Uz7ETmi9VVy8Hkvumx+PQ0VHJet381KB7ud9LU5Spim0En2AAGwZXLMkmxXd2W
uDQ+O1rlDVc2/ka3+GmfZEml5EASWRqS/MTNoU/ZbQGYWKCWygXbuiqt6gLudWjx
dCR1Knh68b0gN6k/QAj8XY/1gkfmZ3YkfS0AHIMLYTFRT51BuxOrkXrBdkYnWEBv
UirmaiV87SlR1j83yb3ZmjpBPvd2sGWYFDqY1P0riLutjGUS6zycWWs13olvbfbz
SFrH7PT7JPQGYprI1oVn4ihszjN1NZ4+Gj7QBhyFW6FtvqTzmaFVsMOlDIeg1FwR
k8cOzov4NG33Bp4IpsHK8e0/qV6K3oJOiOQgCyQp9kPKK+UWv9v9+HaEA7npJuRV
c+lTE6j3G4LjEoVybkqm8TiPKxTMVNjUjgA3kwB2yNkCQT7hTCNYIAFrtfCYjYdo
B1dnFE7feVqtoimnu2qvkVs59hWlF7Hc3RRHoBxMmO8DLwl9n2OcmoQIeCTsviss
i9aNwC9bzBs+Hd3X/psB
=1hFE
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
- support for interrupt virtualization in the AMD IOMMU driver. These
patches were shared with the KVM tree and are already merged through
that tree.
- generic DT-binding support for the ARM-SMMU driver. With this the
driver now makes use of the generic DMA-API code. This also required
some changes outside of the IOMMU code, but these are acked by the
respective maintainers.
- more cleanups and fixes all over the place.
* tag 'iommu-updates-v4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (40 commits)
iommu/amd: No need to wait iommu completion if no dte irq entry change
iommu/amd: Free domain id when free a domain of struct dma_ops_domain
iommu/amd: Use standard bitmap operation to set bitmap
iommu/amd: Clean up the cmpxchg64 invocation
iommu/io-pgtable-arm: Check for v7s-incapable systems
iommu/dma: Avoid PCI host bridge windows
iommu/dma: Add support for mapping MSIs
iommu/arm-smmu: Set domain geometry
iommu/arm-smmu: Wire up generic configuration support
Docs: dt: document ARM SMMU generic binding usage
iommu/arm-smmu: Convert to iommu_fwspec
iommu/arm-smmu: Intelligent SMR allocation
iommu/arm-smmu: Add a stream map entry iterator
iommu/arm-smmu: Streamline SMMU data lookups
iommu/arm-smmu: Refactor mmu-masters handling
iommu/arm-smmu: Keep track of S2CR state
iommu/arm-smmu: Consolidate stream map entry state
iommu/arm-smmu: Handle stream IDs more dynamically
iommu/arm-smmu: Set PRIVCFG in stage 1 STEs
iommu/arm-smmu: Support non-PCI devices with SMMUv3
...
All architectures:
Move `make kvmconfig` stubs from x86; use 64 bits for debugfs stats.
ARM:
Important fixes for not using an in-kernel irqchip; handle SError
exceptions and present them to guests if appropriate; proxying of GICV
access at EL2 if guest mappings are unsafe; GICv3 on AArch32 on ARMv8;
preparations for GICv3 save/restore, including ABI docs; cleanups and
a bit of optimizations.
MIPS:
A couple of fixes in preparation for supporting MIPS EVA host kernels;
MIPS SMP host & TLB invalidation fixes.
PPC:
Fix the bug which caused guests to falsely report lockups; other minor
fixes; a small optimization.
s390:
Lazy enablement of runtime instrumentation; up to 255 CPUs for nested
guests; rework of machine check deliver; cleanups and fixes.
x86:
IOMMU part of AMD's AVIC for vmexit-less interrupt delivery; Hyper-V
TSC page; per-vcpu tsc_offset in debugfs; accelerated INS/OUTS in
nVMX; cleanups and fixes.
-----BEGIN PGP SIGNATURE-----
iQEcBAABCAAGBQJX9iDrAAoJEED/6hsPKofoOPoIAIUlgojkb9l2l1XVDgsXdgQL
sRVhYSVv7/c8sk9vFImrD5ElOPZd+CEAIqFOu45+NM3cNi7gxip9yftUVs7wI5aC
eDZRWm1E4trDZLe54ZM9ThcqZzZZiELVGMfR1+ZndUycybwyWzafpXYsYyaXp3BW
hyHM3qVkoWO3dxBWFwHIoO/AUJrWYkRHEByKyvlC6KPxSdBPSa5c1AQwMCoE0Mo4
K/xUj4gBn9eMelNhg4Oqu/uh49/q+dtdoP2C+sVM8bSdquD+PmIeOhPFIcuGbGFI
B+oRpUhIuntN39gz8wInJ4/GRSeTuR2faNPxMn4E1i1u4LiuJvipcsOjPfe0a18=
=fZRB
-----END PGP SIGNATURE-----
Merge tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Radim Krčmář:
"All architectures:
- move `make kvmconfig` stubs from x86
- use 64 bits for debugfs stats
ARM:
- Important fixes for not using an in-kernel irqchip
- handle SError exceptions and present them to guests if appropriate
- proxying of GICV access at EL2 if guest mappings are unsafe
- GICv3 on AArch32 on ARMv8
- preparations for GICv3 save/restore, including ABI docs
- cleanups and a bit of optimizations
MIPS:
- A couple of fixes in preparation for supporting MIPS EVA host
kernels
- MIPS SMP host & TLB invalidation fixes
PPC:
- Fix the bug which caused guests to falsely report lockups
- other minor fixes
- a small optimization
s390:
- Lazy enablement of runtime instrumentation
- up to 255 CPUs for nested guests
- rework of machine check deliver
- cleanups and fixes
x86:
- IOMMU part of AMD's AVIC for vmexit-less interrupt delivery
- Hyper-V TSC page
- per-vcpu tsc_offset in debugfs
- accelerated INS/OUTS in nVMX
- cleanups and fixes"
* tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (140 commits)
KVM: MIPS: Drop dubious EntryHi optimisation
KVM: MIPS: Invalidate TLB by regenerating ASIDs
KVM: MIPS: Split kernel/user ASID regeneration
KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
KVM: arm/arm64: vgic: Don't flush/sync without a working vgic
KVM: arm64: Require in-kernel irqchip for PMU support
KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register
KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL
KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie
KVM: PPC: BookE: Fix a sanity check
KVM: PPC: Book3S HV: Take out virtual core piggybacking code
KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread
ARM: gic-v3: Work around definition of gic_write_bpr1
KVM: nVMX: Fix the NMI IDT-vectoring handling
KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive
KVM: nVMX: Fix reload apic access page warning
kvmconfig: add virtio-gpu to config fragment
config: move x86 kvm_guest.config to a common location
arm64: KVM: Remove duplicating init code for setting VMID
ARM: KVM: Support vgic-v3
...
This is a clean up. In get_irq_table() only if DTE entry is changed
iommu_completion_wait() need be called. Otherwise no need to do it.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The current code missed freeing domain id when free a domain of
struct dma_ops_domain.
Signed-off-by: Baoquan He <bhe@redhat.com>
Fixes: ec487d1a11 ('x86, AMD IOMMU: add domain allocation and deallocation functions')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Change it as it's designed for and keep it consistent with other
places.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The semaphore used by the AMD IOMMU to signal command
completion lived on the stack until now, which was safe as
the driver busy-waited on the semaphore with IRQs disabled,
so the stack can't go away under the driver.
But the recently introduced vmap-based stacks break this as
the physical address of the semaphore can't be determinded
easily anymore. The driver used the __pa() macro, but that
only works in the direct-mapping. The result were
Completion-Wait timeout errors seen by the IOMMU driver,
breaking system boot.
Since putting the semaphore on the stack is bad design
anyway, move the semaphore into 'struct amd_iommu'. It is
protected by the per-iommu lock and now in the direct
mapping again. This fixes the Completion-Wait timeout errors
and makes AMD IOMMU systems boot again with vmap-based
stacks enabled.
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introduce struct iommu_dev_data.use_vapic flag, which IOMMU driver
uses to determine if it should enable vAPIC support, by setting
the ga_mode bit in the device's interrupt remapping table entry.
Currently, it is enabled for all pass-through device if vAPIC mode
is enabled.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch implements irq_set_vcpu_affinity() function to set up interrupt
remapping table entry with vapic mode for pass-through devices.
In case requirements for vapic mode are not met, it falls back to set up
the IRTE in legacy mode.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch adds AMD IOMMU guest virtual APIC log (GALOG) handler.
When IOMMU hardware receives an interrupt targeting a blocking vcpu,
it creates an entry in the GALOG, and generates an interrupt to notify
the AMD IOMMU driver.
At this point, the driver processes the log entry, and notify the SVM
driver via the registered iommu_ga_log_notifier function.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch enables support for the new 128-bit IOMMU IRTE format,
which can be used for both legacy and vapic interrupt remapping modes.
It replaces the existing operations on IRTE, which can only support
the older 32-bit IRTE format, with calls to the new struct amd_irt_ops.
It also provides helper functions for setting up, accessing, and
updating interrupt remapping table entries in different mode.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Currently, IOMMU support two interrupt remapping table entry formats,
32-bit (legacy) and 128-bit (GA). The spec also implies that it might
support additional modes/formats in the future.
So, this patch introduces the new struct amd_irte_ops, which allows
the same code to work with different irte formats by providing hooks
for various operations on an interrupt remapping table entry.
Suggested-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Move existing unions and structs for accessing/managing IRTE to a proper
header file. This is mainly to simplify variable declarations in subsequent
patches.
Besides, this patch also introduces new struct irte_ga for the new
128-bit IRTE format.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Fix to return a negative error code from the alloc_irq_index() error
handling case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Fixes the following sparse warning:
drivers/iommu/amd_iommu.c:106:1: warning:
symbol '__pcpu_scope_flush_queue' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However the attributes do not have to be a bitfield. Instead unsigned
long will do fine:
1. This is just simpler. Both in terms of reading the code and setting
attributes. Instead of initializing local attributes on the stack
and passing pointer to it to dma_set_attr(), just set the bits.
2. It brings safeness and checking for const correctness because the
attributes are passed by value.
Semantic patches for this change (at least most of them):
virtual patch
virtual context
@r@
identifier f, attrs;
@@
f(...,
- struct dma_attrs *attrs
+ unsigned long attrs
, ...)
{
...
}
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
and
// Options: --all-includes
virtual patch
virtual context
@r@
identifier f, attrs;
type t;
@@
t f(..., struct dma_attrs *attrs);
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A two-level page-table can map up to 1GB of address space.
With the IOVA allocator now in use, the allocated addresses
are often more closely to 4G, which requires the address
space to be increased much more often. Avoid that by using a
three-level page-table by default.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Not doing so might cause IO-Page-Faults when a device uses
an alias request-id and the alias-dte is left in a lower
page-mode which does not cover the address allocated from
the iova-allocator.
Fixes: 492667dacc ('x86/amd-iommu: Remove amd_iommu_pd_table')
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This is better than storing an extra pointer in struct
protection_domain, because this pointer can now be removed
from the struct.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Before a dma_ops_domain can be freed, we need to make sure
it is not longer referenced by the flush queue. So empty the
queue before a dma_ops_domain can be freed.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This domain type is not yet handled in the
iommu_ops->domain_free() call-back. Fix that.
Fixes: 0bb6e243d7 ('iommu/amd: Support IOMMU_DOMAIN_DMA type allocation')
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Optimize these functions so that they need only one call
into the address alloctor. This also saves a couple of
io-tlb flushes in the unmap_sg path.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This function converts dma_data_direction to
iommu-protection flags. This will be needed on multiple
places in the code, so this will save some code.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In case the queue doesn't fill up, we flush the TLB at least
10ms after the unmap happened to make sure that the TLB is
cleaned up.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With the flush queue the IOMMU TLBs will not be flushed at
every dma-ops unmap operation. The unmapped ranges will be
queued and flushed at once, when the queue is full. This
makes unmapping operations a lot faster (on average) and
restores the performance of the old address allocator.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
If domain == NULL is passed to the function, it will queue a
completion-wait command on all IOMMUs in the system.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The flush queue is the equivalent to defered-flushing in the
Intel VT-d driver. This patch sets up the data structures
needed for this.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Remove the old address allocation code and make use of the
generic IOVA allocator that is also used by other dma-ops
implementations.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use the iommu-api map/unmap functions instead. This will be
required anyway when IOVA code is used for address
allocation.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Put the MSI-range, the HT-range and the MMIO ranges of PCI
devices into that range, so that these addresses are not
allocated for DMA.
Copy this address list into every created dma_ops_domain.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The default domain for a device might also be
identity-mapped. In this case the kernel would crash when
unity mappings are defined for the device. Fix that by
making sure the domain is a dma_ops domain.
Fixes: 0bb6e243d7 ('iommu/amd: Support IOMMU_DOMAIN_DMA type allocation')
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Joerg Roedel <jroedel@suse.de>
AMD has more drivers will use ACPI to platform bus driver later,
all those devices need iommu support, for example: eMMC driver.
For latest AMD eMMC controller, it will utilize sdhci-acpi.c driver,
which will rely on platform bus to match device and driver, where we
will set 'dev' of struct platform_device as map_sg parameter passing
to iommu driver for DMA request, so the iommu-ops are needed on the
platform bus.
Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The updates include:
* Rate limiting for the VT-d fault handler
* Remove statistics code from the AMD IOMMU driver. It is unused
and should be replaced by something more generic if needed
* Per-domain pagesize-bitmaps in IOMMU core code to support
systems with different types of IOMMUs
* Support for ACPI devices in the AMD IOMMU driver
* 4GB mode support for Mediatek IOMMU driver
* ARM-SMMU updates from Will Deacon:
- Support for 64k pages with SMMUv1 implementations
(e.g MMU-401)
- Remove open-coded 64-bit MMIO accessors
- Initial support for 16-bit VMIDs, as supported by some
ThunderX SMMU implementations
- A couple of errata workarounds for silicon in the
field
* Various fixes here and there
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJXPeM1AAoJECvwRC2XARrjA2QP/2Cz+pVkpQCuvhAse57eN4rB
wWXKTjqSFZ4PcA3Vu5yvX6XMv15g46xXFJAhf2spE5//8+xgFfYBgkBRpnqu1brw
SL6f8A912MnfMRgWqcdKkJNeHbiN0kOvcIQv1J8GNfciqMiyYFhiLP6fFiRmWR/F
XDBjUeFZ5+Uwf1BAGqw0cVPexeakEbsLHUGqxFsh5g2T4i43aHzO2HJT3IdwWHDt
F2ivs8gNFGBeJEyzhW8TD0rOEEyHAnM3N18qPEU9+dD0UmjnTQPymEZSbsGW5d4j
Cn40QYlA+Zmbwgx6LaDVChzQyRJu6O3uvFThyRviiYKCri/Nc9cUT4vHsFGU4MXb
1d3bqrgzaw7vw31BN7S1Py3MV+WpVnEYjFm2O+hW28OjtSpm6ZvbI8wc0rF4UT/I
KgL0gSeA8tp25uVISM+ktpIrObYsAcoCz8nvurpDv2AGkKRzhyoSze0Jg43rusD8
BH7iFWu1LRPlulTGlrHMtNmbZeEApUPbObcQAOcrBOj9vjuFaZ8qduZmB+hwS2iV
p9atn+54LmGO0LuzqsGrhApIeXTeTZSrGyjlbUADWBJlTw8Xyk/CR39Wf3m/Xmpr
DiJ/5oa8SKQtNbwvbScn1+sInNWP/pH/JgnRO3Yvqth8HWF/DlpzNj5XxAB8czwr
qjk9WjpEXun50ocPFQeS
=jpPD
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"The updates include:
- rate limiting for the VT-d fault handler
- remove statistics code from the AMD IOMMU driver. It is unused and
should be replaced by something more generic if needed
- per-domain pagesize-bitmaps in IOMMU core code to support systems
with different types of IOMMUs
- support for ACPI devices in the AMD IOMMU driver
- 4GB mode support for Mediatek IOMMU driver
- ARM-SMMU updates from Will Deacon:
- support for 64k pages with SMMUv1 implementations (e.g MMU-401)
- remove open-coded 64-bit MMIO accessors
- initial support for 16-bit VMIDs, as supported by some ThunderX
SMMU implementations
- a couple of errata workarounds for silicon in the field
- various fixes here and there"
* tag 'iommu-updates-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (44 commits)
iommu/arm-smmu: Use per-domain page sizes.
iommu/amd: Remove statistics code
iommu/dma: Finish optimising higher-order allocations
iommu: Allow selecting page sizes per domain
iommu: of: enforce const-ness of struct iommu_ops
iommu: remove unused priv field from struct iommu_ops
iommu/dma: Implement scatterlist segment merging
iommu/arm-smmu: Clear cache lock bit of ACR
iommu/arm-smmu: Support SMMUv1 64KB supplement
iommu/arm-smmu: Decouple context format from kernel config
iommu/arm-smmu: Tidy up 64-bit/atomic I/O accesses
io-64-nonatomic: Add relaxed accessor variants
iommu/arm-smmu: Work around MMU-500 prefetch errata
iommu/arm-smmu: Convert ThunderX workaround to new method
iommu/arm-smmu: Differentiate specific implementations
iommu/arm-smmu: Workaround for ThunderX erratum #27704
iommu/arm-smmu: Add support for 16 bit VMID
iommu/amd: Move get_device_id() and friends to beginning of file
iommu/amd: Don't use IS_ERR_VALUE to check integer values
iommu/amd: Signedness bug in acpihid_device_group()
...
Enumeration
Refine PCI support check in pcibios_init() (Adrian-Ken Rueegsegger)
Provide common functions for ECAM mapping (Jayachandran C)
Allow all PCIe services on non-ACPI host bridges (Jon Derrick)
Remove return values from pcie_port_platform_notify() and relatives (Jon Derrick)
Widen portdrv service type from 4 bits to 8 bits (Keith Busch)
Add Downstream Port Containment portdrv service type (Keith Busch)
Add Downstream Port Containment driver (Keith Busch)
Resource management
Identify Enhanced Allocation (EA) BAR Equivalent resources in sysfs (Alex Williamson)
Supply CPU physical address (not bus address) to iomem_is_exclusive() (Bjorn Helgaas)
alpha: Call iomem_is_exclusive() for IORESOURCE_MEM, but not IORESOURCE_IO (Bjorn Helgaas)
Mark Broadwell-EP Home Agent 1 as having non-compliant BARs (Prarit Bhargava)
Disable all BAR sizing for devices with non-compliant BARs (Prarit Bhargava)
Move PCI I/O space management from OF to PCI core code (Tomasz Nowicki)
PCI device hotplug
acpiphp_ibm: Avoid uninitialized variable reference (Dan Carpenter)
Use cached copy of PCI_EXP_SLTCAP_HPC bit (Lukas Wunner)
Virtualization
Mark Intel i40e NIC INTx masking as broken (Alex Williamson)
Reverse standard ACS vs device-specific ACS enabling (Alex Williamson)
Work around Intel Sunrise Point PCH incorrect ACS capability (Alex Williamson)
IOMMU
Add pci_add_dma_alias() to abstract implementation (Bjorn Helgaas)
Move informational printk to pci_add_dma_alias() (Bjorn Helgaas)
Add support for multiple DMA aliases (Jacek Lawrynowicz)
Add DMA alias quirk for mic_x200_dma (Jacek Lawrynowicz)
Thunderbolt
Fix double free of drom buffer (Andreas Noever)
Add Intel Thunderbolt device IDs (Lukas Wunner)
Fix typos and magic number (Lukas Wunner)
Support 1st gen Light Ridge controller (Lukas Wunner)
Generic host bridge driver
Use generic ECAM API (Jayachandran C)
Cavium ThunderX host bridge driver
Don't clobber read-only bits in bridge config registers (David Daney)
Use generic ECAM API (Jayachandran C)
Freescale i.MX6 host bridge driver
Use enum instead of bool for variant indicator (Andrey Smirnov)
Implement reset sequence for i.MX6+ (Andrey Smirnov)
Factor out ref clock enable (Bjorn Helgaas)
Add initial imx6sx support (Christoph Fritz)
Add reset-gpio-active-high boolean property to DT (Petr Štetiar)
Add DT property for link gen, default to Gen1 (Tim Harvey)
dts: Specify imx6qp version of PCIe core (Andrey Smirnov)
dts: Fix PCIe reset GPIO polarity on Toradex Apalis Ixora (Petr Štetiar)
Marvell Armada host bridge driver
add DT binding for Marvell Armada 7K/8K PCIe controller (Thomas Petazzoni)
Add driver for Marvell Armada 7K/8K PCIe controller (Thomas Petazzoni)
Marvell MVEBU host bridge driver
Constify mvebu_pcie_pm_ops structure (Jisheng Zhang)
Use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS for mvebu_pcie_pm_ops (Jisheng Zhang)
Microsoft Hyper-V host bridge driver
Report resources release after stopping the bus (Vitaly Kuznetsov)
Add explicit barriers to config space access (Vitaly Kuznetsov)
Renesas R-Car host bridge driver
Select PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)
Synopsys DesignWare host bridge driver
Remove incorrect RC memory base/limit configuration (Gabriele Paoloni)
Move Root Complex setup code to dw_pcie_setup_rc() (Jisheng Zhang)
TI Keystone host bridge driver
Add error IRQ handler (Murali Karicheri)
Remove unnecessary goto statement (Murali Karicheri)
Miscellaneous
Fix spelling errors (Colin Ian King)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXPdMKAAoJEFmIoMA60/r8ofUP/j0zyzn24f0xY1wLeGJ8geB9
6nHk1QdkPqwCiXZahEcnA5HMlFCl/ciWjjsoCqeMlvS6NXkX13KGcc1UGZszelTs
68bFhyBKqcoMn0it53vBjBXnkfA64PmlxwY/T1ADulxL8amFOCpjjBruZ8pxJ/U7
r6uHvhxUxHCRF7hMmpNN+V5XWXWCFFkPJZvxOTkglaxkbdnhZ0h0Xz9p9liUvjPH
mBE72E3WUjiGogXGoLAPDclz1NI6rhRVUyTRcQ8EWaOwitV3OqMuDpAwoWH62ZZJ
iorCkQk2/eKfN6OA6UgZh4loauAty0FeoZDX7ZVftQr52IpAzRUVx1oAq0J7u4ga
KRX37mlK/53UcMZyv9Lz2kw4KjaLLELiInzcF+w3Bbov4UhY4/sL5uh9eNMFvSUU
iZuY+GFlceL0P6wZuVKU5U8td/CyBr3f5vY/3htxuYHE1xJq4FkL92JpWRCvwpVr
YdCzocscw73Yn8ZMplt8DX2fyabN7HyGezbQISrDDGY6T0ZDsRRKc6FFAt4xF+ta
JJ+bcY8OcXtxGw6SXtrscL7vNXdR7Zg1HBSa8Sl/CopCdW9zs0VdwgFoxgORcWDT
mphIgt57DMzaiUUaV8FRQz0mSLixnAcCEfGjVbAEEw3SP5ZChGfS3EknKb/CPRyk
TD6I3pXTBhTWXd8aS113
=68Iz
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Refine PCI support check in pcibios_init() (Adrian-Ken Rueegsegger)
- Provide common functions for ECAM mapping (Jayachandran C)
- Allow all PCIe services on non-ACPI host bridges (Jon Derrick)
- Remove return values from pcie_port_platform_notify() and relatives (Jon Derrick)
- Widen portdrv service type from 4 bits to 8 bits (Keith Busch)
- Add Downstream Port Containment portdrv service type (Keith Busch)
- Add Downstream Port Containment driver (Keith Busch)
Resource management:
- Identify Enhanced Allocation (EA) BAR Equivalent resources in sysfs (Alex Williamson)
- Supply CPU physical address (not bus address) to iomem_is_exclusive() (Bjorn Helgaas)
- alpha: Call iomem_is_exclusive() for IORESOURCE_MEM, but not IORESOURCE_IO (Bjorn Helgaas)
- Mark Broadwell-EP Home Agent 1 as having non-compliant BARs (Prarit Bhargava)
- Disable all BAR sizing for devices with non-compliant BARs (Prarit Bhargava)
- Move PCI I/O space management from OF to PCI core code (Tomasz Nowicki)
PCI device hotplug:
- acpiphp_ibm: Avoid uninitialized variable reference (Dan Carpenter)
- Use cached copy of PCI_EXP_SLTCAP_HPC bit (Lukas Wunner)
Virtualization:
- Mark Intel i40e NIC INTx masking as broken (Alex Williamson)
- Reverse standard ACS vs device-specific ACS enabling (Alex Williamson)
- Work around Intel Sunrise Point PCH incorrect ACS capability (Alex Williamson)
IOMMU:
- Add pci_add_dma_alias() to abstract implementation (Bjorn Helgaas)
- Move informational printk to pci_add_dma_alias() (Bjorn Helgaas)
- Add support for multiple DMA aliases (Jacek Lawrynowicz)
- Add DMA alias quirk for mic_x200_dma (Jacek Lawrynowicz)
Thunderbolt:
- Fix double free of drom buffer (Andreas Noever)
- Add Intel Thunderbolt device IDs (Lukas Wunner)
- Fix typos and magic number (Lukas Wunner)
- Support 1st gen Light Ridge controller (Lukas Wunner)
Generic host bridge driver:
- Use generic ECAM API (Jayachandran C)
Cavium ThunderX host bridge driver:
- Don't clobber read-only bits in bridge config registers (David Daney)
- Use generic ECAM API (Jayachandran C)
Freescale i.MX6 host bridge driver:
- Use enum instead of bool for variant indicator (Andrey Smirnov)
- Implement reset sequence for i.MX6+ (Andrey Smirnov)
- Factor out ref clock enable (Bjorn Helgaas)
- Add initial imx6sx support (Christoph Fritz)
- Add reset-gpio-active-high boolean property to DT (Petr Štetiar)
- Add DT property for link gen, default to Gen1 (Tim Harvey)
- dts: Specify imx6qp version of PCIe core (Andrey Smirnov)
- dts: Fix PCIe reset GPIO polarity on Toradex Apalis Ixora (Petr Štetiar)
Marvell Armada host bridge driver:
- add DT binding for Marvell Armada 7K/8K PCIe controller (Thomas Petazzoni)
- Add driver for Marvell Armada 7K/8K PCIe controller (Thomas Petazzoni)
Marvell MVEBU host bridge driver:
- Constify mvebu_pcie_pm_ops structure (Jisheng Zhang)
- Use SET_NOIRQ_SYSTEM_SLEEP_PM_OPS for mvebu_pcie_pm_ops (Jisheng Zhang)
Microsoft Hyper-V host bridge driver:
- Report resources release after stopping the bus (Vitaly Kuznetsov)
- Add explicit barriers to config space access (Vitaly Kuznetsov)
Renesas R-Car host bridge driver:
- Select PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)
Synopsys DesignWare host bridge driver:
- Remove incorrect RC memory base/limit configuration (Gabriele Paoloni)
- Move Root Complex setup code to dw_pcie_setup_rc() (Jisheng Zhang)
TI Keystone host bridge driver:
- Add error IRQ handler (Murali Karicheri)
- Remove unnecessary goto statement (Murali Karicheri)
Miscellaneous:
- Fix spelling errors (Colin Ian King)"
* tag 'pci-v4.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (48 commits)
PCI: Disable all BAR sizing for devices with non-compliant BARs
x86/PCI: Mark Broadwell-EP Home Agent 1 as having non-compliant BARs
PCI: Identify Enhanced Allocation (EA) BAR Equivalent resources in sysfs
PCI, of: Move PCI I/O space management to PCI core code
PCI: generic, thunder: Use generic ECAM API
PCI: Provide common functions for ECAM mapping
PCI: hv: Add explicit barriers to config space access
PCI: Use cached copy of PCI_EXP_SLTCAP_HPC bit
PCI: Add Downstream Port Containment driver
PCI: Add Downstream Port Containment portdrv service type
PCI: Widen portdrv service type from 4 bits to 8 bits
PCI: designware: Remove incorrect RC memory base/limit configuration
PCI: hv: Report resources release after stopping the bus
ARM: dts: imx6qp: Specify imx6qp version of PCIe core
PCI: imx6: Implement reset sequence for i.MX6+
PCI: imx6: Use enum instead of bool for variant indicator
PCI: thunder: Don't clobber read-only bits in bridge config registers
thunderbolt: Fix double free of drom buffer
PCI: rcar: Select PCI_MSI_IRQ_DOMAIN
PCI: armada: Add driver for Marvell Armada 7K/8K PCIe controller
...
The statistics are not really used for anything and should
be replaced by generic and per-device statistic counters.
Remove the code for now.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Use the better 'var < 0' check.
Fixes: 7aba6cb9ee ('iommu/amd: Make call-sites of get_device_id aware of its return value')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
"devid" needs to be signed for the error handling to work.
Fixes: b097d11a0f ('iommu/amd: Manage iommu_group for ACPI HID devices')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Commit 61289cb ('iommu/amd: Remove old alias handling code')
removed the old alias handling code from the AMD IOMMU
driver because this is now handled by the IOMMU core code.
But this also removed the handling of PCI aliases, which is
not handled by the core code. This caused issues with PCI
devices that have hidden PCIe-to-PCI bridges that rewrite
the request-id.
Fix this bug by re-introducing some of the removed functions
from commit 61289cbaf6 and add a alias field
'struct iommu_dev_data'. This field carrys the return value
of the get_alias() function and uses that instead of the
amd_iommu_alias_table[] array in the code.
Fixes: 61289cbaf6 ('iommu/amd: Remove old alias handling code')
Cc: stable@vger.kernel.org # v4.4+
Tested-by: Tomasz Golinski <tomaszg@math.uwb.edu.pl>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
AMD Uart DMA belongs to ACPI HID type device, and its driver
is basing on AMBA Bus, need also IOMMU support.
This patch is just to set the AMD iommu callbacks for amba bus.
Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch creates a new function for finding or creating an IOMMU
group for acpihid(ACPI Hardware ID) device.
The acpihid devices with the same devid will be put into same group and
there will have the same domain id and share the same page table.
Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Current IOMMU driver make assumption that the downstream devices are PCI.
With the newly added ACPI-HID IVHD device entry support, this is no
longer true. This patch is to add dev type check and to distinguish the
pci and acpihid device code path.
Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch is to make the call-sites of get_device_id aware of its
return value.
Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This patch introduces acpihid_map, which is used to store
the new IVHD device entry extracted from BIOS IVRS table.
It also provides a utility function add_acpi_hid_device(),
to add this types of devices to the map.
Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Detach the device that is about to be removed from its
domain (if it has one) to clear any related state like DTE
entry and device's ATS state.
Reported-by: Kelly Zytaruk <Kelly.Zytaruk@amd.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In below commit alias DTE is set when its peripheral is
setting DTE. However there's a code bug here to wrongly
set the alias DTE, correct it in this patch.
commit e25bfb56ea
Author: Joerg Roedel <jroedel@suse.de>
Date: Tue Oct 20 17:33:38 2015 +0200
iommu/amd: Set alias DTE in do_attach/do_detach
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Mark Hounschell <markh@compro.net>
Cc: stable@vger.kernel.org # v4.4
Signed-off-by: Joerg Roedel <jroedel@suse.de>
get_device_id() returns an unsigned short device id. It never fails and
it never returns a negative so we can remove this condition.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Preallocate between 4 and 8 apertures when a device gets it
dma_mask. With more apertures we reduce the lock contention
of the domain lock significantly.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Make this pointer percpu so that we start searching for new
addresses in the range we last stopped and which is has a
higher probability of being still in the cache.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This allows to build up the page-tables without holding any
locks. As a consequence it removes the need to pre-populate
dma_ops page-tables.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Don't flush the iommu tlb when we free something behind the
current next_bit pointer. Update the next_bit pointer
instead and let the flush happen on the next wraparound in
the allocation path.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The flushing of iommu tlbs is now done on a per-range basis.
So there is no need anymore for domain-wide flush tracking.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
It points to the next aperture index to allocate from. We
don't need the full address anymore because this is now
tracked in struct aperture_range.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Moving it before the pte_pages array puts in into the same
cache-line as the spin-lock and the bitmap array pointer.
This should safe a cache-miss.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
There have been present PTEs which in theory could have made
it to the IOMMU TLB. Flush the addresses out on the error
path to make sure no stale entries remain.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts. They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve". __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".
Over time, callers had a requirement to not block when fallback options
were available. Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.
This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative. High priority users continue to use
__GFP_HIGH. __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim. __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim. __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.
This patch then converts a number of sites
o __GFP_ATOMIC is used by callers that are high priority and have memory
pools for those requests. GFP_ATOMIC uses this flag.
o Callers that have a limited mempool to guarantee forward progress clear
__GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
into this category where kswapd will still be woken but atomic reserves
are not used as there is a one-entry mempool to guarantee progress.
o Callers that are checking if they are non-blocking should use the
helper gfpflags_allow_blocking() where possible. This is because
checking for __GFP_WAIT as was done historically now can trigger false
positives. Some exceptions like dm-crypt.c exist where the code intent
is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
flag manipulations.
o Callers that built their own GFP flags instead of starting with GFP_KERNEL
and friends now also need to specify __GFP_KSWAPD_RECLAIM.
The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.
The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL. They may
now wish to specify __GFP_KSWAPD_RECLAIM. It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The driver always uses a constant size for these buffers
anyway, so there is no need to waste memory to store the
sizes.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This mostly removes the code to create dev_data structures
for alias device ids. They are not necessary anymore, as
they were only created for device ids which have no struct
pci_dev associated with it. But these device ids are
handled in a simpler way now, so there is no need for this
code anymore.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With this we don't have to create dev_data entries for
non-existent devices (which only exist as request-ids).
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The alias list is handled aleady by iommu core code. No need
anymore to handle it in this part of the AMD IOMMU code
Signed-off-by: Joerg Roedel <jroedel@suse.de>
The condition in the BUG_ON is an indicator of a BUG, but no
reason to kill the code path. Turn it into a WARN_ON and
bail out if it is hit.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
During device assignment/deassignment the flags in the DTE
get lost, which might cause spurious faults, for example
when the device tries to access the system management range.
Fix this by not clearing the flags with the rest of the DTE.
Reported-by: G. Richard Bellamy <rbellamy@pteradigm.com>
Tested-by: G. Richard Bellamy <rbellamy@pteradigm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
When a device group is detached from its domain, the iommu
core code calls into the iommu driver to detach each device
individually.
Before this functionality went into the iommu core code, it
was implemented in the drivers, also in the AMD IOMMU
driver as the device alias handling code.
This code is still present, as there might be aliases that
don't exist as real PCI devices (and are therefore invisible
to the iommu core code).
Unfortunatly it might happen now, that a device is unbound
multiple times from its domain, first by the alias handling
code and then by the iommu core code (or vice verca).
This ends up in the do_detach function which dereferences
the dev_data->domain pointer. When the device is already
detached, this pointer is NULL and we get a kernel oops.
Removing the alias code completly is not an option, as that
would also remove the code which handles invisible aliases.
The code could be simplified, but this is too big of a
change outside the merge window.
For now, just check the dev_data->domain pointer in
do_detach and bail out if it is NULL.
Reported-by: Andreas Hartmann <andihartmann@freenet.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With the grouping of multi-function devices a non-ATS
capable device might also end up in the same domain as an
IOMMUv2 capable device.
So handle this situation gracefully and don't consider it a
bug anymore.
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Some AMD systems also have non-PCI devices which can do DMA.
Those can't be handled by the AMD IOMMU, as the hardware can
only handle PCI. These devices would end up with no dma_ops,
as neither the per-device nor the global dma_ops will get
set. SWIOTLB provides global dma_ops when it is active, so
make sure there are global dma_ops too when swiotlb is
disabled.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
In passthrough mode (iommu=pt) all devices are identity
mapped. If a device does not support 64bit DMA it might
still need remapping. Make sure swiotlb is initialized to
provide this remapping.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Since devices with IOMMUv2 functionality might be in the
same group as devices without it, allow those devices in
IOMMUv2 domains too.
Otherwise attaching the group with the IOMMUv2 device to the
domain will fail.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Remove the AMD IOMMU driver implementation for passthrough
mode and rely on the new iommu core features for that.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Four fixes have queued up to fix regressions introduced after v4.1:
* Don't fail IOMMU driver initialization when the add_device
call-back returns -ENODEV, as that just means that the device
is not translated by the IOMMU. This is pretty common on ARM.
* Two fixes for the ARM-SMMU driver for a wrong feature check
and to remove a redundant NULL check.
* A fix for the AMD IOMMU driver to fix a boot panic on systems
where the BIOS requests Unity Mappings in the IVRS table.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJVlBZTAAoJECvwRC2XARrj6ngQAI/fjz1cW4WYVDBGPffoHhtB
9BH72XQTyt2OaQPsiECnWkl4bAoJ0TmS6dvcYTf75znqClL1Eez/TqfATEPOSFwI
7N0qkkVc3OffvF3XnxksNNV4tLaojdIFNdxAVrrOOuWDeNKC4Rkvcx+Typ9Y7CxI
YR4+qdkPqjYVn13JVMvZDr6SLAnvfHPSIcW1CP3vQzH6w4mWJSmRMLd42Xel1Kb7
hvEDqlT6k6KJxBt3W601eo3sgqZ1AJTFiY4RFh0diHbHQlgg1PcsbWsL5QJMHozi
SSHFDCxag9NgHy97OTcGuDptD9F9fI4+t1ANtWULis7+sN5Bx5/xsG/VRJ9fpiMN
RNlcCMCufC89EHXdoPuAvOcoPmUHqv1CU9I6+DpOo9FQrGMoXDrdosApNaJZ73E/
qtgzJN0hueeBOvB7Hk+U+mI4BSzAtGguHoO+LzjrZBzoW5L9WWuznmHYriLE0bMm
uKnZFBEnXFe8DugQ3ta7PkyzIWsnD0O++NRueN9pSOLvOUpNk6Iddv4hER9QwwPA
RQOfsASEo1ResAd9SJGnPX1MQxXxl4OB/9R1Q648lQguAj7WhV1nn21cISgLjESC
nEKma+A7dGT6nOTm/wK+wokAgndOGlztMU9wJBK12ozxrhbO+0VP/oTjhhmvcWGb
DbpzhyeCpi1qLmsZe0x7
=NzGR
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pul IOMMU fixes from Joerg Roedel:
"Four fixes have queued up to fix regressions introduced after v4.1:
- Don't fail IOMMU driver initialization when the add_device
call-back returns -ENODEV, as that just means that the device is
not translated by the IOMMU. This is pretty common on ARM.
- Two fixes for the ARM-SMMU driver for a wrong feature check and to
remove a redundant NULL check.
- A fix for the AMD IOMMU driver to fix a boot panic on systems where
the BIOS requests Unity Mappings in the IVRS table"
* tag 'iommu-fixes-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Introduce protection_domain_init() function
iommu/arm-smmu: Delete an unnecessary check before the function call "free_io_pgtable_ops"
iommu/arm-smmu: Fix broken ATOS check
iommu: Ignore -ENODEV errors from add_device call-back
This function contains the common parts between the
initialization of dma_ops_domains and usual protection
domains. This also fixes a long-standing bug which was
uncovered by recent changes, in which the api_lock was not
initialized for dma_ops_domains.
Reported-by: George Wang <xuw2015@gmail.com>
Tested-by: George Wang <xuw2015@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This time with bigger changes than usual:
* A new IOMMU driver for the ARM SMMUv3. This IOMMU is pretty
different from SMMUv1 and v2 in that it is configured through
in-memory structures and not through the MMIO register region.
The ARM SMMUv3 also supports IO demand paging for PCI devices
with PRI/PASID capabilities, but this is not implemented in
the driver yet.
* Lots of cleanups and device-tree support for the Exynos IOMMU
driver. This is part of the effort to bring Exynos DRM support
upstream.
* Introduction of default domains into the IOMMU core code. The
rationale behind this is to move functionalily out of the
IOMMU drivers to common code to get to a unified behavior
between different drivers.
The patches here introduce a default domain for iommu-groups
(isolation groups). A device will now always be attached to a
domain, either the default domain or another domain handled by
the device driver. The IOMMU drivers have to be modified to
make use of that feature. So long the AMD IOMMU driver is
converted, with others to follow.
* Patches for the Intel VT-d drvier to fix DMAR faults that
happen when a kdump kernel boots. When the kdump kernel boots
it re-initializes the IOMMU hardware, which destroys all
mappings from the crashed kernel. As this happens before
the endpoint devices are re-initialized, any in-flight DMA
causes a DMAR fault. These faults cause PCI master aborts,
which some devices can't handle properly and go into an
undefined state, so that the device driver in the kdump kernel
fails to initialize them and the dump fails.
This is now fixed by copying over the mapping structures (only
context tables and interrupt remapping tables) from the old
kernel and keep the old mappings in place until the device
driver of the new kernel takes over. This emulates the the
behavior without an IOMMU to the best degree possible.
* A couple of other small fixes and cleanups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJViSIWAAoJECvwRC2XARrjl+cP/2FXS7SWDq91VFiIZfXfPt8H
C5Ef3OGWCnMzn4MKE1ExkyDhC+AH6pF1s4zi3XfT6b8iOA+DUpa51rxJjixszt31
tQwmvB7hWu4mznGxSN7EA0Pm0l/v3tBAY5BvG598af0aNZFFJ6po+31MyQA5X67+
6xpqLbH/hm4IZhFBOEzZwxuWWsNxlJwwzKqeAjGyqeUhdruRYZiPHWQ17sDjwLM/
QcVvWBb7meOtKv1OCtpzC4sglSk3scbAfEHMEBuDt8cI6OD7/t2VzPXDWWZuXGqK
nRAxCT7NrXvyOnv0xwdn0j5p1FUGipVxvhsGWX7sJsh3UHWm8Q+5rRKFFVI9pm50
QcMjiIMazK5VwcAkDnLoDgSz4Zz6TfHXEOqSJ2vjTPt2VDP/J9zdM2iwHx2ujicI
mIkrtmsBprvAPx6e9jcqiS5L/Xy1y1xewXuGxa5F2XOjqdoXkPqaupjlyrWzrChA
MC8w67FdzjHDPCfIqfIWZpJQj4f1OFQGd3HS5HpkBACxIwCg85gRw4DEMfD/sirO
BL2VM0RO/bB5+4R0AY7UA2VszQvNMqedj1bA4vAbrnXqOh8BI/0GgeoWiBMXhyX1
qvT1jl+cxuCm5tgBOMUGYoRyF+//bH+l78jLsTYaWRtuVzFlkAX6idNvYYK0dmNt
tLII2IIZBk87P3pF4d6A
=Zicw
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU updates from Joerg Roedel:
"This time with bigger changes than usual:
- A new IOMMU driver for the ARM SMMUv3.
This IOMMU is pretty different from SMMUv1 and v2 in that it is
configured through in-memory structures and not through the MMIO
register region. The ARM SMMUv3 also supports IO demand paging for
PCI devices with PRI/PASID capabilities, but this is not
implemented in the driver yet.
- Lots of cleanups and device-tree support for the Exynos IOMMU
driver. This is part of the effort to bring Exynos DRM support
upstream.
- Introduction of default domains into the IOMMU core code.
The rationale behind this is to move functionalily out of the IOMMU
drivers to common code to get to a unified behavior between
different drivers. The patches here introduce a default domain for
iommu-groups (isolation groups).
A device will now always be attached to a domain, either the
default domain or another domain handled by the device driver. The
IOMMU drivers have to be modified to make use of that feature. So
long the AMD IOMMU driver is converted, with others to follow.
- Patches for the Intel VT-d drvier to fix DMAR faults that happen
when a kdump kernel boots.
When the kdump kernel boots it re-initializes the IOMMU hardware,
which destroys all mappings from the crashed kernel. As this
happens before the endpoint devices are re-initialized, any
in-flight DMA causes a DMAR fault. These faults cause PCI master
aborts, which some devices can't handle properly and go into an
undefined state, so that the device driver in the kdump kernel
fails to initialize them and the dump fails.
This is now fixed by copying over the mapping structures (only
context tables and interrupt remapping tables) from the old kernel
and keep the old mappings in place until the device driver of the
new kernel takes over. This emulates the the behavior without an
IOMMU to the best degree possible.
- A couple of other small fixes and cleanups"
* tag 'iommu-updates-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (69 commits)
iommu/amd: Handle large pages correctly in free_pagetable
iommu/vt-d: Don't disable IR when it was previously enabled
iommu/vt-d: Make sure copied over IR entries are not reused
iommu/vt-d: Copy IR table from old kernel when in kdump mode
iommu/vt-d: Set IRTA in intel_setup_irq_remapping
iommu/vt-d: Disable IRQ remapping in intel_prepare_irq_remapping
iommu/vt-d: Move QI initializationt to intel_setup_irq_remapping
iommu/vt-d: Move EIM detection to intel_prepare_irq_remapping
iommu/vt-d: Enable Translation only if it was previously disabled
iommu/vt-d: Don't disable translation prior to OS handover
iommu/vt-d: Don't copy translation tables if RTT bit needs to be changed
iommu/vt-d: Don't do early domain assignment if kdump kernel
iommu/vt-d: Allocate si_domain in init_dmars()
iommu/vt-d: Mark copied context entries
iommu/vt-d: Do not re-use domain-ids from the old kernel
iommu/vt-d: Copy translation tables from old kernel
iommu/vt-d: Detect pre enabled translation
iommu/vt-d: Make root entry visible for hardware right after allocation
iommu/vt-d: Init QI before root entry is allocated
iommu/vt-d: Cleanup log messages
...
Pull x86 core updates from Ingo Molnar:
"There were so many changes in the x86/asm, x86/apic and x86/mm topics
in this cycle that the topical separation of -tip broke down somewhat -
so the result is a more traditional architecture pull request,
collected into the 'x86/core' topic.
The topics were still maintained separately as far as possible, so
bisectability and conceptual separation should still be pretty good -
but there were a handful of merge points to avoid excessive
dependencies (and conflicts) that would have been poorly tested in the
end.
The next cycle will hopefully be much more quiet (or at least will
have fewer dependencies).
The main changes in this cycle were:
* x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas
Gleixner)
- This is the second and most intrusive part of changes to the x86
interrupt handling - full conversion to hierarchical interrupt
domains:
[IOAPIC domain] -----
|
[MSI domain] --------[Remapping domain] ----- [ Vector domain ]
| (optional) |
[HPET MSI domain] ----- |
|
[DMAR domain] -----------------------------
|
[Legacy domain] -----------------------------
This now reflects the actual hardware and allowed us to distangle
the domain specific code from the underlying parent domain, which
can be optional in the case of interrupt remapping. It's a clear
separation of functionality and removes quite some duct tape
constructs which plugged the remap code between ioapic/msi/hpet
and the vector management.
- Intel IOMMU IRQ remapping enhancements, to allow direct interrupt
injection into guests (Feng Wu)
* x86/asm changes:
- Tons of cleanups and small speedups, micro-optimizations. This
is in preparation to move a good chunk of the low level entry
code from assembly to C code (Denys Vlasenko, Andy Lutomirski,
Brian Gerst)
- Moved all system entry related code to a new home under
arch/x86/entry/ (Ingo Molnar)
- Removal of the fragile and ugly CFI dwarf debuginfo annotations.
Conversion to C will reintroduce many of them - but meanwhile
they are only getting in the way, and the upstream kernel does
not rely on them (Ingo Molnar)
- NOP handling refinements. (Borislav Petkov)
* x86/mm changes:
- Big PAT and MTRR rework: making the code more robust and
preparing to phase out exposing direct MTRR interfaces to drivers -
in favor of using PAT driven interfaces (Toshi Kani, Luis R
Rodriguez, Borislav Petkov)
- New ioremap_wt()/set_memory_wt() interfaces to support
Write-Through cached memory mappings. This is especially
important for good performance on NVDIMM hardware (Toshi Kani)
* x86/ras changes:
- Add support for deferred errors on AMD (Aravind Gopalakrishnan)
This is an important RAS feature which adds hardware support for
poisoned data. That means roughly that the hardware marks data
which it has detected as corrupted but wasn't able to correct, as
poisoned data and raises an APIC interrupt to signal that in the
form of a deferred error. It is the OS's responsibility then to
take proper recovery action and thus prolonge system lifetime as
far as possible.
- Add support for Intel "Local MCE"s: upcoming CPUs will support
CPU-local MCE interrupts, as opposed to the traditional system-
wide broadcasted MCE interrupts (Ashok Raj)
- Misc cleanups (Borislav Petkov)
* x86/platform changes:
- Intel Atom SoC updates
... and lots of other cleanups, fixlets and other changes - see the
shortlog and the Git log for details"
* 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits)
x86/hpet: Use proper hpet device number for MSI allocation
x86/hpet: Check for irq==0 when allocating hpet MSI interrupts
x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled
x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled
x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail
genirq: Prevent crash in irq_move_irq()
genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain
iommu, x86: Properly handle posted interrupts for IOMMU hotplug
iommu, x86: Provide irq_remapping_cap() interface
iommu, x86: Setup Posted-Interrupts capability for Intel iommu
iommu, x86: Add cap_pi_support() to detect VT-d PI capability
iommu, x86: Avoid migrating VT-d posted interrupts
iommu, x86: Save the mode (posted or remapped) of an IRTE
iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip
iommu: dmar: Provide helper to copy shared irte fields
iommu: dmar: Extend struct irte for VT-d Posted-Interrupts
iommu: Add new member capability to struct irq_remap_ops
x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code
x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation
x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry()
...
Make sure that we are skipping over large PTEs while walking
the page-table tree.
Cc: stable@kernel.org
Fixes: 5c34c403b7 ("iommu/amd: Fix memory leak in free_pagetable")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Without this patch only -ENOTSUPP is handled, but there are
other possible errors. Handle them too.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
With device intialization done in the add_device call-back
now there is no reason for this function anymore.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
A device that might be used for HSA needs to be in a direct
mapped domain so that all DMA-API mappings stay alive when
the IOMMUv2 stack is used.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This enables allocation of DMA-API default domains from the
IOMMU core and switches allocation of domain dma-api domain
to the IOMMU core too.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Implement these two iommu-ops call-backs to make use of the
initialization and notifier features of the iommu core.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
This reverts commit 5fc872c732.
The DMA-API does not strictly require that the memory
returned by dma_alloc_coherent is zeroed out. For that
another function (dma_zalloc_coherent) should be used. But
all other x86 DMA-API implementation I checked zero out the
memory, so that some drivers rely on it and break when it is
not.
It seems the (driver-)world is not yet ready for this
change, so revert it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>