Commit Graph

1789 Commits

Author SHA1 Message Date
shameer
99caf177f6 iommu/arm-smmu-v3: Enable ACPI based HiSilicon CMD_PREFETCH quirk(erratum 161010701)
HiSilicon SMMUv3 on Hip06/Hip07 platforms doesn't support CMD_PREFETCH
command. The dt based support for this quirk is already present in the
driver(hisilicon,broken-prefetch-cmd). This adds ACPI support for the
quirk using the IORT smmu model number.

Signed-off-by: shameer <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: hanjun <guohanjun@huawei.com>
[will: rewrote patch]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:04 +01:00
Linu Cherian
e5b829de05 iommu/arm-smmu-v3: Add workaround for Cavium ThunderX2 erratum #74
Cavium ThunderX2 SMMU implementation doesn't support page 1 register space
and PAGE0_REGS_ONLY option is enabled as an errata workaround.
This option when turned on, replaces all page 1 offsets used for
EVTQ_PROD/CONS, PRIQ_PROD/CONS register access with page 0 offsets.

SMMU resource size checks are now based on SMMU option PAGE0_REGS_ONLY,
since resource size can be either 64k/128k.
For this, arm_smmu_device_dt_probe/acpi_probe has been moved before
platform_get_resource call, so that SMMU options are set beforehand.

Signed-off-by: Linu Cherian <linu.cherian@cavium.com>
Signed-off-by: Geetha Sowjanya <geethasowjanya.akula@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:03 +01:00
Robert Richter
12275bf0a4 iommu/arm-smmu-v3, acpi: Add temporary Cavium SMMU-V3 IORT model number definitions
The model number is already defined in acpica and we are actually
waiting for the acpi maintainers to include it:

 https://github.com/acpica/acpica/commit/d00a4eb86e64

Adding those temporary definitions until the change makes it into
include/acpi/actbl2.h. Once that is done this patch can be reverted.

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:03 +01:00
Will Deacon
77f3445866 iommu/io-pgtable-arm: Use dma_wmb() instead of wmb() when publishing table
When writing a new table entry, we must ensure that the contents of the
table is made visible to the SMMU page table walker before the updated
table entry itself.

This is currently achieved using wmb(), which expands to an expensive and
unnecessary DSB instruction. Ideally, we'd just use cmpxchg64_release when
writing the table entry, but this doesn't have memory ordering semantics
on !SMP systems.

Instead, use dma_wmb(), which emits DMB OSHST. Strictly speaking, this
does more than we require (since it targets the outer-shareable domain),
but it's likely to be significantly faster than the DSB approach.

Reported-by: Linu Cherian <linu.cherian@cavium.com>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:02 +01:00
Will Deacon
c1004803b4 iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE
The LPAE/ARMv8 page table format relies on the ability to read and write
64-bit page table entries in an atomic fashion. With the move to a lockless
implementation, we also need support for cmpxchg64 to resolve races when
installing table entries concurrently.

Unfortunately, not all architectures support cmpxchg64, so the code can
fail to compiler when building for these architectures using COMPILE_TEST.
Rather than disable COMPILE_TEST altogether, instead check that
GENERIC_ATOMIC64 is not selected, which is a reasonable indication that
the architecture has support for 64-bit cmpxchg.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:02 +01:00
Robin Murphy
58188afeb7 iommu/arm-smmu-v3: Remove io-pgtable spinlock
As for SMMUv2, take advantage of io-pgtable's newfound tolerance for
concurrency. Unfortunately in this case the command queue lock remains a
point of serialisation for the unmap path, but there may be a little
more we can do to ameliorate that in future.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:01 +01:00
Robin Murphy
523d7423e2 iommu/arm-smmu: Remove io-pgtable spinlock
With the io-pgtable code now robust against (valid) races, we no longer
need to serialise all operations with a lock. This might make broken
callers who issue concurrent operations on overlapping addresses go even
more wrong than before, but hey, they already had little hope of useful
or deterministic results.

We do however still have to keep a lock around to serialise the ATS1*
translation ops, as parallel iova_to_phys() calls could lead to
unpredictable hardware behaviour otherwise.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:01 +01:00
Robin Murphy
119ff305b0 iommu/io-pgtable-arm-v7s: Support lockless operation
Mirroring the LPAE implementation, rework the v7s code to be robust
against concurrent operations. The same two potential races exist, and
are solved in the same manner, with the fixed 2-level structure making
life ever so slightly simpler.

What complicates matters compared to LPAE, however, is large page
entries, since we can't update a block of 16 PTEs atomically, nor assume
available software bits to do clever things with. As most users are
never likely to do partial unmaps anyway (due to DMA API rules), it
doesn't seem unreasonable for this case to remain behind a serialising
lock; we just pull said lock down into the bowels of the implementation
so it's well out of the way of the normal call paths.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:00 +01:00
Robin Murphy
2c3d273eab iommu/io-pgtable-arm: Support lockless operation
For parallel I/O with multiple concurrent threads servicing the same
device (or devices, if several share a domain), serialising page table
updates becomes a massive bottleneck. On reflection, though, we don't
strictly need to do that - for valid IOMMU API usage, there are in fact
only two races that we need to guard against: multiple map requests for
different blocks within the same region, when the intermediate-level
table for that region does not yet exist; and multiple unmaps of
different parts of the same block entry. Both of those are fairly easily
solved by using a cmpxchg to install the new table, such that if we then
find that someone else's table got there first, we can simply free ours
and continue.

Make the requisite changes such that we can withstand being called
without the caller maintaining a lock. In theory, this opens up a few
corners in which wildly misbehaving callers making nonsensical
overlapping requests might lead to crashes instead of just unpredictable
results, but correct code really does not deserve to pay a significant
performance cost for the sake of masking bugs in theoretical broken code.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:00 +01:00
Robin Murphy
81b3c25218 iommu/io-pgtable: Introduce explicit coherency
Once we remove the serialising spinlock, a potential race opens up for
non-coherent IOMMUs whereby a caller of .map() can be sure that cache
maintenance has been performed on their new PTE, but will have no
guarantee that such maintenance for table entries above it has actually
completed (e.g. if another CPU took an interrupt immediately after
writing the table entry, but before initiating the DMA sync).

Handling this race safely will add some potentially non-trivial overhead
to installing a table entry, which we would much rather avoid on
coherent systems where it will be unnecessary, and where we are stirivng
to minimise latency by removing the locking in the first place.

To that end, let's introduce an explicit notion of cache-coherency to
io-pgtable, such that we will be able to avoid penalising IOMMUs which
know enough to know when they are coherent.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:58:00 +01:00
Robin Murphy
b9f1ef30ac iommu/io-pgtable-arm-v7s: Refactor split_blk_unmap
Whilst the short-descriptor format's split_blk_unmap implementation has
no need to be recursive, it followed the pattern of the LPAE version
anyway for the sake of consistency. With the latter now reworked for
both efficiency and future scalability improvements, tweak the former
similarly, not least to make it less obtuse.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:59 +01:00
Robin Murphy
fb3a95795d iommu/io-pgtable-arm: Improve split_blk_unmap
The current split_blk_unmap implementation suffers from some inscrutable
pointer trickery for creating the tables to replace the block entry, but
more than that it also suffers from hideous inefficiency. For example,
the most pathological case of unmapping a level 3 page from a level 1
block will allocate 513 lower-level tables to remap the entire block at
page granularity, when only 2 are actually needed (the rest can be
covered by level 2 block entries).

Also, we would like to be able to relax the spinlock requirement in
future, for which the roll-back-and-try-again logic for race resolution
would be pretty hideous under the current paradigm.

Both issues can be resolved most neatly by turning things sideways:
instead of repeatedly recursing into __arm_lpae_map() map to build up an
entire new sub-table depth-first, we can directly replace the block
entry with a next-level table of block/page entries, then repeat by
unmapping at the next level if necessary. With a little refactoring of
some helper functions, the code ends up not much bigger than before, but
considerably easier to follow and to adapt in future.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:59 +01:00
Robin Murphy
9db829d281 iommu/io-pgtable-arm-v7s: Check table PTEs more precisely
Whilst we don't support the PXN bit at all, so should never encounter a
level 1 section or supersection PTE with it set, it would still be wise
to check both table type bits to resolve any theoretical ambiguity.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:58 +01:00
Arvind Yadav
5c2d021829 iommu: arm-smmu: Handle return of iommu_device_register.
iommu_device_register returns an error code and, although it currently
never fails, we should check its return value anyway.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
[will: adjusted to follow arm-smmu.c]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:58 +01:00
Arvind Yadav
ebdd13c93f iommu: arm-smmu-v3: make of_device_ids const
of_device_ids are not supposed to change at runtime. All functions
working with of_device_ids provided by <linux/of.h> work with const
of_device_ids. So mark the non-const structs as const.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:57 +01:00
Robin Murphy
84c24379a7 iommu/arm-smmu: Plumb in new ACPI identifiers
Revision C of IORT now allows us to identify ARM MMU-401 and the Cavium
ThunderX implementation. Wire them up so that we can probe these models
once firmware starts using the new codes in place of generic ones, and
so that the appropriate features and quirks get enabled when we do.

For the sake of backports and mitigating sychronisation problems with
the ACPICA headers, we'll carry a backup copy of the new definitions
locally for the short term to make life simpler.

CC: stable@vger.kernel.org # 4.10
Acked-by: Robert Richter <rrichter@cavium.com>
Tested-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:57 +01:00
Arvind Yadav
60ab7a75c8 iommu/io-pgtable-arm-v7s: constify dummy_tlb_ops.
File size before:
   text	   data	    bss	    dec	    hex	filename
   6146	     56	      9	   6211	   1843	drivers/iommu/io-pgtable-arm-v7s.o

File size After adding 'const':
   text	   data	    bss	    dec	    hex	filename
   6170	     24	      9	   6203	   183b	drivers/iommu/io-pgtable-arm-v7s.o

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:57 +01:00
Sunil Goutham
b847de4e50 iommu/arm-smmu-v3: Increase CMDQ drain timeout value
Waiting for a CMD_SYNC to be processed involves waiting for the command
queue to drain, which can take an awful lot longer than waiting for a
single entry to become available. Consequently, the common timeout value
of 100us has been observed to be too short on some platforms when a
CMD_SYNC is issued into a queued full of TLBI commands.

This patch resolves the issue by using a different (1s) timeout when
waiting for the CMDQ to drain and using a simple back-off mechanism
when polling the cons pointer in the absence of WFE support.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
[will: rewrote commit message and cosmetic changes]
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-06-23 17:57:56 +01:00
Linus Torvalds
28b47809b2 IOMMU Updates for Linux v4.12
This includes:
 
 	* Some code optimizations for the Intel VT-d driver
 
 	* Code to switch off a previously enabled Intel IOMMU
 
 	* Support for 'struct iommu_device' for OMAP, Rockchip and
 	  Mediatek IOMMUs
 
 	* Some header optimizations for IOMMU core code headers and a
 	  few fixes that became necessary in other parts of the kernel
 	  because of that
 
 	* ACPI/IORT updates and fixes
 
 	* Some Exynos IOMMU optimizations
 
 	* Code updates for the IOMMU dma-api code to bring it closer to
 	  use per-cpu iova caches
 
 	* New command-line option to set default domain type allocated
 	  by the iommu core code
 
 	* Another command line option to allow the Intel IOMMU switched
 	  off in a tboot environment
 
 	* ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using
 	  an IDENTITY domain in conjunction with DMA ops, Support for
 	  SMR masking, Support for 16-bit ASIDs (was previously broken)
 
 	* Various other small fixes and improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZEY4XAAoJECvwRC2XARrjth0QAKV56zjnFclv39aDo6eCq9CT
 51+XT4bPY5VKQ2+Jx76TBNObHmGK+8KEMHfT9khpWJtFCDyy25SGckLry1nYqmZs
 tSTsbj4sOeCyKzOLITlRN9/OzKXkjKAxYuq+sQZZFDFYf3kCM/eag0dGAU6aVLNp
 tkIal3CSpGjCQ9M5JohrtQ1mwiGqCIkMIgvnBjRw+bfpLnQNG+VL6VU2G3RAkV2b
 5Vbdoy+P7ZQnJSZr/bibYL2BaQs2diR4gOppT5YbsfniMq4QYSjheu1xBboGX8b7
 sx8yuPi4370irSan0BDvlvdQdjBKIRiDjfGEKDhRwPhtvN6JREGakhEOC8MySQ37
 mP96B72Lmd+a7DEl5udOL7tQILA0DcUCX0aOyF714khnZuFU5tVlCotb/36xeJ+T
 FPc3RbEVQ90m8dYU6MNJ+ahtb/ZapxGTRfisIigB6wlnZa0Evabp9EJSce6oJMkm
 whbBhDubeEU18n9XAaofMbu+P2LAzq8cxiRMlsDvT4mIy7jO86jjCmhpu1Tfn2GY
 4wrEQZdWOMvhUsIhObXA0aC3BzC506uvnKPW3qy041RaxBuelWiBi29qzYbhxzkr
 DLDpWbUZNYPyFJjttpavyQb2/XRduBTJdVP1pQpkJNDsW5jLiBkpSqm9xNADapRY
 vLSYRX0JCIquaD+PAuxn
 =3aE8
 -----END PGP SIGNATURE-----

Merge tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU updates from Joerg Roedel:

 - code optimizations for the Intel VT-d driver

 - ability to switch off a previously enabled Intel IOMMU

 - support for 'struct iommu_device' for OMAP, Rockchip and Mediatek
   IOMMUs

 - header optimizations for IOMMU core code headers and a few fixes that
   became necessary in other parts of the kernel because of that

 - ACPI/IORT updates and fixes

 - Exynos IOMMU optimizations

 - updates for the IOMMU dma-api code to bring it closer to use per-cpu
   iova caches

 - new command-line option to set default domain type allocated by the
   iommu core code

 - another command line option to allow the Intel IOMMU switched off in
   a tboot environment

 - ARM/SMMU: TLB sync optimisations for SMMUv2, Support for using an
   IDENTITY domain in conjunction with DMA ops, Support for SMR masking,
   Support for 16-bit ASIDs (was previously broken)

 - various other small fixes and improvements

* tag 'iommu-updates-v4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (63 commits)
  soc/qbman: Move dma-mapping.h include to qman_priv.h
  soc/qbman: Fix implicit header dependency now causing build fails
  iommu: Remove trace-events include from iommu.h
  iommu: Remove pci.h include from trace/events/iommu.h
  arm: dma-mapping: Don't override dma_ops in arch_setup_dma_ops()
  ACPI/IORT: Fix CONFIG_IOMMU_API dependency
  iommu/vt-d: Don't print the failure message when booting non-kdump kernel
  iommu: Move report_iommu_fault() to iommu.c
  iommu: Include device.h in iommu.h
  x86, iommu/vt-d: Add an option to disable Intel IOMMU force on
  iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed
  iommu/arm-smmu: Correct sid to mask
  iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
  iommu: Make iommu_bus_notifier return NOTIFY_DONE rather than error code
  omap3isp: Remove iommu_group related code
  iommu/omap: Add iommu-group support
  iommu/omap: Make use of 'struct iommu_device'
  iommu/omap: Store iommu_dev pointer in arch_data
  iommu/omap: Move data structures to omap-iommu.h
  iommu/omap: Drop legacy-style device support
  ...
2017-05-09 15:15:47 -07:00
Linus Torvalds
1062ae4982 extra pull request because I missed tegra.
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZC9nKAAoJEAx081l5xIa+LqoP+wYuQDS4q0XFN232F/2Zpa/9
 hkXN6HoKfme68AWlPtcM2Srt6Ydk7exUNl4SVxCFWwUI3z7Rt4uv3eZxnMESZ5sM
 4NT3rtWgkrzyXD4VNwTDHeqzAyqiajeMuSwlCZBwslH81UZydmYcc8+4JHYy3aaK
 BYklPxYm629XNQ7+6S0r9M1XDpEH8gcml9zd50X9jpxOtbr2Q70EFa+rOZaJncxe
 FxoqrriuaJiMNlDlOovO0P5g7+y4DFUJqCgtipgCUYNWtipUUjujiRtjjFYGH0eQ
 Yixsvz8xlHdp0swpCM6jfdwLDqdXYL+AV9lRuIraFkl72kNHZeP/E3XidV+NOTD0
 r5CYjROlf7g0K/Wob8UhQigLwm3D78tg2LzHMt3fIDY6vG70g1oNVt51D3pzMdZP
 d3pypFFhZMeAS+/czF3Fg99lw6DB9jNx6H6Vq3oKc2qZtjcL3jc/rHB/7eHNo7VK
 tWMQshPhQvYi/bG5OUq/YHzz4wBf9UK12d3U2had0EDon8Gzd5Sd0oaHtyApolsv
 iMdlWCho/2eRsWTPelDWvVbKInC33gg/Lr8lr/l4tosWVmRKJl10a8RrW27v+lVu
 Kqu1K70OhPeHzJTo1gu8wVjH8JHw20gkpO0BTemsb2f9ViZxl6/Bz/8TGNq30Lxr
 hRzJxP4ZOmrXffR4hktf
 =DQ0P
 -----END PGP SIGNATURE-----

Merge tag 'drm-forgot-about-tegra-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux

Pull drm tegra updates from Dave Airlie:
 "I missed a pull request from Thierry, this stuff has been in
  linux-next for a while anyways.

  It does contain a branch from the iommu tree, but Thierry said it
  should be fine"

* tag 'drm-forgot-about-tegra-for-v4.12-rc1' of git://people.freedesktop.org/~airlied/linux:
  gpu: host1x: Fix host1x driver shutdown
  gpu: host1x: Support module reset
  gpu: host1x: Sort includes alphabetically
  drm/tegra: Add VIC support
  dt-bindings: Add bindings for the Tegra VIC
  drm/tegra: Add falcon helper library
  drm/tegra: Add Tegra DRM allocation API
  drm/tegra: Add tiling FB modifiers
  drm/tegra: Don't leak kernel pointer to userspace
  drm/tegra: Protect IOMMU operations by mutex
  drm/tegra: Enable IOVA API when IOMMU support is enabled
  gpu: host1x: Add IOMMU support
  gpu: host1x: Fix potential out-of-bounds access
  iommu/iova: Fix compile error with CONFIG_IOMMU_IOVA=m
  iommu: Add dummy implementations for !IOMMU_IOVA
  MAINTAINERS: Add related headers to IOMMU section
  iommu/iova: Consolidate code for adding new node to iovad domain rbtree
2017-05-05 17:18:44 -07:00
Dave Airlie
644b4930bf drm/tegra: Changes for v4.12-rc1
This contains various fixes to the host1x driver as well as a plug for a
 leak of kernel pointers to userspace.
 
 A fairly big addition this time around is the Video Image Composer (VIC)
 support that can be used to accelerate some 2D and image compositing
 operations.
 
 Furthermore the driver now supports FB modifiers, so we no longer rely
 on a custom IOCTL to set those.
 
 Finally this contains a few preparatory patches for Tegra186 support
 which unfortunately didn't quite make it this time, but will hopefully
 be ready for v4.13.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEiOrDCAFJzPfAjcif3SOs138+s6EFAljmxI8THHRyZWRpbmdA
 bnZpZGlhLmNvbQAKCRDdI6zXfz6zocZaD/9AaoXn19oEPYiWx+MjIAkhBiYEw+oq
 RrJ1uVisZIx8Y+ftVtwURLh3HS7SXg3F3mjJxfJVxqqlGfTCry0eiru1qbAbkwC2
 ebStjdBBnAhMws22Dba5R5aHh21b1px/fj9u+jtiqcKgWzB28D+jH6G4ViTzOgXs
 TT1inq5SQfpL3e0+ovmVEh7/URdWf5tPQVWVuvOewTdVA2NYRXpAAFKhGWp4vD28
 brvGPzYgEzBF1Q8p3f9kczbtwqChZuwnwVeP/9EP+U+gApIlmFC8XTISAuBYHWfT
 baJQJ/V5wjM8m3FmsX4YL8VKplhw2/JMINfD37hzjVlqskN5V80KfNlKTe8yAK96
 sP0HDg72zlZ0HHOmOESuxyNyLRYkDpuVNKh4my24u77y/VcpihwMRPc8chnpG76h
 jh+7qlZZM8XNnVnFJN5HWLiRTUjZ9y3loJ//Wu97JkOr7SuHQ7qzhPDZV/jZepIG
 XkSD1dac7ii2huA6zxSFZzylwBvLHyrdj4s7imNckkNSbVNpm204Bzs0Rr2ju0bv
 YBe2+yGkurE8ePfONeaAkpFxHkAkdC0b2SlbZf7MgSaant2s/CV9DGgj03ZipC1g
 n2juRlDGLw+B1MOU+dWv/cMZmrO3Xg36E0gYgqskYRunqTdVT+Vx70D0rygo7hDd
 G7qvllT6oqKJ1g==
 =Eqtp
 -----END PGP SIGNATURE-----

Merge tag 'drm/tegra/for-4.12-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next

drm/tegra: Changes for v4.12-rc1

This contains various fixes to the host1x driver as well as a plug for a
leak of kernel pointers to userspace.

A fairly big addition this time around is the Video Image Composer (VIC)
support that can be used to accelerate some 2D and image compositing
operations.

Furthermore the driver now supports FB modifiers, so we no longer rely
on a custom IOCTL to set those.

Finally this contains a few preparatory patches for Tegra186 support
which unfortunately didn't quite make it this time, but will hopefully
be ready for v4.13.

* tag 'drm/tegra/for-4.12-rc1' of git://anongit.freedesktop.org/tegra/linux:
  gpu: host1x: Fix host1x driver shutdown
  gpu: host1x: Support module reset
  gpu: host1x: Sort includes alphabetically
  drm/tegra: Add VIC support
  dt-bindings: Add bindings for the Tegra VIC
  drm/tegra: Add falcon helper library
  drm/tegra: Add Tegra DRM allocation API
  drm/tegra: Add tiling FB modifiers
  drm/tegra: Don't leak kernel pointer to userspace
  drm/tegra: Protect IOMMU operations by mutex
  drm/tegra: Enable IOVA API when IOMMU support is enabled
  gpu: host1x: Add IOMMU support
  gpu: host1x: Fix potential out-of-bounds access
  iommu/iova: Fix compile error with CONFIG_IOMMU_IOVA=m
  iommu: Add dummy implementations for !IOMMU_IOVA
  MAINTAINERS: Add related headers to IOMMU section
  iommu/iova: Consolidate code for adding new node to iovad domain rbtree
2017-05-05 11:47:01 +10:00
Joerg Roedel
2c0248d688 Merge branches 'arm/exynos', 'arm/omap', 'arm/rockchip', 'arm/mediatek', 'arm/smmu', 'arm/core', 'x86/vt-d', 'x86/amd' and 'core' into next 2017-05-04 18:06:17 +02:00
Linus Torvalds
b68e7e952f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:

 - three merges for KVM/s390 with changes for vfio-ccw and cpacf. The
   patches are included in the KVM tree as well, let git sort it out.

 - add the new 'trng' random number generator

 - provide the secure key verification API for the pkey interface

 - introduce the z13 cpu counters to perf

 - add a new system call to set up the guarded storage facility

 - simplify TASK_SIZE and arch_get_unmapped_area

 - export the raw STSI data related to CPU topology to user space

 - ... and the usual churn of bug-fixes and cleanups.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (74 commits)
  s390/crypt: use the correct module alias for paes_s390.
  s390/cpacf: Introduce kma instruction
  s390/cpacf: query instructions use unique parameters for compatibility with KMA
  s390/trng: Introduce s390 TRNG device driver.
  s390/crypto: Provide s390 specific arch random functionality.
  s390/crypto: Add new subfunctions to the cpacf PRNO function.
  s390/crypto: Renaming PPNO to PRNO.
  s390/pageattr: avoid unnecessary page table splitting
  s390/mm: simplify arch_get_unmapped_area[_topdown]
  s390/mm: make TASK_SIZE independent from the number of page table levels
  s390/gs: add regset for the guarded storage broadcast control block
  s390/kvm: Add use_cmma field to mm_context_t
  s390/kvm: Add PGSTE manipulation functions
  vfio: ccw: improve error handling for vfio_ccw_mdev_remove
  vfio: ccw: remove unnecessary NULL checks of a pointer
  s390/spinlock: remove compare and delay instruction
  s390/spinlock: use atomic primitives for spinlocks
  s390/cpumf: simplify detection of guest samples
  s390/pci: remove forward declaration
  s390/pci: increase the PCI_NR_FUNCTIONS default
  ...
2017-05-02 09:50:09 -07:00
Joerg Roedel
461a6946b1 iommu: Remove pci.h include from trace/events/iommu.h
The include file does not need any PCI specifics, so remove
that include. Also fix the places that relied on it.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-29 00:20:49 +02:00
Qiuxu Zhuo
8e12188400 iommu/vt-d: Don't print the failure message when booting non-kdump kernel
When booting a new non-kdump kernel, we have below failure message:

[    0.004000] DMAR-IR: IRQ remapping was enabled on dmar2 but we are not in kdump mode
[    0.004000] DMAR-IR: Failed to copy IR table for dmar2 from previous kernel
[    0.004000] DMAR-IR: IRQ remapping was enabled on dmar1 but we are not in kdump mode
[    0.004000] DMAR-IR: Failed to copy IR table for dmar1 from previous kernel
[    0.004000] DMAR-IR: IRQ remapping was enabled on dmar0 but we are not in kdump mode
[    0.004000] DMAR-IR: Failed to copy IR table for dmar0 from previous kernel
[    0.004000] DMAR-IR: IRQ remapping was enabled on dmar3 but we are not in kdump mode
[    0.004000] DMAR-IR: Failed to copy IR table for dmar3 from previous kernel

For non-kdump case, we no need to copy IR table from previous kernel
so it's nonthing actually failed. To be less alarming or misleading,
do not print "DMAR-IR: Failed to copy IR table for dmar[0-9] from
previous kernel" messages when booting non-kdump kernel.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-28 23:38:42 +02:00
Joerg Roedel
207c6e36f1 iommu: Move report_iommu_fault() to iommu.c
The function is in no fast-path, there is no need for it to
be static inline in a header file. This also removes the
need to include iommu trace-points in iommu.h.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-27 11:24:11 +02:00
Shaohua Li
bfd20f1cc8 x86, iommu/vt-d: Add an option to disable Intel IOMMU force on
IOMMU harms performance signficantly when we run very fast networking
workloads. It's 40GB networking doing XDP test. Software overhead is
almost unaware, but it's the IOTLB miss (based on our analysis) which
kills the performance. We observed the same performance issue even with
software passthrough (identity mapping), only the hardware passthrough
survives. The pps with iommu (with software passthrough) is only about
~30% of that without it. This is a limitation in hardware based on our
observation, so we'd like to disable the IOMMU force on, but we do want
to use TBOOT and we can sacrifice the DMA security bought by IOMMU. I
must admit I know nothing about TBOOT, but TBOOT guys (cc-ed) think not
eabling IOMMU is totally ok.

So introduce a new boot option to disable the force on. It's kind of
silly we need to run into intel_iommu_init even without force on, but we
need to disable TBOOT PMR registers. For system without the boot option,
nothing is changed.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-26 23:57:53 +02:00
Sunil Goutham
bdf9592308 iommu/arm-smmu: Return IOVA in iova_to_phys when SMMU is bypassed
For software initiated address translation, when domain type is
IOMMU_DOMAIN_IDENTITY i.e SMMU is bypassed, mimic HW behavior
i.e return the same IOVA as translated address.

This patch is an extension to Will Deacon's patchset
"Implement SMMU passthrough using the default domain".

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-26 12:30:10 +02:00
Peng Fan
6323f47490 iommu/arm-smmu: Correct sid to mask
From code "SMR mask 0x%x out of range for SMMU", so, we need to use mask, not
sid.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-25 14:26:05 +02:00
Pan Bian
73dbd4a423 iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
In function amd_iommu_bind_pasid(), the control flow jumps
to label out_free when pasid_state->mm and mm is NULL. And
mmput(mm) is called.  In function mmput(mm), mm is
referenced without validation. This will result in a NULL
dereference bug. This patch fixes the bug.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Fixes: f0aac63b87 ('iommu/amd: Don't hold a reference to mm_struct')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-24 12:33:34 +02:00
zhichang.yuan
3ba8775f64 iommu: Make iommu_bus_notifier return NOTIFY_DONE rather than error code
In iommu_bus_notifier(), when action is
BUS_NOTIFY_ADD_DEVICE, it will return 'ops->add_device(dev)'
directly. But ops->add_device will return ERR_VAL, such as
-ENODEV. These value will make notifier_call_chain() not to
traverse the remain nodes in struct notifier_block list.

This patch revises iommu_bus_notifier() to return
NOTIFY_DONE when some errors happened in ops->add_device().

Signed-off-by: zhichang.yuan <yuanzhichang@hisilicon.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:42:52 +02:00
Joerg Roedel
28ae1e3e14 iommu/omap: Add iommu-group support
Support for IOMMU groups will become mandatory for drivers,
so add it to the omap iommu driver.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
[s-anna@ti.com: minor error cleanups]
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:33:59 +02:00
Joerg Roedel
01611fe847 iommu/omap: Make use of 'struct iommu_device'
Modify the driver to register individual iommus and
establish links between devices and iommus in sysfs.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
[s-anna@ti.com: fix some cleanup issues during failures]
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:33:58 +02:00
Joerg Roedel
ede1c2e7d4 iommu/omap: Store iommu_dev pointer in arch_data
Instead of finding the matching IOMMU for a device using
string comparision functions, store the pointer to the
iommu_dev in arch_data during the omap_iommu_add_device
callback and reset it during the omap_iommu_remove_device
callback functions.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
[s-anna@ti.com: few minor cleanups]
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:33:58 +02:00
Joerg Roedel
e73b7afe4e iommu/omap: Move data structures to omap-iommu.h
The internal data-structures are scattered over various
header and C files. Consolidate them in omap-iommu.h.

While at this, add the kerneldoc comment for the missing
iommu domain variable and revise the iommu_arch_data name.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
[s-anna@ti.com: revise kerneldoc comments]
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:33:58 +02:00
Suman Anna
49a57ef7f8 iommu/omap: Drop legacy-style device support
All the supported boards that have OMAP IOMMU devices do support
DT boot only now. So, drop the support for the non-DT legacy-style
devices from the OMAP IOMMU driver. Couple of the fields from the
iommu platform data would no longer be required, so they have also
been cleaned up. The IOMMU platform data is still needed though for
performing reset management properly in a multi-arch environment.

Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:33:58 +02:00
Suman Anna
abaa7e5b05 iommu/omap: Register driver before setting IOMMU ops
Move the registration of the OMAP IOMMU platform driver before
setting the IOMMU callbacks on the platform bus. This causes
the IOMMU devices to be probed first before the .add_device()
callback is invoked for all registered devices, and allows
the iommu_group support to be added to the OMAP IOMMU driver.

While at this, also check for the return status from bus_set_iommu.

Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:33:58 +02:00
Robin Murphy
f6810c15cf iommu/arm-smmu: Clean up early-probing workarounds
Now that the appropriate ordering is enforced via probe-deferral of
masters in core code, rip it all out and bask in the simplicity.

Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[Sricharan: Rebased on top of ACPI IORT SMMU series]
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:31:09 +02:00
Laurent Pinchart
7b07cbefb6 iommu: of: Handle IOMMU lookup failure with deferred probing or error
Failures to look up an IOMMU when parsing the DT iommus property need to
be handled separately from the .of_xlate() failures to support deferred
probing.

The lack of a registered IOMMU can be caused by the lack of a driver for
the IOMMU, the IOMMU device probe not having been performed yet, having
been deferred, or having failed.

The first case occurs when the device tree describes the bus master and
IOMMU topology correctly but no device driver exists for the IOMMU yet
or the device driver has not been compiled in. Return NULL, the caller
will configure the device without an IOMMU.

The second and third cases are handled by deferring the probe of the bus
master device which will eventually get reprobed after the IOMMU.

The last case is currently handled by deferring the probe of the bus
master device as well. A mechanism to either configure the bus master
device without an IOMMU or to fail the bus master device probe depending
on whether the IOMMU is optional or mandatory would be a good
enhancement.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Laurent Pichart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Sricharan R <sricharan@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:31:07 +02:00
Robin Murphy
d7b0558230 iommu/of: Prepare for deferred IOMMU configuration
IOMMU configuration represents unchanging properties of the hardware,
and as such should only need happen once in a device's lifetime, but
the necessary interaction with the IOMMU device and driver complicates
exactly when that point should be.

Since the only reasonable tool available for handling the inter-device
dependency is probe deferral, we need to prepare of_iommu_configure()
to run later than it is currently called (i.e. at driver probe rather
than device creation), to handle being retried, and to tell whether a
not-yet present IOMMU should be waited for or skipped (by virtue of
having declared a built-in driver or not).

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:31:06 +02:00
Robin Murphy
2a0c57545a iommu/of: Refactor of_iommu_configure() for error handling
In preparation for some upcoming cleverness, rework the control flow in
of_iommu_configure() to minimise duplication and improve the propogation
of errors. It's also as good a time as any to switch over from the
now-just-a-compatibility-wrapper of_iommu_get_ops() to using the generic
IOMMU instance interface directly.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-20 16:31:05 +02:00
Nate Watterson
5016bdb796 iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range
Normally, calling alloc_iova() using an iova_domain with insufficient
pfns remaining between start_pfn and dma_limit will fail and return a
NULL pointer. Unexpectedly, if such a "full" iova_domain contains an
iova with pfn_lo == 0, the alloc_iova() call will instead succeed and
return an iova containing invalid pfns.

This is caused by an underflow bug in __alloc_and_insert_iova_range()
that occurs after walking the "full" iova tree when the search ends
at the iova with pfn_lo == 0 and limit_pfn is then adjusted to be just
below that (-1). This (now huge) limit_pfn gives the impression that a
vast amount of space is available between it and start_pfn and thus
a new iova is allocated with the invalid pfn_hi value, 0xFFF.... .

To rememdy this, a check is introduced to ensure that adjustments to
limit_pfn will not underflow.

This issue has been observed in the wild, and is easily reproduced with
the following sample code.

	struct iova_domain *iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
	struct iova *rsvd_iova, *good_iova, *bad_iova;
	unsigned long limit_pfn = 3;
	unsigned long start_pfn = 1;
	unsigned long va_size = 2;

	init_iova_domain(iovad, SZ_4K, start_pfn, limit_pfn);
	rsvd_iova = reserve_iova(iovad, 0, 0);
	good_iova = alloc_iova(iovad, va_size, limit_pfn, true);
	bad_iova = alloc_iova(iovad, va_size, limit_pfn, true);

Prior to the patch, this yielded:
	*rsvd_iova == {0, 0}   /* Expected */
	*good_iova == {2, 3}   /* Expected */
	*bad_iova  == {-2, -1} /* Oh no... */

After the patch, bad_iova is NULL as expected since inadequate
space remains between limit_pfn and start_pfn after allocating
good_iova.

Signed-off-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
2017-04-07 13:40:40 +02:00
Robin Murphy
022f4e4f31 iommu/io-pgtable-arm: Avoid shift overflow in block size
The recursive nature of __arm_lpae_{map,unmap}() means that
ARM_LPAE_BLOCK_SIZE() is evaluated for every level, including those
where block mappings aren't possible. This in itself is harmless enough,
as we will only ever be called with valid sizes from the pgsize_bitmap,
and thus always recurse down past any imaginary block sizes. The only
problem is that most of those imaginary sizes overflow the type used for
the calculation, and thus trigger warnings under UBsan:

[   63.020939] ================================================================================
[   63.021284] UBSAN: Undefined behaviour in drivers/iommu/io-pgtable-arm.c:312:22
[   63.021602] shift exponent 39 is too large for 32-bit type 'int'
[   63.021909] CPU: 0 PID: 1119 Comm: lkvm Not tainted 4.7.0-rc3+ #819
[   63.022163] Hardware name: FVP Base (DT)
[   63.022345] Call trace:
[   63.022629] [<ffffff900808f258>] dump_backtrace+0x0/0x3a8
[   63.022975] [<ffffff900808f614>] show_stack+0x14/0x20
[   63.023294] [<ffffff90086bc9dc>] dump_stack+0x104/0x148
[   63.023609] [<ffffff9008713ce8>] ubsan_epilogue+0x18/0x68
[   63.023956] [<ffffff9008714410>] __ubsan_handle_shift_out_of_bounds+0x18c/0x1bc
[   63.024365] [<ffffff900890fcb0>] __arm_lpae_map+0x720/0xae0
[   63.024732] [<ffffff9008910170>] arm_lpae_map+0x100/0x190
[   63.025049] [<ffffff90089183d8>] arm_smmu_map+0x78/0xc8
[   63.025390] [<ffffff9008906c18>] iommu_map+0x130/0x230
[   63.025763] [<ffffff9008bf7564>] vfio_iommu_type1_attach_group+0x4bc/0xa00
[   63.026156] [<ffffff9008bf3c78>] vfio_fops_unl_ioctl+0x320/0x580
[   63.026515] [<ffffff9008377420>] do_vfs_ioctl+0x140/0xd28
[   63.026858] [<ffffff9008378094>] SyS_ioctl+0x8c/0xa0
[   63.027179] [<ffffff9008086e70>] el0_svc_naked+0x24/0x28
[   63.027412] ================================================================================

Perform the shift in a 64-bit type to prevent the theoretical overflow
and keep the peace. As it turns out, this generates identical code for
32-bit ARM, and marginally shorter AArch64 code, so it's good all round.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-06 16:06:44 +01:00
Will Deacon
fccb4e3b8a iommu: Allow default domain type to be set on the kernel command line
The IOMMU core currently initialises the default domain for each group
to IOMMU_DOMAIN_DMA, under the assumption that devices will use
IOMMU-backed DMA ops by default. However, in some cases it is desirable
for the DMA ops to bypass the IOMMU for performance reasons, reserving
use of translation for subsystems such as VFIO that require it for
enforcing device isolation.

Rather than modify each IOMMU driver to provide different semantics for
DMA domains, instead we introduce a command line parameter that can be
used to change the type of the default domain. Passthrough can then be
specified using "iommu.passthrough=1" on the kernel command line.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-06 16:06:44 +01:00
Will Deacon
beb3c6a066 iommu/arm-smmu-v3: Install bypass STEs for IOMMU_DOMAIN_IDENTITY domains
In preparation for allowing the default domain type to be overridden,
this patch adds support for IOMMU_DOMAIN_IDENTITY domains to the
ARM SMMUv3 driver.

An identity domain is created by placing the corresponding stream table
entries into "bypass" mode, which allows transactions to flow through
the SMMU without any translation.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-06 16:06:43 +01:00
Will Deacon
67560edcd8 iommu/arm-smmu-v3: Make arm_smmu_install_ste_for_dev return void
arm_smmu_install_ste_for_dev cannot fail and always returns 0, however
the fact that it returns int means that callers end up implementing
redundant error handling code which complicates STE tracking and is
never executed.

This patch changes the return type of arm_smmu_install_ste_for_dev
to void, to make it explicit that it cannot fail.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-06 16:06:43 +01:00
Will Deacon
61bc671179 iommu/arm-smmu: Install bypass S2CRs for IOMMU_DOMAIN_IDENTITY domains
In preparation for allowing the default domain type to be overridden,
this patch adds support for IOMMU_DOMAIN_IDENTITY domains to the
ARM SMMU driver.

An identity domain is created by placing the corresponding S2CR
registers into "bypass" mode, which allows transactions to flow through
the SMMU without any translation.

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-06 16:06:43 +01:00
Will Deacon
0834cc28fa iommu/arm-smmu: Restrict domain attributes to UNMANAGED domains
The ARM SMMU drivers provide a DOMAIN_ATTR_NESTING domain attribute,
which allows callers of the IOMMU API to request that the page table
for a domain is installed at stage-2, if supported by the hardware.

Since setting this attribute only makes sense for UNMANAGED domains,
this patch returns -ENODEV if the domain_{get,set}_attr operations are
called on other domain types.

Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-06 16:06:43 +01:00
Robin Murphy
56fbf600dd iommu/arm-smmu: Add global SMR masking property
The current SMR masking support using a 2-cell iommu-specifier is
primarily intended to handle individual masters with large and/or
complex Stream ID assignments; it quickly gets a bit clunky in other SMR
use-cases where we just want to consistently mask out the same part of
every Stream ID (e.g. for MMU-500 configurations where the appended TBU
number gets in the way unnecessarily). Let's add a new property to allow
a single global mask value to better fit the latter situation.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Nipun Gupta <nipun.gupta@nxp.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-06 16:06:43 +01:00
Robin Murphy
8513c89300 iommu/arm-smmu: Poll for TLB sync completion more effectively
On relatively slow development platforms and software models, the
inefficiency of our TLB sync loop tends not to show up - for instance on
a Juno r1 board I typically see the TLBI has completed of its own accord
by the time we get to the sync, such that the latter finishes instantly.

However, on larger systems doing real I/O, it's less realistic for the
TLBs to go idle immediately, and at that point falling into the 1MHz
polling loop turns out to throw away performance drastically. Let's
strike a balance by polling more than once between pauses, such that we
have much more chance of catching normal operations completing before
committing to the fixed delay, but also backing off exponentially, since
if a sync really hasn't completed within one or two "reasonable time"
periods, it becomes increasingly unlikely that it ever will.

Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-04-06 16:06:43 +01:00