Allow the mapping code to request a specific virtual address for the GEM
mapping. If the virtual address is zero, we fall back to the old mode of
allocating a virtual address for the mapping.
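A minimal sketch of the intended flow, with hypothetical names standing in
for the driver's actual mapping and range-allocation helpers:

  /* Illustrative sketch only, not the actual etnaviv entry point. */
  int mmu_map_gem(struct mmu_context *ctx, struct gem_object *obj,
                  u64 va, struct vram_mapping *mapping)
  {
          if (va)
                  /* Caller requested a fixed address: reserve exactly
                   * that range, failing if it is already occupied. */
                  return insert_exact(ctx, mapping, obj->size, va);

          /* va == 0: old behavior, let the allocator pick a free range. */
          return insert_anywhere(ctx, mapping, obj->size);
  }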
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
This builds on top of the MMU contexts introduced earlier. Instead of having
one context per GPU core, each GPU client receives its own context.
On MMUv1 this still means a single shared pagetable set is used by all
clients, but on MMUv2 there is now a distinct set of pagetables for each
client. As the command fetch is also translated via the MMU on MMUv2 the
kernel command ringbuffer is mapped into each of the client pagetables.
As the MMU context switch is a fairly heavy operation, due to the required
cache and TLB flushing, this patch implements a lazy way of switching the
MMU context. The kernel does not have its own MMU context, but reuses the
last client context for all of its operations. This has some visible impact,
as the GPU can now only be started once a client has submitted some work and
the client MMU context has been assigned. Also the MMU context has a different
lifetime than the general client context, as the GPU might still execute the
kernel command buffer in the context of a client even after the client has
completed all GPU work and has been terminated. Only when the GPU is runtime
suspended or switches to another client's MMU context is the old context
freed up.
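A sketch of the lazy switch, with simplified stand-in types and names
(locking and the actual hardware programming are omitted):

  static void gpu_switch_context(struct gpu *gpu, struct mmu_context *new)
  {
          if (gpu->mmu_context == new)
                  return;         /* same context as last submit, cheap path */

          /* Heavy path: flush caches and TLBs, load the new pagetables. */
          hw_load_pagetables(gpu, new);

          /* The old context is only dropped here (or on runtime suspend),
           * as the kernel ringbuffer may still have been executing in it
           * after the owning client terminated. */
          if (gpu->mmu_context)
                  context_put(gpu->mmu_context);
          gpu->mmu_context = context_get(new);
  }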
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
This reworks the MMU handling to make it possible to have multiple MMU contexts.
A context is basically one instance of GPU page tables. Currently we have one
set of page tables per GPU, which isn't all that clever, as it has the
following two consequences:
1. All GPU clients (aka processes) are sharing the same pagetables, which means
there is no isolation between clients, but only between GPU assigned memory
spaces and the rest of the system. Better than nothing, but also not great.
2. Clients operating on the same set of buffers with different etnaviv GPU
cores, e.g. a workload using both the 2D and 3D GPU, need to map the used
buffers into the pagetable sets of each used GPU.
This patch reworks all the MMU handling to introduce the abstraction of the
MMU context. A context can be shared across different GPU cores, as long as
they have compatible MMU implementations, which is the case for all systems
with Vivante GPUs seen in the wild.
As MMUv1 is not able to change pagetables on the fly without a
"stop the world" operation (stop the GPU, change the pagetables via CPU
interaction, restart the GPU), the implementation introduces a shared context
on MMUv1, which is returned whenever there is a request for a new context.
This patch assigns an MMU context to each GPU, so on MMUv2 systems there is
still one set of pagetables per GPU, but due to the shared context, MMUv1
systems see a change in behavior as now a single pagetable set is used
across all GPU cores.
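A sketch of the resulting context creation, with hypothetical names:

  struct mmu_context *mmu_context_create(struct mmu_global *global)
  {
          if (global->version == MMU_V1) {
                  /* MMUv1 can't switch pagetables at runtime, so every
                   * request gets a reference to the one shared context. */
                  return context_get(global->shared_context);
          }

          /* MMUv2: a fresh set of pagetables for each new context. */
          return alloc_v2_context(global);
  }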
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
If an MMU is shared between multiple GPUs, all of them need to flush their
TLBs, so a single marker that gets reset on the first flush won't do.
Replace the flush marker with a sequence number, so that it's possible to
check if the TLB is in sync with the current page table state for each GPU.
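The idea as a sketch, with stand-in types:

  struct mmu_context {
          unsigned int flush_seq;         /* bumped on every map/unmap */
  };

  struct gpu {
          unsigned int flushed_seq;       /* sequence this core last flushed at */
  };

  static bool gpu_tlb_stale(struct gpu *gpu, struct mmu_context *ctx)
  {
          /* A single marker would be cleared by whichever GPU flushes
           * first; a sequence number lets each sharing GPU decide
           * independently whether its TLB matches the pagetables. */
          return gpu->flushed_seq != ctx->flush_seq;
  }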
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
This allows decoupling the creation of the cmdbuf suballocator from mapping
its backing region into a GPU address space, allowing multiple address
spaces to share a single cmdbuf suballocator.
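In other words, creation and mapping become two separate steps, roughly
like this (hypothetical signatures):

  /* Allocate the single backing DMA buffer once per device... */
  struct suballoc *suballoc_create(struct device *dev);

  /* ...then map/unmap it separately in each address space using it. */
  int suballoc_map(struct suballoc *sub, struct mmu_context *ctx,
                   struct suballoc_mapping *mapping, u64 iova);
  void suballoc_unmap(struct mmu_context *ctx,
                      struct suballoc_mapping *mapping);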
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
This replaces the repetitive GPL-2.0 license text in code and header files
with SPDX tags. Generated hardware headers aren't changed, as any changes
there need to be done in the upstream rnndb repository.
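For reference, the boilerplate is replaced by a single line at the top of
each file, following the kernel convention:

  // SPDX-License-Identifier: GPL-2.0        (in .c files)
  /* SPDX-License-Identifier: GPL-2.0 */     (in headers)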
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This was useful on MMUv1 GPUs, which don't generate proper faults, back
when the GPU write caches weren't fully understood and not properly
handled by the kernel driver. As this has been fixed for quite some
time, cycling through the MMU address space now only needlessly spreads
out the MMU mappings.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Using the IOMMU API to manage the internal GPU MMU has been a
historical accident and it keeps getting in the way, as well as
entangling the driver with the inner workings of the IOMMU
subsystem.
Clean this up by removing the usage of iommu_domain, which is the
last piece linking etnaviv to the IOMMU subsystem.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
And clean up the header file a bit.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
This is a preparation to remove the etnaviv dependency on the IOMMU
subsystem by importing the relevant parts of the iommu map/unmap
functions into the driver.
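A sketch of what such an imported map helper boils down to: walk the range
in MMU page steps and unwind on failure (names and the ops indirection are
illustrative):

  static int context_map(struct mmu_context *ctx, unsigned long iova,
                         phys_addr_t paddr, size_t size, int prot)
  {
          unsigned long orig_iova = iova;
          size_t orig_size = size;
          int ret = 0;

          /* Map one 4K page per iteration. */
          while (size) {
                  ret = ctx->ops->map_page(ctx, iova, paddr, SZ_4K, prot);
                  if (ret)
                          break;
                  iova += SZ_4K;
                  paddr += SZ_4K;
                  size -= SZ_4K;
          }

          /* Undo partial work so a failure leaves the pagetables
           * untouched. */
          if (ret)
                  context_unmap(ctx, orig_iova, orig_size - size);

          return ret;
  }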
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
There are three big benefits to suballocating a single big DMA buffer
for command submission (a sketch of such a suballocator follows the list):
1. Avoid hammering CMA. The old way of allocating and freeing a DMA
buffer for each submission was hitting some of the really slow paths
in CMA, as this allocator was not designed for a concurrent load of
many small buffers.
2. Less TLB flushes on IOMMUv2. If a new command buffer is mapped into
the GPU address space the MMU TLBs need to be flushed. By having
one big buffer statically mapped to the GPU, a lot of those flushes
can be avoided.
3. No funky workarounds for GC3000. The FE TLB flush on GC3000 isn't
reliable. To work around that we tried to lay out the cmdbufs in
the GPU address space in a way that avoids this issue, which didn't
always work when the address space was crowded. A single statically
mapped buffer avoids the erratum completely.
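A sketch of such a suballocator, handing out granules from the one
statically mapped buffer via a bitmap (sizes and names are illustrative,
the bitmap helpers are the stock kernel ones):

  #define SUBALLOC_SIZE           SZ_256K
  #define SUBALLOC_GRANULE        SZ_4K
  #define SUBALLOC_GRANULES       (SUBALLOC_SIZE / SUBALLOC_GRANULE)

  struct suballoc {
          void *vaddr;            /* CPU view of the one big DMA buffer */
          u32 iova;               /* static GPU mapping, set up once */
          DECLARE_BITMAP(granule_map, SUBALLOC_GRANULES);
  };

  /* Hand out a slice of the preallocated buffer: no CMA allocation and
   * no new GPU mapping (and thus no TLB flush) on the submit path. */
  static int suballoc_get(struct suballoc *sub, size_t size, u32 *offset)
  {
          unsigned int granules = DIV_ROUND_UP(size, SUBALLOC_GRANULE);
          unsigned long pos;

          pos = bitmap_find_next_zero_area(sub->granule_map,
                                           SUBALLOC_GRANULES, 0,
                                           granules, 0);
          if (pos >= SUBALLOC_GRANULES)
                  return -ENOSPC;

          bitmap_set(sub->granule_map, pos, granules);
          *offset = pos * SUBALLOC_GRANULE;
          return 0;
  }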
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
With MMUv2 all buffers need to be mapped through the MMU once it
is enabled. Align the buffer size to 4K, as the MMU is only able to
map page-aligned buffers.
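That is, something along the lines of (using the kernel's ALIGN and SZ_4K
helpers):

  size = ALIGN(size, SZ_4K);      /* the MMU maps whole 4K pages only */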
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The GPU virtual address for the command buffers differs depending on
the IOMMU version. Move the calculation of the iova into the etnaviv
mmu code to enable proper dispatch.
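A sketch of the dispatch this enables, with illustrative names:

  u32 cmdbuf_get_va(struct mmu *mmu, struct cmdbuf *buf)
  {
          switch (mmu->version) {
          case MMU_V1:
                  /* MMUv1: the FE fetches through the linear window,
                   * so the DMA address can be used directly. */
                  return buf->dma_addr;
          case MMU_V2:
                  /* MMUv2: the FE fetches through the pagetables, so
                   * return the address the buffer was mapped at. */
                  return buf->mapped_iova;
          }
          return 0;
  }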
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The GPU code doesn't need to deal with the IOMMU directly, instead
it can all be hidden behind the etnaviv mmu interface. Move the
last remaining part into etnaviv mmu.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This adds the etnaviv DRM driver and hooks it up in Makefiles
and Kconfig.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>