There is no need to hold the GPU lock while freeing the submit
object. Only move the retired submits from the GPU active list to
a temporary retire list under the GPU lock.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
As long as there is an active submit, we want the GPU to stay awake. This
is slightly complicated by the fact that we really want to wake the GPU
at the last possible moment to achieve maximum power savings.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The active count is used to check if the BO is idle, where idle is defined
as not active on the GPU and all VM mappings and reference counts dropped
to the initial state. As the idling of the mappings and references now only
happens in the submit cleanup, the active state handling must be moved to
the same location in order to keep the userspace semantics.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Less dynamic allocations and slims down the cmdbuf object to only the
required information, as everything else is already available in the
submit object.
This also simplifies buffer and mappings lifetime management, as they
are now exlusively attached to the submit object and not additionally
to the cmdbuf.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The GPU exec state may have changed at the time when the perfmon sampling
is done, as it reflects the state of the last submission, not the current
GPU execution state.
So for proper sampling we must use the submit exec_state.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
We'll need this in some places where only the submit is available. Also
this is a first step at slimming down the cmdbuf object.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
To make them available to the event worker even after the actual
command stream execution has finished.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The object fencing has nothing to do with the actual GPU buffer submit,
so move it to the gem submit path to have a cleaner split.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Inserting the END command when suspending the GPU is changing the
command buffer state, which requires the GPU to be held.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
While the etnaviv workqueue needs to be ordered, as we rely on work items
being executed in queuing order, this is only true for a single GPU.
Having a shared workqueue for all GPUs in the system limits concurrency
artificially.
Getting each GPU its own ordered workqueue still meets our ordering
expectations and enables retire workers to run concurrently.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
There is no need to store this in the gpu struct. MMU flushes are triggered
correctly in reaction to MMU maps and unmaps, independent of the current ctx.
Any required pipe switches can be infered from the current and the desired
GPU exec state.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
There is no need to synchronize with oustanding retire jobs if the object
has gone idle. Retire jobs only ever change the object state from active to
idle, not the other way around.
The IOVA put race is uncritical, as the GEM_WAIT ioctl itself is holding
a reference to the GEM object, so the retire worker will not pull the
object into the CPU domain, which is the thing we are trying to guard
against with etnaviv_gpu_wait_obj_inactive. The ordering of the various
counts and waits may change a bit, but the userspace visible behavior at
the bounds of the syscall are unchanged.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Flush and prefetch are properly handled in the buffer code, data endianess
would need much wider changes than adding something to this single function.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
If the FE is restarted before the sync point event is cleared, the GPU
might trigger a completion IRQ for the next sync point, corrupting
the state of the currently running worker.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The etnaviv driver causes a link failure if it is built-in but THERMAL
is built as a module:
drivers/gpu/drm/etnaviv/etnaviv_gpu.o: In function `etnaviv_gpu_bind':
etnaviv_gpu.c:(.text+0x4c4): undefined reference to `thermal_of_cooling_device_register'
etnaviv_gpu.c:(.text+0x600): undefined reference to `thermal_cooling_device_unregister'
drivers/gpu/drm/etnaviv/etnaviv_gpu.o: In function `etnaviv_gpu_unbind':
etnaviv_gpu.c:(.text+0x2aac): undefined reference to `thermal_cooling_device_unregister'
Adding a Kconfig dependency on THERMAL || !THERMAL to avoid this causes
a dependency loop on x86_64:
drivers/gpu/drm/tve200/Kconfig:1:error: recursive dependency detected!
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/tve200/Kconfig:1: symbol DRM_TVE200 depends on CMA
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
mm/Kconfig:489: symbol CMA is selected by DRM_ETNAVIV
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/etnaviv/Kconfig:2: symbol DRM_ETNAVIV depends on THERMAL
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/thermal/Kconfig:5: symbol THERMAL is selected by ACPI_VIDEO
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/acpi/Kconfig:189: symbol ACPI_VIDEO is selected by BACKLIGHT_CLASS_DEVICE
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/video/backlight/Kconfig:158: symbol BACKLIGHT_CLASS_DEVICE is selected by DRM_PARADE_PS8622
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/bridge/Kconfig:62: symbol DRM_PARADE_PS8622 depends on DRM_BRIDGE
For a resolution refer to Documentation/kbuild/kconfig-language.txt
subsection "Kconfig recursive dependency limitations"
drivers/gpu/drm/bridge/Kconfig:1: symbol DRM_BRIDGE is selected by DRM_TVE200
To work around this, add a new option DRM_ETNAVIV_THERMAL to optionally
enable thermal throttling support and make DRM_ETNAVIV select THERMAL
at the same time.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaCm8RAAoJEAx081l5xIa+zX0QAJSm31kCG3vdw2CNiRx25L3q
3hcsEOgAjVJ9FQVGKFWjzb8TK35tSqtNx5kWIj0VGaIfBE5Bdg5SLLgKKUYas8rY
4LaphqICq2uxu2BNa2tpiar/sHhAnuozwQ4czpVWXzlaISnb9yYzRl7gMuyUVGkx
+Gih5VUhLmQC0HsRTLJ3vaZQoUsLAl2gAjKcWa1bx57j2S+iKOPfsLaq7VYo+y1I
Njc+iSGqMhJzRLXVkxL2lQKaslp7R38Bbh5K4Kvyjkm4Aq7zErOF6irpOXKMcrGl
mwnr89vf1G9thjikrBaXpKnuvdbWYveoN/ORMlTdCfxkFnChHLnm3bd7NJ49RXDN
Hv/Iq9YYjmZ9GTatxnx7lWtmXnZXC5he1yn1JAuz/yt7/0b/Wx+Mu/wEpBXYNFTd
1AZdD586i+AmPo3yDkqH9nBu8JC0W0AnS9VZma4LVvZOP2UfJmj5Im1CLHItbGDN
FnUCkwyD/lJUUk+WgT+w/GOMJgmFHDiFFl4tFtYVVjrUirpCFVguSKG9xuv6tT8P
8iRsoP7RrcmDN9ojN2SEHwcpsAv3HnKkDv+9+GIbWnrGsSbCPq8Qm+JDSvf4h22I
K5lwNpJrcpSKI+q10L7w2xliTBwb98sJkWGA/rssomrdBOWteGZAyqFRYAVgQ+mJ
x/nJurIqQYh2KQN9+uLG
=xVV2
-----END PGP SIGNATURE-----
Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux
Pull drm updates from Dave Airlie:
"This is the main drm pull request for v4.15.
Core:
- Atomic object lifetime fixes
- Atomic iterator improvements
- Sparse/smatch fixes
- Legacy kms ioctls to be interruptible
- EDID override improvements
- fb/gem helper cleanups
- Simple outreachy patches
- Documentation improvements
- Fix dma-buf rcu races
- DRM mode object leasing for improving VR use cases.
- vgaarb improvements for non-x86 platforms.
New driver:
- tve200: Faraday Technology TVE200 block.
This "TV Encoder" encodes a ITU-T BT.656 stream and can be found in
the StorLink SL3516 (later Cortina Systems CS3516) as well as the
Grain Media GM8180.
New bridges:
- SiI9234 support
New panels:
- S6E63J0X03, OTM8009A, Seiko 43WVF1G, 7" rpi touch panel, Toshiba
LT089AC19000, Innolux AT043TN24
i915:
- Remove Coffeelake from alpha support
- Cannonlake workarounds
- Infoframe refactoring for DisplayPort
- VBT updates
- DisplayPort vswing/emph/buffer translation refactoring
- CCS fixes
- Restore GPU clock boost on missed vblanks
- Scatter list updates for userptr allocations
- Gen9+ transition watermarks
- Display IPC (Isochronous Priority Control)
- Private PAT management
- GVT: improved error handling and pci config sanitizing
- Execlist refactoring
- Transparent Huge Page support
- User defined priorities support
- HuC/GuC firmware refactoring
- DP MST fixes
- eDP power sequencing fixes
- Use RCU instead of stop_machine
- PSR state tracking support
- Eviction fixes
- BDW DP aux channel timeout fixes
- LSPCON fixes
- Cannonlake PLL fixes
amdgpu:
- Per VM BO support
- Powerplay cleanups
- CI powerplay support
- PASID mgr for kfd
- SR-IOV fixes
- initial GPU reset for vega10
- Prime mmap support
- TTM updates
- Clock query interface for Raven
- Fence to handle ioctl
- UVD encode ring support on Polaris
- Transparent huge page DMA support
- Compute LRU pipe tweaks
- BO flag to allow buffers to opt out of implicit sync
- CTX priority setting API
- VRAM lost infrastructure plumbing
qxl:
- fix flicker since atomic rework
amdkfd:
- Further improvements from internal AMD tree
- Usermode events
- Drop radeon support
nouveau:
- Pascal temperature sensor support
- Improved BAR2 handling
- MMU rework to support Pascal MMU
exynos:
- Improved HDMI/mixer support
- HDMI audio interface support
tegra:
- Prep work for tegra186
- Cleanup/fixes
msm:
- Preemption support for a5xx
- Display fixes for 8x96 (snapdragon 820)
- Async cursor plane fixes
- FW loading rework
- GPU debugging improvements
vc4:
- Prep for DSI panels
- fix T-format tiling scanout
- New madvise ioctl
Rockchip:
- LVDS support
omapdrm:
- omap4 HDMI CEC support
etnaviv:
- GPU performance counters groundwork
sun4i:
- refactor driver load + TCON backend
- HDMI improvements
- A31 support
- Misc fixes
udl:
- Probe/EDID read fixes.
tilcdc:
- Misc fixes.
pl111:
- Support more variants
adv7511:
- Improve EDID handling.
- HDMI CEC support
sii8620:
- Add remote control support"
* tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux: (1480 commits)
drm/rockchip: analogix_dp: Use mutex rather than spinlock
drm/mode_object: fix documentation for object lookups.
drm/i915: Reorder context-close to avoid calling i915_vma_close() under RCU
drm/i915: Move init_clock_gating() back to where it was
drm/i915: Prune the reservation shared fence array
drm/i915: Idle the GPU before shinking everything
drm/i915: Lock llist_del_first() vs llist_del_all()
drm/i915: Calculate ironlake intermediate watermarks correctly, v2.
drm/i915: Disable lazy PPGTT page table optimization for vGPU
drm/i915/execlists: Remove the priority "optimisation"
drm/i915: Filter out spurious execlists context-switch interrupts
drm/amdgpu: use irq-safe lock for kiq->ring_lock
drm/amdgpu: bypass lru touch for KIQ ring submission
drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories()
drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs()
drm/amd/powerplay: initialize a variable before using it
drm/amd/powerplay: suppress KASAN out of bounds warning in vega10_populate_all_memory_levels
drm/amd/amdgpu: fix evicted VRAM bo adjudgement condition
drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug
drm/rockchip: add CONFIG_OF dependency for lvds
...
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: etnaviv@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Kees Cook <keescook@chromium.org>
There is no reason to wait for clock stabilization here, as the clock
framework guarantees that PLL clock sources are stable before clk_enable
returns.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
After reset assertion, we only have to wait for the reset signals to
propagate through the GPU before deasserting the reset again. A few
hundred clock cycles should be more than enough. Replace the msleep(1),
which can actually take about 30 ms on i.MX6Q in some configurations,
with an usleep_range of a few microseconds. If the delay was too short,
the FE would not be idle afterwards, and the reset would be retried.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This comment is outdated as the driver is taking care about clock
gating and the pulse eater for quite some time already.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Some performance register are debug register and they need to
be enabled in order to be functional.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
As done by Vivante kernel driver.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
With 'sync points' we can sample the reqeustes perform signals
before and/or after the submited command buffer.
Changes v2 -> v3:
- fixed indentation and init nr_events to 1
Changes v4 -> v5:
- simplify logic around fence handling.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Results in less code as the users do not set every struct member to 0/NULL.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
In order to support performance counters in a sane way we need to provide
a method to sync the GPU with the CPU. The GPU can process multpile command
buffers/events per irq. With the help of a 'sync point' we can trigger an event
and stop the GPU/FE immediately. When the CPU is done with is processing it
simply needs to restart the FE and the GPU will process the command stream.
Changes from v1 -> v2:
- process sync point with a work item to keep irq as fast as possible
Changes from v4 -> v5:
- renamed pmrs_* to sync_point_*
- call event_free(..) in sync_point_worker(..)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This commits extends etnaviv_gpu_cmdbuf_new(..) to define the number
of struct etnaviv_perfmon elements gets used.
Changes from v1 -> v2:
- make use of goto as requested by Lucas
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This makes it possible to allocate multiple events under the event
spinlock. This change is needed to support 'sync'-points.
Changes v2 -> v3:
- wait for the completion of all events
- use 10sec timeout regardless of the number of events
- removed validation if there are enough free events
- fixed return value evaluation of event_alloc(..) in etnaviv_gpu_submit(..)
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This is prep work to be able to allocate multiple events in one go.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The reset path wants to initialize the clock control register regardless
of the DYNAMIC_FREQUENCY_SCALING feature, so don't call clock update, but
explicitly load the register.
Also disabling of the debug registers is moved into the reset function,
so we always get to the same state after a GPU reset. This means the
clock update function should not touch the bits already set in the clock
control register, but instead only update the scaling bits.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
The stub functions returns -ENODEV when trying to register the cooling device,
thus failing the GPU bind, rendering the GPU subsystem unusable when
CONFIG_THERMAL isn't enabled.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
GPU cores with the DYNAMIC_FREQUENCY_SCALING feature bit set expect the
platform to provide the clock scaling and ignore any requests to use the
internal FSCALE divider. Writes to this register still work, but don't
have any effect on the GPU clock frequency.
Save the initial core and shader clock frequency and ask the platform
to provide a slower clock when cooling is requested.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
PA clock gating can be enabled when the right bugfix bit is present.
There are broken revs of GC4000 and GC2000, which need TX clock gating
to be disabled.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJY881cAAoJEHm+PkMAQRiGG4UH+wa2z6Qet36Uc4nXFZuSMYrO
ErUWs1QpTDDv4a+LE4fgyMvM3j9XqtpfQLy1n70jfD14IqPBhHe4gytasAf+8lg1
YvddFx0Yl3sygVu3dDBNigWeVDbfwepW59coN0vI5nrMo+wrei8aVIWcFKOxdMuO
n72u9vuhrkEnLJuQk7SF+t4OQob9McXE3s7QgyRopmlKhKo7mh8On7K2BRI5uluL
t0j5kZM0a43EUT5rq9xR8f5pgtyfTMG/FO2MuzZn43MJcZcyfmnOP/cTSIvAKA5U
1i12lxlokYhURNUe+S6jm8A47TrqSRSJxaQJZRlfGJksZ0LJa8eUaLDCviBQEoE=
=6QWZ
-----END PGP SIGNATURE-----
Merge tag 'v4.11-rc7' into drm-next
Backmerge Linux 4.11-rc7 from Linus tree, to fix some
conflicts that were causing problems with the rerere cache
in drm-tip.
Add the missing unlock before return from function etnaviv_gpu_submit()
in the error handling case.
lst: fixed label name.
Fixes: f3cd1b064f ("drm/etnaviv: (re-)protect fence allocation with
GPU mutex")
CC: stable@vger.kernel.org #4.9+
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJY6mY1AAoJEHm+PkMAQRiGB14IAImsH28JPjxJVDasMIRPBxVc
euPPlZgoBieu7sNt+kEsEqdkXuu0MLk6gln0IGxWLeoB2S+u3Tz5LMa2YArVqV9Z
tWzOnI9auE73P2Pz/tUMOdyMs5tO0PolQxX3uljbULBozOHjHRh13fsXchX2yQvl
mFeFCDqpPV0KhWRH/ciA8uIHdvYPhMpkKgRtmR8jXL0yzqLp6+2J+Bs8nHG4NNng
HMVxZPC8jOE/TgWq6k/GmXgxh3H/AideFdHFbLKYnIFJW41ZGOI8a262zq3NmjPd
lywpVU7O7RMhSITY5PnuR3LpNV8ftw1hz2y6t35unyFK1P02adOSj5GJ3hGdhaQ=
=Xz5O
-----END PGP SIGNATURE-----
Backmerge tag 'v4.11-rc6' into drm-next
Linux 4.11-rc6
drm-misc needs 4.11-rc5, may as well fix conflicts with rc6.
The next patch will need the complete dma_fence, instead of just the seqno,
to create the sync_file in etnaviv_ioctl_gem_submit, in case an
out_fence_fd is requested.
The submit needs to hold a reference to the dma_fence, to avoid raceing
with the GPU completing the fence.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
---
New patch in v3.
Loosely based on commit f0a42bb542 ("drm/msm: submit support for
in-fences"). Unfortunately, struct drm_etnaviv_gem_submit doesn't have
a flags field yet, so we have to extend the structure and trust that
drm_ioctl will clear the flags for us if an older userspace only submits
part of the struct.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.com>
Reviewed-by: Sumit Semwal <sumit.semwal@linaro.org>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Each Vivante GPU contains a clock divider which can divide the GPU clock
by 2^n, which can lower the power dissipation from the GPU. It has been
suggested that the GC600 on Dove is responsible for 20-30% of the power
dissipation from the SoC, so lowering the GPU clock rate provides a way
to throttle the power dissiptation, and reduce the temperature when the
SoC gets hot.
This patch hooks the Etnaviv driver into the kernel's thermal management
to allow the GPUs to be throttled when necessary, allowing a reduction in
GPU clock rate from /1 to /64 in power of 2 steps.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The fence allocation needs to be protected by the GPU mutex, otherwise
the fence seqnos of concurrent submits might not match the insertion order
of the jobs in the kernel ring. This breaks the assumption that jobs
complete with monotonically increasing fence seqnos.
Fixes: d985349017 (drm/etnaviv: take GPU lock later in the submit process)
CC: stable@vger.kernel.org #4.9+
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
There are 3 big benefits to suballocating a single big DMA buffer
for command submission:
1. Avoid hammering CMA. The old way of allocating and freeing a DMA
buffer for each submission was hitting some of the real slow
pathes in CMA, as this allocator was not designed for a concurrent
small buffers load.
2. Less TLB flushes on IOMMUv2. If a new command buffer is mapped into
the GPU address space the MMU TLBs need to be flushed. By having
one big buffer statically mapped to the GPU, a lot of those flushes
can be avoided.
3. No funky workarounds for GC3000. The FE TLB flush on GC3000 isn't
reliable. To work around that we tried to lay out the cmdbufs in
the GPU address space in a way to avoid this issue. This hasn't
always worked if the address space is crowded. A single statically
mapped buffer avoids the erratum completely.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Don't call the IOMMU directly, but go through the new cmdbuf abstraction.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This will get more complex with the following changes, so move it
into its own place.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Set up the PULSE_EATER register (0x0010C) in etnaviv_gpu_hw_init. This
ports three mostly undocumented model/revision-specific register
overrides from the Vivante kernel driver.
This is relevant as at least the "disable internal DFS" for revisions >
0x5420 has shown to have a huge impact on shader performance (sped up
memory read performance by 7.5x and write performance by 1.5x) on an
affected GPU.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
- fix dma-buf export path to return correct SG table
- trivially implement direct dma-buf mapping
- allow DRAW_INSTANCED commands in validator
- make the driver work on i.MX6SX, yielding a working 2D/3D stack
together with Mareks MXS DRM driver
* 'drm-etnaviv-next' of git://git.pengutronix.de/lst/linux:
MAINTAINERS: add etnaviv mailinglist
drm/etnaviv: move linear window on MC1.0 parts if necessary
drm/etnaviv: don't invoke OOM killer from dump code
drm/etnaviv: fix gem_prime_get_sg_table to return new SG table
drm/etnaviv: Allow DRAW_INSTANCED commands
drm/etnaviv: implement dma-buf mmap
On i.MX6SX the physical memory is placed above the 2GB mark, so the GPU
linear window has to be moved for the GPU to work at all. This doesn't
mix with the FAST_CLEAR feature, as the TS unit doesn't take the linear
window offset into account and will corrupt memory when used with a
non-zero offset.
Move the linear window if it's necessary for the GPU to work, but avoid
announcing FAST_CLEAR support to userspace in this case.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Marek Vasut <marex@denx.de>
If we reset the GPU to get it back into a usable state we lose
all context, not just the MMU one. Mark the whole context as
lost to trigger a restore of the exec and MMU state.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
GC2000+ on the i.MX6QP is just a re-branded GC3000, lets call it by
its real name to avoid confusion in other parts of the driver.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>