From: Lucas Stach <l.stach@pengutronix.de>
"not much to de-stage this time. Changes from Philipp and Souptick to
use memset32 more and switch the fault handler to the new vm_fault_t
and two small fixes for issues that can be hit in rare corner cases
from me."
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1533563808.2809.7.camel@pengutronix.de
When the suballocator was unable to provide a suitable buffer for the MMUv1
linear window, we roll back the GPU initialization. As the GPU is runtime
resumed at that point we need to clear the kernel cmdbuf suballoc entry to
properly skip any attempt to manipulate the cmdbuf when the GPU gets shut
down in the runtime suspend later on.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.
Ref- commit 1c8f422059 ("mm: change return type to vm_fault_t")
Previously vm_insert_page() returns err which driver
mapped into VM_FAULT_* type. The new function
vmf_insert_page() will replace this inefficiency by
returning VM_FAULT_* type.
vmf_error() is the newly introduce inline function
in 4.17-rc6.
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The documentation of drm_sched_job_init and drm_sched_entity_push_job has
been clarified. Both functions should be called under a shared lock, to
avoid jobs getting pushed into the scheduler queue in a different order
than their sched_fence seqnos, which will confuse checks that are looking
at the seqnos to infer information about completion order.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Replace the open-coded scratch page initialization loop with memset32
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
entity has a scheduler field and we don't need the sched argument
in any of the functions where entity is provided.
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
replace run queue by a list of run queues and remove the
sched arg as that is part of run queue itself
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
More features for 4.19:
- Use core pcie functionality rather than duplicating our own for pcie
gens and lanes
- Scheduler function naming cleanups
- More documentation
- Reworked DC/Powerplay interfaces to improve power savings
- Initial stutter mode support for RV (power feature)
- Vega12 powerplay updates
- GFXOFF fixes
- Misc fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180705221447.2807-1-alexander.deucher@amd.com
Everything in the flush code path (i.e. waiting for SW queue
to become empty) names with *_flush()
and everything in the release code path names *_fini()
This patch also effect the amdgpu and etnaviv drivers which
use those functions.
v2:
Also pplay the change to vd3.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When the hangcheck handler was replaced by the DRM scheduler timeout
handling we dropped the forward progress check, as this might allow
clients to hog the GPU for a long time with a big job.
It turns out that even reasonably well behaved clients like the
Armada Xorg driver occasionally trip over the 500ms timeout. Bring
back the forward progress check to get rid of the userspace regression.
We would still like to fix userspace to submit smaller batches
if possible, but that is for another day.
Cc: <stable@vger.kernel.org>
Fixes: 6d7a20c077 (drm/etnaviv: replace hangcheck with scheduler timeout)
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
dma_fence_default_wait is the default now, same for the trivial
enable_signaling implementation.
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: etnaviv@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20180503142603.28513-8-daniel.vetter@ffwll.ch
Russell King reported:
"When removing and reloading the etnaviv module, the following splat
occurs:
sysfs: cannot create duplicate filename '/devices/platform/etnaviv'
CPU: 0 PID: 1471 Comm: modprobe Not tainted 4.17.0+ #1608
Hardware name: Marvell Dove (Cubox)
Backtrace:
[<c00157d4>] (dump_backtrace) from [<c0015b8c>] (show_stack+0x18/0x1c)
r6:ef033e38 r5:ee07b340 r4:edb9d000 r3:00000000
[<c0015b74>] (show_stack) from [<c0620784>] (dump_stack+0x20/0x28)
[<c0620764>] (dump_stack) from [<c01bcd24>] (sysfs_warn_dup+0x5c/0x70)
[<c01bccc8>] (sysfs_warn_dup) from [<c01bce14>] (sysfs_create_dir_ns+0x90/0x98)
..."
Commit 246774d17f ("drm/etnaviv: remove the need for a gpu-subsystem
DT node") introduced DRM registration via
platform_device_register_simple(), but missed to call
platform_device_unregister() inside etnaviv_exit().
Fix the problem by calling platform_device_unregister() inside
etnaviv_exit(). While at it, also rearrange the function calls
in the exit path to make them happen in the opposite order of
registration.
Tested on a imx6-sabresd board.
Cc: <stable@vger.kernel.org>
Fixes: 246774d17f ("drm/etnaviv: remove the need for a gpu-subsystem DT node")
Reported-by: Russell King <linux@armlinux.org.uk>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
platform_device_register_simple() may fail, so we should better
check its return value and propagate it in the case of error.
Cc: <stable@vger.kernel.org>
Fixes: 246774d17f ("drm/etnaviv: remove the need for a gpu-subsystem DT node")
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
So what we have for this cycle is a bit of spring cleaning with removal
of unused register logging code and getting rid of the license text in
favor of SPDX, a few smaller MMU handling improvements and a timeout
calculation change, fixing premature fence wait timeouts after 50 days
of uptime.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1526652437.28565.2.camel@pengutronix.de
This replaces the repetitive GPL-2.0 license text in code and header files
with the SPDX tags. Generated hardware headers aren't changed, as any changes
there need to be done in the upstream rnndb repository.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
MMUv2 supports up to 40 bits of physical address by folding the upper
8 bits into bits [4:11] of the PTE.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
With etnaviv not being tied into the IOMMU framework anymore, the MMU
functions will only be called under sleeping locks. Thus we are able
to allocate the memory for the 2nd level page tables on demand without
having to deal with memory allocation in atomic context.
This speeds up driver intitialization on MMUv2 GPU cores, as we don't
need to preallocate all the page table memory and also reduces memory
consumption for most workloads, as most of them won't use the full
GPU virtual address space.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
We are likely to write multiple page entries at once and already ensure
proper write buffer flushing before GPU submit, so this improves CPU
time usage in the submit path without any downsides.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
I'm not aware of any case where tracing GPU register manipulation at the
kernel level would have been useful. It only adds more indirections and
adds to the code size.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
This was useful on MMUv1 GPUs, which don't generate proper faults,
when the GPU write caches weren't fully understood and not properly
handled by the kernel driver. As this has been fixed for quite some
time, the cycling though the MMU address space needlessly spreads
out the MMU mappings.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The old way did clamp the jiffy conversion and thus caused the timeouts
to become negative after some time. Also it didn't work with userspace
which actually fills the upper 32bits of the 64bit timestamp value.
clock_gettime() is 32-bit on 32-bit architectures. Using 64-bit timespec
math, like we do in this commit, means that when a wrap occurs, the
specified timeout goes into the past and we can't request a timeout in
the future. As the Linux implementation of CLOCK_MONOTONIC is reasonable
and starts at 0, the first such timer wrap will occur after approx. 68
years of system uptime.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
this patch also effect the amdgpu and etnaviv drivers which
use the function drm_sched_entity_init
Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The current limit of 2 leads to some GPU idle times, as the usual
IRQ latency leads to up to 3 jobs getting signaled at once with some
standard workloads.
A larger HW job limit might lead to slightly worse QoS, but we accept
that to not sacrifice GPU throughput in the common case.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
etnaviv_sched_dependency() and etnaviv_sched_run_job() are only
used in this file, so make them static.
This fixes the following sparse warnings:
drivers/gpu/drm/etnaviv/etnaviv_sched.c:30:18: warning: symbol 'etnaviv_sched_dependency' was not declared. Should it be static?
drivers/gpu/drm/etnaviv/etnaviv_sched.c:81:18: warning: symbol 'etnaviv_sched_run_job' was not declared. Should it be static?
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The Page Table Array is a new first level structure above the MTLB
availabale on GPUs with the security feature. Use the PTa to set up
the MMU when the security related states are handled by the kernel driver.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
On GPUs with the security feature the MTLB config is stored in the PTA.
Add a function to trigger the initial PTA load through the FE.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
GPUs with support for the security features need some additional
setup to get the frontend started.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
With the introduction of GPU security we have 3 different modes of
GPU operation:
- GPU core doesn't have security features -> no handling required
- the security related states are handled by the kernel driver
- the security related states are handled by a TrustZone application
Add a enum to differentiate between the different operation modes.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
New versions of the Vivante kernel driver don't trust the hardware feature
bits anymore, but use an internal hardware database. This also includes
more feature fields than are available in hardware.
As we can't trust the hardware feature bits to be correct anymore, we need
to replicate the HWDB in etanviv. For now only the GC7000L as found on
the i.MX8M is supported.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Update the state HI and common header from rnndb commit
8478eef32fd9 (rnndb: document secure GPU reset bit).
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The slave interface clock is a clock input found on newer cores to gate
the register interface. For now we simply ungate it when the GPU is in
active state.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Split out the fault dumping, as this will get more complex in the future.
Also there is no need to read and dump the fault address from MMUs that
didn't signal a fault.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The module autoloading can be triggered through the GPU core nodes
and the necessary platform device for the DRM toplevel device will
be instantiated on module init.
Suggested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Rob Herring <robh@kernel.org>
This replaces the etnaviv internal hangcheck logic with the job timeout
handling provided by the DRM scheduler. This simplifies the driver further
and allows to replay jobs after a GPU reset, so only minimal state is lost.
This introduces a user-visible change in that we don't allow jobs to run
indefinitely as long as they make progress anymore, as this introduces
quality of service issues when multiple processes are using the GPU.
Userspace is now responsible to flush jobs in a way that the finish in a
reasonable time, where reasonable is currently defined as less than 500ms.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Populating objects, adding them to the GPU VM and patching/validating
the command stream might take a lot of CPU time. There is no reason to
hold all object reservations during that time.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Move the fence dependency handling to the scheduler where it belongs.
Jobs with unsignaled dependencies just get to sit in the scheduler queue
without holding any locks.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This hooks in the DRM GPU scheduler. No improvement yet, as all the
dependency handling is still done in etnaviv_gem_submit. This just
replaces the actual GPU submit by passing through the scheduler.
Allows to get rid of the retire worker, as this is now driven by the
scheduler.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This moves away from using the internal seqno as the userspace fence
reference. By moving to a generic ID, we can later replace the internal
fence by something different than the etnaviv seqno fence.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Some architecture ports like ARC don't provide the PHYS_OFFSET symbol.
Define it to 0 in that case, which is the most conservative default in
the usage context of the etnaviv driver.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Fixes the following sparse warnings:
drivers/gpu/drm/etnaviv/etnaviv_iommu.c:161:39: warning:
symbol 'etnaviv_iommuv1_ops' was not declared. Should it be static?
drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c:239:39: warning:
symbol 'etnaviv_iommuv2_ops' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
There is no need to hold the GPU lock while freeing the submit
object. Only move the retired submits from the GPU active list to
a temporary retire list under the GPU lock.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Now that the PMR lifetime issues are solved we can safely re-enable
performance counter profiling support.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
As long as there is an active submit, we want the GPU to stay awake. This
is slightly complicated by the fact that we really want to wake the GPU
at the last possible moment to achieve maximum power savings.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The active count is used to check if the BO is idle, where idle is defined
as not active on the GPU and all VM mappings and reference counts dropped
to the initial state. As the idling of the mappings and references now only
happens in the submit cleanup, the active state handling must be moved to
the same location in order to keep the userspace semantics.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Less dynamic allocations and slims down the cmdbuf object to only the
required information, as everything else is already available in the
submit object.
This also simplifies buffer and mappings lifetime management, as they
are now exlusively attached to the submit object and not additionally
to the cmdbuf.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>