Adds support for:
- 64KiB/2MiB big page sizes (128KiB not supported by HW with new PT layout).
- System-memory PTs.
- LPTE "invalid" state.
- (Tegra) Use of video memory aperture.
- Sparse PDEs/PTEs.
- Additional blocklinear kinds.
- 49-bit address-space.
GP100 supports an entirely new 5-level page table layout that provides
an expanded 49-bit address-space. It also supports the layout present
on previous generations, which we've been making do with until now.
This commit implements support for the new layout, and enables it by
default.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for:
- 64KiB big page size.
- System-memory PTs.
- LPTE "invalid" state.
- (Tegra) Use of video memory aperture.
Adds support for marking LPTEs invalid, resulting in the corresponding
SPTEs being ignored, which is supposed to speed up TLB invalidates.
On The Tegra side, this will switch to using the video memory aperture
for all mappings. The HW will still target non-coherent system memory,
but this aperture needs to be selected in order to support compression.
Tegra's instmem backend somewhat cheated to get this effect previously.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is the common code to support a rework of the VMM backends.
It adds support for more than 2 levels of page table nesting, which
is required to be able to support GP100's MMU layout.
Sparse mappings (that don't cause MMU faults when accessed) are now
supported, where the backend provides it.
Dual-PT handling had to become more sophisticated to support sparse,
but this also allows us to support an optimisation the MMU provides
on GK104 and newer.
Certain operations can now be combined into a single page tree walk
to avoid some overhead, but also enables optimsations like skipping
PTE unmap writes when the PT will be destroyed anyway.
The old backend has been hacked up to forward requests onto the new
backend, if present, so that it's possible to bisect between issues
in the backend changes vs the upcoming frontend changes.
Until the new frontend has been merged, new backends will leak BAR2
page tables on module unload. This is expected, and it's not worth
the effort of hacking around this as it doesn't effect runtime.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
To avoid wasting compression tags when using 64KiB pages, we need to
enable this so we can select between upper/lower comptagline in PTEs.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
If NV_PFB_MMU_CTRL_USE_FULL_COMP_TAG_LINE is TRUE, then the last bit of
NV_MMU_PTE_COMPTAGLINE is re-purposed to select the upper/lower half of
a compression tag when using 64KiB big pages.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We previously required each VMM user to allocate their own page directory
and fill in the instance block themselves.
It makes more sense to handle this in a common location.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for:
- Selection of old/new-style page table layout (GP100MmuLayout=0/1).
- System-memory PDs.
New layout disabled by default for the moment, as we don't have a
backend that can handle it yet.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is the first chunk of the new VMM code that provides the structures
needed to describe a GPU virtual address-space layout, as well as common
interfaces to handle VMM creation, and connecting instances to a VMM.
The constructor now allocates the PD itself, rather than having the user
handle that manually. This won't/can't be used until after all backends
have been ported to these interfaces, so a little bit of memory will be
wasted on Fermi and newer for a couple of commits in the series.
Compatibility has been hacked into the old code to allow each GPU backend
to be ported individually.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GP100 "big" (which is a funny name, when it supports "even bigger") page
tables are small enough that we want to be able to suballocate them from
a larger block of memory.
This builds on the previous page table cache interfaces so that the VMM
code doesn't need to know the difference.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Builds up and maintains a small cache of each page table size in order
to reduce the frequency of expensive allocations, particularly in the
pathological case where an address range ping-pongs between allocated
and free.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Removes the need to expose internals outside of MMU, and GP100 is both
different, and a lot harder to deal with.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This will cause a subtle behaviour change on GPUs that are in mixed-memory
configurations in that VRAM in the degraded section of VRAM will no longer
be used for TTM buffer objects.
That section of VRAM is not meant to be used for displayable/compressed
surfaces, and we have no reliable way with the current interfaces to be
able to make that decision properly.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Another transition step to allow finer-grained patches transitioning to
new MMU backends.
Old backends will continue operate as before (accessing nvkm_mem::tag),
and new backends will get a reference to the tags allocated here.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Upcoming MMU changes use nvkm_memory as its basic representation of memory,
so we need to be able to allocate VRAM like this.
The code is basically identical to the current chipset-specific allocators,
minus support for compression tags (which will be handled elsewhere anyway).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for 64-bit writes, and optimised filling of buffers with
fixed 32/64-bit values.
These will all be used by the upcoming MMU changes.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We need to be able to prevent memory from being freed while it's still
mapped in a GPU's address-space.
Will be used by upcoming MMU changes.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Needed by VMM code to determine whether an allocation is compatible with
a given page size (ie. you can't map 4KiB system memory pages into 64KiB
GPU pages).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Map flags (access, kind, etc) are currently defined in either the VMA,
or the memory object, which turns out to not be ideal for things like
suballocated buffers, etc.
These will become per-map flags instead, so we need to support passing
these arguments in nvkm_memory_map().
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
nvkm_memory is going to be used by the upcoming mmu rework for the basic
representation of a memory allocation, as such, this commit adds support
for comptag allocation to nvkm_memory.
This is very simple for now, in that it requires comptags for the entire
memory allocation even if only certain ranges are compressed.
Support for tracking ranges will be added at a later date.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We're moving towards having a central place to handle comptag allocation,
and as some GPUs don't have a ram submodule (ie. Tegra), we need to move
the mm somewhere else.
It probably never belonged in ram anyways.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Different sections of VRAM may have different properties (ie. can't be used
for compression/display, can't be mapped, etc).
We currently already support this, but it's a bit magic. This change makes
it more obvious where we're allocating from.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
TTM memory allocations will be hanging off the DRM's client, but the
locking needed to do so gets really tricky with all the other use of
the DRM's object tree.
To solve this, we make the normal DRM client a child of a new master,
where the memory allocations will be done from instead.
This also solves a potential race with client creation.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
We don't really care about where the memory is, just that it's compatible
with a VMA allocated for a given page size.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Before: "imem: init completed in 299277us"
After: "imem: init completed in 11574us"
Suspend from Fedora 26 gnome desktop on GP102.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Before: "imem: suspend completed in 5540487us"
After: "imem: suspend completed in 1871526us"
Suspend from Fedora 26 gnome desktop on GP102.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
A good deal of the structures we map into here aren't accessed very often
at all, and Fedora 26 has exposed an issue where after creating a heap of
channels, BAR2 space would run out, and we'd need to make use of the slow
path while accessing important structures like page tables.
This implements an LRU on BAR2 space, which allows eviction of mappings
that aren't currently needed, to make space for other objects.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Another piece of solving the "GP100 BAR2 VMM bootstrap" puzzle.
Without doing this, we'd attempt to write PDEs for the lower page table
levels through BAR2 before BAR2 access has been fully initialised.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is not as simple as it was for earlier GPUs, due to the need to swap
accessor functions depending on whether BAR2 is usable or not.
We were previously protected by nvkm_instobj's accessor functions keeping
an object mapped permanently, with some unclear magic that managed to hit
the slow-path where needed even if an object was marked as mapped.
That's been replaced here by reference counting maps (some objects, like
page tables can be accessed concurrently), and swapping the functions as
necessary.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is to simplify upcoming changes. The slow-path is something that
currently occurs during bootstrap of the BAR2 VMM, while backing up an
object during suspend/resume, or when BAR2 address space runs out.
The latter is a real problem that can happen at runtime, and occurs in
Fedora 26 already (due to some change that causes a lot of channels to
be created at login), so ideally we'd prefer not to make it any slower.
We'd also like suspend/resume speed to not suffer.
Upcoming commits will solve those problems in a better way, making the
extra overhead of moving the locking here a non-issue.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The accessor functions can change as a result of acquire()/release() calls,
and are protected by any refcounting done there.
Other functions must remain constant, as they can be called any time.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Discovered by accident while working to use BAR2 access to instmem objects
on more paths.
We've apparently been relying on luck up until now!
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
GP100's page table nests a lot more deeply than the GF100-compatible
layout we're currently using, which means our hackish-but-simple way
of dealing with BAR2 VMM teardown won't work anymore.
In order to sanely handle the chicken-and-egg (BAR2's PTs get mapped
into themselves) problem, we need prevent page tables getting mapped
back into BAR2 during the destruction of its VMM.
To do this, we simply key off the state that's now maintained by the
BAR2 init/fini functions.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Upcoming changes will remove the nvkm_vmm pointer from nvkm_vma, instead
requiring it to be explicitly specified on each operation.
It's not currently possible to get this information for BAR1 mappings,
so let's fix that ahead of time.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Will prevent spurious MMU fault interrupts if something decides to touch
BAR1 after we've unloaded the driver.
Exposed external to BAR so that INSTMEM can use it to better control the
suspend/resume fast-path access.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
If we want to be able to hit the instmem fast-path in a few trickier cases,
we need to be more flexible with when we can initialise BAR2 access.
There's probably a decent case to be made for merging BAR/INSTMEM into BUS,
but that's something to ponder another day.
Flushes have been added after the write to bind the instance block,
as later commits will reveal the need for them.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Will prevent spurious MMU fault interrupts if something decides to touch
BAR1 after we've unloaded the driver.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
BAR2 being done for practical reasons, this is just for consistency.
Flushes have been added after the write to bind the instance block,
as later commits will reveal the need for them.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
NVIDIA call it BAR2, Linux APIs treat it as BAR3 due to BAR1 being a
64-bit BAR, which I presume take two slots or something.
No actual code changes here, just to make future commits less messy.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Will already be done by MMU as a result of the PT writes that occur
during BAR2 bootstrapping.
This is likely just a left-over from the days when it was hardcoded.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
RM appears to do this really early in its initialisation, before DEVINIT.
We currently do this before BAR2 initialisation for some reason.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
MMU will be needing this to specify kind info on BAR mappings.
We have no userspace currently using these interfaces, so break the ABI
instead of supporting both. NVIF version bump so any future use can be
guarded.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The correct thing to do on OOM is to return 0 and set mm_node to NULL,
otherwise TTM will assume some other kind of error, and not attempt to
evict other buffers to make space.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This was already done in dcb.c inside nvkm, but the other parser did not
get the update.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Using the ARRAY_SIZE macro improves the readability of the code. Also,
it is useless to re-invent it.
Found with Coccinelle with the following semantic patch:
@r depends on (org || report)@
type T;
T[] E;
position p;
@@
(
(sizeof(E)@p /sizeof(*E))
|
(sizeof(E)@p /sizeof(E[...]))
|
(sizeof(E)@p /sizeof(T))
)
Reviewed-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
nouveau supports the Tegra K1 and higher after the SoC-based GPUs converged
with the main GeForce GPU families.
v2:
- Qualify that support is Tegra K1+ (Martin Peres)
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Martin Peres <martin.peres@free.fr>
Acked-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
v2:
- add nv138 and drop nv13b chipsets (Ilia Mirkin)
- refactor out status variable and instead mask tsensor (Ilia Mirkin)
- switch SHADOWed state message away from nvkm_error() (Ilia Mirkin)
- rename internal temperature variable (Karol Herbst)
v3:
- use nvkm_trace() for SHADOWed state message (Ben Skeggs)
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The fan control mode can either be FDO_PWM_MODE_STATIC or FDO_PWM_MODE_STATIC_RPM.
Setting it as AMD_FAN_CTRL_AUTO will cause the fan spin faster wrongly.
This can be reproduced by:
'# cat /sys/class/hwmon/hwmon0/pwm1
38
'# cat /sys/class/hwmon/hwmon0/pwm1_enable
2
'# echo "2" > /sys/class/hwmon/hwmon0/pwm1_enable
'# cat /sys/class/hwmon/hwmon0/pwm1
122
The fan speed get faster wrongly even with its original mode echo back.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
otherwise PF & VF exchange is broken
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZ9kEFAAoJEHm+PkMAQRiGw6wH/0j197qyGd0hkVFMJO6LAgN3
KQWS4nZ5BkVDocwv0RVnUJTtXqU1eozFgdVEtSoaFXpzlHGuptR2Tau9efDCJ7w3
/utZxqvhGebZd2T+j+/o/LE8BRQxhADBNJq2D/o0WNt8ecxuG0GIkhkEYt/o3z1v
/sxlwVwzXB7Dc/h1WcgGJG7cS6L9KzzAzGAS/iNvdFrPOygHBv8c0MxVZIiBIeeK
1nZdyvbyM8uenSyG+prGt9ENrqXZxxfwUxIchi2V7A9m1WmD5zijNkf1JCWji/O+
UsA1auxna7MwoxjxqZuGm4MlKOwZ+8xutk4JGgc+aP/ulndJbJYu+4op/3vaFBM=
=Mhx+
-----END PGP SIGNATURE-----
Backmerge tag 'v4.14-rc7' into drm-next
Linux 4.14-rc7
Requested by Ben Skeggs for nouveau to avoid major conflicts,
and things were getting a bit conflicty already, esp around amdgpu
reverts.
+ preemption support for a5xx[1][2]
+ display fixes for 8x96 (snapdragon 820) including fixes for 4k scanout
(hwpipe assignment re-work to handle multiple hwpipe assigned to plane
for wide scanout)
+ async cursor plane updates and fixes
+ refactor adreno_bind/hwinit.. still defer fw loading until device open,
but move clk/irq/etc to probe/bind time to fix issues when fw isn't
present in filesys
+ clk/dt bindings cleanups w/ backward compat via msm_clk_get() (dt docs
part ack'ed by Rob Herring)
+ fw loading re-work with helper to handle either /lib/firmware/qcom/$fw
or /lib/firmware/$fw.. background, we've started landing fw for some of
generations in linux-firmware, but there is a preference to put fw files
under 'qcom' subdirectory, which is not what was done on android or for
people who copied fw from android. So now we first look in qcom subdir
and then fallback to the original location.
+ bunch of GPU debugging enhancements, to dump full cmdline of processes
that trigger faults, and to add a new debugfs to capture cmdstream of
just submits that triggered faults.. both quite useful for piglit ;-)
* tag 'drm-msm-next-2017-11-01' of git://people.freedesktop.org/~robclark/linux: (38 commits)
drm/msm: use %z format modifier for printing size_t
drm/msm/mdp5: Don't use async plane update path if plane visibility changes
drm/msm/mdp5: mdp5_crtc: Restore cursor state only if LM cursors are enabled
drm/msm/mdp5: Update mdp5_pipe_assign to spit out both planes
drm/msm/mdp5: Prepare mdp5_pipe_assign for some rework
drm/msm: remove mdp5_cursor_plane_funcs
drm/msm: update cursors asynchronously through atomic
drm/msm/atomic: switch to drm_atomic_helper_check
drm/msm/mdp5: restore cursor state when enabling crtc
drm/msm/mdp5: don't use autosuspend
drm/msm/mdp5: ignore planes that are not visible
drm/msm: dump submits which triggered gpu hang
drm/msm: preserve IOVAs in submit's bo table
drm/msm/rd: allow adding addition msg to top of dump
drm/msm: split rd debugfs file
drm/msm: add special _get_vaddr_active() for cmdstream dumps
drm/msm: show task cmdline in gpu recovery messages
drm/msm: dump a rd GPUADDR header for all buffers in the command
drm/msm: Removed unused struct_mutex_task
drm/msm: Implement preemption for A5XX targets
...
map_queues_cpsch uses the queue_count to decide whether to upload
a new runlist. So update the counter before calling it.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Process registration needs to happen on each device. So use per-device
queue lists to determine when to register/deregister the process.
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Take the dbgmgr lock and unregister before destroying the debug manager.
Do this before destroying the queues.
v2: Correct locking order in kfd_ioctl_dbg_register to ake sure the
process mutex and dbgmgr mutex are always taken in the same order.
Signed-off-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
When kfd suspending on APU, we do not need to call
amd_iommu_unbind_pasid(), because pasid will be unbound automatically
when power goes off.
On the other hand, calling amd_iommu_unbind_pasid() will trigger
kfd_process_iommu_unbind_callback() if the process is not terminating.
By design, kfd_process_iommu_unbind_callback() should only be called
for process terminating. So we would rather not to call
amd_iommu_unbind_pasid() when suspending.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The MQD represents an inactive context and should not have ring or
doorbell enable bits set. Doing so interferes with HWS which streams
the MQD onto the HQD. If enable bits are set this activates the ring
or doorbell before the HQD is fully configured.
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
A list of per-process queues is maintained in the
kfd_process_queue_manager, so the queues array in kfd_process is
redundant and in fact unused.
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
The return type of ARRAY_SIZE() is size_t, so we have to use
%zu instead of %lu to avoid this warning:
drivers/gpu/drm/msm/msm_gpu.c: In function 'msm_gpu_init':
drivers/gpu/drm/msm/msm_gpu.c:742:31: error: format '%lu' expects argument of type 'long unsigned int', but argument 7 has type 'unsigned int' [-Werror=format=]
The warning it otherwise harmless as size_t is always the
same size as unsigned long in all supported architectures,
but gcc doesn't know that.
Fixes: c2fceabca6d5 ("drm/msm: Support multiple ringbuffers")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Clark <robdclark@gmail.com>
This patch fixes the following soft lockup:
BUG: soft lockup - CPU#0 stuck for 23s! [weston:307]
On weston idle-timeout the IP is powered down and reset
asserted. On weston resume we get a massive vblank
IRQ storm due to the LDI registers having lost some state.
This state loss is caused by ade_crtc_atomic_begin() not
calling ade_ldi_set_mode(). With this patch applied
resuming from Weston idle-timeout works well.
Signed-off-by: Peter Griffin <peter.griffin@linaro.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Cc: stable@vger.kernel.org
Reviewed-by: Xinliang Liu <xinliang.liu@linaro.org>
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
The function for byteswapping the data send to/from atombios was buggy for
num_bytes not divisible by four. The function must be aware of the fact
that after byte-swapping the u32 units, valid bytes might end up after the
num_bytes boundary.
This patch was tested on kernel 3.12 and allowed us to sucesfully use
DisplayPort on and Radeon SI card. Namely it fixed the link training and
EDID readout.
The function is patched both in radeon and amd drivers, since the functions
and the fixes are identical.
Signed-off-by: Roman Kapl <rka@sysgo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
after individualize we need manually call reservation_object_fini()
if all fences on resv signaled during test, otherwise kmemory leak
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The bo structure is freed up in case of an error, so we can't do any
accounting if that happens.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
CC: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When the mutex is locked just in the moment we copy it we end up with a
warning that we release a locked mutex.
Fix this by properly reinitializing the mutex.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To acquire all modeset locks requires a ww_ctx to be allocated. As this
is the legacy path and the allocation small, to reduce the changes
required (and complex untested error handling) to the legacy drivers, we
simply assume that the allocation succeeds. At present, it relies on the
too-small-to-fail rule, but syzbot found that by injecting a failure
here we would hit the WARN. Document that this allocation must succeed
with __GFP_NOFAIL.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171031115535.15166-1-chris@chris-wilson.co.uk
To quote Felix: "For testing KV with current user mode stack, please use
amdgpu. I don't expect this to work with radeon and I'm not planning to
spend any effort on making radeon work with a current user mode stack."
Only compile tested, but should be straight forward.
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
When a plane moves out of bounds (i.e, outside the crtc clip region), the
plane state's "visible" parameter changes to false. When this happens, we
(a) release the hwpipe resources away from it, and
(b) unstage the corresponding hwpipe(s) from the Layer Mixers in the CRTC.
(a) requires use to acquire the global atomic state and assign a new
hwpipe. (b) requires us to re-configure the Layer Mixer, which is done in
the CRTC. We don't want to do these things in the async plane update path,
so return an error if the new state's "visible" isn't the same as the
current state's "visible".
Cc: Gustavo Padovan <gustavo.padovan@collabora.com>
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
MDP5 on newer SoCs support cursor planes (i.e, cursor SSPPs). They are a
separate entity unlike the cursors within LM.
Do not try to restore the MDP5 LM cursor registers, or the corresponding
CTL bits if we are not using LM cursors.
Also, since we've introduced a new variable 'lm_cursor_enabled', we can
now use it to avoid creating a different sets of crtc_funcs for CRTCs
with LM cursors and CRTCs with cursor planes.
Fixes: "drm/msm/mdp5: restore cursor state when enabling crtc"
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We currently call mdp5_pipe_assign() twice to assign the left and right
hwpipes for our drm_plane. When merging 2 hwpipes, there are a few
constraints that we need to keep in mind:
- Only the same types of SSPPs are preferred. I.e, a RGB pipe should
be paired with another RGB pipe, VIG with VIG etc.
- The hwpipe staged on the left should have a higher priority than
the hwpipe staged on the right. The priorities are as follows:
VIG0 > VIG1 > VIG2 > VIG3
RGB0 > RGB1 > RGB2 > RGB3
DMA0 > DMA1
We can't apply these constraints easily if mdp5_pipe_assign() is
called twice. Update mdp5_pipe_assign() to find both hwpipes in
one go, and add the extra constraints needed.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
mdp5_pipe_assign currently returns the hwpipe pointer for the drm_plane.
Return it indirectly by setting a pointer passed as an argument. This
is needed because we want the func to find out the right hwpipe too.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
After converting legacy cursor updates to atomic async commits
mdp5_cursor_plane_funcs just duplicates mdp5_plane_funcs now.
Cc: Rob Clark <robdclark@gmail.com>
Cc: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
Tested-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add support to async updates of cursors by using the new atomic
interface for that. Basically what this commit does is do what
mdp5_update_cursor_plane_legacy() did but through atomic.
v5: call drm_atomic_helper_async_check() from the check hook
v4: add missing atomic async commit call to msm_atomic_commit(Archit Taneja)
v3: move size checks back to drivers (Ville Syrjälä)
v2: move fb setting to core and use new state (Eric Anholt)
Cc: Rob Clark <robdclark@gmail.com>
Cc: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
Tested-by: Archit Taneja <architt@codeaurora.org> (v4)
[added comment about not hitting async update path if hwpipes are
re-assigned or global state is touched]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Since we enabled runtime PM, we cannot count on cursor registers to
retain their values. This can result in situations where we think the
cursor is enabled when we enable the CRTC but it is trying to scan out
null (and the rest of cursor position/size is lost), resulting in faults
and generally angering the hw when coming out of DPMS with a cursor
enabled.
stable backport note: reverting 774e39ee35 is also a suitable fix
Fixes: 774e39ee35 drm/msm/mdp5: Set up runtime PM for MDSS
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Archit Taneja <architt@codeaurora.org>
It's only likely to paper over bugs. Unlike the gpu, where we want to
keep things alive a bit longer in expectation of the next frame's
submit, when the display is shut down we can power off immediately.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Archit Taneja <architt@codeaurora.org>
Note we need to move update_fences() to after msm_rd_dump_submit(),
otherwise the bo's referenced by the submit may no longer be valid.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We need this if we want to dump the submit after cleanup (ie. from hang
or fault). But in the backoff/unpin case we want to clear them. So add
a flag so we can skip clearing the IOVAs in at cleanup.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Split into two instances, the existing $debugfs/rd which continues to
dump all submits, and $debugfs/hangrd which will be used to dump just
submits that cause gpu hangs (and eventually faults, but that will
require some iommu framework enhancements).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Prep work for adding a debugfs file that dumps just submits which
trigger hangs/faults. In this case the bo may already be in the
MADV_DONTNEED state, but will be still on the active list (since
the submit hasn't completed yet). So the normal check that the
bo is in the WILLNEED state does not apply. (But of course the bo
should definitely not be in the PURGED state!)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Now that freedreno gallium driver defaults to using submit_queue task
(render reordering), just showing task->comm is not so useful (ie. it is
always "flush_queue:0"), so also dump the cmdline. This should also be
more useful for piglit/shader_runner.
Signed-off-by: Rob Clark <robdclark@gmail.com>