Commit Graph

43 Commits

Author SHA1 Message Date
Sushmita Susheelendra
0e08270a1f drm/msm: Separate locking of buffer resources from struct_mutex
Buffer object specific resources like pages, domains, sg list
need not be protected with struct_mutex. They can be protected
with a buffer object level lock. This simplifies locking and
makes it easier to avoid potential recursive locking scenarios
for SVM involving mmap_sem and struct_mutex. This also removes
unnecessary serialization when creating buffer objects, and between
buffer object creation and GPU command submission.

Signed-off-by: Sushmita Susheelendra <ssusheel@codeaurora.org>
[robclark: squash in handling new locking for shrinker]
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-17 08:03:07 -04:00
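
A minimal sketch of the locking pattern this commit describes, assuming illustrative field and helper names rather than the exact upstream ones: each buffer object carries its own mutex, so populating pages no longer needs dev->struct_mutex.

#include <linux/mutex.h>
#include <linux/scatterlist.h>
#include <drm/drm_gem.h>

/* Illustrative sketch only: a per-BO lock protects the object's own
 * resources (pages, sg table, per-aspace domains); names are assumptions. */
struct msm_gem_object {
        struct drm_gem_object base;
        struct mutex lock;              /* protects pages, sgt, vmas */
        struct page **pages;
        struct sg_table *sgt;
};

static struct page **get_pages_locked(struct msm_gem_object *msm_obj)
{
        /* caller holds msm_obj->lock; page allocation elided in this sketch */
        return msm_obj->pages;
}

static struct page **msm_gem_get_pages(struct msm_gem_object *msm_obj)
{
        struct page **p;

        /* no dev->struct_mutex here: only this object's state is touched */
        mutex_lock(&msm_obj->lock);
        p = get_pages_locked(msm_obj);
        mutex_unlock(&msm_obj->lock);

        return p;
}
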
Rob Clark
8432a903fb drm/msm: remove address-space id
Now that msm_gem supports an arbitrary number of vmas, we no longer
need to assign an id (index) to each address space.  So rip out the
associated code.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16 11:16:06 -04:00
Rob Clark
8bdcd949bb drm/msm: pass address-space to _get_iova() and friends
No functional change; that will come later.  But this will make it
easier to deal with dynamically created address spaces (i.e. per-
process pagetables for the GPU).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16 11:16:04 -04:00
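
A hedged sketch of the interface shape this change moves toward; the prototypes below are assumptions based on the description, not the exact upstream signatures.

/* before (roughly): the address space was selected by a per-device index */
int msm_gem_get_iova_old(struct drm_gem_object *obj, int id, uint64_t *iova);

/* after (roughly): callers pass the address space itself, which also works
 * for dynamically created aspaces such as per-process pagetables */
struct msm_gem_address_space;
int msm_gem_get_iova(struct drm_gem_object *obj,
                     struct msm_gem_address_space *aspace, uint64_t *iova);
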
Rob Clark
cb1e38181a drm/msm: fix locking inconsistency for gpu->hw_init()
Most, but not all, paths were calling this with struct_mutex held.  The
fast-path in msm_gem_get_iova() (plus some sub-code-paths that only run
the first time) was masking this issue.

So let's just always hold struct_mutex for hw_init(), and sprinkle some
WARN_ON()s and might_lock() to avoid this sort of problem in the
future.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16 11:16:01 -04:00
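
A sketch of the kind of assertion being described, assuming the wrapper name and placement (not necessarily the exact upstream code):

/* Require struct_mutex across hw_init(); WARN loudly if a caller forgets.
 * Fast-paths that may take the lock later can additionally call
 * might_lock(&dev->struct_mutex) so lockdep flags bad callers early. */
static int msm_gpu_hw_init(struct msm_gpu *gpu)
{
        WARN_ON(!mutex_is_locked(&gpu->dev->struct_mutex));

        return gpu->funcs->hw_init(gpu);
}
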
Jordan Crouse
5770fc7a56 drm/msm: Add a struct to pass configuration to msm_gpu_init()
The amount of information that we need to pass into msm_gpu_init()
is steadily increasing, so add a new struct to stabilize the function
call and make it easier to add new configuration down the line.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16 11:16:00 -04:00
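
A sketch of the idea with assumed field names (not necessarily the upstream layout): bundle the growing argument list into one struct so later options don't churn every caller of msm_gpu_init().

/* field names below are illustrative assumptions */
struct msm_gpu_config {
        const char *ioname;             /* MMIO resource name */
        const char *irqname;            /* IRQ resource name */
        uint64_t va_start;              /* GPU virtual address range */
        uint64_t va_end;
        unsigned int ringsz;            /* ringbuffer size */
};

int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
                 struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
                 const char *name, struct msm_gpu_config *config);
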
Rob Clark
134ccada7a drm/msm/gpu: check legacy clk names in get_clocks()
Otherwise, if someone was using old bindings with "core_clk" instead of
"core" as the clock name, we'd never find it and the GPU would be stuck at
27MHz (or whatever its slowest rate is).

Fixes: 98db803 ("msm/drm: gpu: Dynamically locate the clocks from the device tree")
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-05-27 13:48:25 -04:00
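
A hedged sketch of the fallback described above (the helper and struct fields are assumptions): when matching clocks discovered from the DT, accept both the new-style name and the legacy "_clk"-suffixed one.

#include <linux/string.h>

static void match_core_clock(struct msm_gpu *gpu,
                             const char * const *names, int count)
{
        int i;

        for (i = 0; i < count; i++) {
                /* accept "core" (new bindings) or "core_clk" (legacy) */
                if (!strcmp(names[i], "core") || !strcmp(names[i], "core_clk"))
                        gpu->core_clk = gpu->grp_clks[i];
        }
}
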
Jordan Crouse
98db803f64 msm/drm: gpu: Dynamically locate the clocks from the device tree
Instead of using a fixed list of clock names, use the clock-names
list in the device tree to discover and get the list of clocks
that we need.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:37 -04:00
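
A simplified sketch of the discovery loop being described, using the standard OF and clock APIs; the gpu-> field names are assumptions.

#include <linux/clk.h>
#include <linux/err.h>
#include <linux/of.h>
#include <linux/platform_device.h>

static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
{
        struct device_node *node = pdev->dev.of_node;
        int i, count;

        count = of_property_count_strings(node, "clock-names");
        if (count < 1)
                return 0;               /* no clocks listed, nothing to manage */

        for (i = 0; i < count; i++) {
                const char *name;

                if (of_property_read_string_index(node, "clock-names",
                                                  i, &name))
                        return -EINVAL;

                gpu->grp_clks[i] = devm_clk_get(&pdev->dev, name);
                if (IS_ERR(gpu->grp_clks[i]))
                        gpu->grp_clks[i] = NULL;        /* treat as optional */
        }

        gpu->nr_clocks = count;         /* field name is an assumption */
        return 0;
}
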
Jordan Crouse
bf5af4ae87 drm/msm: Hard code the GPU "slow frequency"
Some A3XX and A4XX GPU targets required that the GPU clock be
programmed to a non-zero value when it was disabled, so 27MHz was
chosen as the "invalid" frequency.

Even though newer targets do not have the same clock restrictions,
we still write 27MHz on clock disable and expect the clock subsystem
to round down to zero.

For unknown reasons, even though the slow clock speed is always
27MHz and it isn't actually a functional level, the legacy device tree
frequency tables always defined it and then did gymnastics to work
around it.

Instead of playing the same silly games, just hard-code the "slow" clock
speed as 27MHz and save ourselves a bit of infrastructure.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:37 -04:00
Jordan Crouse
9873ef0743 drm/msm: Make sure to detach the MMU during GPU cleanup
We should be detaching the MMU before destroying the address
space. To do this cleanly, the detach has to happen in
adreno_gpu_cleanup() because it needs access to structs
in adreno_gpu.c.  It also gives better symmetry to have
the attach and detach at the same code level.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:36 -04:00
Rob Clark
eeb754746b drm/msm/gpu: use pm-runtime
We need to use pm-runtime properly when the IOMMU is using device_link() to
control its own clocks.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:31 -04:00
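
A sketch of the usual runtime-PM bracket this moves to (the autosuspend delay and helper names are illustrative): power is requested around GPU activity, and the PM core, together with the IOMMU's device_link, sequences clocks and power correctly.

#include <linux/pm_runtime.h>

/* at probe/init time: opt in to runtime PM with a short autosuspend delay */
static void example_enable_runtime_pm(struct device *dev)
{
        pm_runtime_set_autosuspend_delay(dev, 10);      /* ms, illustrative */
        pm_runtime_use_autosuspend(dev);
        pm_runtime_enable(dev);
}

/* around GPU activity: get before touching hw, put when going idle */
static void example_gpu_activity(struct device *dev)
{
        pm_runtime_get_sync(dev);
        /* ... program registers, submit work ... */
        pm_runtime_mark_last_busy(dev);
        pm_runtime_put_autosuspend(dev);
}
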
Rob Clark
720c3bb802 drm/msm: drop _clk suffix from clk names
Suggested by Rob Herring.  We still support the old names for
compatibility with downstream Android DT files.

Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Herring <robh@kernel.org>
2017-02-06 11:28:42 -05:00
Jordan Crouse
b5f103ab98 drm/msm: gpu: Add A5XX target support
Add support for the A5XX family of Adreno GPUs.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-28 15:14:15 -05:00
Jordan Crouse
89d777a572 drm/msm: Remove 'src_clk' from adreno configuration
The adreno code inherited a silly workaround from downstream,
from the bad old days before decent clock control. grp_clk[0]
(named 'src_clk') doesn't actually exist - it was used as a proxy
for whatever the core clock actually was (usually 'core_clk').

All targets should be able to correctly request 'core_clk' and
get the right thing back, so zap the anachronism and directly
use grp_clk[0] to control the clock rate.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-28 15:14:13 -05:00
Rob Clark
78babc1633 drm/msm: convert iova to 64b
For a5xx the GPU is 64b, so we need to change iova to 64b everywhere.  On
the display side, iova is still 32b so it can ignore the upper bits.
(Although all the armv8 devices have an iommu that can map a 64b pa to a
32b iova.)

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-28 15:14:08 -05:00
Rob Clark
667ce33e57 drm/msm: support multiple address spaces
We can have various combinations of 64b and 32b address space, i.e. 64b
CPU but 32b display and GPU, or 64b CPU and GPU but 32b display.  So it is
best to decouple the device iovas from the mmap offset.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-27 11:23:09 -05:00
Chris Wilson
f54d186700 dma-buf: Rename struct fence to dma_fence
I plan to usurp the short name of struct fence for a core kernel struct,
and so I need to rename the specialised fence/timeline for DMA
operations to make room.

A consensus was reached in
https://lists.freedesktop.org/archives/dri-devel/2016-July/113083.html
that making it clear this fence applies to DMA operations was a good thing.
Since then the patch has grown a bit as usage increases, so hopefully it
remains a good thing!

(v2...: rebase, rerun spatch)
v3: Compile on msm, spotted a manual fixup that I broke.
v4: Try again for msm, sorry Daniel

coccinelle script:
@@

@@
- struct fence
+ struct dma_fence
@@

@@
- struct fence_ops
+ struct dma_fence_ops
@@

@@
- struct fence_cb
+ struct dma_fence_cb
@@

@@
- struct fence_array
+ struct dma_fence_array
@@

@@
- enum fence_flag_bits
+ enum dma_fence_flag_bits
@@

@@
(
- fence_init
+ dma_fence_init
|
- fence_release
+ dma_fence_release
|
- fence_free
+ dma_fence_free
|
- fence_get
+ dma_fence_get
|
- fence_get_rcu
+ dma_fence_get_rcu
|
- fence_put
+ dma_fence_put
|
- fence_signal
+ dma_fence_signal
|
- fence_signal_locked
+ dma_fence_signal_locked
|
- fence_default_wait
+ dma_fence_default_wait
|
- fence_add_callback
+ dma_fence_add_callback
|
- fence_remove_callback
+ dma_fence_remove_callback
|
- fence_enable_sw_signaling
+ dma_fence_enable_sw_signaling
|
- fence_is_signaled_locked
+ dma_fence_is_signaled_locked
|
- fence_is_signaled
+ dma_fence_is_signaled
|
- fence_is_later
+ dma_fence_is_later
|
- fence_later
+ dma_fence_later
|
- fence_wait_timeout
+ dma_fence_wait_timeout
|
- fence_wait_any_timeout
+ dma_fence_wait_any_timeout
|
- fence_wait
+ dma_fence_wait
|
- fence_context_alloc
+ dma_fence_context_alloc
|
- fence_array_create
+ dma_fence_array_create
|
- to_fence_array
+ to_dma_fence_array
|
- fence_is_array
+ dma_fence_is_array
|
- trace_fence_emit
+ trace_dma_fence_emit
|
- FENCE_TRACE
+ DMA_FENCE_TRACE
|
- FENCE_WARN
+ DMA_FENCE_WARN
|
- FENCE_ERR
+ DMA_FENCE_ERR
)
 (
 ...
 )

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Acked-by: Sumit Semwal <sumit.semwal@linaro.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161025120045.28839-1-chris@chris-wilson.co.uk
2016-10-25 14:40:39 +02:00
Rob Clark
f44d32c79f drm/msm: move fence allocation out of msm_gpu_submit()
Prep work for next patch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-09-15 17:43:51 -04:00
Rob Clark
4816b6267c drm/msm: print offender task name on hangcheck recovery
Track the pid per submit, so we can print the name of the task which
submitted the batch that caused the gpu to hang.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:22:18 -04:00
Rob Clark
40e6815bba drm/msm: fix leak in failed submit path
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:22:18 -04:00
Rob Clark
1193c3bcb5 drm/msm: drop return from gpu->submit()
At this point, there is nothing left to fail.  The submit already has a
fence assigned and is added to the submit_list.  Any problems from here
on out are asynchronous (i.e. hangcheck/recovery).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:22:18 -04:00
Rob Clark
b6295f9a38 drm/msm: 'struct fence' conversion
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:22:15 -04:00
Rob Clark
ca762a8ae7 drm/msm: introduce msm_fence_context
Better encapsulate the per-timeline stuff into fence-context.  For now
there is just a single fence-context, but eventually we'll also have one
per-CRTC to enable fully explicit fencing.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:19:51 -04:00
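
A hedged sketch of what a per-timeline fence context might look like; the field names are assumptions based on the description, and the seqno comparison uses the usual wrap-safe idiom.

#include <linux/spinlock.h>
#include <linux/wait.h>

struct msm_fence_context {
        struct drm_device *dev;
        const char *name;
        unsigned context;               /* allocated from the fence core */
        uint32_t last_fence;            /* last assigned seqno */
        uint32_t completed_fence;       /* last completed seqno */
        wait_queue_head_t event;
        spinlock_t spinlock;
};

/* wrap-safe "has this seqno completed?" check */
static inline bool fence_completed(struct msm_fence_context *fctx,
                                   uint32_t fence)
{
        return (int32_t)(fctx->completed_fence - fence) >= 0;
}
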
Rob Clark
7d12a279d4 drm/msm/gpu: simplify tracking in-flight bo's
Since we already track the array of bos in the submit object, just
unconditionally take and drop refs per submit (rather than only taking
refs if the bo is not already active).  This simplifies later patches.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:16:03 -04:00
Rob Clark
fde5de6cb4 drm/msm: move fence code to its own file
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:16:02 -04:00
Stephane Viau
5e921b1926 drm/msm: Fix IOMMU clean up path in case msm_iommu_new() fails
msm_iommu_new() can fail, and this change makes sure that we
detect the failure and free the allocated domain before going
any further.

Signed-off-by: Stephane Viau <sviau@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2015-10-22 15:39:54 -04:00
Rob Clark
1a370be9ac drm/msm: restart queued submits after hang
Track the list of in-flight submits.  If the GPU hangs, retire up to and
including the offending submit, and then re-submit the remainder.  This
way, for concurrently running piglit tests (for example), one failing
test doesn't cause unrelated tests to fail simply because its submit
was queued up after one that triggered a hang.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2015-06-11 13:11:06 -04:00
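
A rough sketch of the recovery flow described above; the helper, callback, and list-member names are assumptions, and error handling is elided.

/* Sketch: retire up to and including the hung submit, reset the GPU,
 * then replay whatever was queued behind it. */
static void recover_worker(struct msm_gpu *gpu)
{
        struct msm_gem_submit *submit;
        uint32_t fence = gpu->funcs->last_fence(gpu);

        /* the hung submit is the first one past the last completed fence */
        retire_submits(gpu, fence + 1);

        gpu->funcs->recover(gpu);

        /* replay the remaining in-flight submits on the freshly reset GPU */
        list_for_each_entry(submit, &gpu->submit_list, node)
                gpu->funcs->submit(gpu, submit, NULL);
}
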
Rob Clark
de558cd2ae drm/msm: adreno a306 support
As found in apq8016 (used in DragonBoard 410c) and msm8916.

Note that numerically a306 is actually 307 (since a305c already claimed
306).  Nice and confusing.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2015-06-11 13:11:01 -04:00
Rob Clark
6490ad4740 drm/msm: clarify downstream bus scaling
A few spots in the driver have support for downstream android
CONFIG_MSM_BUS_SCALING.  This is mainly to simplify backporting the
driver for various devices which do not have sufficient upstream
kernel support.  But the intentionally dead code seems to cause
some confusion.  Rename the #define to make this clearer.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2015-06-11 13:11:01 -04:00
Rob Clark
a1ad352333 drm/msm: fix potential deadlock in gpu init
Somewhere along the way, the firmware loader sprouted another lock
dependency, resulting in a possible deadlock scenario:

 &dev->struct_mutex --> &sb->s_type->i_mutex_key#2 --> &mm->mmap_sem

which is problematic vs. things like GEM mmap.

So introduce a separate mutex to synchronize gpu init.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2014-08-04 11:55:29 -04:00
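
A sketch of the shape of the fix (the helper name is hypothetical): GPU init, which loads firmware and can therefore pull in mmap_sem, is serialized with its own mutex instead of dev->struct_mutex.

#include <linux/mutex.h>

static DEFINE_MUTEX(init_lock);         /* only serializes GPU init */

static int load_gpu(struct msm_gpu *gpu)
{
        int ret;

        mutex_lock(&init_lock);
        ret = gpu_hw_init_and_load_fw(gpu);     /* hypothetical helper */
        mutex_unlock(&init_lock);

        return ret;
}
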
Rob Clark
944fc36c31 drm/msm: use upstream iommu
The downstream kernel IOMMU had a non-standard way of dealing with multiple
devices and multiple ports/contexts.  We don't need that on the upstream
kernel, so rip out the crazy.

Note that we have to move the pinning of the ringbuffer to after the
IOMMU is attached.  No idea how that managed to work properly on the
downstream kernel.

For now, I am leaving the IOMMU port name stuff in place, to simplify
things for folks trying to backport the latest drm/msm to device kernels.
Once we no longer have to care about pre-DT kernels, we can drop this
and instead backport the upstream IOMMU driver.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2014-08-04 11:55:29 -04:00
Rob Clark
70c70f091b drm/msm: add perf logging debugfs
Signed-off-by: Rob Clark <robdclark@gmail.com>
2014-06-02 07:36:21 -04:00
Rob Clark
a7d3c9509b drm/msm: add rd logging debugfs
To ease debugging, add a debugfs file which can be cat/tail'd to log
submits, along with the fence #.  If the GPU hangs, you can look at the 'gpu'
debugfs file to find the last completed fence and current register state,
and compare with the logged rd file to narrow down the DRAW_INDX which
triggered the GPU hang.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2014-06-02 07:36:11 -04:00
Rob Clark
37d77c3ab5 drm/msm: crank down gpu when inactive
Shut down the clks when the GPU has nothing to do.  A short inactivity
timer is used to provide a low-pass filter for power transitions.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2014-03-31 10:27:46 -04:00
Rob Clark
c2703b13a6 drm/msm: bigger synchronization hammer
Because we use a list_head in the bo to track its position in a submit,
we need to serialize at a higher layer.  Otherwise there are problems
when multiple contexts are SUBMIT'ing cmdstreams that reference a shared
bo in parallel.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2014-02-07 10:26:25 -05:00
Rob Clark
871d812aa4 drm/msm: add support for non-IOMMU systems
Add a VRAM carveout that is used for systems which do not have an IOMMU.

The VRAM carveout uses CMA.  The arch code must set up a CMA pool for the
device (preferably in highmem.. a 256m-512m VRAM pool in lowmem is not
cool).  The user can configure the VRAM pool size using the msm.vram module
param.

Technically, the abstraction of IOMMU behind msm_mmu is not strictly
needed, but it simplifies the GEM code a bit, and will be useful later
when I add support for a2xx devices with GPUMMU, so I decided to keep
this part.

It appears to be possible to configure the GPU to restrict access to
addresses within the VRAM pool, but this is not done yet.  So for now
the GPU will refuse to load if there is no MMU of some sort.  Once address-
based limits are supported and tested to confirm that we aren't giving
the GPU access to arbitrary memory, this restriction can be lifted.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2014-01-09 14:38:58 -05:00
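
A sketch of how such a module parameter is typically declared; the default value shown is an assumption.

#include <linux/module.h>
#include <linux/moduleparam.h>

static char *vram = "16m";      /* default size is an assumption */
MODULE_PARM_DESC(vram, "Configure VRAM size (for devices without IOMMU/GPUMMU)");
module_param(vram, charp, 0);
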
Rob Clark
bf2b33afb9 drm/msm: fix bus scaling
This got a bit broken with the original patches when re-arranging things to
move dependencies on mach-msm inside #ifndef OF.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2014-01-09 14:38:58 -05:00
Rob Clark
edd4fc63a3 drm/msm: rework inactive-work
Re-arrange things a bit so that we can get work requested after a bo
fence passes, like pageflip, done before retiring bos.  Without any
sort of bo cache in userspace, some games can trigger hundreds of
transient bos, which can cause retire to take a long time (5-10ms).
Obviously we want a bo cache.. but this cleanup will make things a
bit easier for atomic as well, and makes things a bit cleaner.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: David Brown <davidb@codeaurora.org>
2013-11-01 12:39:45 -04:00
Wei Yongjun
aea6a64c38 drm/msm: fix potential NULL pointer dereference
The dereference to 'pdata' should be moved below the NULL test.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
2013-09-12 10:32:12 -04:00
Rob Clark
6b8819c811 drm/msm: workaround for missing irq
Occasionally we seem to miss an IRQ from the ME (microengine).  I'm not
entirely sure of the root cause, but for now we can unwedge things by
retiring from the hangcheck timer.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-09-11 17:37:48 -04:00
Rob Clark
26791c48e1 drm/msm: hangcheck harder
If the GPU locks up with the rptr shortly beyond the wrap-around point in
the ringbuffer, because the rptr was not reset (but wptr is, by virtue
of resetting rb->cur), we could end up in a scenario where we think
there is not enough space in the ringbuffer for the next cmds.  And
since the CP won't reset rptr until after processing an IB, this leaves
things in a sort of deadlock.

So reset rptr too.  And a bit more spiffing up of hangcheck to make
things easier to debug.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-09-10 13:56:59 -04:00
Rob Clark
bf6811f304 drm/msm: handle read vs write fences
The userspace API already had everything needed to handle read vs. write
synchronization.  This patch actually bothers to hook it up properly, so
that we don't need to (for example) stall on userspace read access to a
buffer that the GPU is also still reading.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-09-10 13:56:58 -04:00
Rob Clark
bd6f82d828 drm/msm: add basic hangcheck/recovery mechanism
A basic, no-frills recovery mechanism in case the GPU gets wedged.  We
could try to be a bit fancier and restart the next submit after the
one that got wedged, but for now keep it simple.  This is enough to
recover things if, for example, the GPU hangs midway through a piglit
run.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-08-24 14:57:19 -04:00
Rob Clark
7198e6b031 drm/msm: add a3xx gpu support
Add initial support for a3xx 3d core.

With the hardware that I've seen to date, we can have:
 + zero, one, or two z180 2d cores
 + a3xx or a2xx 3d core, which share a common CP (the firmware
   for the CP seems to implement some different PM4 packet types
   but the basics of cmdstream submission are the same)

Which means that the eventual complete "class" hierarchy, once
support for all past and present hw is in place, becomes:
 + msm_gpu
   + adreno_gpu
     + a3xx_gpu
     + a2xx_gpu
   + z180_gpu

This commit splits out the parts that will eventually be common
between a2xx/a3xx into adreno_gpu, and the parts that are even
common to z180 into msm_gpu.

Note that there is no cmdstream validation required.  All memory access
from the GPU is via IOMMU/MMU.  So as long as you don't map silly things
to the GPU, there isn't much damage that the GPU can do.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-08-24 14:57:18 -04:00
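
A sketch of how that "class" hierarchy is typically expressed in C via struct embedding (the specific fields are illustrative): each more specific GPU type embeds the more generic one, and container_of() recovers the derived type.

#include <linux/kernel.h>       /* container_of() */

struct msm_gpu {
        const char *name;               /* state common to all cores */
};

struct adreno_gpu {
        struct msm_gpu base;            /* parts shared by a2xx/a3xx */
        uint32_t revision;              /* illustrative field */
};

struct a3xx_gpu {
        struct adreno_gpu base;         /* a3xx-specific state */
        bool hw_initialized;            /* illustrative field */
};

#define to_adreno_gpu(x) container_of(x, struct adreno_gpu, base)
#define to_a3xx_gpu(x)   container_of(x, struct a3xx_gpu, base)
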