Commit Graph

73 Commits

Author SHA1 Message Date
Bjorn Andersson
79d57bf6fa drm/msm: Trigger fence completion from GPU
Interrupt commands causes the CP to trigger an interrupt as the command
is processed, regardless of the GPU being done processing previous
commands. This is seen by the interrupt being delivered before the
fence is written on 8974 and is likely the cause of the additional
CP_WAIT_FOR_IDLE workaround found for a306, which would cause the CP to
wait for the GPU to go idle before triggering the interrupt.

Instead we can set the (undocumented) BIT(31) of the CACHE_FLUSH_TS
which will cause a special CACHE_FLUSH_TS interrupt to be triggered from
the GPU as the write event is processed.

Add CACHE_FLUSH_TS to the IRQ masks of A3xx and A4xx and remove the
workaround for A306.

Suggested-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-03-19 06:33:36 -04:00
Jordan Crouse
9de43e79c1 drm/msm/adreno: Use generic function to load firmware to a buffer object
Move a5xx specific code to load firmware into a buffer object to
the generic Adreno code. This will come in useful for future targets.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-20 10:41:22 -05:00
Jordan Crouse
c5e3548c29 drm/msm/adreno: Define a list of firmware files to load per target
The number and type of firmware files required differs for each
target. Instead of using a fixed struct member for each possible
firmware file use a generic list of files that should be loaded
on boot.  Use some semi-target specific enums to help each target
find the appropriate firmware(s) that it needs to load.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-02-20 10:41:22 -05:00
Jordan Crouse
f91c14ab44 drm/msm: Add devfreq support for the GPU
Add support for devfreq to dynamically control the GPU frequency.
By default try to use the 'simple_ondemand' governor which can
adjust the frequency based on GPU load.

v2: Fix __aeabi_uldivmod issue from the 0 day bot and use
devfreq_recommended_opp() as suggested by Rob.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-01-10 14:30:03 -05:00
Jordan Crouse
999ae6edc1 drm/msm/adreno: Move clock parsing to adreno_gpu_init()
Move the clock parsing to adreno_gpu_init() to allow for target
specific probing and manipulation of the clock tables.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-01-10 08:58:42 -05:00
Jordan Crouse
1babd706b4 drm/msm/gpu: Remove unused bus scaling code
Remove the downstream bus scaling code. It isn't needed for for
compatibility with a downstream or vendor kernel. Get it out of the
way to clear space for devfreq support.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2018-01-10 08:58:42 -05:00
Colin Ian King
3a9016ba0e drm/msm: fix spelling mistake: "ringubffer" -> "ringbuffer"
Trivial fix to spelling mistake in DRM_DEV_ERROR error message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-12-13 11:01:20 -05:00
Jordan Crouse
b1fc2839d2 drm/msm: Implement preemption for A5XX targets
Implement preemption for A5XX targets - this allows multiple
ringbuffers for different priorities with automatic preemption
of a lower priority ringbuffer if a higher one is ready.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:38 -04:00
Jordan Crouse
4d87fc32df drm/msm: Make the value of RB_CNTL (almost) generic
We use a global ringbuffer size and block size for all targets and
at least for 5XX preemption we need to know the value the RB_CNTL
in several locations so it makes sense to calculate it once and use
it everywhere.

The only monkey wrench is that we need to disable the RPTR shadow
for A430 targets but that only needs to be done once and doesn't
affect A5XX so we can or in the value at init time.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:38 -04:00
Jordan Crouse
4c7085a5d5 drm/msm: Shadow current pointer in the ring until command is complete
Add a shadow pointer to track the current command being written into
the ring. Don't commit it as 'cur' until the command is submitted.
Because 'cur' is used to construct the software copy of the wptr this
ensures that somebody peeking in on the ring doesn't assume that a
command is inflight while it is being written. This isn't a huge deal
with a single ring (though technically the hangcheck could assume
the system is prematurely busy when it isn't) but it will be rather
important for preemption where the decision to preempt is based
on a non-empty ringbuffer. Without a shadow an aggressive preemption
scheme could assume that the ringbuffer is non empty and switch to it
before the CPU is done writing the command and boom.

Even though preemption won't be supported for all targets because of
the way the code is organized it is simpler to make this generic for
all targets. The extra load for non-preemption targets should be
minimal.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:37 -04:00
Jordan Crouse
a6e29a0eea drm/msm: Add a parameter query for the number of ringbuffers
In order to manage ringbuffer priority to its fullest userspace
should know how many ringbuffers it has to work with. Add a
parameter to return the number of active rings.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:37 -04:00
Jordan Crouse
f97decac5f drm/msm: Support multiple ringbuffers
Add the infrastructure to support the idea of multiple ringbuffers.
Assign each ringbuffer an id and use that as an index for the various
ring specific operations.

The biggest delta is to support legacy fences. Each fence gets its own
sequence number but the legacy functions expect to use a unique integer.
To handle this we return a unique identifier for each submission but
map it to a specific ring/sequence under the covers. Newer users use
a dma_fence pointer anyway so they don't care about the actual sequence
ID or ring.

The actual mechanics for multiple ringbuffers are very target specific
so this code just allows for the possibility but still only defines
one ringbuffer for each target family.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:36 -04:00
Jordan Crouse
cd414f3d93 drm/msm: Move memptrs to msm_gpu
When we move to multiple ringbuffers we're going to store the data
in the memptrs on a per-ring basis. In order to prepare for that
move the current memptrs from the adreno namespace into msm_gpu.
This is way cleaner and immediately lets us kill off some sub
functions so there is much less cost later when we do move to
per-ring structs.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:36 -04:00
Rob Clark
2c41ef1b6f drm/msm/adreno: deal with linux-firmware fw paths
When firmware was added to linux-firmware, it was put in a qcom sub-
directory, unlike what we'd been using before.  For a300_pfp.fw and
a300_pm4.fw symlinks were created, but we'd prefer not to have to do
this in the future.  So add support to look in both places when
loading firmware.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:31 -04:00
Rob Clark
e8f3de96a9 drm/msm/adreno: split out helper to load fw
Prep work for the next patch.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:31 -04:00
Rob Clark
eec874ce5f drm/msm/adreno: load gpu at probe/bind time
Previously, in an effort to defer initializing the gpu until firmware
was available (ie. rootfs mounted), the gpu was not loaded at when the
subdevice was bound.  Which resulted that clks/etc were requested in a
place that devm couldn't really help unwind if something failed.

Instead move request_firmware() to gpu->hw_init() and construct the gpu
earlier in adreno_bind().  To avoid the rest of the driver needing to
be aware of a gpu that hasn't managed to load firmware and hw_init()
yet, stash the gpu ptr in the adreno device's drvdata, and don't set
priv->gpu() until hw_init() succeeds.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-10-28 11:01:31 -04:00
Jordan Crouse
8223286d62 drm/msm: Add a helper function for in-kernel buffer allocations
Nearly all of the buffer allocations for kernel allocate an buffer object,
virtual address and GPU iova at the same time. Make a helper function to
handle the details.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[dropped msm_fbdev conversion to new helper, since it interferes with
display-handover work, where we want to separate allocation and mapping]
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-08-22 13:19:17 -04:00
Jordan Crouse
1267a4dfe0 drm/msm: Attach the GPU MMU when it is created
Currently the GPU MMU is attached in the adreno_gpu code but as
more and more of the GPU initialization moves to the generic
GPU path we have a need to map and use GPU memory earlier and
earlier.  There isn't any reason to defer attaching the MMU
until later so attach it right after the address space is
created so it can be used immediately.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-08-22 13:19:15 -04:00
Archit Taneja
541de4c9c9 drm/msm/adreno: Prevent unclocked access when retrieving timestamps
msm_gpu's get_timestamp() op (called by the MSM_GET_PARAM ioctl) can
result in register accesses. We need our power domain and clocks to
be active for that. Make sure they are enabled here.

Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-08-01 19:20:13 -04:00
Sushmita Susheelendra
0e08270a1f drm/msm: Separate locking of buffer resources from struct_mutex
Buffer object specific resources like pages, domains, sg list
need not be protected with struct_mutex. They can be protected
with a buffer object level lock. This simplifies locking and
makes it easier to avoid potential recursive locking scenarios
for SVM involving mmap_sem and struct_mutex. This also removes
unnecessary serialization when creating buffer objects, and also
between buffer object creation and GPU command submission.

Signed-off-by: Sushmita Susheelendra <ssusheel@codeaurora.org>
[robclark: squash in handling new locking for shrinker]
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-17 08:03:07 -04:00
Rob Clark
8bdcd949bb drm/msm: pass address-space to _get_iova() and friends
No functional change, that will come later.  But this will make it
easier to deal with dynamically created address spaces (ie. per-
process pagetables for gpu).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16 11:16:04 -04:00
Rob Clark
cb1e38181a drm/msm: fix locking inconsistency for gpu->hw_init()
Most, but not all, paths where calling the with struct_mutex held.  The
fast-path in msm_gem_get_iova() (plus some sub-code-paths that only run
the first time) was masking this issue.

So lets just always hold struct_mutex for hw_init().  And sprinkle some
WARN_ON()'s and might_lock() to avoid this sort of problem in the
future.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16 11:16:01 -04:00
Jordan Crouse
42a105e9cf drm/msm: Remove memptrs->wptr
memptrs->wptr seems to be unused. Remove it to avoid
confusing the upcoming preemption code.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16 11:16:01 -04:00
Jordan Crouse
5770fc7a56 drm/msm: Add a struct to pass configuration to msm_gpu_init()
The amount of information that we need to pass into msm_gpu_init()
is steadily increasing, so add a new struct to stabilize the function
call and make it easier to add new configuration down the line.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-06-16 11:16:00 -04:00
Jordan Crouse
bf5af4ae87 drm/msm: Hard code the GPU "slow frequency"
Some A3XX and A4XX GPU targets required that the GPU clock be
programmed to a non zero value when it was disabled so
27Mhz was chosen as the "invalid" frequency.

Even though newer targets do not have the same clock restrictions
we still write 27Mhz on clock disable and expect the clock subsystem
to round down to zero.

For unknown reasons even though the slow clock speed is always
27Mhz and it isn't actually a functional level the legacy device tree
frequency tables always defined it and then did gymnastics to work
around it.

Instead of playing the same silly games just hard code the "slow" clock
speed in the code as 27MHz and save ourselves a bit of infrastructure.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:37 -04:00
Jordan Crouse
e3689e470f drm/msm: Add MSM_PARAM_GMEM_BASE
User space needs to know where the GMEM whole starts so that they
can set up the addressing correctly.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:36 -04:00
Jordan Crouse
ee546cd34a drm/msm: Reference count address spaces
There are reasons for a memory object to outlive the file descriptor
that created it and so the address space that a buffer object is
attached to must also outlive the file descriptor. Reference count
the address space so that it can remain viable until all the objects
have released their addresses.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:36 -04:00
Jordan Crouse
9873ef0743 drm/msm: Make sure to detach the MMU during GPU cleanup
We should be detaching the MMU before destroying the address
space. To do this cleanly, the detach has to happen in
adreno_gpu_cleanup() because it needs access to structs
in adreno_gpu.c.  Plus it is better symmetry to have
the attach and detach at the same code level.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:36 -04:00
Rob Clark
de098e5fb1 drm/msm/adreno: reset ringbuffer in hw_init
We need to do this also in resume path when we need to re-hw_init().

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:31 -04:00
Rob Clark
eeb754746b drm/msm/gpu: use pm-runtime
We need to use pm-runtime properly when IOMMU is using device_link() to
control it's own clocks.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:31 -04:00
Rob Clark
c3c3ab199b drm/msm/gpu: move suspend/resume into debugfs->show
Each of the per-generation callbacks was doing this.  Lets just simplify
and move it into toplevel show() fxn.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-04-08 06:59:31 -04:00
Rob Clark
4e09b95d72 drm/msm: drop quirks binding
This was never documented or used in upstream dtb.  It is used by
downstream bindings from android device kernels.  But the quirks are
a property of the gpu revision, and as such are redundant to be listed
separately in dt.  Instead, move the quirks to the device table.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-02-06 11:28:42 -05:00
Rob Clark
de85d2b35a drm/msm: fix potential null ptr issue in non-iommu case
Fixes: 9cb07b099fb ("drm/msm: support multiple address spaces")
Reported-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2017-01-13 10:23:00 -05:00
Jordan Crouse
88b333b0ed drm/msm: Ensure that the hardware write pointer is valid
Currently the value written to CP_RB_WPTR is calculated on the fly as
(rb->next - rb->start). But as the code is designed rb->next is wrapped
before writing the commands so if a series of commands happened to
fit perfectly in the ringbuffer, rb->next would end up being equal to
rb->size / 4 and thus result in an out of bounds address to CP_RB_WPTR.

The easiest way to fix this is to mask WPTR when writing it to the
hardware; it makes the hardware happy and the rest of the ringbuffer
math appears to work and there isn't any point in upsetting anything.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[squash in is_power_of_2() check]
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-12-29 15:02:58 -05:00
Jordan Crouse
b5f103ab98 drm/msm: gpu: Add A5XX target support
Add support for the A5XX family of Adreno GPUs.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-28 15:14:15 -05:00
Jordan Crouse
4ac277cd9d drm/msm: Disable interrupts during init
Disable the interrupt during the init sequence to avoid having
interrupts fired for errors and other things that we are not
ready to handle while initializing.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-28 15:14:14 -05:00
Jordan Crouse
fb03998192 drm/msm: Add adreno_gpu_write64()
Add a new generic function to write a "64" bit value. This isn't
actually a 64 bit operation, it just writes the upper and lower
32 bit of a 64 bit value to a specified LO and HI register.  If
a particular target doesn't support one of the registers it can
mark that register as SKIP and writes/reads from that register
will be quietly dropped.

This can be immediately put in place for the ringbuffer base and
the RPTR address.  Both writes are converted to use
adreno_gpu_write64() with their respective high and low registers
and the high register appropriately marked as SKIP for both 32 bit
targets (a3xx and a4xx). When a5xx comes it will define valid target
registers for the 'hi' option and everything else will just work.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-28 15:14:12 -05:00
Jordan Crouse
c4a8d47560 drm/msm: gpu: Return error on hw_init failure
When the GPU hardware init function fails (like say, ME_INIT timed
out) return error instead of blindly continuing on. This gives us
a small chance of saving the system before it goes boom.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-28 15:14:11 -05:00
Rob Clark
398efc46f8 drm/msm/adreno: move scratch register dumping to per-gen code
Scratch registers move, annoyingly enough, in a5xx.  Move to
per-generation aNxx_recover() fxn.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-28 15:14:09 -05:00
Rob Clark
667ce33e57 drm/msm: support multiple address spaces
We can have various combinations of 64b and 32b address space, ie. 64b
CPU but 32b display and gpu, or 64b CPU and GPU but 32b display.  So
best to decouple the device iova's from mmap offset.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-11-27 11:23:09 -05:00
Rob Clark
6b597ce2f7 drm/msm: deal with arbitrary # of cmd buffers
For some optimizations coming on the userspace side, splitting larger
draw or gmem cmds into multiple cmdstream buffers, we need to support
much more than the previous small/arbitrary limit.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-16 10:09:08 -04:00
Rob Clark
18f23049f6 drm/msm: change gem->vmap() to get/put
Before we can add vmap shrinking, we really need to know which vmap'ings
are currently being used.  So switch to get/put interface.  Stubbed put
fxns for now.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-07-16 10:09:07 -04:00
Rob Clark
69a834c28f drm/msm: deal with exhausted vmap space better
Some, but not all, callers of obj->vmap() would check if return
IS_ERR().  So let's actually return an error if vmap() fails.  And fixup
the call-sites that were not handling this properly.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-06-04 14:45:48 -04:00
Rob Clark
1193c3bcb5 drm/msm: drop return from gpu->submit()
At this point, there is nothing left to fail.  And submit already has a
fence assigned and is added to the submit_list.  Any problems from here
on out are asynchronous (ie. hangcheck/recovery).

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:22:18 -04:00
Rob Clark
2755734390 drm/msm: fix ->last_fence() after recover
It is no longer true that we discard all in-flight submits on recover
(these days we only discard the first one that hung).  After the first
re-submitted batch completes it would overwrite the fence with a correct
value, but there would be a window of time which showed all re-submitted
batches as already complete.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:22:15 -04:00
Rob Clark
b6295f9a38 drm/msm: 'struct fence' conversion
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:22:15 -04:00
Rob Clark
ca762a8ae7 drm/msm: introduce msm_fence_context
Better encapsulate the per-timeline stuff into fence-context.  For now
there is just a single fence-context, but eventually we'll also have one
per-CRTC to enable fully explicit fencing.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-05-08 10:19:51 -04:00
Rob Clark
6c77d1abe6 drm/msm: add timestamp param
We need this for GL_TIMESTAMP queries.

Note: currently only supported on a4xx.. a3xx doesn't have this
always-on counter.  I think we could emulate it with the one CP
counter that is available, but for now it is of limited usefulness
on a3xx (since we can't seem to do time-elapsed queries in any sane
way with the existing firmware on a3xx, and if you are trying to do
profiling on a tiler you want time-elapsed).  We can add that later
if it becomes useful.

Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-03-03 11:55:32 -05:00
Craig Stout
7d0c5ee9f0 drm/msm/adreno: get CP_RPTR from register instead of shadow memory
As described in the downstream/kgsl driver:
Sometimes the RPTR shadow memory is unreliable causing timeouts
in adreno_idle().  Read it directly from the register instead.

Signed-off-by: Craig Stout <cstout@chromium.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-03-03 11:55:28 -05:00
Craig Stout
357ff00b08 drm/msm/adreno: support for adreno 430.
Signed-off-by: Craig Stout <cstout@chromium.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
2016-03-03 11:55:27 -05:00