Add the infrastructure to support the idea of multiple ringbuffers.
Assign each ringbuffer an id and use that as an index for the various
ring specific operations.
The biggest delta is to support legacy fences. Each fence gets its own
sequence number but the legacy functions expect to use a unique integer.
To handle this we return a unique identifier for each submission but
map it to a specific ring/sequence under the covers. Newer users use
a dma_fence pointer anyway so they don't care about the actual sequence
ID or ring.
The actual mechanics for multiple ringbuffers are very target specific
so this code just allows for the possibility but still only defines
one ringbuffer for each target family.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Nearly all of the buffer allocations for kernel allocate an buffer object,
virtual address and GPU iova at the same time. Make a helper function to
handle the details.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[dropped msm_fbdev conversion to new helper, since it interferes with
display-handover work, where we want to separate allocation and mapping]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Buffer object specific resources like pages, domains, sg list
need not be protected with struct_mutex. They can be protected
with a buffer object level lock. This simplifies locking and
makes it easier to avoid potential recursive locking scenarios
for SVM involving mmap_sem and struct_mutex. This also removes
unnecessary serialization when creating buffer objects, and also
between buffer object creation and GPU command submission.
Signed-off-by: Sushmita Susheelendra <ssusheel@codeaurora.org>
[robclark: squash in handling new locking for shrinker]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Currently the value written to CP_RB_WPTR is calculated on the fly as
(rb->next - rb->start). But as the code is designed rb->next is wrapped
before writing the commands so if a series of commands happened to
fit perfectly in the ringbuffer, rb->next would end up being equal to
rb->size / 4 and thus result in an out of bounds address to CP_RB_WPTR.
The easiest way to fix this is to mask WPTR when writing it to the
hardware; it makes the hardware happy and the rest of the ringbuffer
math appears to work and there isn't any point in upsetting anything.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[squash in is_power_of_2() check]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Before we can add vmap shrinking, we really need to know which vmap'ings
are currently being used. So switch to get/put interface. Stubbed put
fxns for now.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some, but not all, callers of obj->vmap() would check if return
IS_ERR(). So let's actually return an error if vmap() fails. And fixup
the call-sites that were not handling this properly.
Signed-off-by: Rob Clark <robdclark@gmail.com>
In error paths, this was being called without struct_mutex held.
Leading to panics like:
msm 1a00000.qcom,mdss_mdp: No memory protection without IOMMU
Kernel panic - not syncing: BUG!
CPU: 0 PID: 1409 Comm: cat Not tainted 4.0.0-dirty #4
Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
Call trace:
[<ffffffc000089c78>] dump_backtrace+0x0/0x118
[<ffffffc000089da0>] show_stack+0x10/0x20
[<ffffffc0006686d4>] dump_stack+0x84/0xc4
[<ffffffc0006678b4>] panic+0xd0/0x210
[<ffffffc0003e1ce4>] drm_gem_object_free+0x5c/0x60
[<ffffffc000402870>] adreno_gpu_cleanup+0x60/0x80
[<ffffffc0004035a0>] a3xx_destroy+0x20/0x70
[<ffffffc0004036f4>] a3xx_gpu_init+0x84/0x108
[<ffffffc0004018b8>] adreno_load_gpu+0x58/0x190
[<ffffffc000419dac>] msm_open+0x74/0x88
[<ffffffc0003e0a48>] drm_open+0x168/0x400
[<ffffffc0003e7210>] drm_stub_open+0xa8/0x118
[<ffffffc0001a0e84>] chrdev_open+0x94/0x198
[<ffffffc000199f88>] do_dentry_open+0x208/0x310
[<ffffffc00019a4c4>] vfs_open+0x44/0x50
[<ffffffc0001aa26c>] do_last.isra.14+0x2c4/0xc10
[<ffffffc0001aac38>] path_openat+0x80/0x5e8
[<ffffffc0001ac354>] do_filp_open+0x2c/0x98
[<ffffffc00019b60c>] do_sys_open+0x13c/0x228
[<ffffffc00019b72c>] SyS_openat+0xc/0x18
CPU1: stopping
But there isn't any particularly good reason to hold struct_mutex for
teardown, so just standardize on calling it without the mutex held and
use the _unlocked() versions for GEM obj unref'ing
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add initial support for a3xx 3d core.
So far, with hardware that I've seen to date, we can have:
+ zero, one, or two z180 2d cores
+ a3xx or a2xx 3d core, which share a common CP (the firmware
for the CP seems to implement some different PM4 packet types
but the basics of cmdstream submission are the same)
Which means that the eventual complete "class" hierarchy, once
support for all past and present hw is in place, becomes:
+ msm_gpu
+ adreno_gpu
+ a3xx_gpu
+ a2xx_gpu
+ z180_gpu
This commit splits out the parts that will eventually be common
between a2xx/a3xx into adreno_gpu, and the parts that are even
common to z180 into msm_gpu.
Note that there is no cmdstream validation required. All memory access
from the GPU is via IOMMU/MMU. So as long as you don't map silly things
to the GPU, there isn't much damage that the GPU can do.
Signed-off-by: Rob Clark <robdclark@gmail.com>