In zap_shader_load_mdt(), we pass a pointer to a phys_addr_t
into dmam_alloc_coherent, which the compiler warns about:
drivers/gpu/drm/msm/adreno/a5xx_gpu.c: In function 'zap_shader_load_mdt':
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:54:50: error: passing argument 3 of 'dmam_alloc_coherent' from incompatible pointer type [-Werror=incompatible-pointer-types]
The returned DMA address is later passed on to a function that
takes a phys_addr_t, so it's clearly wrong to use the DMA
mapping interface here: the memory may be uncached, or the
address may be completely wrong if there is an IOMMU connected
to the device. What the code actually wants to do is to get
the physical address from the reserved-mem node. It goes through
the dma-mapping interfaces for obscure reasons, and this
apparently only works by chance, relying on specific bugs
in the error handling of the arm64 dma-mapping implementation.
The same problem existed in the "venus" media driver, which was
now fixed by Stanimir Varbanov after long discussions.
In order to make some progress here, I have now ported his
approach over to the adreno driver. The patch is currently
untested, and should get a good review, but it is now much
simpler than the original, and it should be obvious what
goes wrong if I made a mistake in the port.
See also: a6e2d36bf6 ("media: venus: don't abuse dma_alloc for non-DMA allocations")
Cc: Stanimir Varbanov <stanimir.varbanov@linaro.org>
Fixes: 7c65817e6d ("drm/msm: gpu: Enable zap shader for A5XX")
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-and-Tested-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Clark <robdclark@gmail.com>
When compile-testing for something other than ARCH_QCOM,
we run into a link error:
drivers/gpu/drm/msm/adreno/a5xx_gpu.o: In function `a5xx_hw_init':
a5xx_gpu.c:(.text.a5xx_hw_init+0x600): undefined reference to `qcom_mdt_get_size'
a5xx_gpu.c:(.text.a5xx_hw_init+0x93c): undefined reference to `qcom_mdt_load'
There is already an #ifdef that tries to check for CONFIG_QCOM_MDT_LOADER,
but that symbol is only meaningful when building for ARCH_QCOM.
This adds a compile-time check for ARCH_QCOM, and clarifies the
Kconfig select statement so we don't even try it for other targets.
The check for CONFIG_QCOM_MDT_LOADER can then go away, which also
improves compile-time coverage and makes the code a little nicer
to read.
Fixes: 7c65817e6d ("drm/msm: gpu: Enable zap shader for A5XX")
Acked-by: Jordan Crouse <jcrouse@codeaurora.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Clark <robdclark@gmail.com>
msm_gpu's get_timestamp() op (called by the MSM_GET_PARAM ioctl) can
result in register accesses. We need our power domain and clocks to
be active for that. Make sure they are enabled here.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
On A5XX GPU hardware clock gating needs to be turned off before
reading certain GPU registers via AHB. Turn off HWCG before calling
adreno_show() to safely dump all the registers without a system hang.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
There are some use cases wherein we need to turn off hardware clock
gating before reading certain registers. Modify the A5XX HWCG function
to allow user to enable or disable clock gating at will.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The 0xf400 and 0xf800 ranges are in the RBBM_SECVID block which may
be protected from CPU access. Skip dumping them since they are minimally
useful for debugging and they aren't worth a system hang.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Buffer object specific resources like pages, domains, sg list
need not be protected with struct_mutex. They can be protected
with a buffer object level lock. This simplifies locking and
makes it easier to avoid potential recursive locking scenarios
for SVM involving mmap_sem and struct_mutex. This also removes
unnecessary serialization when creating buffer objects, and also
between buffer object creation and GPU command submission.
Signed-off-by: Sushmita Susheelendra <ssusheel@codeaurora.org>
[robclark: squash in handling new locking for shrinker]
Signed-off-by: Rob Clark <robdclark@gmail.com>
No functional change, that will come later. But this will make it
easier to deal with dynamically created address spaces (ie. per-
process pagetables for gpu).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Most, but not all, paths where calling the with struct_mutex held. The
fast-path in msm_gem_get_iova() (plus some sub-code-paths that only run
the first time) was masking this issue.
So lets just always hold struct_mutex for hw_init(). And sprinkle some
WARN_ON()'s and might_lock() to avoid this sort of problem in the
future.
Signed-off-by: Rob Clark <robdclark@gmail.com>
memptrs->wptr seems to be unused. Remove it to avoid
confusing the upcoming preemption code.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The amount of information that we need to pass into msm_gpu_init()
is steadily increasing, so add a new struct to stabilize the function
call and make it easier to add new configuration down the line.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
There isn't any generic code that uses ->idle so remove it.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The A5XX GPU powers on in "secure" mode. In secure mode the GPU can
only render to buffers that are marked as secure and inaccessible
to the kernel and user through a series of hardware protections. In
practice secure mode is used to draw things like a UI on a secure
video frame.
In order to switch out of secure mode the GPU executes a special
shader that clears out the GMEM and other sensitve registers and
then writes a register. Because the kernel can't be trusted the
shader binary is signed and verified and programmed by the
secure world. To do this we need to read the MDT header and the
segments from the firmware location and put them in memory and
present them for approval.
For targets without secure support there is an out: if the
secure world doesn't support secure then there are no hardware
protections and we can freely write the SECVID_TRUST register from
the CPU. We don't have 100% confidence that we can query the
secure capabilities at run time but we have enough calls that
need to go right to give us some confidence that we're at least doing
something useful.
Of course if we guess wrong you trigger a permissions violation
which usually ends up in a system crash but thats a problem
that shows up immediately.
[v2: use child device per Bjorn]
[v3: use generic MDT loader per Bjorn]
[v4: use managed dma functions and ifdefs for the MDT loader]
[v5: Add depends for QCOM_MDT_LOADER]
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[robclark: fix Kconfig to use select instead of depends + #if IS_ENABLED()]
Signed-off-by: Rob Clark <robdclark@gmail.com>
If a OPP table is defined for the GPU device in the device tree use
that in lieu of the downstream style GPU frequency table. If we do
use the downstream table convert it to a OPP table so that we can
take advantage of the OPP lookup facilities later.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some A3XX and A4XX GPU targets required that the GPU clock be
programmed to a non zero value when it was disabled so
27Mhz was chosen as the "invalid" frequency.
Even though newer targets do not have the same clock restrictions
we still write 27Mhz on clock disable and expect the clock subsystem
to round down to zero.
For unknown reasons even though the slow clock speed is always
27Mhz and it isn't actually a functional level the legacy device tree
frequency tables always defined it and then did gymnastics to work
around it.
Instead of playing the same silly games just hard code the "slow" clock
speed in the code as 27MHz and save ourselves a bit of infrastructure.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
User space needs to know where the GMEM whole starts so that they
can set up the addressing correctly.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
There are reasons for a memory object to outlive the file descriptor
that created it and so the address space that a buffer object is
attached to must also outlive the file descriptor. Reference count
the address space so that it can remain viable until all the objects
have released their addresses.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We should be detaching the MMU before destroying the address
space. To do this cleanly, the detach has to happen in
adreno_gpu_cleanup() because it needs access to structs
in adreno_gpu.c. Plus it is better symmetry to have
the attach and detach at the same code level.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The interrupt status was being cleared before processing the handlers.
a5xx_rbbm_err_irq() was checking the interrupt status again, which would
likely turn out bad because the interrupt status would be 0 (or at least
different). Pass the original status to the function instead.
Also, skip clearing RBBM_AHB_ERROR from the interrupt status. The interrupt
will keep firing until the error source is cleared. Skip the clear to
avoid a storm until the error is cleared in a5xx_rbbm_err_irq().
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Instead of checking for a5xx_gpu->gpmu_iova during destroy we
accidently check a5xx_gpu->gpmu_bo.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The newly added a5xx support fails to build when debugfs is diabled:
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:849:4: error: 'struct msm_gpu_funcs' has no member named 'show'
drivers/gpu/drm/msm/adreno/a5xx_gpu.c:849:11: error: 'a5xx_show' undeclared here (not in a function); did you mean 'a5xx_irq'?
This adds a missing #ifdef.
Fixes: b5f103ab98 ("drm/msm: gpu: Add A5XX target support")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Each of the per-generation callbacks was doing this. Lets just simplify
and move it into toplevel show() fxn.
Signed-off-by: Rob Clark <robdclark@gmail.com>
This was never documented or used in upstream dtb. It is used by
downstream bindings from android device kernels. But the quirks are
a property of the gpu revision, and as such are redundant to be listed
separately in dt. Instead, move the quirks to the device table.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The original way we determined the gpu version was based on downstream
bindings from android kernel. A cleaner way is to get the version from
the compatible string.
Note that no upstream dtb uses these bindings. But the code still
supports falling back to the legacy bindings (with a warning), so that
we are still compatible with the gpu dt node from android device
kernels.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Herring <robh@kernel.org>
The plan is to use the OPP bindings. For now, remove the documentation
for qcom,gpu-pwrlevels, and make the driver fall back to a safe low
clock if the node is not present.
Note that no upstream dtb use this node. For now we keep compatibility
with this node to avoid breaking compatibility with downstream android
dt files.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Herring <robh@kernel.org>
Fixes: 9cb07b099fb ("drm/msm: support multiple address spaces")
Reported-by: Riku Voipio <riku.voipio@linaro.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Currently the value written to CP_RB_WPTR is calculated on the fly as
(rb->next - rb->start). But as the code is designed rb->next is wrapped
before writing the commands so if a series of commands happened to
fit perfectly in the ringbuffer, rb->next would end up being equal to
rb->size / 4 and thus result in an out of bounds address to CP_RB_WPTR.
The easiest way to fix this is to mask WPTR when writing it to the
hardware; it makes the hardware happy and the rest of the ringbuffer
math appears to work and there isn't any point in upsetting anything.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
[squash in is_power_of_2() check]
Signed-off-by: Rob Clark <robdclark@gmail.com>
Most 5XX targets have GPMU (Graphics Power Management Unit) that
handles a lot of the heavy lifting for power management including
thermal and limits management and dynamic power collapse. While
the GPMU itself is optional, it is usually nessesary to hit
aggressive power targets.
The GPMU firmware needs to be loaded into the GPMU at init time via a
shared hardware block of registers. Using the GPU to write the microcode
is more efficient than using the CPU so at first load create an indirect
buffer that can be executed during subsequent initalization sequences.
After loading the GPMU gets initalized through a shared register
interface and then we mostly get out of its way and let it do
its thing.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Disable the interrupt during the init sequence to avoid having
interrupts fired for errors and other things that we are not
ready to handle while initializing.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add helper functions for TYPE4 and TYPE7 ME opcodes that replace
TYPE0 and TYPE3 starting with the A5XX targets.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add a new generic function to write a "64" bit value. This isn't
actually a 64 bit operation, it just writes the upper and lower
32 bit of a 64 bit value to a specified LO and HI register. If
a particular target doesn't support one of the registers it can
mark that register as SKIP and writes/reads from that register
will be quietly dropped.
This can be immediately put in place for the ringbuffer base and
the RPTR address. Both writes are converted to use
adreno_gpu_write64() with their respective high and low registers
and the high register appropriately marked as SKIP for both 32 bit
targets (a3xx and a4xx). When a5xx comes it will define valid target
registers for the 'hi' option and everything else will just work.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add some new functions to manipulate GPU registers. gpu_read64 and
gpu_write64 can read/write a 64 bit value to two 32 bit registers.
For 4XX and older these are normally perfcounter registers, but
future targets will use 64 bit addressing so there will be many
more spots where a 64 bit read and write are needed.
gpu_rmw() does a read/modify/write on a 32 bit register given a mask
and bits to OR in.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
When the GPU hardware init function fails (like say, ME_INIT timed
out) return error instead of blindly continuing on. This gives us
a small chance of saving the system before it goes boom.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
There are very few register accesses in the common code. Cut down
the list of common registers to just those that are used. This
saves const space and saves us the effort of maintaining registers
for A3XX and A4XX that don't exist or are unused.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
For a5xx the gpu is 64b so we need to change iova to 64b everywhere. On
the display side, iova is still 32b so it can ignore the upper bits.
(Although all the armv8 devices have an iommu that can map 64b pa to 32b
iova.)
Signed-off-by: Rob Clark <robdclark@gmail.com>
We can have various combinations of 64b and 32b address space, ie. 64b
CPU but 32b display and gpu, or 64b CPU and GPU but 32b display. So
best to decouple the device iova's from mmap offset.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We get 2 warnings when building kernel with W=1:
drivers/gpu/drm/msm/adreno/a3xx_gpu.c:535:17: warning: no previous prototype for 'a3xx_gpu_init' [-Wmissing-prototypes]
drivers/gpu/drm/msm/adreno/a4xx_gpu.c:624:17: warning: no previous prototype for 'a4xx_gpu_init' [-Wmissing-prototypes]
In fact, both functions are declared in
drivers/gpu/drm/msm/adreno/adreno_device.c, but should be declared
in a header file. So this patch moves both function declarations to
drivers/gpu/drm/msm/adreno/adreno_gpu.h.
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1477127865-9381-1-git-send-email-baoyou.xie@linaro.org
For some optimizations coming on the userspace side, splitting larger
draw or gmem cmds into multiple cmdstream buffers, we need to support
much more than the previous small/arbitrary limit.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Before we can add vmap shrinking, we really need to know which vmap'ings
are currently being used. So switch to get/put interface. Stubbed put
fxns for now.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some, but not all, callers of obj->vmap() would check if return
IS_ERR(). So let's actually return an error if vmap() fails. And fixup
the call-sites that were not handling this properly.
Signed-off-by: Rob Clark <robdclark@gmail.com>
At this point, there is nothing left to fail. And submit already has a
fence assigned and is added to the submit_list. Any problems from here
on out are asynchronous (ie. hangcheck/recovery).
Signed-off-by: Rob Clark <robdclark@gmail.com>
It is no longer true that we discard all in-flight submits on recover
(these days we only discard the first one that hung). After the first
re-submitted batch completes it would overwrite the fence with a correct
value, but there would be a window of time which showed all re-submitted
batches as already complete.
Signed-off-by: Rob Clark <robdclark@gmail.com>