HLSQ, SP and TP registers are only accessible from a special
aperture and to make matters worse the aperture is blocked from
the CPU on targets that can support secure rendering. Luckily the
GPU hardware has its own purpose built register dumper that can
access the registers from the aperture. Add a5xx specific code
to program the crashdumper and retrieve the wayward registers
and dump them for the crash state.
Also, remove a block of registers the regular CPU accessible
list that aren't useful for debug which helps reduce the size
of the crash state file by a goodly amount.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Capture the GPU state on a GPU hang and store it for later playback
via the devcoredump facility. Only one crash state is stored at a
time on the assumption that the first hang is usually the most
interesting. The existing crash state can be cleared after capturing
it and then a new one will be captured on the next hang.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Convert the existing GPU show function to use the GPU state to
dump the information rather than reading it directly from the hardware.
This will require an additional step to capture the state before
dumping it for the existing nodes but it will greatly facilitate reusing
the same code for dumping a previously captured state from a GPU hang.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add the infrastructure to capture the current state of the GPU and
store it in memory so that it can be dumped later.
For now grab the same basic ringbuffer information and registers
that are provided by the debugfs 'gpu' node but obviously this should
be extended to capture a much larger set of GPU information.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Interrupt commands causes the CP to trigger an interrupt as the command
is processed, regardless of the GPU being done processing previous
commands. This is seen by the interrupt being delivered before the
fence is written on 8974 and is likely the cause of the additional
CP_WAIT_FOR_IDLE workaround found for a306, which would cause the CP to
wait for the GPU to go idle before triggering the interrupt.
Instead we can set the (undocumented) BIT(31) of the CACHE_FLUSH_TS
which will cause a special CACHE_FLUSH_TS interrupt to be triggered from
the GPU as the write event is processed.
Add CACHE_FLUSH_TS to the IRQ masks of A3xx and A4xx and remove the
workaround for A306.
Suggested-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The number and type of firmware files required differs for each
target. Instead of using a fixed struct member for each possible
firmware file use a generic list of files that should be loaded
on boot. Use some semi-target specific enums to help each target
find the appropriate firmware(s) that it needs to load.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add the infrastructure to support the idea of multiple ringbuffers.
Assign each ringbuffer an id and use that as an index for the various
ring specific operations.
The biggest delta is to support legacy fences. Each fence gets its own
sequence number but the legacy functions expect to use a unique integer.
To handle this we return a unique identifier for each submission but
map it to a specific ring/sequence under the covers. Newer users use
a dma_fence pointer anyway so they don't care about the actual sequence
ID or ring.
The actual mechanics for multiple ringbuffers are very target specific
so this code just allows for the possibility but still only defines
one ringbuffer for each target family.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
When we move to multiple ringbuffers we're going to store the data
in the memptrs on a per-ring basis. In order to prepare for that
move the current memptrs from the adreno namespace into msm_gpu.
This is way cleaner and immediately lets us kill off some sub
functions so there is much less cost later when we do move to
per-ring structs.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Commit eeb754746b ("drm/msm/gpu: use pm-runtime") adds a pointer
for the GPU platform device to the msm_gpu struct so we can
happily remove the same pointers from the individual GPU
structs.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
There isn't any generic code that uses ->idle so remove it.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Each of the per-generation callbacks was doing this. Lets just simplify
and move it into toplevel show() fxn.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add a new generic function to write a "64" bit value. This isn't
actually a 64 bit operation, it just writes the upper and lower
32 bit of a 64 bit value to a specified LO and HI register. If
a particular target doesn't support one of the registers it can
mark that register as SKIP and writes/reads from that register
will be quietly dropped.
This can be immediately put in place for the ringbuffer base and
the RPTR address. Both writes are converted to use
adreno_gpu_write64() with their respective high and low registers
and the high register appropriately marked as SKIP for both 32 bit
targets (a3xx and a4xx). When a5xx comes it will define valid target
registers for the 'hi' option and everything else will just work.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add some new functions to manipulate GPU registers. gpu_read64 and
gpu_write64 can read/write a 64 bit value to two 32 bit registers.
For 4XX and older these are normally perfcounter registers, but
future targets will use 64 bit addressing so there will be many
more spots where a 64 bit read and write are needed.
gpu_rmw() does a read/modify/write on a 32 bit register given a mask
and bits to OR in.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
When the GPU hardware init function fails (like say, ME_INIT timed
out) return error instead of blindly continuing on. This gives us
a small chance of saving the system before it goes boom.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
There are very few register accesses in the common code. Cut down
the list of common registers to just those that are used. This
saves const space and saves us the effort of maintaining registers
for A3XX and A4XX that don't exist or are unused.
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
We can have various combinations of 64b and 32b address space, ie. 64b
CPU but 32b display and gpu, or 64b CPU and GPU but 32b display. So
best to decouple the device iova's from mmap offset.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Not sure where it came from, but seem unintentional. And also not
needed on a420, so let's just drop it.
Signed-off-by: Rob Clark <robdclark@gmail.com>
We need this for GL_TIMESTAMP queries.
Note: currently only supported on a4xx.. a3xx doesn't have this
always-on counter. I think we could emulate it with the one CP
counter that is available, but for now it is of limited usefulness
on a3xx (since we can't seem to do time-elapsed queries in any sane
way with the existing firmware on a3xx, and if you are trying to do
profiling on a tiler you want time-elapsed). We can add that later
if it becomes useful.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Dump a bit more info when the GPU hangs, without having hang_debug
enabled (which dumps a *lot* of registers). Also dump the scratch
registers, as they are useful for determining where in the cmdstream
the GPU hung (and they seem always safe to read when GPU has hung).
Note that the freedreno gallium driver emits increasing counter values
to SCRATCH6 (to identify tile #) and SCRATCH7 (to identify draw #), so
these two in particular can be used to "triangulate" where in the
cmdstream the GPU hung.
Signed-off-by: Rob Clark <robdclark@gmail.com>