Commit Graph

24 Commits

Author SHA1 Message Date
Chris Wilson
3a891a6267 drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h
We want to use intel_engine_mask_t inside i915_request.h, which means
extracting it from the general header file mess and placing it inside a
types.h. A knock on effect is that the compiler wants to warn about
type-contraction of ALL_ENGINES into intel_engine_maskt_t, so prepare
for the worst.

v2: Use intel_engine_mask_t consistently
v3: Move I915_NUM_ENGINES to its natural home at the end of the enum

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401162641.10963-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2019-04-02 15:09:08 +01:00
Chris Wilson
41a1bde367 drm/i915: Always kick the execlists tasklet after reset
With direct submission being disabled while the reset in progress, we
have a small window where we may forgo the submission of a new request
and not notice its addition during execlists_reset_finish. To close this
window, always schedule the submission tasklet on coming out of reset to
catch any residual work.

<6> [333.144082] i915: Running intel_hangcheck_live_selftests/igt_reset_engines
<3> [333.296927] i915_reset_engine(rcs0:idle): failed to idle after reset
<6> [333.296932] i915 0000:00:02.0: [drm] rcs0
<6> [333.296934] i915 0000:00:02.0: [drm] 	Hangcheck 0:a9ddf7a5 [4157 ms]
<6> [333.296936] i915 0000:00:02.0: [drm] 	Reset count: 36048 (global 754)
<6> [333.296938] i915 0000:00:02.0: [drm] 	Requests:
<6> [333.296997] i915 0000:00:02.0: [drm] 	RING_START: 0x00000000
<6> [333.296999] i915 0000:00:02.0: [drm] 	RING_HEAD:  0x00000000
<6> [333.297001] i915 0000:00:02.0: [drm] 	RING_TAIL:  0x00000000
<6> [333.297003] i915 0000:00:02.0: [drm] 	RING_CTL:   0x00000000
<6> [333.297005] i915 0000:00:02.0: [drm] 	RING_MODE:  0x00000200 [idle]
<6> [333.297007] i915 0000:00:02.0: [drm] 	RING_IMR: fffffeff
<6> [333.297010] i915 0000:00:02.0: [drm] 	ACTHD:  0x00000000_00000000
<6> [333.297012] i915 0000:00:02.0: [drm] 	BBADDR: 0x00000000_00000000
<6> [333.297015] i915 0000:00:02.0: [drm] 	DMA_FADDR: 0x00000000_00000000
<6> [333.297017] i915 0000:00:02.0: [drm] 	IPEIR: 0x00000000
<6> [333.297019] i915 0000:00:02.0: [drm] 	IPEHR: 0x00000000
<6> [333.297021] i915 0000:00:02.0: [drm] 	Execlist status: 0x00000001 00000000
<6> [333.297023] i915 0000:00:02.0: [drm] 	Execlist CSB read 5, write 5 [mmio:7], tasklet queued? no (enabled)
<6> [333.297025] i915 0000:00:02.0: [drm] 		ELSP[0] idle
<6> [333.297027] i915 0000:00:02.0: [drm] 		ELSP[1] idle
<6> [333.297028] i915 0000:00:02.0: [drm] 		HW active? 0x0
<6> [333.297044] i915 0000:00:02.0: [drm] 		Queue priority hint: -8186
<6> [333.297067] i915 0000:00:02.0: [drm] 		Q  2afac:5f2+  prio=-8186 @ 50ms: (null)
<6> [333.297068] i915 0000:00:02.0: [drm] HWSP:
<6> [333.297071] i915 0000:00:02.0: [drm] [0000] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<6> [333.297073] i915 0000:00:02.0: [drm] *
<6> [333.297075] i915 0000:00:02.0: [drm] [0040] 00000001 00000000 00000018 00000002 00000001 00000000 00000018 00000000
<6> [333.297077] i915 0000:00:02.0: [drm] [0060] 00000001 00000000 00008002 00000002 00000000 00000000 00000000 00000005
<6> [333.297079] i915 0000:00:02.0: [drm] [0080] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<6> [333.297081] i915 0000:00:02.0: [drm] *
<6> [333.297083] i915 0000:00:02.0: [drm] [00c0] 00000000 00000000 00000000 00000000 a9ddf7a5 00000000 00000000 00000000
<6> [333.297085] i915 0000:00:02.0: [drm] [00e0] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
<6> [333.297087] i915 0000:00:02.0: [drm] *
<6> [333.297089] i915 0000:00:02.0: [drm] Idle? no
<6> [333.297090] i915_reset_engine(rcs0:idle): 3000 resets
<3> [333.297092] i915/intel_hangcheck_live_selftests: igt_reset_engines failed with error -5
<3> [333.455460] i915 0000:00:02.0: Failed to idle engines, declaring wedged!
...
<0> [333.491294] i915_sel-4916    1.... 333262143us : i915_reset_engine: rcs0 flags=4
<0> [333.491328] i915_sel-4916    1.... 333262143us : execlists_reset_prepare: rcs0: depth<-0
<0> [333.491362] i915_sel-4916    1.... 333262143us : intel_engine_stop_cs: rcs0
<0> [333.491396] i915_sel-4916    1d..1 333262144us : process_csb: rcs0 cs-irq head=5, tail=5
<0> [333.491424] i915_sel-4916    1.... 333262145us : intel_gpu_reset: engine_mask=1
<0> [333.491454] kworker/-214     5.... 333262184us : i915_gem_switch_to_kernel_context: awake?=yes
<0> [333.491487] kworker/-214     5.... 333262192us : i915_request_add: rcs0 fence 2afac:1522
<0> [333.491520] kworker/-214     5.... 333262193us : i915_request_add: marking (null) as active
<0> [333.491553] i915_sel-4916    1.... 333262199us : intel_engine_cancel_stop_cs: rcs0
<0> [333.491587] i915_sel-4916    1.... 333262199us : execlists_reset_finish: rcs0: depth->0

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190313162835.30228-1-chris@chris-wilson.co.uk
2019-03-15 10:58:23 +00:00
Chris Wilson
3d60624916 drm/i915: Make I915_GEM_IDLE_TIMEOUT into a macro
Currently we use HZ/5 for detecting a dead gpu on startup, and we will
wish to reuse this value for detecting a dead gpu on suspend, so convert
it into a macro for later convenience.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190307104530.21745-1-chris@chris-wilson.co.uk
2019-03-07 18:09:26 +00:00
Tvrtko Ursulin
bbb8a9d7e0 drm/i915: GEM_WARN_ON considered harmful
GEM_WARN_ON currently has dangerous semantics where it is completely
compiled out on !GEM_DEBUG builds. This can leave users who expect it to
be more like a WARN_ON, just without a warning in non-debug builds, in
complete ignorance.

Another gotcha with it is that it cannot be used as a statement. Which is
again different from a standard kernel WARN_ON.

This patch fixes both problems by making it behave as one would expect.

It can now be used both as an expression and as statement, and also the
condition evaluates properly in all builds - code under the conditional
will therefore not unexpectedly disappear.

To satisfy call sites which really want the code under the conditional to
completely disappear, we add GEM_DEBUG_WARN_ON and convert some of the
callers to it. This one can also be used as both expression and statement.

>From the above it follows GEM_DEBUG_WARN_ON should be used in situations
where we are certain the condition will be hit during development, but at
a place in code where error can be handled to the benefit of not crashing
the machine.

GEM_WARN_ON on the other hand should be used where condition may happen in
production and we just want to distinguish the level of debugging output
emitted between the production and debug build.

v2:
 * Dropped BUG_ON hunk.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181012063142.16080-1-tvrtko.ursulin@linux.intel.com
2018-10-18 10:10:12 +01:00
Chris Wilson
9e4fa01221 drm/i915/execlists: Flush tasklet directly from reset-finish
On finishing the reset, the intention is to restart the GPU before we
relinquish the forcewake taken to handle the reset - the goal being the
GPU reloads a context before it is allowed to sleep. For this purpose,
we used tasklet_flush() which although it accomplished the goal of
restarting the GPU, carried with it a sting in its tail: it cleared the
TASKLET_STATE_SCHED bit. This meant that if another CPU queued a new
request to this engine, we would clear the flag and later attempt to
requeue the tasklet on the local CPU, breaking the per-cpu softirq
lists.

Remove the dangerous tasklet_kill() and just run the tasklet func
directly as we know it is safe to do so (the tasklets are internally
locked to allow mixed usage from direct submission).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180828152702.27536-1-chris@chris-wilson.co.uk
2018-08-29 13:49:08 +01:00
Chris Wilson
9512f985c3 drm/i915/execlists: Direct submission of new requests (avoid tasklet/ksoftirqd)
Back in commit 27af5eea54 ("drm/i915: Move execlists irq handler to a
bottom half"), we came to the conclusion that running our CSB processing
and ELSP submission from inside the irq handler was a bad idea. A really
bad idea as we could impose nearly 1s latency on other users of the
system, on average! Deferring our work to a tasklet allowed us to do the
processing with irqs enabled, reducing the impact to an average of about
50us.

We have since eradicated the use of forcewaked mmio from inside the CSB
processing and ELSP submission, bringing the impact down to around 5us
(on Kabylake); an order of magnitude better than our measurements 2
years ago on Broadwell and only about 2x worse on average than the
gem_syslatency on an unladen system.

In this iteration of the tasklet-vs-direct submission debate, we seek a
compromise where by we submit new requests immediately to the HW but
defer processing the CS interrupt onto a tasklet. We gain the advantage
of low-latency and ksoftirqd avoidance when waking up the HW, while
avoiding the system-wide starvation of our CS irq-storms.

Comparing the impact on the maximum latency observed (that is the time
stolen from an RT process) over a 120s interval, repeated several times
(using gem_syslatency, similar to RT's cyclictest) while the system is
fully laden with i915 nops, we see that direct submission an actually
improve the worse case.

Maximum latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 2)
  x Always using tasklets (a couple of >1000us outliers removed)
  + Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
|          +                                                             |
|          +                                                             |
|          +                                                             |
|          +       +                                                     |
|          + +     +                                                     |
|       +  + +     +  x     x     x                                      |
|      +++ + +     +  x  x  x  x  x  x                                   |
|      +++ + ++  + +  *x x  x  x  x  x                                   |
|      +++ + ++  + *  *x x  *  x  x  x                                   |
|    + +++ + ++  * * +*xxx  *  x  x  xx                                  |
|    * +++ + ++++* *x+**xx+ *  x  x xxxx x                               |
|   **x++++*++**+*x*x****x+ * +x xx xxxx x          x                    |
|x* ******+***************++*+***xxxxxx* xx*x     xxx +                x+|
|             |__________MA___________|                                  |
|      |______M__A________|                                              |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 118            91           186           124     125.28814     16.279137
+ 120            92           187           109     112.00833     13.458617
Difference at 95.0% confidence
	-13.2798 +/- 3.79219
	-10.5994% +/- 3.02677%
	(Student's t, pooled s = 14.9237)

However the mean latency is adversely affected:

Mean latency in microseconds of a third party RT thread
(gem_syslatency -t 120 -f 1)
  x Always using tasklets
  + Only using tasklets from CS irq, direct submission of requests
+------------------------------------------------------------------------+
|           xxxxxx                                        +   ++         |
|           xxxxxx                                        +   ++         |
|           xxxxxx                                      + +++ ++         |
|           xxxxxxx                                     +++++ ++         |
|           xxxxxxx                                     +++++ ++         |
|           xxxxxxx                                     +++++ +++        |
|           xxxxxxx                                   + ++++++++++       |
|           xxxxxxxx                                 ++ ++++++++++       |
|           xxxxxxxx                                 ++ ++++++++++       |
|          xxxxxxxxxx                                +++++++++++++++     |
|         xxxxxxxxxxx    x                           +++++++++++++++     |
|x       xxxxxxxxxxxxx   x           +            + ++++++++++++++++++  +|
|           |__A__|                                                      |
|                                                      |____A___|        |
+------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x 120         3.506         3.727         3.631     3.6321417    0.02773109
+ 120         3.834         4.149         4.039     4.0375167   0.041221676
Difference at 95.0% confidence
	0.405375 +/- 0.00888913
	11.1608% +/- 0.244735%
	(Student's t, pooled s = 0.03513)

However, since the mean latency corresponds to the amount of irqsoff
processing we have to do for a CS interrupt, we only need to speed that
up to benefit not just system latency but our own throughput.

v2: Remember to defer submissions when under reset.
v4: Only use direct submission for new requests
v5: Be aware that with mixing direct tasklet evaluation and deferred
tasklets, we may end up idling before running the deferred tasklet.
v6: Remove the redudant likely() from tasklet_is_enabled(), restrict the
annotation to reset_in_progress().
v7: Take the full timeline.lock when enabling perf_pmu stats as the
tasklet is no longer a valid guard. A consequence is that the stats are
now only valid for engines also using the timeline.lock to process
state.

Testcase: igt/gem_exec_latency/*rthog*
References: 27af5eea54 ("drm/i915: Move execlists irq handler to a bottom half")
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-9-chris@chris-wilson.co.uk
2018-06-28 22:55:10 +01:00
Chris Wilson
fe25f30483 drm/i915/execlists: Wait for ELSP submission on restart
After a reset, we will ensure that there is at least one request
submitted to HW to ensure that a context is loaded for powersaving.
Let's wait for this submission via a tasklet to complete before we drop
our forcewake, ensuring the system is ready for rc6 before we let it
possibly sleep.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180522101937.7738-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2018-05-25 13:39:49 +01:00
Chris Wilson
09a4c02e58 drm/i915: Look for an active kernel context before switching
We were not very carefully checking to see if an older request on the
engine was an earlier switch-to-kernel-context before deciding to emit a
new switch. The end result would be that we could get into a permanent
loop of trying to emit a new request to perform the switch simply to
flush the existing switch.

What we need is a means of tracking the completion of each timeline
versus the kernel context, that is to detect if a more recent request
has been submitted that would result in a switch away from the kernel
context. To realise this, we need only to look in our syncmap on the
kernel context and check that we have synchronized against all active
rings.

v2: Since all ringbuffer clients currently share the same timeline, we do
have to use the gem_context to distinguish clients.

As a bonus, include all the tracing used to debug the death inside
suspend.

v3: Test, test, test. Construct a selftest to exercise and assert the
expected behaviour that multiple switch-to-contexts do not emit
redundant requests.

Reported-by: Mika Kuoppala <mika.kuoppala@intel.com>
Fixes: a89d1f921c ("drm/i915: Split i915_gem_timeline into individual timelines")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180524081135.15278-1-chris@chris-wilson.co.uk
2018-05-24 15:51:45 +01:00
Chris Wilson
f351d087d8 drm/i915: Only sync tasklets once for recursive reset preparation
When setting up reset, we may need to recursively prepare an engine. In
which case we should only synchronously flush the tasklets on the outer
most call, the inner calls will then be inside an atomic section where
the tasklet will never be run (and so the sync will never complete).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-2-chris@chris-wilson.co.uk
2018-05-16 20:20:33 +01:00
Chris Wilson
7f961d799f drm/i915: Compile out engine debug for release
The majority of the engine state dumping is too voluminous to be useful
outside of a controlled setup, though a few do accompany severe errors.
Keep the debug dumps next to the errors, but hide the others behind a CI
compile flag. This becomes more useful when adding more dumps to latency
sensitive paths.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180426103219.22181-1-chris@chris-wilson.co.uk
2018-04-26 15:13:35 +01:00
Chris Wilson
e4d2006f8f drm/i915: Split out parking from the idle worker for reuse
We will want to park GEM before disengaging the drive^W^W^W unwedging.
Since we already do the work for idling, expose the guts as a new
function that we can then reuse.

v2: Just skip if already parked; makes it more forgiving to use by
future callers.
v3: Extract mark_busy, rename it to i915_gem_unpark and place it next to
i915_gem_park so that we can evaluate it for symmetry more easily.
Calling GEM from inside i915_request looks to be a bit of a layering
violation, for the moment I am imaging them as being notify_cb.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180406155144.27791-1-chris@chris-wilson.co.uk
2018-04-06 20:07:13 +01:00
Chris Wilson
629820fcd0 drm/i915: Show GEM_TRACE when detecting a failed GPU idle
If we timeout waiting for the GPU to idle, something went seriously
wrong. We currently dump the engine state, but we can also dump the
ftrace buffer showing our last operations (when available).

In passing, note that since commit 559e040f1f ("drm/i915: Show the GPU
state when declaring wedged", we now show the engine state twice, once
in detecting the failed idle and then again on declaring wedged.

v2: ftrace_dump() takes a parameter specifying whether to dump all cpu
buffers or the local cpu's.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180309101114.1138-1-chris@chris-wilson.co.uk
2018-03-13 21:41:09 +00:00
Tvrtko Ursulin
022d3093a9 drm/i915/icl: Prepare for more rings
Gen11 will add more VCS and VECS rings so prepare the
infrastructure to support that.

Bspec: 7021

v2: Rebase.
v3: Rebase.
v4: Rebase.
v5: Rebase.
v6:
  - Update for POR changes. (Daniele Ceraolo Spurio)
  - Add provisional guc engine ids - to be checked and confirmed.
v7:
  - Rebased.
  - Added the new ring masks.
  - Added the new HW ids.
v8:
  - Introduce I915_MAX_VCS/VECS to avoid magic numbers (Michal)

v9: increase MAX_ENGINE_INSTANCE to 3

Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180228101153.7224-1-mika.kuoppala@linux.intel.com
2018-03-01 14:13:47 +02:00
Chris Wilson
128326a10c drm/i915: Repeat the GEM_BUG_ON message in the ftrace log
As the ftrace log is overflowing the pstore capture, we lose the last
gasps from dmesg which includes the GEM_BUG_ON function:line and condition
that failed. Vital information for tracking down the bug, so append it to
the frace log as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180227211816.5546-1-chris@chris-wilson.co.uk
2018-02-28 09:31:57 +00:00
Mika Kuoppala
d5653ec322 drm/i915: Print the condition causing GEM_BUG_ON
It is easier to categorize and debug bugs if the failed condition
is in plain sight in the actual dmesg output. Make it so.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171116083954.3357-1-mika.kuoppala@linux.intel.com
2017-11-16 15:35:47 +02:00
Chris Wilson
bccd3b8311 drm/i915: Use trace_printk to provide a death rattle for GEM
Trying to enable printk debugging for GEM is fraught with the issue of
spam; interactions with HW are very frequent and often boring. However,
one instance where they are not so boring is just before a BUG; here
ftrace provides a facility to dump its ringbuffer on an oops. So for CI
let's enable trace_printk() to capture the last exchanges with HW as a
death rattle.

For example,
[   79.234110] ------------[ cut here ]------------
[   79.234137] kernel BUG at drivers/gpu/drm/i915/intel_lrc.c:907!
[   79.234145] invalid opcode: 0000 [#1] SMP
[   79.234153] Dumping ftrace buffer:
[   79.234158] ---------------------------------
...
[   79.314044] gem_conc-1059    1..s1 79203443us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.2, seqno=145
[   79.314089] gem_conc-1059    1..s. 79220800us : intel_lrc_irq_handler: bcs0 csb[1/1]: status=0x00000018:0x00000005
[   79.314133] gem_conc-1059    1..s. 79220803us : intel_lrc_irq_handler: bcs0 out[0]: ctx=5.1, seqno=145
[   79.314177] gem_conc-1062    2..s1 79230458us : intel_lrc_irq_handler: bcs0 in[0]:  ctx=8.1, seqno=146
[   79.314220] gem_conc-1062    2..s1 79230515us : intel_lrc_irq_handler: bcs0 in[0]:  ctx=8.2, seqno=147
[   79.314265] gem_conc-1059    1..s1 79230951us : intel_lrc_irq_handler: bcs0 csb[2/3]: status=0x00000012:0x00000008
[   79.314309] gem_conc-1059    1..s1 79230954us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.2, seqno=147
[   79.314353] gem_conc-1059    1..s1 79230954us : intel_lrc_irq_handler: bcs0 csb[3/3]: status=0x00008002:0x00000008
[   79.314396] gem_conc-1059    1..s1 79230955us : intel_lrc_irq_handler: bcs0 out[0]: ctx=8.1, seqno=147
[   79.314402] ---------------------------------

v2: Tweak the formatting to be more consistent between in/out.
v3: do {} while (0) stub macro protection

Suggested-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171109143019.16568-1-chris@chris-wilson.co.uk
2017-11-09 21:39:18 +00:00
Chris Wilson
4797948071 drm/i915: Squash repeated awaits on the same fence
Track the latest fence waited upon on each context, and only add a new
asynchronous wait if the new fence is more recent than the recorded
fence for that context. This requires us to filter out unordered
timelines, which are noted by DMA_FENCE_NO_CONTEXT. However, in the
absence of a universal identifier, we have to use our own
i915->mm.unordered_timeline token.

v2: Throw around the debug crutches
v3: Inline the likely case of the pre-allocation cache being full.
v4: Drop the pre-allocation support, we can lose the most recent fence
in case of allocation failure -- it just means we may emit more awaits
than strictly necessary but will not break.
v5: Trim allocation size for leaf nodes, they only need an array of u32
not pointers.
v6: Create mock_timeline to tidy selftest writing
v7: s/intel_timeline_sync_get/intel_timeline_sync_is_later/ (Tvrtko)
v8: Prune the stale sync points when we idle.
v9: Include a small benchmark in the kselftests
v10: Separate the idr implementation into its own compartment. (Tvrkto)
v11: Refactor igt_sync kselftests to avoid deep nesting (Tvrkto)
v12: __sync_leaf_idx() to assert that p->height is 0 when checking leaves
v13: kselftests to investigate struct i915_syncmap itself (Tvrtko)
v14: Foray into ascii art graphs
v15: Take into account that the random lookup/insert does 2 prng calls,
not 1, when benchmarking, and use for_each_set_bit() (Tvrtko)
v16: Improved ascii art

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170503093924.5320-4-chris@chris-wilson.co.uk
2017-05-03 11:08:48 +01:00
Chris Wilson
ae9a043b0c drm/i915: Rename conditional GEM execution macros
After a brief discussion, we settled on a naming convention for the
conditional GEM debugging data that should be clearer to the casual
user: GEM_DEBUG

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170207102319.10910-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-02-10 21:43:43 +00:00
Chris Wilson
eca56a3511 drm/i915: Mark the end of intel_ring_begin() and check in intel_ring_advance()
It is required that the caller declare the exact number of dwords they
wish to write into the ring. This is required for two reasons, we need
to allocate sufficient space for the entire command packet and we need
to be sure that the contents are completely written to avoid executing
stale data. The current interface requires for any bug to be caught in
review, the reader has to carefully count the number of
intel_ring_emit() between intel_ring_begin() and intel_ring_advance().
If we record the end of the packet of each intel_ring_begin() we can
also have CI check for us.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170206170502.30944-1-chris@chris-wilson.co.uk
2017-02-06 20:41:00 +00:00
Matthew Auld
17cfde60a1 drm/i915: introduce GEM_WARN_ON
In a similar spirit to GEM_BUG_ON we now also have GEM_WARN_ON, with the
simple goal of expressing warnings which are truly insane, and so are
only really useful for CI where we have some abusive tests.

v2:
  - use BUILD_BUG_ON_INVALID for !DEBUG_GEM
  - clarify commit message

Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161213203222.32564-1-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-12-16 21:16:36 +00:00
Matthew Auld
4cfa20c834 drm/i915: allow GEM_BUG_ON expr checking with !DEBUG_GEM
Use BUILD_BUG_ON_INVALID(expr) in GEM_BUG_ON when building without
DEBUG_GEM. This means the compiler can now check the validity of expr
without generating any code, in turn preventing us from inadvertently
breaking the build when DEBUG_GEM is not enabled.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161202184750.3843-1-matthew.auld@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2016-12-05 20:49:17 +00:00
Arnd Bergmann
71d5895aac drm/i915: avoid harmless empty-body warning
The newly added assert_kernel_context_is_current introduces a warning
when built with W=1:

drivers/gpu/drm/i915/i915_gem.c: In function ‘assert_kernel_context_is_current’:
drivers/gpu/drm/i915/i915_gem.c:4417:63: error: suggest braces around empty body in an ‘else’ statement [-Werror=empty-body]

Changing the GEM_BUG_ON() macro from an empty definition to "do { } while (0)"
makes the macro more robust to use and avoids the warning.

Fixes: 3033acab07 ("drm/i915: Queue the idling context switch after all other timelines")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161108135834.2166677-1-arnd@arndb.de
2016-11-08 17:33:47 +01:00
Chris Wilson
73cb97010d drm/i915: Combine seqno + tracking into a global timeline struct
Our timelines are more than just a seqno. They also provide an ordered
list of requests to be executed. Due to the restriction of handling
individual address spaces, we are limited to a timeline per address
space but we use a fence context per engine within.

Our first step to introducing independent timelines per context (i.e. to
allow each context to have a queue of requests to execute that have a
defined set of dependencies on other requests) is to provide a timeline
abstraction for the global execution queue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161028125858.23563-23-chris@chris-wilson.co.uk
2016-10-28 20:53:51 +01:00
Chris Wilson
d501b1d2b1 drm/i915: Add GEM debugging Kconfig option
Currently there is a #define to enable extra BUG_ON for debugging
requests and associated activities. I want to expand its use to cover
all of GEM internals (so that we can saturate the code with asserts).
We can add a Kconfig option to make it easier to enable - with the usual
caveats of not enabling unless explicitly requested.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-3-git-send-email-chris@chris-wilson.co.uk
2016-04-14 10:45:40 +01:00