Commit Graph

800 Commits

Author SHA1 Message Date
Chris Wilson
5adfb772f8 drm/i915: Move engine reset prepare/finish to backends
In preparation to more carefully handling incomplete preemption during
reset by execlists, we move the existing code wholesale to the backends
under a couple of new reset vfuncs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-4-chris@chris-wilson.co.uk
2018-05-16 20:20:35 +01:00
Chris Wilson
a89d1f921c drm/i915: Split i915_gem_timeline into individual timelines
We need to move to a more flexible timeline that doesn't assume one
fence context per engine, and so allow for a single timeline to be used
across a combination of engines. This means that preallocating a fence
context per engine is now a hindrance, and so we want to introduce the
singular timeline. From the code perspective, this has the notable
advantage of clearing up a lot of mirky semantics and some clumsy
pointer chasing.

By splitting the timeline up into a single entity rather than an array
of per-engine timelines, we can realise the goal of the previous patch
of tracking the timeline alongside the ring.

v2: Tweak wait_for_idle to stop the compiling thinking that ret may be
uninitialised.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
2018-05-02 23:57:18 +01:00
Chris Wilson
65fcb8064d drm/i915: Move timeline from GTT to ring
In the future, we want to move a request between engines. To achieve
this, we first realise that we have two timelines in effect here. The
first runs through the GTT is required for ordering vma access, which is
tracked currently by engine. The second is implied by sequential
execution of commands inside the ringbuffer. This timeline is one that
maps to userspace's expectations when submitting requests (i.e. given the
same context, batch A is executed before batch B). As the rings's
timelines map to userspace and the GTT timeline an implementation
detail, move the timeline from the GTT into the ring itself (per-context
in logical-ring-contexts/execlists, or a global per-engine timeline for
the shared ringbuffers in legacy submission.

The two timelines are still assumed to be equivalent at the moment (no
migrating requests between engines yet) and so we can simply move from
one to the other without adding extra ordering.

v2: Reinforce that one isn't allowed to mix the engine execution
timeline with the client timeline from userspace (on the ring).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
2018-05-02 23:57:13 +01:00
Chris Wilson
643b450a59 drm/i915: Only track live rings for retiring
We don't need to track every ring for its lifetime as they are managed
by the contexts/engines. What we do want to track are the live rings so
that we can sporadically clean up requests if userspace falls behind. We
can simply restrict the gt->rings list to being only gt->live_rings.

v2: s/live/active/ for consistency with gt.active_requests

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-4-chris@chris-wilson.co.uk
2018-04-30 16:01:20 +01:00
Chris Wilson
b887d61546 drm/i915: Retire requests along rings
In the next patch, rings are the central timeline as requests may jump
between engines. Therefore in the future as we retire in order along the
engine timeline, we may retire out-of-order within a ring (as the ring now
occurs along multiple engines), leading to much hilarity in miscomputing
the position of ring->head.

As an added bonus, retiring along the ring reduces the penalty of having
one execlists client do cleanup for another (old legacy submission
shares a ring between all clients). The downside is that slow and
irregular (off the critical path) process of cleaning up stale requests
after userspace becomes a modicum less efficient.

In the long run, it will become apparent that the ordered
ring->request_list matches the ring->timeline, a fun challenge for the
future will be unifying the two lists to avoid duplication!

v2: We need both engine-order and ring-order processing to maintain our
knowledge of where individual rings have completed upto as well as
knowing what was last executing on any engine. And finally by decoupling
retiring the contexts on the engine and the timelines along the rings,
we do have to keep a reference to the context on each request
(previously it was guaranteed by the context being pinned).

v3: Not just a reference to the context, but we need to keep it pinned
as we manipulate the rings; i.e. we need a pin for both the manipulation
of the engine state during its retirements, and a separate pin for the
manipulation of the ring state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
2018-04-30 16:01:18 +01:00
Chris Wilson
ab82a0635c drm/i915: Wrap engine->context_pin() and engine->context_unpin()
Make life easier in upcoming patches by moving the context_pin and
context_unpin vfuncs into inline helpers.

v2: Fixup mock_engine to mark the context as pinned on use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk
2018-04-30 16:01:13 +01:00
Chris Wilson
1f177a131b drm/i915: Use memset64() to align the ring with MI_NOOP
When filling the ring to align the emit pointer to the next cacheline,
use memset64() rather than open-coding it. As we know that we always
have an even number of dwords, we can replace the dword loop with the
qword equivalent.

v2: s/0/MI_NOOP<<32 | MI_NOOP/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180425123718.16366-1-chris@chris-wilson.co.uk
2018-04-25 15:27:38 +01:00
Chris Wilson
f4ecfbfc32 drm/i915: Check whitelist registers across resets
Add a selftest to ensure that we restore the whitelisted registers after
rewrite the registers everytime they might be scrubbed, e.g. module
load, reset and resume. For the other volatile workaround registers, we
export their presence via debugfs and check in igt/gem_workarounds.
However, we don't export the whitelist and rather than do so, let's test
them directly in the kernel.

The test we use is to read the registers back from the CS (this helps us
be sure that the registers will be valid for MI_LRI etc). In order to
generate the expected list, we split intel_whitelist_workarounds_emit
into two phases, the first to build the list and the second to apply.
Inside the test, we only build the list and then check that list against
the hw.

v2: Filter out pre-gen8 as they do not have RING_NONPRIV.
v3: Drop unused engine parameter, no plans to use it now or future.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180414122754.569-1-chris@chris-wilson.co.uk
2018-04-14 18:36:45 +01:00
Oscar Mateo
59b449d5c8 drm/i915: Split out functions for different kinds of workarounds
There are different kind of workarounds (those that modify registers that
live in the context image, those that modify global registers, those that
whitelist registers, etc...) and they have different requirements in terms
of where they are applied and how. Also, by splitting them apart, it should
be easier to decide where a new workaround should go.

v2:
  - Add multiple MISSING_CASE
  - Rebased

v3:
  - Rename mmio_workarounds to gt_workarounds (Chris, Mika)
  - Create empty placeholders for BDW and CHV GT WAs
  - Rebased

v4: Rebased

v5:
 - Rebased
 - FORCE_TO_NONPRIV register exists since BDW, so make a path
   for it to achieve universality, even if empty (Chris)

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: appease checkpatch]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1523376767-18480-2-git-send-email-oscar.mateo@intel.com
2018-04-11 22:47:46 +01:00
Oscar Mateo
7d3c425fef drm/i915: Move a bunch of workaround-related code to its own file
This has grown to be a sizable amount of code, so move it to
its own file before we try to refactor anything. For the moment,
we are leaving behind the WA BB code and the WAs that get applied
(incorrectly) in init_clock_gating, but we will deal with it later.

v2: Use intel_ prefix for code that deals with the hardware (Chris)
v3: Rebased
v4:
  - Rebased
  - New license header
v5:
  - Rebased
  - Added some organisational notes to the file (Chris)
v6: Include DOC section in the documentation build (Jani)

Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: appease checkpatch, mostly]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1523376767-18480-1-git-send-email-oscar.mateo@intel.com
2018-04-11 22:47:01 +01:00
Chris Wilson
46b863325c drm/i915: Prefer memset64() when filling the iomap
As the ringbuffer may exist inside stolen memory, our access to it may
be via the GTT iomap. This implies we may only have WC access for which
the conventional memset() substitution of rep stos performs very badly,
so switch to the rep mov[dq] variants when available.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180319123528.28249-1-chris@chris-wilson.co.uk
2018-03-19 14:42:40 +00:00
Daniele Ceraolo Spurio
fa6f071d54 drm/i915: move gen8 irq shifts to intel_lrc.c
The only usage outside the intel_lrc.c file is in the ringbuffer
init, but the irq mask calculated there is then overwritten for
all engines that have a non-zero shift, so we can drop it.

This change is not aimed at code saving but at removing from
intel_engines information that does not apply to all gens that have
the engine. When checking without the temporary WARN_ON, code size
is basically unchanged.

v2: make the irq_shifts array static const
v3: rebase, move irq_shifts array to logical_ring_default_irqs
v4: move array inside the if and use u8 for it (Chris)

Suggested-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180314182653.26981-4-daniele.ceraolospurio@intel.com
2018-03-15 08:46:06 +00:00
Daniele Ceraolo Spurio
80b216b98b drm/i915: store all mmio bases in intel_engines
The mmio bases we're currently storing in the intel_engines array are
only valid for a subset of gens, so we need to ignore them and use
different values in some cases. Instead of doing that, we can have a
table of [starting gen, mmio base] pairs for each engine in
intel_engines and select the correct one based on the gen we're running
on in a consistent way.

v2: document that the list goes in reverse order, update starting gen
    for render (Chris)

v3: starting gen for render back to 1 to make our life easier with
    selftests (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v2
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180314182653.26981-1-daniele.ceraolospurio@intel.com
2018-03-15 08:46:06 +00:00
Chris Wilson
36620032ce drm/i915: Update ring position from request on retiring
When wedged, we do not update the ring->tail as we submit the requests
causing us to leak the ring->space upon cleaning up the wedged driver.
We can just use the value stored in rq->tail, and keep the submission
backend details away from set-wedge.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180307134226.25492-3-chris@chris-wilson.co.uk
2018-03-09 14:13:31 +00:00
Chris Wilson
e61e0f51ba drm/i915: Rename drm_i915_gem_request to i915_request
We want to de-emphasize the link between the request (dependency,
execution and fence tracking) from GEM and so rename the struct from
drm_i915_gem_request to i915_request. That is we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)

In short, the spatch:
@@

@@
- struct drm_i915_gem_request
+ struct i915_request

A corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space if of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2018-02-21 20:57:22 +00:00
Tvrtko Ursulin
c56b89f16d drm/i915: Use INTEL_GEN everywhere
Coccinelle patch:

 @@
 identifier p;
 @@
 -INTEL_INFO(p)->gen
 +INTEL_GEN(p)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180208130606.15556-12-tvrtko.ursulin@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180209215847.6660-1-chris@chris-wilson.co.uk
2018-02-09 22:29:02 +00:00
Chris Wilson
179f402550 drm/i915: Fix kerneldoc warnings for intel_ringbuffer
drivers/gpu/drm/i915/intel_ringbuffer.c:179: warning: No description found for parameter 'req'
drivers/gpu/drm/i915/intel_ringbuffer.c:741: warning: No description found for parameter 'req'
drivers/gpu/drm/i915/intel_ringbuffer.c:741: warning: No description found for parameter 'cs'

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180208111220.32293-1-chris@chris-wilson.co.uk
2018-02-08 13:58:23 +00:00
Chris Wilson
8177e11252 drm/i915: Tidy up some error messages around reset failure
On blb and pnv, we are seeing sporadic

  i915 0000:00:02.0: Resetting chip after gpu hang
  [drm:intel_gpu_reset [i915]] rcs0: timed out on STOP_RING
  [drm:i915_reset [i915]] *ERROR* Failed hw init on reset -5

which notably lack the actual root cause of the error. Ostensibly it
should be the init_ring_common() that failed, but it's error paths are
covered by DRM_ERROR.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207111545.17078-1-chris@chris-wilson.co.uk
2018-02-07 13:12:32 +00:00
Chris Wilson
8911a31c81 drm/i915: Move mi_set_context() into the legacy ringbuffer submission
The legacy i915_switch_context() is only applicable to the legacy
ringbuffer submission method, so move it from the general
i915_gem_context.c to intel_ringbuffer.c (rename pending!).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123152631.31385-2-chris@chris-wilson.co.uk
2017-11-23 16:12:06 +00:00
Chris Wilson
b1c24a6137 drm/i915: Unwind incomplete legacy context switches
The legacy context switch for ringbuffer submission is multistaged,
where each of those stages may fail. However, we were updating global
state after some stages, and so we had to force the incomplete request
to be submitted because we could not unwind. Save the global state
before performing the switches, and so enable us to unwind back to the
previous global state should any phase fail. We then must cancel the
request instead of submitting it should the construction fail.

v2: s/saved_ctx/from_ctx/; s/ctx/to_ctx/ etc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123152631.31385-1-chris@chris-wilson.co.uk
2017-11-23 16:12:04 +00:00
Chris Wilson
93c6e966b4 drm/i915: Remove i915.semaphores modparam
Having disabled the broken semaphores on Sandybridge, there is no need
for a modparam any more, so remove it in favour of a simple
HAS_LEGACY_SEMAPHORES() guard.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-5-chris@chris-wilson.co.uk
2017-11-20 21:59:09 +00:00
Chris Wilson
79e6770cb1 drm/i915: Remove obsolete ringbuffer emission for gen8+
Since removing the module parameter to force selection of ringbuffer
emission for gen8, the code is defunct. Remove it.

To put the difference into perspective, a couple of microbenchmarks
(bdw i7-5557u, 20170324):
                                        ring          execlists
exec continuous nops on all rings:   1.491us            2.223us
exec sequential nops on each ring:  12.508us           53.682us
single nop + sync:                   9.272us           30.291us

vblank_mode=0 glxgears:            ~11000fps           ~9000fps

Since the earlier submission, gen8 ringbuffer submission has fallen
further and further behind in features. So while ringbuffer may hold the
throughput crown, in terms of interactive latency, execlists is much
better. Alas, we have no convenient metrics for such, other than
demonstrating things we can do with execlists but can not using
legacy ringbuffer submission.

We have made a few improvements to lowlevel execlists throughput,
and ringbuffer currently panics on boot! (bdw i7-5557u, 20171026):

                                        ring          execlists
exec continuous nops on all rings:       n/a            1.921us
exec sequential nops on each ring:       n/a           44.621us
single nop + sync:                       n/a           21.953us

vblank_mode=0 glxgears:                  n/a          ~18500fps

References: https://bugs.freedesktop.org/show_bug.cgi?id=87725
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Once-upon-a-time-Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-2-chris@chris-wilson.co.uk
2017-11-20 21:54:58 +00:00
Chris Wilson
3fef5cda97 drm/i915: Automatic i915_switch_context for legacy
During request construction, after pinning the context we know whether
or not we have to emit a context switch. So move this common operation
from every caller into i915_gem_request_alloc() itself.

v2: Always submit the request if we emitted some commands during request
construction, as typically it also involves changes in global state.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171120102002.22254-2-chris@chris-wilson.co.uk
2017-11-20 15:56:16 +00:00
Chris Wilson
fd13821219 drm/i915: Make request's wait-for-space explicit
At the start of building a request, we would wait for roughly enough
space to fit the average request (to reduce the likelihood of having to
wait and abort partway through request construction). To achieve we
would try to begin a 0-length command packet, this just adds extra
confusion so make the wait-for-space explicit, as in the next patch we
want to move it from the backend to the i915_gem_request_alloc() so it
can ensure that the wait-for-space is the first operation in building a
new request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171115151204.8105-2-chris@chris-wilson.co.uk
2017-11-15 17:12:49 +00:00
Chris Wilson
7c2fa7faf1 drm/i915: Stop caching the "golden" renderstate
As we now record the default HW state and so only emit the "golden"
renderstate once to prepare the HW, there is no advantage in keeping the
renderstate batch around as it will never be used again.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-8-chris@chris-wilson.co.uk
2017-11-10 17:23:22 +00:00
Chris Wilson
d2b4b97933 drm/i915: Record the default hw state after reset upon load
Take a copy of the HW state after a reset upon module loading by
executing a context switch from a blank context to the kernel context,
thus saving the default hw state over the blank context image.
We can then use the default hw state to initialise any future context,
ensuring that each starts with the default view of hw state.

v2: Unmap our default state from the GTT after stealing it from the
context. This should stop us from accidentally overwriting it via the
GTT (and frees up some precious GTT space).

Testcase: igt/gem_ctx_isolation
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-7-chris@chris-wilson.co.uk
2017-11-10 17:23:10 +00:00
Chris Wilson
f4e15af7e2 drm/i915: Mark the context state as dirty/written
In the next few patches, we will want to both copy out of the context
image and write a valid image into a new context. To be completely safe,
we should then couple in our domain tracking to ensure that we don't
have any issues with stale data remaining in unwanted cachelines.

Historically, we omitted the .write=true from the call to set-gtt-domain
in i915_switch_context() in order to avoid a stall between every request
as we would want to wait for the previous context write from the gpu.
Since then, we limit the set-gtt-domain to only occur when we first bind
the vma, so once in use we will never stall, and we are sure to flush
the context following a load from swap.

Equally we never applied the lessons learnt from ringbuffer submission
to execlists; so time to apply the flush of the lrc after load as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171110142634.10551-6-chris@chris-wilson.co.uk
2017-11-10 17:20:29 +00:00
Chris Wilson
11caf5517c drm/i915: Empty the ring before disabling
An interesting snippet from Sandybridge's prm:

"Although a Ring Buffer can be enabled in the non-empty state, it must
not be disabled unless it is empty. Attempting to disable a Ring Buffer
in the non-empty state is UNDEFINED."

Let's avoid the undefined behaviour as we disable the rings prior to
reset and resume.

v2: Tell HEAD to catch up to TAIL (empty ring) first, then reset both to
0 (supposedly while stopped).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171027094311.30380-1-chris@chris-wilson.co.uk
2017-10-27 12:09:29 +01:00
Chris Wilson
aba5e27858 drm/i915: Add a hook for making the engines idle (parking) and unparking
In the next patch, we will want to install a callback when the engines
(GT as a whole) become idle and similarly when they first become busy.
To enable that callback, first rename intel_engines_mark_idle() to
intel_engines_park() and provide the companion intel_engines_unpark().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171025143943.7661-2-chris@chris-wilson.co.uk
2017-10-25 18:47:30 +01:00
Chris Wilson
3d574a6bbb drm/i915: Remove walk over obj->vma_list for the shrinker
In the next patch, we want to reduce the lock coverage within the
shrinker, and one of the dangerous walks we have is over obj->vma_list.
We are only walking the obj->vma_list in order to check whether it has
been permanently pinned by HW access, typically via use on the scanout.
But we have a couple of other long term pins, the context objects for
which we currently have to check the individual vma pin_count. If we
instead mark these using obj->pin_display, we can forgo the dangerous
and sometimes slow list iteration.

v2: Rearrange code to try and avoid confusion from false associations
due to arrangement of whitespace along with rebasing on obj->pin_global.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013202621.7276-4-chris@chris-wilson.co.uk
2017-10-16 20:44:19 +01:00
Chris Wilson
7836cd02f2 drm/i915: Keep the rings stopped until they have been re-initialized
Before modifying the ring register (RING_START, HEAD, TAIL, CTL) we
first make sure it is stopped (or else the hw may not resample the
registers). However, we do not need to let the hw restart until after we
have reprogrammed all the rings. This should help prevent situations
where pending operations on the ring may resume (because we are trying
to re-initialize following an unsuccessful GPU hang, i.e. from
i915_gem_unset_wedged).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103260
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171013131218.18013-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-10-13 20:57:29 +01:00
Chris Wilson
67e6456485 drm/i915: Provide an assert for when we expect forcewake to be held
Add assert_forcewakes_active() (the complementary function to
assert_forcewakes_inactive) that documents the requirement of a
function for its callers to be holding the forcewake ref (i.e. the
function is part of a sequence over which RC6 must be prevented).

One such example is during ringbuffer reset, where RC6 must be held
across the whole reinitialisation sequence.

v2: Include debug information in the WARN so we know which fw domain is
missing.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20171009110301.21705-5-chris@chris-wilson.co.uk
2017-10-09 17:07:29 +01:00
Michal Wajdeczko
4f044a88a8 drm/i915: Rename global i915 to i915_modparams
Our global struct with params is named exactly the same way
as new preferred name for the drm_i915_private function parameter.
To avoid such name reuse lets use different name for the global.

v5: pure rename
v6: fix

Credits-to: Coccinelle

@@
identifier n;
@@
(
-	i915.n
+	i915_modparams.n
)

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjala <ville.syrjala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170919193846.38060-1-michal.wajdeczko@intel.com
2017-09-22 14:50:36 +03:00
Chris Wilson
27a5f61b37 drm/i915: Cancel all ready but queued requests when wedging
When wedging the hw, we want to mark all in-flight requests as -EIO.
This is made slightly more complex by execlists who store the ready but
not yet submitted-to-hw requests on a private queue (an rbtree
priolist). Call into execlists to cancel not only the ELSP tracking for
the submitted requests, but also the queue of unsubmitted requests.

v2: Move the majority of engine_set_wedged to the backends (both legacy
ringbuffer and execlists handling their own lists).

Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Testcase: igt/gem_eio/in-flight-contexts
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170915173100.26470-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
2017-09-18 10:59:55 +01:00
Ville Syrjälä
c549808946 drm/i915: Mask everything in ring HWSTAM on gen6+ in ringbuffer mode
The execlist code already masks everything in the ring HWSTAM, but
the ringbuffer code doesn't. Let's go ahead and do that. Pre-gen6
platforms setup HWSTAM during irq setup already since there's just
the one register, and it also contains bits for non-ring interrupts.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170818183705.27850-13-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-15 14:42:55 +03:00
Daniele Ceraolo Spurio
486e93f72a drm/i915/lrc: allocate separate page for HWSP
On gen8+ we're currently using the PPHWSP of the kernel ctx as the
global HWSP. However, when the kernel ctx gets submitted (e.g. from
__intel_autoenable_gt_powersave) the HW will use that page as both
HWSP and PPHWSP. This causes a conflict in the register arena of the
HWSP, i.e. dword indices below 0x30. We don't current utilize this arena,
but in the following patches we will take advantage of the cached
register state for handling execlist's context status interrupt.

To avoid the conflict, instead of re-using the PPHWSP of the kernel
ctx we can allocate a separate page for the HWSP like what happens for
pre-execlists platform.

v2: Add a use-case for the register arena of the HWSP.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1499357440-34688-1-git-send-email-daniele.ceraolospurio@intel.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170913085605.18299-3-chris@chris-wilson.co.uk
2017-09-13 15:02:39 +01:00
Michel Thierry
a2d3d2655e drm/i915: Add a default case in gen7 hwsp switch-case
Gen7 won't get any new engines, and we already added VCS2 there to just
silence gcc's not handled in switch warnings.

Use a default case instead, otherwise we will need to keep adding extra
cases if changes happen in the future.

v2: Since reaching the default case is impossible, use GEM_BUG_ON (Chris).

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170830180115.907-1-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-09-08 20:07:06 +01:00
Chris Wilson
6492ca79c8 drm/i915: Enforce that CS packets are qword aligned
We require the caller to ensure that the packets they wish to emit into
the CS ring are qword aligned (i.e. have an even number of dwords).
Double check this.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20170721161101.1618-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
2017-07-27 09:38:57 +02:00
Tvrtko Ursulin
c58949f418 drm/i915: Do not re-calculate num_rings locally
Since bb8f0f5abd ("drm/i915: Split intel_engine allocation
and initialisation") intel_info->num_rings is set early in the
load sequence and so available to be used direclty in the 2nd
load phase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170616130339.23015-1-tvrtko.ursulin@linux.intel.com
2017-06-20 12:42:13 +01:00
Chris Wilson
5e5655c32d drm/i915: Micro-optimise hotpath through intel_ring_begin()
Typically, there is space available within the ring and if not we have
to wait (by definition a slow path). Rearrange the code to reduce the
number of branches and stack size for the hotpath, accomodating a slight
growth for the wait.

v2: Fix the new assert that packets are not larger than the actual ring.
v3: Make the parameters unsigned as well to make usage.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-3-chris@chris-wilson.co.uk
2017-05-04 15:40:38 +01:00
Chris Wilson
95aebcb2da drm/i915: Report the ring->space from intel_ring_update_space()
Some callers immediately want to know the current ring->space after
calling intel_ring_update_space(), which we can freely provide via the
return parameter.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-2-chris@chris-wilson.co.uk
2017-05-04 15:40:38 +01:00
Chris Wilson
605d5b3297 drm/i915: Avoid the branch in computing intel_ring_space()
Exploit the power-of-two ring size to compute the space across the
wraparound using a mask rather than a if. Convert to unsigned integers
so the operation is well defined.

References: https://bugs.freedesktop.org/show_bug.cgi?id=99671
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504130846.4807-1-chris@chris-wilson.co.uk
2017-05-04 15:40:38 +01:00
Chris Wilson
266a240bf0 drm/i915: Use engine->context_pin() to report the intel_ring
Since unifying ringbuffer/execlist submission to use
engine->pin_context, we ensure that the intel_ring is available before
we start constructing the request. We can therefore move the assignment
of the request->ring to the central i915_gem_request_alloc() and not
require it in every engine->request_alloc() callback. Another small step
towards simplification (of the core, but at a cost of handling error
pointers in less important callers of engine->pin_context).

v2: Rearrange a few branches to reduce impact of PTR_ERR() on gcc's code
generation.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170504093308.4137-1-chris@chris-wilson.co.uk
2017-05-04 11:54:43 +01:00
Joonas Lahtinen
63ffbcdadc drm/i915: Sanitize engine context sizes
Pre-calculate engine context size based on engine class and device
generation and store it in the engine instance.

v2:
- Squash and get rid of hw_context_size (Chris)

v3:
- Move after MMIO init for probing on Gen7 and 8 (Chris)
- Retained rounding (Tvrtko)
v4:
- Rebase for deferred legacy context allocation

Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Oscar Mateo <oscar.mateo@intel.com>
Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
Cc: intel-gvt-dev@lists.freedesktop.org
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-28 12:11:59 +03:00
Chris Wilson
3204c343bb drm/i915: Defer context state allocation for legacy ring submission
Almost from the outset for execlists, we used deferred allocation of the
logical context and rings. Then we ported the infrastructure for pinning
contexts back to legacy, and so now we are able to also implement
deferred allocation for context objects prior to first use on the legacy
submission.

v2: We still need to differentiate between legacy engines, Joonas is
fixing that but I want this first ;) (Joonas)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170427104651.22394-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-27 12:22:13 +01:00
Chris Wilson
0100186386 drm/i915: Poison the request before emitting commands
If we poison the request before we emit commands, it should be easier to
spot when we execute an uninitialised request.

References: https://bugs.freedesktop.org/show_bug.cgi?id=100144
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170423170619.7156-2-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-04-25 15:34:24 +01:00
Chris Wilson
e6ba9992de drm/i915: Differentiate between sw write location into ring and last hw read
We need to keep track of the last location we ask the hw to read up to
(RING_TAIL) separately from our last write location into the ring, so
that in the event of a GPU reset we do not tell the HW to proceed into
a partially written request (which can happen if that request is waiting
for an external signal before being executed).

v2: Refactor intel_ring_reset() (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100144
Testcase: igt/gem_exec_fence/await-hang
Fixes: 821ed7df6e ("drm/i915: Update reset path to fix incomplete requests")
Fixes: d55ac5bf97 ("drm/i915: Defer transfer onto execution timeline to actual hw submission")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170425130049.26147-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
2017-04-25 15:33:22 +01:00
Chris Wilson
2d6c4c8423 drm/i915: Use discardable buffers for rings
The contents of a ring are only valid between HEAD and TAIL, when the
ring is idle (HEAD == TAIL) we can simply let the pages go under memory
pressure if they are not pinned by an active context. Any new content
will be written after HEAD and so the ring will again be valid between
HEAD and TAIL, everything outside can be discarded.

Note that we take care of ensuring that we do not reset the HEAD
backwards following a GPU hang on an idle ring.

The same precautions are what enable us to use stolen memory for rings.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170420101709.27250-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2017-04-20 12:00:32 +01:00
Oscar Mateo
5ff36d36a3 drm/i915: Use the same vfunc for BSD2 ring init
If we needed to do something different for the init functions, we could
always look at the engine instance to make the distinction. But, in any
case, the two functions are virtually identical already (please notice
that BSD2_RING is only used from gen8 onwards).

With this, the init functions depends excusively on the engine class
(a fact that we will use soon).

v2: Commit message

Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1491834873-9345-3-git-send-email-oscar.mateo@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2017-04-11 12:57:51 +01:00
Chris Wilson
f42bb651d1 drm/i915: Use safer intel_uncore_wait_for_register in ring-init
While we do hold the forcewake for legacy ringbuffer initialisation, we
don't guard our access with the uncore.lock spinlock. In theory, we only
initialise when no others should be accessing the same mmio cachelines,
but in practice be safe as this is an infrequently used path and not
worth risky micro-optimisations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170411101340.31994-5-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2017-04-11 12:50:44 +01:00