One does not lightly add a new hidden struct_mutex dependency deep within
the execbuf bowels! The immediate suspicion in seeing the whitelist
cached on the context, is that it is intended to be preserved between
batches, as the kernel is quite adept at caching small allocations
itself. But no, it's sole purpose is to serialise command submission in
order to save a kmalloc on a slow, slow path!
By removing the whitelist dependency from the context, our freedom to
chop the big struct_mutex is greatly augmented.
v2: s/set_bit/__set_bit/ as the whitelist shall never be accessed
concurrently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191128113424.3885958-1-chris@chris-wilson.co.uk
The design of our interrupt handlers is that we ack the receipt of the
interrupt first, inside the critical section where the master interrupt
control is off and other cpus cannot start processing the next
interrupt; and then process the interrupt events afterwards. However,
Icelake introduced a whole new set of banked GT_IIR that are inherently
serialised and slow to retrieve the IIR and must be processed within the
critical section. We can still push our breadcrumbs out of this critical
section by using our irq_worker. On bdw+, this should not make too much
of a difference as we only slightly defer the breadcrumbs, but on icl+
this should make a big difference to our throughput of interrupts from
concurrently executing engines.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127115813.3345823-1-chris@chris-wilson.co.uk
The expected downside to commit 58b4c1a07a ("drm/i915: Reduce nested
prepare_remote_context() to a trylock") was that it would need to return
-EAGAIN to userspace in order to resolve potential mutex inversion. Such
an unsightly round trip is unnecessary if we could atomically insert a
barrier into the i915_active_fence, so make it happen.
Currently, we use the timeline->mutex (or some other named outer lock)
to order insertion into the i915_active_fence (and so individual nodes
of i915_active). Inside __i915_active_fence_set, we only need then
serialise with the interrupt handler in order to claim the timeline for
ourselves.
However, if we remove the outer lock, we need to ensure the order is
intact between not only multiple threads trying to insert themselves
into the timeline, but also with the interrupt handler completing the
previous occupant. We use xchg() on insert so that we have an ordered
sequence of insertions (and each caller knows the previous fence on
which to wait, preserving the chain of all fences in the timeline), but
we then have to cmpxchg() in the interrupt handler to avoid overwriting
the new occupant. The only nasty side-effect is having to temporarily
strip off the RCU-annotations to apply the atomic operations, otherwise
the rules are much more conventional!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112402
Fixes: 58b4c1a07a ("drm/i915: Reduce nested prepare_remote_context() to a trylock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127134527.3438410-1-chris@chris-wilson.co.uk
Now that we rapidly park the GT when the GPU idles, we often find
ourselves idling faster than the RC6 promotion timer. Thus if we tell
the GPU to enter RC6 manually as we park, we can do so quicker (by
around 50ms, half an EI on average) and marginally increase our
powersaving across all execlists platforms.
v2: Now with a selftest to check we can enter RC6 manually
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Acked-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191127095657.3209854-1-chris@chris-wilson.co.uk
During the Display Interrupt Service routine the Display Interrupt
Enable bit must be disabled, The interrupts handled, then the
Display Interrupt Enable bit must be set to prevent possible missed
interrupts.
Bspec: 49212
V2: Change Title to remove SDE reference.
V3: Fix TAB spacing.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Clint Taylor <clinton.a.taylor@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121201455.2558-1-clinton.a.taylor@intel.com
Starting with gen12, PORT_A can be connected to a transcoder
with audio support. Modify the existing logic that disabled
audio on PORT_A unconditionally.
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125125313.17584-1-kai.vehmanen@linux.intel.com
According to BSpec 53998, there is a mask of
max 8 SAGV/QGV points we need to support.
Bumping this up to keep the CI happy(currently
preventing tests to run), until all SAGV
changes land.
v2: Fix second plane where QGV points were
hardcoded as well.
v3: Change the naming of I915_NUM_SAGV_POINTS
to be I915_NUM_QGV_POINTS, as more meaningful
(Ville Syrjälä)
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112189
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125160800.14740-1-stanislav.lisovskiy@intel.com
[vsyrjala: Add missing braces around else (checkpatch), fix Bugzilla tag]
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Based on a sampling of a number of benchmarks across platforms, by
default opt for a much more lenient timeout so that we should not
adversely affect existing "good" clients.
640ms ought to be enough for anyone.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112169
Fixes: 3a7a92aba8 ("drm/i915/execlists: Force preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125162737.2161069-1-chris@chris-wilson.co.uk
The major drawback of commit 7e34f4e4aa ("drm/i915/gen8+: Add RC6 CTX
corruption WA") is that it disables RC6 while Skylake (and friends) is
active, and we do not consider the GPU idle until all outstanding
requests have been retired and the engine switched over to the kernel
context. If userspace is idle, this task falls onto our background idle
worker, which only runs roughly once a second, meaning that userspace has
to have been idle for a couple of seconds before we enable RC6 again.
Naturally, this causes us to consume considerably more energy than
before as powersaving is effectively disabled while a display server
(here's looking at you Xorg) is running.
As execlists will get a completion event as each context is completed,
we can use this interrupt to queue a retire worker bound to this engine
to cleanup idle timelines. We will then immediately notice the idle
engine (without userspace intervention or the aid of the background
retire worker) and start parking the GPU. Thus during light workloads,
we will do much more work to idle the GPU faster... Hopefully with
commensurate power saving!
v2: Watch context completions and only look at those local to the engine
when retiring to reduce the amount of excess work we perform.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315
References: 7e34f4e4aa ("drm/i915/gen8+: Add RC6 CTX corruption WA")
References: 2248a28384 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk
In the next patch, we will introduce a new asynchronous retirement
worker, fed by execlists CS events. Here we may queue a retirement as
soon as a request is submitted to HW (and completes instantly), and we
also want to process that retirement as early as possible and cannot
afford to postpone (as there may not be another opportunity to retire it
for a few seconds). To allow the new async retirer to run in parallel
with our submission, pull the __i915_request_queue (that passes the
request to HW) inside the timelines spinlock so that the retirement
cannot release the timeline before we have completed the submission.
v2: Actually to play nicely with engine_retire, we have to raise the
timeline.active_lock before releasing the HW. intel_gt_retire_requsts()
is still serialised by the outer lock so they cannot see this
intermediate state, and engine_retire is serialised by HW submission.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-2-chris@chris-wilson.co.uk
As the engine->kernel_context is used within the engine-pm barrier, we
have to be careful when emitting requests outside of the barrier, as the
strict timeline locking rules do not apply. Instead, we must ensure the
engine_park() cannot be entered as we build the request, which is
simplest by taking an explicit engine-pm wakeref around the request
construction.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-1-chris@chris-wilson.co.uk
I rushed a last minute correction to cancel_port_requests() to prevent
the snooping of *execlists->active as the inflight array was being
updated, without noticing we iterated the inflight array starting from
active! Oops.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112387
Fixes: 331bf90591 ("drm/i915/gt: Mark the execlists->active as the primary volatile access")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125112520.1760492-1-chris@chris-wilson.co.uk
Since we want to do a lockless read of the current active request, and
that request is written to by process_csb also without serialisation, we
need to instruct gcc to take care in reading the pointer itself.
Otherwise, we have observed execlists_active() to report 0x40.
[ 2400.760381] igt/para-4098 1..s. 2376479300us : process_csb: rcs0 cs-irq head=3, tail=4
[ 2400.760826] igt/para-4098 1..s. 2376479303us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000
[ 2400.761271] igt/para-4098 1..s. 2376479306us : trace_ports: rcs0: promote { b9c59:2622, b9c55:2624 }
[ 2400.761726] igt/para-4097 0d... 2376479311us : __i915_schedule: rcs0: -2147483648->3, inflight:0000000000000040, rq:ffff888208c1e940
which is impossible!
The answer is that as we keep the existing execlists->active pointing
into the array as we copy over that array, the unserialised read may see
a partial pointer value.
Fixes: df40306902 ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191125094318.1630806-1-chris@chris-wilson.co.uk
Commit 750e76b4f9 ("drm/i915/gt: Move the [class][inst] lookup for
engines onto the GT") changed the engine query to iterate over uabi
engines but left the buffer size calculation look at the physical engine
count. Difference has no practical consequence but it is nicer to align
both queries.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 750e76b4f9 ("drm/i915/gt: Move the [class][inst] lookup for engines onto the GT")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122104115.29610-1-tvrtko.ursulin@linux.intel.com
Before checking the current i915_active state for the asynchronous work
we submitted, flush any ongoing callback. This ensures that our sampling
is robust and does not sporadically fail due to bad timing as the work
is running on another cpu.
v2: Drop the fence callback sync, retiring under the lock should be good
enough to synchronize with engine_retire() and the
intel_gt_retire_requests() background worker.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122132404.690440-1-chris@chris-wilson.co.uk
Bonded request submission is designed to allow requests to execute in
parallel as laid out by the user. If the master request is already
finished before its bonded pair is submitted, the pair were not destined
to run in parallel and we lose the information about the master engine
to dictate selection of the secondary. If the second request was
required to be run on a particular engine in a virtual set, that should
have been specified, rather than left to the whims of a random
unconnected requests!
In the selftest, I made the mistake of not ensuring the master would
overlap with its bonded pairs, meaning that it could indeed complete
before we submitted the bonds. Those bonds were then free to select any
available engine in their virtual set, and not the one expected by the
test.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122112152.660743-1-chris@chris-wilson.co.uk
As we start peeking into requests for longer and longer, e.g.
incorporating use of spinlocks when only protected by an
rcu_read_lock(), we need to be careful in how we reset the request when
recycling and need to preserve any barriers that may still be in use as
the request is reset for reuse.
Quoting Linus Torvalds:
> If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU?
.. because the object can be accessed (by RCU) after the refcount has
gone down to zero, and the thing has been released.
That's the whole and only point of SLAB_TYPESAFE_BY_RCU.
That flag basically says:
"I may end up accessing this object *after* it has been free'd,
because there may be RCU lookups in flight"
This has nothing to do with constructors. It's ok if the object gets
reused as an object of the same type and does *not* get
re-initialized, because we're perfectly fine seeing old stale data.
What it guarantees is that the slab isn't shared with any other kind
of object, _and_ that the underlying pages are free'd after an RCU
quiescent period (so the pages aren't shared with another kind of
object either during an RCU walk).
And it doesn't necessarily have to have a constructor, because the
thing that a RCU walk will care about is
(a) guaranteed to be an object that *has* been on some RCU list (so
it's not a "new" object)
(b) the RCU walk needs to have logic to verify that it's still the
*same* object and hasn't been re-used as something else.
In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used
immediately, but because it gets reused as the same kind of object,
the RCU walker can "know" what parts have meaning for re-use, in a way
it couidn't if the re-use was random.
That said, it *is* subtle, and people should be careful.
> So the re-use might initialize the fields lazily, not necessarily using a ctor.
If you have a well-defined refcount, and use "atomic_inc_not_zero()"
to guard the speculative RCU access section, and use
"atomic_dec_and_test()" in the freeing section, then you should be
safe wrt new allocations.
If you have a completely new allocation that has "random stale
content", you know that it cannot be on the RCU list, so there is no
speculative access that can ever see that random content.
So the only case you need to worry about is a re-use allocation, and
you know that the refcount will start out as zero even if you don't
have a constructor.
So you can think of the refcount itself as always having a zero
constructor, *BUT* you need to be careful with ordering.
In particular, whoever does the allocation needs to then set the
refcount to a non-zero value *after* it has initialized all the other
fields. And in particular, it needs to make sure that it uses the
proper memory ordering to do so.
NOTE! One thing to be very worried about is that re-initializing
whatever RCU lists means that now the RCU walker may be walking on the
wrong list so the walker may do the right thing for this particular
entry, but it may miss walking *other* entries. So then you can get
spurious lookup failures, because the RCU walker never walked all the
way to the end of the right list. That ends up being a much more
subtle bug.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk
Use our more regular igt_flush_test() to bind the wait-for-idle and
error out instead of waiting around forever on critical failure.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121233021.507400-1-chris@chris-wilson.co.uk
Assume that intel_wakeref_get() may take the mutex, and perform other
sleeping actions in the course of its callbacks and so use might_sleep()
to ensure that all callers abide. Anything that cannot sleep has to use
e.g. intel_wakeref_get_if_active() to guarantee its avoidance of the
non-atomic paths.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121130528.309474-1-chris@chris-wilson.co.uk
As we wait upon a request, we must be holding a reference to it, and be
wary that i915_request_add() consumes the passed in reference.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121093326.134774-1-chris@chris-wilson.co.uk
Since retirement may be running in a worker on another CPU, it may be
skipped in the local intel_gt_wait_for_idle(). To ensure the state is
consistent for our sanity checks upon load, serialise with the remote
retirer by waiting on the timeline->mutex.
Outside of this use case, e.g. on suspend or module unload, we expect the
slack to be picked up by intel_gt_pm_wait_for_idle() and so prefer to
put the special case serialisation with retirement in its single user,
for now at least.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-2-chris@chris-wilson.co.uk
From inside an active timeline in the execbuf ioctl, we may try to
reclaim some space in the GGTT. We need GGTT space for all objects on
!full-ppgtt platforms, and for context images everywhere. However, to
free up space in the GGTT we may need to remove some pinned objects
(e.g. context images) that require flushing the idle barriers to remove.
For this we use the big hammer of intel_gt_wait_for_idle()
However, commit 7936a22dd4 ("drm/i915/gt: Wait for new requests in
intel_gt_retire_requests()") will continue spinning on the wait if a
timeline is active but lacks requests, as is the case during execbuf
reservation. Spinning forever is quite time consuming, so revert that
commit and start again.
In practice, the effect commit 7936a22dd4 was trying to achieve is
accomplished by commit 1683d24c14 ("drm/i915/gt: Move new timelines
to the end of active_list"), so there is no immediate rush to replace
the looping.
Testcase: igt/gem_exec_reloc/basic-range
Fixes: 7936a22dd4 ("drm/i915/gt: Wait for new requests in intel_gt_retire_requests()")
References: 1683d24c14 ("drm/i915/gt: Move new timelines to the end of active_list")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-1-chris@chris-wilson.co.uk
GuC submission path can be called from an interrupt context
and so should use a worker to avoid holding a mutex.
References: 07779a76ee ("drm/i915: Mark up the calling context for intel_wakeref_put()")
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120211321.88021-1-stuart.summers@intel.com
pm_suspend_target_state is declared under CONFIG_PM_SLEEP but only
defined under CONFIG_SUSPEND. Play safe and only use the symbol if it is
both declared and defined.
Reported-by: kbuild-all@lists.01.org
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: a70a9e998e ("drm/i915: Defer rc6 shutdown to suspend_late")
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120182209.3967833-1-chris@chris-wilson.co.uk
In commit a79ca656b6 ("drm/i915: Push the wakeref->count deferral to
the backend"), I erroneously concluded that we last modify the engine
inside __i915_request_commit() meaning that we could enable concurrent
submission for userspace as we enqueued this request. However, this
falls into a trap with other users of the engine->kernel_context waking
up and submitting their request before the idle-switch is queued, with
the result that the kernel_context is executed out-of-sequence most
likely upsetting the GPU and certainly ourselves when we try to retire
the out-of-sequence requests.
As such we need to hold onto the effective engine->kernel_context mutex
lock (via the engine pm mutex proxy) until we have finish queuing the
request to the engine.
v2: Serialise against concurrent intel_gt_retire_requests()
v3: Describe the hairy locking scheme with intel_gt_retire_requests()
for future reference.
v4: Combine timeline->lock and engine pm release; it's hairy.
Fixes: a79ca656b6 ("drm/i915: Push the wakeref->count deferral to the backend")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-2-chris@chris-wilson.co.uk
The general concept was that intel_timeline.active_count was locked by
the intel_timeline.mutex. The exception was for power management, where
the engine->kernel_context->timeline could be manipulated under the
global wakeref.mutex.
This was quite solid, as we always manipulated the timeline only while
we held an engine wakeref.
And then we started retiring requests outside of struct_mutex, only
using the timelines.active_list and the timeline->mutex. There we
started manipulating intel_timeline.active_count outside of an engine
wakeref, and so introduced a race between __engine_park() and
intel_gt_retire_requests(), a race that could result in the
engine->kernel_context not being added to the active timelines and so
losing requests, which caused us to keep the system permanently powered
up [and unloadable].
The race would be easy to close if we could take the engine wakeref for
the timeline before we retire -- except timelines are not bound to any
engine and so we would need to keep all active engines awake. The
alternative is to guard intel_timeline_enter/intel_timeline_exit for use
outside of the timeline->mutex.
Fixes: e5dadff4b0 ("drm/i915: Protect request retirement with timeline->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-1-chris@chris-wilson.co.uk
Previously, we assumed we could use mutex_trylock() within an atomic
context, falling back to a worker if contended. However, such trickery
is illegal inside interrupt context, and so we need to always use a
worker under such circumstances. As we normally are in process context,
we can typically use a plain mutex, and only defer to a work when we
know we are being called from an interrupt path.
Fixes: 51fbd8de87 ("drm/i915/pmu: Atomically acquire the gt_pm wakeref")
References: a0855d24fc ("locking/mutex: Complain upon mutex API misuse in IRQ contexts")
References: https://bugs.freedesktop.org/show_bug.cgi?id=111626
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk
Move the assert_vblank_disabled() into intel_crtc_vblank_on()
so that we don't have to inline it all over.
This does mean we now assert_vblank_disabled() during readout as well
but that is totally fine as it happens after drm_crtc_vblank_reset().
One can even argue it's what we want to do anyway to make sure
the reset actually happened.
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191118164430.27265-4-ville.syrjala@linux.intel.com
Just pass the atomic state and the crtc to intel_encoders_enable() & co.
Make life simpler when you don't have to think which state (old vs. new)
you have to pass in. Also constify the states while at it.
Reviewed-by: Manasi Navare <manasi.d.navare@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191118164430.27265-2-ville.syrjala@linux.intel.com
i915_request_add() consumes the passed in reference to the i915_request,
so if the selftest caller wishes to wait upon it afterwards, it needs to
take a reference for itself.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191120102741.3734346-1-chris@chris-wilson.co.uk