In order to ensure constant caches are invalidated
properly with a0, we need extra hdc flush after invalidation.
v2: use IS_TGL_REVID (Chris)
References: HSDES#1604544889
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-4-mika.kuoppala@linux.intel.com
Add hdc pipeline flush to ensure memory state is coherent
in L3 when we are done.
v2: Flush also in breadcrumbs (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-3-mika.kuoppala@linux.intel.com
Aim for completeness and invalidate also the ro parts
in l3 cache. This might allow to get rid of the preparser
disable/enable workaround on invalidation path.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015154449.10338-2-mika.kuoppala@linux.intel.com
There is no significance to our delay before clearing the semaphore the
engine is waiting on, so release it as soon as we acknowledge the CS
update following our preemption request. This should allow the GPU to
resume work earlier, if it was stuck on the semaphore at the end of a
request.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191015093204.25693-1-chris@chris-wilson.co.uk
We rely on only the tasklet being allowed to call into process_csb(), so
assert that is locked when we do. As the tasklet uses a simple bitlock,
there is no strong lockdep checking so we must make do with a plain
assertion that the tasklet is running and assume that we are the
tasklet!
v2: Fixup intel_gt_sanitize() to prepare each engine for the reset so
that the locks are marked as held during the reset
v3: Check for existent function pointers for very early sanitisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191014121336.30137-1-chris@chris-wilson.co.uk
Since commit e2144503bf ("drm/i915: Prevent bonded requests from
overtaking each other on preemption") we have restricted requests to run
on their chosen engine across preemption events. We can take this
restriction into account to know that we will want to resubmit those
requests onto the same physical engine, and so can shortcircuit the
virtual engine selection process and keep the request on the same
engine during unwind.
References: e2144503bf ("drm/i915: Prevent bonded requests from overtaking each other on preemption")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Ramlingam C <ramalingam.c@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191013203012.25208-1-chris@chris-wilson.co.uk
Sometimes we want to emit a terminator request, a request that flushes
the pipeline and allows no request to come after it. This can be used
for a "preempt-to-idle" to ensure that upon processing the
context-switch to that request, all other active contexts have been
flushed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191012070136.32058-1-chris@chris-wilson.co.uk
We set out-of-bound parameters inside the i915_requests.flags field,
such as disabling preemption or marking the end-of-context. We should
not coalesce consecutive requests if they have differing instructions
as we only inspect the last active request in a context. Thus if we
allow a later request to be merged into the same execution context, it
will mask any of the earlier flags.
References: 2a98f4e65b ("drm/i915: add infrastructure to hold off preemption on a request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011190325.10979-9-chris@chris-wilson.co.uk
Only the requests that have not completed do we want to change the
status of to signal the -EIO when cancelling the inflight set of requests
upon wedging.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191011103345.26013-1-chris@chris-wilson.co.uk
Before we BUG out with bad pending state, leave a telltale as to which
test failed.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191010071434.31195-2-chris@chris-wilson.co.uk
There are small differences between the blitter and the video engines in
the xcs context image (e.g. registers 0x200 and 0x204 only exist on the
blitter). Since we never explicitly set a value for those register and
given that we don't need to update the offsets in the lrc image when we
change engine within the class for virtual engine because the HW can
handle that, instead of having a separate define for the BCS we can
just restrict the programming to the part we're interested in, which is
common across the engines.
Bspec: 45584
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009230424.6507-2-daniele.ceraolospurio@intel.com
The specs don't mention any specific HW limitation on the blitter and
manual inspection shows that the HW does set the relative MMIO bit in
the LRI of the blitter context image, so we can remove our limitations.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009230424.6507-1-daniele.ceraolospurio@intel.com
Now that we dropped the engine->active.lock serialisation from around
process_csb(), direct submission can run concurrently to the interrupt
handler. As such execlists->active may be advanced as we dequeue,
dropping the reference to the request. We need to employ our RCU request
protection to ensure that the request is not freed too early.
Fixes: df40306902 ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191009100955.21477-1-chris@chris-wilson.co.uk
With the introduction of ctx->engines[] we allow multiple logical
contexts to be used on the same engine (e.g. with virtual engines).
According to bspec, aach logical context requires a unique tag in order
for context-switching to occur correctly between them. [Simple
experiments show that it is not so easy to trick the HW into performing
a lite-restore with matching logical IDs, though my memory from early
Broadwell experiments do suggest that it should be generating
lite-restores.]
We only need to keep a unique tag for the active lifetime of the
context, and for as long as we need to identify that context. The HW
uses the tag to determine if it should use a lite-restore (why not the
LRCA?) and passes the tag back for various status identifies. The only
status we need to track is for OA, so when using perf, we assign the
specific context a unique tag.
v2: Calculate required number of tags to fill ELSP.
Fixes: 976b55f0e1 ("drm/i915: Allow a context to define its set of engines")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111895
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-14-chris@chris-wilson.co.uk
If execlists's lite-restore is based on the common GEM context tag
rather than the per-intel_context LRCA, then a context switch between
two intel_contexts on the same engine derived from the same GEM context
will perform a lite-restore instead of a full context switch. We can
exploit this by poisoning the ringbuffer of the first context and trying
to trick a simple RING_TAIL update (i.e. lite-restore)
v2: Also check what happens if preempt ce[0] with ce[1] (both instances
on the same engine from the same parent context) [Tvrtko]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191002183459.26614-1-chris@chris-wilson.co.uk
Some of our commands (MI_FLUSH_DW / PIPE_CONTROL) require a post-sync write
operation to be performed. Currently we're using dedicated VMA for
PIPE_CONTROL and global HWSP for MI_FLUSH_DW.
On execlists platforms, each of our contexts has an area that can be
used as scratch space. Let's use that instead.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190926133142.2838-2-chris@chris-wilson.co.uk
Having decided that we only care about the promotion predicate, we can
simplify gen12_csb_parse to simply check whether we need to jump to a
new queue.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190925130845.17952-1-chris@chris-wilson.co.uk
Before we submit the first context to HW, we need to construct a valid
image of the register state. This layout is defined by the HW and should
match the layout generated by HW when it saves the context image.
Asserting that this should be equivalent should help avoid any undefined
behaviour and verify that we haven't missed anything important!
Of course, having insisted that the initial register state within the
LRC should match that returned by HW, we need to ensure that it does.
v2: Drop the RELATIVE_MMIO flag from gen11, we ignore it for
constructing the lrc image.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190924145950.3011-1-chris@chris-wilson.co.uk
Force bonded requests to run on distinct engines so that they cannot be
shuffled onto the same engine where timeslicing will reverse the order.
A bonded request will often wait on a semaphore signaled by its master,
creating an implicit dependency -- if we ignore that implicit dependency
and allow the bonded request to run on the same engine and before its
master, we will cause a GPU hang. [Whether it will hang the GPU is
debatable, we should keep on timeslicing and each timeslice should be
"accidentally" counted as forward progress, in which case it should run
but at one-half to one-third speed.]
We can prevent this inversion by restricting which engines we allow
ourselves to jump to upon preemption, i.e. baking in the arrangement
established at first execution. (We should also consider capturing the
implicit dependency using i915_sched_add_dependency(), but first we need
to think about the constraints that requires on the execution/retirement
ordering.)
Fixes: 8ee36e048c ("drm/i915/execlists: Minimalistic timeslicing")
References: ee1136908e ("drm/i915/execlists: Virtual engine bonding")
Testcase: igt/gem_exec_balancer/bonded-slice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-3-chris@chris-wilson.co.uk
Due to the nature of preempt-to-busy the execlists active tracking and
the schedule queue may become temporarily desync'ed (between resubmission
to HW and its ack from HW). This means that we may have unwound a
request and passed it back to the virtual engine, but it is still
inflight on the HW and may even result in a GPU hang. If we detect that
GPU hang and try to reset, the hanging request->engine will no longer
match the current engine, which means that the request is not on the
execlists active list and we should not try to find an older incomplete
request. Given that we have deduced this must be a request on a virtual
engine, it is the single active request in the context and so must be
guilty (as the context is still inflight, it is prevented from being
executed on another engine as we process the reset).
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-2-chris@chris-wilson.co.uk
As preempt-to-busy leaves the request on the HW as the resubmission is
processed, that request may complete in the background and even cause a
second virtual request to enter queue. This second virtual request
breaks our "single request in the virtual pipeline" assumptions.
Furthermore, as the virtual request may be completed and retired, we
lose the reference the virtual engine assumes is held. Normally, just
removing the request from the scheduler queue removes it from the
engine, but the virtual engine keeps track of its singleton request via
its ve->request. This pointer needs protecting with a reference.
v2: Drop unnecessary motion of rq->engine = owner
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923152844.8914-1-chris@chris-wilson.co.uk
Pull setting -EIO on the hung requests into its own utility function.
Having allowed ourselves to short-circuit submission of completed
requests, we can now do the mark_eio() prior to submission and avoid
some redundant operations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-4-chris@chris-wilson.co.uk
If we are asked to submit a completed request, just move it onto the
active-list without modifying it's payload. If we try to emit the
modified payload of a completed request, we risk racing with the
ring->head update during retirement which may advance the head past our
breadcrumb and so we generate a warning for the emission being behind
the RING_HEAD.
v2: Commentary for the sneaky, shared responsibility between functions.
v3: Spelling mistakes and bonus assertion
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-3-chris@chris-wilson.co.uk
Since amalgamating the queued and active lists in commit 422d7df4f0
("drm/i915: Replace engine->timeline with a plain list"), performing a
i915_request_submit() will remove the request from the execlists
priority queue.
References: 422d7df4f0 ("drm/i915: Replace engine->timeline with a plain list")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-2-chris@chris-wilson.co.uk
A gpu hang can occur at any time, given a sufficiently angry gpu. An
example is when it forgets to perform a context-switch at the end of a
request, leaving us with a hanging GPU on a completed request. Here, we
may retire the request, only leaving its context alive via the active
barrier. When we reset the GPU on a completed request, we do not modify
its context image (just updating the ring state) and can safely defer
the assertion that we have the image pinned and ready to modify.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111639
Fixes: dffa8feb30 ("drm/i915/perf: Assert locking for i915_init_oa_perf_state()")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190923110056.15176-1-chris@chris-wilson.co.uk
The request->timeline is only valid until the request is retired (i.e.
before it is completed). Upon retiring the request, the context may be
unpinned and freed, and along with it the timeline may be freed. We
therefore need to be very careful when chasing rq->timeline that the
pointer does not disappear beneath us. The vast majority of users are in
a protected context, either during request construction or retirement,
where the timeline->mutex is held and the timeline cannot disappear. It
is those few off the beaten path (where we access a second timeline) that
need extra scrutiny -- to be added in the next patch after first adding
the warnings about dangerous access.
One complication, where we cannot use the timeline->mutex itself, is
during request submission onto hardware (under spinlocks). Here, we want
to check on the timeline to finalize the breadcrumb, and so we need to
impose a second rule to ensure that the request->timeline is indeed
valid. As we are submitting the request, it's context and timeline must
be pinned, as it will be used by the hardware. Since it is pinned, we
know the request->timeline must still be valid, and we cannot submit the
idle barrier until after we release the engine->active.lock, ergo while
submitting and holding that spinlock, a second thread cannot release the
timeline.
v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
as we need it, and tidy up acquiring the timeline with a bit of
refactoring (i915_active_add_request)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
Before we execute a batch, we must first issue any and all TLB
invalidations so that batch picks up the new page table entries.
Tigerlake's preparser is weakening our post-sync CS_STALL inside the
invalidate pipe-control and allowing the loading of the batch buffer
before we have setup its page table (and so it loads the wrong page and
executes indefinitely).
The igt_cs_tlb indicates that this issue can only be observed on rcs,
even though the preparser is common to all engines. Alternatively, we
could do TLB shootdown via mmio on updating the GTT.
By inserting the pre-parser disable inside EMIT_INVALIDATE, we will also
accidentally fixup execution that writes into subsequent batches, such
as gem_exec_whisper and even relocations performed on the GPU. We should
be careful not to allow this disable to become baked into the uABI! The
issue is that if userspace relies on our disabling of the HW
optimisation, when we are ready to enable that optimisation, userspace
will then be broken...
Testcase: igt/i915_selftests/live_gtt/igt_cs_tlb
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111753
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919151811.9526-1-chris@chris-wilson.co.uk
On Tigerlake, MI_SEMAPHORE_WAIT grew an extra dword, so be sure to
update the length field and emit that extra parameter and any padding
noop as required.
v2: Define the token shift while we are adding the updated MI_SEMAPHORE_WAIT
v3: Use int instead of bool in the addition so that readers are not left
wondering about the intricacies of the C spec. Now they just have to
worry what the integer value of a boolean operation is...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190917123055.28965-1-chris@chris-wilson.co.uk
We see failures where the context continues executing past a
preemption event, eventually leading to situations where a request has
executed before we have event submitted it to HW! It seems like tgl is
ignoring our RING_TAIL updates, but more likely is that there is a
missing update required for our semaphore waits around preemption.
v2: And disable internal semaphore usage
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912132313.12751-1-chris@chris-wilson.co.uk
After we manipulate the context to allow replay after a GPU reset, force
that context to be reloaded. This should be a layer of paranoia, for if
the GPU was reset, the context will no longer be resident!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912092933.4729-2-chris@chris-wilson.co.uk
After a GPU reset, we need to drain all the CS events so that we have an
accurate picture of the execlists state at the time of the reset. Be
paranoid and force a read of the CSB write pointer from memory.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912092933.4729-1-chris@chris-wilson.co.uk
During reset, we try to ensure no forward progress of the CS prior to
the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this
register is context saved and do we end up in the odd situation where we
save the STOP_RING bit and so try to stop the engine again immediately
upon resume. This is quite unexpected and causes us to complain about an
early CS completion event!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111514
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190910080208.4223-1-chris@chris-wilson.co.uk
As we may unwind incomplete requests (for preemption) prior to
processing the CSB and the schedule-out events, we may update rq->engine
(resetting it to point back to the parent virtual engine) prior to
calling execlists_schedule_out(), invalidating the assertion that the
request still points to the inflight engine. (The likelihood of this is
increased if the CSB interrupt processing is pushed to the ksoftirqd for
being too slow and direct submission overtakes it.)
Tvrtko summarised it as:
"So unwind from direct submission resets rq->engine and races with
process_csb from the tasklet which notices request has actually
completed."
Reported-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Fixes: df40306902 ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190907105046.19934-1-chris@chris-wilson.co.uk
Gen12 has subtle changes in the reg state context offsets (some fields
are gone, some are in a different location), compared to previous Gens.
The simplest approach seems to be keeping Gen12 (and future platform)
changes apart from the previous gens, while keeping the registers that
are contiguous in functions we can reuse.
v2: alias, virtual engine, rpcs, prune unused regs
v3: use engine base (Daniele), take ctx_bb for all
Bspec: 46255
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Tested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
[ickle: Tweaked the GEM_WARN_ON after settling on a compromise with
Daniele]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190906122314.2146-2-mika.kuoppala@linux.intel.com
Daniele pointed out that relative mmio works differently in
on context restore. Instead of adding the engine mmio base to offset,
it masks out the base and adds bits [12:2] to current engine base.
This should allow us to construct context register state to be
applicable to all instances, including virtual. And avoid the trouble
of updating the registers on virtual instances when submitting work.
v2: only enable for gen12 for now (Mika)
v3: make enabling readable (Chris)
Bspec: 20206
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190906134957.25909-1-mika.kuoppala@linux.intel.com
We use the context->pin_mutex to serialise updates to the OA config and
the registers values written into each new context. Document this
relationship and assert we do hold the context->pin_mutex as used by
gen8_configure_all_contexts() to serialise updates to the OA config
itself.
v2: Add a white-lie for when we call intel_gt_resume() from init.
v3: Lie while we have the context pinned inside atomic reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190830181929.18663-1-chris@chris-wilson.co.uk
With the upcoming change in timing (dramatically reducing the latency
between manipulating the ppGTT and execution), no amount of tweaking
could save Cherryview, it would always fail to invalidate its TLB.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190830180000.24608-2-chris@chris-wilson.co.uk
The addition of the DC_FLUSH failed to ensure sanctity of the post-sync
write as CI immediately got a completion CS-event before the breadcrumb
was coherent. So let's try the other idea of moving the post-sync write
into the CS_STALL.
References: https://bugs.freedesktop.org/show_bug.cgi?id=111514
References: e8f6b4952e ("drm/i915/execlists: Flush the post-sync breadcrumb write harder")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190829081150.10271-2-chris@chris-wilson.co.uk
Quite rarely we see that the CS completion event fires before the
breadcrumb is coherent, which presumably is a result of the CS_STALL not
waiting for the post-sync operation. Try throwing in a DC_FLUSH into
the following pipecontrol to see if that makes any difference.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190827120615.31390-1-chris@chris-wilson.co.uk
The CS pre-parser can pre-fetch commands across memory sync points and
starting from gen12 it is able to pre-fetch across BB_START and BB_END
boundaries as well, so when we emit gpu relocs the pre-parser might
fetch the target location of the reloc before the memory write lands.
The parser can't pre-fetch across the ctx switch, so we use a separate
context to guarantee that the memory is synchronized before the parser
can get to it.
Note that there is no risk of the CS doing a lite restore from the reloc
context to the user context, even if the two have the same hw_id,
because since gen11 the CS also checks the LRCA when deciding if it can
lite-restore.
v2: limit new context to gen12+, release in eb_destroy, add a comment
in emit_fini_breadcrumb (Chris).
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190827185805.21799-1-daniele.ceraolospurio@intel.com
Since we now run process_csb() outside of the engine->active.lock, we
can process a CS-event immediately upon our ELSP write. As we currently
inspect the pending queue *after* the ELSP write, there is an
opportunity for a CS-event to update the pending queue before we can
read it, making ourselves chases an invalid pointer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111427
Fixes: df40306902 ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190821142336.21609-1-chris@chris-wilson.co.uk
Add empty workaround hooks for Tiger Lake. The workarounds will be added
on separate patches. We were already applying
WaRsForcewakeAddDelayForAck, which is indeed still valid, so also update
the comment.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817093902.2171-21-lucas.demarchi@intel.com