Gen12 has dual-subslices (DSS), which compared to gen11 subslices have
some duplicated resources/paths. Although DSS behave similarly to 2
subslices, instead of splitting this and presenting userspace with bits
not directly representative of hardware resources, present userspace
with a subslice_mask made up of DSS bits instead.
v2: GEM_BUG_ON on mask size (Lionel)
Bspec: 29547
Bspec: 12247
Cc: Kelvin Gardiner <kelvin.gardiner@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
CC: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com> #v1
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: James Ausmus <james.ausmus@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190913075137.18476-2-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Make both GuC and HuC to use "." as the separator. Hardcode
the separator in MAKE_UC_FW_PATH. Remove the usage of "ver" from HuC.
The current convention being:
<platform>_<g/h>uc_<major>.<minor>.patch.bin
Update the versions of HuC being loaded of the platforms.
SKL - v2.0.0
BXT - v2.0.0
KBL - v4.0.0
GLK - v4.0.0
CFL - KBL v4.0.0
ICL - v9.0.0
CML - v4.0.0
v2: Remove the separator parameter altogether from
__MAKE_UC_FW_PATH.(Daniele)
- Squash all firmware update patches (Daniele)
v3: s/huc/HuC
- Correct the order of platforms
- Change REVID of cml to 5(Michal)
- Code space changes in huc_def (Daniele)
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919201204.9691-1-daniele.ceraolospurio@intel.com
Our sanitychecks indicate that while this register is context
saved/restore, the HW does not preserve this bit within the register --
it likely doesn't exist, or one of those mythical bits that the
architects insist does something despite all appearances to the
contrary.
For reference, SAMPLER_MODE is already in i915_reg.h as
GEN10_SAMPLER_MODE and is being setup in icl_ctx_workarounds_init() as
opposed to the chosen location here of rcs_engine_wa_init).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111754
Fixes: 7f0cc34b53 ("drm/i915/tgl: Implement Wa_1406941453")
Testcase: igt/i915_selftest/live_workarounds
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190920081254.18389-1-chris@chris-wilson.co.uk
As not only is the signal->timeline volatile, so will be acquiring the
timeline's HWSP. We must first carefully acquire the timeline from the
signaling request and then lock the timeline. With the removal of the
struct_mutex serialisation of request construction, we can have multiple
timelines active at once, and so we must avoid using the nested mutex
lock as it is quite possible for both timelines to be establishing
semaphores on the other and so deadlock.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-3-chris@chris-wilson.co.uk
The request->timeline is only valid until the request is retired (i.e.
before it is completed). Upon retiring the request, the context may be
unpinned and freed, and along with it the timeline may be freed. We
therefore need to be very careful when chasing rq->timeline that the
pointer does not disappear beneath us. The vast majority of users are in
a protected context, either during request construction or retirement,
where the timeline->mutex is held and the timeline cannot disappear. It
is those few off the beaten path (where we access a second timeline) that
need extra scrutiny -- to be added in the next patch after first adding
the warnings about dangerous access.
One complication, where we cannot use the timeline->mutex itself, is
during request submission onto hardware (under spinlocks). Here, we want
to check on the timeline to finalize the breadcrumb, and so we need to
impose a second rule to ensure that the request->timeline is indeed
valid. As we are submitting the request, it's context and timeline must
be pinned, as it will be used by the hardware. Since it is pinned, we
know the request->timeline must still be valid, and we cannot submit the
idle barrier until after we release the engine->active.lock, ergo while
submitting and holding that spinlock, a second thread cannot release the
timeline.
v2: Don't be lazy inside selftests; hold the timeline->mutex for as long
as we need it, and tidy up acquiring the timeline with a bit of
refactoring (i915_active_add_request)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919111912.21631-1-chris@chris-wilson.co.uk
Before we execute a batch, we must first issue any and all TLB
invalidations so that batch picks up the new page table entries.
Tigerlake's preparser is weakening our post-sync CS_STALL inside the
invalidate pipe-control and allowing the loading of the batch buffer
before we have setup its page table (and so it loads the wrong page and
executes indefinitely).
The igt_cs_tlb indicates that this issue can only be observed on rcs,
even though the preparser is common to all engines. Alternatively, we
could do TLB shootdown via mmio on updating the GTT.
By inserting the pre-parser disable inside EMIT_INVALIDATE, we will also
accidentally fixup execution that writes into subsequent batches, such
as gem_exec_whisper and even relocations performed on the GPU. We should
be careful not to allow this disable to become baked into the uABI! The
issue is that if userspace relies on our disabling of the HW
optimisation, when we are ready to enable that optimisation, userspace
will then be broken...
Testcase: igt/i915_selftests/live_gtt/igt_cs_tlb
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111753
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919151811.9526-1-chris@chris-wilson.co.uk
Our assumption that the we can ask the HW to lock the SFC even if not
currently in use does not match the HW commitment. The expectation from
the HW is that SW will not try to lock the SFC if the engine is not
using it and if we do that the behavior is undefined; on ICL the HW
ends up to returning the ack and ignoring our lock request, but this is
not guaranteed and we shouldn't expect it going forward.
Also, failing to get the ack while the SFC is in use means that we can't
cleanly reset it, so fail the engine reset in that scenario.
v2: drop rmw change, keep the log as debug and handle failure (Chris),
improve comments (Tvrtko).
Reported-by: Owen Zhang <owen.zhang@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190919015330.15435-1-daniele.ceraolospurio@intel.com
A few times in CI, we have detected a GPU hang on our Haswell GT2
systems with the characteristic IPEHR of 0x780c0000. When the PSMI w/a
was first introducted, it was applied to all Haswell, but later on we
found an erratum that supposedly restricted the issue to GT1 and so
constrained it only be applied on GT1. That may have been a mistake...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111692
Fixes: 167bc759e8 ("drm/i915: Restrict PSMI context load w/a to Haswell GT1")
References: 2c55018347 ("drm/i915: Disable PSMI sleep messages on all rings around context switches")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190917194746.26710-1-chris@chris-wilson.co.uk
On Tigerlake, MI_SEMAPHORE_WAIT grew an extra dword, so be sure to
update the length field and emit that extra parameter and any padding
noop as required.
v2: Define the token shift while we are adding the updated MI_SEMAPHORE_WAIT
v3: Use int instead of bool in the addition so that readers are not left
wondering about the intricacies of the C spec. Now they just have to
worry what the integer value of a boolean operation is...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190917123055.28965-1-chris@chris-wilson.co.uk
Include the active context register state when dumping the engine.
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190915203701.29163-1-chris@chris-wilson.co.uk
While srcu may use an integer tag, it does not exclude potential error
codes and so may overlap with our own use of -EINTR. Use a separate
outparam to store the tag, and report the error code separately.
Fixes: 2caffbf117 ("drm/i915: Revoke mmaps and prevent access to fence registers across reset")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912160834.30601-1-chris@chris-wilson.co.uk
We see failures where the context continues executing past a
preemption event, eventually leading to situations where a request has
executed before we have event submitted it to HW! It seems like tgl is
ignoring our RING_TAIL updates, but more likely is that there is a
missing update required for our semaphore waits around preemption.
v2: And disable internal semaphore usage
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912132313.12751-1-chris@chris-wilson.co.uk
As we track when we put the GT device to sleep upon idling, we can use
that callback to sample the current rc6 counters and record the
timestamp for estimating samples after that point while asleep.
v2: Stick to using ktime_t
v3: Track user_wakerefs that interfere with the new
intel_gt_pm_wait_for_idle
v4: No need for parked/unparked estimation if !CONFIG_PM
v5: Keep timer park/unpark logic as was
v6: Refactor duplicated estimate/update rc6 logic
v7: Pull intel_get_pm_get_if_awake() out from the pmu->lock.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105010
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912124813.19225-1-chris@chris-wilson.co.uk
After we manipulate the context to allow replay after a GPU reset, force
that context to be reloaded. This should be a layer of paranoia, for if
the GPU was reset, the context will no longer be resident!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912092933.4729-2-chris@chris-wilson.co.uk
After a GPU reset, we need to drain all the CS events so that we have an
accurate picture of the execlists state at the time of the reset. Be
paranoid and force a read of the CSB write pointer from memory.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190912092933.4729-1-chris@chris-wilson.co.uk
This allows userspace to use "legacy" mode for push constants, where
they are committed at 3DPRIMITIVE or flush time, rather than being
committed at 3DSTATE_BINDING_TABLE_POINTERS_XS time. Gen6-8 and Gen11
both use the "legacy" behavior - only Gen9 works in the "new" way.
Conflating push constants with binding tables is painful for userspace,
we would like to be able to avoid doing so.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: stable@vger.kernel.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190911014801.26821-1-kenneth@whitecape.org
Both in the container_of and getting to gt->awake there is no need to go
via i915 since both the wakeref and awake are members of gt.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190910143823.10686-4-tvrtko.ursulin@linux.intel.com
Code in i915_gem_init_hw is all about GT init so move it to intel_gt.c
renaming to intel_gt_init_hw.
Existing intel_gt_init_hw is renamed to intel_gt_init_hw_early since it
is currently called from driver probe.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190910143823.10686-2-tvrtko.ursulin@linux.intel.com
During reset, we try to ensure no forward progress of the CS prior to
the reset by setting the STOP_RING bit in RING_MI_MODE. Since gen9, this
register is context saved and do we end up in the odd situation where we
save the STOP_RING bit and so try to stop the engine again immediately
upon resume. This is quite unexpected and causes us to complain about an
early CS completion event!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111514
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190910080208.4223-1-chris@chris-wilson.co.uk
As we may unwind incomplete requests (for preemption) prior to
processing the CSB and the schedule-out events, we may update rq->engine
(resetting it to point back to the parent virtual engine) prior to
calling execlists_schedule_out(), invalidating the assertion that the
request still points to the inflight engine. (The likelihood of this is
increased if the CSB interrupt processing is pushed to the ksoftirqd for
being too slow and direct submission overtakes it.)
Tvrtko summarised it as:
"So unwind from direct submission resets rq->engine and races with
process_csb from the tasklet which notices request has actually
completed."
Reported-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Fixes: df40306902 ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190907105046.19934-1-chris@chris-wilson.co.uk
Refactor the GT power management interface to work through the GT now
that it is under the control of gt/
Based on a patch by Chris Wilson.
Signed-off-by: Andi Shyti <andi.shyti@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190905111403.10071-1-andi.shyti@intel.com
Gen12 has subtle changes in the reg state context offsets (some fields
are gone, some are in a different location), compared to previous Gens.
The simplest approach seems to be keeping Gen12 (and future platform)
changes apart from the previous gens, while keeping the registers that
are contiguous in functions we can reuse.
v2: alias, virtual engine, rpcs, prune unused regs
v3: use engine base (Daniele), take ctx_bb for all
Bspec: 46255
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Tested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
[ickle: Tweaked the GEM_WARN_ON after settling on a compromise with
Daniele]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190906122314.2146-2-mika.kuoppala@linux.intel.com
Daniele pointed out that relative mmio works differently in
on context restore. Instead of adding the engine mmio base to offset,
it masks out the base and adds bits [12:2] to current engine base.
This should allow us to construct context register state to be
applicable to all instances, including virtual. And avoid the trouble
of updating the registers on virtual instances when submitting work.
v2: only enable for gen12 for now (Mika)
v3: make enabling readable (Chris)
Bspec: 20206
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190906134957.25909-1-mika.kuoppala@linux.intel.com
This bit was fliped on for "syncing dependencies between camera and
graphics". BSpec has no recollection why, and it is causing
unrecoverable GPU hangs with Vulkan compute workloads.
From BSpec, setting bit5 to 0 enables relaxed padding requirements for
buffers, 1D and 2D non-array, non-MSAA, non-mip-mapped linear surfaces;
and *must* be set to 0h on skl+ to ensure "Out of Bounds" case is
suppressed.
Reported-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110998
Fixes: 8424171e13 ("drm/i915/gen9: h/w w/a: syncing dependencies between camera and graphics")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: denys.kostin@globallogic.com
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.1+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190904100707.7377-1-chris@chris-wilson.co.uk
We use the context->pin_mutex to serialise updates to the OA config and
the registers values written into each new context. Document this
relationship and assert we do hold the context->pin_mutex as used by
gen8_configure_all_contexts() to serialise updates to the OA config
itself.
v2: Add a white-lie for when we call intel_gt_resume() from init.
v3: Lie while we have the context pinned inside atomic reset.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1
Link: https://patchwork.freedesktop.org/patch/msgid/20190830181929.18663-1-chris@chris-wilson.co.uk
With the upcoming change in timing (dramatically reducing the latency
between manipulating the ppGTT and execution), no amount of tweaking
could save Cherryview, it would always fail to invalidate its TLB.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190830180000.24608-2-chris@chris-wilson.co.uk
With the upcoming change in timing (dramatically reducing the latency
between manipulating the ppGTT and execution), no amount of tweaking
could save Baytrail, it would always fail to invalidate its TLB. Ville
was right, Baytrail is beyond hope.
v2: Rollback on all gen7; same timing instability on TLB invalidation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190830180000.24608-1-chris@chris-wilson.co.uk
The addition of the DC_FLUSH failed to ensure sanctity of the post-sync
write as CI immediately got a completion CS-event before the breadcrumb
was coherent. So let's try the other idea of moving the post-sync write
into the CS_STALL.
References: https://bugs.freedesktop.org/show_bug.cgi?id=111514
References: e8f6b4952e ("drm/i915/execlists: Flush the post-sync breadcrumb write harder")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190829081150.10271-2-chris@chris-wilson.co.uk
During normal driver unload we attempt to disable GuC communication
while it is currently stopped. This results in a nop'd call to
intel_guc_ct_disable within guc_disable_communication because
stop/disable rely on the same flag to prevent further comms with CT.
We can avoid the call to disable and still leave communication in a
satisfactory state by extracting a set of shared steps from stop/disable.
This set can include guc_disable_interrupts as we do not require the
single caller of guc_stop_communication to be atomic:
"drm/i915/selftests: Fixup atomic reset checking".
This situation (stop -> disable) only occurs during intel_uc_fini_hw,
so during fini, call guc_disable_communication only if currently enabled.
The symmetric calls to enable/disable remain unmodified for all other
scenarios.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110943
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190829174154.14675-1-fernando.pacheco@intel.com
We've been ignoring similar coherency issues in IGT for Broadwater, and
specifically Broadwater (original gen4) and not, for example, Crestline
(same generation as Broadwater, but the mobile variant). Without any
means to reproduce locally (I have a 965GM but alas no 965G), fixing will
be slow, so tell CI to ignore any failure until we are ready with a fix.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Acked-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190826133837.6784-1-chris@chris-wilson.co.uk
Our current avoidance of non readable mcr range was not
inclusive enough. Extend the start and end.
References: HSDES#1405586840
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190809145653.2279-1-mika.kuoppala@linux.intel.com
Quite rarely we see that the CS completion event fires before the
breadcrumb is coherent, which presumably is a result of the CS_STALL not
waiting for the post-sync operation. Try throwing in a DC_FLUSH into
the following pipecontrol to see if that makes any difference.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190827120615.31390-1-chris@chris-wilson.co.uk
The CS pre-parser can pre-fetch commands across memory sync points and
starting from gen12 it is able to pre-fetch across BB_START and BB_END
boundaries as well, so when we emit gpu relocs the pre-parser might
fetch the target location of the reloc before the memory write lands.
The parser can't pre-fetch across the ctx switch, so we use a separate
context to guarantee that the memory is synchronized before the parser
can get to it.
Note that there is no risk of the CS doing a lite restore from the reloc
context to the user context, even if the two have the same hw_id,
because since gen11 the CS also checks the LRCA when deciding if it can
lite-restore.
v2: limit new context to gen12+, release in eb_destroy, add a comment
in emit_fini_breadcrumb (Chris).
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190827185805.21799-1-daniele.ceraolospurio@intel.com
To properly handle asynchronous migration of batch objects, we need to
couple the fences on the incoming batch into the request and should not
assume that they always start idle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190826072149.9447-1-chris@chris-wilson.co.uk
Sadly lockdep records when the irqs are re-enabled and then marks up the
fake lock as being irq-unsafe. Our hand is forced and so we must mark up
the entire fake lock critical section as irq-off.
Hopefully this is the last tweak required!
v2: Not quite, we need to mark the timeline spinlock as irqsafe. That
was a genuine bug being hidden by the earlier lockdep splat.
Fixes: d67739268c ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823132700.25286-2-chris@chris-wilson.co.uk
Currently, the subslice_mask runtime parameter is stored as an
array of subslices per slice. Expand the subslice mask array to
better match what is presented to userspace through the
I915_QUERY_TOPOLOGY_INFO ioctl. The index into this array is
then calculated:
slice * subslice stride + subslice index / 8
v2: Fix 32-bit build
v3: Use new helper function in SSEU workaround warning message
v4: Use GEM_BUG_ON to force developers to use valid SSEU configurations
per platform (Chris)
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823160307.180813-12-stuart.summers@intel.com
Add a subslice stride calculation when setting subslices. This
aligns more closely with the userspace expectation of the subslice
mask structure.
v2: Use local variable for subslice_mask on HSW and
clean up a few other subslice_mask local variable
changes
v3: Add GEM_BUG_ON for ss_stride to prevent array overflow (Chris)
Split main set function and refactors in intel_device_info.c
into separate patches (Chris)
v4: Reduce ss_stride size check when setting subslices per slice
based on actual expected max stride (Chris)
Move that GEM_BUG_ON check for the ss_stride out to the patch
which adds the ss_stride
v5: Use memcpy instead of looping through each stride index
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823160307.180813-8-stuart.summers@intel.com
Add a new SSEU runtime parameter, eu_stride, which is
used to mirror the userspace concept of a range of EUs
per subslice.
This patch simply adds the parameter and updates usage
in the QUERY_TOPOLOGY_INFO handler.
v2: Add GEM_BUG_ON to make sure eu_stride is valid
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823160307.180813-5-stuart.summers@intel.com
Add a new parameter, ss_stride, to the runtime info
structure. This is used to mirror the userspace concept
of subslice stride, which is a range of subslices per slice.
This patch simply adds the definition and updates usage
in the QUERY_TOPOLOGY_INFO handler.
v2: Add GEM_BUG_ON to make sure ss_stride is valid
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823160307.180813-4-stuart.summers@intel.com
Add a new function to allow each platform to set maximum
slice, subslice, and EU information to reduce code duplication.
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823160307.180813-3-stuart.summers@intel.com
GAM registers located in the 0x4xxx range have been relocated to 0xCxxx;
this is to make space for global MOCS registers.
v2: Rename register and bitfield to its new name (suggested by Mika)
HSD: 399379
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190823082055.5992-2-lucas.demarchi@intel.com
We can reduce the locking for fence registers from the dev->struct_mutex
to a local mutex. We could introduce a mutex for the sole purpose of
tracking the fence acquisition, except there is a little bit of overlap
with the fault tracking, so use the i915_ggtt.mutex as it covers both.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190822060914.2671-1-chris@chris-wilson.co.uk
We need the rename of reservation_object to dma_resv.
The solution on this merge came from linux-next:
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Wed, 14 Aug 2019 12:48:39 +1000
Subject: [PATCH] drm: fix up fallout from "dma-buf: rename reservation_object to dma_resv"
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
drivers/gpu/drm/i915/gt/intel_engine_pool.c | 8 ++++----
3 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pool.c b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
index 03d90b49584a..4cd54c569911 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pool.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pool.c
@@ -43,12 +43,12 @@ static int pool_active(struct i915_active *ref)
{
struct intel_engine_pool_node *node =
container_of(ref, typeof(*node), active);
- struct reservation_object *resv = node->obj->base.resv;
+ struct dma_resv *resv = node->obj->base.resv;
int err;
- if (reservation_object_trylock(resv)) {
- reservation_object_add_excl_fence(resv, NULL);
- reservation_object_unlock(resv);
+ if (dma_resv_trylock(resv)) {
+ dma_resv_add_excl_fence(resv, NULL);
+ dma_resv_unlock(resv);
}
err = i915_gem_object_pin_pages(node->obj);
which is a simplified version from a previous one which had:
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Since we now run process_csb() outside of the engine->active.lock, we
can process a CS-event immediately upon our ELSP write. As we currently
inspect the pending queue *after* the ELSP write, there is an
opportunity for a CS-event to update the pending queue before we can
read it, making ourselves chases an invalid pointer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111427
Fixes: df40306902 ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190821142336.21609-1-chris@chris-wilson.co.uk
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- dma-buf: add reservation_object_fences helper, relax
reservation_object_add_shared_fence, remove
reservation_object seq number (and then
restored)
- dma-fence: Shrinkage of the dma_fence structure,
Merge dma_fence_signal and dma_fence_signal_locked,
Store the timestamp in struct dma_fence in a union with
cb_list
Driver Changes:
- More dt-bindings YAML conversions
- More removal of drmP.h includes
- dw-hdmi: Support get_eld and various i2s improvements
- gm12u320: Few fixes
- meson: Global cleanup
- panfrost: Few refactors, Support for GPU heap allocations
- sun4i: Support for DDC enable GPIO
- New panels: TI nspire, NEC NL8048HL11, LG Philips LB035Q02,
Sharp LS037V7DW01, Sony ACX565AKM, Toppoly TD028TTEC1
Toppoly TD043MTEA1
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRcEzekXsqa64kGDp7j7w1vZxhRxQUCXVqvpwAKCRDj7w1vZxhR
xa3RAQDzAnt5zeesAxX4XhRJzHoCEwj2PJj9Re6xMJ9PlcfcvwD+OS+bcB6jfiXV
Ug9IBd/DqjlmD9G9MxFxfSV946rksAw=
=8uv4
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-2019-08-19' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-next for 5.4:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- dma-buf: add reservation_object_fences helper, relax
reservation_object_add_shared_fence, remove
reservation_object seq number (and then
restored)
- dma-fence: Shrinkage of the dma_fence structure,
Merge dma_fence_signal and dma_fence_signal_locked,
Store the timestamp in struct dma_fence in a union with
cb_list
Driver Changes:
- More dt-bindings YAML conversions
- More removal of drmP.h includes
- dw-hdmi: Support get_eld and various i2s improvements
- gm12u320: Few fixes
- meson: Global cleanup
- panfrost: Few refactors, Support for GPU heap allocations
- sun4i: Support for DDC enable GPIO
- New panels: TI nspire, NEC NL8048HL11, LG Philips LB035Q02,
Sharp LS037V7DW01, Sony ACX565AKM, Toppoly TD028TTEC1
Toppoly TD043MTEA1
Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied: fixup dma_resv rename fallout]
From: Maxime Ripard <maxime.ripard@bootlin.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819141923.7l2adietcr2pioct@flea
Re-use Gen11 context size for now.
[ Lucas: this is a temporary enabling patch that needs to be confirmed:
we need to check BSpec 46255 and recompute ]
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817093902.2171-27-lucas.demarchi@intel.com
Add empty workaround hooks for Tiger Lake. The workarounds will be added
on separate patches. We were already applying
WaRsForcewakeAddDelayForAck, which is indeed still valid, so also update
the comment.
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817093902.2171-21-lucas.demarchi@intel.com
The CSB format has been reworked for Gen12 to include information on
both the context we're switching away from and the context we're
switching to. After the change, some of the events don't have their
own bit anymore and need to be inferred from other values in the csb.
One of the context IDs (0x7FF) has also been reserved to indicate
the invalid ctx, i.e. engine idle.
Note that the full context ID includes the SW counter as well, but since
we currently only care if the context is valid or not we can ignore that
part.
v2: fix mask size, fix and expand comments (Tvrtko),
use if-ladder (Chris)
Bspec: 45555, 46144
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190820102201.29849-1-chris@chris-wilson.co.uk
Gen12 uses a new indirect ctx offset.
Bspec: 11740
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817093902.2171-28-lucas.demarchi@intel.com
Make sure that when submitting requests, we always serialize against
potential vma moves and clflushes.
Time for a i915_request_await_vma() interface!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819112033.30638-1-chris@chris-wilson.co.uk
We use a fake timeline->mutex lock to reassure lockdep that the timeline
is always locked when emitting requests. However, the use inside
__engine_park() may be inside hardirq and so lockdep now complains about
the mixed irq-state of the nested locked. Disable irqs around the
lockdep tracking to keep it happy.
Fixes: 6c69a45445 ("drm/i915/gt: Mark context->active_count as protected by timeline->mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190819075835.20065-3-chris@chris-wilson.co.uk
There is no need to mark whole GPU as wedged just because
of the custom HuC fw failure as users can always verify
actual HuC firmware status using existing HUC_STATUS ioctl.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190818095204.31568-4-michal.wajdeczko@intel.com
If we failed to fetch default GuC firmware and we didn't plan
to use it for the submission and we never have used GuC before
then we may continue normal driver load, no need to declare
GPU wedged (we can use execlist for submission) and it is safe
to run without the HuC (users will check HuC status anyway).
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190818095204.31568-3-michal.wajdeczko@intel.com
As we plan to continue driver load after GuC initialization
failure, we can't assume that GuC log data will be available
just because GuC was initially enabled. We must check that
GuC is still running instead.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190818095204.31568-2-michal.wajdeczko@intel.com
The timestamp and the cb_list are mutually exclusive, the cb_list can
only be added to prior to being signaled (and once signaled we drain),
while the timestamp is only valid upon being signaled. Both the
timestamp and the cb_list are only valid while the fence is alive, and
as soon as no references are held can be replaced by the rcu_head.
By reusing the union for the timestamp, we squeeze the base dma_fence
struct to 64 bytes on x86-64.
v2: Sort the union chronologically
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>.
Link: https://patchwork.freedesktop.org/patch/msgid/20190817153022.5749-1-chris@chris-wilson.co.uk
Let's wait with decision about importance of uC failure to
hardware initialization step.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817131144.26884-4-michal.wajdeczko@intel.com
Be consistent and always perform fw fetch cleanup in GuC/HuC specific
init functions on every failure. Also while converting firmware
status to error, stop treating SELECTED as non-error, as long term
we should not see it.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817131144.26884-3-michal.wajdeczko@intel.com
We can rely on firmware status AVAILABLE to determine if any
firmware cleanup is required. Also don't unconditionally reset
fw status to SELECTED as we will loose MISSING/ERROR codes.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817131144.26884-2-michal.wajdeczko@intel.com
Add a redzone to our context image and check the HW does not write into
after a context save, to verify that we have the correct context size.
(This does vary with feature bits, so test with a live setup that should
match how we run userspace.)
v2: Check the redzone on every context unpin
v3: Use a kernel context to prevent loading garbage for ringbuffer
submission
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190817073711.5897-1-chris@chris-wilson.co.uk
We really need to have separate NOT_SUPPORTED state (for
lack of hardware support) and DISABLED state (to indicate
user decision) as we will have to take special steps even
if GuC firmware is now disabled but hardware exists and
could have been previously used.
v2: fix logic (Chris/CI)
v3: use proper check to avoid probe failure (CI)
v4: explain status transitions (Chris)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816205658.15020-1-michal.wajdeczko@intel.com
To remove the dependency between the GT headers and i915_reg.h, move the
definition of the engine IDs/classes to intel_engine_types.h
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816012343.36433-3-daniele.ceraolospurio@intel.com
If we only call process_csb() from the tasklet, though we lose the
ability to bypass ksoftirqd interrupt processing on direct submission
paths, we can push it out of the irq-off spinlock.
The penalty is that we then allow schedule_out to be called concurrently
with schedule_in requiring us to handle the usage count (baked into the
pointer itself) atomically.
As we do kick the tasklets (via local_bh_enable()) after our submission,
there is a possibility there to see if we can pull the local softirq
processing back from the ksoftirqd.
v2: Store the 'switch_priority_hint' on submission, so that we can
safely check during process_csb().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816171608.11760-1-chris@chris-wilson.co.uk
As every i915_active_request should be serialised by a dedicated lock,
i915_active consists of a tree of locks; one for each node. Markup up
the i915_active_request with what lock is supposed to be guarding it so
that we can verify that the serialised updated are indeed serialised.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816121000.8507-2-chris@chris-wilson.co.uk
We use timeline->mutex to protect modifications to
context->active_count, and the associated enable/disable callbacks.
Due to complications with engine-pm barrier there is a path where we used
a "superlock" to provide serialised protect and so could not
unconditionally assert with lockdep that it was always held. However,
we can mark the mutex as taken (noting that we may be nested underneath
ourselves) which means we can be reassured the right timeline->mutex is
always treated as held and let lockdep roam free.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816121000.8507-1-chris@chris-wilson.co.uk
While we need to know WOPCM size to do this sanity check, it has more to
do with FW than with WOPCM. Let's move the check to fetch phase, it's
not like WOPCM is going to grow in the meantime.
v2: rebased
v3: use __intel_uc_fw_get_upload_size (Daniele)
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jackie Li <yaodong.li@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190816105501.31020-2-michal.wajdeczko@intel.com
Forgo the struct_mutex requirement for request retirement as we have
been transitioning over to only using the timeline->mutex for
controlling the lifetime of a request on that timeline.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-4-chris@chris-wilson.co.uk
In preparation for removing struct_mutex from around context retirement,
we need to make timeline pinning and unpinning safe. Since multiple
engines/contexts can share a single timeline, we cannot rely on
borrowing the context mutex (otherwise we could state that the timeline
is only pinned/unpinned inside the context pin/unpin and so guarded by
it). However, we only perform a sequence of atomic operations inside the
timeline pin/unpin and the sequence of those operations is safe for a
concurrent unpin / pin, so we can relax the struct_mutex requirement.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815205709.24285-3-chris@chris-wilson.co.uk
Flush according to what gen11 expects when writing
breadcrumbs. As only the seqnowrite + flush differs
between engine and gens, enclose the footer to
helper.
v2: avoid problem of sane local naming by not using them
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815094929.358-1-mika.kuoppala@linux.intel.com
Add tile cache flushing for gen11. To relive us from the
burden of previous obsolete workarounds, make a dedicated
flush/invalidate callback for gen11.
To fortify an independent single flush, do post
sync op as there are indications that without it
we don't flush everything. This should also make this
callback more readily usable in tgl (see l3 fabric flush).
v2: whitespacing
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190815083055.14132-1-mika.kuoppala@linux.intel.com
The engine->guc_id is GuC FW defined and it is not guaranteed to be
below I915_NUM_ENGINES, so we shouldn't use it with the i915-defined
client->submissions, as we might overflow.
Instead of fixing it, just get rid of client->submissions, because the
information we get from it is not interesting anymore now that we only
have 1 client.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190814002145.29056-1-daniele.ceraolospurio@intel.com
Stop assuming we only get called with irqs-on for disarming the
breadcrumbs, and do a full save/restore spin_lock_irq.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190813132916.20382-2-chris@chris-wilson.co.uk
We don't care about internal firmware status changes unless
we are doing some real debugging. Note that our CI is not
using DRM_I915_DEBUG_GUC config by default so use it.
v2: protect against accidental overwrites (Chris)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190813081559.23936-1-michal.wajdeczko@intel.com
Since execlists and the guc have diverged in their port tracking, we
cannot simply reuse the execlists cancellation code as it leads to
unbalanced reference counting. Use a local, simpler routine for the guc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190812203626.3948-1-chris@chris-wilson.co.uk
We rely on the tasklet to update the GT PM refcount, so we can't disable
it even if we've processed all the requests for the engine because we
might have detected the request completion before the interrupt arrived.
Since on all platforms on which we plan to support guc submission we
don't allow disabling the breadcrumb interrupts, we can further siplify
the park/unpark flow by removing the interrupt pin/unpin. A BUG_ON has
been added to catch changes to this flow that would require us to
restore some kind of pinning.
v2: split removal of engine_pin/unpin_breadcrumbs_irq to its own
patch (chris)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190812233152.2172-1-daniele.ceraolospurio@intel.com
i915_irq.c is large. It serves as the central dispatch and handler for
all of our device interrupts. Lets break it up by pulling out the GT
interrupt handlers.
Based on a patch by Chris Wilson.
Signed-off-by: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190811210633.18417-1-chris@chris-wilson.co.uk
i915_irq.c is large. It serves as the central dispatch and handler for
all of our device interrupts. Pull out the GT pm interrupt handling
(leaving the central dispatch) so that we can encapsulate the logic a
little better.
Based on a patch by Chris Wilson.
Signed-off-by: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190811142801.2460-1-chris@chris-wilson.co.uk
Include 2019 in copyright years and start using SPDX tag.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190812092935.21048-1-michal.wajdeczko@intel.com