We need to setup the workarounds on all engines, with the knowledge
about which platforms each workaround applies to kept together in the
workaround list. As such, we can pull the w/a initialisation into the
common setup and try to avoid duplicating knowledge about when to setup
the workarounds.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703135805.7310-2-chris@chris-wilson.co.uk
Expose whether or not we support the PMU software tracking in our
scheduler capabilities, so userspace can query at runtime.
v2: Use I915_SCHEDULER_CAP_ENGINE_BUSY_STATS for a less ambiguous
capability name.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703143702.11339-1-chris@chris-wilson.co.uk
We don't care about the result of the read, so it may be garbage, we
only care that the mmio is flushed. As such, we can forgo using an
individual forcewake and lock around any posting-read for an engine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703155225.9501-4-chris@chris-wilson.co.uk
We can assume the caller is holding a blanket forcewake for the
register writes during resume, and so we can skip taking individual
locks around each write inside execlists resume.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703155225.9501-3-chris@chris-wilson.co.uk
During post-reset resume, we call intel_mocs_init_engine to reinitialise
the MOCS registers. Suprisingly, especially when enhanced by lockdep,
the acquisition of the forcewake lock around each register write takes a
substantial portion of the reset time. We don't need to use the
individual forcewake here as we can assume that the caller is holding a
blanket forcewake for the reset&resume and the resume is serialised.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703155225.9501-2-chris@chris-wilson.co.uk
The render state is used to initialise the default RCS context, and only
used during early setup from within the gt code. As such, it makes a
good candidate for placing within gt/, even if it is not yet entirely
clean of our GEM heritage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190704091925.7391-1-chris@chris-wilson.co.uk
Be a little more hesitant before injecting a timeslice, and try to take
into account any change in priority that is due for the running task
before switching to another task. This will allow us to arbitrarily
prevent switching away from a request if we deem it necessarily to
disable preemption, for instance.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-9-chris@chris-wilson.co.uk
We frequently, but not frequently enough!, remember to flush residual
operations and objects at the end of a live subtest. The purpose is to
cleanup after every subtest, leaving a clean slate for the next subtest,
and perform early detection of leaky state. As this should ideally be
common for all live subtests, pull the task into a common teardown
routine.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190703091726.11690-1-chris@chris-wilson.co.uk
When eliminating our use of drm_irq_install() I failed to convert
all our synchronize_irq() calls to consult pdev->irq instead of
dev_priv->drm.irq. As we no longer populate dev_priv->drm.irq
we're no longer synchronizing against anything.
v2: Add intel_syncrhonize_irq() (Chris)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: Imre Deak <imre.deak@intel.com>
Fixes: b318b82455 ("drm/i915: Nuke drm_driver irq vfuncs")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111012
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190702151723.29739-1-ville.syrjala@linux.intel.com
The same tests failing on CFL+ platforms are also failing on ICL.
Documentation doesn't list the
WaAllowPMDepthAndInvocationCountAccessFromUMD workaround for ICL but
applying it fixes the same tests as CFL.
v2: Use only one whitelist entry (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: stable@vger.kernel.org
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190628120720.21682-4-lionel.g.landwerlin@intel.com
CFL:C0+ changed the status of those registers which are now
blacklisted by default.
This is breaking a number of CTS tests on GL & Vulkan :
KHR-GL45.pipeline_statistics_query_tests_ARB.functional_fragment_shader_invocations (GL)
dEQP-VK.query_pool.statistics_query.fragment_shader_invocations.* (Vulkan)
v2: Only use one whitelist entry (Lionel)
Bspec: 14091
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: stable@vger.kernel.org
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190628120720.21682-3-lionel.g.landwerlin@intel.com
When a register is readonly there is not much we can tell about its
value (apart from its default value?). This can be covered by tests
exercising the value of the register from userspace.
For PS_INVOCATION_COUNT we've got the following piglit tests :
KHR-GL45.pipeline_statistics_query_tests_ARB.functional_fragment_shader_invocations
Vulkan CTS tests :
dEQP-VK.query_pool.statistics_query.fragment_shader_invocations.*
v2: Use a local to shrink under 80cols.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 86554f48e5 ("drm/i915/selftests: Verify whitelist of context registers")
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190629131350.31185-1-chris@chris-wilson.co.uk
Daniele pointed out that the CSB status information will change with
Tigerlake and suggested that we could rearrange our state machine to
hide the differences in generation. gcc also prefers the explicit state
machine, so make it so:
process_csb 1980 1967 -13
Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190701100502.15639-4-chris@chris-wilson.co.uk
With the subdirectories we lost the ability to build individual files on
the command line, for example:
$ make drivers/gpu/drm/i915/display/intel_display.o
This was due to the top level directory missing from header search
path. Add the header search paths to subdir Makefiles.
Note that none of the other options in the top level i915 Makefile are
taken into account when building individual files. Usually this is not a
concern.
Reported-by: Imre Deak <imre.deak@intel.com>
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626143618.21800-2-jani.nikula@intel.com
Since the reset path wants to recover the engines itself, it only wants
to reinitialise the hardware using i915_gem_init_hw(). Pull the call to
intel_engines_resume() to the module init/resume path so we can avoid it
during reset.
Fixes: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626154549.10066-3-chris@chris-wilson.co.uk
If we issue a reset to a currently idle engine, leave it idle
afterwards. This is useful to excise a linkage between reset and the
shrinker. When waking the engine, we need to pin the default context
image which we use for overwriting a guilty context -- if the engine is
idle we do not need this pinned image! However, this pinning means that
waking the engine acquires the FS_RECLAIM, and so may trigger the
shrinker. The shrinker itself may need to wait upon the GPU to unbind
and object and so may require services of reset; ergo we should avoid
the engine wake up path.
The danger in skipping the recovery for idle engines is that we leave the
engine with no context defined, which may interfere with the operation of
the power context on some older platforms. In practice, we should only
be resetting an active GPU but it something to look out for on Ironlake
(if memory serves).
Fixes: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626154549.10066-2-chris@chris-wilson.co.uk
For use in the next patch, we want to acquire a wakeref without having
to wake the device up -- i.e. only acquire the engine wakeref if the
engine is already active.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626154549.10066-1-chris@chris-wilson.co.uk
We require that the intel_gpu_reset() was atomic, not the whole of
i915_reset() which is guarded by a mutex. However, we do require that
i915_reset_engine() is atomic for use from within the submission tasklet.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626134433.6318-3-chris@chris-wilson.co.uk
We no longer need to manually acquire a wakeref for request emission, so
drop the redundant wakerefs, letting us test our wakeref handling more
precisely.
References: 79ffac8599 ("drm/i915: Invert the GEM wakeref hierarchy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626134433.6318-2-chris@chris-wilson.co.uk
In order for the reset count to be accurate across our selftest, we need
to prevent the background retire worker from modifying our expected
state. To preserve the intent of symmetry, we apply this to both
i915_reset and i915_reset_engine, even though it strictly only affects
i915_reset_engine currently.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190626134433.6318-1-chris@chris-wilson.co.uk
We no longer allocate a contiguous set of timeline ids for all engines
upon creation, so we no longer should assume that the timelines are
densely allocated within a context. Hopefully, the set of fences used
within a workload are still dense enough for us to take advantage of
the compressed radix tree used for the syncmap.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190625233349.32371-1-chris@chris-wilson.co.uk
As this engine owns the lock around rq->sched.link (for those waiters
submitted to this engine), we can use that link as an element in a local
list. We can thus replace the recursive algorithm with an iterative walk
over the ordered list of waiters.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190625130128.11009-1-chris@chris-wilson.co.uk
The Demand Prefetch workaround (binding table prefetching) only applies
to Icelake A0/B0. But the Sampler Prefetch workaround needs to be
applied to all Gen11 steppings, according to a programming note in the
SARCHKMD documentation.
Using the Intel Gallium driver, I have seen intermittent failures in
the dEQP-GLES31.functional.copy_image.non_compressed.* tests. After
applying this workaround, the tests reliably pass.
v2: Remove the overlap with a pre-production w/a
BSpec: 9663
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190625090655.19220-1-chris@chris-wilson.co.uk
In the unlikely case (thank you CI!), we may find ourselves wanting to
issue a preemption but having no runnable requests left. In this case,
we set the semaphore before computing the preemption and so must unset
it before forgetting (or else we leave the machine busywaiting until the
next request comes along and so likely hang).
v2: Replace readback with only a wmb after asserting the semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190624092009.30189-1-chris@chris-wilson.co.uk
If we introduce a callback for i915_active that is only called the first
time we use the i915_active and is symmetrically paired with the
i915_active.retire callback, we can replace the open-coded and
non-atomic implementations -- which will be very fragile (i.e. broken)
upon removing the struct_mutex serialisation.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-4-chris@chris-wilson.co.uk
Remove the accumulated optimisations that we have for i915_vma_retire
and reduce it to the bare essential of tracking the active object
reference. This allows us to only use atomic operations, and so will be
able to avoid the struct_mutex requirement.
The principal loss here is the shrinker MRU bumping, so now if we have
to shrink, we will do so in much more random order and more likely to
try and shrink recently used objects. That is a nuisance, but shrinking
active objects is a second step we try to avoid and will always be a
system-wide performance issue.
The other loss is here is in the automatic pruning of the
reservation_object when idling. This is not as large an issue as upon
reservation_object introduction as now adding new fences into the object
replaces already signaled fences, keeping the array compact. But we do
lose the auto-expiration of stale fences and unused arrays. That may be
a noticeable problem for which we need to re-implement autopruning.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-3-chris@chris-wilson.co.uk
i915_gem_wait_for_idle() and i915_retire_requests() introduce a
dependency on the timeline->mutex. This is problematic as we want to
later perform allocations underneath i915_active.mutex, forming a link
between the shrinker, the timeline and active mutexes. Nip this cycle in
the bud by removing the acquisition of the timeline mutex (i.e.
retiring) from inside the shrinker.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621183801.23252-1-chris@chris-wilson.co.uk
drivers/gpu/drm/i915/gt/intel_mocs.c:513: warning: Function parameter or member 'gt' not described in 'intel_mocs_init_l3cc_table'
drivers/gpu/drm/i915/gt/intel_mocs.c:513: warning: Excess function parameter 'dev_priv' description in 'intel_mocs_init_l3cc_table'
intel_vgt_balloon/deballoon, i915_ggtt_probe_hw intel_wopcm_init_hw need
similar treatment
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621131640.28864-2-chris@chris-wilson.co.uk
Since the anonymous i915_gt became struct intel_gt and encloses
struct i915_gt_timelines, rename i915_gt_timelines to intel_gt_timelines
to match its parentage.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621131640.28864-1-chris@chris-wilson.co.uk
Scratch vma lives under gt but the API used to work on i915. Make this
consistent by renaming the function to intel_gt_scratch_offset and make
it take struct intel_gt.
v2:
* Move to intel_gt. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-33-tvrtko.ursulin@linux.intel.com
Our timelines are stored inside intel_gt so we can convert the interface
to take exactly that and not i915.
At the same time re-order the params to our more typical layout and
replace the backpointer to the new containing structure.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-31-tvrtko.ursulin@linux.intel.com
For gt related operations it makes more logical sense to stay in the realm
of gt instead of dereferencing via driver i915.
This patch handles a few of the easy ones with work requiring more
refactoring still outstanding.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-30-tvrtko.ursulin@linux.intel.com
This will become useful in the following patch.
v2:
* Assign the pointer through a helper on the top level to work around
the layering violation. (Chris)
v3:
* Handle selftests.
v4:
* Move call to intel_gt_init_hw into mock_init_ggtt. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-28-tvrtko.ursulin@linux.intel.com
Having introduced struct intel_gt (named the anonymous structure in i915)
we can start using it to compartmentalize our code better. It makes more
sense logically to have the code internally like this and it will also
help with future split between gt and display in i915.
v2:
* Keep ggtt flush before fb obj flush. (Chris)
v3:
* Fix refactoring fail.
* Always flush ggtt writes. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-23-tvrtko.ursulin@linux.intel.com
As it will grow in a following patch make a new home for it.
v2:
* Convert mock_gem_device as well. (Chris)
v3:
* Rename to intel_gt_init_early and move call site to i915_drv.c. (Chris)
v4:
* Adjust SPDX tags.
* No need to gt/ path when including intel_gt_types.h. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-3-tvrtko.ursulin@linux.intel.com
We have long been slighlty annoyed by the anonymous i915->gt.
Promote it to a separate structure and give it its own header.
This is a first step towards cleaning up the separation between
i915 and gt.
v2:
* Adjust SPDX header.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621070811.7006-2-tvrtko.ursulin@linux.intel.com
The call to kick_siblings() dereferences the rq->context, so we should
not drop our local reference until afterwards!
v2: Stick to setting ce.inflight=NULL before kicking as this is what the
other threads will check to see if the context is ready for takeover.
Fixes: 22b7a426bb ("drm/i915/execlists: Preempt-to-busy")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190621080729.2652-1-chris@chris-wilson.co.uk