Commit Graph

435 Commits

Author SHA1 Message Date
Chris Wilson
65baf0ef04 drm/i915: Hold a ref to the ring while retiring
As the final request on a ring may hold the reference to this ring (via
retiring the last pinned context), we may find ourselves chasing a
dangling pointer on completion of the list.

A quick solution is to hold a reference to the ring itself as we retire
along it so that we only free it after we stop dereferencing it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190318095204.9913-4-chris@chris-wilson.co.uk
2019-03-18 21:00:28 +00:00
Chris Wilson
a9fe9ca44c drm/i915/gtt: Rename i915_vm_is_48b to i915_vm_is_4lvl
Large ppGTT are differentiated by the requirement to go to four levels
to address more than 32b. Given the introduction of more 4 level ppGTT
with different sizes of addressable bits, rename i915_vm_is_48b() to
better reflect the commonality of using 4 levels.

Based on a patch by Bob Paauwe.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190314223839.28258-4-chris@chris-wilson.co.uk
2019-03-15 09:04:54 +00:00
Chris Wilson
51d623b675 drm/i915: Drop address size from ppgtt_type
With the introduction of the separate addressable bits into the device
info, we can remove the conflation of the ppgtt size from the ppgtt
type.

Based on a patch by Bob Paauwe.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190314223839.28258-3-chris@chris-wilson.co.uk
2019-03-15 09:04:54 +00:00
Chris Wilson
cbecbccaa1 drm/i915: Record platform specific ppGTT size in intel_device_info
As the maximum addressable bits is determined by platform, record that
information in our static chipset tables. This has the advantage of
being clearly recorded in our capability dumps for dmesg, debugfs and
error states.

Based on a patch by Bob Paauwe.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Bob Paauwe <bob.j.paauwe@intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190314223839.28258-2-chris@chris-wilson.co.uk
2019-03-15 09:04:54 +00:00
Chris Wilson
d2eeaf2bc0 drm/i915/selftests: Disable preemption while setting up fence-timers
The impossible happens and a future fence expired while we were still
initialising. The probable cause is that the test was preempted and we
lost our scheduler cpu slice. Disable preemption during this test to
rule out preemption as a source of timer disruption.

References: https://bugs.freedesktop.org/show_bug.cgi?id=110039
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190313205944.5768-1-chris@chris-wilson.co.uk
2019-03-14 11:47:06 +00:00
Chris Wilson
22acf9fc18 drm/i915/selftests: Improve error detection of reset failure
Use a timedwait to promptly detect if the recovery after reset fails and
provide a meaningful debug dump.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190312111146.10662-2-chris@chris-wilson.co.uk
2019-03-12 12:49:30 +00:00
Chris Wilson
85fddf0b00 drm/i915: Introduce a context barrier callback
In the next patch, we will want to update live state within a context.
As this state may be in use by the GPU and we haven't been explicitly
tracking its activity, we instead attach it to a request we send down
the context setup with its new state and on retiring that request
cleanup the old state as we then know that it is no longer live.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190309160250.29324-1-chris@chris-wilson.co.uk
2019-03-09 17:19:54 +00:00
Chris Wilson
0881954965 drm/i915: Introduce intel_context.pin_mutex for pin management
Introduce a mutex to start locking the HW contexts independently of
struct_mutex, with a view to reducing the coarse struct_mutex. The
intel_context.pin_mutex is used to guard the transition to and from being
pinned on the gpu, and so is required before starting to build any
request. The intel_context will then remain pinned until the request
completes, but the mutex can be released immediately unpin completion of
pinning the context.

A slight variant of the above is used by per-context sseu that wants to
inspect the pinned status of the context, and requires that it remains
stable (either !pinned or pinned) across its operation. By using the
pin_mutex to serialise operations while pin_count==0, we can take that
pin_mutex for stabilise the boolean pin status.

v2: for Tvrtko!
* Improved commit message.
* Dropped _gpu suffix from gen8_modify_rpcs_gpu.
v3: Repair the locking for sseu selftests

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-7-chris@chris-wilson.co.uk
2019-03-08 14:04:19 +00:00
Chris Wilson
9dbfea98d7 drm/i915: Track the pinned kernel contexts on each engine
Each engine acquires a pin on the kernel contexts (normal and preempt)
so that the logical state is always available on demand. Keep track of
each engines pin by storing the returned pointer on the engine for quick
access.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-6-chris@chris-wilson.co.uk
2019-03-08 14:00:02 +00:00
Chris Wilson
95f697eb02 drm/i915: Make context pinning part of intel_context_ops
Push the intel_context pin callback down from intel_engine_cs onto the
context itself by virtue of having a central caller for
intel_context_pin() being able to lookup the intel_context itself.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-5-chris@chris-wilson.co.uk
2019-03-08 13:59:59 +00:00
Chris Wilson
c4d52feb2c drm/i915: Move over to intel_context_lookup()
In preparation for an ever growing number of engines and so ever
increasing static array of HW contexts within the GEM context, move the
array over to an rbtree, allocated upon first use.

Unfortunately, this imposes an rbtree lookup at a few frequent callsites,
but we should be able to mitigate those by moving over to using the HW
context as our primary type and so only incur the lookup on the boundary
with the user GEM context and engines.

v2: Check for no HW context in guc_stage_desc_init

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk
2019-03-08 13:59:52 +00:00
Chris Wilson
4dc84b77b0 drm/i915: Store the intel_context_ops in the intel_engine_cs
If we place a pointer to the engine specific intel_context_ops in the
engine itself, we can assign the ops pointer on initialising the
context, and then rely on it being set. This simplifies the code in
later patches.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-3-chris@chris-wilson.co.uk
2019-03-08 13:59:50 +00:00
Chris Wilson
7e3d9a5941 drm/i915: Track active engines within a context
For use in the next patch, if we track which engines have been used by
the HW, we can reduce the work required to flush our state off the HW to
those engines.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-1-chris@chris-wilson.co.uk
2019-03-08 13:59:41 +00:00
Chris Wilson
c6eeb4797e drm/i915: Reduce presumption of request ordering for barriers
Currently we assume that we know the order in which requests run and so
can determine if we need to reissue a switch-to-kernel-context prior to
idling. That assumption does not hold for the future, so instead of
tracking which barriers have been used, simply determine if we have ever
switched away from the kernel context by using the engine and before
idling ensure that all engines that have been used since the last idle
are synchronously switched back to the kernel context for safety (and
else of shrinking memory while idle).

v2: Use intel_engine_mask_t and ALL_ENGINES

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-3-chris@chris-wilson.co.uk
2019-03-08 10:57:08 +00:00
Chris Wilson
5861b013e2 drm/i915: Do a synchronous switch-to-kernel-context on idling
When the system idles, we switch to the kernel context as a defensive
measure (no users are harmed if the kernel context is lost). Currently,
we issue a switch to kernel context and then come back later to see if
the kernel context is still current and the system is idle. However,
if we are no longer privy to the runqueue ordering, then we have to
relax our assumptions about the logical state of the GPU and the only
way to ensure that the kernel context is currently loaded is by issuing
a request to run after all others, and wait for it to complete all while
preventing anyone else from issuing their own requests.

v2: Pull wedging into switch_to_kernel_context_sync() but only after
waiting (though only for the same short delay) for the active context to
finish.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-1-chris@chris-wilson.co.uk
2019-03-08 10:57:05 +00:00
Chris Wilson
3123ada8eb drm/i915/selftests: Check preemption support on each engine
Check that we have setup on preemption for the engine before testing,
instead warn if it is not enabled on supported HW.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190306142517.22558-28-chris@chris-wilson.co.uk
2019-03-08 09:34:49 +00:00
Chris Wilson
2835f4f36b drm/i915/selftests: Improve switch-to-kernel-context checking
We can reduce the switch-to-kernel-context selftest to operate as a loop
and so trivially test another state transition (that of idle->busy).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190307211947.6954-1-chris@chris-wilson.co.uk
2019-03-07 23:33:35 +00:00
Michał Winiarski
b218a80b17 drm/i915/selftests: Upgrade printing test/subtest name to pr_info
We're using pr_debug for things that we don't really want to see in the
CI log, but we may find useful during test development.
Let's upgrade the test name printer - we do want to see those in CI log.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305144717.10000-1-michal.winiarski@intel.com
2019-03-06 11:36:36 +00:00
Chris Wilson
161996a800 drm/i915/selftests: Fix MI_STORE_DWORD_IMM alignment
MI_STORE_DWORD_IMM wants to write into a dword-aligned (4B) address, we
mistakenly cleared bit2 and not bits 0 and 1.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190306082447.21563-1-chris@chris-wilson.co.uk
2019-03-06 11:08:32 +00:00
Chris Wilson
8a68d46436 drm/i915: Store the BIT(engine->id) as the engine's mask
In the next patch, we are introducing a broad virtual engine to encompass
multiple physical engines, losing the 1:1 nature of BIT(engine->id). To
reflect the broader set of engines implied by the virtual instance, lets
store the full bitmask.

v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/)
v3: Tvrtko voted for moah churn so teach everyone to not mention ring
and use $class$instance throughout.
v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and
VCS[0-4] in later gen. We opt to keep the code consistent and use
0-index naming throughout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk
2019-03-05 18:19:50 +00:00
Chris Wilson
ebece75392 drm/i915: Keep timeline HWSP allocated until idle across the system
In preparation for enabling HW semaphores, we need to keep in flight
timeline HWSP alive until its use across entire system has completed,
as any other timeline active on the GPU may still refer back to the
already retired timeline. We both have to delay recycling available
cachelines and unpinning old HWSP until the next idle point.

An easy option would be to simply keep all used HWSP until the system as
a whole was idle, i.e. we could release them all at once on parking.
However, on a busy system, we may never see a global idle point,
essentially meaning the resource will be leaked until we are forced to
do a GC pass. We already employ a fine-grained idle detection mechanism
for vma, which we can reuse here so that each cacheline can be freed
immediately after the last request using it is retired.

v3: Keep track of the activity of each cacheline.
v4: cacheline_free() on canceling the seqno tracking
v5: Finally with a testcase to exercise wraparound
v6: Pack cacheline into empty bits of page-aligned vaddr
v7: Use i915_utils to hide the pointer casting around bit manipulation

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-2-chris@chris-wilson.co.uk
2019-03-01 17:40:33 +00:00
Chris Wilson
34ae8455f4 drm/i915/selftests: Check that whitelisted registers are accessible
There is no point in whitelisting a register that the user then cannot
write to, so check the register exists before merging such patches.

v2: Mark SLICE_COMMON_ECO_CHICKEN1 [731c] as write-only
v3: Use different variables for different meanings!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dale B Stimson <dale.b.stimson@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301140404.26690-6-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20190301160108.19039-1-chris@chris-wilson.co.uk
2019-03-01 16:53:39 +00:00
Chris Wilson
3ef7114982 drm/i915: Introduce i915_timeline.mutex
A simple mutex used for guarding the flow of requests in and out of the
timeline. In the short-term, it will be used only to guard the addition
of requests into the timeline, taken on alloc and released on commit so
that only one caller can construct a request into the timeline
(important as the seqno and ring pointers must be serialised). This will
be used by observers to ensure that the seqno/hwsp is stable. Later,
when we have reduced retiring to only operate on a single timeline at a
time, we can then use the mutex as the sole guard required for retiring.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301110547.14758-2-chris@chris-wilson.co.uk
2019-03-01 14:54:46 +00:00
Chris Wilson
b5773a3616 drm/i915/execlists: Suppress mere WAIT preemption
WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not all ready received that boost.
Make this consistent for all WAIT boosts that they are not allowed to
preempt executing contexts and are merely granted the right to be at the
front of the queue for the next execution slot. This is in keeping with
the desire that the WAIT boost be a minor tweak that does not give
excessive promotion to its user and open ourselves to trivial abuse.

The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, and we be relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).

v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for st.
v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
unwind), applying it earlier during submit causes out-of-order execution
combined with execute fences.
v5: Call sw_fence_fini for our dummy request (Matthew)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
2019-02-28 23:10:43 +00:00
Chris Wilson
13f1bfd3b3 drm/i915: Make object/vma allocation caches global
As our allocations are not device specific, we can move our slab caches
to a global scope.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-2-chris@chris-wilson.co.uk
2019-02-28 11:08:02 +00:00
Chris Wilson
32eb6bcfdd drm/i915: Make request allocation caches global
As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potential has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.

The benefit for us is much reduced pointer dancing along the frequent
allocation paths.

v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
2019-02-28 11:07:56 +00:00
Chris Wilson
368375107b drm/i915/selftests: Exercise resetting during non-user payloads
In selftests/live_hangcheck, we have a lot of tests for resetting simple
spinners, but nothing quite prepared us for how the GPU reacted to
triggering a reset outside of the safe spinner. These two subtests fill
the ring with plain old empty, non-spinning requests, and then triggers
a reset. Without a user-payload to blame, these requests will exercise
the 'non-started' paths and mostly be replayed verbatim.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-4-chris@chris-wilson.co.uk
2019-02-26 09:55:41 +00:00
Chris Wilson
b300fde896 drm/i915: Remove i915_request.global_seqno
Having weaned the interrupt handling off using a single global execution
queue, we no longer need to emit a global_seqno. Note that we still have
a few assumptions about execution order along engine timelines, but this
removes the most obvious artefact!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-3-chris@chris-wilson.co.uk
2019-02-26 09:55:37 +00:00
Chris Wilson
8892f47742 drm/i915: Remove access to global seqno in the HWSP
Stop accessing the HWSP to read the global seqno, and stop tracking the
mirror in the engine's execution timeline -- it is unused.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226094922.31617-2-chris@chris-wilson.co.uk
2019-02-26 09:55:33 +00:00
Chris Wilson
c41166f9a1 drm/i915: Beware temporary wedging when determining -EIO
At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously (where once before they were both
serialised by the struct_mutex), we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset
(caused by either us timing out in our reset handler,
i915_wedge_on_timeout or with malice aforethought in intel_reset_prepare
for a stuck modeset). If we suspect this is the case, that is we see a
wedged driver *and* reset in progress, then wait until the reset is
resolved before reporting upon the wedged status.

v2: might_sleep() (Mika)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190220145637.23503-1-chris@chris-wilson.co.uk
2019-02-20 16:31:08 +00:00
Chris Wilson
e4106dae0f drm/i915/selftests: Make unbannable contexts for reset handling
igt_ctx_sseu was caught using bannable contexts, and in the course of
resetting rapidly to run its test, was banned. Don't let ourselves ban
the test!

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190218145051.18981-1-chris@chris-wilson.co.uk
2019-02-18 16:00:34 +00:00
Chris Wilson
83e3a21530 drm/i915/selftests: Move local mock_ggtt allocations to the heap
This struct appears quite large and pushes our stack frame over
1024 bytes -- too high for conservative setups. So move the mock_ggtt
struct to the heap.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190217202518.24730-1-chris@chris-wilson.co.uk
2019-02-17 21:07:46 +00:00
Chris Wilson
2a4a275403 drm/i915/selftests: Always free spinner on __sseu_prepare error
Prepare a nice little onion unwind to ensure that we always free the
spinner if we __sseu_prepare fails.

Fixes: c06ee6ff2c ("drm/i915/selftests: Context SSEU reconfiguration tests")
Reported-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190215195010.16637-1-chris@chris-wilson.co.uk
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
2019-02-16 11:06:22 +00:00
Chris Wilson
9095c86374 drm/i915/selftests: Drop unnecessary struct_mutex around i915_reset()
Since we no longer need to hold struct_mutex to perform a global device
reset, don't do so for igt_reset_wedge().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190215102732.15520-2-chris@chris-wilson.co.uk
2019-02-15 18:17:30 +00:00
Chris Wilson
c836eb79c0 drm/i915/selftests: Always use an active engine while resetting
Currently, we only try to reset a live engine for checking the whitelist
retention across a per-engine reset. For safety, it appears we need to
prime the system with a hanging spinner before performing a full-device
reset. (Figuring out the root cause behind the instability with handling
a reset during a no-op request is a challenge for another test, the
whitelist test has its own purpose.)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109626
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190213224805.32021-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
2019-02-15 09:37:31 +00:00
Chris Wilson
21182b3c4c drm/i915: Don't claim an unstarted request was guilty
If we haven't even begun executing the payload of the stalled request,
then we should not claim that its userspace context was guilty of
submitting a hanging batch.

v2: Check for context corruption before trying to restart.
v3: Preserve semaphores on skipping requests (need to keep the timelines
intact).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208153708.20023-7-chris@chris-wilson.co.uk
2019-02-08 16:47:40 +00:00
Chris Wilson
2caffbf117 drm/i915: Revoke mmaps and prevent access to fence registers across reset
Previously, we were able to rely on the recursive properties of
struct_mutex to allow us to serialise revoking mmaps and reacquiring the
FENCE registers with them being clobbered over a global device reset.
I then proceeded to throw out the baby with the bath water in order to
pursue a struct_mutex-less reset.

Perusing LWN for alternative strategies, the dilemma on how to serialise
access to a global resource on one side was answered by
https://lwn.net/Articles/202847/ -- Sleepable RCU:

    1  int readside(void) {
    2      int idx;
    3      rcu_read_lock();
    4	   if (nomoresrcu) {
    5          rcu_read_unlock();
    6	       return -EINVAL;
    7      }
    8	   idx = srcu_read_lock(&ss);
    9	   rcu_read_unlock();
    10	   /* SRCU read-side critical section. */
    11	   srcu_read_unlock(&ss, idx);
    12	   return 0;
    13 }
    14
    15 void cleanup(void)
    16 {
    17     nomoresrcu = 1;
    18     synchronize_rcu();
    19     synchronize_srcu(&ss);
    20     cleanup_srcu_struct(&ss);
    21 }

No more worrying about stop_machine, just an uber-complex mutex,
optimised for reads, with the overhead pushed to the rare reset path.

However, we do run the risk of a deadlock as we allocate underneath the
SRCU read lock, and the allocation may require a GPU reset, causing a
dependency cycle via the in-flight requests. We resolve that by declaring
the driver wedged and cancelling all in-flight rendering.

v2: Use expedited rcu barriers to match our earlier timing
characteristics.
v3: Try to annotate locking contexts for sparse
v4: Reduce selftest lock duration to avoid a reset deadlock with fences
v5: s/srcu/reset_backoff_srcu/
v6: Remove more stale comments

Testcase: igt/gem_mmap_gtt/hang
Fixes: eb8d0f5af4 ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208153708.20023-2-chris@chris-wilson.co.uk
2019-02-08 16:47:32 +00:00
Chris Wilson
21950ee7cc drm/i915: Pull i915_gem_active into the i915_active family
Looking forward, we need to break the struct_mutex dependency on
i915_gem_active. In the meantime, external use of i915_gem_active is
quite beguiling, little do new users suspect that it implies a barrier
as each request it tracks must be ordered wrt the previous one. As one
of many, it can be used to track activity across multiple timelines, a
shared fence, which fits our unordered request submission much better. We
need to steer external users away from the singular, exclusive fence
imposed by i915_gem_active to i915_active instead. As part of that
process, we move i915_gem_active out of i915_request.c into
i915_active.c to start separating the two concepts, and rename it to
i915_active_request (both to tie it to the concept of tracking just one
request, and to give it a longer, less appealing name).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-5-chris@chris-wilson.co.uk
2019-02-05 17:20:11 +00:00
Chris Wilson
64d6c500a3 drm/i915: Generalise GPU activity tracking
We currently track GPU memory usage inside VMA, such that we never
release memory used by the GPU until after it has finished accessing it.
However, we may want to track other resources aside from VMA, or we may
want to split a VMA into multiple independent regions and track each
separately. For this purpose, generalise our request tracking (akin to
struct reservation_object) so that we can embed it into other objects.

v2: Tweak error handling during selftest setup.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205130005.2807-2-chris@chris-wilson.co.uk
2019-02-05 17:12:00 +00:00
Chris Wilson
a21f453c73 drm/i915/selftests: Exercise some AB...BA preemption chains
Build a chain using 2 contexts (A, B) then request a preemption such
that a later A request runs before the spinner in B.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205123835.25331-1-chris@chris-wilson.co.uk
2019-02-05 16:16:02 +00:00
Tvrtko Ursulin
c06ee6ff2c drm/i915/selftests: Context SSEU reconfiguration tests
Exercise the context image reconfiguration logic for idle and busy
contexts, with the resets thrown into the mix as well.

Free from the uAPI restrictions this test runs on all Gen9+ platforms
with slice power gating.

v2:
 * Rename some helpers for clarity.
 * Include subtest names in error logs.
 * Remove unnecessary function export.

v3:
 * Rebase for RUNTIME_INFO.

v4:
 * Fix incomplete unexport from v2. (Chris Wilson)

v5:
 * Rebased for runtime pm api changes.

v6:
 * Rebased for i915_reset.c.

v7:
 * Tidy checkpatch warnings.
 * Consolidate error checking and logging a bit.
 * Skip idle test phase if something failed before it.

v8:
 (Chris Wilson)
 * Fix i915_request_wait error handling.
 * No need to PIN_HIGH the VMA.
 * Remove pointless GEM_BUG_ON before pointer dereference.

v9:
 * Avoid rq leak if rpcs query fails. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> # v6
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-5-tvrtko.ursulin@linux.intel.com
2019-02-05 11:32:03 +00:00
Tvrtko Ursulin
7810858412 drm/i915: Add timeline barrier support
Timeline barrier allows serialization between different timelines.

After calling i915_timeline_set_barrier with a request, all following
submissions on this timeline will be set up as depending on this request,
or barrier. Once the barrier has been completed it automatically gets
cleared and things continue as normal.

This facility will be used by the upcoming context SSEU code.

v2:
 * Assert barrier has been retired on timeline_fini. (Chris Wilson)
 * Fix mock_timeline.

v3:
 * Improved comment language. (Chris Wilson)

v4:
 * Maintain ordering with previous barriers set on the timeline.

v5:
 * Rebase.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-3-tvrtko.ursulin@linux.intel.com
2019-02-05 11:32:03 +00:00
Chris Wilson
789659f430 drm/i915: Drop fake breadcrumb irq
Missed breadcrumb detection is defunct due to the tight coupling with
dma_fence signaling and the myriad ways we may signal fences from
everywhere but from an interrupt, i.e. we frequently signal a fence
before we even see its interrupt. This means that even if we miss an
interrupt for a fence, it still is signaled before our breadcrumb
hangcheck fires, so simplify the breadcrumb hangchecking by moving it
into the GPU hangcheck and forgo fake interrupts.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-3-chris@chris-wilson.co.uk
2019-01-29 21:45:30 +00:00
Chris Wilson
52c0fdb25c drm/i915: Replace global breadcrumbs with per-context interrupt tracking
A few years ago, see commit 688e6c7258 ("drm/i915: Slaughter the
thundering i915_wait_request herd"), the issue of handling multiple
clients waiting in parallel was brought to our attention. The
requirement was that every client should be woken immediately upon its
request being signaled, without incurring any cpu overhead.

To handle certain fragility of our hw meant that we could not do a
simple check inside the irq handler (some generations required almost
unbounded delays before we could be sure of seqno coherency) and so
request completion checking required delegation.

Before commit 688e6c7258, the solution was simple. Every client
waiting on a request would be woken on every interrupt and each would do
a heavyweight check to see if their request was complete. Commit
688e6c7258 introduced an rbtree so that only the earliest waiter on
the global timeline would woken, and would wake the next and so on.
(Along with various complications to handle requests being reordered
along the global timeline, and also a requirement for kthread to provide
a delegate for fence signaling that had no process context.)

The global rbtree depends on knowing the execution timeline (and global
seqno). Without knowing that order, we must instead check all contexts
queued to the HW to see which may have advanced. We trim that list by
only checking queued contexts that are being waited on, but still we
keep a list of all active contexts and their active signalers that we
inspect from inside the irq handler. By moving the waiters onto the fence
signal list, we can combine the client wakeup with the dma_fence
signaling (a dramatic reduction in complexity, but does require the HW
being coherent, the seqno must be visible from the cpu before the
interrupt is raised - we keep a timer backup just in case).

Having previously fixed all the issues with irq-seqno serialisation (by
inserting delays onto the GPU after each request instead of random delays
on the CPU after each interrupt), we can rely on the seqno state to
perfom direct wakeups from the interrupt handler. This allows us to
preserve our single context switch behaviour of the current routine,
with the only downside that we lose the RT priority sorting of wakeups.
In general, direct wakeup latency of multiple clients is about the same
(about 10% better in most cases) with a reduction in total CPU time spent
in the waiter (about 20-50% depending on gen). Average herd behaviour is
improved, but at the cost of not delegating wakeups on task_prio.

v2: Capture fence signaling state for error state and add comments to
warm even the most cold of hearts.
v3: Check if the request is still active before busywaiting
v4: Reduce the amount of pointer misdirection with list_for_each_safe
and using a local i915_request variable inside the loops
v5: Add a missing pluralisation to a purely informative selftest message.

References: 688e6c7258 ("drm/i915: Slaughter the thundering i915_wait_request herd")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129205230.19056-2-chris@chris-wilson.co.uk
2019-01-29 21:45:22 +00:00
Chris Wilson
c9a6462288 drm/i915/execlists: Suppress preempting self
In order to avoid preempting ourselves, we currently refuse to schedule
the tasklet if we reschedule an inflight context. However, this glosses
over a few issues such as what happens after a CS completion event and
we then preempt the newly executing context with itself, or if something
else causes a tasklet_schedule triggering the same evaluation to
preempt the active context with itself.

However, when we avoid preempting ELSP[0], we still retain the preemption
value as it may match a second preemption request within the same time period
that we need to resolve after the next CS event. However, since we only
store the maximum preemption priority seen, it may not match the
subsequent event and so we should double check whether or not we
actually do need to trigger a preempt-to-idle by comparing the top
priorities from each queue. Later, this gives us a hook for finer
control over deciding whether the preempt-to-idle is justified.

The sequence of events where we end up preempting for no avail is:

1. Queue requests/contexts A, B
2. Priority boost A; no preemption as it is executing, but keep hint
3. After CS switch, B is less than hint, force preempt-to-idle
4. Resubmit B after idling

v2: We can simplify a bunch of tests based on the knowledge that PI will
ensure that earlier requests along the same context will have the highest
priority.
v3: Demonstrate the stale preemption hint with a selftest

References: a2bf92e8cc ("drm/i915/execlists: Avoid kicking priority on the current context")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-4-chris@chris-wilson.co.uk
2019-01-29 20:00:05 +00:00
Chris Wilson
8547444137 drm/i915: Identify active requests
To allow requests to forgo a common execution timeline, one question we
need to be able to answer is "is this request running?". To track
whether a request has started on HW, we can emit a breadcrumb at the
beginning of the request and check its timeline's HWSP to see if the
breadcrumb has advanced past the start of this request. (This is in
contrast to the global timeline where we need only ask if we are on the
global timeline and if the timeline has advanced past the end of the
previous request.)

There is still confusion from a preempted request, which has already
started but relinquished the HW to a high priority request. For the
common case, this discrepancy should be negligible. However, for
identification of hung requests, knowing which one was running at the
time of the hang will be much more important.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-2-chris@chris-wilson.co.uk
2019-01-29 19:59:59 +00:00
Chris Wilson
06039d9820 drm/i915/selftests: Apply a subtest filter
In bringup on simulated HW even rudimentary tests are slow, and so many
may fail that we want to be able to filter out the noise to focus on the
specific problem. Even just the tests groups provided for igt is not
specific enough, and we would like to isolate one particular subtest
(and probably subsubtests!). For simplicity, allow the user to provide a
command line parameter such as

	i915.st_filter=i915_timeline_mock_selftests/igt_sync

to restrict ourselves to only running on subtest. The exact name to use
is given during a normal run, highlighted as an error if it failed,
debug otherwise. The test group is optional, and then all subtests are
compared for an exact match with the filter (most subtests have unique
names). The filter can be negated, e.g. i915.st_filter=!igt_sync and
then all tests but those that match will be run. More than one match can
be supplied separated by a comma, e.g.

	i915.st_filter=igt_vma_create,igt_vma_pin1

to only run those specified, or

	i915.st_filter=!igt_vma_create,!igt_vma_pin1

to run all but those named. Mixing a blacklist and whitelist will only
execute those subtests matching the whitelist so long as they are
previously excluded in the blacklist.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190129185452.20989-1-chris@chris-wilson.co.uk
2019-01-29 19:59:57 +00:00
Chris Wilson
5013eb8cd6 drm/i915: Track the context's seqno in its own timeline HWSP
Now that we have allocated ourselves a cacheline to store a breadcrumb,
we can emit a write from the GPU into the timeline's HWSP of the
per-context seqno as we complete each request. This drops the mirroring
of the per-engine HWSP and allows each context to operate independently.
We do not need to unwind the per-context timeline, and so requests are
always consistent with the timeline breadcrumb, greatly simplifying the
completion checks as we no longer need to be concerned about the
global_seqno changing mid check.

One complication though is that we have to be wary that the request may
outlive the HWSP and so avoid touching the potentially danging pointer
after we have retired the fence. We also have to guard our access of the
HWSP with RCU, the release of the obj->mm.pages should already be RCU-safe.

At this point, we are emitting both per-context and global seqno and
still using the single per-engine execution timeline for resolving
interrupts.

v2: s/fake_complete/mark_complete/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-5-chris@chris-wilson.co.uk
2019-01-28 19:07:09 +00:00
Chris Wilson
8ba306a6a3 drm/i915: Share per-timeline HWSP using a slab suballocator
If we restrict ourselves to only using a cacheline for each timeline's
HWSP (we could go smaller, but want to avoid needless polluting
cachelines on different engines between different contexts), then we can
suballocate a single 4k page into 64 different timeline HWSP. By
treating each fresh allocation as a slab of 64 entries, we can keep it
around for the next 64 allocation attempts until we need to refresh the
slab cache.

John Harrison noted the issue of fragmentation leading to the same worst
case performance of one page per timeline as before, which can be
mitigated by adopting a freelist.

v2: Keep all partially allocated HWSP on a freelist

This is still without migration, so it is possible for the system to end
up with each timeline in its own page, but we ensure that no new
allocation would needless allocate a fresh page!

v3: Throw a selftest at the allocator to try and catch invalid cacheline
reuse.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-4-chris@chris-wilson.co.uk
2019-01-28 19:07:06 +00:00
Chris Wilson
52954edd1f drm/i915: Allocate a status page for each timeline
Allocate a page for use as a status page by a group of timelines, as we
only need a dword of storage for each (rounded up to the cacheline for
safety) we can pack multiple timelines into the same page. Each timeline
will then be able to track its own HW seqno.

v2: Reuse the common per-engine HWSP for the solitary ringbuffer
timeline, so that we do not have to emit (using per-gen specialised
vfuncs) the breadcrumb into the distinct timeline HWSP and instead can
keep on using the common MI_STORE_DWORD_INDEX. However, to maintain the
sleight-of-hand for the global/per-context seqno switchover, we will
store both temporarily (and so use a custom offset for the shared timeline
HWSP until the switch over).

v3: Keep things simple and allocate a page for each timeline, page
sharing comes next.

v4: I was caught repeating the same MI_STORE_DWORD_IMM over and over
again in selftests.

v5: And caught red handed copying create timeline + check.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-3-chris@chris-wilson.co.uk
2019-01-28 19:07:02 +00:00