Commit Graph

16 Commits

Author SHA1 Message Date
Chris Wilson
67a3acaab7 drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request
As we start peeking into requests for longer and longer, e.g.
incorporating use of spinlocks when only protected by an
rcu_read_lock(), we need to be careful in how we reset the request when
recycling and need to preserve any barriers that may still be in use as
the request is reset for reuse.

Quoting Linus Torvalds:

> If there is refcounting going on then why use SLAB_TYPESAFE_BY_RCU?

  .. because the object can be accessed (by RCU) after the refcount has
  gone down to zero, and the thing has been released.

  That's the whole and only point of SLAB_TYPESAFE_BY_RCU.

  That flag basically says:

  "I may end up accessing this object *after* it has been free'd,
  because there may be RCU lookups in flight"

  This has nothing to do with constructors. It's ok if the object gets
  reused as an object of the same type and does *not* get
  re-initialized, because we're perfectly fine seeing old stale data.

  What it guarantees is that the slab isn't shared with any other kind
  of object, _and_ that the underlying pages are free'd after an RCU
  quiescent period (so the pages aren't shared with another kind of
  object either during an RCU walk).

  And it doesn't necessarily have to have a constructor, because the
  thing that a RCU walk will care about is

    (a) guaranteed to be an object that *has* been on some RCU list (so
    it's not a "new" object)

    (b) the RCU walk needs to have logic to verify that it's still the
    *same* object and hasn't been re-used as something else.

  In contrast, a SLAB_TYPESAFE_BY_RCU memory gets free'd and re-used
  immediately, but because it gets reused as the same kind of object,
  the RCU walker can "know" what parts have meaning for re-use, in a way
  it couidn't if the re-use was random.

  That said, it *is* subtle, and people should be careful.

> So the re-use might initialize the fields lazily, not necessarily using a ctor.

  If you have a well-defined refcount, and use "atomic_inc_not_zero()"
  to guard the speculative RCU access section, and use
  "atomic_dec_and_test()" in the freeing section, then you should be
  safe wrt new allocations.

  If you have a completely new allocation that has "random stale
  content", you know that it cannot be on the RCU list, so there is no
  speculative access that can ever see that random content.

  So the only case you need to worry about is a re-use allocation, and
  you know that the refcount will start out as zero even if you don't
  have a constructor.

  So you can think of the refcount itself as always having a zero
  constructor, *BUT* you need to be careful with ordering.

  In particular, whoever does the allocation needs to then set the
  refcount to a non-zero value *after* it has initialized all the other
  fields. And in particular, it needs to make sure that it uses the
  proper memory ordering to do so.

  NOTE! One thing to be very worried about is that re-initializing
  whatever RCU lists means that now the RCU walker may be walking on the
  wrong list so the walker may do the right thing for this particular
  entry, but it may miss walking *other* entries. So then you can get
  spurious lookup failures, because the RCU walker never walked all the
  way to the end of the right list. That ends up being a much more
  subtle bug.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191122094924.629690-1-chris@chris-wilson.co.uk
2019-11-22 10:47:38 +00:00
Chris Wilson
253a774bb0 drm/i915/execlists: Don't merely skip submission if maybe timeslicing
Normally, we try and skip submission if ELSP[1] is filled. However, we
may desire to enable timeslicing due to the queue priority, even if
ELSP[1] itself does not require timeslicing. That is the queue is equal
priority to ELSP[0] and higher priority then ELSP[1]. Previously, we
would wait until the context switch to preempt the current ELSP[1], but
with timeslicing, we want to preempt ELSP[0] and replace it with the
queue.

In writing the test case, it become quickly apparent that we were also
suppressing the tasklet during promotion and so failing to notice when
the queue started requiring timeslicing.

Fixes: 2229adc813 ("drm/i915/execlist: Trim immediate timeslice expiry")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191018072027.31948-1-chris@chris-wilson.co.uk
2019-10-18 11:23:26 +01:00
Chris Wilson
25d851adbf drm/i915: Only reschedule the submission tasklet if preemption is possible
If we couple the scheduler more tightly with the execlists policy, we
can apply the preemption policy to the question of whether we need to
kick the tasklet at all for this priority bump.

v2: Rephrase it as a core i915 policy and not an execlists foible.
v3: Pull the kick together.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190507122544.12698-1-chris@chris-wilson.co.uk
2019-05-07 17:40:20 +01:00
Chris Wilson
3a891a6267 drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h
We want to use intel_engine_mask_t inside i915_request.h, which means
extracting it from the general header file mess and placing it inside a
types.h. A knock on effect is that the compiler wants to warn about
type-contraction of ALL_ENGINES into intel_engine_maskt_t, so prepare
for the worst.

v2: Use intel_engine_mask_t consistently
v3: Move I915_NUM_ENGINES to its natural home at the end of the enum

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190401162641.10963-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2019-04-02 15:09:08 +01:00
Chris Wilson
103b76eeff drm/i915: Use i915_global_register()
Rather than manually add every new global into each hook, use
i915_global_register() function and keep a list of registered globals to
invoke instead.

However, I haven't found a way for random drivers to add an .init table
to avoid having to manually add ourselves to i915_globals_init() each
time.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305213830.18094-1-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
2019-03-06 10:00:50 +00:00
Chris Wilson
f9e9e9de58 drm/i915: Prioritise non-busywait semaphore workloads
We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately. We then also have to be careful that we don't give earlier
semaphores an accidental boost because later work doesn't wait on other
rings, hence we keep a history of semaphore usage of the dependency chain.

v2: Stop rolling the bits into a chain and just use a flag in case this
request or any of our dependencies use a semaphore. The rolling around
was contagious as Tvrtko was heard to fall off his chair.

Testcase: igt/gem_exec_schedule/semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190301170901.8340-4-chris@chris-wilson.co.uk
2019-03-01 17:45:11 +00:00
Chris Wilson
b5773a3616 drm/i915/execlists: Suppress mere WAIT preemption
WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not all ready received that boost.
Make this consistent for all WAIT boosts that they are not allowed to
preempt executing contexts and are merely granted the right to be at the
front of the queue for the next execution slot. This is in keeping with
the desire that the WAIT boost be a minor tweak that does not give
excessive promotion to its user and open ourselves to trivial abuse.

The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, and we be relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).

v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for st.
v4: Do the priority bumping in unsubmit (i.e. on preemption/reset
unwind), applying it earlier during submit causes out-of-order execution
combined with execute fences.
v5: Call sw_fence_fini for our dummy request (Matthew)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228220639.3173-1-chris@chris-wilson.co.uk
2019-02-28 23:10:43 +00:00
Chris Wilson
bd5d6781a0 drm/i915: Use __ffs() in for_each_priolist for more compact code
Gcc has a slight preference if we use __ffs() to subtract one from the
index once rather than each use:

__execlists_submission_tasklet              2867    2847     -20

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190226102404.29153-11-chris@chris-wilson.co.uk
2019-02-28 11:18:43 +00:00
Chris Wilson
32eb6bcfdd drm/i915: Make request allocation caches global
As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potential has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.

The benefit for us is much reduced pointer dancing along the frequent
allocation paths.

v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
2019-02-28 11:07:56 +00:00
Chris Wilson
e9eaf82d97 drm/i915: Priority boost for waiting clients
Latency is in the eye of the beholder. In the case where a client stops
and waits for the gpu, give that request chain a small priority boost
(not so that it overtakes higher priority clients, to preserve the
external ordering) so that ideally the wait completes earlier.

v2: Tvrtko recommends to keep the boost-from-user-stall as small as
possible and to allow new client flows to be preferred for interactivity
over stalls.

Testcase: igt/gem_sync/switch-default
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-3-chris@chris-wilson.co.uk
2018-10-01 20:34:24 +01:00
Chris Wilson
e2f3496e93 drm/i915: Pull scheduling under standalone lock
Currently, the backend scheduling code abuses struct_mutex into order to
have a global lock to manipulate a temporary list (without widespread
allocation) and to protect against list modifications. This is an
extraneous coupling to struct_mutex and further can not extend beyond
the local device.

Pull all the code that needs to be under the one true lock into
i915_scheduler.c, and make it so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-2-chris@chris-wilson.co.uk
2018-10-01 20:34:21 +01:00
Chris Wilson
b16c765122 drm/i915: Priority boost for new clients
Taken from an idea used for FQ_CODEL, we give the first request of a
new request flows a small priority boost. These flows are likely to
correspond with short, interactive tasks and so be more latency sensitive
than the longer free running queues. As soon as the client has more than
one request in the queue, further requests are not boosted and it settles
down into ordinary steady state behaviour.  Such small kicks dramatically
help combat the starvation issue, by allowing each client the opportunity
to run even when the system is under heavy throughput load (within the
constraints of the user selected priority).

v2: Mark the preempted request as the start of a new flow, to prevent a
single client being continually gazumped by its peers.

Testcase: igt/benchmarks/rrul
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001144755.7978-1-chris@chris-wilson.co.uk
2018-10-01 20:34:19 +01:00
Chris Wilson
7651a4452d drm/i915: Reserve some priority bits for internal use
In the next few patches, we will want to give a small priority boost to
some requests/queues but not so much that we perturb the user controlled
order. As such we will shift the user priority bits higher leaving
ourselves a few low priority bits for our internal bumping.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181001123204.23982-3-chris@chris-wilson.co.uk
2018-10-01 15:26:19 +01:00
Chris Wilson
b7268c5eed drm/i915: Pack params to engine->schedule() into a struct
Today we only want to pass along the priority to engine->schedule(), but
in the future we want to have much more control over the various aspects
of the GPU during a context's execution, for example controlling the
frequency allowed. As we need an ever growing number of parameters for
scheduling, move those into a struct for convenience.

v2: Move the anonymous struct into its own function for legibility and
ye olde gcc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-3-chris@chris-wilson.co.uk
2018-04-18 21:09:11 +01:00
Chris Wilson
0c7112a002 drm/i915: Rename priotree to sched
Having moved the priotree struct into i915_scheduler.h, identify it as
the scheduling element and rebrand into i915_sched. This becomes more
useful as we start attaching more information we require to propagate
through the scheduler.

v2: Use i915_sched_node for future distinctiveness

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-2-chris@chris-wilson.co.uk
2018-04-18 21:09:09 +01:00
Chris Wilson
98ff5c7830 drm/i915: Move the priotree struct to its own headers
Over time the priotree has grown from a sorted list to a more
complicated structure for propagating constraints along the dependency
chain to try and resolve priority inversion. Start to segregate this
information from the rest of the request/fence tracking.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-1-chris@chris-wilson.co.uk
2018-04-18 21:09:08 +01:00