Now that we use the CSB stored in the CPU friendly HWSP, we do not need
to track interrupts for when the mmio CSB registers are valid and can
just check where we read up to last from the cached HWSP. This means we
can forgo the atomic bit tracking from interrupt, and in the next patch
it means we can check the CSB at any time.
v2: Change the splitting inside reset_prepare, we only want to lose
testing the interrupt in this patch, the next patch requires the change
in locking
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-8-chris@chris-wilson.co.uk
On HW reset, the HW clears the write pointer (to 0). But since it also
writes its first CSB entry to slot 0, we need to reset the write pointer
back to the element before (so the first entry we read is 0).
This is required for the next patch, where we trust the CSB completely!
v2: Use _MASKED_FIELD
v3: Store the reset value, so that we differentiate between mmio/hwsp
transparently and without pretense.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-6-chris@chris-wilson.co.uk
Following the removal of the last workarounds, the only CSB mmio access
is for the old vGPU interface. The mmio registers presented by vGPU do
not require forcewake and can be treated as ordinary volatile memory,
i.e. they behave just like the HWSP access just at a different location.
We can reduce the CSB access to a set of read/write/buffer pointers and
treat the various paths identically and not worry about forcewake.
(Forcewake is nightmare for worstcase latency, and we want to process
this all with irqsoff -- no latency allowed!)
v2: Comments, comments, comments. Well, 2 bonus comments.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628201211.13837-5-chris@chris-wilson.co.uk
commit b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on
reset/sanitization") and commit 1288786b18 ("drm/i915: Move GEM sanitize
from resume_early to resume") show the conflicting requirements on the
code. We must reset the GPU before trashing live state on a fast resume
(hibernation debug, or error paths), but we must only reset our state
tracking iff the GPU is reset (or power cycled). This is tricky if we
are disabling GPU reset to simulate broken hardware; we reset our state
tracking but the GPU is left intact and recovers from its stale state.
v2: Again without the assertion for forcewake, no longer required since
commit b3ee09a4de ("drm/i915/ringbuffer: Fix context restore upon reset")
as the contexts are reset from the CS ensuring everything is powered up.
Fixes: b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization")
Fixes: 1288786b18 ("drm/i915: Move GEM sanitize from resume_early to resume")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk
Hangcheck is our back up in case the GPU or the driver gets stuck. It
detects when the GPU is not making any progress and issues a GPU reset.
However, if the driver is failing to make any progress, we can get
ourselves into a situation where we continually try resetting the GPU to
no avail. Employ a second timeout such that if we continue to see the
same seqno (the stalled engine has made no progress at all) over the
course of several hangchecks, declare the driver wedged and attempt to
start afresh.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180602104853.17140-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
After triggering the mm switch with a load of PD_DIR, which may be
deferred unto the MI_SET_CONTEXT on rcs, serialise the next commands
with that load by posting a read of PD_DIR (or else those subsequent
commands may access the stale page tables).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611171825.13678-2-chris@chris-wilson.co.uk
The HW only accepts offsets within ring->size, and fails peculiarly if
the RING_HEAD or RING_TAIL is set to ring->size. Therefore whenever we
set ring->head/ring->tail we want to make sure it is within value (using
intel_ring_wrap()).
v2: Double check execlists as well
v3: Remove redundancy with assert_ring_tail_valid()
v4: Just assert in intel_ring_reset() rather than be over-defensive.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-2-chris@chris-wilson.co.uk
The discovery with trying to enable full-ppgtt was that we were
completely failing to the load both the mm and context following the
reset. Although we were performing mmio to set the PP_DIR (per-process
GTT) and CCID (context), these were taking no effect (the assumption was
that this would trigger reload of the context and restore the page
tables). It was not until we performed the LRI + MI_SET_CONTEXT in a
following context switch would anything occur.
Since we are then required to reset the context image and PP_DIR using
CS commands, we place those commands into every batch. The hardware
should recognise the no-ops and eliminate the expensive context loads,
but we still have to pay the cost of using cross-powerwell register
writes. In practice, this has no effect on actual context switch times,
and only adds a few hundred nanoseconds to no-op switches. We can improve
the latter by eliminating the w/a around known no-op switches, but there
is an ulterior motive to keeping them.
Always emitting the context switch at the beginning of the request (and
relying on HW to skip unneeded switches) does have one key advantage.
Should we implement request reordering on Haswell, we will not know in
advance what the previous executing context was on the GPU and so we
would not be able to elide the MI_SET_CONTEXT commands ourselves and
always have to emit them. Having our hand forced now actually prepares
us for later.
Now since that context and mm follow the request, we no longer (and not
for a long time since requests took over!) require a trace point to tell
when we write the switch into the ring, since it is always. (This is
even more important when you remember that simply writing into the ring
bears no relation to the current mm.)
v2: Sandybridge has to agree to use LRI as well.
Testcase: igt/drv_selftests/live_hangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
We want to be able to reset the GPU from inside a timer callback
(hardirq context). One step requires us to copy the default context
state over to the guilty context, which means we need to plan in advance
to have that object accessible from within an atomic context. The atomic
context prevents us from pinning the object or from peeking into the
shmemfs backing store (all may sleep), so we choose to pin the
default_state into memory when the engine becomes active. This
compromise allows us to swap out the default state when idle, when
required.
References: 5692251c25 ("drm/i915/lrc: Scrub the GPU state of the guilty hanging request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180518090212.5349-2-chris@chris-wilson.co.uk
To ease the frequent and ugly pointer dance of
&request->gem_context->engine[request->engine->id] during request
submission, store that pointer as request->hw_context. One major
advantage that we will exploit later is that this decouples the logical
context state from the engine itself.
v2: Set mock_context->ops so we don't crash and burn in selftests.
Cleanups from Tvrtko.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk
We cannot call kthread_park() from softirq context, so let's avoid it
entirely during the reset. We wanted to suspend the signaler so that it
would not mark a request as complete at the same time as we marked it as
being in error. Instead of parking the signaling, stop the engine from
advancing so that the GPU doesn't emit the breadcrumb for our chosen
"guilty" request.
v2: Refactor setting STOP_RING so that we don't have the same code thrice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Michałt Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-8-chris@chris-wilson.co.uk
In preparation to more carefully handling incomplete preemption during
reset by execlists, we move the existing code wholesale to the backends
under a couple of new reset vfuncs.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
CC: Michel Thierry <michel.thierry@intel.com>
Cc: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180516183355.10553-4-chris@chris-wilson.co.uk
We need to move to a more flexible timeline that doesn't assume one
fence context per engine, and so allow for a single timeline to be used
across a combination of engines. This means that preallocating a fence
context per engine is now a hindrance, and so we want to introduce the
singular timeline. From the code perspective, this has the notable
advantage of clearing up a lot of mirky semantics and some clumsy
pointer chasing.
By splitting the timeline up into a single entity rather than an array
of per-engine timelines, we can realise the goal of the previous patch
of tracking the timeline alongside the ring.
v2: Tweak wait_for_idle to stop the compiling thinking that ret may be
uninitialised.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk
In the future, we want to move a request between engines. To achieve
this, we first realise that we have two timelines in effect here. The
first runs through the GTT is required for ordering vma access, which is
tracked currently by engine. The second is implied by sequential
execution of commands inside the ringbuffer. This timeline is one that
maps to userspace's expectations when submitting requests (i.e. given the
same context, batch A is executed before batch B). As the rings's
timelines map to userspace and the GTT timeline an implementation
detail, move the timeline from the GTT into the ring itself (per-context
in logical-ring-contexts/execlists, or a global per-engine timeline for
the shared ringbuffers in legacy submission.
The two timelines are still assumed to be equivalent at the moment (no
migrating requests between engines yet) and so we can simply move from
one to the other without adding extra ordering.
v2: Reinforce that one isn't allowed to mix the engine execution
timeline with the client timeline from userspace (on the ring).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-1-chris@chris-wilson.co.uk
We don't need to track every ring for its lifetime as they are managed
by the contexts/engines. What we do want to track are the live rings so
that we can sporadically clean up requests if userspace falls behind. We
can simply restrict the gt->rings list to being only gt->live_rings.
v2: s/live/active/ for consistency with gt.active_requests
Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-4-chris@chris-wilson.co.uk
In the next patch, rings are the central timeline as requests may jump
between engines. Therefore in the future as we retire in order along the
engine timeline, we may retire out-of-order within a ring (as the ring now
occurs along multiple engines), leading to much hilarity in miscomputing
the position of ring->head.
As an added bonus, retiring along the ring reduces the penalty of having
one execlists client do cleanup for another (old legacy submission
shares a ring between all clients). The downside is that slow and
irregular (off the critical path) process of cleaning up stale requests
after userspace becomes a modicum less efficient.
In the long run, it will become apparent that the ordered
ring->request_list matches the ring->timeline, a fun challenge for the
future will be unifying the two lists to avoid duplication!
v2: We need both engine-order and ring-order processing to maintain our
knowledge of where individual rings have completed upto as well as
knowing what was last executing on any engine. And finally by decoupling
retiring the contexts on the engine and the timelines along the rings,
we do have to keep a reference to the context on each request
(previously it was guaranteed by the context being pinned).
v3: Not just a reference to the context, but we need to keep it pinned
as we manipulate the rings; i.e. we need a pin for both the manipulation
of the engine state during its retirements, and a separate pin for the
manipulation of the ring state.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-3-chris@chris-wilson.co.uk
We can convert engine stats from a spinlock to seqlock to ensure interrupt
processing is never even a tiny bit delayed by parallel readers.
There is a smidgen bit more cost on the write lock side, and an extremely
unlikely chance that readers will have to retry a few times in face of
heavy interrupt load. But it should be extremely unlikely given how
lightweight read side section is compared to the interrupt processing
side, and also compared to the rest of the code paths which can lead into
it. Furthermore, writer is the ones doing the real, latency sensitive
work, while readers are only informative.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180426074716.7352-1-tvrtko.ursulin@linux.intel.com
Today we only want to pass along the priority to engine->schedule(), but
in the future we want to have much more control over the various aspects
of the GPU during a context's execution, for example controlling the
frequency allowed. As we need an ever growing number of parameters for
scheduling, move those into a struct for convenience.
v2: Move the anonymous struct into its own function for legibility and
ye olde gcc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180418184052.7129-3-chris@chris-wilson.co.uk
This has grown to be a sizable amount of code, so move it to
its own file before we try to refactor anything. For the moment,
we are leaving behind the WA BB code and the WAs that get applied
(incorrectly) in init_clock_gating, but we will deal with it later.
v2: Use intel_ prefix for code that deals with the hardware (Chris)
v3: Rebased
v4:
- Rebased
- New license header
v5:
- Rebased
- Added some organisational notes to the file (Chris)
v6: Include DOC section in the documentation build (Jani)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
[ickle: appease checkpatch, mostly]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/1523376767-18480-1-git-send-email-oscar.mateo@intel.com
Let's avoid having to delve down the pointer chain to see if the i915
device has support for preemption and store that on the engine, which
made the decision in the first place!
v2: Refactor common preemption policy between execlists/guc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tomasz Lis <tomasz.lis@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180403183537.5522-1-chris@chris-wilson.co.uk
We would like to start doing some bookkeeping at the beginning, between
contexts and at the end of execlists submission. We already mark the
beginning and end using EXECLISTS_ACTIVE_USER, to provide an indication
when the HW is idle. This give us a pair of sequence points we can then
expand on for further bookkeeping.
v2: Refactor guc submission to share the same begin/end.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180331130626.10712-1-chris@chris-wilson.co.uk
ICL 11 has a greater number of maximum subslices. This patch
reflects this.
v2: GEN11 updates to MCR_SELECTOR (Oscar)
v3: Copypasta error in the new defines (Lionel)
Bspec: 21139
BSpec: 21108
Signed-off-by: Kelvin Gardiner <kelvin.gardiner@intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com> (v1)
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> (v1)
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180316121456.11577-3-mika.kuoppala@linux.intel.com
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
The only usage outside the intel_lrc.c file is in the ringbuffer
init, but the irq mask calculated there is then overwritten for
all engines that have a non-zero shift, so we can drop it.
This change is not aimed at code saving but at removing from
intel_engines information that does not apply to all gens that have
the engine. When checking without the temporary WARN_ON, code size
is basically unchanged.
v2: make the irq_shifts array static const
v3: rebase, move irq_shifts array to logical_ring_default_irqs
v4: move array inside the if and use u8 for it (Chris)
Suggested-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180314182653.26981-4-daniele.ceraolospurio@intel.com
There is some redundancy between dma_fence->ops->enable_signaling (via
i915_fence_enable_signaling) and our backend,
intel_engine_enable_signaling() in that both levels recheck the fence
status multiple times. If we convert intel_engine_enable_signaling() to
return the information desired by dma_fence->ops->enable_signaling, we
can reduce i915_fence_enable_signaling to a simple stub and avoid
trying to reinterpret the same information.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180308140732.25090-1-chris@chris-wilson.co.uk
Header intel_ringbuffer.h is using definitions from i915_reg.h
but forget to include it. Remove this hidden dependency by
explicitly include missing header.
v2: add reminder (Chris)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180308095037.18264-2-michal.wajdeczko@intel.com
Up to now, subslice mask was assumed to be uniform across slices. But
starting with Cannonlake, slices can be asymmetric (for example slice0
has different number of subslices as slice1+). This change stores all
subslices masks for all slices rather than having a single mask that
applies to all slices.
v2: Rework how we store total numbers in sseu_dev_info (Tvrtko)
Fix CHV eu masks, was reading disabled as enabled (Tvrtko)
Readability changes (Tvrtko)
Add EU index helper (Tvrtko)
v3: Turn ALIGN(v, 8) / 8 into DIV_ROUND_UP(v, BITS_PER_BYTE) (Tvrtko)
Reuse sseu_eu_idx() for setting eu_mask on CHV (Tvrtko)
Reformat debug prints for subslices (Tvrtko)
v4: Change eu_mask helper into sseu_set_eus() (Tvrtko)
v5: With Haswell reporting masks & counts, bump sseu_*_eus() functions
to use u16 (Lionel)
v6: Fix sseu_get_eus() for > 8 EUs per subslice (Lionel)
v7: Change debugfs enabels for number of subslices per slice, will
need a small igt/pm_sseu change (Lionel)
Drop subslice_total field from sseu_dev_info, rely on
sseu_subslice_total() to recompute the value instead (Lionel)
v8: Remove unused function compute_subslice_total() (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306122857.27317-2-lionel.g.landwerlin@intel.com
Enhanced Execlists is an upgraded version of execlists which supports
up to 8 ports. The lrcs to be submitted are written to a submit queue
(the ExecLists Submission Queue - ELSQ), which is then loaded on the
HW. When writing to the ELSP register, the lrcs are written cyclically
in the queue from position 0 to position 7. Alternatively, it is
possible to write directly in the individual positions of the queue
using the ELSQC registers. To be able to re-use all the existing code
we're using the latter method and we're currently limiting ourself to
only using 2 elements.
v2: Rebase.
v3: Switch from !IS_GEN11 to GEN < 11 (Daniele Ceraolo Spurio).
v4: Use the elsq registers instead of elsp. (Daniele Ceraolo Spurio)
v5: Reword commit, rename regs to be closer to specs, turn off
preemption (Daniele), reuse engine->execlists.elsp (Chris)
v6: use has_logical_ring_elsq to differentiate the new paths
v7: add preemption support, rename els to submit_reg (Chris)
v8: save the ctrl register inside the execlists struct, drop CSB
handling updates (superseded by preempt_complete_status) (Chris)
v9: s/drm_i915_gem_request/i915_request (Mika)
v10: resolved conflict in inject_preempt_context (Mika)
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180302161501.28594-4-mika.kuoppala@linux.intel.com
Previously, we would spin waiting for all waiters to wake up and notice
their request had completed before we would reset the seqno upon
wraparound. However, we can mark their waits as complete and wake them
up directly using the existing machinery for handling the flushing of
missed wakeups when idling.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180306130143.13312-2-chris@chris-wilson.co.uk
The goal here is to try and reduce the latency of signaling additional
requests following the wakeup from interrupt by reducing the list of
to-be-signaled requests from an rbtree to a sorted linked list. The
original choice of using an rbtree was to facilitate random insertions
of request into the signaler while maintaining a sorted list. However,
if we assume that most new requests are added when they are submitted,
we see those new requests in execution order making a insertion sort
fast, and the reduction in overhead of each signaler iteration
significant.
Since commit 56299fb7d9 ("drm/i915: Signal first fence from irq handler
if complete"), we signal most fences directly from notify_ring() in the
interrupt handler greatly reducing the amount of work that actually
needs to be done by the signaler kthread. All the thread is then
required to do is operate as the bottom-half, cleaning up after the
interrupt handler and preparing the next waiter. This includes signaling
all later completed fences in a saturated system, but on a mostly idle
system we only have to rebuild the wait rbtree in time for the next
interrupt. With this de-emphasis of the signaler's role, we want to
rejig it's datastructures to reduce the amount of work we require to
both setup the signal tree and maintain it on every interrupt.
References: 56299fb7d9 ("drm/i915: Signal first fence from irq handler if complete")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180222092545.17216-1-chris@chris-wilson.co.uk
Gen11 will add more VCS and VECS rings so prepare the
infrastructure to support that.
Bspec: 7021
v2: Rebase.
v3: Rebase.
v4: Rebase.
v5: Rebase.
v6:
- Update for POR changes. (Daniele Ceraolo Spurio)
- Add provisional guc engine ids - to be checked and confirmed.
v7:
- Rebased.
- Added the new ring masks.
- Added the new HW ids.
v8:
- Introduce I915_MAX_VCS/VECS to avoid magic numbers (Michal)
v9: increase MAX_ENGINE_INSTANCE to 3
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Oscar Mateo <oscar.mateo@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180228101153.7224-1-mika.kuoppala@linux.intel.com
Sometimes we need to boost the priority of an in-flight request, which
may lead to the situation where the second submission port then contains
a higher priority context than the first and so we need to inject a
preemption event. To do so we must always check inside
execlists_dequeue() whether there is a priority inversion between the
ports themselves as well as the head of the priority sorted queue, and we
cannot just skip dequeuing if the queue is empty.
As Michał noted, this doesn't simply extend to handling more than 2-port
submission, as we may need to reorder within the array of executing
requests which themselves are lower priority than the first. A task for
later!
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180222142229.14517-1-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
We want to de-emphasize the link between the request (dependency,
execution and fence tracking) from GEM and so rename the struct from
drm_i915_gem_request to i915_request. That is we may implement the GEM
user interface on top of requests, but they are an abstraction for
tracking execution rather than an implementation detail of GEM. (Since
they are not tied to HW, we keep the i915 prefix as opposed to intel.)
In short, the spatch:
@@
@@
- struct drm_i915_gem_request
+ struct i915_request
A corollary to contracting the type name, we also harmonise on using
'rq' shorthand for local variables where space if of the essence and
repetition makes 'request' unwieldy. For globals and struct members,
'request' is still much preferred for its clarity.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>
Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
When dumping the engine, we print out the current register values. This
requires the rpm wakeref. If the device is alseep, we can assume the
engine is asleep (and the register state is uninteresting) so skip and
only acquire the rpm wakeref if the device is already awake.
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180212102415.24246-1-chris@chris-wilson.co.uk
If we remove some hardcoded assumptions about the preempt context having
a fixed id, reserved from use by normal user contexts, we may only
allocate the i915_gem_context when required. Then the subsequent
decisions on using preemption reduce to having the preempt context
available.
v2: Include an assert that we don't allocate the preempt context twice.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Arkadiusz Hiler <arkadiusz.hiler@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180207210544.26351-3-chris@chris-wilson.co.uk
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Commit 99e48bf98d ("drm/i915: Lock out execlist tasklet while peeking
inside for busy-stats") added a tasklet_disable call in busy stats
enabling, but we failed to understand that the PMU enable callback runs
as an hard IRQ (IPI).
Consequence of this is that the PMU enable callback can interrupt the
execlists tasklet, and will then deadlock when it calls
intel_engine_stats_enable->tasklet_disable.
To fix this, I realized it is possible to move the engine stats enablement
and disablement to PMU event init and destroy hooks. This allows for much
simpler implementation since those hooks run in normal context (can
sleep).
v2: Extract engine_event_destroy. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Fixes: 99e48bf98d ("drm/i915: Lock out execlist tasklet while peeking inside for busy-stats")
Testcase: igt/perf_pmu/enable-race-*
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180205093448.13877-1-tvrtko.ursulin@linux.intel.com
Pass in a format string (and args) to specify the header to be emitted
along with the engine state when pretty-printing. This allows the header
to be emitted inside the drm_printer stream, so sharing the same prefix
and output characteristics (e.g. debug level and filtering).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171208012303.25504-2-chris@chris-wilson.co.uk
Chris requested this backmerge for a reconciliation on
drm_print.h between drm-misc-next and drm-intel-next-queued
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Currently on every submission, we recalculate the ELSP register offset
for the engine, after chasing the pointers to find the iomem base. Since
this is fixed for the lifetime of the driver, record the offset in the
execlists struct.
In practice the difference is negligible, it just happens to remove 27
bytes of eyesore pointer dancing from next to the hottest instruction
(which is itself due to stalling for a cache miss) in perf profiles of
the execlists_submission_tasklet().
v2: Trim off one more elsp local.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171207222434.17686-1-chris@chris-wilson.co.uk
- Init clock gate fix (Ville)
- Execlists event handling corrections (Chris, Michel)
- Improvements on GPU Cache invalidation and context switch (Chris)
- More perf OA changes (Lionel)
- More selftests improvements and fixes (Chris, Matthew)
- Clean-up on modules parameters (Chris)
- Clean-up around old ringbuffer submission and hw semaphore on old platforms (Chris)
- More Cannonlake stabilization effort (David, James)
- Display planes clean-up and improvements (Ville)
- New PMU interface for perf queries... (Tvrtko)
- ... and other subsequent PMU changes and fixes (Tvrtko, Chris)
- Remove success dmesg noise from rotation (Chris)
- New DMC for Kabylake (Anusha)
- Fixes around atomic commits (Daniel)
- GuC updates and fixes (Sagar, Michal, Chris)
- Couple gmbus/i2c fixes (Ville)
- Use exponential backoff for all our wait_for() (Chris)
- Fixes for i915/fbdev (Chris)
- Backlight fixes (Arnd)
- Updates on shrinker (Chris)
- Make Hotplug enable more robuts (Chris)
- Disable huge pages (TPH) on lack of a needed workaround (Joonas)
- New GuC images for SKL, KBL, BXT (Sagar)
- Add HW Workaround for Geminilake performance (Valtteri)
- Fixes for PPS timings (Imre)
- More IPS fixes (Maarten)
- Many fixes for Display Port on gen2-gen4 (Ville)
- Retry GPU reset making the recover from hang more robust (Chris)
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJaIf0wAAoJEPpiX2QO6xPK/wAH/0JrYX+oml+ZJ7xV4wxm0S5S
K8jcSWQPlF78xkuiVaA0200qOPJP4Yp32Gmv7dVBFKq7IaLn3c8B2DjqE3D0QPue
ZPSMMFz7NTMVjfKpmkCp9V3ln87Cc/Cj0L66CKRuFnvefDCGLN4hULj3NAxQfkwm
9CsUH+p5d7usXgIB7tIovNEKw4hLiTQiTUd5bwok6iJ3EB79DHxxlY8nG6n5ukEr
9XryS1ADGmTUy7YifIdJKPyDLjEeP+xRMApUva67y7zaTk7iW6mETkBJ7WpQbCCL
fQaLlmU/SrBD8ViJBx19U6ofKIVJ+vVMoU9JNg4Ak5WZOfjw7aIPiZaaHU3Aj+w=
=IutN
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-next-2017-12-01' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
[airlied: fix conflict in intel_dsi.c]
drm-intel-next-2017-12-01:
- Init clock gate fix (Ville)
- Execlists event handling corrections (Chris, Michel)
- Improvements on GPU Cache invalidation and context switch (Chris)
- More perf OA changes (Lionel)
- More selftests improvements and fixes (Chris, Matthew)
- Clean-up on modules parameters (Chris)
- Clean-up around old ringbuffer submission and hw semaphore on old platforms (Chris)
- More Cannonlake stabilization effort (David, James)
- Display planes clean-up and improvements (Ville)
- New PMU interface for perf queries... (Tvrtko)
- ... and other subsequent PMU changes and fixes (Tvrtko, Chris)
- Remove success dmesg noise from rotation (Chris)
- New DMC for Kabylake (Anusha)
- Fixes around atomic commits (Daniel)
- GuC updates and fixes (Sagar, Michal, Chris)
- Couple gmbus/i2c fixes (Ville)
- Use exponential backoff for all our wait_for() (Chris)
- Fixes for i915/fbdev (Chris)
- Backlight fixes (Arnd)
- Updates on shrinker (Chris)
- Make Hotplug enable more robuts (Chris)
- Disable huge pages (TPH) on lack of a needed workaround (Joonas)
- New GuC images for SKL, KBL, BXT (Sagar)
- Add HW Workaround for Geminilake performance (Valtteri)
- Fixes for PPS timings (Imre)
- More IPS fixes (Maarten)
- Many fixes for Display Port on gen2-gen4 (Ville)
- Retry GPU reset making the recover from hang more robust (Chris)
* tag 'drm-intel-next-2017-12-01' of git://anongit.freedesktop.org/drm/drm-intel: (101 commits)
drm/i915: Update DRIVER_DATE to 20171201
drm/i915/cnl: Mask previous DDI - PLL mapping
drm/i915: Remove unsafe i915.enable_rc6
drm/i915: Sleep and retry a GPU reset if at first we don't succeed
drm/i915: Interlaced DP output doesn't work on VLV/CHV
drm/i915: Pass crtc state to intel_pipe_{enable,disable}()
drm/i915: Wait for pipe to start on i830 as well
drm/i915: Fix vblank timestamp/frame counter jumps on gen2
drm/i915: Fix deadlock in i830_disable_pipe()
drm/i915: Fix has_audio readout for DDI A
drm/i915: Don't add the "force audio" property to DP connectors that don't support audio
drm/i915: Disable DP audio for g4x
drm/i915/selftests: Wake the device before executing requests on the GPU
drm/i915: Set fake_vma.size as well as fake_vma.node.size for capture
drm/i915: Tidy up signed/unsigned comparison
drm/i915: Enable IPS with only sprite plane visible too, v4.
drm/i915: Make ips_enabled a property depending on whether IPS is enabled, v3.
drm/i915: Avoid PPS HW/SW state mismatch due to rounding
drm/i915: Skip switch-to-kernel-context on suspend when wedged
drm/i915/glk: Apply WaProgramL3SqcReg1DefaultForPerf for GLK too
...
- Many improvements for selftests and other igt tests (Chris)
- Forcewake with PUNIT->PMIC bus fixes and robustness (Hans)
- Define an engine class for uABI (Tvrtko)
- Context switch fixes and improvements (Chris)
- GT powersavings and power gating simplification and fixes (Chris)
- Other general driver clean-ups (Chris, Lucas, Ville)
- Removing old, useless and/or bad workarounds (Chris, Oscar, Radhakrishna)
- IPS, pipe config, etc in preparation for another Fast Boot attempt (Maarten)
- OA perf fixes and support to Coffee Lake and Cannonlake (Lionel)
- Fixes around GPU fault registers (Michel)
- GEM Proxy (Tina)
- Refactor of Geminilake and Cannonlake plane color handling (James)
- Generalize transcoder loop (Mika Kahola)
- New HW Workaround for Cannonlake and Geminilake (Rodrigo)
- Resume GuC before using GEM (Chris)
- Stolen Memory handling improvements (Ville)
- Initialize entry in PPAT for older compilers (Chris)
- Other fixes and robustness improvements on execbuf (Chris)
- Improve logs of GEM_BUG_ON (Mika Kuoppala)
- Rework with massive rename of GuC functions and files (Sagar)
- Don't sanitize frame start delay if pipe is off (Ville)
- Cannonlake clock fixes (Rodrigo)
- Cannonlake HDMI 2.0 support (Rodrigo)
- Add a GuC doorbells selftest (Michel)
- Add might_sleep() check to our wait_for() (Chris)
Many GVT changes for 4.16:
- CSB HWSP update support (Weinan)
- GVT debug helpers, dyndbg and debugfs (Chuanxiao, Shuo)
- full virtualized opregion (Xiaolin)
- VM health check for sane fallback (Fred)
- workload submission code refactor for future enabling (Zhi)
- Updated repo URL in MAINTAINERS (Zhenyu)
- other many misc fixes
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJaD2cyAAoJEPpiX2QO6xPKuiEH/2/J7Ebf5IRZtaTU+ke2uOI4
2YCdrn9F1guz6d+cZtsLPkJ9JwQlz9EftfB7KT+9dT8viEG0FFna9bV+Xz3wyGQ6
DRlP9tCFnCDaOyZBI5QshubuzldabPpfscPJI7/EMr91jtveGhKIhsRzHBxKCEZF
LKlAHtXAWSkTozmh6bU+wf5TEOFzYv2oquTVn5ZJrpYlqup/wEKh+KnL9eBQ3+Qp
FLnmKjInaadOV/uXQfeWstJuohG/pfcNm68OmDOxYNmwpeNnwbtfKT9eZeDtDZDy
dXj9mokeTwg4fBrXX/tyxuKogywxQSNFTqCU2yY9up+35ykmjVN8p/1BYi+GGe0=
=ePes
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-next-2017-11-17-1' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
More change sets for 4.16:
- Many improvements for selftests and other igt tests (Chris)
- Forcewake with PUNIT->PMIC bus fixes and robustness (Hans)
- Define an engine class for uABI (Tvrtko)
- Context switch fixes and improvements (Chris)
- GT powersavings and power gating simplification and fixes (Chris)
- Other general driver clean-ups (Chris, Lucas, Ville)
- Removing old, useless and/or bad workarounds (Chris, Oscar, Radhakrishna)
- IPS, pipe config, etc in preparation for another Fast Boot attempt (Maarten)
- OA perf fixes and support to Coffee Lake and Cannonlake (Lionel)
- Fixes around GPU fault registers (Michel)
- GEM Proxy (Tina)
- Refactor of Geminilake and Cannonlake plane color handling (James)
- Generalize transcoder loop (Mika Kahola)
- New HW Workaround for Cannonlake and Geminilake (Rodrigo)
- Resume GuC before using GEM (Chris)
- Stolen Memory handling improvements (Ville)
- Initialize entry in PPAT for older compilers (Chris)
- Other fixes and robustness improvements on execbuf (Chris)
- Improve logs of GEM_BUG_ON (Mika Kuoppala)
- Rework with massive rename of GuC functions and files (Sagar)
- Don't sanitize frame start delay if pipe is off (Ville)
- Cannonlake clock fixes (Rodrigo)
- Cannonlake HDMI 2.0 support (Rodrigo)
- Add a GuC doorbells selftest (Michel)
- Add might_sleep() check to our wait_for() (Chris)
Many GVT changes for 4.16:
- CSB HWSP update support (Weinan)
- GVT debug helpers, dyndbg and debugfs (Chuanxiao, Shuo)
- full virtualized opregion (Xiaolin)
- VM health check for sane fallback (Fred)
- workload submission code refactor for future enabling (Zhi)
- Updated repo URL in MAINTAINERS (Zhenyu)
- other many misc fixes
* tag 'drm-intel-next-2017-11-17-1' of git://anongit.freedesktop.org/drm/drm-intel: (260 commits)
drm/i915: Update DRIVER_DATE to 20171117
drm/i915: Add a policy note for removing workarounds
drm/i915/selftests: Report ENOMEM clearly for an allocation failure
Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk"
drm/i915: Calculate g4x intermediate watermarks correctly
drm/i915: Calculate vlv/chv intermediate watermarks correctly, v3.
drm/i915: Pass crtc_state to ips toggle functions, v2
drm/i915: Pass idle crtc_state to intel_dp_sink_crc
drm/i915: Enable FIFO underrun reporting after initial fastset, v4.
drm/i915: Mark the userptr invalidate workqueue as WQ_MEM_RECLAIM
drm/i915: Add might_sleep() check to wait_for()
drm/i915/selftests: Add a GuC doorbells selftest
drm/i915/cnl: Extend HDMI 2.0 support to CNL.
drm/i915/cnl: Simplify dco_fraction calculation.
drm/i915/cnl: Don't blindly replace qdiv.
drm/i915/cnl: Fix wrpll math for higher freqs.
drm/i915/cnl: Fix, simplify and unify wrpll variable sizes.
drm/i915/cnl: Remove useless conversion.
drm/i915/cnl: Remove spurious central_freq.
drm/i915/selftests: exercise_ggtt may have nothing to do
...
Sagar noticed the check can be consolidated between the engine stats
implementation and the PMU.
My first choice was a static inline helper but that got into include
ordering mess quickly fast so I went with a macro instead. At some point
we should perhaps looking into taking out the non-ringubffer bits from
intel_ringbuffer.h into a new intel_engine.h or something.
v2: Use engine->flags. (Chris Wilson)
v3: Rebase and mark GuC as not yet supported. (Chris Wilson)
v4: Move flag setting to intel_engines_reset_default_submission.
(Chris Wilson)
v5: Move flag setting to logical_ring_setup.
v6: intel_engines_reset_default_submission is the wrong place to set the
flag - it needs to be in execlists_set_default_submission. (Sagar)
v7: Flag setting in logical_ring_setup is not required. (Chris)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com> (v6)
Link: https://patchwork.freedesktop.org/patch/msgid/20171129102805.22690-1-tvrtko.ursulin@linux.intel.com
Will be adding a new per-engine flags shortly so it makes sense
to consolidate.
v2: Keep the original code flow in intel_engine_cleanup_cmd_parser.
(Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171129082409.18189-1-tvrtko.ursulin@linux.intel.com
The legacy context switch for ringbuffer submission is multistaged,
where each of those stages may fail. However, we were updating global
state after some stages, and so we had to force the incomplete request
to be submitted because we could not unwind. Save the global state
before performing the switches, and so enable us to unwind back to the
previous global state should any phase fail. We then must cancel the
request instead of submitting it should the construction fail.
v2: s/saved_ctx/from_ctx/; s/ctx/to_ctx/ etc.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123152631.31385-1-chris@chris-wilson.co.uk
We have agreed during the engine classes discussion that fields marked as
non-ABI are better left out altogether from uapi headers.
v2: Use a local define for maintanability. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171123100701.18430-1-tvrtko.ursulin@linux.intel.com
We can use engine busy stats instead of the sampling timer for
better accuracy.
By doing this we replace the stohastic sampling with busyness
metric derived directly from engine activity. This is context
switch interrupt driven, so as accurate as we can get from
software tracking.
As a secondary benefit, we can also not run the sampling timer
in cases only busyness metric is enabled.
v2: Rebase.
v3:
* Rebase, comments.
* Leave engine busyness controls out of workers.
v4: Checkpatch cleanup.
v5: Added comment to pmu_needs_timer change.
v6:
* Rebase.
* Fix style of some comments. (Chris Wilson)
v7: Rebase and commit message update. (Chris Wilson)
v8: Add delayed stats disabling to improve accuracy in face of
CPU hotplug events.
v9: Rebase.
v10: Rebase - i915_modparams.enable_execlists removal.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-6-tvrtko.ursulin@linux.intel.com
Track total time requests have been executing on the hardware.
We add new kernel API to allow software tracking of time GPU
engines are spending executing requests.
Both per-engine and global API is added with the latter also
being exported for use by external users.
v2:
* Squashed with the internal API.
* Dropped static key.
* Made per-engine.
* Store time in monotonic ktime.
v3: Moved stats clearing to disable.
v4:
* Comments.
* Don't export the API just yet.
v5: Whitespace cleanup.
v6:
* Rename ref to active.
* Drop engine aggregate stats for now.
* Account initial busy period after enabling stats.
v7:
* Rebase.
v8:
* Move context in notification after the notifier. (Chris Wilson)
v9:
In cases where stats tracking is getting disabled while there is
an active context on an engine, add up the current value to the
total. This also implies we don't clear the total when tracking
is disabled any longer. There is no real need to do so because
we define the stats as relative while enabled, meaning
comparison between two samples while tracking is enabled is the
valid usage. However, when busy stats will later be plugged into
the perf PMU API, it is beneficial to not reset the total, since
the PMU core likes to do some counter disable/enable cycles on
startup, and while doing so during a single long context
executing on an engine we would lose some accuracy and so make
unit testing more difficult than needs to be.
v10:
* Fix accounting for preemption.
v11:
* Rebase for i915_modparams.enable_execlists removal.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171121181852.16128-5-tvrtko.ursulin@linux.intel.com