In preparation for removing the manual EMIT_FLUSH prior to emitting the
breadcrumb implement the flush inline with writing the breadcrumb for
ringbuffer emission.
With a combined flush+breadcrumb, we can use a single operation to both
flush and after the flush is complete (post-sync) write the breadcrumb.
This gives us a strongly ordered operation that should be sufficient to
serialise the write before we emit the interrupt; and therefore we may
take the opportunity to remove the irq_seqno_barrier w/a for gen6+.
Although using the PIPECONTROL to write the breadcrumb is slower than
MI_STORE_DWORD_IMM, by combining the operations into one and removing the
extra flush (next patch) it is faster
For gen2-5, we simply combine the MI_FLUSH into the breadcrumb emission,
though maybe we could find a solution here to the seqno-vs-interrupt
issue on Ironlake by mixing up the flush? The answer is no, adding an
MI_FLUSH before the interrupt is insufficient.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228153114.4948-2-chris@chris-wilson.co.uk
The writing is on the wall for the existence of a single execution queue
along each engine, and as a consequence we will not be able to track
dependencies along the HW queue itself, i.e. we will not be able to use
HW semaphores on gen7 as they use a global set of registers (and unlike
gen8+ we can not effectively target memory to keep per-context seqno and
dependencies).
On the positive side, when we implement request reordering for gen7 we
also can not presume a simple execution queue and would also require
removing the current semaphore generation code. So this bring us another
step closer to request reordering for ringbuffer submission!
The negative side is that using interrupts to drive inter-engine
synchronisation is much slower (4us -> 15us to do a nop on each of the 3
engines on ivb). This is much better than it was at the time of introducing
the HW semaphores and equally important userspace weaned itself off
intermixing dependent BLT/RENDER operations (the prime culprit was glyph
rendering in UXA). So while we regress the microbenchmarks, it should not
impact the user.
References: https://bugs.freedesktop.org/show_bug.cgi?id=108888
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-2-chris@chris-wilson.co.uk
After we found a workaround for a hang on context load, Ben Widawsky
found confirmation that it was for an issue with waking from rc6 and
loading a context image.
The workaround from on high suggests that we should
I915_WRITE(RING_WAIT_FOR_RC6_EXIT(engine->mmio_base),
_MASKED_FIELD(RING_RC6_SEL_WRITE_ADDR_MASK,
RING_RC6_SEL_WRITE_ADDR_UPPER_LEFT));
in our rc6 setup for Haswell GT1, but on applying that we find instead
that the machine encounters a GT forcewake error and locks up.
As we are removing HW semaphore usage in the next patch, and the
suggested workaround is no improvement, we need to
decouple the PSMI workaround from HAS_SEMAPHORES to IS_HSW_GT1.
References: 2c55018347 ("drm/i915: Disable PSMI sleep messages on all rings around context switches")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181228140736.32606-1-chris@chris-wilson.co.uk
Having completed a test run of gem_eio across all machines in CI we also
observe the phenomenon (of lost interrupts after resetting the GPU) on
gen3 machines as well as the previously sighted gen6/gen7. Let's apply
the same HWSTAM workaround that was effective for gen6+ for all, as
although we haven't seen the same failure on gen4/5 it seems prudent to
keep the code the same.
As a consequence we can remove the extra setting of HWSTAM and apply the
register from a single site.
v2: Delazy and move the HWSTAM into its own function
v3: Mask off all HWSP writes on driver unload and engine cleanup.
v4: And what about the physical hwsp?
v5: No, engine->init_hw() is not called from driver_init_hw(), don't be
daft. Really scrub HWSTAM as early as we can in driver_init_mmio()
v6: Rename set_hwsp as it was setting the mask not the hwsp register.
v7: Ville pointed out that although vcs(bsd) was introduced for g4x/ilk,
per-engine HWSTAM was not introduced until gen6!
References: https://bugs.freedesktop.org/show_bug.cgi?id=108735
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181218102712.11058-1-chris@chris-wilson.co.uk
Define IS_GEN() similarly to our IS_GEN_RANGE(). but use gen instead of
gen_mask to do the comparison. Now callers can pass then gen as a parameter,
so we don't require one macro for each gen.
The following spatch was used to convert the users of these macros:
@@
expression e;
@@
(
- IS_GEN2(e)
+ IS_GEN(e, 2)
|
- IS_GEN3(e)
+ IS_GEN(e, 3)
|
- IS_GEN4(e)
+ IS_GEN(e, 4)
|
- IS_GEN5(e)
+ IS_GEN(e, 5)
|
- IS_GEN6(e)
+ IS_GEN(e, 6)
|
- IS_GEN7(e)
+ IS_GEN(e, 7)
|
- IS_GEN8(e)
+ IS_GEN(e, 8)
|
- IS_GEN9(e)
+ IS_GEN(e, 9)
|
- IS_GEN10(e)
+ IS_GEN(e, 10)
|
- IS_GEN11(e)
+ IS_GEN(e, 11)
)
v2: use IS_GEN rather than GT_GEN and compare to info.gen rather than
using the bitmask
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-2-lucas.demarchi@intel.com
RANGE makes it longer, but clearer. We are also going to add a macro to
check an individual gen, so add the _RANGE prefix here.
Diff generated with:
sed 's/IS_GEN(/IS_GEN_RANGE(/g' drivers/gpu/drm/i915/{*/,}*.{c,h} -i
v2: use IS_GEN rather than GT_GEN
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-1-lucas.demarchi@intel.com
Adding an extra MI_STORE_DWORD_IMM to the gpu relocation path for gen3
was good, but still not good enough. To survive 24+ hours under test we
needed to perform not one, not two but three extra store-dw. Doing so
for each GPU relocation was a little unsightly and since we need to
worry about userspace hitting the same issues, we should apply the dummy
store-dw into the EMIT_FLUSH.
Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
References: 7fa28e1469 ("drm/i915: Write GPU relocs harder with gen3")
Testcase: igt/gem_tiled_fence_blits # blb/pnv
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181207134037.11848-1-chris@chris-wilson.co.uk
Currently we allocate a scratch page for each engine, but since we only
ever write into it for post-sync operations, it is not exposed to
userspace nor do we care for coherency. As we then do not care about its
contents, we can use one page for all, reducing our allocations and
avoid complications by not assuming per-engine isolation.
For later use, it simplifies engine initialisation (by removing the
allocation that required struct_mutex!) and means that we can always rely
on there being a scratch page.
v2: Check that we allocated a large enough scratch for I830 w/a
Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181204141522.13640-1-chris@chris-wilson.co.uk
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.18.20+
Convert the per context workaround handling code to run against the newly
introduced common workaround framework and fuse the two to use the
existing smarter list add helper, the one which does the sorted insert and
merges registers where possible.
This completes migration of all four classes of workarounds onto the
common framework.
Existing macros are kept untouched for smaller code churn.
v2:
* Rename to list name ctx_wa_list and move from dev_priv to engine.
v3:
* API rename and parameters tweaking. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20181203133357.10341-1-tvrtko.ursulin@linux.intel.com
Exercising the gpu reloc path strenuously revealed an issue where the
updated relocations (from MI_STORE_DWORD_IMM) were not being observed
upon execution. After some experiments with adding pipecontrols (a lot
of pipecontrols (32) as gen4/5 do not have a bit to wait on earlier pipe
controls or even the current on), it was discovered that we merely
needed to delay the EMIT_INVALIDATE by several flushes. It is important
to note that it is the EMIT_INVALIDATE as opposed to the EMIT_FLUSH that
needs the delay as opposed to what one might first expect -- that the
delay is required for the TLB invalidation to take effect (one presumes
to purge any CS buffers) as opposed to a delay after flushing to ensure
the writes have landed before triggering invalidation.
Testcase: igt/gem_tiled_fence_blits
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181105094305.5767-1-chris@chris-wilson.co.uk
We mix hexa- and decimal which is confusing when reading the logs. So make
the single odd one out instance decimal for consistency.
v2:
* Do the intel_ringbuffer.c as well. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180926145033.16318-1-tvrtko.ursulin@linux.intel.com
Continuing the fun of trying to find exactly the delay that is
sufficient to ensure that the page directory is fully loaded between
context switches, move the extra flush added in commit 70b73f9ac1
("drm/i915/ringbuffer: Delay after invalidating gen6+ xcs") to just
after we flush the pd. Entirely based on the empirical data of running
failing tests in a loop until we survive a day (before the mtbf is 10-30
minutes).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107769
References: 70b73f9ac1 ("drm/i915/ringbuffer: Delay after invalidating gen6+ xcs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180904063802.13880-1-chris@chris-wilson.co.uk
Older gen use a physical address for the hardware status page, for which
we use cache-coherent writes. As the writes are into the cpu cache, we use
a normal WB mapped page to read the HWS, used for our seqno tracking.
Anecdotally, I observed lost breadcrumbs writes into the HWS on i965gm,
which so far have not reoccurred with this patch. How reliable that
evidence is remains to be seen.
v2: Explicitly pass the expected physical address to the hw
v3: Also remember the wild writes we once had for HWS above 4G.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20180903152304.31589-2-chris@chris-wilson.co.uk
During stress testing of full-ppgtt (on Baytrail at least), we found
that the invalidation around a context/mm switch was insufficient (writes
would go astray). Adding a second MI_FLUSH_DW barrier prevents this, but
it is unclear as to whether this is merely a delaying tactic or if it is
truly serialising with the TLB invalidation. Either way, it is
empirically required.
v2: Avoid the loop for readability;
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107715
References: https://bugs.freedesktop.org/show_bug.cgi?id=107759
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180830161042.29193-1-chris@chris-wilson.co.uk
This reapplies commit 39f3be162c ("drm/i915: Kick waiters on resetting
legacy rings") after the improved gem_eio was run across all machines we
found that gen3 and early gen4 still lost the immediate interrupt
following reset, and the HWSTAM w/a applied to gen6+ is inadequate.
Unlike the later gen, on gen3/4 the principle (and only tests to fail so
far) are the wait vs reset test cases, whereas the reset stress case
works fine (which was the predominantly failing case for gen6+). That is
enough to suggest the underlying issue is sufficiently different to
support the difference in HWSTAM efficacy.
Testcase: igt/gem_eio/wait-10ms
References: 39f3be162c ("drm/i915: Kick waiters on resetting legacy rings")
References: a69ab52b03 ("drm/i915: Remove extra waiter kick on legacy resets")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180814104056.27001-1-chris@chris-wilson.co.uk
Now with a more efficacious workaround for the lost interrupts after
reset, we can remove the hack of kicking the waiters after reset. The
issue was that the kick only worked for the immediate window after the
reset (those seqno that would complete in the time it took for the
waiter thread to perform its check) but miss any seqno that lacked an
interrupt afterwards.
References: 39f3be162c ("drm/i915: Kick waiters on resetting legacy rings")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180808105101.913-3-chris@chris-wilson.co.uk
An oddity occurs on Sandybridge, Ivybridge and Haswell (and presumably
Valleyview) in that for the period following the GPU restart after a
reset, there are no GT interrupts received. From Ville's notes, bit 0 in
the HWSTAM corresponds to the render interrupt, and if we unmask it we
do see immediate resumption of GT interrupt delivery (via the master irq
handler) after the reset.
v2: Limit the w/a to the render interrupt from rcs
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107500
Fixes: c549808946 ("drm/i915: Mask everything in ring HWSTAM on gen6+ in ringbuffer mode")
References: d420a50c21 ("drm/i915: Clean up the HWSTAM mess")
Testcase: igt/gem_eio/reset-stress
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180808105101.913-2-chris@chris-wilson.co.uk
After disabling resource streamer on ICL (due to it actually not
existing there), I got feedback that there have been some experimental
patches for mesa to use RS years ago, but nothing ever landed or shipped
because there was no performance improvement.
This removes it from kernel keeping the uapi defines around for
compatibility.
v2: - re-add the inadvertent removal of CTX_CTRL_INHIBIT_SYN_CTX_SWITCH
- don't bother trying to document removed params on uapi header:
applications should know that from the query.
(from Chris)
v3: - disable CTX_CTRL_RS_CTX_ENABLE istead of removing it
- reword commit message after Daniele confirmed no performance
regression on his machine
- reword commit message to make clear RS is being removed due to
never been used
v4: - move I915_EXEC_RESOURCE_STREAMER to __I915_EXEC_ILLEGAL_FLAGS so
the check on ioctl() is made much earlier by
i915_gem_check_execbuffer() (suggested by Tvrtko)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180803232443.17193-1-lucas.demarchi@intel.com
Since ggtt_offset_bias is now stored in ggtt.pin_bias, it is duplicated
inside i915_gem_context, and can instead be accessed directly from ggtt.
v3:
Added a helper function to retrieve the ggtt.pin_bias from the vma.
v4:
Moved the helper function to the previous patch in the series.
Dropped the bias from intel_ring_pin. This introduces a slight functional
change since we are always pinning the ring a bit higher if GuC is present
even though we don't really need to.
v8:
Fixed patch not applying on the most recent upstream.
Signed-off-by: Jakub Bartmiński <jakub.bartminski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180727141148.30874-4-jakub.bartminski@intel.com
Using PAGE_SIZE for virtual offset alignment is superfluous as it is
equal to the minimum gtt alignment and so equivalent to 0. It is also
the wrong value to use as we stopped using physical page constructs for
the virtual GTT, i.e. it would be preferrable to use I915_GTT_PAGE_SIZE
and in these cases merely imply I915_GTT_MIN_ALIGNMENT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180727091855.1879-1-chris@chris-wilson.co.uk
power well support and begin of DSI support addition. Also there were many improvements
on execlists and interrupts for minimal latency on command submission; and many fixes
on selftests, mostly caught by our CI.
General driver:
- Clean-up on aux irq (Lucas)
- Mark expected switch fall-through for dealing with static analysis tools (Gustavo)
Gem:
- Different fixes for GuC (Chris, Anusha, Michal)
- Avoid self-relocation BIAS if no relocation (Chris)
- Improve debugging cases in on EINVAL return and vma allocation (Chris)
- Fixes and improvements on context destroying and freeing (Chris)
- Wait for engines to idle before retiring (Chris)
- Many improvements on execlists and interrupts for minimal latency on command submission (Chris)
- Many fixes in selftests, specially on cases highlighted on CI (Chris)
- Other fixes and improvements around GGTT (Chris)
- Prevent background reaping of active objects (Chris)
Display:
- Parallel modeset cleanup to fix driver reset (Chris)
- Get AUX power domain for DP main link (Imre)
- Clean-up on PSR unused func pointers (Rodrigo)
- Many PSR/PSR2 fixes and improvements (DK, Jose, Tarun)
- Add a PSR1 live status (Vathsala)
- Replace old drm_*_{un/reference} with put,get functions (Thomas)
- FBC fixes (Maarten)
- Abstract and document the usage of picking macros (Jani)
- Remove unnecessary check for unsupported modifiers for NV12. (DK)
- Interrupt fixes for display (Ville)
- Clean up on sdvo code (Ville)
- Clean up on current DSI code (Jani)
- Remove support for legacy debugfs crc interface (Maarten)
- Simplify get_encoder_power_domains (Imre)
Icelake:
- MG PLL fixes (Imre)
- Add hw workaround for alpha blending (Vandita)
- Add power well support (Imre)
- Add Interrupt Support (Anusha)
- Start to add support for DSI on Ice Lake (Madhav)
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbQ+ShAAoJEPpiX2QO6xPKas0H/igf9RFubtkMK7gHTef4FM+d
Bg+Qaq+O1vXlS/gidimL4NsVp1FxkejuCab0IffbTMvvjY0mv5NUA3kiIreAB0QZ
XO2hXr4fjjOINAQrdv5wiVMOqRjDws+fPgFFgZ8s5h1aJbofO27fjY/1MNtHwcA0
8VgtABpk+D3mkWvI8VTL0jCjYk2KocEvqUciz/Y7SQcPGV1iYFXqgBt5PR//rSvP
DU3u4R3KJGLDFbQwbe3uz2GxMfodAI6ijrqFeiizNSVqZORdTwnWlzKi6b6Cj9gl
SuleZacHPfv/+Ia7jmbmBqJEqi2GiAs948ne8QWL5/hsB9MMFO/UzwX/wYLNrP4=
=w6zC
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-next-2018-07-09' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Higlights here goes to many PSR fixes and improvements; to the Ice lake work with
power well support and begin of DSI support addition. Also there were many improvements
on execlists and interrupts for minimal latency on command submission; and many fixes
on selftests, mostly caught by our CI.
General driver:
- Clean-up on aux irq (Lucas)
- Mark expected switch fall-through for dealing with static analysis tools (Gustavo)
Gem:
- Different fixes for GuC (Chris, Anusha, Michal)
- Avoid self-relocation BIAS if no relocation (Chris)
- Improve debugging cases in on EINVAL return and vma allocation (Chris)
- Fixes and improvements on context destroying and freeing (Chris)
- Wait for engines to idle before retiring (Chris)
- Many improvements on execlists and interrupts for minimal latency on command submission (Chris)
- Many fixes in selftests, specially on cases highlighted on CI (Chris)
- Other fixes and improvements around GGTT (Chris)
- Prevent background reaping of active objects (Chris)
Display:
- Parallel modeset cleanup to fix driver reset (Chris)
- Get AUX power domain for DP main link (Imre)
- Clean-up on PSR unused func pointers (Rodrigo)
- Many PSR/PSR2 fixes and improvements (DK, Jose, Tarun)
- Add a PSR1 live status (Vathsala)
- Replace old drm_*_{un/reference} with put,get functions (Thomas)
- FBC fixes (Maarten)
- Abstract and document the usage of picking macros (Jani)
- Remove unnecessary check for unsupported modifiers for NV12. (DK)
- Interrupt fixes for display (Ville)
- Clean up on sdvo code (Ville)
- Clean up on current DSI code (Jani)
- Remove support for legacy debugfs crc interface (Maarten)
- Simplify get_encoder_power_domains (Imre)
Icelake:
- MG PLL fixes (Imre)
- Add hw workaround for alpha blending (Vandita)
- Add power well support (Imre)
- Add Interrupt Support (Anusha)
- Start to add support for DSI on Ice Lake (Madhav)
Signed-off-by: Dave Airlie <airlied@redhat.com>
# gpg: Signature made Tue 10 Jul 2018 08:41:37 AM AEST
# gpg: using RSA key FA625F640EEB13CA
# gpg: Good signature from "Rodrigo Vivi <rodrigo.vivi@intel.com>"
# gpg: aka "Rodrigo Vivi <rodrigo.vivi@gmail.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6D20 7068 EEDD 6509 1C2C E2A3 FA62 5F64 0EEB 13CA
Link: https://patchwork.freedesktop.org/patch/msgid/20180710234349.GA16562@intel.com
If the user has created a read-only object, they should not be allowed
to circumvent the write protection by using a GGTT mmapping. Deny it.
Also most machines do not support read-only GGTT PTEs, so again we have
to reject attempted writes. Fortunately, this is known a priori, so we
can at least reject in the call to create the mmap (with a sanity check
in the fault handler).
v2: Check the vma->vm_flags during mmap() to allow readonly access.
v3: Remove VM_MAYWRITE to curtail mprotect()
Testcase: igt/gem_userptr_blits/readonly_mmap*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> #v1
Reviewed-by: Jon Bloomfield <jon.bloomfield@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712185315.3288-4-chris@chris-wilson.co.uk
Hook up the flags to allow read-only ppGTT mappings for gen8+
v2: Include a selftest to check that writes to a readonly PTE are
dropped
v3: Don't duplicate cpu_check() as we can just reuse it, and even worse
don't wholesale copy the theory-of-operation comment from igt_ctx_exec
without changing it to explain the intention behind the new test!
v4: Joonas really likes magic mystery values
Signed-off-by: Jon Bloomfield <jon.bloomfield@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180712185315.3288-2-chris@chris-wilson.co.uk
Across a reset, the seqno (and thus hangcheck) should restart and the
hangcheck naturally progress, for when it does not, we want to declare an
emergency. Currently, we only detect if reset and reinit fails, but we
do not detect if the call to reinit succeeds but the HW is fried - as we
are resetting hangcheck on initialisation the engine. Remove that and
rely on the natural progress to reset the hangcheck timer.
References: e21b141376 ("drm/i915: Mark the hangcheck as idle when unparking the engines")
References: 1fd00c0fae ("drm/i915: Declare the driver wedged if hangcheck makes no progress")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180709130208.11730-2-chris@chris-wilson.co.uk
on all platforms back to HSW. As well many other fix and improvements,
Including:
- Use GEM suspend when aborting initialization (Chris)
- Change i915_gem_fault to return vm_fault_t (Chris)
- Expand VMA to Non gem object entities (Chris)
- Improve logs for load failure, but quite logging on fault injection to avoid noise on CI (Chris)
- Other page directory handling fixes and improvements for gen6 (Chris)
- Other gtt clean-up removing redundancies and unused checks (Chris)
- Reorder aliasing ppgtt fini (Chris)
- Refactor of unsetting obg->mm.pages (Chris)
- Apply batch location restrictions before pinning (Chris)
- Ringbuffer fixes for context restore (Chris)
- Execlist fixes on freeing error pointer on allocation error (Chris)
- Make closing request flush mandatory (Chris)
- Move GEM sanitize from resume_early to resume (Chris)
- Improve debug dumps (Chris)
- Silent compiler for selftest (Chris)
- Other execlists changes to improve hangcheck and reset.
- Many gtt page directory fixes and improvements (Chris)
- Reorg context workarounds (Chris)
- Avoid ERR_PTR dereference on selftest (Chris)
Other GEM related work:
- Stop trying to reset GPU if reset failed (Mika)
- Add HW workaround for KBL to fix GPU reset (Mika)
- Fix context ban and hang accounting for client (Mika)
- Fixes on OA perf (Michel, Jani)
- Refactor on GuC log mechanisms (Piotr)
- Enable provoking vertex fix on Gen9 system (Kenneth)
More ICL patches for Display enabling:
- ICL - 10-bit support for HDMI (RK)
- ICL - Start adding TBT PLL (Paulo)
- ICL - DDI HDMK level selection (Manasi)
- ICL - GMBUS GPIO pin mapping fix (Mahesh)
- ICL - Adding DP_AUX_E support (James)
- ICL - Display interrupts handling (DK)
Other display fixes and improvements:
- Fix sprite destination color keying on SKL+ (Ville)
- Fixes and improvements on PCH detection, specially for non PCH systems (Jani)
- Document PCH_NOP (Lucas)
- Allow DBLSCAN user modes with eDP/LVDS/DSI (Ville)
- Opregion and ACPI cleanup and organization (Jani)
- Kill delays when activation psr (Rodrigo)
- ...and a consequent fix of the psr activation flow (DK)
- Fix HDMI infoframe setting (Imre)
- Fix Display interrupts and modes on old gens (Ville)
- Start switching to kernel unsigned int types (Jani)
- Introduction to Amber Lake and Whiskey Lake platforms (Jose)
- Audio clock fixes for HBR3 (RK)
- Standardize i915_reg.h definitions according to our doc and checkpatch (Paulo)
- Remove unused timespec_to_jiffies_timeout function (Arnd)
- Increase the scope of PSR wake fix for other VBTs out there (Vathsala)
- Improve debug msgs with prop name/id (Ville)
- Other clean up on unecessary cursor size defines (Ville)
- Enforce max hdisplay/hblank_start limits on HSW/BDW (Ville)
- Make ELD pointers constant (Jani)
- Fix for PSR VBT parse (Colin)
- Add warn about unsupported CDCLK rates (Imre)
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJbKsMqAAoJEPpiX2QO6xPKI64H/0dHkMxw7/D83eODTJteDFBN
h3tdBnLFlPfeG3ZWDeSs04/dM4e9YacMN7v53j1ia4eW/F1ms0TLcegcuPqYafTW
H8fhwGB2B5gmr5hLfh5joQkxvaucQMFdg95fWRqir93VrKvVJAJEYNcaiGniejDf
qqiZue6DgAzli0zjAprfbQsnJ17TyRtnxm8lLIcFcHPoayHBzAUBZQEP6cA5qe/Y
/2ahGfkYOVVWY08DHaioDBOLUEUbxCC1AvMlv9VbtKmyPoQjTIW/1iTq0RRxDoGb
BwfDvigSiFAmpYEfVENB0qUd9e/0WhMboSnMrfzEcF2yUn4xoJx5nbmkRFkr1jI=
=mfO6
-----END PGP SIGNATURE-----
Merge tag 'drm-intel-next-2018-06-20' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Chris is doing many reworks that allow us to get full-ppgtt supported
on all platforms back to HSW. As well many other fix and improvements,
Including:
- Use GEM suspend when aborting initialization (Chris)
- Change i915_gem_fault to return vm_fault_t (Chris)
- Expand VMA to Non gem object entities (Chris)
- Improve logs for load failure, but quite logging on fault injection to avoid noise on CI (Chris)
- Other page directory handling fixes and improvements for gen6 (Chris)
- Other gtt clean-up removing redundancies and unused checks (Chris)
- Reorder aliasing ppgtt fini (Chris)
- Refactor of unsetting obg->mm.pages (Chris)
- Apply batch location restrictions before pinning (Chris)
- Ringbuffer fixes for context restore (Chris)
- Execlist fixes on freeing error pointer on allocation error (Chris)
- Make closing request flush mandatory (Chris)
- Move GEM sanitize from resume_early to resume (Chris)
- Improve debug dumps (Chris)
- Silent compiler for selftest (Chris)
- Other execlists changes to improve hangcheck and reset.
- Many gtt page directory fixes and improvements (Chris)
- Reorg context workarounds (Chris)
- Avoid ERR_PTR dereference on selftest (Chris)
Other GEM related work:
- Stop trying to reset GPU if reset failed (Mika)
- Add HW workaround for KBL to fix GPU reset (Mika)
- Fix context ban and hang accounting for client (Mika)
- Fixes on OA perf (Michel, Jani)
- Refactor on GuC log mechanisms (Piotr)
- Enable provoking vertex fix on Gen9 system (Kenneth)
More ICL patches for Display enabling:
- ICL - 10-bit support for HDMI (RK)
- ICL - Start adding TBT PLL (Paulo)
- ICL - DDI HDMK level selection (Manasi)
- ICL - GMBUS GPIO pin mapping fix (Mahesh)
- ICL - Adding DP_AUX_E support (James)
- ICL - Display interrupts handling (DK)
Other display fixes and improvements:
- Fix sprite destination color keying on SKL+ (Ville)
- Fixes and improvements on PCH detection, specially for non PCH systems (Jani)
- Document PCH_NOP (Lucas)
- Allow DBLSCAN user modes with eDP/LVDS/DSI (Ville)
- Opregion and ACPI cleanup and organization (Jani)
- Kill delays when activation psr (Rodrigo)
- ...and a consequent fix of the psr activation flow (DK)
- Fix HDMI infoframe setting (Imre)
- Fix Display interrupts and modes on old gens (Ville)
- Start switching to kernel unsigned int types (Jani)
- Introduction to Amber Lake and Whiskey Lake platforms (Jose)
- Audio clock fixes for HBR3 (RK)
- Standardize i915_reg.h definitions according to our doc and checkpatch (Paulo)
- Remove unused timespec_to_jiffies_timeout function (Arnd)
- Increase the scope of PSR wake fix for other VBTs out there (Vathsala)
- Improve debug msgs with prop name/id (Ville)
- Other clean up on unecessary cursor size defines (Ville)
- Enforce max hdisplay/hblank_start limits on HSW/BDW (Ville)
- Make ELD pointers constant (Jani)
- Fix for PSR VBT parse (Colin)
- Add warn about unsupported CDCLK rates (Imre)
Signed-off-by: Dave Airlie <airlied@redhat.com>
# gpg: Signature made Thu 21 Jun 2018 07:12:10 AM AEST
# gpg: using RSA key FA625F640EEB13CA
# gpg: Good signature from "Rodrigo Vivi <rodrigo.vivi@intel.com>"
# gpg: aka "Rodrigo Vivi <rodrigo.vivi@gmail.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 6D20 7068 EEDD 6509 1C2C E2A3 FA62 5F64 0EEB 13CA
Link: https://patchwork.freedesktop.org/patch/msgid/20180625165622.GA21761@intel.com
Due to how we only release the pining on the context state on
retirement and never track activity on the context vma itself, the
object can never be active at the point of release. Replace the
conditional transfer of ownership onto an active-reference with an
assert that the object is idle.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180625100604.22598-2-chris@chris-wilson.co.uk
We got a few conflicts in drm_atomic.c after merging the DRM writeback support,
now we need a backmerge to unlock develop development on drm-misc-next.
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.com>
commit b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on
reset/sanitization") and commit 1288786b18 ("drm/i915: Move GEM sanitize
from resume_early to resume") show the conflicting requirements on the
code. We must reset the GPU before trashing live state on a fast resume
(hibernation debug, or error paths), but we must only reset our state
tracking iff the GPU is reset (or power cycled). This is tricky if we
are disabling GPU reset to simulate broken hardware; we reset our state
tracking but the GPU is left intact and recovers from its stale state.
v2: Again without the assertion for forcewake, no longer required since
commit b3ee09a4de ("drm/i915/ringbuffer: Fix context restore upon reset")
as the contexts are reset from the CS ensuring everything is powered up.
Fixes: b2209e62a4 ("drm/i915/execlists: Reset the CSB head tracking on reset/sanitization")
Fixes: 1288786b18 ("drm/i915: Move GEM sanitize from resume_early to resume")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180616202534.18767-1-chris@chris-wilson.co.uk
In order to be able to evict the gen6 ppgtt, we have to unpin it at some
point. We can simply use our context activity tracking to know when the
ppgtt is no longer in use by hardware, and so only keep it pinned while
being used a request.
For the kernel_context (and thus aliasing_ppgtt), it remains pinned at
all times, as the kernel_context itself is pinned at all times.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180614094103.18025-5-chris@chris-wilson.co.uk
After triggering the mm switch with a load of PD_DIR, which may be
deferred unto the MI_SET_CONTEXT on rcs, serialise the next commands
with that load by posting a read of PD_DIR (or else those subsequent
commands may access the stale page tables).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611171825.13678-2-chris@chris-wilson.co.uk
The HW only accepts offsets within ring->size, and fails peculiarly if
the RING_HEAD or RING_TAIL is set to ring->size. Therefore whenever we
set ring->head/ring->tail we want to make sure it is within value (using
intel_ring_wrap()).
v2: Double check execlists as well
v3: Remove redundancy with assert_ring_tail_valid()
v4: Just assert in intel_ring_reset() rather than be over-defensive.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> #v2
Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-2-chris@chris-wilson.co.uk
The discovery with trying to enable full-ppgtt was that we were
completely failing to the load both the mm and context following the
reset. Although we were performing mmio to set the PP_DIR (per-process
GTT) and CCID (context), these were taking no effect (the assumption was
that this would trigger reload of the context and restore the page
tables). It was not until we performed the LRI + MI_SET_CONTEXT in a
following context switch would anything occur.
Since we are then required to reset the context image and PP_DIR using
CS commands, we place those commands into every batch. The hardware
should recognise the no-ops and eliminate the expensive context loads,
but we still have to pay the cost of using cross-powerwell register
writes. In practice, this has no effect on actual context switch times,
and only adds a few hundred nanoseconds to no-op switches. We can improve
the latter by eliminating the w/a around known no-op switches, but there
is an ulterior motive to keeping them.
Always emitting the context switch at the beginning of the request (and
relying on HW to skip unneeded switches) does have one key advantage.
Should we implement request reordering on Haswell, we will not know in
advance what the previous executing context was on the GPU and so we
would not be able to elide the MI_SET_CONTEXT commands ourselves and
always have to emit them. Having our hand forced now actually prepares
us for later.
Now since that context and mm follow the request, we no longer (and not
for a long time since requests took over!) require a trace point to tell
when we write the switch into the ring, since it is always. (This is
even more important when you remember that simply writing into the ring
bears no relation to the current mm.)
v2: Sandybridge has to agree to use LRI as well.
Testcase: igt/drv_selftests/live_hangcheck
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611110845.31890-1-chris@chris-wilson.co.uk
An issue encountered with switching mm on gen7 is that the GPU likes to
hang (with the VS unit busy) when told to force restore the current
context. We can simply workaround this by substituting the
MI_FORCE_RESTORE flag with a round-trip through the kernel_context,
forcing the context to be saved and restored; thereby reloading the
PP_DIR registers and updating the modified page directory!
v2: Undo attempted optimisation in caller (Tvrtko)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180611104808.24295-1-chris@chris-wilson.co.uk
In the near future, I want to subclass gen6_hw_ppgtt as it contains a
few specialised members and I wish to add more. To avoid the ugliness of
using ppgtt->base.base, rename the i915_hw_ppgtt base member
(i915_address_space) as vm, which is our common shorthand for an
i915_address_space local.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605153758.18422-1-chris@chris-wilson.co.uk
Currently, we have a special routine for pinning the context state at
the start of activity tracking, but lack the complementary unpin
routine. Create it to to ease later patches that want to do partial
teardown on error, and, not least, to improve the readability of the
code.
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180605085348.3018-1-chris@chris-wilson.co.uk
If we can use an unmappable ring, try to pin it out of the mappable
aperture. This simple layout preference is to try and keep the mappable
aperture reserved and available to handle GGTT mmapping requests from
userspace without causing evictions and GPU stalls.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180521082131.13744-4-chris@chris-wilson.co.uk
As all backends implement the same pin_count mechanism and do a
dec-and-test as their first step, pull that into the common
intel_context_unpin(). This also pulls into the caller, eliminating the
indirect call in the usual steady state case. The intel_context_pin()
side is a little more complicated as it combines the lookup/alloc as
well as pinning the state, and so is left for a later date.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-4-chris@chris-wilson.co.uk
To ease the frequent and ugly pointer dance of
&request->gem_context->engine[request->engine->id] during request
submission, store that pointer as request->hw_context. One major
advantage that we will exploit later is that this decouples the logical
context state from the engine itself.
v2: Set mock_context->ops so we don't crash and burn in selftests.
Cleanups from Tvrtko.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk