This trims a little overhead from the common case of not needing to
synchronize between rings.
v2: execlists is special and likes to duplicate code.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read read requests (as well as better optimise
certain CPU waits).
v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-continguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase
v9: Refactor i915_gem_object_sync() to allow the compiler to better
optimise it.
Benchmark: igt/gem_read_read_speed
hsw:gt3e (with semaphores):
Before: Time to read-read 1024k: 275.794µs
After: Time to read-read 1024k: 123.260µs
hsw:gt3e (w/o semaphores):
Before: Time to read-read 1024k: 230.433µs
After: Time to read-read 1024k: 124.593µs
bdw-u (w/o semaphores): Before After
Time to read-read 1x1: 26.274µs 10.350µs
Time to read-read 128x128: 40.097µs 21.366µs
Time to read-read 256x256: 77.087µs 42.608µs
Time to read-read 512x512: 281.999µs 181.155µs
Time to read-read 1024x1024: 1196.141µs 1118.223µs
Time to read-read 2048x2048: 5639.072µs 5225.837µs
Time to read-read 4096x4096: 22401.662µs 21137.067µs
Time to read-read 8192x8192: 89617.735µs 85637.681µs
Testcase: igt/gem_concurrent_blit (read-read and friends)
Cc: Lionel Landwerlin <lionel.g.landwerlin@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v8]
[danvet: s/\<rq\>/req/g]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If an batch ends while the IRQs are not turned on the notification can
go missing and the GPU can hang. So generate a warning in this case.
Signed-off-by: Peter Antoine <peter.antoine@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We switched from calling i915_gem_alloc_context_obj() to calling
i915_gem_alloc_object() so the error handling needs to be updated to
check for NULL instead of IS_ERR().
Fixes: 149c86e74f ('drm/i915: Allocate context objects from stolen')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
drm-intel-next-2015-04-23:
- dither support for ns2501 dvo (Thomas Richter)
- some polish for the gtt code and fixes to finally enable the cmd parser on hsw
- first pile of bxt stage 1 enabling (too many different people to list ...)
- more psr fixes from Rodrigo
- skl rotation support from Chandra
- more atomic work from Ander and Matt
- pile of cleanups and micro-ops for execlist from Chris
drm-intel-next-2015-04-10:
- cdclk handling cleanup and fixes from Ville
- more prep patches for olr removal from John Harrison
- gmbus pin naming rework from Jani (prep for bxt)
- remove ->new_config from Ander (more atomic conversion work)
- rps (boost) tuning and unification with byt/bsw from Chris
- cmd parser batch bool tuning from Chris
- gen8 dynamic pte allocation (Michel Thierry, based on work from Ben Widawsky)
- execlist tuning (not yet all of it) from Chris
- add drm_plane_from_index (Chandra)
- various small things all over
* tag 'drm-intel-next-2015-04-23-fixed' of git://anongit.freedesktop.org/drm-intel: (204 commits)
drm/i915/gtt: Allocate va range only if vma is not bound
drm/i915: Enable cmd parser to do secure batch promotion for aliasing ppgtt
drm/i915: fix intel_prepare_ddi
drm/i915: factor out ddi_get_encoder_port
drm/i915/hdmi: check port in ibx_infoframe_enabled
drm/i915/hdmi: fix vlv infoframe port check
drm/i915: Silence compiler warning in dvo
drm/i915: Update DRIVER_DATE to 20150423
drm/i915: Enable dithering on NatSemi DVO2501 for Fujitsu S6010
rm/i915: Move i915_get_ggtt_vma_pages into ggtt_bind_vma
drm/i915: Don't try to outsmart gcc in i915_gem_gtt.c
drm/i915: Unduplicate i915_ggtt_unbind/bind_vma
drm/i915: Move ppgtt_bind/unbind around
drm/i915: move i915_gem_restore_gtt_mappings around
drm/i915: Fix up the vma aliasing ppgtt binding
drm/i915: Remove misleading comment around bind_to_vm
drm/i915: Don't use atomics for pg_dirty_rings
drm/i915: Don't look at pg_dirty_rings for aliasing ppgtt
drm/i915/skl: Support Y tiling in MMIO flips
drm/i915: Fixup kerneldoc for struct intel_context
...
Conflicts:
drivers/gpu/drm/i915/i915_drv.c
WaIdleLiteRestore is an execlists-only workaround, and requires the driver
to ensure that any context always has HEAD!=TAIL when attempting lite
restore.
Add two extra MI_NOOP instructions at the end of each request, but keep
the requests tail pointing before the MI_NOOPs. We may not need to
executed them, and this is why request->tail is sampled before adding
these extra instructions.
If we submit a context to the ELSP which has previously been submitted,
move the tail pointer past the MI_NOOPs. This ensures HEAD!=TAIL.
v2: Move overallocation to gen8_emit_request, and added note about
sampling request->tail in commit message (Chris).
v3: Remove redundant request->tail assignment in __i915_add_request, in
lrc mode this is already set in execlists_context_queue.
Do not add wa implementation details inside gem (Chris).
v4: Apply the wa whenever the req has been resubmitted and update
comment (Chris).
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Separate topic branch for bxt didn't work out since we needed to
refactor the gmbus code a bit to make it look decent. So backmerge.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
On GEN9+ per specification a NULL PIPE_CONTROL needs to be emitted
before any PIPE_CONTROL command with the VS_INVALIDATE flag set.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After commit d7b9ca2f7a
("drm/i915: Remove request->uniq")
dev_priv is no longer needed.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we never expose context objects directly to userspace, we can forgo
allocating a first-class GEM object for them and prefer to use the
limited resource of reserved/stolen memory for them. Note this means
that their initial contents are undefined.
However, a downside of using stolen objects for execlists is that we
cannot access the physical address directly (thanks MCH!) which prevents
their use.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We already assign a unique identifier to every request: seqno. That
someone felt like adding a second one without even mentioning why and
tweaking ABI smells very fishy.
Fixes regression from
commit b3a38998f0
Author: Nick Hoath <nicholas.hoath@intel.com>
Date: Thu Feb 19 16:30:47 2015 +0000
drm/i915: Fix a use after free, and unbalanced refcounting
v2: Rebase
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Thomas Daniel <thomas.daniel@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jani Nikula <jani.nikula@intel.com>
[danvet: Fixup because different merge order.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This eliminates six needless spin lock/unlock pairs when writing out
ELSP.
v2: Respin with my preferred colour.
v3: Mostly back to the original colour
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v1]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
After the removal of DRI1, all access to the rings are through requests
and so we can always be sure that there is a request to wait upon to
free up available space. The fallback code only existed so that we could
quiesce the GPU following unmediated access by DRI1.
v2: Rebase
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we submit a request to the GPU, we first take the rpm wakelock, and
only release it once the GPU has been idle for a small period of time
after all requests have been complete. This means that we are sure no
new interrupt can arrive whilst we do not hold the rpm wakelock and so
can drop the individual get/put around every single request inside
execlists.
Note: to close one potential issue we should mark the GPU as busy
earlier in __i915_add_request.
To elaborate: The issue is that we emit the irq signalling sequence
before we grab the rpm reference, which means we could miss the
resulting interrupt (since that's not set up when suspended). The only
bad side effect is a missed interrupt, gt mmio writes automatically
wake up the hw itself. But otoh we have an umbrella rpm reference for
the entirety of execbuf, as long as that's there we're covered.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Explain a bit more about the add_request issue, which after
some irc chatting with Chris turns out to not be an issue really.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We can use the simpler spinlock form to disable interrupts as we are
always outside of an irq/softirq handler.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This finishes off the dynamic page tables allocations, in the legacy 3
level style that already exists. Most everything has already been setup
to this point, the patch finishes off the enabling by setting the
appropriate function pointers.
In LRC mode, contexts need to know the PDPs when they are populated. With
dynamic page table allocations, these PDPs may not exist yet. Check if
PDPs have been allocated and use the scratch page if they do not exist yet.
Before submission, update the PDPs in the logic ring context as PDPs
have been allocated.
v2: Update aliasing/true ppgtt allocate/teardown/clear functions for
gen 6 & 7.
v3: Rebase.
v4: Remove BUG() from ppgtt_unbind_vma, but keep checking that either
teardown_va_range or clear_range functions exist (Daniel).
v5: Similar to gen6, in init, gen8_ppgtt_clear_range call is only needed
for aliasing ppgtt. Zombie tracking was originally added for teardown
function and is no longer required.
v6: Update err_out case in gen8_alloc_va_range (missed from lastest
rebase).
v7: Rebase after s/page_tables/page_table/.
v8: Updated scratch_pt check after scratch flag was removed in previous
patch.
v9: Note that lrc mode needs to be updated to support init state without
any PDP.
v10: Unmap correct page_table in gen8_alloc_va_range's error case, clean-up
gen8_aliasing_ppgtt_init (remove duplicated map), and initialize PTs
during page table allocation.
v11: Squashed LRC enabling commit, otherwise LRC mode would be left broken
until it was updated to handle the init case without any PDP.
v12: Do not overallocate new_pts bitmap, make alloc_gen8_temp_bitmaps
static and don't abuse of inline functions. (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we do dynamic page table allocations for gen8, we'll need to have
more control over how and when we map page tables, similar to gen6.
In particular, DMA mappings for page directories/tables occur at allocation
time.
This patch adds the functionality and calls it at init, which should
have no functional change.
The PDPEs are still a special case for now. We'll need a function for
that in the future as well.
v2: Handle renamed unmap_and_free_page functions.
v3: Updated after teardown_va logic was removed.
v4: Rebase after s/page_tables/page_table/.
v5: No longer allocate all PDPs in GEN8+ systems with less than 4GB of
memory, and update populate_lr_context to handle this new case (proper
tracking will be added later in the patch series).
v6: Assign lrc page directory pointer addresses using a macro. (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.
One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.
v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
According to Spec this is a reserved bit for Gen9+ and should not be set.
Change-Id: I0215fb7057b94139b7a2f90ecc7a0201c0c93ad4
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The legacy and LRC code paths have an almost identical procedure for waiting for
space in the ring buffer. They both search for a request in the free list that
will advance the tail to a point where sufficient space is available. They then
wait for that request, retire it and recalculate the free space value.
Unfortunately, a bug in the LRC side meant that the resulting free space might
not be as large as expected and indeed, might not be sufficient. This is because
it was testing against the value of request->tail not request->postfix. Whereas,
when a request is retired, ringbuf->tail is updated to req->postfix not
req->tail.
Another significant difference between the two is that the LRC one did not trust
the wait for request to work! It redid the is there enough space available test
and would fail the call if insufficient. Whereas, the legacy version just said
'return 0' - it assumed the preceeding code works. This difference meant that
the LRC version still worked even with the bug - it just fell back to the
polling wait path.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The request allocation code is largely duplicated between legacy mode and
execlist mode. The actual difference between the two versions of the code is
pretty minimal.
This patch moves the common code out into a separate function. This is then
called by the execution specific version prior to setting up the one different
value.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The only usage of intel_logical_ring_begin() is within intel_lrc.c so it can be
made static. To avoid a forward declaration at the top of the file, it and bunch
of other functions have been shuffled upwards.
For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tomas Elf <tomas.elf@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU/NacAAoJEHm+PkMAQRiGdUcIAJU5dHclwd9HRc7LX5iOwYN6
mN0aCsYjMD8Pjx2VcPCgJvkIoESQO5pkwYpFFWCwILup1bVEidqXfr8EPOdThzdh
kcaT0FwUvd19K+0jcKVNCX1RjKBtlUfUKONk6sS2x4RrYZpv0Ur8Gh+yXV8iMWtf
fAusNEYlxQJvEz5+NSKw86EZTr4VVcykKLNvj+/t/JrXEuue7IG8EyoAO/nLmNd2
V/TUKKttqpE6aUVBiBDmcMQl2SUVAfp5e+KJAHmizdDpSE80nU59UC1uyV8VCYdM
qwHXgttLhhKr8jBPOkvUxl4aSXW7S0QWO8TrMpNdEOeB3ZB8AKsiIuhe1JrK0ro=
=Xkue
-----END PGP SIGNATURE-----
Merge tag 'v4.0-rc3' into drm-next
Linux 4.0-rc3 backmerge to fix two i915 conflicts, and get
some mainline bug fixes needed for my testing box
Conflicts:
drivers/gpu/drm/i915/i915_drv.h
drivers/gpu/drm/i915/intel_display.c
- Y tiling support for scanout from Tvrtko&Damien
- Remove more UMS support
- some small prep patches for OLR removal from John Harrison
- first few patches for dynamic pagetable allocation from Ben Widawsky, rebased
by tons of other people
- DRRS support patches (Sonika&Vandana)
- fbc patches from Paulo
- make sure our vblank callbacks aren't called when the pipes are off
- various patches all over
* tag 'drm-intel-next-2015-02-27' of git://anongit.freedesktop.org/drm-intel: (61 commits)
drm/i915: Update DRIVER_DATE to 20150227
drm/i915: Clarify obj->map_and_fenceable
drm/i915/skl: Allow Y (and Yf) frame buffer creation
drm/i915/skl: Update watermarks for Y tiling
drm/i915/skl: Updated watermark programming
drm/i915/skl: Adjust get_plane_config() to support Yb/Yf tiling
drm/i915/skl: Teach pin_and_fence_fb_obj() about Y tiling constraints
drm/i915/skl: Adjust intel_fb_align_height() for Yb/Yf tiling
drm/i915/skl: Allow scanning out Y and Yf fbs
drm/i915/skl: Add new displayable tiling formats
drm/i915: Remove DRIVER_MODESET checks from modeset code
drm/i915: Remove regfile code&data for UMS suspend/resume
drm/i915: Remove DRIVER_MODESET checks from gem code
drm/i915: Remove DRIVER_MODESET checks in the gpu reset code
drm/i915: Remove DRIVER_MODESET checks from suspend/resume code
drm/i915: Remove DRIVER_MODESET checks in load/unload/close code
drm/i915: fix a printk format
drm/i915: Add media rc6 residency file to sysfs
drm/i915: Add missing description to parameter in alloc_pt_range
drm/i915: Removed the read of RP_STATE_CAP from sysfs/debugfs functions
...
- use the atomic helpers for plane_upate/disable hooks (Matt Roper)
- refactor the initial plane config code (Damien)
- ppgtt prep patches for dynamic pagetable alloc (Ben Widawsky, reworked and
rebased by a lot of other people)
- framebuffer modifier support from Tvrtko Ursulin, drm core code from Rob Clark
- piles of workaround patches for skl from Damien and Nick Hoath
- vGPU support for xengt on the client side (Yu Zhang)
- and the usual smaller things all over
* tag 'drm-intel-next-2015-02-14' of git://anongit.freedesktop.org/drm-intel: (88 commits)
drm/i915: Update DRIVER_DATE to 20150214
drm/i915: Remove references to previously removed UMS config option
drm/i915/skl: Use a LRI for WaDisableDgMirrorFixInHalfSliceChicken5
drm/i915/skl: Fix always true comparison in a revision id check
drm/i915/skl: Implement WaEnableLbsSlaRetryTimerDecrement
drm/i915/skl: Implement WaSetDisablePixMaskCammingAndRhwoInCommonSliceChicken
drm/i915: Add process identifier to requests
drm/i915/skl: Implement WaBarrierPerformanceFixDisable
drm/i915/skl: Implement WaCcsTlbPrefetchDisable:skl
drm/i915/skl: Implement WaDisableChickenBitTSGBarrierAckForFFSliceCS
drm/i915/skl: Implement WaDisableHDCInvalidation
drm/i915/skl: Implement WaDisableLSQCROPERFforOCL
drm/i915/skl: Implement WaDisablePartialResolveInVc
drm/i915/skl: Introduce a SKL specific init_workarounds()
drm/i915/skl: Document that we implement WaRsClearFWBitsAtReset
drm/i915/skl: Implement WaSetGAPSunitClckGateDisable
drm/i915/skl: Make the init clock gating function skylake specific
drm/i915/skl: Provide a gen9 specific init_render_ring()
drm/i915/skl: Document the WM read latency W/A with its name
drm/i915/skl: Also detect eDRAM on SKL
...
In execlist mode, the ringbuf is a function of the ring and context whereas in
legacy mode, it is derived from the ring alone. Thus the calculation required to
determine the ringbuf pointer from the ring (and context) also needs to test
execlist mode or not. This is messy.
Further, the request structure holds a pointer to both the ring and the context
for which it was created. Thus, given a request, it is possible to derive the
ringbuf in either legacy or execlist mode. Hence it is necessary to pass just
the request in to all the low level functions rather than some combination of
request, ring, context and ringbuf. However, rather than recalculating it each
time, it is much simpler to just cache the ringbuf pointer in the request
structure itself.
Caching the pointer means the calculation is done once at request creation time
and all further code and simply read it directly from the request structure.
OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Drop contentless comment in lrc alloc request entirely. And
spelling fix in the commit message.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a trace point in the legacy execbuffer execution path that is missing
from the execlist path. Trace points are extremely useful for debugging and are
used by various automated validation tests. Hence, this patch adds the missing
trace point back in.
OTC-Jira: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
There is a flags word that is passed through the execbuffer code path all the
way from initial decoding of the user parameters down to the very final dispatch
buffer call. It is simply called 'flags'. Unfortuantely, there are many other
flags words floating around in the same blocks of code. Even more once the GPU
scheduler arrives.
This patch makes it more obvious exactly which flags word is which by renaming
'flags' to 'dispatch_flags'. Note that the bit definitions for this flags word
already have an 'I915_DISPATCH_' prefix on them and so are not quite so
ambiguous.
OTC-Jira: VIZ-1587
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
[danvet: Resolve conflict with Chris' rework of the bb parsing.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
As we move toward dynamic page table allocation, it becomes much easier
to manage our data structures if break do things less coarsely by
breaking up all of our actions into individual tasks. This makes the
code easier to write, read, and verify.
Aside from the dissection of the allocation functions, the patch
statically allocates the page table structures without a page directory.
This remains the same for all platforms,
The patch itself should not have much functional difference. The primary
noticeable difference is the fact that page tables are no longer
allocated, but rather statically declared as part of the page directory.
This has non-zero overhead, but things gain additional complexity as a
result.
This patch exists for a few reasons:
1. Splitting out the functions allows easily combining GEN6 and GEN8
code. Page tables have no difference based on GEN8. As we'll see in a
future patch when we add the DMA mappings to the allocations, it
requires only one small change to make work, and error handling should
just fall into place.
2. Unless we always want to allocate all page tables under a given PDE,
we'll have to eventually break this up into an array of pointers (or
pointer to pointer).
3. Having the discrete functions is easier to review, and understand.
All allocations and frees now take place in just a couple of locations.
Reviewing, and catching leaks should be easy.
4. Less important: the GFP flags are confined to one location, which
makes playing around with such things trivial.
v2: Updated commit message to explain why this patch exists
v3: For lrc, s/pdp.page_directory[i].daddr/pdp.page_directory[i]->daddr/
v4: Renamed free_pt/pd_single functions to unmap_and_free_pt/pd (Daniel)
v5: Added additional safety checks in gen8 clear/free/unmap.
v6: Use WARN_ON and return -EINVAL in alloc_pt_range (Mika).
v7: Make err_out loop symmetrical to the way we allocate in
alloc_pt_range. Also s/page_tables/page_table and correct commit
message (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move the remaining members over to the new page table structures.
This can be squashed with the previous commit if desire. The reasoning
is the same as that patch. I simply felt it is easier to review if split.
v2: In lrc: s/ppgtt->pd_dma_addr[i]/ppgtt->pdp.page_directory[i].daddr/
v3: Rebase.
v4: Rebased after s/page_tables/page_table/.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When converting from implicitly tracked execlist queue items to ref counted
requests, not all frees of requests were replaced with unrefs, and extraneous
refs/unrefs of contexts were added.
Correct the unbalanced refcount & replace the frees.
Remove a noisy warning when hitting the request creation path.
drm_i915_gem_request and intel_context are both kref reference counted
structures. Upon allocation, drm_i915_gem_request's ref count should be
bumped using kref_init. When a context is assigned to the request,
the context's reference count should be bumped using i915_gem_context_reference.
i915_gem_request_reference will reduce the context reference count when
the request is freed.
Problem introduced in
commit 6d3d8274bc
Author: Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000
drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request
v2: Added comments explaining how the ctx pointer and the request object should
be ref-counted. Removed noisy warning.
v3: Cleaned up the language used in the commit & the header
description (Thanks David Gordon)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Work was getting left behind in LRC contexts during reset. This causes a hang
if the GPU is reset when HEAD==TAIL because the context's ringbuffer head and
tail don't get reset and retiring a request doesn't alter them, so the ring
still appears full.
Added a function intel_lr_context_reset() to reset head and tail on a LRC and
its ringbuffer.
Call intel_lr_context_reset() for each context in i915_gem_context_reset() when
in execlists mode.
Testcase: igt/pm_rps --run-subtest reset #bdw
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
[danvet: Flatten control flow in the lrc reset code a notch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On Gen9 the render power gating can leave slice/subslice/EU in
a partially enabled state. We must make an explicit request for
full SSEU enablement through the Render Power Clock State
register when resuming render work. This register is save/
restored in the logical ring context image for execlist
submission mode. Initialize its value in each LRC image to
request full enablement according to the device SSEU config.
Thanks to Sharma Ankitprasad and Akash Goel for highlighting the
issue and proposing the initial fix on which this patch is based.
v2: Adjusted the names of the power gating support flags to fit
update of an earlier patch.
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: "Akash Goel <akash.goel@intel.com>"
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
WaDisableAsyncFlipPerfMode isn't listed for SKL and
INSTPM_FORCE_ORDERING is MBZ so let's make a gen9 specific render init
function.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Nick Hoath <nicholas.hoath@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This function is only used in intel_lrc.c, so restrict it to that file. The
function was moved around to avoid a forward declaration and group it with its
user.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This function is only used in intel_lrc.c, so restrict it to that file. The
function was moved around to avoid a forward declaration and group it with its
user.
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This patch introduces 2 bit definitions of context save/restore
control register.
Signed-off-by: Zhi Wang <zhi.a.wang@intel.com>
Suggested-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This looked like an odd regression from
commit ec5cc0f9b0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 12 10:28:55 2014 +0100
drm/i915: Restrict GPU boost to the RCS engine
but in reality it undercovered a much older coherency bug. The issue that
boosting the GPU frequency on the BCS ring was masking was that we could
wake the CPU up after completion of a BCS batch and inspect memory prior
to the write cache being fully evicted. In order to serialise the
breadcrumb interrupt (and so ensure that the CPU's view of memory is
coherent) we need to perform a post-sync operation in the MI_FLUSH_DW.
v2: Fix all the MI_FLUSH_DW (bsd plus the duplication in execlists).
Also fix the invalidate_domains mask in gen8_emit_flush() for ring !=
VCS.
Testcase: gpuX-rcs-gpu-read-after-write
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Acked-by: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Remove request from list before unreferencing it, in case it's actually
the only reference. (Found by Tvrtko Ursulin)
This issue has been most likely introduced in
commit 6d3d8274bc
Author: Nick Hoath <nicholas.hoath@intel.com>
Date: Thu Jan 15 13:10:39 2015 +0000
drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We increase it when we pin, so for the casual reader
rename it to cause less confusion.
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We pin when we submit to execlist queue. Balance
the pinning when the submitted queue is cleaned on reset.
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
We have multiple forcewake domains now on recent gens. Change the
function naming to reflect this.
v2: More verbose names (Chris)
v3: Rebase
v4: Rebase
v5: Add documentation for forcewake_get/put
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On user forcewake access, assert that runtime pm reference is held.
Fix and cleanup the callsites accordingly.
v2: Remove intel_runtime_pm_get() rebasehap (Deepak)
v3: use drivers own runtime state tracking as pm_runtime_active()
will return wrong results when we are in resume callchain (Mika)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com> (v2)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move all remaining elements that were unique to execlists queue items
in to the associated request.
Issue: VIZ-4274
v2: Rebase. Fixed issue of overzealous freeing of request.
v3: Removed re-addition of cleanup work queue (found by Daniel Vetter)
v4: Rebase.
v5: Actual removal of intel_ctx_submit_request. Update both tail and postfix
pointer in __i915_add_request (found by Thomas Daniel)
v6: Removed unrelated changes
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
[danvet: Reformat comment with strange linebreaks.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The first pass implementation of execlists required a backpointer to the context to be held
in the intel_ringbuffer. However the context pointer is available higher in the call stack.
Remove the backpointer from the ring buffer structure and instead pass it down through the
call stack.
v2: Integrate this changeset with the removal of duplicate request/execlist queue item members.
v3: Rebase
v4: Rebase. Remove passing of context when the request is passed.
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Where there were duplicate variables for the tail, context and ring (engine)
in the gem request and the execlist queue item, use the one from the request
and remove the duplicate from the execlist queue item.
Issue: VIZ-4274
v1: Rebase
v2: Fixed build issues. Keep separate postfix & tail pointers as these are
used in different ways. Reinserted missing full tail pointer update.
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Add a reference and pointer from the execlist queue item to the associated
gem request. For execlist requests that don't have a request, create one
as a placeholder.
Issue: VIZ-4274
v1: Rebase after upstream of "Replace seqno values with request structures" patchset.
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
Reviewed-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
A previous commit enabled execlists by default:
commit 27401d126b ("drm/i915/bdw: Enable execlists by default where supported")
This allowed routine testing of execlists which exposed a regression when
resuming from suspend. The cause was tracked down the to recent changes to the
ring init sequence:
commit 35a57ffbb1 ("drm/i915: Only init engines once")
During a suspend/resume cycle the hardware Context Status Buffer write pointer
is reset. However since the recent changes to the init sequence the software CSB
read pointer is no longer reset. This means that context status events are not
handled correctly and new contexts are not written to the ELSP, resulting in an
apparent GPU hang.
Pending further changes to the ring init code, just move the
ring->next_context_status_buffer initialization into gen8_init_common_ring to
fix this regression.
v2: Moved init into gen8_init_common_ring rather than context_enable after
feedback from Daniel Vetter. Updated commit msg to reflect this and also cite
commits related to the regression. Fixed bz link to correct bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88096
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>