There is a possibility on gen9 hardware to miss the forcewake ack
message. The recommended workaround is to use another free
bit and toggle it until original bit is successfully acknowledged.
Some future gen9 revs might or might not fix the underlying issue but
using fallback forcewake bit dance can be considered as harmless:
without the ack timeout we never reach the fallback bit forcewake.
Thus as of now we adopt a blanket approach for all gen9 and leave
the bypassing the fallback bit approach for future patches if
corresponding hw revisions do appear.
Commit 83e3337204 ("drm/i915: Increase maximum polling time to 50ms
for forcewake request/clear ack") did increase the forcewake timeout.
If the issue was a delayed ack, future work could include finding
a suitable timeout value both for primary ack and reserve toggle
to reduce the worst case latency.
v2: use bit 15, naming, comment (Chris), only wait fallback ack
v3: fix return on fallback, backoff after fallback write (Chris)
v4: udelay on first pass, grammar (Chris)
v4: s/reserve/fallback
References: HSDES #1604254524
References: https://bugs.freedesktop.org/show_bug.cgi?id=102051
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171102094836.2506-1-mika.kuoppala@linux.intel.com
This patch adds per engine reset and recovery (TDR) support when GuC is
used to submit workloads to GPU.
In the case of i915 directly submission to ELSP, driver manages hang
detection, recovery and resubmission. With GuC submission these tasks
are shared between driver and GuC. i915 is still responsible for detecting
a hang, and when it does it only requests GuC to reset that Engine. GuC
internally manages acquiring forcewake and idling the engine before
resetting it.
Once the reset is successful, i915 takes over again and handles the
resubmission. The scheduler in i915 knows which requests are pending so
after resetting a engine, pending workloads/requests are resubmitted
again.
v2: s/i915_guc_request_engine_reset/i915_guc_reset_engine/ to match the
non-guc function names.
v3: Removed debug message about engine restarting from which request,
since the new baseline do it regardless of submission mode. (Chris)
v4: Rebase.
v5: Do not pass unnecessary reporting flags to the fw (Jeff);
tasklet_schedule(&execlists->irq_tasklet) handles the resubmit; rebase.
v6: Rename the existing reset engine function and share a similar
interface between guc and non-guc paths (Chris).
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20171031225309.10888-1-michel.thierry@intel.com
Reviewed-by: Jeff McGee <jeff.mcgee@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
intel_guc_reset sounds more like the microcontroller is the one performing
a reset, while in this case is the opposite. intel_reset_guc not only
makes it clearer, it follows the other intel_reset functions available.
v2: Print error message in English (Tvrtko).
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171030185616.32836-2-michel.thierry@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
If GuC firmware performs an engine reset while that engine had a
preemption pending, it will set the terminated attribute bit on our
preemption stage descriptor. GuC firmware retains all pending work
items for a high-priority GuC client, unlike the normal-priority GuC
client where work items are dropped. It wants to make sure the preempt-
to-idle work doesn't run when scheduling resumes, and uses this bit to
inform its scheduler and presumably us as well. Our job is to clear it
for the next preemption after reset, otherwise that and future
preemptions will never complete. We'll just clear it every time.
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171101221630.25086-1-jeff.mcgee@intel.com
Reviewed-by: Michel Thierry <michel.thierry@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
- Pascal temperature sensor support
- Improved BAR2 handling, greatly reduces time required to suspend
- Rework of the MMU code
- Allows us to properly support Pascal's new MMU layout (implemented)
- Lays the groundwork for improved userspace APIs later
- Misc other fixes
* 'linux-4.15' of git://github.com/skeggsb/linux: (151 commits)
drm/nouveau/gr/gf100-: don't prevent module load if firmware missing
drm/nouveau/mmu: remove old vmm frontend
drm/nouveau: improve selection of GPU page size
drm/nouveau: switch over to new memory and vmm interfaces
drm/nouveau: remove unused nouveau_fence_work()
drm/nouveau: queue delayed unmapping of VMAs on client workqueue
drm/nouveau: implement per-client delayed workqueue with fence support
drm/nouveau: determine memory class for each client
drm/nouveau: pass handle of vmm object to channel allocation ioctls
drm/nouveau: switch to vmm limit
drm/nouveau: allocate vmm object for every client
drm/nouveau: replace use of cpu_coherent with memory types
drm/nouveau: use nvif_mmu_type to determine BAR1 caching
drm/nouveau: fetch memory type indices that we care about for ttm
drm/nouveau: consolidate handling of dma mask
drm/nouveau: check kind validity against mmu object
drm/nouveau: allocate mmu object for every client
drm/nouveau: remove trivial cases of nvxx_device() usage
drm/nouveau/mmu: define user interfaces to mmu vmm opertaions
drm/nouveau/mmu: define user interfaces to mmu memory allocation
...
VMAs are about to not take references on the VMM they belong to, which
means more care is required when handling delayed unmapping.
Queuing it on the client workqueue ensures all pending VMA unmaps will
have completed before the VMM is destroyed.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This is already handled in the top-level gem_new() ioctl in another manner,
but this will be removed in a future commit.
Ideally we'd not need to check up-front at all, and let the VMM code handle
error checking, but there are paths in the current BO management code where
this isn't possible due to map() not always being called during BO creation,
and map() calls not being allowed to fail during buffer migration.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
If the VMA is being deleted, we don't need to explicity unmap first
anymore. The MMU code will automatically merge the operations into
a single page tree walk.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
These are the new priviledged interfaces to the VMM backends, and expose
some functionality that wasn't previously available.
It's now possible to allocate a chunk of address-space (even all of it),
without causing page tables to be allocated up-front, and then map into
it at arbitrary locations. This is the basic primitive used to support
features such as sparse mapping, or to allow userspace control over its
own address-space, or HMM (where the GPU driver isn't in control of the
address-space layout).
Rather than being tied to a subtle combination of memory object and VMA
properties, arguments that control map flags (ro, kind, etc) are passed
explicitly at map time.
The compatibility hacks to implement the old frontend on top of the new
driver backends have been replaced with something similar to implement
the old frontend's interfaces on top of the new frontend.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for:
- 64KiB/2MiB big page sizes (128KiB not supported by HW with new PT layout).
- System-memory PTs.
- LPTE "invalid" state.
- (Tegra) Use of video memory aperture.
- Sparse PDEs/PTEs.
- Additional blocklinear kinds.
- 49-bit address-space.
GP100 supports an entirely new 5-level page table layout that provides
an expanded 49-bit address-space. It also supports the layout present
on previous generations, which we've been making do with until now.
This commit implements support for the new layout, and enables it by
default.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adds support for:
- 64KiB big page size.
- System-memory PTs.
- LPTE "invalid" state.
- (Tegra) Use of video memory aperture.
Adds support for marking LPTEs invalid, resulting in the corresponding
SPTEs being ignored, which is supposed to speed up TLB invalidates.
On The Tegra side, this will switch to using the video memory aperture
for all mappings. The HW will still target non-coherent system memory,
but this aperture needs to be selected in order to support compression.
Tegra's instmem backend somewhat cheated to get this effect previously.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>