linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2025-01-18 20:56:13 +07:00

Author	SHA1	Message	Date
Chris Wilson	f494960d5e	drm/i915/gt: Defend against concurrent updates to execlists->active [ 206.875637] BUG: KCSAN: data-race in __i915_schedule+0x7fc/0x930 [i915] [ 206.875654] [ 206.875666] race at unknown origin, with read to 0xffff8881f7644480 of 8 bytes by task 703 on cpu 3: [ 206.875901] __i915_schedule+0x7fc/0x930 [i915] [ 206.876130] __bump_priority+0x63/0x80 [i915] [ 206.876361] __i915_sched_node_add_dependency+0x258/0x300 [i915] [ 206.876593] i915_sched_node_add_dependency+0x50/0xa0 [i915] [ 206.876824] i915_request_await_dma_fence+0x1da/0x530 [i915] [ 206.877057] i915_request_await_object+0x2fe/0x470 [i915] [ 206.877287] i915_gem_do_execbuffer+0x45dc/0x4c20 [i915] [ 206.877517] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 206.877535] drm_ioctl_kernel+0xe4/0x120 [ 206.877549] drm_ioctl+0x297/0x4c7 [ 206.877563] ksys_ioctl+0x89/0xb0 [ 206.877577] __x64_sys_ioctl+0x42/0x60 [ 206.877591] do_syscall_64+0x6e/0x2c0 [ 206.877606] entry_SYSCALL_64_after_hwframe+0x44/0xa9 v2: Be safe and include mb References: https://gitlab.freedesktop.org/drm/intel/issues/1318 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309170540.10332-1-chris@chris-wilson.co.uk	2020-03-09 20:38:57 +00:00
Chris Wilson	ff34527103	drm/i915/gt: Mark up intel_rps.active for racy reads We read the current state of intel_rps.active outside of the lock, so mark up the racy access. [ 525.037073] BUG: KCSAN: data-race in intel_rps_boost [i915] / intel_rps_park [i915] [ 525.037091] [ 525.037103] write to 0xffff8881f145efa1 of 1 bytes by task 192 on cpu 2: [ 525.037331] intel_rps_park+0x72/0x230 [i915] [ 525.037552] __gt_park+0x61/0xa0 [i915] [ 525.037771] ____intel_wakeref_put_last+0x42/0x90 [i915] [ 525.037991] __intel_wakeref_put_work+0xd3/0xf0 [i915] [ 525.038008] process_one_work+0x3b1/0x690 [ 525.038022] worker_thread+0x80/0x670 [ 525.038037] kthread+0x19a/0x1e0 [ 525.038051] ret_from_fork+0x1f/0x30 [ 525.038062] [ 525.038074] read to 0xffff8881f145efa1 of 1 bytes by task 733 on cpu 3: [ 525.038304] intel_rps_boost+0x67/0x1f0 [i915] [ 525.038535] i915_request_wait+0x562/0x5d0 [i915] [ 525.038764] i915_gem_object_wait_fence+0x81/0xa0 [i915] [ 525.038994] i915_gem_object_wait_reservation+0x489/0x520 [i915] [ 525.039224] i915_gem_wait_ioctl+0x167/0x2b0 [i915] [ 525.039241] drm_ioctl_kernel+0xe4/0x120 [ 525.039255] drm_ioctl+0x297/0x4c7 [ 525.039269] ksys_ioctl+0x89/0xb0 [ 525.039282] __x64_sys_ioctl+0x42/0x60 [ 525.039296] do_syscall_64+0x6e/0x2c0 [ 525.039311] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309113623.24208-1-chris@chris-wilson.co.uk	2020-03-09 18:30:20 +00:00
Chris Wilson	a4e648a0b3	drm/i915/execlsts: Mark up racy inspection of current i915_request priority [ 120.176548] BUG: KCSAN: data-race in __i915_schedule [i915] / effective_prio [i915] [ 120.176566] [ 120.176577] write to 0xffff8881e35e6540 of 4 bytes by task 730 on cpu 3: [ 120.176792] __i915_schedule+0x63e/0x920 [i915] [ 120.177007] __bump_priority+0x63/0x80 [i915] [ 120.177220] __i915_sched_node_add_dependency+0x258/0x300 [i915] [ 120.177438] i915_sched_node_add_dependency+0x50/0xa0 [i915] [ 120.177654] i915_request_await_dma_fence+0x1da/0x530 [i915] [ 120.177867] i915_request_await_object+0x2fe/0x470 [i915] [ 120.178081] i915_gem_do_execbuffer+0x45dc/0x4c20 [i915] [ 120.178292] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 120.178309] drm_ioctl_kernel+0xe4/0x120 [ 120.178322] drm_ioctl+0x297/0x4c7 [ 120.178335] ksys_ioctl+0x89/0xb0 [ 120.178348] __x64_sys_ioctl+0x42/0x60 [ 120.178361] do_syscall_64+0x6e/0x2c0 [ 120.178375] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 120.178387] [ 120.178397] read to 0xffff8881e35e6540 of 4 bytes by interrupt on cpu 2: [ 120.178606] effective_prio+0x25/0xc0 [i915] [ 120.178812] process_csb+0xe8b/0x10a0 [i915] [ 120.179021] execlists_submission_tasklet+0x30/0x170 [i915] [ 120.179038] tasklet_action_common.isra.0+0x42/0xa0 [ 120.179053] __do_softirq+0xd7/0x2cd [ 120.179066] irq_exit+0xbe/0xe0 [ 120.179078] do_IRQ+0x51/0x100 [ 120.179090] ret_from_intr+0x0/0x1c [ 120.179104] cpuidle_enter_state+0x1b8/0x5d0 [ 120.179117] cpuidle_enter+0x50/0x90 [ 120.179131] do_idle+0x1a1/0x1f0 [ 120.179145] cpu_startup_entry+0x14/0x16 [ 120.179158] start_secondary+0x120/0x180 [ 120.179172] secondary_startup_64+0xa4/0xb0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-5-chris@chris-wilson.co.uk	2020-03-09 18:24:13 +00:00
Chris Wilson	fa192d90cf	drm/i915/execlists: Mark up read of i915_request.fence.flags [ 145.927961] BUG: KCSAN: data-race in can_merge_rq [i915] / signal_irq_work [i915] [ 145.927980] [ 145.927992] write (marked) to 0xffff8881e513fab0 of 8 bytes by interrupt on cpu 2: [ 145.928250] signal_irq_work+0x134/0x640 [i915] [ 145.928268] irq_work_run_list+0xd7/0x120 [ 145.928283] irq_work_run+0x1d/0x50 [ 145.928300] smp_irq_work_interrupt+0x21/0x30 [ 145.928328] irq_work_interrupt+0xf/0x20 [ 145.928356] _raw_spin_unlock_irqrestore+0x34/0x40 [ 145.928596] execlists_submission_tasklet+0xde/0x170 [i915] [ 145.928616] tasklet_action_common.isra.0+0x42/0xa0 [ 145.928632] __do_softirq+0xd7/0x2cd [ 145.928646] irq_exit+0xbe/0xe0 [ 145.928665] do_IRQ+0x51/0x100 [ 145.928684] ret_from_intr+0x0/0x1c [ 145.928699] schedule+0x0/0xb0 [ 145.928719] worker_thread+0x194/0x670 [ 145.928743] kthread+0x19a/0x1e0 [ 145.928765] ret_from_fork+0x1f/0x30 [ 145.928784] [ 145.928796] read to 0xffff8881e513fab0 of 8 bytes by task 738 on cpu 1: [ 145.929046] can_merge_rq+0xb1/0x100 [i915] [ 145.929282] __execlists_submission_tasklet+0x866/0x25a0 [i915] [ 145.929518] execlists_submit_request+0x2a4/0x2b0 [i915] [ 145.929758] submit_notify+0x8f/0xc0 [i915] [ 145.929989] __i915_sw_fence_complete+0x5d/0x3e0 [i915] [ 145.930221] i915_sw_fence_complete+0x58/0x80 [i915] [ 145.930453] i915_sw_fence_commit+0x16/0x20 [i915] [ 145.930698] __i915_request_queue+0x60/0x70 [i915] [ 145.930935] i915_gem_do_execbuffer+0x3997/0x4c20 [i915] [ 145.931175] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 145.931194] drm_ioctl_kernel+0xe4/0x120 [ 145.931208] drm_ioctl+0x297/0x4c7 [ 145.931222] ksys_ioctl+0x89/0xb0 [ 145.931238] __x64_sys_ioctl+0x42/0x60 [ 145.931260] do_syscall_64+0x6e/0x2c0 [ 145.931275] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-4-chris@chris-wilson.co.uk	2020-03-09 18:24:13 +00:00
Chris Wilson	875c3b4b5c	drm/i915/gt: Mark up racy check of last list element [ 25.025543] BUG: KCSAN: data-race in __i915_request_create [i915] / process_csb [i915] [ 25.025561] [ 25.025573] write (marked) to 0xffff8881e85c1620 of 8 bytes by task 696 on cpu 1: [ 25.025789] __i915_request_create+0x54b/0x5d0 [i915] [ 25.026001] i915_request_create+0xcc/0x150 [i915] [ 25.026218] i915_gem_do_execbuffer+0x2f70/0x4c20 [i915] [ 25.026428] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 25.026445] drm_ioctl_kernel+0xe4/0x120 [ 25.026459] drm_ioctl+0x297/0x4c7 [ 25.026472] ksys_ioctl+0x89/0xb0 [ 25.026484] __x64_sys_ioctl+0x42/0x60 [ 25.026497] do_syscall_64+0x6e/0x2c0 [ 25.026510] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 25.026522] [ 25.026532] read to 0xffff8881e85c1620 of 8 bytes by interrupt on cpu 2: [ 25.026742] process_csb+0x8d6/0x1070 [i915] [ 25.026949] execlists_submission_tasklet+0x30/0x170 [i915] [ 25.026969] tasklet_action_common.isra.0+0x42/0xa0 [ 25.026984] __do_softirq+0xd7/0x2cd [ 25.026997] irq_exit+0xbe/0xe0 [ 25.027009] do_IRQ+0x51/0x100 [ 25.027021] ret_from_intr+0x0/0x1c [ 25.027033] poll_idle+0x3e/0x13b [ 25.027047] cpuidle_enter_state+0x189/0x5d0 [ 25.027060] cpuidle_enter+0x50/0x90 [ 25.027074] do_idle+0x1a1/0x1f0 [ 25.027086] cpu_startup_entry+0x14/0x16 [ 25.027100] start_secondary+0x120/0x180 [ 25.027116] secondary_startup_64+0xa4/0xb0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309110934.868-2-chris@chris-wilson.co.uk	2020-03-09 18:23:59 +00:00
Chris Wilson	23a44ae9e8	drm/i915/execlists: Mark up the racy access to switch_priority_hint [ 7534.150687] BUG: KCSAN: data-race in __execlists_submission_tasklet [i915] / process_csb [i915] [ 7534.150706] [ 7534.150717] write to 0xffff8881f1bc24b4 of 4 bytes by task 24404 on cpu 3: [ 7534.150925] __execlists_submission_tasklet+0x1158/0x2780 [i915] [ 7534.151133] execlists_submit_request+0x2e8/0x2f0 [i915] [ 7534.151348] submit_notify+0x8f/0xc0 [i915] [ 7534.151549] __i915_sw_fence_complete+0x5d/0x3e0 [i915] [ 7534.151753] i915_sw_fence_complete+0x58/0x80 [i915] [ 7534.151963] i915_sw_fence_commit+0x16/0x20 [i915] [ 7534.152179] __i915_request_queue+0x60/0x70 [i915] [ 7534.152388] i915_gem_do_execbuffer+0x3997/0x4c20 [i915] [ 7534.152598] i915_gem_execbuffer2_ioctl+0x2c3/0x580 [i915] [ 7534.152615] drm_ioctl_kernel+0xe4/0x120 [ 7534.152629] drm_ioctl+0x297/0x4c7 [ 7534.152642] ksys_ioctl+0x89/0xb0 [ 7534.152654] __x64_sys_ioctl+0x42/0x60 [ 7534.152667] do_syscall_64+0x6e/0x2c0 [ 7534.152681] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 7534.152693] [ 7534.152703] read to 0xffff8881f1bc24b4 of 4 bytes by interrupt on cpu 2: [ 7534.152914] process_csb+0xe7c/0x10a0 [i915] [ 7534.153120] execlists_submission_tasklet+0x30/0x170 [i915] [ 7534.153138] tasklet_action_common.isra.0+0x42/0xa0 [ 7534.153153] __do_softirq+0xd7/0x2cd [ 7534.153166] run_ksoftirqd+0x15/0x20 [ 7534.153180] smpboot_thread_fn+0x1ab/0x300 [ 7534.153194] kthread+0x19a/0x1e0 [ 7534.153207] ret_from_fork+0x1f/0x30 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309144249.10309-1-chris@chris-wilson.co.uk	2020-03-09 17:08:58 +00:00
Chris Wilson	74e5a9aca0	drm/i915/gt: Mark up intel_rps.active for racy reads We read the current state of intel_rps.active outside of the lock, so mark up the racy access. [ 525.037073] BUG: KCSAN: data-race in intel_rps_boost [i915] / intel_rps_park [i915] [ 525.037091] [ 525.037103] write to 0xffff8881f145efa1 of 1 bytes by task 192 on cpu 2: [ 525.037331] intel_rps_park+0x72/0x230 [i915] [ 525.037552] __gt_park+0x61/0xa0 [i915] [ 525.037771] ____intel_wakeref_put_last+0x42/0x90 [i915] [ 525.037991] __intel_wakeref_put_work+0xd3/0xf0 [i915] [ 525.038008] process_one_work+0x3b1/0x690 [ 525.038022] worker_thread+0x80/0x670 [ 525.038037] kthread+0x19a/0x1e0 [ 525.038051] ret_from_fork+0x1f/0x30 [ 525.038062] [ 525.038074] read to 0xffff8881f145efa1 of 1 bytes by task 733 on cpu 3: [ 525.038304] intel_rps_boost+0x67/0x1f0 [i915] [ 525.038535] i915_request_wait+0x562/0x5d0 [i915] [ 525.038764] i915_gem_object_wait_fence+0x81/0xa0 [i915] [ 525.038994] i915_gem_object_wait_reservation+0x489/0x520 [i915] [ 525.039224] i915_gem_wait_ioctl+0x167/0x2b0 [i915] [ 525.039241] drm_ioctl_kernel+0xe4/0x120 [ 525.039255] drm_ioctl+0x297/0x4c7 [ 525.039269] ksys_ioctl+0x89/0xb0 [ 525.039282] __x64_sys_ioctl+0x42/0x60 [ 525.039296] do_syscall_64+0x6e/0x2c0 [ 525.039311] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309113623.24208-1-chris@chris-wilson.co.uk	2020-03-09 17:08:52 +00:00
Matt Roper	dbe748cd3a	drm/i915/tgl: Don't treat unslice registers as masked The UNSLICE_UNIT_LEVEL_CLKGATE and UNSLICE_UNIT_LEVEL_CLKGATE2 registers that we update in a few engine workarounds are not masked registers (i.e., we don't have to write a mask bit in the top 16 bits when updating one of the lower 16 bits). As such, these workarounds should be applied via wa_write_or() rather than wa_masked_en() v2: - Rebase Reported-by: Nick Desaulniers <ndesaulniers@google.com> Reported-by: kernelci.org bot <bot@kernelci.org> References: https://github.com/ClangBuiltLinux/linux/issues/918 Fixes: `50148a25f8` ("drm/i915/tgl: Move and restrict Wa_1408615072") Fixes: `3551ff9287` ("drm/i915/gen11: Moving WAs to rcs_engine_wa_init()") Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Tested-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306171139.1414649-1-matthew.d.roper@intel.com	2020-03-09 09:17:12 -07:00
Chris Wilson	cc328351e1	drm/i915/gt: Wait for the wa batch to be pinned Be sure to wait for the vma to be in place before we tell the GPU to execute from the wa batch. Since initialisation is mostly synchronous (or rather at some point during start up we will need to sync anyway), we can affort to do an explicit i915_vma_sync() during wa batch construction rather than check for a required await on every context switch. (We don't expect to change the wa bb at run time so paying the cost once up front seems preferrable.) Fixes: `ee2413eeed` ("drm/i915: Add mechanism to submit a context WA on ring submission") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200307122425.29114-1-chris@chris-wilson.co.uk	2020-03-07 17:10:35 +00:00
Chris Wilson	2d4bd971f5	drm/i915/gt: Close race between cacheline_retire and free If the cacheline may still be busy, atomically mark it for future release, and only if we can determine that it will never be used again, immediately free it. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1392 Fixes: `ebece75392` ("drm/i915: Keep timeline HWSP allocated until idle across the system") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.2+ Link: https://patchwork.freedesktop.org/patch/msgid/20200306154647.3528345-1-chris@chris-wilson.co.uk	2020-03-07 00:05:54 +00:00
Chris Wilson	3df2deed41	drm/i915/execlists: Enable timeslice on partial virtual engine dequeue If we stop filling the ELSP due to an incompatible virtual engine request, check if we should enable the timeslice on behalf of the queue. This fixes the case where we are inspecting the last->next element when we know that the last element is the last request in the execution queue, and so decided we did not need to enable timeslicing despite the intent to do so! Fixes: `8ee36e048c` ("drm/i915/execlists: Minimalistic timeslicing") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306113012.3184606-1-chris@chris-wilson.co.uk	2020-03-07 00:05:54 +00:00
Swathi Dhanavanthri	b592322f50	drm/i915/tgl: Make Wa_1606700617 permanent This workaround is to disable FF DOP Clock gating. The fix in B0 was backed out due to timing reasons and decided to be made permanent. Bspec: 52890 Signed-off-by: Swathi Dhanavanthri <swathi.dhanavanthri@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200305181204.28856-1-swathi.dhanavanthri@intel.com	2020-03-06 14:47:09 -08:00
Matthew Auld	520f835036	drm/i915: properly sanity check batch_start_offset Check the edge case where batch_start_offset sits exactly on the batch size. v2: add new range_overflows variant to capture the special case where the size is permitted to be zero, like with batch_len. v3: other way around. the common case is the exclusive one which should just be >=, with that we then just need to convert the three odd ball cases that don't apply to use the new inclusive _end version. Testcase: igt/gem_exec_params/invalid-batch-start-offset Fixes: `0b5372727b` ("drm/i915/cmdparser: Use cached vmappings") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200306094735.258285-1-matthew.auld@intel.com	2020-03-06 13:15:49 +00:00
Chris Wilson	1eaa251b66	drm/i915: Assert requests within a context are submitted in order Check the flow of requests into the hardware to verify that are submitted in order along their timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200306071614.2846708-1-chris@chris-wilson.co.uk	2020-03-06 10:53:54 +00:00
Prathap Kumar Valsan	47f8253d2b	drm/i915/gen7: Clear all EU/L3 residual contexts On gen7 and gen7.5 devices, there could be leftover data residuals in EU/L3 from the retiring context. This patch introduces workaround to clear that residual contexts, by submitting a batch buffer with dedicated HW context to the GPU with ring allocation for each context switching. This security mitigation changes does not triggers any performance regression. Performance is on par with current drm-tips. v2: Add igt generated header file for CB kernel assembled with Mesa tool and addressed use of Kernel macro for ptr_align comment. v3: Resolve Sparse warnings with newly generated, and imported CB kernel. v4: Include new igt generated CB kernel for gen7 and gen7.5. Also add code formatting and compiler warnings changes (Chris Wilson) Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Cc: Balestrieri Francesco <francesco.balestrieri@intel.com> Cc: Bloomfield Jon <jon.bloomfield@intel.com> Cc: Dutt Sudeep <sudeep.dutt@intel.com> Acked-by: Chris Wilson <chris@chris-wilso.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200306000957.2836150-2-chris@chris-wilson.co.uk	2020-03-06 08:59:06 +00:00
Mika Kuoppala	ee2413eeed	drm/i915: Add mechanism to submit a context WA on ring submission This patch adds framework to submit an arbitrary batchbuffer on each context switch to clear residual state for render engine on Gen7/7.5 devices. The idea of always emitting the context and vm setup around each request is primary to make reset recovery easy, and not require rewriting the ringbuffer. As each request would set up its own context, leaving it to the HW to notice and elide no-op context switches, we could restart the ring at any point, and reorder the requests freely. However, to avoid emitting clear_residuals() between consecutive requests in the ringbuffer of the same context, we do want to track the current context in the ring. In doing so, we need to be careful to only record a context switch when we are sure the next request will be emitted. This security mitigation change does not trigger any performance regression. Performance is on par with current mainline/drm-tip. v2: Update vm_alias params to point to correct address space "vm" due to changes made in the patch "f21613797bae98773" v3-v4: none Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com> Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Balestrieri Francesco <francesco.balestrieri@intel.com> Cc: Bloomfield Jon <jon.bloomfield@intel.com> Cc: Dutt Sudeep <sudeep.dutt@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200306000957.2836150-1-chris@chris-wilson.co.uk	2020-03-06 08:59:06 +00:00
Chris Wilson	81dcef4cee	drm/i915/execlists: Show the "switch priority hint" in dumps Show the timeslicing priority hint in engine dumps to aide debugging. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200305135843.2760512-1-chris@chris-wilson.co.uk	2020-03-05 15:46:59 +00:00
Tvrtko Ursulin	9b11bbf0c4	drm/i915/tgl: WaDisableGPGPUMidThreadPreemption Enable FtrPerCtxtPreemptionGranularityControl bit and select thread- group as the default preemption level. v2: * Remove register whitelisting (Rafael, Tony). Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: piotr.zdunowski@intel.com Cc: michal.mrozek@intel.com Cc: Tony Ye <tony.ye@intel.com> Cc: Rafael Antognolli <rafael.antognolli@intel.com> Acked-by: Rafael Antognolli <rafael.antognolli@intel.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Tony Ye <tony.ye@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200304153144.10675-1-tvrtko.ursulin@linux.intel.com	2020-03-05 13:26:06 +00:00
Chris Wilson	be90e34483	drm/i915/gt: Cancel banned contexts after GT reset As we started marking the ce->gem_context as NULL on closure, we can no longer use that to carry closure information. Instead, we can look at whether the context was killed on closure instead. Fixes: `130a95e909` ("drm/i915/gem: Consolidate ctx->engines[] release") Closes: https://gitlab.freedesktop.org/drm/intel/issues/1379 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200304165113.2449213-1-chris@chris-wilson.co.uk	2020-03-04 20:28:02 +00:00
Chris Wilson	8e9f84cf5c	drm/i915/gt: Propagate change in error status to children on unhold As we release the head requests back into the queue, propagate any change in error status that may have occurred while the requests were temporarily suspended. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1277 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-2-chris@chris-wilson.co.uk	2020-03-04 14:29:50 +00:00
Chris Wilson	36e191f064	drm/i915: Apply i915_request_skip() on submission Trying to use i915_request_skip() prior to i915_request_add() causes us to try and fill the ring upto request->postfix, which has not yet been set, and so may cause us to memset() past the end of the ring. Instead of skipping the request immediately, just flag the error on the request (only accepting the first fatal error we see) and then clear the request upon submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200304121849.2448028-1-chris@chris-wilson.co.uk	2020-03-04 14:29:50 +00:00
José Roberto de Souza	50148a25f8	drm/i915/tgl: Move and restrict Wa_1408615072 Following the changes in the previous patch "drm/i915/gen11: Moving WAs to rcs_engine_wa_init()" also moving TGL Wa_1408615072 to rcs_engine_wa_init() this way after a engine reset it will be reapplied also restricting it to A0 as it is fixed in B0 stepping. BSpec: 52890 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200302231421.224322-2-jose.souza@intel.com	2020-03-03 13:33:11 -08:00
José Roberto de Souza	3551ff9287	drm/i915/gen11: Moving WAs to rcs_engine_wa_init() This are register of render engine, so after a render reset those would return to the default value and init_clock_gating() is not called for single engine reset. So here moving it rcs_engine_wa_init() that will guarantee that this WAs will not be lost. Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200302231421.224322-1-jose.souza@intel.com	2020-03-03 13:32:52 -08:00
Aditya Swarup	9b234d2643	drm/i915/selftests: Fix uninitialized variable Static code analysis tool identified struct lrc_timestamp data as being uninitialized and then data.ce[] is being checked for NULL/negative value in the error path. Initializing data variable fixes the issue. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200303142347.15696-1-aditya.swarup@intel.com	2020-03-03 17:30:20 +00:00
Chris Wilson	82126e596d	drm/i915/gt: Drop the timeline->mutex as we wait for retirement As we have pinned the timeline (using tl->active_count), we can safely drop the tl->mutex as we wait for what we believe to be the final request on that timeline. This is useful for ensuring that we do not block the engine heartbeat by hogging the kernel_context's timeline on a dead GPU. References: https://gitlab.freedesktop.org/drm/intel/issues/1364 Fixes: `058179e72e` ("drm/i915/gt: Replace hangcheck by heartbeats") Fixes: `f33a8a5160` ("drm/i915: Merge wait_for_timelines with retire_request") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200303140009.1494819-1-chris@chris-wilson.co.uk	2020-03-03 17:30:20 +00:00
Chris Wilson	373f27f24c	drm/i915/gt: Prevent allocation on a banned context If a context is banned even before we submit our first request to it, report the failure before we attempt to allocate any resources for the context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200303080546.1140508-2-chris@chris-wilson.co.uk	2020-03-03 17:30:20 +00:00
Jani Nikula	9e859eb9d0	drm/i915/vgpu: improve vgpu abstractions Add intel_vgpu_register() abstraction, rename i915_detect_vgpu() to intel_vgpu_detect() to match other function naming, un-inline intel_vgpu_active(), intel_vgpu_has_full_ppgtt() and intel_vgpu_has_huge_gtt() to reduce header interdependencies. The i915_vgpu.[ch] filename and intel_vgpu_ prefix discrepancy remains. Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227144408.24345-1-jani.nikula@intel.com	2020-03-03 17:46:54 +02:00
Daniele Ceraolo Spurio	69f5c87cf4	drm/i915/huc: update TGL HuC to v7.0.12 Update to the latest available TGL HuC, which includes changes required by the media team. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Tony Ye <tony.ye@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Anusha Srivatsa <anusha.srivatsa@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Tony Ye <tony.ye@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200229012042.27487-1-daniele.ceraolospurio@intel.com	2020-03-02 16:26:38 -08:00
Chris Wilson	15db5fcce9	drm/i915/execlists: Check the sentinel is alone in the ELSP We only use sentinel requests for "preempt-to-idle" passes, so assert that they are the only request in a new submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200302085812.4172450-12-chris@chris-wilson.co.uk	2020-03-02 21:28:17 +00:00
José Roberto de Souza	f5e5a33037	drm/i915/tgl: Add Wa number to WaAllowPMDepthAndInvocationCountAccessFromUMD Just to make easier to check that the Wa was implemetend when comparing to the number in BSpec. BSpec: 52890 Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227220101.321671-10-jose.souza@intel.com	2020-03-02 12:01:25 -08:00
José Roberto de Souza	7028b08109	drm/i915/tgl: Add note about Wa_1409142259 Different issues with the same fix, so justing adding Wa_1409142259, Wa_1409252684, Wa_1409217633, Wa_1409207793, Wa_1409178076 and 1408979724 to the comment so other devs can check if this Was were implemetend with a simple grep. Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227220101.321671-8-jose.souza@intel.com	2020-03-02 12:00:42 -08:00
José Roberto de Souza	0bd06a59df	drm/i915/tgl: Fix the Wa number of a fix The Wa number for this fix is Wa_1607087056 the BSpec bug id is 1607087056, just updating to match BSpec. BSpec: 52890 Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227220101.321671-7-jose.souza@intel.com	2020-03-02 12:00:42 -08:00
José Roberto de Souza	d55204d3f8	drm/i915/tgl: Add note about Wa_1607063988 This issue workaround in Wa_1607063988 has the same fix as Wa_1607138336, so just adding a note in the code. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227220101.321671-6-jose.souza@intel.com	2020-03-02 12:00:41 -08:00
José Roberto de Souza	e2049b4c0c	drm/i915/tgl: Add note to Wa_1607297627 Add note about the confliting information in BSpec about this WA. BSpec: 52890 Acked-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227220101.321671-5-jose.souza@intel.com	2020-03-02 12:00:41 -08:00
Anusha Srivatsa	f2097beed5	drm/i915/tgl: Extend Wa_1606931601 for all steppings According to BSpec. Wa_1606931601 applies for all TGL steppings. This patch moves the WA implementation out of A0 only block of rcs_engine_wa_init(). The WA is has also been referred to by an alternate name Wa_1607090982. Bspec: 46045, 52890 Fixes: `3873fd1a43` ("drm/i915: Use engine wa list for Wa_1607090982") Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Anusha Srivatsa <anusha.srivatsa@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227220101.321671-4-jose.souza@intel.com	2020-03-02 12:00:41 -08:00
Matt Atwood	52c2e4e6f1	drm/i915/tgl: Add Wa_1409085225, Wa_14010229206 Disable Push Constant buffer addition for TGL. v2: typos, add additional Wa reference v3: use REG_BIT macro, move to rcs_engine_wa_init, clean up commit message. Bspec: 52890 Cc: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227220101.321671-3-jose.souza@intel.com	2020-03-02 12:00:40 -08:00
José Roberto de Souza	072d069a04	drm/i915/tgl: Implement Wa_1806527549 This will whitelist the HIZ_CHICKEN register so mesa can disable the optimizations and avoid hang when using D16_UNORM. v2: moved to the right place and used the right function() (Chris) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Rafael Antognolli <rafael.antognolli@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227220101.321671-2-jose.souza@intel.com	2020-03-02 12:00:40 -08:00
José Roberto de Souza	ec1e12645f	drm/i915/tgl: Implement Wa_1409804808 This workaround the CS not done issue on PIPE_CONTROL. v2: - replaced BIT() by REG_BIT() in all GEN7_ROW_CHICKEN2() bits - shortened the name of the new bit BSpec: 52890 BSpec: 46218 Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227220101.321671-1-jose.souza@intel.com	2020-03-02 12:00:39 -08:00
Chris Wilson	9a40bddd47	drm/i915/gt: Expose heartbeat interval via sysfs We monitor the health of the system via periodic heartbeat pulses. The pulses also provide the opportunity to perform garbage collection. However, we interpret an incomplete pulse (a missed heartbeat) as an indication that the system is no longer responsive, i.e. hung, and perform an engine or full GPU reset. Given that the preemption granularity can be very coarse on a system, we let the sysadmin override our legacy timeouts which were "optimised" for desktop applications. The heartbeat interval can be adjusted per-engine using, /sys/class/drm/card?/engine/*/heartbeat_interval_ms Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Steve Carbonari <steven.carbonari@intel.com> Tested-by: Steve Carbonari <steven.carbonari@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-7-chris@chris-wilson.co.uk	2020-02-28 22:03:49 +00:00
Chris Wilson	db3d8338ba	drm/i915/gt: Expose preempt reset timeout via sysfs After initialising a preemption request, we give the current resident a small amount of time to vacate the GPU. The preemption request is for a higher priority context and should be immediate to maintain high quality of service (and avoid priority inversion). However, the preemption granularity of the GPU can be quite coarse and so we need a compromise. The preempt timeout can be adjusted per-engine using, /sys/class/drm/card?/engine/*/preempt_timeout_ms and can be disabled by setting it to 0. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Steve Carbonari <steven.carbonari@intel.com> Tested-by: Steve Carbonari <steven.carbonari@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-6-chris@chris-wilson.co.uk	2020-02-28 22:03:46 +00:00
Chris Wilson	72338a1f5e	drm/i915/gt: Expose reset stop timeout via sysfs When we allow ourselves to sleep before a GPU reset after disabling submission, even for a few milliseconds, gives an innocent context the opportunity to clear the GPU before the reset occurs. However, how long to sleep depends on the typical non-preemptible duration (a similar problem to determining the ideal preempt-reset timeout or even the heartbeat interval). As this seems of a hard policy decision, punt it to userspace. The timeout can be adjusted using /sys/class/drm/card?/engine/*/stop_timeout_ms Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Reviewed-by: Steve Carbonari <steven.carbonari@intel.com> Tested-by: Steve Carbonari <steven.carbonari@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-5-chris@chris-wilson.co.uk	2020-02-28 22:03:43 +00:00
Chris Wilson	062444bbc6	drm/i915/gt: Expose busywait duration to sysfs We busywait on an inflight request (one that is currently executing on HW, and so might complete quickly) prior to setting up an interrupt and sleeping. The trade off is that we keep an expensive CPU core busy in order to avoid wake up latency: where that trade off should lie is best left to the sysadmin. The busywait mechanism can be compiled out with ./scripts/config --set-val DRM_I915_SPIN_REQUEST 0 The maximum busywait duration can be adjusted per-engine using, /sys/class/drm/card?/engine/*/ms_busywait_duration_ns Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Steve Carbonari <steven.carbonari@intel.com> Tested-by: Steve Carbonari <steven.carbonari@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-4-chris@chris-wilson.co.uk	2020-02-28 22:03:41 +00:00
Chris Wilson	1a2695a746	drm/i915/gt: Expose timeslice duration to sysfs Execlists uses a scheduling quantum (a timeslice) to alternate execution between ready-to-run contexts of equal priority. This ensures that all users (though only if they of equal importance) have the opportunity to run and prevents livelocks where contexts may have implicit ordering due to userspace semaphores. The timeslicing mechanism can be compiled out with ./scripts/config --set-val DRM_I915_TIMESLICE_DURATION 0 The timeslice duration can be adjusted per-engine using, /sys/class/drm/card?/engine/*/timeslice_duration_ms Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Steve Carbonari <steven.carbonari@intel.com> Tested-by: Steve Carbonari <steven.carbonari@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-3-chris@chris-wilson.co.uk	2020-02-28 22:03:39 +00:00
Chris Wilson	6e57cc3942	drm/i915/gt: Expose engine->mmio_base via sysfs Use the per-engine sysfs directory to let userspace discover the mmio_base of each engine. Prior to recent generations, the user accessible registers on each engine are at a fixed offset relative to each engine -- but require absolute addressing. As the absolute address depends on the actual physical engine, this is not always possible to determine from userspace (for example icl may expose vcs1 or vcs2 as the second vcs engine). Make this easy for userspace to discover by providing the mmio_base in sysfs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Steve Carbonari <steven.carbonari@intel.com> Tested-by: Steve Carbonari <steven.carbonari@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-2-chris@chris-wilson.co.uk	2020-02-28 22:03:35 +00:00
Chris Wilson	4ec76dbeb6	drm/i915/gt: Expose engine properties via sysfs Preliminary stub to add engines underneath /sys/class/drm/cardN/, so that we can expose properties on each engine to the sysadmin. To start with we have basic analogues of the i915_query ioctl so that we can pretty print engine discovery from the shell, and flesh out the directory structure. Later we will add writeable sysadmin properties such as per-engine timeout controls. An example tree of the engine properties on Braswell: /sys/class/drm/card0 └── engine ├── bcs0 │ ├── capabilities │ ├── class │ ├── instance │ ├── known_capabilities │ └── name ├── rcs0 │ ├── capabilities │ ├── class │ ├── instance │ ├── known_capabilities │ └── name ├── vcs0 │ ├── capabilities │ ├── class │ ├── instance │ ├── known_capabilities │ └── name └── vecs0 ├── capabilities ├── class ├── instance ├── known_capabilities └── name v2: Include stringified capabilities v3: Include all known capabilities for futureproofing. v4: Combine the two caps loops into one v5: Hide underneath Kconfig.unstable for wider discussion Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Tested-by: Steve Carbonari <steven.carbonari@intel.com> Reviewed-by: Steve Carbonari <steven.carbonari@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200228131716.3243616-1-chris@chris-wilson.co.uk	2020-02-28 22:03:19 +00:00
Chris Wilson	3fc28d3e0e	drm/i915/gt: Reset queue_priority_hint after wedging An odd and highly unlikely path caught us out. On delayed submission (due to an asynchronous reset handler), we poked the priority_hint and kicked the tasklet. However, we had already marked the device as wedged and swapped out the tasklet for a no-op. The result was that we never cleared the priority hint and became upset when we later checked. <0> [574.303565] i915_sel-6278 2.... 481822445us : __i915_subtests: Running intel_execlists_live_selftests/live_error_interrupt <0> [574.303565] i915_sel-6278 2.... 481822472us : __engine_unpark: 0000:00:02.0 rcs0: <0> [574.303565] i915_sel-6278 2.... 481822491us : __gt_unpark: 0000:00:02.0 <0> [574.303565] i915_sel-6278 2.... 481823220us : execlists_context_reset: 0000:00:02.0 rcs0: context:f4ee reset <0> [574.303565] i915_sel-6278 2.... 481824830us : __intel_context_active: 0000:00:02.0 rcs0: context:f51b active <0> [574.303565] i915_sel-6278 2.... 481825258us : __intel_context_do_pin: 0000:00:02.0 rcs0: context:f51b pin ring:{start:00006000, head:0000, tail:0000} <0> [574.303565] i915_sel-6278 2.... 481825311us : __i915_request_commit: 0000:00:02.0 rcs0: fence f51b:2, current 0 <0> [574.303565] i915_sel-6278 2d..1 481825347us : __i915_request_submit: 0000:00:02.0 rcs0: fence f51b:2, current 0 <0> [574.303565] i915_sel-6278 2d..1 481825363us : trace_ports: 0000:00:02.0 rcs0: submit { f51b:2, 0:0 } <0> [574.303565] i915_sel-6278 2.... 481826809us : __intel_context_active: 0000:00:02.0 rcs0: context:f51c active <0> [574.303565] <idle>-0 7d.h2 481827326us : cs_irq_handler: 0000:00:02.0 rcs0: CS error: 1 <0> [574.303565] <idle>-0 7..s1 481827377us : process_csb: 0000:00:02.0 rcs0: cs-irq head=3, tail=4 <0> [574.303565] <idle>-0 7..s1 481827379us : process_csb: 0000:00:02.0 rcs0: csb[4]: status=0x10000001:0x00000000 <0> [574.305593] <idle>-0 7..s1 481827385us : trace_ports: 0000:00:02.0 rcs0: promote { f51b:2*, 0:0 } <0> [574.305611] <idle>-0 7..s1 481828179us : execlists_reset: 0000:00:02.0 rcs0: reset for CS error <0> [574.305611] i915_sel-6278 2.... 481828284us : __intel_context_do_pin: 0000:00:02.0 rcs0: context:f51c pin ring:{start:00007000, head:0000, tail:0000} <0> [574.305611] i915_sel-6278 2.... 481828345us : __i915_request_commit: 0000:00:02.0 rcs0: fence f51c:2, current 0 <0> [574.305611] <idle>-0 7dNs2 481847823us : __i915_request_unsubmit: 0000:00:02.0 rcs0: fence f51b:2, current 1 <0> [574.305611] <idle>-0 7dNs2 481847857us : execlists_hold: 0000:00:02.0 rcs0: fence f51b:2, current 1 on hold <0> [574.305611] <idle>-0 7.Ns1 481847863us : intel_engine_reset: 0000:00:02.0 rcs0: flags=4 <0> [574.305611] <idle>-0 7.Ns1 481847945us : execlists_reset_prepare: 0000:00:02.0 rcs0: depth<-1 <0> [574.305611] <idle>-0 7.Ns1 481847946us : intel_engine_stop_cs: 0000:00:02.0 rcs0: <0> [574.305611] <idle>-0 7.Ns1 538584284us : intel_engine_stop_cs: 0000:00:02.0 rcs0: timed out on STOP_RING -> IDLE <0> [574.305611] <idle>-0 7.Ns1 538584347us : __intel_gt_reset: 0000:00:02.0 engine_mask=1 <0> [574.305611] <idle>-0 7.Ns1 538584406us : execlists_reset_rewind: 0000:00:02.0 rcs0: <0> [574.305611] <idle>-0 7dNs2 538585050us : __i915_request_reset: 0000:00:02.0 rcs0: fence f51b:2, current 1 guilty? yes <0> [574.305611] <idle>-0 7dNs2 538585063us : __execlists_reset: 0000:00:02.0 rcs0: replay {head:0000, tail:0068} <0> [574.306565] <idle>-0 7.Ns1 538588457us : intel_engine_cancel_stop_cs: 0000:00:02.0 rcs0: <0> [574.306565] <idle>-0 7dNs2 538588462us : __i915_request_submit: 0000:00:02.0 rcs0: fence f51c:2, current 0 <0> [574.306565] <idle>-0 7dNs2 538588471us : trace_ports: 0000:00:02.0 rcs0: submit { f51c:2, 0:0 } <0> [574.306565] <idle>-0 7.Ns1 538588474us : execlists_reset_finish: 0000:00:02.0 rcs0: depth->1 <0> [574.306565] kworker/-202 2.... 538588755us : i915_request_retire: 0000:00:02.0 rcs0: fence f51c:2, current 2 <0> [574.306565] ksoftirq-46 7..s. 538588773us : process_csb: 0000:00:02.0 rcs0: cs-irq head=11, tail=1 <0> [574.306565] ksoftirq-46 7..s. 538588774us : process_csb: 0000:00:02.0 rcs0: csb[0]: status=0x10000001:0x00000000 <0> [574.306565] ksoftirq-46 7..s. 538588776us : trace_ports: 0000:00:02.0 rcs0: promote { f51c:2!, 0:0 } <0> [574.306565] ksoftirq-46 7..s. 538588778us : process_csb: 0000:00:02.0 rcs0: csb[1]: status=0x10000018:0x00000020 <0> [574.306565] ksoftirq-46 7..s. 538588779us : trace_ports: 0000:00:02.0 rcs0: completed { f51c:2!, 0:0 } <0> [574.306565] kworker/-202 2.... 538588826us : intel_context_unpin: 0000:00:02.0 rcs0: context:f51c unpin <0> [574.306565] i915_sel-6278 6.... 538589663us : __intel_gt_set_wedged.part.32: 0000:00:02.0 start <0> [574.306565] i915_sel-6278 6.... 538589667us : execlists_reset_prepare: 0000:00:02.0 rcs0: depth<-0 <0> [574.306565] i915_sel-6278 6.... 538589710us : intel_engine_stop_cs: 0000:00:02.0 rcs0: <0> [574.306565] i915_sel-6278 6.... 538589732us : execlists_reset_prepare: 0000:00:02.0 bcs0: depth<-0 <0> [574.307591] i915_sel-6278 6.... 538589733us : intel_engine_stop_cs: 0000:00:02.0 bcs0: <0> [574.307591] i915_sel-6278 6.... 538589757us : execlists_reset_prepare: 0000:00:02.0 vcs0: depth<-0 <0> [574.307591] i915_sel-6278 6.... 538589758us : intel_engine_stop_cs: 0000:00:02.0 vcs0: <0> [574.307591] i915_sel-6278 6.... 538589771us : execlists_reset_prepare: 0000:00:02.0 vcs1: depth<-0 <0> [574.307591] i915_sel-6278 6.... 538589772us : intel_engine_stop_cs: 0000:00:02.0 vcs1: <0> [574.307591] i915_sel-6278 6.... 538589778us : execlists_reset_prepare: 0000:00:02.0 vecs0: depth<-0 <0> [574.307591] i915_sel-6278 6.... 538589780us : intel_engine_stop_cs: 0000:00:02.0 vecs0: <0> [574.307591] i915_sel-6278 6.... 538589786us : __intel_gt_reset: 0000:00:02.0 engine_mask=ff <0> [574.307591] i915_sel-6278 6.... 538591175us : execlists_reset_cancel: 0000:00:02.0 rcs0: <0> [574.307591] i915_sel-6278 6.... 538591970us : execlists_reset_cancel: 0000:00:02.0 bcs0: <0> [574.307591] i915_sel-6278 6.... 538591982us : execlists_reset_cancel: 0000:00:02.0 vcs0: <0> [574.307591] i915_sel-6278 6.... 538591996us : execlists_reset_cancel: 0000:00:02.0 vcs1: <0> [574.307591] i915_sel-6278 6.... 538592759us : execlists_reset_cancel: 0000:00:02.0 vecs0: <0> [574.307591] i915_sel-6278 6.... 538592977us : execlists_reset_finish: 0000:00:02.0 rcs0: depth->0 <0> [574.307591] i915_sel-6278 6.N.. 538592996us : execlists_reset_finish: 0000:00:02.0 bcs0: depth->0 <0> [574.307591] i915_sel-6278 6.N.. 538593023us : execlists_reset_finish: 0000:00:02.0 vcs0: depth->0 <0> [574.307591] i915_sel-6278 6.N.. 538593037us : execlists_reset_finish: 0000:00:02.0 vcs1: depth->0 <0> [574.307591] i915_sel-6278 6.N.. 538593051us : execlists_reset_finish: 0000:00:02.0 vecs0: depth->0 <0> [574.307591] i915_sel-6278 6.... 538593407us : __intel_gt_set_wedged.part.32: 0000:00:02.0 end <0> [574.307591] kworker/-210 7d..1 551958381us : execlists_unhold: 0000:00:02.0 rcs0: fence f51b:2, current 2 hold release <0> [574.307591] i915_sel-6278 0.... 559490788us : i915_request_retire: 0000:00:02.0 rcs0: fence f51b:2, current 2 <0> [574.307591] i915_sel-6278 0.... 559490793us : intel_context_unpin: 0000:00:02.0 rcs0: context:f51b unpin <0> [574.307591] i915_sel-6278 0.... 559490798us : __engine_park: 0000:00:02.0 rcs0: parked <0> [574.307591] i915_sel-6278 0.... 559490982us : __intel_context_retire: 0000:00:02.0 rcs0: context:f51c retire runtime: { total:30004ns, avg:30004ns } <0> [574.307591] i915_sel-6278 0.... 559491372us : __engine_park: __engine_park:261 GEM_BUG_ON(engine->execlists.queue_priority_hint != (-((int)(~0U >> 1)) - 1)) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-9-chris@chris-wilson.co.uk	2020-02-28 15:48:10 +00:00
Chris Wilson	280e285dc7	drm/i915/selftests: Be a little more lenient for reset workers Give the reset worker a kick before losing help when waiting for hang recovery, as the CPU scheduler is a little unreliable. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-15-chris@chris-wilson.co.uk	2020-02-28 15:45:42 +00:00
Chris Wilson	b0158b9132	drm/i915/selftests: Wait for the context switch As we require a context switch to ensure that the current context is switched out and saved to memory, perform an explicit switch to the kernel context and wait for it. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1336 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200228082330.2411941-18-chris@chris-wilson.co.uk	2020-02-28 15:18:55 +00:00
Chris Wilson	24eba7a998	drm/i915/selftests: Check recovery from corrupted LRC Check that we can recover if the LRC is totally corrupted. Based on a very simple theory that anything that can be adjusted via the context (i.e. on behalf of the user), should be under the purview of the per-engine-reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-13-chris@chris-wilson.co.uk	2020-02-28 13:04:14 +00:00
Chris Wilson	efb69b9832	drm/i915/selftests: Verify LRC isolation Record the LRC registers before/after a preemption event to ensure that the first context sees nothing from the second client; at least in the normal per-context register state. References: https://gitlab.freedesktop.org/drm/intel/issues/1233 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-12-chris@chris-wilson.co.uk	2020-02-28 13:01:14 +00:00

1 2 3 4 5 ...

957 Commits