linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-24 09:25:45 +07:00

Author	SHA1	Message	Date
Colin Ian King	1805ec67b1	drm/i915/selftests: fix uninitialized variable sum when summing up values Currently the variable sum is not uninitialized and hence will cause an incorrect result in the summation values. Fix this by initializing sum to the first item in the summation. Addresses-Coverity: ("Uninitialized scalar variable") Fixes: `3c7a44bbbf` ("drm/i915/selftests: Perform some basic cycle counting of MI ops") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191210143205.338308-1-colin.king@canonical.com	2019-12-10 14:35:24 +00:00
Chris Wilson	82c69bf586	drm/i915/gt: Detect if we miss WaIdleLiteRestore In order to avoid confusing the HW, we must never submit an empty ring during lite-restore, that is we should always advance the RING_TAIL before submitting to stay ahead of the RING_HEAD. Normally this is prevented by keeping a couple of spare NOPs in the request->wa_tail so that on resubmission we can advance the tail. This relies on the request only being resubmitted once, which is the normal condition as it is seen once for ELSP[1] and then later in ELSP[0]. On preemption, the requests are unwound and the tail reset back to the normal end point (as we know the request is incomplete and therefore its RING_HEAD is even earlier). However, if this w/a should fail we would try and resubmit the request with the RING_TAIL already set to the location of this request's wa_tail potentially causing a GPU hang. We can spot when we do try and incorrectly resubmit without advancing the RING_TAIL and spare any embarrassment by forcing the context restore. In the case of preempt-to-busy, we leave the requests running on the HW while we unwind. As the ring is still live, we cannot rewind our rq->tail without forcing a reload so leave it set to rq->wa_tail and only force a reload if we resubmit after a lite-restore. (Normally, the forced reload will be a part of the preemption event.) Fixes: `22b7a426bb` ("drm/i915/execlists: Preempt-to-busy") Closes: https://gitlab.freedesktop.org/drm/intel/issues/673 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: stable@kernel.vger.org Link: https://patchwork.freedesktop.org/patch/msgid/20191209023215.3519970-1-chris@chris-wilson.co.uk	2019-12-10 10:08:07 +00:00
Daniele Ceraolo Spurio	3c9abe886a	drm/i915/guc: kill the GuC client We now only use 1 client without any plan to add more. The client is also only holding information about the WQ and the process desc, so we can just move those in the intel_guc structure and always use stage_id 0. v2: fix comment (John) v3: fix the comment for real, fix kerneldoc Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205220243.27403-4-daniele.ceraolospurio@intel.com	2019-12-09 13:55:50 -08:00
Daniele Ceraolo Spurio	e9362e1336	drm/i915/guc: kill doorbell code and selftests Instead of relying on the workqueue, the upcoming reworked GuC submission flow will offer the host driver indipendent control over the execution status of each context submitted to GuC. As part of this, the doorbell usage model has been reworked, with each doorbell being paired to a single lrc and a doorbell ring representing new work available for that specific context. This mechanism, however, limits the number of contexts that can be registered with GuC to the number of doorbells, which is an undesired limitation. To avoid this limitation, we requested the GuC team to also provide a H2G that will allow the host to notify the GuC of work available for a specified lrc, so we can use that mechanism instead of relying on the doorbells. We can therefore drop the doorbell code we currently have, also given the fact that in the unlikely case we'd want to switch back to using doorbells we'd have to heavily rework it. The workqueue will still have a use in the new interface to pass special commands, so that code has been retained for now. With the doorbells gone and the GuC client becoming even simpler, the existing GuC selftests don't give us any meaningful coverage so we can remove them as well. Some selftests might come with the new code, but they will look different from what we have now so if doesn't seem worth it to keep the file around in the meantime. v2: fix comments and commit message (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205220243.27403-3-daniele.ceraolospurio@intel.com	2019-12-09 13:55:44 -08:00
Daniele Ceraolo Spurio	18c094b304	drm/i915/guc: add a helper to allocate and map guc vma We already have a couple of use-cases in the code and another one will come in one of the later patches in the series. v2: use the new function for the CT object as well Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v1 Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205220243.27403-2-daniele.ceraolospurio@intel.com	2019-12-09 13:54:33 -08:00
Daniele Ceraolo Spurio	d54dc6eede	drm/i915/guc: Drop leftover preemption code Remove unused enums and ctx_save_restore_disabled() function, leftover from the legacy preemption removal. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191205220243.27403-1-daniele.ceraolospurio@intel.com	2019-12-09 13:54:28 -08:00
Chris Wilson	d634110007	drm/i915/gt: Turn vm off then on again for gen7 mm switch "Have you tried switching it off and on again?" Set the size of the mm to 0 to disable all PD cachelines, before enabling the whole mm again. Let's see if that tricks the TLB into reloading. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191208143648.2986669-1-chris@chris-wilson.co.uk	2019-12-08 15:19:02 +00:00
Matthew Brost	a22198a934	drm/i915/guc: Update uncore access path in flush_ggtt_writes The preferred way to access the uncore is through the GT structure. Update the GuC function, flush_ggtt_writes, to use this path. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <john.c.harrison@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191207010033.24667-1-John.C.Harrison@Intel.com	2019-12-07 16:56:13 +00:00
Andi Shyti	795a4aea63	drm/i915/gt: Replace I915_WRITE with its uncore counterpart Get rid of the last remaining I915_WRITEs and replace them with intel_uncore_write(). Signed-off-by: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191206212417.20178-1-andi@etezian.org	2019-12-06 23:22:06 +00:00
Chris Wilson	045d1fb796	drm/i915/gt: Acquire a GT wakeref for the breadcrumb interrupt Take a wakeref on the intel_gt specifically for the enabled breadcrumb interrupt so that we can safely process the mmio. If the intel_gt is already asleep by the time we try and setup the breadcrumb interrupt, by a process of elimination we know the request must have been completed and we can skip its enablement! <4> [1518.350005] Unclaimed write to register 0x220a8 <4> [1518.350323] WARNING: CPU: 2 PID: 3685 at drivers/gpu/drm/i915/intel_uncore.c:1163 __unclaimed_reg_debug+0x40/0x50 [i915] <4> [1518.350393] Modules linked in: vgem snd_hda_codec_hdmi x86_pkg_temp_thermal i915 coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core btusb cdc_ether btrtl usbnet btbcm btintel r8152 snd_pcm mii bluetooth ecdh_generic ecc i2c_hid pinctrl_sunrisepoint pinctrl_intel intel_lpss_pci prime_numbers [last unloaded: vgem] <4> [1518.350646] CPU: 2 PID: 3685 Comm: gem_exec_parse_ Tainted: G U 5.4.0-rc8-CI-CI_DRM_7490+ #1 <4> [1518.350708] Hardware name: Google Caroline/Caroline, BIOS MrChromebox 08/27/2018 <4> [1518.350946] RIP: 0010:__unclaimed_reg_debug+0x40/0x50 [i915] <4> [1518.350992] Code: 74 05 5b 5d 41 5c c3 45 84 e4 48 c7 c0 95 8d 47 a0 48 c7 c6 8b 8d 47 a0 48 0f 44 f0 89 ea 48 c7 c7 9e 8d 47 a0 e8 40 45 e3 e0 <0f> 0b 83 2d 27 4f 2a 00 01 5b 5d 41 5c c3 66 90 41 55 41 54 55 53 <4> [1518.351100] RSP: 0018:ffffc900007f39c8 EFLAGS: 00010086 <4> [1518.351140] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006 <4> [1518.351202] RDX: 0000000080000006 RSI: 0000000000000000 RDI: 00000000ffffffff <4> [1518.351249] RBP: 00000000000220a8 R08: 0000000000000000 R09: 0000000000000000 <4> [1518.351296] R10: ffffc900007f3990 R11: ffffc900007f3868 R12: 0000000000000000 <4> [1518.351342] R13: 00000000fefeffff R14: 0000000000000092 R15: ffff888155fea000 <4> [1518.351391] FS: 00007fc255abfe40(0000) GS:ffff88817ab00000(0000) knlGS:0000000000000000 <4> [1518.351445] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [1518.351485] CR2: 00007fc2554882d0 CR3: 0000000168ca2005 CR4: 00000000003606e0 <4> [1518.351529] Call Trace: <4> [1518.351746] fwtable_write32+0x114/0x1d0 [i915] <4> [1518.351795] ? sync_file_alloc+0x80/0x80 <4> [1518.352039] gen8_logical_ring_enable_irq+0x30/0x50 [i915] <4> [1518.352295] irq_enable.part.10+0x23/0x40 [i915] <4> [1518.352523] i915_request_enable_breadcrumb+0xb5/0x330 [i915] <4> [1518.352575] ? sync_file_alloc+0x80/0x80 <4> [1518.352612] __dma_fence_enable_signaling+0x60/0x160 <4> [1518.352653] ? sync_file_alloc+0x80/0x80 <4> [1518.352685] dma_fence_add_callback+0x44/0xd0 <4> [1518.352726] sync_file_poll+0x95/0xc0 <4> [1518.352767] do_sys_poll+0x24d/0x570 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205215842.862750-1-chris@chris-wilson.co.uk	2019-12-06 11:27:53 +00:00
Andi Shyti	92c964ca3e	drm/i915/gt: Replace I915_READ with intel_uncore_read Get rid of the last remaining I915_READ in gt/ and make gt-land the first I915_READ-free happy island. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191205164422.727968-1-chris@chris-wilson.co.uk	2019-12-05 18:37:50 +00:00
Chris Wilson	6f7ac82853	drm/i915/gt: Save irqstate around virtual_context_destroy As virtual_context_destroy() may be called from a request signal, it may be called from inside an irq-off section, and so we need to do a full save/restore of the irq state rather than blindly re-enable irqs upon unlocking. <4> [110.024262] WARNING: inconsistent lock state <4> [110.024277] 5.4.0-rc8-CI-CI_DRM_7489+ #1 Tainted: G U <4> [110.024292] -------------------------------- <4> [110.024305] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage. <4> [110.024323] kworker/0:0/5 [HC0[0]:SC0[0]:HE1:SE1] takes: <4> [110.024338] ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915] <4> [110.024592] {IN-HARDIRQ-W} state was registered at: <4> [110.024612] lock_acquire+0xa7/0x1c0 <4> [110.024627] _raw_spin_lock_irqsave+0x33/0x50 <4> [110.024788] intel_engine_breadcrumbs_irq+0x38c/0x600 [i915] <4> [110.024808] irq_work_run_list+0x49/0x70 <4> [110.024824] irq_work_run+0x26/0x50 <4> [110.024839] smp_irq_work_interrupt+0x44/0x1e0 <4> [110.024855] irq_work_interrupt+0xf/0x20 <4> [110.024871] __do_softirq+0xb7/0x47f <4> [110.024885] irq_exit+0xba/0xc0 <4> [110.024898] do_IRQ+0x83/0x160 <4> [110.024910] ret_from_intr+0x0/0x1d <4> [110.024922] irq event stamp: 172864 <4> [110.024938] hardirqs last enabled at (172863): [<ffffffff819ea214>] _raw_spin_unlock_irq+0x24/0x50 <4> [110.024963] hardirqs last disabled at (172864): [<ffffffff819e9fba>] _raw_spin_lock_irq+0xa/0x40 <4> [110.024988] softirqs last enabled at (172812): [<ffffffff81c00385>] __do_softirq+0x385/0x47f <4> [110.025012] softirqs last disabled at (172797): [<ffffffff810b829a>] irq_exit+0xba/0xc0 <4> [110.025031] other info that might help us debug this: <4> [110.025049] Possible unsafe locking scenario: <4> [110.025065] CPU0 <4> [110.025075] ---- <4> [110.025084] lock(&(&rq->lock)->rlock); <4> [110.025099] <Interrupt> <4> [110.025109] lock(&(&rq->lock)->rlock); <4> [110.025124] * DEADLOCK * <4> [110.025144] 4 locks held by kworker/0:0/5: <4> [110.025156] #0: ffff88827588f528 ((wq_completion)events){+.+.}, at: process_one_work+0x1de/0x620 <4> [110.025187] #1: ffffc9000006fe78 ((work_completion)(&engine->retire_work)){+.+.}, at: process_one_work+0x1de/0x620 <4> [110.025219] #2: ffff88825605e270 (&kernel#2){+.+.}, at: engine_retire+0x57/0xe0 [i915] <4> [110.025405] #3: ffff88826a0c7a18 (&(&rq->lock)->rlock){?.-.}, at: i915_request_retire+0x221/0x930 [i915] <4> [110.025634] stack backtrace: <4> [110.025653] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G U 5.4.0-rc8-CI-CI_DRM_7489+ #1 <4> [110.025675] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017 <4> [110.025856] Workqueue: events engine_retire [i915] <4> [110.025872] Call Trace: <4> [110.025891] dump_stack+0x71/0x9b <4> [110.025907] mark_lock+0x49a/0x500 <4> [110.025926] ? print_shortest_lock_dependencies+0x200/0x200 <4> [110.025946] mark_held_locks+0x49/0x70 <4> [110.025962] ? _raw_spin_unlock_irq+0x24/0x50 <4> [110.025978] lockdep_hardirqs_on+0xa2/0x1c0 <4> [110.025995] _raw_spin_unlock_irq+0x24/0x50 <4> [110.026171] virtual_context_destroy+0xc5/0x2e0 [i915] <4> [110.026376] __active_retire+0xb4/0x290 [i915] <4> [110.026396] dma_fence_signal_locked+0x9e/0x1b0 <4> [110.026613] i915_request_retire+0x451/0x930 [i915] <4> [110.026766] retire_requests+0x4d/0x60 [i915] <4> [110.026919] engine_retire+0x63/0xe0 [i915] Fixes: `b1e3177bd1` ("drm/i915: Coordinate i915_active with its own mutex") Fixes: `6d06779e86` ("drm/i915: Load balancing across a virtual engine") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205145934.663183-1-chris@chris-wilson.co.uk	2019-12-05 17:27:55 +00:00
Chris Wilson	0471a44871	drm/i915/gt: Bump the PP_DIR invalidation for Baytrail Invalidate the ring TLB and increase the delay required for Baytrail. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205113726.413351-3-chris@chris-wilson.co.uk	2019-12-05 13:51:06 +00:00
Chris Wilson	ccd2094559	drm/i915: Try hard to bind the context It is not acceptable for context pinning to fail with -ENOSPC as we should always be able to make space in the GGTT. The only reason we may fail is that other "temporary" context pins are reserving their space and we need to wait for an available slot. Closes: https://gitlab.freedesktop.org/drm/intel/issues/676 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191205113726.413351-2-chris@chris-wilson.co.uk	2019-12-05 13:50:54 +00:00
Abdiel Janulgue	cc662126b4	drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET This is really just an alias of mmap_gtt. The 'mmap offset' nomenclature comes from the value returned by this ioctl which is the offset into the device fd which userpace uses with mmap(2). mmap_gtt was our initial mmap_offset implementation, this extends our CPU mmap support to allow additional fault handlers that depends on the object's backing pages. Note that we multiplex mmap_gtt and mmap_offset through the same ioctl, and use the zero extending behaviour of drm to differentiate between them, when we inspect the flags. To support multiple mmap types on an object we need to support multiple mmap_offsets for an object (each offset in the global device address space corresponding to a unique instance of the object for a file + mmap type). As we drop the simplified drm core idea of a single mmap_offset, we need to provide replacement hooks for the dumb mmap interface as well. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1675 Testcase: igt/gem_mmap_offset Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191204120032.3682839-1-chris@chris-wilson.co.uk	2019-12-04 15:11:44 +00:00
Chris Wilson	13bb5b99ec	drm/i915/gt: Set the PD again for Haswell And Haswell still occasionally forgets it is meant to be using a new page directory, so repeat ourselves a little louder. <7> [509.919864] heartbeat rcs0 heartbeat {prio:-2147483645} not ticking <7> [509.919895] heartbeat Awake? 8 <7> [509.919903] heartbeat Barriers?: no <7> [509.919912] heartbeat Heartbeat: 3008 ms ago <7> [509.919930] heartbeat Reset count: 0 (global 0) <7> [509.919937] heartbeat Requests: <7> [509.921008] heartbeat active a7eb:56e1* @ 5847ms: <7> [509.921157] heartbeat ring->start: 0x00001000 <7> [509.921164] heartbeat ring->head: 0x00001610 <7> [509.921170] heartbeat ring->tail: 0x000023d8 <7> [509.921176] heartbeat ring->emit: 0x000023d8 <7> [509.921182] heartbeat ring->space: 0x00002570 <7> [509.921189] heartbeat ring->hwsp: 0x7fffe100 <7> [509.921197] heartbeat [head 1628, postfix 1738, tail 1750, batch 0xffffffff_ffffffff]: <7> [509.921289] heartbeat [0000] 7a000002 00100002 00000000 00000000 7a000002 01154c1e 7ffff080 00000000 <7> [509.921299] heartbeat [0020] 11000001 00002220 ffffffff 12400001 00002220 7ffff000 00000000 11000001 <7> [509.921308] heartbeat [0040] 00002228 6e900000 7a000002 00100002 00000000 00000000 7a000002 01154c1e <7> [509.921317] heartbeat [0060] 7ffff080 00000000 12400001 00002228 7ffff000 00000000 7a000002 00100002 <7> [509.921326] heartbeat [0080] 00000000 00000000 7a000002 01154c1e 7ffff080 00000000 7a000002 001010a1 <7> [509.921335] heartbeat [00a0] 7ffff080 00000000 04000000 11000005 00022050 00010001 00012050 00010001 <7> [509.921345] heartbeat [00c0] 0001a050 00010001 00000000 0c000000 459a110c 00000000 11000005 00022050 <7> [509.921354] heartbeat [00e0] 00010000 00012050 00010000 0001a050 00010000 12400001 0001a050 7ffff000 <7> [509.921363] heartbeat [0100] 00000000 04000001 18802100 00000000 7a000002 011050a1 7fffe100 000056e1 <7> [509.921370] heartbeat [0120] 01000000 00000000 <7> [509.921538] heartbeat MMIO base: 0x00002000 <7> [509.921682] heartbeat CCID: 0x3fa0110d <7> [509.922342] heartbeat RING_START: 0x00001000 <7> [509.922353] heartbeat RING_HEAD: 0x00001628 <7> [509.922366] heartbeat RING_TAIL: 0x000023d8 <7> [509.922381] heartbeat RING_CTL: 0x00003001 <7> [509.922396] heartbeat RING_MODE: 0x00004000 <7> [509.922408] heartbeat RING_IMR: ffffffde <7> [509.922421] heartbeat ACTHD: 0x00000000_30e01628 <7> [509.922434] heartbeat BBADDR: 0x00000000_00004004 <7> [509.922446] heartbeat DMA_FADDR: 0x00000000_00002800 <7> [509.922458] heartbeat IPEIR: 0x00000000 <7> [509.922470] heartbeat IPEHR: 0x780c0000 <7> [509.922642] heartbeat PP_DIR_BASE: 0x6e700000 <7> [509.922652] heartbeat PP_DIR_BASE_READ: 0x00000000 <7> [509.922662] heartbeat PP_DIR_DCLV: 0xffffffff <7> [509.922678] heartbeat E a7eb:56e1* @ 5849ms: <7> [509.922689] heartbeat E a7eb:56e2- @ 5849ms: <7> [509.922698] heartbeat E a7eb:56e3 @ 5848ms: <7> [509.922707] heartbeat E a7eb:56e4 @ 5848ms: <7> [509.922715] heartbeat E a7eb:56e5 @ 5847ms: <7> [509.922724] heartbeat E a7eb:56e6 @ 5846ms: <7> [509.922735] heartbeat E a7eb:56e7 @ 5846ms: <7> [509.922744] heartbeat ...skipping 4 executing requests... <7> [509.922754] heartbeat E a7eb:56ec @ 3010ms: <7> [509.922796] heartbeat HWSP: <7> [509.922807] heartbeat [0000] 00000001 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [509.922817] heartbeat [0020] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [509.922826] heartbeat * <7> [509.922836] heartbeat [0100] 000056e0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [509.922845] heartbeat [0120] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <7> [509.922851] heartbeat * <7> [509.922870] heartbeat Idle? no <7> [509.922878] heartbeat Signals: <7> [509.923000] heartbeat [a7eb:56e2] @ 5850ms Here, we have a failed context restore after the PD switch, but note that the PP_DIR_BASE register does not match the LRI in the ring. Bump it to 8^W 4 loops, and with that Baytrail starts passing the sanity checks. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203211631.3167430-1-chris@chris-wilson.co.uk	2019-12-03 23:50:19 +00:00
Chris Wilson	f70de8d2ca	drm/i915/gt: Track the context validity explicitly Rather than assume if and only if the engine->default_state is not set that the context is invalid, instead track when we know the context has valid state -- either because we have copied the default_state or we have completed a context switch to save the HW state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203124155.3019926-1-chris@chris-wilson.co.uk	2019-12-03 16:49:31 +00:00
Chris Wilson	49e74c8f9a	drm/i915/execlists: Skip nested spinlock for validating pending Only along the submission path can we guarantee that the locked request is indeed from a foreign engine, and so the nesting of engine/rq is permissible. On the submission tasklet (process_csb()), we may find ourselves competing with the normal nesting of rq/engine, invalidating our nesting. As we only use the spinlock for debug purposes, skip the debug if we cannot acquire the spinlock for safe validation - catching 99% of the bugs is better than causing a hard lockup. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Fixes: `c95d31c3df` ("drm/i915/execlists: Lock the request while validating it during promotion") Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203152631.3107653-2-chris@chris-wilson.co.uk	2019-12-03 15:42:28 +00:00
Chris Wilson	80aac91b27	drm/i915/execlists: Add a couple more validity checks to assert_pending() Check the pending request submission is valid: that it at least has a reference for the submission and that the request is on the active list. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191203152631.3107653-1-chris@chris-wilson.co.uk	2019-12-03 15:42:28 +00:00
Chris Wilson	42d1051130	drm/i915: Lift i915_vma_pin() out of intel_renderstate_emit() Once inside a request, inside the timeline->mutex, pinning is verboten. <4> [896.032829] ====================================================== <4> [896.032831] WARNING: possible circular locking dependency detected <4> [896.032835] 5.4.0-rc8-CI-Patchwork_15533+ #1 Tainted: G U <4> [896.032838] ------------------------------------------------------ <4> [896.032841] gem_exec_parall/3720 is trying to acquire lock: <4> [896.032844] ffff888401863270 (&kernel#2){+.+.}, at: i915_request_create+0x16/0x1c0 [i915] <4> [896.032915] but task is already holding lock: <4> [896.032917] ffff8883ec1c93c0 (&vm->mutex){+.+.}, at: i915_vma_pin+0xf3/0x11c0 [i915] <4> [896.032952] which lock already depends on the new lock. <4> [896.032954] the existing dependency chain (in reverse order) is: <4> [896.032956] -> #1 (&vm->mutex){+.+.}: <4> [896.032961] __mutex_lock+0x9a/0x9d0 <4> [896.032995] i915_vma_pin+0xf3/0x11c0 [i915] <4> [896.033033] intel_renderstate_emit+0xb9/0x9e0 [i915] <4> [896.033081] i915_gem_init+0x5a9/0xa50 [i915] <4> [896.033112] i915_driver_probe+0xb00/0x15f0 [i915] <4> [896.033144] i915_pci_probe+0x43/0x1c0 [i915] <4> [896.033149] pci_device_probe+0x9e/0x120 <4> [896.033154] really_probe+0xea/0x420 <4> [896.033158] driver_probe_device+0x10b/0x120 <4> [896.033161] device_driver_attach+0x4a/0x50 <4> [896.033164] __driver_attach+0x97/0x130 <4> [896.033168] bus_for_each_dev+0x74/0xc0 <4> [896.033171] bus_add_driver+0x142/0x220 <4> [896.033174] driver_register+0x56/0xf0 <4> [896.033178] do_one_initcall+0x58/0x2ff <4> [896.033183] do_init_module+0x56/0x1f8 <4> [896.033187] load_module+0x243e/0x29f0 <4> [896.033190] __do_sys_finit_module+0xe9/0x110 <4> [896.033194] do_syscall_64+0x4f/0x210 <4> [896.033197] entry_SYSCALL_64_after_hwframe+0x49/0xbe <4> [896.033200] -> #0 (&kernel#2){+.+.}: <4> [896.033206] __lock_acquire+0x1328/0x15d0 <4> [896.033209] lock_acquire+0xa7/0x1c0 <4> [896.033213] __mutex_lock+0x9a/0x9d0 <4> [896.033255] i915_request_create+0x16/0x1c0 [i915] <4> [896.033287] intel_engine_flush_barriers+0x4c/0x100 [i915] <4> [896.033327] ggtt_flush+0x37/0x60 [i915] <4> [896.033366] i915_gem_evict_something+0x46b/0x5a0 [i915] <4> [896.033407] i915_gem_gtt_insert+0x21d/0x6a0 [i915] <4> [896.033449] i915_vma_pin+0xb36/0x11c0 [i915] <4> [896.033488] gen6_ppgtt_pin+0xd5/0x170 [i915] <4> [896.033523] ring_context_pin+0x2e/0xc0 [i915] <4> [896.033554] __intel_context_do_pin+0x6b/0x190 [i915] <4> [896.033591] i915_gem_do_execbuffer+0x1814/0x26c0 [i915] <4> [896.033627] i915_gem_execbuffer2_ioctl+0x11b/0x460 [i915] <4> [896.033632] drm_ioctl_kernel+0xa7/0xf0 <4> [896.033635] drm_ioctl+0x2e1/0x390 <4> [896.033638] do_vfs_ioctl+0xa0/0x6f0 <4> [896.033641] ksys_ioctl+0x35/0x60 <4> [896.033644] __x64_sys_ioctl+0x11/0x20 <4> [896.033647] do_syscall_64+0x4f/0x210 <4> [896.033650] entry_SYSCALL_64_after_hwframe+0x49/0xbe Lift the object allocation and pin prior to the request construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191202204316.2665847-1-chris@chris-wilson.co.uk	2019-12-03 13:23:00 +00:00
Chris Wilson	65f6d12c6b	drm/i915/gt: Simplify rc6 w/a application Quite simply we only need to check for prior corruption on enabling rc6 on module load and resume, so by hooking into the common entry points. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191202110836.2342685-2-chris@chris-wilson.co.uk	2019-12-02 21:57:22 +00:00
Chris Wilson	61e258ee33	drm/i915/gt: Use soft-rc6 for w/a protection Now that we have soft-rc6 in place, we can use that instead of the forcewake to disable rc6 while active; preferred by a few microbenchmarks. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191202110836.2342685-1-chris@chris-wilson.co.uk	2019-12-02 21:57:22 +00:00
Chris Wilson	f997056d5b	drm/i915/gt: Push the flush_pd before the set-context Move our "wait for the PD load to complete" paranoia before the MI_SET_CONTEXT just in case the context restore tries to access local addresses. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191130120503.1609483-1-chris@chris-wilson.co.uk	2019-11-30 15:25:25 +00:00
Chris Wilson	3cd6e8860e	drm/i915/gen7: Re-enable full-ppgtt for ivb & hsw After much hair pulling, resort to preallocating the ppGTT entries on init to circumvent the apparent lack of PD invalidate following the write to PP_DCLV upon switching mm between contexts (and here the same context after binding new objects). However, the details of that PP_DCLV invalidate are still unknown, and it appears we need to reload the mm twice to cover over a timing issue. Worrying. Fixes: `3dc007fe9b` ("drm/i915/gtt: Downgrade gen7 (ivb, byt, hsw) back to aliasing-ppgtt") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191129201328.1398583-1-chris@chris-wilson.co.uk	2019-11-30 09:21:12 +00:00
Chris Wilson	97c1635397	drm/i915/execlists: Ensure the tasklet is decoupled upon shutdown As we only cancel the timers asynchronously, they may still be running on another CPU as we shutdown, raising one last softirq. So be safe and make sure the tasklet is flushed before destroying the engine's memory. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191129172542.1222810-1-chris@chris-wilson.co.uk	2019-11-29 20:09:14 +00:00
Chris Wilson	0cb7da1062	drm/i915/selftests: Wait only on the expected barrier Wait on only the last request on the kernel_context after emitting a barrier so that we do not wait for everything in general and by doing so cause an accidental emission of the barrier! Bugzilla; https://bugs.freedesktop.org/show_bug.cgi?id=112405 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191129103455.744389-1-chris@chris-wilson.co.uk	2019-11-29 14:23:53 +00:00
Michel Thierry	ff690b2111	drm/i915/tgl: Implement Wa_1604555607 Implement Wa_1604555607 (set the DS pairing timer to 128 cycles). FF_MODE2 is part of the register state context, that's why it is implemented here. At TGL A0 stepping, FF_MODE2 register read back is broken, hence disabling the WA verification. v2: Rebased on top of the WA refactoring (Oscar) v3: Correctly add to ctx_workarounds_init (Michel) v4: uncore read is used [Tvrtko] Macros as used for MASK definition [Chris] v5: Skip the Wa_1604555607 verification [Ram] i915 ptr retrieved from engine. [Tvrtko] v6: Added wa_add as a wrapper for __wa_add [Chris] wa_add is directly called instead of new wrapper [tvrtko] BSpec: 19363 HSDES: 1604555607 Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Ramalingam C <ramlingam.c@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> [v5] Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191128021005.3350-1-ramalingam.c@intel.com	2019-11-29 11:48:20 +00:00
Chris Wilson	7983990ca9	drm/i915/selftests: Try to show where the pulse went We have a case of a mysteriously absent pulse, so dump the engine details to see if we can find out what happened to it. References: https://bugs.freedesktop.org/show_bug.cgi?id=112405 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191128102546.3857140-1-chris@chris-wilson.co.uk	2019-11-28 11:39:50 +00:00
Chris Wilson	e3f3a0f269	drm/i915/gt: Defer breadcrumb processing to after the irq handler The design of our interrupt handlers is that we ack the receipt of the interrupt first, inside the critical section where the master interrupt control is off and other cpus cannot start processing the next interrupt; and then process the interrupt events afterwards. However, Icelake introduced a whole new set of banked GT_IIR that are inherently serialised and slow to retrieve the IIR and must be processed within the critical section. We can still push our breadcrumbs out of this critical section by using our irq_worker. On bdw+, this should not make too much of a difference as we only slightly defer the breadcrumbs, but on icl+ this should make a big difference to our throughput of interrupts from concurrently executing engines. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191127115813.3345823-1-chris@chris-wilson.co.uk	2019-11-27 17:02:14 +00:00
Chris Wilson	df9f85d858	drm/i915: Serialise i915_active_fence_set() with itself The expected downside to commit `58b4c1a07a` ("drm/i915: Reduce nested prepare_remote_context() to a trylock") was that it would need to return -EAGAIN to userspace in order to resolve potential mutex inversion. Such an unsightly round trip is unnecessary if we could atomically insert a barrier into the i915_active_fence, so make it happen. Currently, we use the timeline->mutex (or some other named outer lock) to order insertion into the i915_active_fence (and so individual nodes of i915_active). Inside __i915_active_fence_set, we only need then serialise with the interrupt handler in order to claim the timeline for ourselves. However, if we remove the outer lock, we need to ensure the order is intact between not only multiple threads trying to insert themselves into the timeline, but also with the interrupt handler completing the previous occupant. We use xchg() on insert so that we have an ordered sequence of insertions (and each caller knows the previous fence on which to wait, preserving the chain of all fences in the timeline), but we then have to cmpxchg() in the interrupt handler to avoid overwriting the new occupant. The only nasty side-effect is having to temporarily strip off the RCU-annotations to apply the atomic operations, otherwise the rules are much more conventional! Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112402 Fixes: `58b4c1a07a` ("drm/i915: Reduce nested prepare_remote_context() to a trylock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191127134527.3438410-1-chris@chris-wilson.co.uk	2019-11-27 17:02:14 +00:00
Chris Wilson	730eaeb524	drm/i915/gt: Manual rc6 entry upon parking Now that we rapidly park the GT when the GPU idles, we often find ourselves idling faster than the RC6 promotion timer. Thus if we tell the GPU to enter RC6 manually as we park, we can do so quicker (by around 50ms, half an EI on average) and marginally increase our powersaving across all execlists platforms. v2: Now with a selftest to check we can enter RC6 manually Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> Acked-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191127095657.3209854-1-chris@chris-wilson.co.uk	2019-11-27 12:53:27 +00:00
Chris Wilson	58b4c1a07a	drm/i915: Reduce nested prepare_remote_context() to a trylock On context retiring, we may invoke the kernel_context to unpin this context. Elsewhere, we may use the kernel_context to modify this context. This currently leads to an AB-BA lock inversion, so we need to back-off from the contended lock, and repeat. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111732 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Fixes: `a9877da2d6` ("drm/i915/oa: Reconfigure contexts on the fly") Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191126065521.2331017-1-chris@chris-wilson.co.uk	2019-11-26 12:45:45 +00:00
Chris Wilson	4f88f8747f	drm/i915/gt: Schedule request retirement when timeline idles The major drawback of commit `7e34f4e4aa` ("drm/i915/gen8+: Add RC6 CTX corruption WA") is that it disables RC6 while Skylake (and friends) is active, and we do not consider the GPU idle until all outstanding requests have been retired and the engine switched over to the kernel context. If userspace is idle, this task falls onto our background idle worker, which only runs roughly once a second, meaning that userspace has to have been idle for a couple of seconds before we enable RC6 again. Naturally, this causes us to consume considerably more energy than before as powersaving is effectively disabled while a display server (here's looking at you Xorg) is running. As execlists will get a completion event as each context is completed, we can use this interrupt to queue a retire worker bound to this engine to cleanup idle timelines. We will then immediately notice the idle engine (without userspace intervention or the aid of the background retire worker) and start parking the GPU. Thus during light workloads, we will do much more work to idle the GPU faster... Hopefully with commensurate power saving! v2: Watch context completions and only look at those local to the engine when retiring to reduce the amount of excess work we perform. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112315 References: `7e34f4e4aa` ("drm/i915/gen8+: Add RC6 CTX corruption WA") References: `2248a28384` ("drm/i915/gen8+: Add RC6 CTX corruption WA") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-3-chris@chris-wilson.co.uk	2019-11-25 13:17:18 +00:00
Chris Wilson	88a4655e75	drm/i915/gt: Adapt engine_park synchronisation rules for engine_retire In the next patch, we will introduce a new asynchronous retirement worker, fed by execlists CS events. Here we may queue a retirement as soon as a request is submitted to HW (and completes instantly), and we also want to process that retirement as early as possible and cannot afford to postpone (as there may not be another opportunity to retire it for a few seconds). To allow the new async retirer to run in parallel with our submission, pull the __i915_request_queue (that passes the request to HW) inside the timelines spinlock so that the retirement cannot release the timeline before we have completed the submission. v2: Actually to play nicely with engine_retire, we have to raise the timeline.active_lock before releasing the HW. intel_gt_retire_requsts() is still serialised by the outer lock so they cannot see this intermediate state, and engine_retire is serialised by HW submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-2-chris@chris-wilson.co.uk	2019-11-25 13:17:18 +00:00
Chris Wilson	de5825beae	drm/i915: Serialise with engine-pm around requests on the kernel_context As the engine->kernel_context is used within the engine-pm barrier, we have to be careful when emitting requests outside of the barrier, as the strict timeline locking rules do not apply. Instead, we must ensure the engine_park() cannot be entered as we build the request, which is simplest by taking an explicit engine-pm wakeref around the request construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-1-chris@chris-wilson.co.uk	2019-11-25 13:17:18 +00:00
Chris Wilson	da0ef77e1e	drm/i915/execlists: Fixup cancel_port_requests() I rushed a last minute correction to cancel_port_requests() to prevent the snooping of *execlists->active as the inflight array was being updated, without noticing we iterated the inflight array starting from active! Oops. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112387 Fixes: `331bf90591` ("drm/i915/gt: Mark the execlists->active as the primary volatile access") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125112520.1760492-1-chris@chris-wilson.co.uk	2019-11-25 13:17:18 +00:00
Chris Wilson	331bf90591	drm/i915/gt: Mark the execlists->active as the primary volatile access Since we want to do a lockless read of the current active request, and that request is written to by process_csb also without serialisation, we need to instruct gcc to take care in reading the pointer itself. Otherwise, we have observed execlists_active() to report 0x40. [ 2400.760381] igt/para-4098 1..s. 2376479300us : process_csb: rcs0 cs-irq head=3, tail=4 [ 2400.760826] igt/para-4098 1..s. 2376479303us : process_csb: rcs0 csb[4]: status=0x00000001:0x00000000 [ 2400.761271] igt/para-4098 1..s. 2376479306us : trace_ports: rcs0: promote { b9c59:2622, b9c55:2624 } [ 2400.761726] igt/para-4097 0d... 2376479311us : __i915_schedule: rcs0: -2147483648->3, inflight:0000000000000040, rq:ffff888208c1e940 which is impossible! The answer is that as we keep the existing execlists->active pointing into the array as we copy over that array, the unserialised read may see a partial pointer value. Fixes: `df40306902` ("drm/i915/execlists: Lift process_csb() out of the irq-off spinlock") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125094318.1630806-1-chris@chris-wilson.co.uk	2019-11-25 09:45:37 +00:00
Chris Wilson	e8e61f105a	drm/i915/selftests: Flush the active callbacks Before checking the current i915_active state for the asynchronous work we submitted, flush any ongoing callback. This ensures that our sampling is robust and does not sporadically fail due to bad timing as the work is running on another cpu. v2: Drop the fence callback sync, retiring under the lock should be good enough to synchronize with engine_retire() and the intel_gt_retire_requests() background worker. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191122132404.690440-1-chris@chris-wilson.co.uk	2019-11-22 17:54:12 +00:00
Chris Wilson	cfd821b243	drm/i915/selftests: Force bonded submission to overlap Bonded request submission is designed to allow requests to execute in parallel as laid out by the user. If the master request is already finished before its bonded pair is submitted, the pair were not destined to run in parallel and we lose the information about the master engine to dictate selection of the secondary. If the second request was required to be run on a particular engine in a virtual set, that should have been specified, rather than left to the whims of a random unconnected requests! In the selftest, I made the mistake of not ensuring the master would overlap with its bonded pairs, meaning that it could indeed complete before we submitted the bonds. Those bonds were then free to select any available engine in their virtual set, and not the one expected by the test. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191122112152.660743-1-chris@chris-wilson.co.uk	2019-11-22 13:06:36 +00:00
Chris Wilson	1ff2f9e26c	drm/i915/selftests: Always hold a reference on a waited upon request Whenever we wait on a request, make sure we actually hold a reference to it and that it cannot be retired/freed on another CPU! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-4-chris@chris-wilson.co.uk	2019-11-21 16:14:57 +00:00
Chris Wilson	93b0e8fe47	drm/i915: Mark intel_wakeref_get() as a sleeper Assume that intel_wakeref_get() may take the mutex, and perform other sleeping actions in the course of its callbacks and so use might_sleep() to ensure that all callers abide. Anything that cannot sleep has to use e.g. intel_wakeref_get_if_active() to guarantee its avoidance of the non-atomic paths. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121130528.309474-1-chris@chris-wilson.co.uk	2019-11-21 13:22:04 +00:00
Chris Wilson	c95d31c3df	drm/i915/execlists: Lock the request while validating it during promotion Since the request is already on the HW as we perform its validation, it and even its subsequent barrier may be concurrently retired before we process the assertions. If it is retired already and so off the HW, our assertions become void and we need to ignore them. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=112363 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121103546.146487-1-chris@chris-wilson.co.uk	2019-11-21 12:14:45 +00:00
Chris Wilson	090a82e916	drm/i915/gt: Hold request reference while waiting for w/a verification As we wait upon a request, we must be holding a reference to it, and be wary that i915_request_add() consumes the passed in reference. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121093326.134774-1-chris@chris-wilson.co.uk	2019-11-21 11:54:46 +00:00
Chris Wilson	689122dcc3	Revert "drm/i915/gt: Wait for new requests in intel_gt_retire_requests()" From inside an active timeline in the execbuf ioctl, we may try to reclaim some space in the GGTT. We need GGTT space for all objects on !full-ppgtt platforms, and for context images everywhere. However, to free up space in the GGTT we may need to remove some pinned objects (e.g. context images) that require flushing the idle barriers to remove. For this we use the big hammer of intel_gt_wait_for_idle() However, commit `7936a22dd4` ("drm/i915/gt: Wait for new requests in intel_gt_retire_requests()") will continue spinning on the wait if a timeline is active but lacks requests, as is the case during execbuf reservation. Spinning forever is quite time consuming, so revert that commit and start again. In practice, the effect commit `7936a22dd4` was trying to achieve is accomplished by commit `1683d24c14` ("drm/i915/gt: Move new timelines to the end of active_list"), so there is no immediate rush to replace the looping. Testcase: igt/gem_exec_reloc/basic-range Fixes: `7936a22dd4` ("drm/i915/gt: Wait for new requests in intel_gt_retire_requests()") References: `1683d24c14` ("drm/i915/gt: Move new timelines to the end of active_list") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191121071044.97798-1-chris@chris-wilson.co.uk	2019-11-21 07:59:30 +00:00
Stuart Summers	e18417b48b	drm/i915: Use intel_gt_pm_put_async in GuC submission path GuC submission path can be called from an interrupt context and so should use a worker to avoid holding a mutex. References: `07779a76ee` ("drm/i915: Mark up the calling context for intel_wakeref_put()") Signed-off-by: Stuart Summers <stuart.summers@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191120211321.88021-1-stuart.summers@intel.com	2019-11-20 21:23:10 +00:00
Chris Wilson	e435c608e8	drm/i915/gt: Fixup config ifdeffery for pm_suspend_target_state pm_suspend_target_state is declared under CONFIG_PM_SLEEP but only defined under CONFIG_SUSPEND. Play safe and only use the symbol if it is both declared and defined. Reported-by: kbuild-all@lists.01.org Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Fixes: `a70a9e998e` ("drm/i915: Defer rc6 shutdown to suspend_late") Cc: Andi Shyti <andi.shyti@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120182209.3967833-1-chris@chris-wilson.co.uk	2019-11-20 20:34:44 +00:00
Chris Wilson	88cec4973d	drm/i915/gt: Declare timeline.lock to be irq-free Now that we never allow the intel_wakeref callbacks to be invoked from interrupt context, we do not need the irqsafe spinlock for the timeline. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120170858.3965380-1-chris@chris-wilson.co.uk	2019-11-20 17:12:11 +00:00
Chris Wilson	5cba288466	drm/i915/gt: Unlock engine-pm after queuing the kernel context switch In commit `a79ca656b6` ("drm/i915: Push the wakeref->count deferral to the backend"), I erroneously concluded that we last modify the engine inside __i915_request_commit() meaning that we could enable concurrent submission for userspace as we enqueued this request. However, this falls into a trap with other users of the engine->kernel_context waking up and submitting their request before the idle-switch is queued, with the result that the kernel_context is executed out-of-sequence most likely upsetting the GPU and certainly ourselves when we try to retire the out-of-sequence requests. As such we need to hold onto the effective engine->kernel_context mutex lock (via the engine pm mutex proxy) until we have finish queuing the request to the engine. v2: Serialise against concurrent intel_gt_retire_requests() v3: Describe the hairy locking scheme with intel_gt_retire_requests() for future reference. v4: Combine timeline->lock and engine pm release; it's hairy. Fixes: `a79ca656b6` ("drm/i915: Push the wakeref->count deferral to the backend") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-2-chris@chris-wilson.co.uk	2019-11-20 16:58:26 +00:00
Chris Wilson	a6edbca74b	drm/i915/gt: Close race between engine_park and intel_gt_retire_requests The general concept was that intel_timeline.active_count was locked by the intel_timeline.mutex. The exception was for power management, where the engine->kernel_context->timeline could be manipulated under the global wakeref.mutex. This was quite solid, as we always manipulated the timeline only while we held an engine wakeref. And then we started retiring requests outside of struct_mutex, only using the timelines.active_list and the timeline->mutex. There we started manipulating intel_timeline.active_count outside of an engine wakeref, and so introduced a race between __engine_park() and intel_gt_retire_requests(), a race that could result in the engine->kernel_context not being added to the active timelines and so losing requests, which caused us to keep the system permanently powered up [and unloadable]. The race would be easy to close if we could take the engine wakeref for the timeline before we retire -- except timelines are not bound to any engine and so we would need to keep all active engines awake. The alternative is to guard intel_timeline_enter/intel_timeline_exit for use outside of the timeline->mutex. Fixes: `e5dadff4b0` ("drm/i915: Protect request retirement with timeline->mutex") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120165514.3955081-1-chris@chris-wilson.co.uk	2019-11-20 16:57:33 +00:00
Chris Wilson	07779a76ee	drm/i915: Mark up the calling context for intel_wakeref_put() Previously, we assumed we could use mutex_trylock() within an atomic context, falling back to a worker if contended. However, such trickery is illegal inside interrupt context, and so we need to always use a worker under such circumstances. As we normally are in process context, we can typically use a plain mutex, and only defer to a work when we know we are being called from an interrupt path. Fixes: `51fbd8de87` ("drm/i915/pmu: Atomically acquire the gt_pm wakeref") References: `a0855d24fc` ("locking/mutex: Complain upon mutex API misuse in IRQ contexts") References: https://bugs.freedesktop.org/show_bug.cgi?id=111626 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191120125433.3767149-1-chris@chris-wilson.co.uk	2019-11-20 15:59:23 +00:00

1 2 3 4 5 ...

655 Commits