linux_dsm_epyc7002

mirror of https://github.com/AuxXxilium/linux_dsm_epyc7002.git synced 2024-12-26 08:25:01 +07:00

Author	SHA1	Message	Date
Michal Wajdeczko	78577e294b	drm/i915/guc: Rename intel_guc_is_alive to intel_guc_is_loaded This function just check our software flag, while 'is_alive' may suggest that we are checking runtime firmware status. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190522193203.23932-5-michal.wajdeczko@intel.com	2019-05-23 21:58:36 +01:00
Michal Wajdeczko	beca36ffbd	drm/i915/selftests: Use prepare/finish during atomic reset test We were testing full GPU reset in atomic context without correctly wrapping it by prepare/finish steps. This could confuse our GuC reset handling code. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190522193203.23932-4-michal.wajdeczko@intel.com	2019-05-23 21:58:36 +01:00
Michal Wajdeczko	f6470c9bcc	drm/i915/selftests: Split igt_atomic_reset testcase Split igt_atomic_reset selftests into separate full & engines parts, so we can move former to the dedicated reset selftests file. While here change engines test to loop first over atomic phases and then loop over available engines. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190522193203.23932-3-michal.wajdeczko@intel.com	2019-05-23 21:53:26 +01:00
Michal Wajdeczko	932309fb03	drm/i915/selftests: Move some reset testcases to separate file igt_global_reset and igt_wedged_reset testcases are first candidates. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190522193203.23932-2-michal.wajdeczko@intel.com	2019-05-23 21:52:26 +01:00
Chris Wilson	d3622099c7	drm/i915/gtt: Always acquire struct_mutex for gen6_ppgtt_cleanup We rearranged the vm_destroy_ioctl to avoid taking struct_mutex, little realising that buried underneath the gen6 ppgtt release path was a struct_mutex requirement (to remove its GGTT vma). Until that struct_mutex is vanquished, take a detour in gen6_ppgtt_cleanup to do the i915_vma_destroy from inside a worker under the struct_mutex. <4> [257.740160] WARN_ON(debug_locks && !lock_is_held(&(&vma->vm->i915->drm.struct_mutex)->dep_map)) <4> [257.740213] WARNING: CPU: 3 PID: 1507 at drivers/gpu/drm/i915/i915_vma.c:841 i915_vma_destroy+0x1ae/0x3a0 [i915] <4> [257.740214] Modules linked in: snd_hda_codec_hdmi i915 x86_pkg_temp_thermal mei_hdcp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core r8169 realtek snd_pcm mei_me mei prime_numbers lpc_ich <4> [257.740224] CPU: 3 PID: 1507 Comm: gem_vm_create Tainted: G U 5.2.0-rc1-CI-CI_DRM_6118+ #1 <4> [257.740225] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016 <4> [257.740249] RIP: 0010:i915_vma_destroy+0x1ae/0x3a0 [i915] <4> [257.740250] Code: 00 00 00 48 81 c7 c8 00 00 00 e8 ed 08 f0 e0 85 c0 0f 85 78 fe ff ff 48 c7 c6 e8 ec 30 a0 48 c7 c7 da 55 33 a0 e8 42 8c e9 e0 <0f> 0b 8b 83 40 01 00 00 85 c0 0f 84 63 fe ff ff 48 c7 c1 c1 58 33 <4> [257.740251] RSP: 0018:ffffc90000aafc68 EFLAGS: 00010282 <4> [257.740252] RAX: 0000000000000000 RBX: ffff8883f7957840 RCX: 0000000000000003 <4> [257.740253] RDX: 0000000000000046 RSI: 0000000000000006 RDI: ffffffff8212d1b9 <4> [257.740254] RBP: ffffc90000aafcc8 R08: 0000000000000000 R09: 0000000000000000 <4> [257.740255] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8883f4d5c2a8 <4> [257.740256] R13: ffff8883f4d5d680 R14: ffff8883f4d5c668 R15: ffff8883f4d5c2f0 <4> [257.740257] FS: 00007f777fa8fe40(0000) GS:ffff88840f780000(0000) knlGS:0000000000000000 <4> [257.740258] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [257.740259] CR2: 00007f777f6522b0 CR3: 00000003c612a006 CR4: 00000000001606e0 <4> [257.740260] Call Trace: <4> [257.740283] gen6_ppgtt_cleanup+0x25/0x60 [i915] <4> [257.740306] i915_ppgtt_release+0x102/0x290 [i915] <4> [257.740330] i915_gem_vm_destroy_ioctl+0x7c/0xa0 [i915] <4> [257.740376] ? i915_gem_vm_create_ioctl+0x160/0x160 [i915] <4> [257.740379] drm_ioctl_kernel+0x83/0xf0 <4> [257.740382] drm_ioctl+0x2f3/0x3b0 <4> [257.740422] ? i915_gem_vm_create_ioctl+0x160/0x160 [i915] <4> [257.740426] ? _raw_spin_unlock_irqrestore+0x39/0x60 <4> [257.740430] do_vfs_ioctl+0xa0/0x6e0 <4> [257.740433] ? lock_acquire+0xa6/0x1c0 <4> [257.740436] ? __task_pid_nr_ns+0xb9/0x1f0 <4> [257.740439] ksys_ioctl+0x35/0x60 <4> [257.740441] __x64_sys_ioctl+0x11/0x20 <4> [257.740443] do_syscall_64+0x55/0x1c0 <4> [257.740445] entry_SYSCALL_64_after_hwframe+0x49/0xbe References: `e0695db729` ("drm/i915: Create/destroy VM (ppGTT) for use with contexts") Fixes: `7f3f317a66` ("drm/i915: Restore control over ppgtt for context creation ABI") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190523064933.23604-1-chris@chris-wilson.co.uk	2019-05-23 21:12:12 +01:00
Jani Nikula	09a93ef3d6	drm/i915: remove duplicate typedef for intel_wakeref_t Fix the duplicate typedef for intel_wakeref_t leading to Clang build issues. While at it, actually make the intel_runtime_pm.h header self-contained, which was claimed in the commit being fixed. Reported-by: Nathan Chancellor <natechancellor@gmail.com> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> References: http://mid.mail-archive.com/20190521183850.GA9157@archlinux-epyc References: https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/201754420#L2435 Fixes: `0d5adc5f2f` ("drm/i915: extract intel_runtime_pm.h from intel_drv.h") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Tested-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190522103505.2082-1-jani.nikula@intel.com	2019-05-23 15:46:42 +03:00
Jani Nikula	cfc0e7bbf4	drm/i915: Update DRIVER_DATE to 20190523 Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2019-05-23 11:57:24 +03:00
Gwan-gyeong Mun	47d0ccecc9	drm/i915/dp: Support DP ports YUV 4:2:0 output to GEN11 Bspec describes that GEN10 only supports capability of YUV 4:2:0 output to HDMI port and GEN11 supports capability of YUV 4:2:0 output to both DP and HDMI ports. v2: Minor style fix. Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-7-gwan-gyeong.mun@intel.com	2019-05-23 09:49:44 +03:00
Gwan-gyeong Mun	16668f486f	drm/i915/dp: Change a link bandwidth computation for DP Data M/N calculations were assumed a bpp as RGB format. But when we are using YCbCr 4:2:0 output format on DP, we should change bpp calculations as YCbCr 4:2:0 format. The pipe_bpp value was assumed RGB format, therefore, it was multiplied with 3. But YCbCr 4:2:0 requires a multiplier value to 1.5. Therefore we need to divide pipe_bpp to 2 while DP output uses YCbCr4:2:0 format. - RGB format bpp = bpc x 3 - YCbCr 4:2:0 format bpp = bpc x 1.5 But Link M/N values are calculated and applied based on the Full Clock for YCbCr 4:2:0. And DP YCbCr 4:2:0 does not need to pixel clock double for a dotclock caluation. Only for HDMI YCbCr 4:2:0 needs to pixel clock double for a dot clock calculation. It only affects dp and edp port which use YCbCr 4:2:0 output format. And for now, it does not consider a use case of DSC + YCbCr 4:2:0. v2: Addressed review comments from Ville. Remove a changing of pipe_bpp on intel_ddi_set_pipe_settings(). Because the pipe is running at the full bpp, keep pipe_bpp as RGB even though YCbCr 4:2:0 output format is used. Add a link bandwidth computation for YCbCr4:2:0 output format. v3: Addressed reivew comments from Ville. In order to make codes simple, it adds and uses intel_dp_output_bpp() function. v6: Link M/N values are calculated and applied based on the Full Clock for YCbCr420. The Bit per Pixel needs to be adjusted for YUV420 mode as it requires only half of the RGB case. - Link M/N values are calculated and applied based on the Full Clock - Data M/N values needs to be calculated considering the data is half due to subsampling Remove a doubling of pixel clock on a dot clock calculator for DP YCbCr 4:2:0. Rebase and remove a duplicate setting of vsc_sdp.DB17. Add a setting of dynamic range bit to vsc_sdp.DB17. Change Content Type bit to "Graphics" from "Not defined". Change a dividing of pipe_bpp to muliplying to constant values on a switch-case statement. v7: Addressed review comments from Ville. Move a setting of dynamic range bit and a setting of bpc which is based on pipe_bpp to a "drm/i915/dp: Program VSC Header and DB for Pixel Encoding/Colorimetry Format" commit. Change Content Type bit to "Not defined" from "Graphics". Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-6-gwan-gyeong.mun@intel.com	2019-05-23 09:49:44 +03:00
Gwan-gyeong Mun	ec4401d389	drm/i915/dp: Add a support of YCBCR 4:2:0 to DP MSA When YCBCR 4:2:0 outputs is used for DP, we should program YCBCR 4:2:0 to MSA and VSC SDP. As per DP 1.4a spec section 2.2.4.3 [MSA Field for Indication of Color Encoding Format and Content Color Gamut] while sending YCBCR 420 signals we should program MSA MISC1 fields which indicate VSC SDP for the Pixel Encoding/Colorimetry Format. v2: Block comment style fix. v6: Fix an wrong setting of MSA MISC1 fields for Pixel Encoding/Colorimetry Format indication. As per DP 1.4a spec Table 2-96 [MSA MISC1 and MISC0 Fields for Pixel Encoding/Colorimetry Format Indication] When MISC1, bit 6, is Set to 1, a Source device uses a VSC SDP to indicate the Pixel Encoding/Colorimetry Format. On the wrong version it set a bit 5 of MISC1, now it set a bit 6 of MISC1. Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-5-gwan-gyeong.mun@intel.com	2019-05-23 09:49:43 +03:00
Gwan-gyeong Mun	3c053a96ef	drm/i915/dp: Program VSC Header and DB for Pixel Encoding/Colorimetry Format Function intel_pixel_encoding_setup_vsc handles vsc header and data block setup for pixel encoding / colorimetry format. Setup VSC header and data block in function intel_pixel_encoding_setup_vsc for pixel encoding / colorimetry format as per dp 1.4a spec, section 2.2.5.7.1, table 2-119: VSC SDP Header Bytes, section 2.2.5.7.5, table 2-120:VSC SDP Payload for DB16 through DB18. v2: Minor style fix. [Maarten] Refer to commit ids instead of patchwork. [Maarten] v6: Rebase v7: Rebase and addressed review comments from Ville. Use a structure initializer instead of memset(). Fix non-standard comment format. Remove a referring to specific commit. Add a setting of dynamic range bit to vsc_sdp.DB17. Add a setting of bpc which is based on pipe_bpp. Remove duplicated checking of connector's ycbcr_420_allowed from intel_pixel_encoding_setup_vsc(). It is already checked from intel_dp_ycbcr420_config(). Remove comments for VSC_SDP_EXTENSION_FOR_COLORIMETRY_SUPPORTED. It is already implemented on intel_dp_get_colorimetry_status(). v8: A missing of setting bpc to VSC setup is the pretty fatal case, it replaces DRM_DEBUG_KMS() to MISSING_CASE(). [Maarten] v9: Use a changed member name of struct dp_sdp. it renamed to db from DB. Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-4-gwan-gyeong.mun@intel.com	2019-05-23 09:49:43 +03:00
Gwan-gyeong Mun	4d432f956d	drm: Rename struct edp_vsc_psr to struct dp_sdp VSC SDP Payload for PSR is one of data block type of SDP (Secondaray Data Packet). In order to generalize SDP packet structure name, it renames struct edp_vsc_psr to struct dp_sdp. And each SDP data blocks have different usages, each SDP type has different reserved data blocks and Video_Stream_Configuration Extension VESA SDP might use all of Data Blocks as Extended INFORFRAME Data Byte. so it makes Data Block variables as array type. And it adds comments of details of DB of VSC SDP Payload for Pixel Encoding/Colorimetry Format. This comments follows DP 1.4a spec, section 2.2.5.7.5, chapter "VSC SDP Payload for Pixel Encoding/Colorimetry Format". v7: Addressed review comments from Ville. v9: Rename a member value name DB to db on struct dp_sdp [Laurent] Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-3-gwan-gyeong.mun@intel.com	2019-05-23 09:49:32 +03:00
Gwan-gyeong Mun	8e9d645c68	drm/i915/dp: Add a config function for YCBCR420 outputs This patch checks a support of YCBCR420 outputs on an encoder level. If the input mode is YCBCR420-only mode then it prepares DP as an YCBCR420 output, else it continues with RGB output mode. It set output_format to INTEL_OUTPUT_FORMAT_YCBCR420 in order to using a pipe scaler as RGB to YCbCr 4:4:4. v2: Addressed review comments from Ville. Style fixed with few naming. %s/config/crtc_state/ %s/intel_crtc/crtc/ If lscon is active, it makes not to call intel_dp_ycbcr420_config() to avoid to clobber of lspcon_ycbcr420_config() routine. And it move the 420_only check into the intel_dp_ycbcr420_config(). v3: Fix uninitialized return value and it is reported by Dan Carpenter. v4: Addressed review comments from Ville. In order to avoid the extra indentation, it inverts if-clause on intel_dp_ycbcr420_config(). Remove the error print where no errors print are allowed. v6: Rebase v7: Move intel_dp_get_colorimetry_status() to intel_dp from intel_psr. intel_dp_get_colorimetry_status() checks VSC_SDP_EXTENSION_FOR_COLORIMETRY_SUPPORTED bit in the DPRX_FEATURE_ENUMERATION_LIST register. And intel_dp_ycbcr420_config() uses intel_dp_get_colorimetry_status(). Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521121721.32010-2-gwan-gyeong.mun@intel.com	2019-05-23 09:48:59 +03:00
Tvrtko Ursulin	c5d3e39caa	drm/i915: Engine discovery query Engine discovery query allows userspace to enumerate engines, probe their configuration features, all without needing to maintain the internal PCI ID based database. A new query for the generic i915 query ioctl is added named DRM_I915_QUERY_ENGINE_INFO, together with accompanying structure drm_i915_query_engine_info. The address of latter should be passed to the kernel in the query.data_ptr field, and should be large enough for the kernel to fill out all known engines as struct drm_i915_engine_info elements trailing the query. As with other queries, setting the item query length to zero allows userspace to query minimum required buffer size. Enumerated engines have common type mask which can be used to query all hardware engines, versus engines userspace can submit to using the execbuf uAPI. Engines also have capabilities which are per engine class namespace of bits describing features not present on all engine instances. v2: * Fixed HEVC assignment. * Reorder some fields, rename type to flags, increase width. (Lionel) * No need to allocate temporary storage if we do it engine by engine. (Lionel) v3: * Describe engine flags and mark mbz fields. (Lionel) * HEVC only applies to VCS. v4: * Squash SFC flag into main patch. * Tidy some comments. v5: * Add uabi_ prefix to engine capabilities. (Chris Wilson) * Report exact size of engine info array. (Chris Wilson) * Drop the engine flags. (Joonas Lahtinen) * Added some more reserved fields. * Move flags after class/instance. v6: * Do not check engine info array was zeroed by userspace but zero the unused fields for them instead. v7: * Simplify length calculation loop. (Lionel Landwerlin) v8: * Remove MBZ comments where not applicable. * Rename ABI flags to match engine class define naming. * Rename SFC ABI flag to reflect it applies to VCS and VECS. * SFC is wired to even _logical_ engine instances. * SFC applies to VCS and VECS. * HEVC is present on all instances on Gen11. (Tony) * Simplify length calculation even more. (Chris Wilson) * Move info_ptr assigment closer to loop for clarity. (Chris Wilson) * Use vdbox_sfc_access from runtime info. * Rebase for RUNTIME_INFO. * Refactor for lower indentation. * Rename uAPI class/instance to engine_class/instance to avoid C++ keyword. v9: * Rebase for s/num_rings/num_engines/ in RUNTIME_INFO. v10: * Use new copy_query_item. v11: * Consolidate with struct i915_engine_class_instnace. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jon Bloomfield <jon.bloomfield@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Tony Ye <tony.ye@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> # v7 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190522090054.6007-1-tvrtko.ursulin@linux.intel.com	2019-05-22 14:17:55 +01:00
Tvrtko Ursulin	cbe3e1d103	drm/i915/icl: Add WaDisableBankHangMode Disable GPU hang by default on unrecoverable ECC cache errors. v2: * Rebase. v3: * Use intel_uncore_read. (Chris) Fixes: `cc38cae7c4` ("drm/i915/icl: Introduce initial Icelake Workarounds") Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190520110442.403-2-tvrtko.ursulin@linux.intel.com	2019-05-22 10:11:10 +01:00
Tvrtko Ursulin	fde938867b	drm/i915/selftests: Verify context workarounds Test context workarounds have been correctly applied in newly created contexts. To accomplish this the existing engine_wa_list_verify helper is extended to take in a context from which reading of the workaround list will be done. Context workaround verification is done from the existing subtests, which have been renamed to reflect they are no longer only about GT and engine workarounds. v2: * Test after resets and refactor to use intel_context more. (Chris) v3: * Use ce->engine->i915 instead of ce->gem_context->i915. (Chris) * gem_engine_iter.idx is engine->id + 1. (Chris) v4: * Make local function static. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190520142546.12493-1-tvrtko.ursulin@linux.intel.com	2019-05-22 10:11:09 +01:00
Chris Wilson	a88b6e4cba	drm/i915: Allow specification of parallel execbuf There is a desire to split a task onto two engines and have them run at the same time, e.g. scanline interleaving to spread the workload evenly. Through the use of the out-fence from the first execbuf, we can coordinate secondary execbuf to only become ready simultaneously with the first, so that with all things idle the second execbufs are executed in parallel with the first. The key difference here between the new EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence waits for the completion of the first request (so that all of its rendering results are visible to the second execbuf, the more common userspace fence requirement). Since we only have a single input fence slot, userspace cannot mix an in-fence and a submit-fence. It has to use one or the other! This is not such a harsh requirement, since by virtue of the submit-fence, the secondary execbuf inherit all of the dependencies from the first request, and for the application the dependencies should be common between the primary and secondary execbuf. Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Testcase: igt/gem_exec_fence/parallel Link: https://github.com/intel/media-driver/pull/546 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-10-chris@chris-wilson.co.uk	2019-05-22 08:40:50 +01:00
Chris Wilson	ee1136908e	drm/i915/execlists: Virtual engine bonding Some users require that when a master batch is executed on one particular engine, a companion batch is run simultaneously on a specific slave engine. For this purpose, we introduce virtual engine bonding, allowing maps of master:slaves to be constructed to constrain which physical engines a virtual engine may select given a fence on a master engine. For the moment, we continue to ignore the issue of preemption deferring the master request for later. Ideally, we would like to then also remove the slave and run something else rather than have it stall the pipeline. With load balancing, we should be able to move workload around it, but there is a similar stall on the master pipeline while it may wait for the slave to be executed. At the cost of more latency for the bonded request, it may be interesting to launch both on their engines in lockstep. (Bubbles abound.) Opens: Also what about bonding an engine as its own master? It doesn't break anything internally, so allow the silliness. v2: Emancipate the bonds v3: Couple in delayed scheduling for the selftests v4: Handle invalid mutually exclusive bonding v5: Mention what the uapi does v6: s/nbond/num_bonds/ Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-9-chris@chris-wilson.co.uk	2019-05-22 08:40:46 +01:00
Chris Wilson	f71e01a78b	drm/i915: Extend execution fence to support a callback In the next patch, we will want to configure the slave request depending on which physical engine the master request is executed on. For this, we introduce a callback from the execute fence to convey this information. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-8-chris@chris-wilson.co.uk	2019-05-22 08:40:45 +01:00
Chris Wilson	78e41ddd21	drm/i915: Apply an execution_mask to the virtual_engine Allow the user to direct which physical engines of the virtual engine they wish to execute one, as sometimes it is necessary to override the load balancing algorithm. v2: Only kick the virtual engines on context-out if required Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-7-chris@chris-wilson.co.uk	2019-05-22 08:40:43 +01:00
Chris Wilson	6d06779e86	drm/i915: Load balancing across a virtual engine Having allowed the user to define a set of engines that they will want to only use, we go one step further and allow them to bind those engines into a single virtual instance. Submitting a batch to the virtual engine will then forward it to any one of the set in a manner as best to distribute load. The virtual engine has a single timeline across all engines (it operates as a single queue), so it is not able to concurrently run batches across multiple engines by itself; that is left up to the user to submit multiple concurrent batches to multiple queues. Multiple users will be load balanced across the system. The mechanism used for load balancing in this patch is a late greedy balancer. When a request is ready for execution, it is added to each engine's queue, and when an engine is ready for its next request it claims it from the virtual engine. The first engine to do so, wins, i.e. the request is executed at the earliest opportunity (idle moment) in the system. As not all HW is created equal, the user is still able to skip the virtual engine and execute the batch on a specific engine, all within the same queue. It will then be executed in order on the correct engine, with execution on other virtual engines being moved away due to the load detection. A couple of areas for potential improvement left! - The virtual engine always take priority over equal-priority tasks. Mostly broken up by applying FQ_CODEL rules for prioritising new clients, and hopefully the virtual and real engines are not then congested (i.e. all work is via virtual engines, or all work is to the real engine). - We require the breadcrumb irq around every virtual engine request. For normal engines, we eliminate the need for the slow round trip via interrupt by using the submit fence and queueing in order. For virtual engines, we have to allow any job to transfer to a new ring, and cannot coalesce the submissions, so require the completion fence instead, forcing the persistent use of interrupts. - We only drip feed single requests through each virtual engine and onto the physical engines, even if there was enough work to fill all ELSP, leaving small stalls with an idle CS event at the end of every request. Could we be greedy and fill both slots? Being lazy is virtuous for load distribution on less-than-full workloads though. Other areas of improvement are more general, such as reducing lock contention, reducing dispatch overhead, looking at direct submission rather than bouncing around tasklets etc. sseu: Lift the restriction to allow sseu to be reconfigured on virtual engines composed of RENDER_CLASS (rcs). v2: macroize check_user_mbz() v3: Cancel virtual engines on wedging v4: Commence commenting v5: Replace 64b sibling_mask with a list of class:instance v6: Drop the one-element array in the uabi v7: Assert it is an virtual engine in to_virtual_engine() v8: Skip over holes in [class][inst] so we can selftest with (vcs0, vcs2) Link: https://github.com/intel/media-driver/pull/283 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-6-chris@chris-wilson.co.uk	2019-05-22 08:40:38 +01:00
Chris Wilson	b81dde7194	drm/i915: Allow userspace to clone contexts on creation A usecase arose out of handling context recovery in mesa, whereby they wish to recreate a context with fresh logical state but preserving all other details of the original. Currently, they create a new context and iterate over which bits they want to copy across, but it would much more convenient if they were able to just pass in a target context to clone during creation. This essentially extends the setparam during creation to pull the details from a target context instead of the user supplied parameters. The ideal here is that we don't expose control over anything more than can be obtained via CONTEXT_PARAM. That is userspace retains explicit control over all features, and this api is just convenience. For example, you could replace struct context_param p = { .param = CONTEXT_PARAM_VM }; param.ctx_id = old_id; gem_context_get_param(&p.param); new_id = gem_context_create(); param.ctx_id = new_id; gem_context_set_param(&p.param); gem_vm_destroy(param.value); /* drop the ref to VM_ID handle */ with struct create_ext_param p = { { .name = CONTEXT_CREATE_CLONE }, .clone_id = old_id, .flags = CLONE_FLAGS_VM } new_id = gem_context_create_ext(&p); and not have to worry about stray namespace pollution etc. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-5-chris@chris-wilson.co.uk	2019-05-22 08:40:37 +01:00
Chris Wilson	8319f44c05	drm/i915: Re-expose SINGLE_TIMELINE flags for context creation The SINGLE_TIMELINE flag can be used to create a context such that all engine instances within that context share a common timeline. This can be useful for mixing operations between real and virtual engines, or when using a composite context for a single client API context. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-4-chris@chris-wilson.co.uk	2019-05-22 08:40:35 +01:00
Chris Wilson	e620f7b3a2	drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Allow the user to specify a local engine index (as opposed to class:index) that they can use to refer to a preset engine inside the ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES. This will be useful for setting SSEU parameters on virtual engines that are local to the context and do not have a valid global class:instance lookup. Note that due to the ambiguity in using class:instance with ctx->engines[], if a user supplied engine map is active the user must specify the engine to alter by its index into the ctx->engines[]. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-3-chris@chris-wilson.co.uk	2019-05-22 08:40:34 +01:00
Chris Wilson	976b55f0e1	drm/i915: Allow a context to define its set of engines Over the last few years, we have debated how to extend the user API to support an increase in the number of engines, that may be sparse and even be heterogeneous within a class (not all video decoders created equal). We settled on using (class, instance) tuples to identify a specific engine, with an API for the user to construct a map of engines to capabilities. Into this picture, we then add a challenge of virtual engines; one user engine that maps behind the scenes to any number of physical engines. To keep it general, we want the user to have full control over that mapping. To that end, we allow the user to constrain a context to define the set of engines that it can access, order fully controlled by the user via (class, instance). With such precise control in context setup, we can continue to use the existing execbuf uABI of specifying a single index; only now it doesn't automagically map onto the engines, it uses the user defined engine map from the context. v2: Fixup freeing of local on success of get_engines() v3: Allow empty engines[] v4: s/nengine/num_engines/ v5: Replace 64 limit on num_engines with a note that execbuf is currently limited to only using the first 64 engines. v6: Actually use the engines_mutex to guard the ctx->engines. Testcase: igt/gem_ctx_engines Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-2-chris@chris-wilson.co.uk	2019-05-22 08:40:31 +01:00
Chris Wilson	7f3f317a66	drm/i915: Restore control over ppgtt for context creation ABI Having hid the partially exposed new ABI from the PR, put it back again for completion of context recovery. A significant part of context recovery is the ability to reuse as much of the old context as is feasible (to avoid expensive reconstruction). The biggest chunk kept hidden at the moment is fine-control over the ctx->ppgtt (the GPU page tables and associated translation tables and kernel maps), so make control over the ctx->ppgtt explicit. This allows userspace to create and share virtual memory address spaces (within the limits of a single fd) between contexts they own, along with the ability to query the contexts for the vm state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190521211134.16117-1-chris@chris-wilson.co.uk	2019-05-22 08:40:29 +01:00
Ville Syrjälä	5c000fb33b	drm/i915: Bump gen7+ fb size limits to 16kx16k With gtt remapping in place we can use arbitrarily large framebuffers. Let's bump the limits to 16kx16k on gen7+. The limit was chosen to match the maximum 2D surface size of the 3D engine. With the remapping we could easily go higher than that for the display engine. However the modesetting ddx will blindly assume it can handle whatever is reported via kms. The oversized buffer dimensions are not caught by glamor nor Mesa until finally an assert will trip when genxml attempts to pack the SURFACE_STATE. So we pick a safe limit to avoid the X server from crashing (or potentially misbehaving if the genxml asserts are compiled out). Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110187 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-9-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2019-05-20 18:04:48 +03:00
Ville Syrjälä	2033012982	drm/i915: Bump fb stride limit to 128KiB for gen4+ and 256KiB for gen7+ With gtt remapping plugged in we can simply raise the stride limit on gen4+. Let's just pick the limit to match the render engine max stride (256KiB on gen7+, 128KiB on gen4+). No remapping CCS because the virtual address of each page actually matters due to the new hash mode (WaCompressedResourceDisplayNewHashMode:skl,kbl etc.), and no remapping on gen2/3 due extra complications from fence alignment and gen2 2KiB GTT tile size. Also no real benefit since the display engine limits already match the other limits. v2: Rebase due to is_ccs_modifier() v3: Tweak the comment and commit msg v4: Fix gen4+ stride limit to be 128KiB Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> #v3 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-8-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2019-05-20 18:04:48 +03:00
Ville Syrjälä	aa5ca8b742	drm/i915: Align dumb buffer stride to 4k to allow for gtt remapping Align dumb buffer stride to 4k if the fb will be big enough to require gtt remapping. v2: Leave the stride alone for buffers that look to be for the cursor v3: Make it not a hack (Daniel) Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-7-ville.syrjala@linux.intel.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2019-05-20 18:04:48 +03:00
Ville Syrjälä	54d4d719fa	drm/i915: Overcome display engine stride limits via GTT remapping The display engine stride limits are getting in our way. On SKL+ we are limited to 8k pixels, which is easily exceeded with three 4k displays. To overcome this limitation we can remap the pages in the GTT to provide the display engine with a view of memory with a smaller stride. The code is mostly already there as We already play tricks with the plane surface address and x/y offsets. A few caveats apply: * linear buffers need the fb stride to be page aligned, as otherwise the remapped lines wouldn't start at the same spot * compressed buffers can't be remapped due to the new ccs hash mode causing the virtual address of the pages to affect the interpretation of the compressed data. IIRC the old hash was limited to the low 12 bits so if we were using that mode we could remap. As it stands we just refuse to remapp with compressed fbs. * no remapping gen2/3 as we'd need a fence for the remapped vma, which we currently don't have. Need to deal with the fence POT requirements, and do something about the gen2 gtt page size vs tile size difference v2: Rebase due to is_ccs_modifier() Fix up the skl+ stride_mult mess memset() the gtt_view because otherwise we could leave junk in plane[1] when going from 2 plane to 1 plane format v3: intel_check_plane_stride() was split out v4: Drop the aligned viewport stuff, it was meant for ccs which can't be remapped anyway v5: Introduce intel_plane_can_remap() Reorder the code so that plane_state->view gets filled even for invisible planes, otherwise we'd keep using stale values and could explode during remapping. The new logic never remaps invisible planes since we don't have a viewport, and instead pins the full fb instead v6: Fix plane src coord checks after remapping by moving plane_state->base.src to the final plane x/y offsets. Allow intel_plane_check_stride() to fail even with remapping (can happen at least with a linear 64bpp fb with a 4k plane and a suitably inconvenient src coordinates). Improve aux plane FIXME (Daniel) Move some code shuffling into a separate patch (Daniel) Testcase: igt/kms_big_fb Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-6-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2019-05-20 18:04:47 +03:00
Ville Syrjälä	a88c40ebb8	drm/i915: Shuffle stride checking code around Reorganize some fb stride checking code a bit to prepare for gtt remapping. And do a bit of s/pitch/stride/ renaming in the process for a bit more uniformity (apart from the whole fb->pitches[] thing). Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-5-ville.syrjala@linux.intel.com Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2019-05-20 18:04:47 +03:00
Ville Syrjälä	bb211c3d0c	drm/i915/selftests: Add live vma selftest Add a live selftest to excercise rotated/remapped vmas. We simply write through the rotated/remapped vma, and confirm that the data appears in the right page when read through the normal vma. Not sure what the fallout of making all rotated/remapped vmas mappable/fenceable would be, hence I just hacked it in the test. v2: Grab rpm reference (Chris) GEM_BUG_ON(view.type not as expected) (Chris) Allow CAN_FENCE for rotated/remapped vmas (Chris) Update intel_plane_uses_fence() to ask for a fence only for normal vmas on gen4+ v3: Deal with intel_wakeref_t v4: Rebase Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-4-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2019-05-20 18:04:47 +03:00
Ville Syrjälä	e2e394bffa	drm/i915/selftests: Add mock selftest for remapped vmas Extend the rotated vma mock selftest to cover remapped vmas as well. TODO: reindent the loops I guess? Left like this for now to ease review v2: Include the vma type in the error message (Chris) v3: Deal with trimmed sg v4: Drop leftover debugs Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-3-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2019-05-20 18:04:47 +03:00
Ville Syrjälä	1a74fc0b3f	drm/i915: Add a new "remapped" gtt_view To overcome display engine stride limits we'll want to remap the pages in the GTT. To that end we need a new gtt_view type which is just like the "rotated" type except not rotated. v2: Use intel_remapped_plane_info base type s/unused/unused_mbz/ (Chris) Separate BUILD_BUG_ON()s (Chris) Use I915_GTT_PAGE_SIZE (Chris) v3: Use i915_gem_object_get_dma_address() (Chris) Trim the sg (Tvrtko) v4: Actually trim this time. Limit the max length to one row of pages to keep things simple Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509122159.24376-2-ville.syrjala@linux.intel.com Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>	2019-05-20 18:04:47 +03:00
Chris Wilson	4cc79cbb01	drm/i915/execlists: Drop promotion on unsubmit With the disappearance of NEWCLIENT, we no longer need to provide the priority boost on preemption in order to prevent repeated gazumping, and we can remove the dead code. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-5-chris@chris-wilson.co.uk	2019-05-17 16:05:08 +01:00
Chris Wilson	68fc728b01	drm/i915: Downgrade NEWCLIENT to non-preemptive Commit `1413b2bc07` ("drm/i915: Trim NEWCLIENT boosting") had the intended consequence of not allowing a sequence of work that merely crossed into a new engine the privilege to be promoted to NEWCLIENT status. It also had the unintended consequence of actually making NEWCLIENT effective on heavily oversubscribed transcode machines and impacting upon their throughput. If we consider a client packet composed of (rcsA, rcsB, vcs) and 30 of those clients, using the NEWCLIENT boost that will be scheduled as rcsA x 30, (rcsB, vcs) x 30 where as before it would have been (rcsA, rcsB, vcs) x 30 That is with NEWCLIENT only boosting the first request of each client, we would execute all rcsA requests prior to running on the vcs engines; acruing a lot of dead time as compared to the previous case where the vcs engine would be started in parallel to processing the second client. The previous patch has the effect of delaying submission until it is required by a third party (either the user with an explicit wait, or by another client/engine). We reduce the NEWCLIENT bump to a mere WAIT, which has the effect of removing its preemptive grant and reducing it to the same level as any other user interaction -- that it will not be promoted above the interengine dependencies, and so preventing NEWCLIENTS from starving other engines. This a large nerf to the rrul properties of the current NEWCLIENT, but it still does give prioritised submission to new requests from light workloads. References: `b16c765122` ("drm/i915: Priority boost for new clients") Fixes: `1413b2bc07` ("drm/i915: Trim NEWCLIENT boosting") # customer impact Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-4-chris@chris-wilson.co.uk	2019-05-17 16:04:56 +01:00
Chris Wilson	6e7eb7a807	drm/i915: Bump signaler priority on adding a waiter The handling of the no-preemption priority level imposes the restriction that we need to maintain the implied ordering even though preemption is disabled. Otherwise we may end up with an AB-BA deadlock across multiple engine due to a real preemption event reordering the no-preemption WAITs. To resolve this issue we currently promote all requests to WAIT on unsubmission, however this interferes with the timeslicing requirement that we do not apply any implicit promotion that will defeat the round-robin timeslice list. (If we automatically promote the active request it will go back to the head of the queue and not the tail!) So we need implicit promotion to prevent reordering around semaphores where we are not allowed to preempt, and we must avoid implicit promotion on unsubmission. So instead of at unsubmit, if we apply that implicit promotion on adding the dependency, we avoid the semaphore deadlock and we also reduce the gains made by the promotion for user space waiting. Furthermore, by keeping the earlier dependencies at a higher level, we reduce the search space for timeslicing without altering runtime scheduling too badly (no dependencies at all will be assigned a higher priority for rrul). v2: Limit the bump to external edges (as originally intended) i.e. between contexts and out to the user. Testcase: igt/gem_concurrent_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-3-chris@chris-wilson.co.uk	2019-05-17 16:04:46 +01:00
Chris Wilson	af461ff3fa	drm/i915/hdcp: Use both bits for device_count Smatch spotted: drivers/gpu/drm/i915//intel_hdcp.c:1406 hdcp2_authenticate_repeater_topology() warn: should this be a bitwise op? and indeed looks to be suspect that we do need to use a bitwise or to combine the two register fields into one counter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ramalingam C <ramalingam.c@intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190517102225.3069-3-chris@chris-wilson.co.uk	2019-05-17 14:40:50 +01:00
Chris Wilson	96ac08137e	drm/i915/dp: Initialise locals for static analysis Just to squelch an smatch warning that doesn't see the with_() being taken unconditionally: drivers/gpu/drm/i915//intel_dp.c:230 intel_dp_get_fia_supported_lane_count() error: uninitialized symbol 'lane_info'. drivers/gpu/drm/i915//intel_dp.c:5338 intel_digital_port_connected() error: uninitialized symbol 'is_connected'. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190517102225.3069-2-chris@chris-wilson.co.uk	2019-05-17 14:40:39 +01:00
Chris Wilson	17db337f50	drm/i915: Truly bump ready tasks ahead of busywaits In commit `b7404c7ecb` ("drm/i915: Bump ready tasks ahead of busywaits"), I tried cutting a corner in order to not install a signal for each of our dependencies, and only listened to requests on which we were intending to busywait. The compromise that was made was that instead of then being able to promote the request with a full NOSEMAPHORE like its non-busywaiting brethren, as we had not ensured we had cleared the semaphore chain, we settled for only using the NEWCLIENT boost. With an over saturated system with multiple NEWCLIENTS in flight at any time, this was found to be an inadequate promotion and left us with a much poorer scheduling order than prior to using semaphores. The outcome of this patch, is that all requests have NOSEMAPHORE priority when they have no dependencies and are ready to run and not busywait, restoring the pre-semaphore ordering on saturated systems. We can demonstrate the effect of poor scheduling order by oversaturating the system using gem_wsim on a system with multiple vcs engines (i.e running the same workloads across more clients than required for peak throughput, e.g. media_load_balance_17i7.wsim -c4 -b context): x v5.1 (normalized) + tip * fix +------------------------------------------------------------------------+ \| x \| \| x \| \| x \| \| x \| \| %x \| \| %%x \| \| %%x \| \| %%x \| \| %%x \| \| %%x \| \| %%x \| \| %%x \| \| %%x \| \| %%x \| \| %%x \| \| %#x \| \| %#x \| \| %#x \| \| %#x \| \| %#x \| \| + %#xx \| \| + %#xx \| \| + %%#xx \| \| + %%#xx \| \| + %%#xx \| \| + %%#xx \| \| + %%##x \| \| +++ %%##x \| \| +++ %%##x \| \| +++ %%##x \| \| ++++ %%##x \| \| ++++ %%##x \| \| ++++ %%##xx \| \| ++++ %###xx \| \| ++++ %###xx \| \| ++++ %###xx \| \| ++++ %###xx \| \| ++++ + %#O#xx \| \| ++++ + %#O#xx \| \| ++++++ + %#O#xx \| \| ++++++++++ %OOOxxx\| \| ++++++++++ + %#OOO#xx\| \| + ++++++++++++ ++ +++++ + ++ @@OOOO#xx\| \| \|A_\| \| \|\|__________M_______A____________________\| \| \| \|A_\| \| +------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 120 0.99456 1.00628 0.999985 1.0001545 0.0024387139 + 120 0.873021 1.00037 0.884134 0.90148752 0.039190862 Difference at 99.5% confidence -0.098667 +/- 0.0110762 -9.86517% +/- 1.10745% (Student's t, pooled s = 0.0277657) % 120 0.990207 1.00165 0.9970265 0.99699748 0.0021024 Difference at 99.5% confidence -0.003157 +/- 0.000908245 -0.315651% +/- 0.0908105% (Student's t, pooled s = 0.00227678) Fixes: `b7404c7ecb` ("drm/i915: Bump ready tasks ahead of busywaits") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com> Cc: Dmitry Ermilov <dmitry.ermilov@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-2-chris@chris-wilson.co.uk	2019-05-17 13:56:13 +01:00
Chris Wilson	dba5a7f301	drm/i915: Mark semaphores as complete on unsubmit out if payload was started Avoid charging us for the presumed busywait if the request was preempted after successfully using semaphores to reduce inter-engine latency. v2: Bump the priority to reflect the lack of semaphores now required. References: `ca6e56f654` ("drm/i915: Disable semaphore busywaits on saturated systems") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190515130052.4475-1-chris@chris-wilson.co.uk	2019-05-17 13:42:17 +01:00
Imre Deak	4e309bafeb	drm/i915: Assert that TypeC ports are not used for eDP Add an assert that we don't use TypeC ports for eDP. That may in theory be possible on TypeC legacy ports, but I'm not sure if that's a practical scenario, so let's deal with that only if there's a use case. Adding support for that wouldn't be too difficult, since TypeC mode switching is not possible on TypeC legacy ports. Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-12-imre.deak@intel.com	2019-05-14 14:06:30 +03:00
Imre Deak	b4c7ea6354	drm/i915: Avoid taking the PPS lock for non-eDP/VLV/CHV On ICL we have to make sure that we enable the AUX power domain in a controlled way (corresponding to the port's actual TypeC mode). Since the PPS lock - which takes an AUX power ref - is only needed on eDP on all platforms and eDP/DP on VLV/CHV avoid taking it in all other cases. v2: - Clarify commit log about the condition for taking the PPS lock. (Ville) Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-11-imre.deak@intel.com	2019-05-14 14:06:30 +03:00
Imre Deak	08d8e17005	drm/i915: Replace use of PLLS power domain with DISPLAY_CORE domain There isn't a separate power domain specific to PLLs. When programming them we require the same power domain to be enabled which is needed when accessing other display core parts (not specific to any pipe/port/transcoder). This corresponds to the DISPLAY_CORE domain added previously in this patchset, so use that instead to save bits in the power domain mask. Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-10-imre.deak@intel.com	2019-05-14 14:06:30 +03:00
Imre Deak	6f08ebe779	drm/i915: Remove the unneeded AUX power ref from intel_dp_hpd_pulse() The power get/put was added in commit `1c767b339b` ("drm/i915: take display port power domain in DP HPD handler") Author: Imre Deak <imre.deak@intel.com> Date: Mon Aug 18 14:42:42 2014 +0300 to account for the HW access in ibx_digital_port_connected(). This latter call was in turn removed in commit `7d23e3c37b` ("drm/i915: Cleaning up intel_dp_hpd_pulse") Author: Shubhangi Shrivastava <shubhangi.shrivastava@intel.com> Date: Wed Mar 30 18:05:23 2016 +0530 after which we didn't actually need the power reference. One way we are accessing the HW during HPD pulse handling is via DP AUX transfers, but the transfer function takes its own reference, so doesn't need the reference in intel_dp_hpd_pulse(). The other spot is in intel_psr_short_pulse()->intel_psr_disable_locked() but that can only happen when the panel is enabled with the corresponding modeset already holding the required power reference. v2: - Remove the unneeded power get/put from intel_psr_disable_locked(). (Ville) - Checkpatch commit quoting format fix in the commit log. Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-9-imre.deak@intel.com	2019-05-14 14:06:30 +03:00
Imre Deak	6cfe7ec02e	drm/i915: Remove the unneeded AUX power ref from intel_dp_detect() We don't need the AUX power for the whole duration of the detect, only when we're doing AUX transfers. The AUX transfer function takes its own reference on the AUX power domain already. The two places during detect which access display core registers (not specific to a pipe/port/transcoder) only need the power domain that is required for that access. That power domain is equivalent to the device global power domain on most platforms (enabled whenever we hold a runtime PM reference) except on CHV/VLV where it's equivalent to the display power well. Add a new power domain that reflects the above, and use this at the two spots accessing registers. With that we can avoid taking the AUX reference for the whole duration of the detect function. Put the domains asynchronously to avoid the unneeded on-off-on toggling. Also adapt the idea from with_intel_runtime_pm et al. for making it easy to write short sequences where a display power ref is needed. v2: (Ville) - Add with_intel_display_power() helper to simplify things. - s/bool res/bool is_connected/ Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-8-imre.deak@intel.com	2019-05-14 14:06:30 +03:00
Imre Deak	ad5125d6ef	drm/i915: WARN for eDP encoders in intel_dp_detect_dpcd() We are not calling this function for eDP, so add an early assert about this for clarity. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-7-imre.deak@intel.com	2019-05-14 14:06:30 +03:00
Imre Deak	f39194a7a8	drm/i915: Disable power asynchronously during DP AUX transfers In a follow-up patch we will restrict holding the reference on the AUX power domain to the AUX transfer function. To avoid the unnecessary on-off-on power togglings drop the reference asynchronously. There is no reason we couldn't do this in general and also put the reference asynchronously in pps_unlock(); but that's a separate change that can be done as a follow-up. Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-6-imre.deak@intel.com	2019-05-14 14:06:30 +03:00
Imre Deak	e0da2d63ab	drm/i915: Add support for asynchronous display power disabling By disabling a power domain asynchronously we can restrict holding a reference on that power domain to the actual code sequence that requires the power to be on for the HW access it's doing, by also avoiding unneeded on-off-on togglings of the power domain (since the disabling happens with a delay). One benefit is potential power saving due to the following two reasons: 1. The fact that we will now be holding the reference only for the necessary duration by the end of the patchset. While simply not delaying the disabling has the same benefit, it has the problem that frequent on-off-on power switching has its own power cost (see the 2. point below) and the debug trace for power well on/off events will cause a lot of dmesg spam (see details about this further below). 2. Avoiding the power cost of freuqent on-off-on power switching. This requires us to find the optimal disabling delay based on the measured power cost of on->off and off->on switching of each power well vs. the power of keeping the given power well on. In this patchset I'm not providing this optimal delay for two reasons: a) I don't have the means yet to perform the measurement (with high enough signal-to-noise ratio, or with the help of an energy counter that takes switching into account). I'm currently looking for a way to measure this. b) Before reducing the disabling delay we need an alternative way for debug tracing powerwell on/off events. Simply avoiding/throttling the debug messages is not a solution, see further below. Note that even in the case where we can't measure any considerable power cost of frequent on-off switching of powerwells, it still would make sense to do the disabling asynchronously (with 0 delay) to avoid blocking on the disabling. On VLV I measured this disabling time overhead to be 1ms on average with a worst case of 4ms. In the case of the AUX power domains on ICL we would also need to keep the sequence where we hold the power reference short, the way it would be by the end of this patchset where we hold it only for the actual AUX transfer. Anything else would make the locking we need for ICL TypeC ports (whenever we hold a reference on any AUX power domain) rather problematic, adding for instance unnecessary lockdep dependencies to the required TypeC port lock. I chose the disabling delay to be 100msec for now to avoid the unneeded toggling (and so not to introduce dmesg spamming) in the DP MST sideband signaling code. We could optimize this delay later, once we have the means to measure the switching power cost (see above). Note that simply removing/throttling the debug tracing for power well on/off events is not a solution. We need to know the exact spots of these events and cannot rely only on incorrect register accesses caught (due to not holding a wakeref at the time of access). Incorrect powerwell enabling/disabling could lead to other problems, for instance we need to keep certain powerwells enabled for the duration of modesets and AUX transfers. v2: - Clarify the commit log parts about power cost measurement and the problem of simply removing/throttling debug tracing. (Chris) - Optimize out local wakeref vars at intel_runtime_pm_put_raw() and intel_display_power_put_async() call sites if CONFIG_DRM_I915_DEBUG_RUNTIME_PM=n. (Chris) - Rebased on v2 of the wakeref w/o power-on guarantee patch. - Add missing docbook headers. v3: - Checkpatch spelling/missing-empty-line fix. v4: - Fix unintended local wakeref var optimization when using call-arguments with side-effects, by using inline funcs instead of macros. In this patch in particular this will fix the intel_display_power_grab_async_put_ref()->intel_runtime_pm_put_raw() call). No size change in practice (would be the same disregarding the corresponding change in intel_display_power_grab_async_put_ref()): $ size i915-macro.ko text data bss dec hex filename 2455190 105890 10272 2571352 273c58 i915-macro.ko $ size i915-inline.ko text data bss dec hex filename 2455195 105890 10272 2571357 273c5d i915-inline.ko Kudos to Stan for reporting the raw-wakeref WARNs this issue caused. His config has CONFIG_DRM_I915_DEBUG_RUNTIME_PM=n, which I didn't retest after v1, and we are also not testing this config in CI. Now tested both with CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y/n on ICL, connecting both Chamelium and regular DP, HDMI sinks. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjala <ville.syrjala@linux.intel.com> Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190513192533.12586-1-imre.deak@intel.com	2019-05-14 14:06:10 +03:00
Imre Deak	ee70080a52	drm/i915: Verify power domains state during suspend in all cases There is no reason why we couldn't verify the power domains state during suspend in all cases, so do that. I overlooked this when originally adding the check. Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190509173446.31095-4-imre.deak@intel.com	2019-05-14 13:56:11 +03:00

1 2 3 4 5 ...

827754 Commits